Simple function to search a PCIBus to find a port by
it's port number.
CXL interleave decoding uses the port number as a target
so it is necessary to locate the port when doing interleave
decoding.
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-31-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO nor the ability to write some of the DVSEC entries.
This can be added with the qemu commandline by adding a rootport to a
specific CXL host bridge. For example:
-device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4
Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-17-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
-device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1
A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.
One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.
Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.
Also necessary is to add an exception to scripts/device-crash-test
similar to that for exiting pxb as both must created on a PCIexpress
host bus.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan.Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-15-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)
A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
does here, and will continue to add on to existing PCI code paths.
Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Message-Id: <20220429144110.25167-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A global boolean variable "vga_interface_created"(declared in softmmu/globals.c)
has been used to track the creation of vga interface. If the vga flag is passed
in the command line "default_vga"(declared in softmmu/vl.c) variable is set to 0.
To warn user, the condition checks if vga_interface_created is false
and default_vga is equal to 0. If "-vga none" is passed, this patch will not warn the
user regarding the creation of VGA device.
The warning "A -vga option was passed but this
machine type does not use that option; no VGA device has been created"
is logged if vga flag is passed but no vga device is created.
This patch has been tested for x86_64, i386, sparc, sparc64 and arm boards.
Signed-off-by: Gautam Agrawal <gautamnagrawal@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
Message-Id: <20220501122505.29202-1-gautamnagrawal@gmail.com>
[thuth: Fix wrong warning with "-device" in some cases as reported by Paolo]
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch skips [de]asserting a LSI interrupt if the device doesn't
have any LSI defined. Doing so would trigger an assert in
pci_irq_handler().
The PCIE root port implementation in qemu requests a LSI (INTA), but a
subclass may want to change that behavior since it's a valid
configuration. For example on the POWER8/POWER9/POWER10 systems, the
root bridge doesn't request any LSI.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220408131303.147840-2-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
on creation a PCIDevice has power turned on at the end of pci_qdev_realize()
however later on if PCIe slot isn't populated with any children
it's power is turned off. It's fine if native hotplug is used
as plug callback will power slot on among other things.
However when ACPI hotplug is enabled it replaces native PCIe plug
callbacks with ACPI specific ones (acpi_pcihp_device_*plug_cb) and
as result slot stays powered off. It works fine as ACPI hotplug
on guest side takes care of enumerating/initializing hotplugged
device. But when later guest is migrated, call chain introduced by]
commit d5daff7d31 (pcie: implement slot power control for pcie root ports)
pcie_cap_slot_post_load()
-> pcie_cap_update_power()
-> pcie_set_power_device()
-> pci_set_power()
-> pci_update_mappings()
will disable earlier initialized BARs for the hotplugged device
in powered off slot due to commit 23786d1344 (pci: implement power state)
which disables BARs if power is off.
Fix it by setting PCI_EXP_SLTCTL_PCC to PCI_EXP_SLTCTL_PWR_ON
on slot (root port/downstream port) at the time a device
hotplugged into it. As result PCI_EXP_SLTCTL_PWR_ON is migrated
to target and above call chain keeps device plugged into it
powered on.
Fixes: d5daff7d31 ("pcie: implement slot power control for pcie root ports")
Fixes: 23786d1344 ("pci: implement power state")
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584
Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220301151200.3507298-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
During qemu init stage, when there is pci BDF conflicts, qemu print
a warning but not showing which device the BDF is occupied by. E.x:
"PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-scsi-pci"
To facilitate user knowing the offending device and fixing it, showing
the id info in the warning.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220223094435.64495-1-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Convenience function for retrieving the PCIDevice object of the N-th VF.
Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
Reviewed-by: Knut Omang <knuto@ifi.uio.no>
Message-Id: <20220217174504.1051716-4-lukasz.maniak@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.
Signed-off-by: Knut Omang <knuto@ifi.uio.no>
Message-Id: <20220217174504.1051716-2-lukasz.maniak@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Skip triggering an LSI when the AER root error status is updated if no
LSI is defined for the device. We can have a root bridge with no LSI,
MSI and MSI-X defined, for example on POWER systems.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-Id: <20211116170133.724751-4-fbarrat@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Move the pci_intx() definition to the PCI header file, so that it can
be called from other PCI files. It is used by the next patch.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Message-Id: <20211116170133.724751-3-fbarrat@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Unify format used by trace_pci_update_mappings_del(),
trace_pci_update_mappings_add(), trace_pci_cfg_write() and
trace_pci_cfg_read() to print the device name and bus number,
slot number and function number.
For instance:
pci_cfg_read virtio-net-pci 00:0 @0x20 -> 0xffffc00c
pci_cfg_write virtio-net-pci 00:0 @0x20 <- 0xfea0000c
pci_update_mappings_del d=0x555810b92330 01:00.0 4,0xffffc000+0x4000
pci_update_mappings_add d=0x555810b92330 01:00.0 4,0xfea00000+0x4000
becomes
pci_cfg_read virtio-net-pci 01:00.0 @0x20 -> 0xffffc00c
pci_cfg_write virtio-net-pci 01:00.0 @0x20 <- 0xfea0000c
pci_update_mappings_del virtio-net-pci 01:00.0 4,0xffffc000+0x4000
pci_update_mappings_add virtio-net-pci 01:00.0 4,0xfea00000+0x4000
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20211105192541.655831-1-lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Orginal qemu commit hash:14d02cfbe4adaeebe7cb833a8cc71191352cf03b
In function pcie_add_capability, an assert contains the
"offset < offset + size" expression.
Both variable offset and variable size are uint16_t,
the comparison is always true due to type promotion.
The next expression may be the same.
It might be like this:
Thread 1 "qemu-system-x86" hit Breakpoint 1, pcie_add_capability (
dev=0x555557ce5f10, cap_id=1, cap_ver=2 '\002', offset=256, size=72)
at ../hw/pci/pcie.c:930
930 {
(gdb) n
931 assert(offset >= PCI_CONFIG_SPACE_SIZE);
(gdb) n
932 assert(offset < offset + size);
(gdb) p offset
$1 = 256
(gdb) p offset < offset + size
$2 = 1
(gdb) set offset=65533
(gdb) p offset < offset + size
$3 = 1
(gdb) p offset < (uint16_t)(offset + size)
$4 = 0
Signed-off-by: Daniella Lee <daniellalee111@gmail.com>
Message-Id: <20211126061324.47331-1-daniellalee111@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add an expire time for pending delete, once the time is over allow
pressing the attention button again.
This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-7-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In case the slot is powered off (and the power indicator turned off too)
we can unplug right away, without round-trip to the guest.
Also clear pending attention button press, there is nothing to care
about any more.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-6-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-5-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Refuse to push the attention button in case the guest is busy with some
hotplug operation (as indicated by the power indicator blinking).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-4-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.
This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-3-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows to power off pci devices. In "off" state the devices will
not be visible. No pci config space access, no pci bar access, no dma.
Default state is "on", so this patch (alone) should not change behavior.
Use case: Allows hotplug controllers implement slot power. Hotplug
controllers doing so should set the inital power state for devices in
the ->plug callback.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-2-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Mark property as experimental/internal adding 'x-' prefix.
Property was introduced in 6.1 and it should have provided
ability to turn on native PCIE hotplug on port even when
ACPI PCI hotplug is in use is user explicitly sets property
on CLI. However that never worked since slot is wired to
ACPI hotplug controller.
Another non-intended usecase: disable native hotplug on slot
when APCI based hotplug is disabled, which works but slot has
'hotplug' property for this taks.
It should be relatively safe to rename it to experimental
as no users should exist for it and given that the property
is broken we don't really want to leave it around for much
longer lest users start using it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20211112110857.3116853-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand. After exporting we can switch a lot
of the call sites to use these two helpers.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
They're used in quite a few places of pci.[ch] and also in the rest of the code
base. Define them so that it doesn't need to be defined all over the places.
The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
opaque. The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
void* pointer.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-2-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI resource reserve capability should use LE format as all other PCI
things. If we don't then seabios won't boot:
=== PCI new allocation pass #1 ===
PCI: check devices
PCI: QEMU resource reserve cap: size 10000000000000 type io
PCI: secondary bus 1 size 10000000000000 type io
PCI: secondary bus 1 size 00200000 type mem
PCI: secondary bus 1 size 00200000 type prefmem
=== PCI new allocation pass #2 ===
PCI: out of I/O address space
This became more important since we started reserving IO by default,
previously no one noticed.
Fixes: e2a6290aab ("hw/pcie-root-port: Fix hotplug for PCI devices requiring IO")
Cc: marcel.apfelbaum@gmail.com
Fixes: 226263fb5c ("hw/pci: add QEMU-specific PCI capability to the Generic PCI Express Root Port")
Cc: zuban32s@gmail.com
Fixes: 6755e618d0 ("hw/pci: add PCI resource reserve capability to legacy PCI bridge")
Cc: jing2.liu@linux.intel.com
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Rename the "allocate and return" qbus creation function to
qbus_new(), to bring it into line with our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
Rename qbus_create_inplace() to qbus_init(); this is more in line
with our usual naming convention for functions that in-place
initialize objects.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
Rename the pci_root_bus_new_inplace() function to
pci_root_bus_init(); this brings the bus type in to line with a
"_init for in-place init, _new for allocate-and-return" convention.
To do this we need to rename the implementation-internal function
that was using the pci_root_bus_init() name to
pci_root_bus_internal_init().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-4-peter.maydell@linaro.org
This helps to get the min and max bus number of a PCI bus hierarchy.
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <1625748919-52456-6-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a new bypass_iommu property for PCI host and use it to check
whether devices attached to the PCI root bus will bypass iommu.
In pci_device_iommu_address_space(), check the property and
avoid getting iommu address space for devices bypass iommu.
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <1625748919-52456-2-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of changing the hot-plug type in _OSC register, do not
set the 'Hot-Plug Capable' flag. This way guest will choose ACPI
hot-plug if it is preferred and leave the option to use SHPC with
pcie-pci-bridge.
The ability to control hot-plug for each downstream port is retained,
while 'hotplug=off' on the port means all hot-plug types are disabled.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210713004205.775386-4-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Commit 4c70875372 ("pci: advertise a page aligned ATS") advertises
the page aligned via ATS capability (RO) to unbrek recent Linux IOMMU
drivers since 5.2. But it forgot the compat the capability which
breaks the migration from old machine type:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read:
0 device: 20 cmask: ff wmask: 0 w1cmask:0
This patch introduces a new parameter "x-ats-page-aligned" for
virtio-pci device and turns it on for machine type which is newer than
5.1.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: qemu-stable@nongnu.org
Fixes: 4c70875372 ("pci: advertise a page aligned ATS")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210406040330.11306-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If a device model
(a) doesn't set the value to a correct interrupt number and then
(b) triggers an interrupt for itself,
it's device model bug. Add assert on interrupt pin number to catch
this kind of bug more obviously.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-Id: <9cf8ac3b17e162daac0971d7be32deb6a33ae6ec.1616532563.git.isaku.yamahata@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In x86/ACPI world, linux distros are using predictable
network interface naming since systemd v197. Which on
QEMU based VMs results into path based naming scheme,
that names network interfaces based on PCI topology.
With itm on has to plug NIC in exactly the same bus/slot,
which was used when disk image was first provisioned/configured
or one risks to loose network configuration due to NIC being
renamed to actually used topology.
That also restricts freedom to reshape PCI configuration of
VM without need to reconfigure used guest image.
systemd also offers "onboard" naming scheme which is
preferred over PCI slot/topology one, provided that
firmware implements:
"
PCI Firmware Specification 3.1
4.6.7. DSM for Naming a PCI or PCI Express Device Under
Operating Systems
"
that allows to assign user defined index to PCI device,
which systemd will use to name NIC. For example, using
-device e1000,acpi-index=100
guest will rename NIC to 'eno100', where 'eno' is default
prefix for "onboard" naming scheme. This doesn't require
any advance configuration on guest side to com in effect
at 'onboard' scheme takes priority over path based naming.
Hope is that 'acpi-index' it will be easier to consume by
management layer, compared to forcing specific PCI topology
and/or having several disk image templates for different
topologies and will help to simplify process of spawning
VM from the same template without need to reconfigure
guest NIC.
This patch adds, 'acpi-index'* property and wires up
a 32bit register on top of pci hotplug register block
to pass index value to AML code at runtime.
Following patch will add corresponding _DSM code and
wire it up to PCI devices described in ACPI.
*) name comes from linux kernel terminology
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While pci_bus_realize() currently does not use the Error* argument,
it would be an error to leave pcie_bus_realize() setting bus->flags
if pci_bus_realize() had failed.
Fix by using a local Error* and return early (propagating the error)
if pci_bus_realize() failed.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210201153700.618946-1-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the pcie slot is initialized, by default PCI_EXP_LNKSTA_DLLLA
(Data Link Layer Link Active) is set in PCI_EXP_LNKSTA
(Link Status) without checking if the slot is empty or not.
This is confusing for the kernel because as it sees the link is up
it tries to read the vendor ID and fails:
(From https://bugzilla.kernel.org/show_bug.cgi?id=211691)
[ 1.661105] pcieport 0000:00:02.2: pciehp: Slot Capabilities : 0x0002007b
[ 1.661115] pcieport 0000:00:02.2: pciehp: Slot Status : 0x0010
[ 1.661123] pcieport 0000:00:02.2: pciehp: Slot Control : 0x07c0
[ 1.661138] pcieport 0000:00:02.2: pciehp: Slot #0 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+ Interlock+ NoCompl- IbPresDis- LLActRep+
[ 1.662581] pcieport 0000:00:02.2: pciehp: pciehp_get_power_status: SLOTCTRL 6c value read 7c0
[ 1.662597] pcieport 0000:00:02.2: pciehp: pciehp_check_link_active: lnk_status = 2204
[ 1.662703] pcieport 0000:00:02.2: pciehp: pending interrupts 0x0010 from Slot Status
[ 1.662706] pcieport 0000:00:02.2: pciehp: pcie_enable_notification: SLOTCTRL 6c write cmd 1031
[ 1.662730] pcieport 0000:00:02.2: pciehp: pciehp_check_link_active: lnk_status = 2204
[ 1.662748] pcieport 0000:00:02.2: pciehp: pciehp_check_link_active: lnk_status = 2204
[ 1.662750] pcieport 0000:00:02.2: pciehp: Slot(0-2): Link Up
[ 2.896132] pcieport 0000:00:02.2: pciehp: pciehp_check_link_status: lnk_status = 2204
[ 2.896135] pcieport 0000:00:02.2: pciehp: Slot(0-2): No device found
[ 2.896900] pcieport 0000:00:02.2: pciehp: pending interrupts 0x0010 from Slot Status
[ 2.896903] pcieport 0000:00:02.2: pciehp: pciehp_power_off_slot: SLOTCTRL 6c write cmd 400
[ 3.656901] pcieport 0000:00:02.2: pciehp: pending interrupts 0x0009 from Slot Status
This is really a problem with virtio-net failover that hotplugs a VFIO
card during the boot process. The kernel can shutdown the slot while
QEMU is hotplugging it, and this likely ends by an automatic unplug of
the card. At the end of the boot sequence the card has disappeared.
To fix that, don't set the "Link Active" state in the init function, but
rely on the plug function to do it, as the mechanism has already been
introduced by 2f2b18f60b.
Fixes: 2f2b18f60b ("pcie: set link state inactive/active after hot unplug/plug")
Cc: zhengxiang9@huawei.com
Fixes: 3d67447fe7 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
Cc: alex.williamson@redhat.com
Fixes: b2101eae63 ("pcie: Set the "link active" in the link status register")
Cc: benh@kernel.crashing.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210212135250.2738750-5-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit a1190ab628 has added a "allow_unplug_during_migration = true" at
the end of the main "if" block, so it is not needed to set it anymore
in the previous checking.
Remove it, to have only sub-ifs that check for needed conditions and exit
if one fails.
Fixes: 4f5b6a05a4 ("pci: add option for net failover")
Fixes: a1190ab628 ("migration: allow unplug during migration for failover devices")
Cc: jfreimann@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210212135250.2738750-2-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This property can be useful for distros to set up known-good ROM sizes for
migration purposes. The VM will fail to start if the ROM is too large,
and migration compatibility will not be broken if the ROM is too small.
Note that even though romsize is a uint32_t, it has to be between 1
(because empty ROM files are not accepted, and romsize must be greater
than the file) and 2^31 (because values above are not powers of two and
are rejected).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201218182736.1634344-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210203131828.156467-3-pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
get_image_size() returns an int64_t, which pci_add_option_rom() assigns
to an "int" without any range checking. A 32-bit BAR could be up to
2 GiB in size, so reject anything above it. In order to accomodate
a rounded-up size of 2 GiB, change pci_patch_ids's size argument
to unsigned.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210203131828.156467-2-pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When the slot is in steady powered-off state and the device is being
removed, there's no need to press the attention button. Nor is it
mandated by the Standard Hot-Plug Controller Specification, Rev. 1.0.
Moreover it confuses the guest, Linux in particular, as it assumes that
the attention button pressed in this state indicates that the device has
been inserted and will need to be powered on. Therefore it transitions
the slot into BLINKING_ON state for 5 seconds, and discovers at the end
that no device is actually inserted:
... unplug request
[12685.451329] shpchp 0000:01:00.0: Button pressed on Slot(2)
[12685.455478] shpchp 0000:01:00.0: PCI slot #2 - powering off due to button press
... in 5 seconds OS powers off the slot, QEMU ejects the device
[12690.632282] shpchp 0000:01:00.0: Latch open on Slot(2)
... excessive button press in steady powered-off state
[12690.634267] shpchp 0000:01:00.0: Button pressed on Slot(2)
[12690.636256] shpchp 0000:01:00.0: Card not present on Slot(2)
... the last button press spawns powering on the slot
[12690.638909] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press
... in 5 more seconds attempt to power on discovers empty slot
[12695.735986] shpchp 0000:01:00.0: No adapter on slot(2)
Worse, if the real device insertion happens within 5 seconds from the
apparent completion of the previous device removal (signaled via
DEVICE_DELETED event), the new button press will be interpreted as the
cancellation of that misguided powering on:
[13448.965295] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13448.969430] shpchp 0000:01:00.0: PCI slot #2 - powering off due to button press
[13454.025107] shpchp 0000:01:00.0: Latch open on Slot(2)
[13454.027101] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13454.029165] shpchp 0000:01:00.0: Card not present on Slot(2)
... the excessive button press spawns powering on the slot
... device has already been ejected by QEMU
[13454.031949] shpchp 0000:01:00.0: PCI slot #2 - powering on due to button press
... new device is inserted in the slot
[13456.861545] shpchp 0000:01:00.0: Latch close on Slot(2)
... valid button press arrives before 5 s since the wrong one
[13456.864894] shpchp 0000:01:00.0: Button pressed on Slot(2)
[13456.869211] shpchp 0000:01:00.0: Card present on Slot(2)
... the valid button press is counted as cancellation of the wrong one
[13456.873173] shpchp 0000:01:00.0: Button cancel on Slot(2)
[13456.877101] shpchp 0000:01:00.0: PCI slot #2 - action canceled due to button press
As a result, the newly inserted device isn't brought up by the guest.
Avoid this situation by not pushing the attention button when the device
in the slot is in powered-off state and is being ejected.
FWIW pcie implementation doesn't suffer from this problem.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Message-Id: <20201102053750.2281818-1-rvkagan@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 8118f0950f "migration: Append JSON description of migration
stream" needs a JSON writer. The existing qobject_to_json() wasn't a
good fit, because it requires building a QObject to convert. Instead,
migration got its very own JSON writer, in commit 190c882ce2 "QJSON:
Add JSON writer". It tacitly limits numbers to int64_t, and strings
contents to characters that don't need escaping, unlike
qobject_to_json().
The previous commit factored the JSON writer out of qobject_to_json().
Replace migration's JSON writer by it.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-17-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Generalize the qdev_hotplug variable to the different phases of
machine initialization. We would like to allow different
monitor commands depending on the phase.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Prevent future developers working on root complexes, root ports, or
bridges that also wish to implement a BAR for those, from shooting
themselves in the foot. PCI type 1 headers only support 2 base address
registers. It is incorrect and difficult to figure out what is wrong
with the device when this mistake is made. With this, it is immediate
and obvious what has gone wrong.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Message-Id: <20201015181411.89104-2-ben.widawsky@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asserts are used for developer bugs. As registering a bar of the wrong
size is not something that should be possible for a user to achieve,
this is a developer bug.
While here, use the more obvious helper function.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Message-Id: <20201015181411.89104-1-ben.widawsky@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
After Linux kernel commit 61363c1474b1 ("iommu/vt-d: Enable ATS only
if the device uses page aligned address."), ATS will be only enabled
if device advertises a page aligned request.
Unfortunately, vhost-net is the only user and we don't advertise the
aligned request capability in the past since both vhost IOTLB and
address_space_get_iotlb_entry() can support non page aligned request.
Though it's not clear that if the above kernel commit makes
sense. Let's advertise a page aligned ATS here to make vhost device
IOTLB work with Intel IOMMU again.
Note that in the future we may extend pcie_ats_init() to accept
parameters like queue depth and page alignment.
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20200909081731.24688-1-jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These assertions similar to those in the adjacent pci_bus_get_irq_level()
function ensure that irqnum lies within the valid PCI bus IRQ range.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20201011082022.3016-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201024203900.3619498-3-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extract pci_bus_change_irq_level() from pci_change_irq_level() to
make it clearer it operates on the bus.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201024203900.3619498-2-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
'occupied' is spelled like 'ocuppied' in the message.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201006133958.600932-1-jusual@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Only qemu-system-FOO and qemu-storage-daemon provide QMP
monitors, therefore such declarations and definitions are
irrelevant for user-mode emulation.
Extracting the PCI commands to their own schema reduces the size of
the qapi-misc* headers generated, and pulls less QAPI-generated code
into user-mode.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200913195348.1064154-9-philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When debugging QEMU it is often useful to put a breakpoint on the
error_setg_internal method impl.
Unfortunately the object_property_add / object_class_property_add
methods call object_property_find / object_class_property_find methods
to check if a property exists already before adding the new property.
As a result there are a huge number of calls to error_setg_internal
on startup of most QEMU commands, making it very painful to set a
breakpoint on this method.
Most callers of object_find_property and object_class_find_property,
however, pass in a NULL for the Error parameter. This simplifies the
methods to remove the Error parameter entirely, and then adds some
new wrapper methods that are able to raise an Error when needed.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200914135617.1493072-1-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path. In particular the tracing headers are using
$(build_root)/$(<D).
In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".
This patch is too ugly to be applied to the Makefiles now. It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The pci host config register is used to save PCI address for
read/write config data. If guest writes a value to config register,
and then QEMU pauses the vcpu to migrate, after the migration, the guest
will continue to write pci config data, and the write data will be ignored
because of new qemu process losing the config register state.
To trigger the bug:
1. guest is booting in seabios.
2. guest enables the SMRAM in seabios:piix4_apmc_smm_setup, and then
expects to disable the SMRAM by pci_config_writeb.
3. after guest writes the pci host config register, QEMU pauses vcpu
to finish migration.
4. guest write of config data(0x0A) fails to disable the SMRAM because
the config register state is lost.
5. guest continues to boot and crashes in ipxe option ROM due to SMRAM
in enabled state.
Example Reproducer:
step 1. Make modifications to seabios and qemu for increase reproduction
efficiency, write 0xf0 to 0x402 port notify qemu to stop vcpu after
0x0cf8 port wrote i440 configure register. qemu stop vcpu when catch
0x402 port wrote 0xf0.
seabios:/src/hw/pci.c
@@ -52,6 +52,11 @@ void pci_config_writeb(u16 bdf, u32 addr, u8 val)
writeb(mmconfig_addr(bdf, addr), val);
} else {
outl(ioconfig_cmd(bdf, addr), PORT_PCI_CMD);
+ if (bdf == 0 && addr == 0x72 && val == 0xa) {
+ dprintf(1, "stop vcpu\n");
+ outb(0xf0, 0x402); // notify qemu to stop vcpu
+ dprintf(1, "resume vcpu\n");
+ }
outb(val, PORT_PCI_DATA + (addr & 3));
}
}
qemu:hw/char/debugcon.c
@@ -60,6 +61,9 @@ static void debugcon_ioport_write(void *opaque, hwaddr addr, uint64_t val,
printf(" [debugcon: write addr=0x%04" HWADDR_PRIx " val=0x%02" PRIx64 "]\n", addr, val);
#endif
+ if (ch == 0xf0) {
+ vm_stop(RUN_STATE_PAUSED);
+ }
/* XXX this blocks entire thread. Rewrite to use
* qemu_chr_fe_write and background I/O callbacks */
qemu_chr_fe_write_all(&s->chr, &ch, 1);
step 2. start vm1 by the following command line, and then vm stopped.
$ qemu-system-x86_64 -machine pc-i440fx-5.0,accel=kvm\
-netdev tap,ifname=tap-test,id=hostnet0,vhost=on,downscript=no,script=no\
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x13,bootindex=3\
-device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2\
-chardev file,id=seabios,path=/var/log/test.seabios,append=on\
-device isa-debugcon,iobase=0x402,chardev=seabios\
-monitor stdio
step 3. start vm2 to accept vm1 state.
$ qemu-system-x86_64 -machine pc-i440fx-5.0,accel=kvm\
-netdev tap,ifname=tap-test1,id=hostnet0,vhost=on,downscript=no,script=no\
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=pci.0,addr=0x13,bootindex=3\
-device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2\
-chardev file,id=seabios,path=/var/log/test.seabios,append=on\
-device isa-debugcon,iobase=0x402,chardev=seabios\
-monitor stdio \
-incoming tcp:127.0.0.1:8000
step 4. execute the following qmp command in vm1 to migrate.
(qemu) migrate tcp:127.0.0.1:8000
step 5. execute the following qmp command in vm2 to resume vcpu.
(qemu) cont
Before this patch, we get KVM "emulation failure" error on vm2.
This patch fixes it.
Cc: qemu-stable@nongnu.org
Signed-off-by: Hogan Wang <hogan.wang@huawei.com>
Message-Id: <20200727084621.3279-1-hogan.wang@huawei.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
qbus_set_hotplug_handler() is a simple wrapper around
object_property_set_link().
object_property_set_link() fails when the property doesn't exist, is
not settable, or its .check() method fails. These are all programming
errors here, so passing &error_abort to qbus_set_hotplug_handler() is
appropriate.
Most of its callers do. Exceptions:
* pcie_cap_slot_init(), shpc_init(), spapr_phb_realize() pass NULL,
i.e. they ignore errors.
* spapr_machine_init() passes &error_fatal.
* s390_pcihost_realize(), virtio_serial_device_realize(),
s390_pcihost_plug() pass the error to their callers. The latter two
keep going after the error, which looks wrong.
Drop the @errp parameter, and instead pass &error_abort to
object_property_set_link().
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200630090351.1247703-15-armbru@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-18-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I'm converting from qdev_create()/qdev_init_nofail() to
qdev_new()/qdev_realize_and_unref(); recent commit "qdev: New
qdev_new(), qdev_realize(), etc." explains why.
PCI devices use qdev_create() through pci_create() and
pci_create_multifunction().
Provide pci_new(), pci_new_multifunction(), and
pci_realize_and_unref() for converting PCI devices.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-14-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I'm going to convert device realization to qdev_realize() with the
help of Coccinelle. Convert bus realization to qbus_realize() first,
to get it out of Coccinelle's way. Readability improves.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-7-armbru@redhat.com>
Sometimes it would be good to be able to read the pin number along
with the IRQ number allocated. Since we'll dump the IRQ number, no
reason to not dump the pin information. For example, the vfio-pci
device will overwrite the pin with the hardware pin number. It would
be nice to know the pin number of one assigned device from QMP/HMP.
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
CC: Julia Suvorova <jusual@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200317195908.283800-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
QEMU currently aborts when being started with "-nic model=rocker" or with
"-net nic,model=rocker". This happens because the "rocker" device is not
a normal NIC but a switch, which has different properties. Thus we should
only consider real NIC devices for "-nic" and "-net". These devices can
be identified by the "netdev" property, so check for this property before
adding the device to the list.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Fixes: 52310c3fa7 ("net: allow using any PCI NICs in -net or -nic")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200527153152.9211-1-thuth@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This code is not related to hardware emulation.
Move it under accel/ with the other hypervisors.
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200508100222.7112-1-philmd@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
IEC binary prefixes ease code review: the unit is explicit.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200601142930.29408-5-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
memory_region_set_size() handle the 16 Exabytes limit by
special-casing the UINT64_MAX value. This is not a problem
for the 32-bit maximum, 4 GiB.
By using the UINT32_MAX value, the pci_bridge_io MemoryRegion
ends up missing 1 byte:
(qemu) info mtree
memory-region: pci_bridge_io
0000000000000000-00000000fffffffe (prio 0, i/o): pci_bridge_io
0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
00000000000001ce-00000000000001d1 (prio 0, i/o): vbe
0000000000000378-000000000000037f (prio 0, i/o): parallel
00000000000003b4-00000000000003b5 (prio 0, i/o): vga
...
Fix by using the correct value. We now have:
memory-region: pci_bridge_io
0000000000000000-00000000ffffffff (prio 0, i/o): pci_bridge_io
0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
...
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200601142930.29408-4-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
While accessing PCI configuration bytes, assert that
'address + len' is within PCI configuration space.
Generally it is within bounds. This is more of a defensive
assert, in case a buggy device was to send 'address' which
may go out of bounds.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200604113525.58898-1-ppandit@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Check for hot plug capability earlier to avoid removing devices attached
during the initialization process.
Run qemu with an unattached drive:
-drive file=$FILE,if=none,id=drive0 \
-device pcie-root-port,id=rp0,slot=3,bus=pcie.0,hotplug=off
Hotplug a block device:
device_add virtio-blk-pci,id=blk0,drive=drive0,bus=rp0
If hotplug fails on plug_cb, drive0 will be deleted.
Fixes: 0501e1aa1d ("hw/pci/pcie: Forbid hot-plug if it's disabled on the slot")
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20200604125947.881210-1-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI spec says:
For all accesses to MSI-X Table and MSI-X PBA fields, software must use
aligned full DWORD or aligned full QWORD transactions; otherwise, the
result is undefined.
However, since MSI-X was converted to use memory API, QEMU
started blocking qword transactions, only allowing DWORD
ones. Guests do not seem to use QWORD accesses, but let's
be spec compliant.
Fixes: 95524ae8dc ("msix: convert to memory API")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Devices may have component devices and buses.
Device realization may fail. Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).
When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far. If any of these unrealizes
failed, the device would be left in an inconsistent state. Must not
happen.
device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.
Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back? We'd have to
re-realize, which can fail. This design is fundamentally broken.
device_set_realized() does not roll back at all. Instead, it keeps
unrealizing, ignoring further errors.
It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.
bus_set_realized() does not roll back either. Instead, it stops
unrealizing.
Fortunately, no unrealize method can fail, as we'll see below.
To fix the design error, drop parameter @errp from all the unrealize
methods.
Any unrealize method that uses @errp now needs an update. This leads
us to unrealize() methods that can fail. Merely passing it to another
unrealize method cannot cause failure, though. Here are the ones that
do other things with @errp:
* virtio_serial_device_unrealize()
Fails when qbus_set_hotplug_handler() fails, but still does all the
other work. On failure, the device would stay realized with its
resources completely gone. Oops. Can't happen, because
qbus_set_hotplug_handler() can't actually fail here. Pass
&error_abort to qbus_set_hotplug_handler() instead.
* hw/ppc/spapr_drc.c's unrealize()
Fails when object_property_del() fails, but all the other work is
already done. On failure, the device would stay realized with its
vmstate registration gone. Oops. Can't happen, because
object_property_del() can't actually fail here. Pass &error_abort
to object_property_del() instead.
* spapr_phb_unrealize()
Fails and bails out when remove_drcs() fails, but other work is
already done. On failure, the device would stay realized with some
of its resources gone. Oops. remove_drcs() fails only when
chassis_from_bus()'s object_property_get_uint() fails, and it can't
here. Pass &error_abort to remove_drcs() instead.
Therefore, no unrealize method can fail before this patch.
device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool(). Can't drop @errp there, so pass
&error_abort.
We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors. Pass &error_abort instead.
Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.
One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().
Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
A little cleanup is possible because of hotplug_pdev introduction.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20200427182440.92433-3-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Raise an error when trying to hot-plug/unplug a device through QMP to a device
with disabled hot-plug capability. This makes the device behaviour more
consistent and provides an explanation of the failure in the case of
asynchronous unplug.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20200427182440.92433-2-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
The pci_do_device_reset() function (called from pci_device_reset)
clears the PCI_INTERRUPT_LINE config reg of devices on the bus but did
this without taking wmask into account. We'll have a device model now
that needs to set a constant value for this reg and this patch allows
to do that without additional workaround in device emulation to
reverse the effect of this PCI bus reset function.
Suggested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 20200313082444.2439-4-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
Make hot-plug/hot-unplug on PCIe Root Ports optional to allow libvirt
manage it and restrict unplug for the whole machine. This is going to
prevent user-initiated unplug in guests (Windows mostly).
Hotplug is enabled by default.
Usage:
-device pcie-root-port,hotplug=off,...
If you want to disable hot-unplug on some downstream ports of one
switch, disable hot-unplug on PCIe Root Port connected to the upstream
port as well as on the selected downstream ports.
Discussion related:
https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00530.html
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20200226174607.205941-1-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to
auto-generate the vmstate instance ID. Previously it was hard coded
as -1 instead of this macro. It helps to change this default value in
the follow up patches. No functional change.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Both functions are called by MemoryRegionOps.[read/write] handlers
with unsigned 'size' argument. Both functions call
pci_host_config_[read/write]_common() which expect a uint32_t 'len'
parameter (also unsigned).
Since it is pointless (and confuse) to use a signed value, use a
unsigned type.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20191216002134.18279-3-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In commit 3bf4dfdd11 we introduced the pci_cfg_[read/write]
trace events in pci_host_config_[read/write]_common().
We have the following call trace:
pci_host_data_[read/write]()
- PCI_DPRINTF()
- pci_data_[read/write]()
- PCI_DPRINTF()
- pci_host_config_[read/write]_common()
trace_pci_cfg_[read/write]()
Since the PCI_DPRINTF() calls are redundant with the trace
events, remove them.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20191216002134.18279-2-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that the old pc-0.x machine types have been removed, this config
knob is not required anymore.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20191209125248.5849-4-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On x86, KVM needs some function from the PCI subsystem in order to set
up interrupt routes. Provide some stubs to support x86 machines that
lack PCI.
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PCIe requester IDs are used by modern IOMMUs to differentiate devices
in order to provide a unique IOVA address space per device. These
requester IDs are composed of the bus/device/function (BDF) of the
requesting device. Conventional PCI pre-dates this concept and is
simply a shared parallel bus where transactions are claimed by
decoding target ranges rather than the packetized, point-to-point
mechanisms of PCI-express. In order to interface conventional PCI
to PCIe, the PCIe-to-PCI bridge creates and accepts packetized
transactions on behalf of all downstream devices, using one of two
potential forms of a requester ID relating to the bridge itself or its
subordinate bus. All downstream devices are therefore aliased by the
bridge's requester ID and it's not possible for the IOMMU to create
unique IOVA spaces for devices downstream of such buses.
At least that's how it works on bare metal. Until now point we've
ignored this nuance of vIOMMU support in QEMU, creating a unique
AddressSpace per device regardless of the virtual bus topology.
Aside from simply being true to bare metal behavior, there are aspects
of a shared address space that we can use to our advantage when
designing a VM. For instance, a PCI device assignment scenario where
we have the following IOMMU group on the host system:
$ ls /sys/kernel/iommu_groups/1/devices/
0000:00:01.0 0000:01:00.0 0000:01:00.1
An IOMMU group is considered the smallest set of devices which are
fully DMA isolated from other devices by the IOMMU. In this case the
root port at 00:01.0 does not guarantee that it prevents peer to peer
traffic between the endpoints on bus 01: and the devices are therefore
grouped together. VFIO considers an IOMMU group to be the smallest
unit of device ownership and allows only a single shared IOVA space
per group due to the limitations of the isolation.
Therefore, if we attempt to create the following VM, we get an error:
qemu-system-x86_64 -machine q35... \
-device intel-iommu,intremap=on \
-device pcie-root-port,addr=1e.0,id=pcie.1 \
-device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
-device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
0000:01:00.1: group 1 used in multiple address spaces
VFIO only allows a single IOVA space (AddressSpace) for both devices,
but we've placed them into a topology where the vIOMMU expects a
separate AddressSpace for each device. On bare metal we know that
a conventional PCI bus would provide the sort of aliasing we need
here, forcing the IOMMU to consider these devices to be part of a
single shared IOVA space. The support provided here does the same
for QEMU, such that we can create a conventional PCI topology to
expose equivalent AddressSpace sharing requirements to the VM:
qemu-system-x86_64 -machine q35... \
-device intel-iommu,intremap=on \
-device pcie-pci-bridge,addr=1e.0,id=pci.1 \
-device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
-device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
There are pros and cons to this configuration; it's not necessarily
recommended, it's simply a tool we can use to create configurations
which may provide additional functionality in spite of host hardware
limitations or as a benefit to the guest configuration or resource
usage. An incomplete list of pros and cons:
Cons:
a) Extended PCI configuration space is unavailable to devices
downstream of a conventional PCI bus. The degree to which this
is a drawback depends on the device and guest drivers.
b) Applying this topology to devices which are already isolated by
the host IOMMU (singleton IOMMU groups) will result in devices
which appear to be non-isolated to the VM (non-singleton groups).
This can limit configurations within the guest, such as userspace
drivers or nested device assignment.
Pros:
a) QEMU better emulates bare metal.
b) Configurations as above are now possible.
c) Host IOMMU resources and VM locked memory requirements are reduced
in vIOMMU configurations due to shared IOMMU domains on the host
and avoidance of duplicate locked memory accounting.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <157187083548.5439.14747141504058604843.stgit@gimli.home>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.
This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-8-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-5-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Only the guest unplug request was triggered. This is needed for
the failover feature. In case of a failed migration we need to
plug the device back to the guest.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-4-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds a failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.
It only supports ethernet devices. Also currently it only supports
PCIe devices. The requirement for PCIe is because it doesn't support
other hotplug controllers at the moment. The failover functionality can
be added to other hotplug controllers like ACPI, SHCP,... later on.
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Message-Id: <20191029114905.6856-3-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a
"qdev: add qdev_add_vm_change_state_handler()". This is a bad idea:
hw/qdev-core.h is widely included.
Move the declaration of qdev_add_vm_change_state_handler() to
sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h.
Touching sysemu/sysemu.h now recompiles some 1800 objects.
qemu/uuid.h also drops from 5400 to 1800. A few more headers show
smaller improvement: qemu/notify.h drops from 5600 to 5200,
qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from
5500 to 5000.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20190812052359.30071-28-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Commit e35704ba9c "numa: Move NUMA declarations from sysemu.h to
numa.h" left a few NUMA-related macros behind. Move them now.
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-26-armbru@redhat.com>
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h. Include hw/qdev-core.h there
instead.
hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.
While there, delete a few superfluous inclusions of hw/qdev-core.h.
Touching hw/qdev-properties.h now recompiles some 1200 objects.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
The previous commits have left only the declaration of hw_error() in
hw/hw.h. This permits dropping most of its inclusions. Touching it
now recompiles less than 200 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get VMStateDescription. The previous commit made
that unnecessary.
Include migration/vmstate.h only where it's still needed. Touching it
now recompiles only some 1600 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.
Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed. Touching it now recompiles only some 500 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-13-armbru@redhat.com>
In my "build everything" tree, changing migration/qemu-file-types.h
triggers a recompile of some 2600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).
The culprit is again hw/hw.h, which supposedly includes it for
convenience.
Include migration/qemu-file-types.h only where it's needed. Touching
it now recompiles less than 200 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Rename function arguments to make intent clearer.
Better documentation for slot control logic.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
During boot, linux guests tend to clear all bits in pcie slot status
register which is used for hotplug.
If they clear bits that weren't set this is racy and will lose events:
not a big problem for manual hotplug on bare-metal, but a problem for us.
For example, the following is broken ATM:
/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -S -machine q35 \
-device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-device virtio-balloon-pci,id=balloon,bus=pcie_root_port_0 \
-monitor stdio disk.qcow2
(qemu)device_del balloon
(qemu)cont
Balloon isn't deleted as it should.
As a work-around, detect this attempt to clear slot status and revert
status to what it was before the write.
Note: in theory this can be detected as a duplicate button press
which cancels the previous press. Does not seem to happen in
practice as guests seem to only have this bug during init.
Note2: the right thing to do is probably to fix Linux to
read status before clearing it, and act on the bits that are set.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
During boot, linux would sometimes overwrites control of a powered off
slot before powering it on. Unfortunately QEMU interprets that as a
power off request and ejects the device.
For example:
/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -S -machine q35 \
-device pcie-root-port,id=pcie_root_port_0,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
-monitor stdio disk.qcow2
(qemu)device_add virtio-balloon-pci,id=balloon,bus=pcie_root_port_0
(qemu)cont
Balloon is deleted during guest boot.
To fix, save control beforehand and check that power
or led state actually change before ejecting.
Note: this is more a hack than a solution, ideally we'd
find a better way to detect ejects, or move away
from ejects completely and instead monitor whether
it's safe to delete device due to e.g. its power state.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
If we are trying to set multiple bits at once, testing that just one of
them is already set gives a false positive. As a result we won't
interrupt guest if e.g. presence detection change and attention button
press are both set. This happens with multi-function device removal.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
Rather than looking inside the definition of a BusState with "s->bus.qbus",
use the QOM prefered style: "BUS(&s->bus)".
This patch was generated using the following Coccinelle script:
// Use BUS() macros to access BusState.qbus
@use_bus_macro_to_access_qbus@
expression obj;
identifier bus;
@@
-&obj->bus.qbus
+BUS(&obj->bus)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20190528164020.32250-4-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The only remaining caller of pci_get_bus_devfn() is pci_nic_init_nofail(),
itself an old compatibility function. Fold the two together to avoid
re-using the stale interface.
While we're there replace the explicit fprintf()s with error_report().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190513061939.3464-6-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Since c2077e2c "pci: Adjust PCI config limit based on bus topology",
pci_adjust_config_limit() has been used in the config space read and write
paths to only permit access to extended config space on buses which permit
it. Specifically it prevents access on devices below a vanilla-PCI bus via
some combination of bridges, even if both the host bridge and the device
itself are PCI-E.
It accomplishes this with a somewhat complex call up the chain of bridges
to see if any of them prohibit extended config space access. This is
overly complex, since we can always know if the bus will support such
access at the point it is constructed.
This patch simplifies the test by using a flag in the PCIBus instance
indicating whether extended configuration space is accessible. It is
false for vanilla PCI buses. For PCI-E buses, it is true for root
buses and equal to the parent bus's's capability otherwise.
For the special case of sPAPR's paravirtualized PCI root bus, which
acts mostly like vanilla PCI, but does allow extended config space
access, we override the default value of the flag from the host bridge
code.
This should cause no behavioural change.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190513061939.3464-4-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
'MSIX_CAP_LENGTH' is defined in two .c file. Move it
to hw/pci/msix.h file to reduce duplicated code.
CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-5-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
pci_bus_is_root() currently relies on a method in the PCIBusClass.
But it's always known if a PCI bus is a root bus when we create it, so
using a dynamic method is overkill.
This replaces it with an IS_ROOT bit in a new flags field, which is set on
root buses and otherwise clear. As a bonus this removes the special
is_root logic from pci_expander_bridge, since it already creates its bus
as a root bus.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190424041959.4087-3-david@gibson.dropbear.id.au>
These functions have an explicit test for accesses above the device's
config size. But pci_host_config_{read,write}_common() which they're
about to call already have checks against the config space limit and
do the right thing. So, remove the redundant tests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190424041959.4087-2-david@gibson.dropbear.id.au>
Some machines have an AHCI adapter, but no PCI. To be able to
compile hw/ide/ahci.c without CONFIG_PCI, we still need the two
functions msi_enabled() and msi_notify() for linking.
This is required for the new Kconfig-like build system, if a user
wants to compile a QEMU binary with just one machine that has AHCI,
but no PCI, like the ARM "cubieboard" for example.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
LSI mapping in spapr currently open-codes standard PCI swizzling. It thus
duplicates the code of pci_swizzle_map_irq_fn().
Expose the swizzling formula so that it can be used with a slot number
when building the device tree. Simply drop pci_spapr_map_irq() and call
pci_swizzle_map_irq_fn() instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155448184841.8446.13959787238854054119.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Some PHB implementations, eg. PAPR used on pseries machine, act like
a regular PCI bus rather than a PCIe bus, but allow access to the
PCIe extended config space anyway.
Introduce a new PCI bus class method to modelize this behaviour and
use it when adjusting the config space size limit during accesses.
No behaviour change for existing PCI bus types.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155414130271.574858.4253514266378127489.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to
source files. That's because when trace-events got split up, the
comments were moved verbatim.
Delete the sub/dir/ part from these comments. Gets rid of several
misspellings.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-3-armbru@redhat.com
Message-Id: <20190314180929.27722-3-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Not all interrupt controllers have a working implementation of
message-signalled interrupts; in some cases, the guest may expect
MSI to work but it won't due to the buggy or lacking emulation.
In QEMU this is represented by the "msi_nonbroken" variable. This
patch adds a new configuration symbol enabled whenever the binary
contains an interrupt controller that will set "msi_nonbroken". We
can then use it to remove devices that cannot be possibly added
to the machine, because they require MSI.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implementing an ACS capability on downstream ports and multifunction
endpoints indicates isolation and IOMMU visibility to a finer
granularity. This creates smaller IOMMU groups in the guest and thus
more flexibility in assigning endpoints to guest userspace or an L2
guest.
Signed-off-by: Knut Omang <knut.omang@oracle.com>
Message-Id: <07489975121696f5573b0a92baaf3486ef51e35d.1550768238.git-series.knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Instead of including the same list of devices for each target,
set CONFIG_PCI to true, and make the devices default to present
whenever PCI is available. However, s390x does not want all the
PCI devices, so there is a separate symbol to enable them.
Done mostly with the following script:
while read i; do
i=${i%=y}; i=${i#CONFIG_}
sed -i -e'/^config '$i'$/!b' -en \
-e'a\' -e' default y if PCI_DEVICES\' -e' depends on PCI' \
`grep -lw $i hw/*/Kconfig`
done < default-configs/pci.mak
followed by replacing a few "depends on" clauses with "select"
whenever the symbol is not really related to PCI.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190123065618.3520-31-yang.zhong@intel.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make pcie splited from pci and make it configurable.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190123065618.3520-30-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Kconfig files were generated mostly with this script:
for i in `grep -ho CONFIG_[A-Z0-9_]* default-configs/* | sort -u`; do
set fnord `git grep -lw $i -- 'hw/*/Makefile.objs' `
shift
if test $# = 1; then
cat >> $(dirname $1)/Kconfig << EOF
config ${i#CONFIG_}
bool
EOF
git add $(dirname $1)/Kconfig
else
echo $i $*
fi
done
sed -i '$d' hw/*/Kconfig
for i in hw/*; do
if test -d $i && ! test -f $i/Kconfig; then
touch $i/Kconfig
git add $i/Kconfig
fi
done
Whenever a symbol is referenced from multiple subdirectories, the
script prints the list of directories that reference the symbol.
These symbols have to be added manually to the Kconfig files.
Kconfig.host and hw/Kconfig were created manually.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20190123065618.3520-27-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When unplugging a device, at one point the device will be destroyed
via object_unparent(). This will, one the one hand, unrealize the
removed device hierarchy, and on the other hand, destroy/free the
device hierarchy.
When chaining hotplug handlers, we want to overwrite a bus hotplug
handler by the machine hotplug handler, to be able to perform
some part of the plug/unplug and to forward the calls to the bus hotplug
handler.
For now, the bus hotplug handler would trigger an object_unparent(), not
allowing us to perform some unplug action on a device after we forwarded
the call to the bus hotplug handler. The device would be gone at that
point.
machine_unplug_handler(dev)
/* eventually do unplug stuff */
bus_unplug_handler(dev)
/* dev is gone, we can't do more unplug stuff */
So move the object_unparent() to the original caller of the unplug. For
now, keep the unrealize() at the original places of the
object_unparent(). For implicitly chained hotplug handlers (e.g. pc
code calling acpi hotplug handlers), the object_unparent() has to be
done by the outermost caller. So when calling hotplug_handler_unplug()
from inside an unplug handler, nothing is to be done.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> calls unrealize(dev)
/* we can do more unplug stuff but device already unrealized */
}
object_unparent(dev)
In the long run, every unplug action should be factored out of the
unrealize() function into the unplug handler (especially for PCI). Then
we can get rid of the additonal unrealize() calls and object_unparent()
will properly unrealize the device hierarchy after the device has been
unplugged.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> only unplugs, does not unrealize
/* we can do more unplug stuff */
}
object_unparent(dev) -> will unrealize
The original approach was suggested by Igor Mammedov for the PCI
part, but I extended it to all hotplug handlers. I consider this one
step into the right direction.
To summarize:
- object_unparent() on synchronous unplugs is done by common code
-- "Caller of hotplug_handler_unplug"
- object_unparent() on asynchronous unplugs ("unplug requests") has to
be done manually
-- "Caller of hotplug_handler_unplug"
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190228122849.4296-2-david@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The entire link status register for SR-IOV VFs is defined as RsvdZ,
reads simply return zero. Usually this is nothing more than lspci
reporting inconsequentially broken values:
LnkSta: Speed unknown, Width x0, ...
However, now that we're using the downstream endpoint link status to
fill in the value at the parent downstream port, invalid values become
a problem. In particular, the PCIe hotplug driver in Linux looks for
a valid negotiated link width and will fail to enumerate hot-added
downstream endpoints without non-zero value here, ex:
pciehp 0000:00:02.0:pcie004: Slot(0): Attention button pressed
pciehp 0000:00:02.0:pcie004: Slot(0) Powering on due to button press
pciehp 0000:00:02.0:pcie004: Slot(0): Card present
pciehp 0000:00:02.0:pcie004: Slot(0): Link Up
pciehp 0000:00:02.0:pcie004: link training error: status 0x2000
pciehp 0000:00:02.0:pcie004: Failed to check link status
Resolve by using minimum width and speed values for the downstream
port link status when the endpoint fails to provide valid values.
Long term, we may want to implement emulation in the vfio-pci host
driver to suppliment this field with the PF value as the SR-IOV spec
seems to allow, but the solution here is compatible should that be
implemented later.
Fixes: 727b48661f ("pci: Sync PCIe downstream port LNKSTA on read")
Reported-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <155060310248.19547.14979269067689441201.stgit@gimli.home>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Certain devices types, like memory/CPU, are now being handled using a
hotplug interface provided by a top-level MachineClass. Hotpluggable
host bridges are another such device where it makes sense to use a
machine-level hotplug handler. However, unlike those devices,
host-bridges have a parent bus (the main system bus), and devices with
a parent bus use a different mechanism for registering their hotplug
handlers: qbus_set_hotplug_handler(). This interface currently expects
a handler to be a subclass of DeviceClass, but this is not the case
for MachineClass, which derives directly from ObjectClass.
Internally, the interface only requires an ObjectClass, so expose that
in qbus_set_hotplug_handler().
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <154999589921.690774.3640149277362188566.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is going to be used later on outside MSI code to detect whether one
MSI vector is masked out.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In msix_exclusive_bar the bar_pba_size is more than what the pba is
expected to have, although this never affects the bar size.
Specifically, the math in msix_init_exclusive_bar allocates too much
memory in some cases.
For example consider nentries = 8. msix_exclusive_bar will give us
bar_pba_size = 16. So 16 bytes. However 8 bytes would be enough - this
is all that the spec requires.
So in practice bar_pba_size sometimes allocates an extra 8 bytes but
never more.
Since each MSIX entry size is 16 bytes, and since we make sure that
table+pba is a power of two, this always leaves a multiple of 16 bytes
for the PBA, so extra 8 bytes have no effect.
However, its ugly to have pba size temporary variable have an incorrect
value. For consistency switch to the formula used in msix_init.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We better stop right away. For now, errors would be partially ignored
(so the guest might get informed or the device might get unplugged),
although actual plug/unplug will be reported as failed to the user.
While at it, properly move the check to the pre_plug handler for the plug
case, as we can test the slot state before the device will be realized.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most files that have TABs only contain a handful of them. Change
them to spaces so that we don't confuse people.
disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line. Many of them
have a majority of TABs, or were initially committed with all tabs.
bsd-user/i386/target_syscall.h
bsd-user/x86_64/target_syscall.h
crypto/aes.c
hw/audio/fmopl.c
hw/audio/fmopl.h
hw/block/tc58128.c
hw/display/cirrus_vga.c
hw/display/xenfb.c
hw/dma/etraxfs_dma.c
hw/intc/sh_intc.c
hw/misc/mst_fpga.c
hw/net/pcnet.c
hw/sh4/sh7750.c
hw/timer/m48t59.c
hw/timer/sh_timer.c
include/crypto/aes.h
include/disas/bfd.h
include/hw/sh4/sh.h
libdecnumber/decNumber.c
linux-headers/asm-generic/unistd.h
linux-headers/linux/kvm.h
linux-user/alpha/target_syscall.h
linux-user/arm/nwfpe/double_cpdo.c
linux-user/arm/nwfpe/fpa11_cpdt.c
linux-user/arm/nwfpe/fpa11_cprt.c
linux-user/arm/nwfpe/fpa11.h
linux-user/flat.h
linux-user/flatload.c
linux-user/i386/target_syscall.h
linux-user/ppc/target_syscall.h
linux-user/sparc/target_syscall.h
linux-user/syscall.c
linux-user/syscall_defs.h
linux-user/x86_64/target_syscall.h
slirp/cksum.c
slirp/if.c
slirp/ip.h
slirp/ip_icmp.c
slirp/ip_icmp.h
slirp/ip_input.c
slirp/ip_output.c
slirp/mbuf.c
slirp/misc.c
slirp/sbuf.c
slirp/socket.c
slirp/socket.h
slirp/tcp_input.c
slirp/tcpip.h
slirp/tcp_output.c
slirp/tcp_subr.c
slirp/tcp_timer.c
slirp/tftp.c
slirp/udp.c
slirp/udp.h
target/cris/cpu.h
target/cris/mmu.c
target/cris/op_helper.c
target/sh4/helper.c
target/sh4/op_helper.c
target/sh4/translate.c
tcg/sparc/tcg-target.inc.c
tests/tcg/cris/check_addo.c
tests/tcg/cris/check_moveq.c
tests/tcg/cris/check_swap.c
tests/tcg/multiarch/test-mmap.c
ui/vnc-enc-hextile-template.h
ui/vnc-enc-zywrle.h
util/envlist.c
util/readline.c
The following have only TABs:
bsd-user/i386/target_signal.h
bsd-user/sparc64/target_signal.h
bsd-user/sparc64/target_syscall.h
bsd-user/sparc/target_signal.h
bsd-user/sparc/target_syscall.h
bsd-user/x86_64/target_signal.h
crypto/desrfb.c
hw/audio/intel-hda-defs.h
hw/core/uboot_image.h
hw/sh4/sh7750_regnames.c
hw/sh4/sh7750_regs.h
include/hw/cris/etraxfs_dma.h
linux-user/alpha/termbits.h
linux-user/arm/nwfpe/fpopcode.h
linux-user/arm/nwfpe/fpsr.h
linux-user/arm/syscall_nr.h
linux-user/arm/target_signal.h
linux-user/cris/target_signal.h
linux-user/i386/target_signal.h
linux-user/linux_loop.h
linux-user/m68k/target_signal.h
linux-user/microblaze/target_signal.h
linux-user/mips64/target_signal.h
linux-user/mips/target_signal.h
linux-user/mips/target_syscall.h
linux-user/mips/termbits.h
linux-user/ppc/target_signal.h
linux-user/sh4/target_signal.h
linux-user/sh4/termbits.h
linux-user/sparc64/target_syscall.h
linux-user/sparc/target_signal.h
linux-user/x86_64/target_signal.h
linux-user/x86_64/termbits.h
pc-bios/optionrom/optionrom.h
slirp/mbuf.h
slirp/misc.h
slirp/sbuf.h
slirp/tcp.h
slirp/tcp_timer.h
slirp/tcp_var.h
target/i386/svm.h
target/sparc/asi.h
target/xtensa/core-dc232b/xtensa-modules.inc.c
target/xtensa/core-dc233c/xtensa-modules.inc.c
target/xtensa/core-de212/core-isa.h
target/xtensa/core-de212/xtensa-modules.inc.c
target/xtensa/core-fsf/xtensa-modules.inc.c
target/xtensa/core-sample_controller/core-isa.h
target/xtensa/core-sample_controller/xtensa-modules.inc.c
target/xtensa/core-test_kc705_be/core-isa.h
target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
tests/tcg/cris/check_abs.c
tests/tcg/cris/check_addc.c
tests/tcg/cris/check_addcm.c
tests/tcg/cris/check_addoq.c
tests/tcg/cris/check_bound.c
tests/tcg/cris/check_ftag.c
tests/tcg/cris/check_int64.c
tests/tcg/cris/check_lz.c
tests/tcg/cris/check_openpf5.c
tests/tcg/cris/check_sigalrm.c
tests/tcg/cris/crisutils.h
tests/tcg/cris/sys.c
tests/tcg/i386/test-i386-ssse3.c
ui/vgafont.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds cleanup counterparts to pci_register_root_bus(),
pci_root_bus_new(), and pci_bus_irqs().
These cleanup routines are needed in the case of hotpluggable
PCIHostBridge implementations. Currently we can rely on the
object_unparent()'ing of the PCIHostState recursively unparenting
and cleaning up it's child buses, but we need explicit calls
to also:
1) remove the PCIHostState from pci_host_bridges global list.
otherwise, we risk accessing freed memory when we access
the list later
2) clean up memory allocated in pci_bus_irqs()
Both are handled outside the context of any particular bus or
host bridge's init/realize functions, making it difficult to
avoid the need for explicit cleanup functions without remodeling
how PCIHostBridges are created. So keep it simple and just add
them for now.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A conventional PCI bus does not support config space accesses above
the standard 256 byte configuration space. PCIe-to-PCI bridges are
not permitted to forward transactions if the extended register address
field is non-zero and must handle it as an unsupported request (PCIe
bridge spec rev 1.0, 4.1.3, 4.1.4). Therefore, we should not support
extended config space if there is a conventional bus anywhere on the
path to a device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce and use the "unplug" callback.
This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce and use the "unplug" callback.
This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The callbacks are also called for cold plugged devices. Drop the "hot"
to better match the actual callback names.
While at it, also rename shpc_device_hotplug_common() to
shpc_device_plug_common().
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The callbacks are also called for cold plugged devices. Drop the "hot"
to better match the actual callback names.
While at it, also rename pcie_cap_slot_hotplug_common() to
pcie_cap_slot_plug_common().
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make use of the PCIESlot speed and width fields to update link
information beyond those configured in pcie_cap_v1_fill(). This is
only called for devices supporting a version 2 capability and
automatically skips any non-PCIESlot devices. Only devices with
increased link values generate any visible config space differences.
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The PCIe link speed and width between a downstream device and its
upstream port is negotiated on real hardware and susceptible to
dynamic changes due to signal issues and power management. In the
emulated device case there is no real hardware link, but we still
might wish to have some consistency between endpoint and downstream
port via a virtual negotiation. There is of course a real link for
assigned devices and this same virtual negotiation allows the
downstream port to match the endpoint, synchronizing on every read
to support underlying physical hardware dynamically adjusting the
link.
This negotiation is intentionally unidirectional for compatibility.
If the endpoint exceeds the capabilities of the downstream port or
there is no endpoint device, the downstream port reports negotiation
to its maximum speed and width, matching the previous case where
negotiation was absent. De-tuning the endpoint to match a virtual
link doesn't seem to benefit anyone and is a condition we've thus
far reported without functional issues.
Note that PCI_EXP_LNKSTA is already ignored for migration
compatibility via pcie_cap_v1_fill().
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In preparation for reporting higher virtual link speeds and widths,
create enums and macros to help us manage them.
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When loadvm'ing a *running* snapshot qemu crashes due to an invalid
free. It's fortunately caught early by glibc heap memory corruption
protection and qemu gets killed with SIGABRT.
Steps to reproduce:
1) Create VM (e.g w/ virsh define)
2) Start the VM and take a snapshot while it's running and having a
PCI bridge attached
3) Destroy the VM and revert the running snapshot.
This commit fixes the issue.
Signed-off-by: Matthias Weckbecker <matthias@weckbecker.name>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When VM boots from the latest version of linux kernel, after
hot-unpluging virtio-blk disks which are hotplugged into
pcie-root-port, the VM's dmesg log shows:
[ 151.046242] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0001 from Slot Status
[ 151.046365] pciehp 0000:00:05.0:pcie004: Slot(0-3): Attention button pressed
[ 151.046369] pciehp 0000:00:05.0:pcie004: Slot(0-3): Powering off due to button press
[ 151.046420] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046425] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 151.046464] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 151.046468] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 156.163421] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 2f1
[ 156.163427] pciehp 0000:00:05.0:pcie004: pciehp_unconfigure_device: domain🚌dev = 0000:06:00
[ 156.198736] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 156.198772] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 157.224124] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0018 from Slot Status
[ 157.224194] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 157.224220] pciehp 0000:00:05.0:pcie004: pciehp_check_link_active: lnk_status = 2011
[ 157.224223] pciehp 0000:00:05.0:pcie004: Slot(0-3): Link Up
[ 157.224233] pciehp 0000:00:05.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 7f1
[ 157.224281] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224285] pciehp 0000:00:05.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
[ 157.224300] pciehp 0000:00:05.0:pcie004: __pciehp_link_set: lnk_ctrl = 0
[ 157.224336] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 157.224339] pciehp 0000:00:05.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 159.739294] pci 0000:06:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[ 159.739315] pciehp 0000:00:05.0:pcie004: pciehp_check_link_status: lnk_status = 2011
[ 159.739318] pciehp 0000:00:05.0:pcie004: Failed to check link status
[ 159.739371] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 159.739394] pciehp 0000:00:05.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
[ 160.771426] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771452] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
[ 160.771495] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771499] pciehp 0000:00:05.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[ 160.771535] pciehp 0000:00:05.0:pcie004: pending interrupts 0x0010 from Slot Status
[ 160.771539] pciehp 0000:00:05.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
After analyzing the log information, it seems that qemu doesn't
change the Link Status from active to inactive after hot-unplug.
This results in the abnormal log after the linux kernel commit
d331710ea78fea merged.
Furthermore, If I hotplug the same virtio-blk disk after hot-unplug,
the virtio-blk would turn on and then back off.
So this patch set the Link Status inactive after hot-unplug and
active after hot-plug.
Signed-off-by: Zheng Xiang <zhengxiang9@huawei.com>
Signed-off-by: Zheng Xiang <xiang.zheng@linaro.org>
Cc: Wang Haibin <wanghaibin.wang@huawei.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Instead use load_image_size().
While we are converting this code, add an error-check
for read failure.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-5-peter.maydell@linaro.org
Because they are supposed to remain const.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181114132931.22624-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to PCI specification, subsystem id and subsystem vendor id
are present only in type 0 and type 2 headers (at different offsets),
but not in type 1 headers.
Thus we should make this data optional in struct PciDeviceId and skip
reporting them via HMP if the information is not available.
Additional (wrong information) about PCI bridges (Type1 devices) has been
added in 5383a705 and fortunately not released. This patch fixes that
problem. The problem was spotted by Markus.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181002135538.12113-1-den@openvz.org>
Reported-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This is a long story. Red Hat has relicensed Windows KVM device drivers
in 2018 and there was an agreement that to avoid WHQL driver conflict
software manufacturers should set proper PCI subsystem vendor ID in
their distributions. Thus PCI subsystem vendor id becomes actively used.
The problem is that this field is applied by us via hardware compats.
Thus technically it could be lost.
This patch adds PCI susbsystem id and vendor id to exportable parameters
for validation.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180918095852.28422-1-den@openvz.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Factor "bus_reserve", "io_reserve", "mem_reserve", "pref32_reserve"
and "pref64_reserve" fields of the "GenPCIERootPort" structure out
to "PCIResReserve" structure, so that other PCI bridges can
reuse it to add resource reserve capability.
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>