Commit Graph

547 Commits

Author SHA1 Message Date
Philippe Mathieu-Daudé
4c8a2054e7 hw/vfio/ccw: Simplify using DEVICE() macro
QOM parenthood relationship is:

  VFIOCCWDevice -> S390CCWDevice -> CcwDevice -> DeviceState

We can directly use the QOM DEVICE() macro to get the parent object.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20230213170145.45666-3-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-02-27 09:15:38 +01:00
Avihai Horon
48e4d8289f vfio: Alphabetize migration section of VFIO trace-events file
Sort the migration section of VFIO trace events file alphabetically
and move two misplaced traces to common.c section.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-11-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
7429aebe1c vfio/migration: Remove VFIO migration protocol v1
Now that v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-10-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
31bcbbb5be vfio/migration: Implement VFIO migration protocol v2
Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
  ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
  added later on.

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>.
Link: https://lore.kernel.org/r/20230216143630.25610-9-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
6eeb290910 vfio/migration: Rename functions/structs related to v1 protocol
To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-8-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
16fe4e8ab7 vfio/migration: Move migration v1 logic to vfio_migration_init()
Move vfio_dev_get_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to v1 protocol and moving
it will make it easier to add the v2 protocol implementation later.
No functional changes intended.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-7-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
29d81b71aa vfio/migration: Block multiple devices migration
Currently VFIO migration doesn't implement some kind of intermediate
quiescent state in which P2P DMAs are quiesced before stopping or
running the device. This can cause problems in multi-device migration
where the devices are doing P2P DMAs, since the devices are not stopped
together at the same time.

Until such support is added, block migration of multiple devices.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-6-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
8b942af393 vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
vfio_devices_all_running_and_saving() is used to check if migration is
in pre-copy phase. This is done by checking if migration is in setup or
active states and if all VFIO devices are in pre-copy state, i.e.
_SAVING | _RUNNING.

In VFIO migration protocol v2 pre-copy support is made optional. Hence,
a matching v2 protocol pre-copy state can't be used here.

As preparation for adding v2 protocol, change
vfio_devices_all_running_and_saving() logic such that it doesn't use the
VFIO pre-copy state.

The new equivalent logic checks if migration is in active state and if
all VFIO devices are in running state [1]. No functional changes
intended.

[1] Note that checking if migration is in setup or active states and if
all VFIO devices are in running state doesn't guarantee that we are in
pre-copy phase, thus we check if migration is only in active state.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-5-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
b051a3f640 vfio/migration: Allow migration without VFIO IOMMU dirty tracking support
Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.

However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.

This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.

The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.

This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.

[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-4-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Avihai Horon
5c4dbcb748 vfio/migration: Fix NULL pointer dereference bug
As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if VM is started or stopped not during
migration and vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.

Fixes: 02a7e71b1e ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Link: https://lore.kernel.org/r/20230216143630.25610-3-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-16 12:13:46 -07:00
Juan Quintela
24beea4efe migration: Rename res_{postcopy,precopy}_only
Once that res_compatible is removed, they don't make sense anymore.
We remove the _only preffix.  And to make things clearer we rename
them to must_precopy and can_postcopy.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-15 20:04:30 +01:00
Juan Quintela
24f254ed79 migration: Remove unused res_compatible
Nothing assigns to it after previous commit.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-15 20:04:30 +01:00
Juan Quintela
fd70385d38 migration: Remove unused threshold_size parameter
Until previous commit, save_live_pending() was used for ram.  Now with
the split into state_pending_estimate() and state_pending_exact() it
is not needed anymore, so remove them.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06 19:22:56 +01:00
Juan Quintela
c8df4a7aef migration: Split save_live_pending() into state_pending_*
We split the function into to:

- state_pending_estimate: We estimate the remaining state size without
  stopping the machine.

- state pending_exact: We calculate the exact amount of remaining
  state.

The only "device" that implements different functions for _estimate()
and _exact() is ram.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06 19:22:56 +01:00
Juan Quintela
255dc7af7e migration: No save_live_pending() method uses the QEMUFile parameter
So remove it everywhere.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06 19:22:56 +01:00
Markus Armbruster
edf5ca5dbe include/hw/pci: Split pci_device.h off pci.h
PCIDeviceClass and PCIDevice are defined in pci.h.  Many users of the
header don't actually need them.  Similar structs live in their own
headers: PCIBusClass and PCIBus in pci_bus.h, PCIBridge in
pci_bridge.h, PCIHostBridgeClass and PCIHostState in pci_host.h,
PCIExpressHost in pcie_host.h, and PCIERootPortClass, PCIEPort, and
PCIESlot in pcie_port.h.

Move PCIDeviceClass and PCIDeviceClass to new pci_device.h, along with
the code that needs them.  Adjust include directives.

This also enables the next commit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221222100330.380143-6-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-08 01:54:22 -05:00
Stefan Hajnoczi
f21f1cfeb9 pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
 first version of biosbits infrastructure
 ASID support in vhost-vdpa
 core_count2 support in smbios
 PCIe DOE emulation
 virtio vq reset
 HMAT support
 part of infrastructure for viommu support in vhost-vdpa
 VTD PASID support
 fixes, tests all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
 uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
 Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
 EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
 =zTCn
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

pci,pc,virtio: features, tests, fixes, cleanups

lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
# uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
# 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
# Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
# 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
# EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
# =zTCn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
  checkpatch: better pattern for inline comments
  hw/virtio: introduce virtio_device_should_start
  tests/acpi: update tables for new core count test
  bios-tables-test: add test for number of cores > 255
  tests/acpi: allow changes for core_count2 test
  bios-tables-test: teach test to use smbios 3.0 tables
  hw/smbios: add core_count2 to smbios table type 4
  vhost-user: Support vhost_dev_start
  vhost: Change the sequence of device start
  intel-iommu: PASID support
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: drop VTDBus
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  vfio: move implement of vfio_get_xlat_addr() to memory.c
  tests: virt: Update expected *.acpihmatvirt tables
  tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
  hw/arm/virt: Enable HMAT on arm virt machine
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
  tests: acpi: q35: add test for hmat nodes without initiators
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-07 18:43:56 -05:00
Cindy Lu
baa44bce87 vfio: move implement of vfio_get_xlat_addr() to memory.c
- Move the implement vfio_get_xlat_addr to softmmu/memory.c, and
  change the name to memory_get_xlat_addr(). So we can use this
  function on other devices, such as vDPA device.
- Add a new function vfio_get_xlat_addr in vfio/common.c, and it will check
  whether the memory is backed by a discard manager. then device can
  have its own warning.

Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221031031020.1405111-2-lulu@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-11-07 14:08:17 -05:00
Avihai Horon
2461e75219 vfio/migration: Fix wrong enum usage
vfio_migration_init() initializes VFIOMigration->device_state using enum
of VFIO migration protocol v2. Current implemented protocol is v1 so v1
enum should be used. Fix it.

Fixes: 429c728006 ("vfio/migration: Fix incorrect initialization value for parameters in VFIOMigration")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Link: https://lore.kernel.org/r/20221016085752.32740-1-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-03 15:57:31 -06:00
Alex Williamson
85b6d2b5fc vfio/common: Fix vfio_iommu_type1_info use after free
On error, vfio_get_iommu_info() frees and clears *info, but
vfio_connect_container() continues to use the pointer regardless
of the return value.  Restructure the code such that a failure
of this function triggers an error and clean up the remainder of
the function, including updating an outdated comment that had
drifted from its relevant line of code and using host page size
for a default for better compatibility on non-4KB systems.

Reported-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/all/20220910004245.2878-1-nicolinc@nvidia.com/
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/166326219630.3388898.12882473157184946072.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27 14:26:42 -06:00
Kunkun Jiang
429c728006 vfio/migration: Fix incorrect initialization value for parameters in VFIOMigration
The structure VFIOMigration of a VFIODevice is allocated and initialized
in vfio_migration_init(). "device_state" and "vm_running" are initialized
to 0, indicating that VFIO device is_STOP and VM is not-running. The
initialization value is incorrect. According to the agreement, default
state of VFIO device is _RUNNING. And if a VFIO device is hot-plugged
while the VM is running, "vm_running" should be 1. This patch fixes it.

Fixes: 02a7e71b1e ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Link: https://lore.kernel.org/r/20220711014651.1327-1-jiangkunkun@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27 14:26:39 -06:00
Akihiko Odaki
362239c05f ui/console: Do not return a value with ui_info
The returned value is not used and misleading.

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-Id: <20220226115516.59830-2-akihiko.odaki@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-06-14 10:34:37 +02:00
Eric Auger
ec6600be0d vfio/common: remove spurious warning on vfio_listener_region_del
851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment
warning") removed the warning on vfio_listener_region_add() path.

However the same warning also hits on region_del path. Let's remove
it and reword the dynamic trace as this can be called on both
map and unmap path.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20220524091405.416256-1-eric.auger@redhat.com
Fixes: 851d6d1a0f ("vfio/common: remove spurious tpm-crb-cmd misalignment warning")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-08 08:44:19 -06:00
Bernhard Beschow
8f1b608798 hw/vfio/pci-quirks: Resolve redundant property getters
The QOM API already provides getters for uint64 and uint32 values, so reuse
them.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220301225220.239065-2-shentey@gmail.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-05-24 10:38:50 +10:00
Alex Williamson
e4082063e4 linux-headers: Update to v5.18-rc6
Update to c5eb0a61238d ("Linux 5.18-rc6").  Mechanical search and
replace of vfio defines with white space massaging.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13 08:20:11 -06:00
Yi Liu
44ee6aaae0 vfio/common: Rename VFIOGuestIOMMU::iommu into ::iommu_mr
Rename VFIOGuestIOMMU iommu field into iommu_mr. Then it becomes clearer
it is an IOMMU memory region.

no functional change intended

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20220502094223.36384-4-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:51 -06:00
Eric Auger
0d570a2572 vfio/pci: Use vbasedev local variable in vfio_realize()
Using a VFIODevice handle local variable to improve the code readability.

no functional change intended

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20220502094223.36384-3-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Eric Auger
9d38ffc5d8 hw/vfio/pci: fix vfio_pci_hot_reset_result trace point
"%m" format specifier is not interpreted by the trace infrastructure
and thus "%m" is output instead of the actual errno string. Fix it by
outputting strerror(errno).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20220502094223.36384-2-yi.l.liu@intel.com
[aw: replace commit log as provided by Eric]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Eric Auger
851d6d1a0f vfio/common: remove spurious tpm-crb-cmd misalignment warning
The CRB command buffer currently is a RAM MemoryRegion and given
its base address alignment, it causes an error report on
vfio_listener_region_add(). This region could have been a RAM device
region, easing the detection of such safe situation but this option
was not well received. So let's add a helper function that uses the
memory region owner type to detect the situation is safe wrt
the assignment. Other device types can be checked here if such kind
of problem occurs again.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20220506132510.1847942-3-eric.auger@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Xiang Chen
99510d271b vfio/common: Fix a small boundary issue of a trace
It uses [offset, offset + size - 1] to indicate that the length of range is
size in most places in vfio trace code (such as
trace_vfio_region_region_mmap()) execpt trace_vfio_region_sparse_mmap_entry().
So change it for trace_vfio_region_sparse_mmap_entry(), but if size is zero,
the trace will be weird with an underflow, so move the trace and trace it
only if size is not zero.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1650100104-130737-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike)
dc580d51f7 vfio: defer to commit kvm irq routing when enable msi/msix
In migration resume phase, all unmasked msix vectors need to be
setup when loading the VF state. However, the setup operation would
take longer if the VM has more VFs and each VF has more unmasked
vectors.

The hot spot is kvm_irqchip_commit_routes, it'll scan and update
all irqfds that are already assigned each invocation, so more
vectors means need more time to process them.

vfio_pci_load_config
  vfio_msix_enable
    msix_set_vector_notifiers
      for (vector = 0; vector < dev->msix_entries_nr; vector++) {
        vfio_msix_vector_do_use
          vfio_add_kvm_msi_virq
            kvm_irqchip_commit_routes <-- expensive
      }

We can reduce the cost by only committing once outside the loop.
The routes are cached in kvm_state, we commit them first and then
bind irqfd for each vector.

The test VM has 128 vcpus and 8 VF (each one has 65 vectors),
we measure the cost of the vfio_msix_enable for each VF, and
we can see 90+% costs can be reduce.

VF      Count of irqfds[*]  Original        With this patch

1st           65            8               2
2nd           130           15              2
3rd           195           22              2
4th           260           24              3
5th           325           36              2
6th           390           44              3
7th           455           51              3
8th           520           58              4
Total                       258ms           21ms

[*] Count of irqfds
How many irqfds that already assigned and need to process in this
round.

The optimization can be applied to msi type too.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-6-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike)
75d546fc18 Revert "vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration"
Commit ecebe53fe9 ("vfio: Avoid disabling and enabling vectors
repeatedly in VFIO migration") avoids inefficiently disabling and
enabling vectors repeatedly and lets the unmasked vectors be enabled
one by one.

But we want to batch multiple routes and defer the commit, and only
commit once outside the loop of setting vector notifiers, so we
cannot enable the vectors one by one in the loop now.

Revert that commit and we will take another way in the next patch,
it can not only avoid disabling/enabling vectors repeatedly, but
also satisfy our requirement of defer to commit.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-5-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike)
8ab217d5d3 vfio: simplify the failure path in vfio_msi_enable
Use vfio_msi_disable_common to simplify the error handling
in vfio_msi_enable.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-4-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike)
be4a46eccf vfio: move re-enabling INTX out of the common helper
Move re-enabling INTX out, and the callers should decide to
re-enable it or not.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-3-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Longpeng(Mike)
a6f5770fb2 vfio: simplify the conditional statements in vfio_msi_enable
It's unnecessary to test against the specific return value of
VFIO_DEVICE_SET_IRQS, since any positive return is an error
indicating the number of vectors we should retry with.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220326060226.1892-2-longpeng2@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06 09:06:50 -06:00
Marc-André Lureau
8e3b0cbb72 Replace qemu_real_host_page variables with inlined functions
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.

This avoids the need for a constructor initialization too.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 10:50:38 +02:00
Markus Armbruster
b21e238037 Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

    $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
	     --macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-21 15:44:44 +01:00
Longpeng(Mike)
def4c5570c kvm/msi: do explicit commit when adding msi routes
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.

This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html

Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15 11:26:20 +01:00
Bernhard Beschow
5e78c98b7c Mark remaining global TypeInfo instances as const
More than 1k of TypeInfo instances are already marked as const. Mark the
remaining ones, too.

This commit was created with:
  git grep -z -l 'static TypeInfo' -- '*.c' | \
  xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/'

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20220117145805.173070-2-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-21 13:30:20 +00:00
Peng Liang
f3bc3a73c9 vfio: Fix memory leak of hostwin
hostwin is allocated and added to hostwin_list in vfio_host_win_add, but
it is only deleted from hostwin_list in vfio_host_win_del, which causes
a memory leak.  Also, freeing all elements in hostwin_list is missing in
vfio_disconnect_container.

Fix: 2e4109de8e ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)")
CC: qemu-stable@nongnu.org
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Link: https://lore.kernel.org/r/20211117014739.1839263-1-liangpeng10@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-11-17 11:25:55 -07:00
Kunkun Jiang
e4b3470838 vfio/common: Add a trace point when a MMIO RAM section cannot be mapped
The MSI-X structures of some devices and other non-MSI-X structures
may be in the same BAR. They may share one host page, especially in
the case of large page granularity, such as 64K.

For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in
Bar 3(size 64KB) is 0x0. vfio_listener_region_add() will be called
to map the remaining range (0x30-0xffff). If host page size is 64KB,
it will return early at 'int128_ge((int128_make64(iova), llend))'
without any message. Let's add a trace point to inform users like commit
5c08600547 ("vfio: Use a trace point when a RAM section cannot be DMA mapped")
did.

Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Link: https://lore.kernel.org/r/20211027090406.761-3-jiangkunkun@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-11-01 12:17:51 -06:00
Kunkun Jiang
f36d4fb85f vfio/pci: Add support for mmapping sub-page MMIO BARs after live migration
We can expand MemoryRegions of sub-page MMIO BARs in
vfio_pci_write_config() to improve IO performance for some
devices. However, the MemoryRegions of destination VM are
not expanded any more after live migration. Because their
addresses have been updated in vmstate_load_state()
(vfio_pci_load_config) and vfio_sub_page_bar_update_mapping()
will not be called.

This may result in poor performance after live migration.
So iterate BARs in vfio_pci_load_config() and try to update
sub-page BARs.

Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Link: https://lore.kernel.org/r/20211027090406.761-2-jiangkunkun@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-11-01 12:17:51 -06:00
Kevin Wolf
f3558b1b76 qdev: Base object creation on QDict rather than QemuOpts
QDicts are both what QMP natively uses and what the keyval parser
produces. Going through QemuOpts isn't useful for either one, so switch
the main device creation function to QDicts. By sharing more code with
the -object/object-add code path, we can even reduce the code size a
bit.

This commit doesn't remove the detour through QemuOpts from any code
path yet, but it allows the following commits to do so.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211008133442.141332-15-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-10-15 16:11:22 +02:00
Peter Xu
142518bda5 memory: Name all the memory listeners
Provide a name field for all the memory listeners.  It can be used to identify
which memory listener is which.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 15:30:24 +02:00
Sean Christopherson
56918a126a memory: Add RAM_PROTECTED flag to skip IOMMU mappings
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that
looks and acts like RAM but is inaccessible via normal mechanisms,
including DMA.  Use the flag to skip protected memory regions when
mapping RAM for DMA in VFIO.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:19 +02:00
Cai Huoqing
631ba5a128 hw/vfio: Fix typo in comments
Fix typo in comments:
*programatically  ==> programmatically
*disconecting  ==> disconnecting
*mulitple  ==> multiple
*timout  ==> timeout
*regsiter  ==> register
*forumula  ==> formula

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 11:57:01 +02:00
Cornelia Huck
759a5d3be0 vfio-ccw: forward halt/clear errors
hsch and csch basically have two parts: execute the command,
and perform the halt/clear function. For fully emulated
subchannels, it is pretty clear how it will work: check the
subchannel state, and actually 'perform the halt/clear function'
and set cc 0 if everything looks good.

For passthrough subchannels, some of the checking is done
within QEMU, but some has to be done within the kernel. QEMU's
subchannel state may be such that we can perform the async
function, but the kernel may still get a cc != 0 when it is
actually executing the instruction. In that case, we need to
set the condition actually encountered by the kernel; if we
set cc 0 on error, we would actually need to inject an interrupt
as well.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Jared Rossi <jrossi@linux.ibm.com>
Message-Id: <20210705163952.736020-2-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-09-06 16:22:54 +02:00
Markus Armbruster
eb24a23e15 vfio: Avoid error_propagate() after migrate_add_blocker()
When migrate_add_blocker(blocker, &err) is followed by
error_propagate(errp, err), we can often just as well do
migrate_add_blocker(..., errp).  This is the case in
vfio_migration_probe().

Prior art: commit 386f6c07d2 "error: Avoid error_propagate() after
migrate_add_blocker()".

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-8-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Cai Huoqing
1bd9f1b14d vfio/pci: Add pba_offset PCI quirk for BAIDU KUNLUN AI processor
Fix pba_offset initialization value for BAIDU KUNLUN Virtual
Function device. The KUNLUN hardware returns an incorrect
value for the VF PBA offset, and add a quirk to instead
return a hardcoded value of 0xb400.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210713093743.942-1-caihuoqing@baidu.com
[aw: comment & whitespace tuning]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14 13:47:17 -06:00
Cai Huoqing
936555bc4f vfio/pci: Change to use vfio_pci_is()
Make use of vfio_pci_is() helper function.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Link: https://lore.kernel.org/r/20210713014831.742-1-caihuoqing@baidu.com
[aw: commit log wording]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14 13:47:17 -06:00
David Hildenbrand
a5dba9bc05 vfio: Fix CID 1458134 in vfio_register_ram_discard_listener()
CID 1458134:  Integer handling issues  (BAD_SHIFT)
    In expression "1 << ctz64(container->pgsizes)", left shifting by more
    than 31 bits has undefined behavior.  The shift amount,
    "ctz64(container->pgsizes)", is 64.

Commit 5e3b981c33 ("vfio: Support for RamDiscardManager in the !vIOMMU
case") added an assertion that our granularity is at least as big as the
page size.

Although unlikely, we could have a page size that does not fit into
32 bit. In that case, we'd try shifting by more than 31 bit.

Let's use 1ULL instead and make sure we're not shifting by more than 63
bit by asserting that any bit in container->pgsizes is set.

Fixes: CID 1458134
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Link: https://lore.kernel.org/r/20210712083135.15755-1-david@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14 13:47:17 -06:00
Peter Maydell
57e28d34c0 s390x updates:
- add gen16 cpumodels
 - refactor/cleanup some code
 - bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYIADAWIQRpo7U29cv8ZSCAJsHeiLtWQd5mwQUCYObg3RIcY29odWNrQHJl
 ZGhhdC5jb20ACgkQ3oi7VkHeZsGAdAD/dSZkhfgjNWJjka0hmnyQyNCSzq6jox1L
 PccGyqhkqU8BAM4DUa2bZdst8bLfhUuAA0M5gKkCqkzHdDraBqTL8LQJ
 =H7dn
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cohuck-gitlab/tags/s390x-20210708' into staging

s390x updates:
- add gen16 cpumodels
- refactor/cleanup some code
- bugfixes

# gpg: Signature made Thu 08 Jul 2021 12:26:21 BST
# gpg:                using EDDSA key 69A3B536F5CBFC65208026C1DE88BB5641DE66C1
# gpg:                issuer "cohuck@redhat.com"
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [unknown]
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full]
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full]
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>" [unknown]
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>" [unknown]
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF
#      Subkey fingerprint: 69A3 B536 F5CB FC65 2080  26C1 DE88 BB56 41DE 66C1

* remotes/cohuck-gitlab/tags/s390x-20210708:
  target/s390x: split sysemu part of cpu models
  target/s390x: move kvm files into kvm/
  target/s390x: remove kvm-stub.c
  target/s390x: use kvm_enabled() to wrap call to kvm_s390_get_hpage_1m
  target/s390x: make helper.c sysemu-only
  target/s390x: split cpu-dump from helper.c
  target/s390x: move sysemu-only code out to cpu-sysemu.c
  target/s390x: start moving TCG-only code to tcg/
  target/s390x: rename internal.h to s390x-internal.h
  target/s390x: remove tcg-stub.c
  hw/s390x: only build tod-tcg from the CONFIG_TCG build
  hw/s390x: tod: make explicit checks for accelerators when initializing
  hw/s390x: rename tod-qemu.c to tod-tcg.c
  target/s390x: meson: add target_user_arch
  s390x/tcg: Fix m5 vs. m4 field for VECTOR MULTIPLY SUM LOGICAL
  target/s390x: Fix CC set by CONVERT TO FIXED/LOGICAL
  s390x/cpumodel: add 3931 and 3932

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-12 19:15:11 +01:00
David Hildenbrand
53d1b5fcfb vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus
We support coordinated discarding of RAM using the RamDiscardManager for
the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards,
keeping uncoordinated discards (e.g., via virtio-balloon) disabled if
possible.

This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://"
by the block layer has to be implemented/unlocked separately. For now,
virtio-mem only supports x86-64; we don't restrict RamDiscardManager to
x86-64, though: arm64 and s390x are supposed to work as well, and we'll
test once unlocking virtio-mem support. The spapr IOMMUs will need special
care, to be tackled later, e.g.., once supporting virtio-mem.

Note: The block size of a virtio-mem device has to be set to sane sizes,
depending on the maximum hotplug size - to not run out of vfio mappings.
The default virtio-mem block size is usually in the range of a couple of
MBs. The maximum number of mapping is 64k, shared with other users.
Assume you want to hotplug 256GB using virtio-mem - the block size would
have to be set to at least 8 MiB (resulting in 32768 separate mappings).

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-14-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08 15:54:45 -04:00
David Hildenbrand
0fd7616e0f vfio: Support for RamDiscardManager in the vIOMMU case
vIOMMU support works already with RamDiscardManager as long as guests only
map populated memory. Both, populated and discarded memory is mapped
into &address_space_memory, where vfio_get_xlat_addr() will find that
memory, to create the vfio mapping.

Sane guests will never map discarded memory (e.g., unplugged memory
blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while
memory is getting discarded. However, there are two cases where a malicious
guests could trigger pinning of more memory than intended.

One case is easy to handle: the guest trying to map discarded memory
into an IOMMU.

The other case is harder to handle: the guest keeping memory mapped in
the IOMMU while it is getting discarded. We would have to walk over all
mappings when discarding memory and identify if any mapping would be a
violation. Let's keep it simple for now and print a warning, indicating
that setting RLIMIT_MEMLOCK can mitigate such attacks.

We have to take care of incoming migration: at the point the
IOMMUs get restored and start creating mappings in vfio, RamDiscardManager
implementations might not be back up and running yet: let's add runstate
priorities to enforce the order when restoring.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-10-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08 15:54:45 -04:00
David Hildenbrand
a74317f636 vfio: Sanity check maximum number of DMA mappings with RamDiscardManager
Although RamDiscardManager can handle running into the maximum number of
DMA mappings by propagating errors when creating a DMA mapping, we want
to sanity check and warn the user early that there is a theoretical setup
issue and that virtio-mem might not be able to provide as much memory
towards a VM as desired.

As suggested by Alex, let's use the number of KVM memory slots to guess
how many other mappings we might see over time.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-9-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08 15:54:45 -04:00
David Hildenbrand
3eed155caf vfio: Query and store the maximum number of possible DMA mappings
Let's query the maximum number of possible DMA mappings by querying the
available mappings when creating the container (before any mappings are
created). We'll use this informaton soon to perform some sanity checks
and warn the user.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-8-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08 15:54:45 -04:00
David Hildenbrand
5e3b981c33 vfio: Support for RamDiscardManager in the !vIOMMU case
Implement support for RamDiscardManager, to prepare for virtio-mem
support. Instead of mapping the whole memory section, we only map
"populated" parts and update the mapping when notified about
discarding/population of memory via the RamDiscardListener. Similarly, when
syncing the dirty bitmaps, sync only the actually mapped (populated) parts
by replaying via the notifier.

Using virtio-mem with vfio is still blocked via
ram_block_discard_disable()/ram_block_discard_require() after this patch.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-7-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08 15:54:45 -04:00
Cho, Yu-Chen
67043607d1 target/s390x: move kvm files into kvm/
move kvm files into kvm/
After the reshuffling, update MAINTAINERS accordingly.
Make use of the new directory:

target/s390x/kvm/

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Cho, Yu-Chen <acho@suse.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210707105324.23400-14-acho@suse.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-07-07 14:01:59 +02:00
Eric Farman
c626710fc7 s390x/css: Add passthrough IRB
Wire in the subchannel callback for building the IRB
ESW and ECW space for passthrough devices, and copy
the hardware's ESW into the IRB we are building.

If the hardware presented concurrent sense, then copy
that sense data into the IRB's ECW space.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20210617232537.1337506-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-06-21 08:48:21 +02:00
Kirti Wankhede
d742d064c1 vfio/migration: Correct device state from vmstate change for savevm case
Set _SAVING flag for device state from vmstate change handler when it
gets called from savevm.

Currently State transition savevm/suspend is seen as:
    _RUNNING -> _STOP -> Stop-and-copy -> _STOP

State transition savevm/suspend should be:
    _RUNNING -> Stop-and-copy -> _STOP

State transition from _RUNNING to _STOP occurs from
vfio_vmstate_change() where when vmstate changes from running to
!running, _RUNNING flag is reset but at the same time when
vfio_vmstate_change() is called for RUN_STATE_SAVE_VM, _SAVING bit
should be set.

Reported by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Message-Id: <1623177441-27496-1-git-send-email-kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-18 09:10:35 -06:00
Kunkun Jiang
22fca190e2 vfio: Fix unregister SaveVMHandler in vfio_migration_finalize
In the vfio_migration_init(), the SaveVMHandler is registered for
VFIO device. But it lacks the operation of 'unregister'. It will
lead to 'Segmentation fault (core dumped)' in
qemu_savevm_state_setup(), if performing live migration after a
VFIO device is hot deleted.

Fixes: 7c2f5f75f9 (vfio: Register SaveVMHandlers for VFIO device)
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Message-Id: <20210527123101.289-1-jiangkunkun@huawei.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-18 08:38:04 -06:00
Stefano Garzarella
d0fb9657a3 docs: fix references to docs/devel/tracing.rst
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.

We still have several references to the old file, so let's fix them
with the following command:

  sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-06-02 06:51:09 +02:00
Eric Farman
dcc9cf3801 vfio-ccw: Attempt to clean up all IRQs on error
The vfio_ccw_unrealize() routine makes an unconditional attempt to
unregister every IRQ notifier, though they may not have been registered
in the first place (when running on an older kernel, for example).

Let's mirror this behavior in the error cleanups in vfio_ccw_realize()
so that if/when new IRQs are added, it is less confusing to recognize
the necessary procedures. The worst case scenario would be some extra
messages about an undefined IRQ, but since this is an error exit that
won't be the only thing to worry about.

And regarding those messages, let's change it to a warning instead of
an error, to better reflect their severity. The existing code in both
paths handles everything anyway.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20210428143652.1571487-1-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-05-20 14:19:30 +02:00
Eric Farman
6178d4689a vfio-ccw: Permit missing IRQs
Commit 690e29b911 ("vfio-ccw: Refactor ccw irq handler") changed
one of the checks for the IRQ notifier registration from saying
"the host needs to recognize the only IRQ that exists" to saying
"the host needs to recognize ANY IRQ that exists."

And this worked fine, because the subsequent change to support the
CRW IRQ notifier doesn't get into this code when running on an older
kernel, thanks to a guard by a capability region. The later addition
of the REQ(uest) IRQ by commit b2f96f9e4f ("vfio-ccw: Connect the
device request notifier") broke this assumption because there is no
matching capability region. Thus, running new QEMU on an older
kernel fails with:

  vfio: unexpected number of irqs 2

Let's adapt the message here so that there's a better clue of what
IRQ is missing.

Furthermore, let's make the REQ(uest) IRQ not fail when attempting
to register it, to permit running vfio-ccw on a newer QEMU with an
older kernel.

Fixes: b2f96f9e4f ("vfio-ccw: Connect the device request notifier")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20210421152053.2379873-1-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-05-20 14:19:30 +02:00
Thomas Huth
2068cabd3f Do not include cpu.h if it's not really necessary
Stop including cpu.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:51 +02:00
Thomas Huth
4c386f8064 Do not include sysemu/sysemu.h if it's not really necessary
Stop including sysemu/sysemu.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:50 +02:00
Thomas Huth
f6527eadeb hw: Do not include hw/sysbus.h if it is not necessary
Many files include hw/sysbus.h without needing it. Remove the superfluous
include statements.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210327082804.2259480-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:50 +02:00
Thomas Huth
e06054368c hw: Remove superfluous includes of hw/hw.h
The include/hw/hw.h header only has a prototype for hw_error(),
so it does not make sense to include this in files that do not
use this function.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210326151848.2217216-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:50 +02:00
Keqian Zhu
758b96b61d vfio/migrate: Move switch of dirty tracking into vfio_memory_listener
For now the switch of vfio dirty page tracking is integrated into
@vfio_save_handler. The reason is that some PCI vendor driver may
start to track dirty base on _SAVING state of device, so if dirty
tracking is started before setting device state, vfio will report
full-dirty to QEMU.

However, the dirty bmap of all ramblocks are fully set when setup
ram saving, so it's not matter whether the device is in _SAVING
state when start vfio dirty tracking.

Moreover, this logic causes some problems [1]. The object of dirty
tracking is guest memory, but the object of @vfio_save_handler is
device state, which produces unnecessary coupling and conflicts:

1. Coupling: Their saving granule is different (perVM vs perDevice).
   vfio will enable dirty_page_tracking for each devices, actually
   once is enough.

2. Conflicts: The ram_save_setup() traverses all memory_listeners
   to execute their log_start() and log_sync() hooks to get the
   first round dirty bitmap, which is used by the bulk stage of
   ram saving. However, as vfio dirty tracking is not yet started,
   it can't get dirty bitmap from vfio. Then we give up the chance
   to handle vfio dirty page at bulk stage.

Move the switch of vfio dirty_page_tracking into vfio_memory_listener
can solve above problems. Besides, Do not require devices in SAVING
state for vfio_sync_dirty_bitmap().

[1] https://www.spinics.net/lists/kvm/msg229967.html

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210309031913.11508-1-zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Kunkun Jiang
1eb7f64275 vfio: Support host translation granule size
The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with
the dirty pages of memory by bitmap-traveling, regardless of whether
the bitmap is aligned correctly or not.

cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of
host page size. So it'd better to set bitmap_pgsize to host page size
to support more translation granule sizes.

[aw: The Fixes commit below introduced code to restrict migration
support to configurations where the target page size intersects the
host dirty page support.  For example, a 4K guest on a 4K host.
Due to the above flexibility in bitmap handling, this restriction
unnecessarily prevents mixed target/host pages size that could
otherwise be supported.  Use host page size for dirty bitmap.]

Fixes: 87ea529c50 ("vfio: Get migration capability flags for container")
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
Message-Id: <20210304133446.1521-1-jiangkunkun@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Shenming Lu
ecebe53fe9 vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration
In VFIO migration resume phase and some guest startups, there are
already unmasked vectors in the vector table when calling
vfio_msix_enable(). So in order to avoid inefficiently disabling
and enabling vectors repeatedly, let's allocate all needed vectors
first and then enable these unmasked vectors one by one without
disabling.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Message-Id: <20210310030233.1133-4-lushenming@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Shenming Lu
8ce1ff990e vfio: Set the priority of the VFIO VM state change handler explicitly
In the VFIO VM state change handler when stopping the VM, the _RUNNING
bit in device_state is cleared which makes the VFIO device stop, including
no longer generating interrupts. Then we can save the pending states of
all interrupts in the GIC VM state change handler (on ARM).

So we have to set the priority of the VFIO VM state change handler
explicitly (like virtio devices) to ensure it is called before the
GIC's in saving.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20210310030233.1133-3-lushenming@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Shenming Lu
d329f5032e vfio: Move the saving of the config space to the right place in VFIO migration
On ARM64 the VFIO SET_IRQS ioctl is dependent on the VM interrupt
setup, if the restoring of the VFIO PCI device config space is
before the VGIC, an error might occur in the kernel.

So we move the saving of the config space to the non-iterable
process, thus it will be called after the VGIC according to
their priorities.

As for the possible dependence of the device specific migration
data on it's config space, we can let the vendor driver to
include any config info it needs in its own data stream.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Message-Id: <20210310030233.1133-2-lushenming@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Eric Auger
8dca037b48 vfio: Do not register any IOMMU_NOTIFIER_DEVIOTLB_UNMAP notifier
In an attempt to fix smmu/virtio-iommu - vhost regression, commit
958ec334bc ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support")
broke virtio-iommu integration. This is due to the fact VFIO registers
IOMMU_NOTIFIER_ALL notifiers, which includes IOMMU_NOTIFIER_DEVIOTLB_UNMAP
and this latter now is rejected by the virtio-iommu. As a consequence,
the registration fails. VHOST behaves like a device with an ATC cache. The
VFIO device does not support this scheme yet.

Let's register only legacy MAP and UNMAP notifiers.

Fixes: 958ec334bc ("vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210209213233.40985-2-eric.auger@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Philippe Mathieu-Daudé
4eda914cac hw/vfio/pci-quirks: Replace the word 'blacklist'
Follow the inclusive terminology from the "Conscious Language in your
Open Source Projects" guidelines [*] and replace the word "blacklist"
appropriately.

[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210205171817.2108907-9-philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Zenghui Yu
4292d50193 vfio: Fix vfio_listener_log_sync function name typo
There is an obvious typo in the function name of the .log_sync() callback.
Spell it correctly.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Message-Id: <20201204014240.772-1-yuzenghui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16 10:06:44 -06:00
Philippe Mathieu-Daudé
538f049704 sysemu: Let VMChangeStateHandler take boolean 'running' argument
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 23:13:57 +01:00
Eric Farman
d6cd66311f vfio-ccw: Do not read region ret_code after write
A pwrite() call returns the number of bytes written (or -1 on error),
and vfio-ccw compares this number with the size of the region to
determine if an error had occurred or not.

If they are not equal, this is a failure and the errno is used to
determine exactly how things failed. An errno of zero is possible
(though unlikely) in this situation and would be translated to a
successful operation.

If they ARE equal, the ret_code field is read from the region to
determine how to proceed. While the kernel sets the ret_code field
as necessary, the region and thus this field is not "written back"
to the user. So the value can only be what it was initialized to,
which is zero.

So, let's convert an unexpected length with errno of zero to a
return code of -EFAULT, and explicitly set an expected length to
a return code of zero. This will be a little safer and clearer.

Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20210303160739.2179378-1-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-03-04 11:24:49 +01:00
Prasad J Pandit
24202d2b56 vfio: add quirk device write method
Add vfio quirk device mmio write method to avoid NULL pointer
dereference issue.

Reported-by: Lei Sun <slei.casper@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200811114133.672647-4-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 15:15:32 +01:00
Marc-André Lureau
a7dfbe289e ui: add an optional get_flags callback to GraphicHwOps
Those flags can be used to express different requirements for the
display or other needs.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210204105232.834642-12-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-02-04 15:58:54 +01:00
Eric Farman
b2f96f9e4f vfio-ccw: Connect the device request notifier
Now that the vfio-ccw code has a notifier interface to request that
a device be unplugged, let's wire that together.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20210104202057.48048-4-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2021-01-21 11:19:45 +01:00
Peter Maydell
729cc68373 Remove superfluous timer_del() calls
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-08 15:13:38 +00:00
Eduardo Habkost
1e198715e1 qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr()
The function will be moved to common QOM code, as it is not
specific to TYPE_DEVICE anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-31-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:18 -05:00
Eduardo Habkost
ea7c1e5c3e qdev: Move dev->realized check to qdev_property_set()
Every single qdev property setter function manually checks
dev->realized.  We can just check dev->realized inside
qdev_property_set() instead.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-24-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:17 -05:00
Eduardo Habkost
ce35e2295e qdev: Move softmmu properties to qdev-properties-system.h
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:17 -05:00
Eduardo Habkost
828ade86ee qdev: Make qdev_get_prop_ptr() get Object* arg
Make the code more generic and not specific to TYPE_DEVICE.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com> #s390 parts
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-10-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-15 10:02:07 -05:00
Kirti Wankhede
bb0990d174 vfio: Change default dirty pages tracking behavior during migration
By default dirty pages tracking is enabled during iterative phase
(pre-copy phase).
Added per device opt-out option 'x-pre-copy-dirty-page-tracking' to
disable dirty pages tracking during iterative phase. If the option
'x-pre-copy-dirty-page-tracking=off' is set for any VFIO device, dirty
pages tracking during iterative phase will be disabled.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-23 10:05:58 -07:00
Alex Williamson
cf254988a5 vfio: Make migration support experimental
Support for migration of vfio devices is still in flux.  Developers
are attempting to add support for new devices and new architectures,
but none are yet readily available for validation.  We have concerns
whether we're transferring device resources at the right point in the
migration, whether we're guaranteeing that updates during pre-copy are
migrated, and whether we can provide bit-stream compatibility should
any of this change.  Even the question of whether devices should
participate in dirty page tracking during pre-copy seems contentious.
In short, migration support has not had enough soak time and it feels
premature to mark it as supported.

Create an experimental option such that we can continue to develop.

[Retaining previous acks/reviews for a previously identical code
 change with different specifics in the commit log.]

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-23 08:29:29 -07:00
Stefan Weil
ac9574bc87 docs: Fix some typos (found by codespell)
Fix also a similar typo in a code comment.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20201117193448.393472-1-sw@weilnetz.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-18 09:29:41 +01:00
Kirti Wankhede
e408aeef86 Fix use after free in vfio_migration_probe
Fixes Coverity issue:
CID 1436126:  Memory - illegal accesses  (USE_AFTER_FREE)

Fixes: a9e271ec9b ("vfio: Add migration region initialization and finalize function")
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: David Edmondson <dme@dme.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-12 15:58:16 -07:00
Jean-Philippe Brucker
1b296c3def vfio: Don't issue full 2^64 unmap
IOMMUs may declare memory regions spanning from 0 to UINT64_MAX. When
attempting to deal with such region, vfio_listener_region_del() passes a
size of 2^64 to int128_get64() which throws an assertion failure.  Even
ignoring this, the VFIO_IOMMU_DMA_MAP ioctl cannot handle this size
since the size field is 64-bit. Split the request in two.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-11-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Bharat Bhushan
b917749842 vfio: Set IOMMU page size as per host supported page size
Set IOMMU supported page size mask same as host Linux supported page
size mask.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-9-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Zhengui li
c624b6b312 vfio: fix incorrect print type
The type of input variable is unsigned int
while the printer type is int. So fix incorrect print type.

Signed-off-by: Zhengui li <lizhengui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:52 -07:00
Amey Narkhede
88eef59796 hw/vfio: Use lock guard macros
Use qemu LOCK_GUARD macros in hw/vfio.
Saves manual unlock calls

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:52 -07:00
Matthew Rosato
92fe289ace vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities
Now that VFIO_DEVICE_GET_INFO supports capability chains, add a helper
function to find specific capabilities in the chain.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:52 -07:00
Matthew Rosato
7486a62845 vfio: Find DMA available capability
The underlying host may be limiting the number of outstanding DMA
requests for type 1 IOMMU.  Add helper functions to check for the
DMA available capability and retrieve the current number of DMA
mappings allowed.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Matthew Rosato
3ab7a0b40d vfio: Create shared routine for scanning info capabilities
Rather than duplicating the same loop in multiple locations,
create a static function to do the work.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
3710586caa qapi: Add VFIO devices migration stats in Migration stats
Added amount of bytes transferred to the VM at destination by all VFIO
devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
a22651053b vfio: Make vfio-pci device migration capable
If the device is not a failover primary device, call
vfio_migration_probe() and vfio_migration_finalize() to enable
migration support for those devices that support it respectively to
tear it down again.
Removed migration blocker from VFIO PCI device specific structure and use
migration blocker from generic structure of  VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
9e7b0442f2 vfio: Add ioctl to get dirty pages bitmap during dma unmap
With vIOMMU, IO virtual address range can get unmapped while in pre-copy
phase of migration. In that case, unmap ioctl should return pages pinned
in that range and QEMU should find its correcponding guest physical
addresses and report those dirty.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00