Commit Graph

501 Commits

Author SHA1 Message Date
Matthew Rosato
3ab7a0b40d vfio: Create shared routine for scanning info capabilities
Rather than duplicating the same loop in multiple locations,
create a static function to do the work.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
3710586caa qapi: Add VFIO devices migration stats in Migration stats
Added amount of bytes transferred to the VM at destination by all VFIO
devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
a22651053b vfio: Make vfio-pci device migration capable
If the device is not a failover primary device, call
vfio_migration_probe() and vfio_migration_finalize() to enable
migration support for those devices that support it respectively to
tear it down again.
Removed migration blocker from VFIO PCI device specific structure and use
migration blocker from generic structure of  VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
9e7b0442f2 vfio: Add ioctl to get dirty pages bitmap during dma unmap
With vIOMMU, IO virtual address range can get unmapped while in pre-copy
phase of migration. In that case, unmap ioctl should return pages pinned
in that range and QEMU should find its correcponding guest physical
addresses and report those dirty.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
9a04fe0957 vfio: Dirty page tracking when vIOMMU is enabled
When vIOMMU is enabled, register MAP notifier from log_sync when all
devices in container are in stop and copy phase of migration. Call replay
and get dirty pages from notifier callback.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
b6dd6504e3 vfio: Add vfio_listener_log_sync to mark dirty pages
vfio_listener_log_sync gets list of dirty pages from container using
VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all
devices are stopped and saving state.
Return early for the RAM block section of mapped MMIO region.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
e663f51683 vfio: Add function to start and stop dirty pages tracking
Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:51 -07:00
Kirti Wankhede
87ea529c50 vfio: Get migration capability flags for container
Added helper functions to get IOMMU info capability chain.
Added function to get migration capability information from that
capability chain for IOMMU container.

Similar change was proposed earlier:
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html

Disable migration for devices if IOMMU module doesn't support migration
capability.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
3336d21710 vfio: Add load state functions to SaveVMHandlers
Sequence  during _RESUMING device state:
While data for this device is available, repeat below steps:
a. read data_offset from where user application should write data.
b. write data of data_size to migration region from data_offset.
c. write data_size which indicates vendor driver that data is written in
   staging buffer.

For user, data is opaque. User should write data in the same order as
received.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
1bc3c535ff vfio: Add save state functions to SaveVMHandlers
Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handles pre-copy and stop-and-copy phase.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - indicates kernel driver to write data to staging
  buffer.
- read data_size - amount of data in bytes written by vendor driver in
  migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase
a. read config space of device and save to migration file stream. This
   doesn't need to be from vendor driver. Any other special config state
   from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - indicates kernel driver to write data to staging
   buffer.
d. read data_size - amount of data in bytes written by vendor driver in
   migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When data region is mapped, its user's responsibility to read data from
data_offset of data_size before moving to next steps.

Added fix suggested by Artem Polyakov to reset pending_bytes in
vfio_save_iterate().
Added fix suggested by Zhi Wang to add 0 as data size in migration stream and
add END_OF_STATE delimiter to indicate phase complete.

Suggested-by: Artem Polyakov <artemp@nvidia.com>
Suggested-by: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
7c2f5f75f9 vfio: Register SaveVMHandlers for VFIO device
Define flags to be used as delimiter in migration stream for VFIO devices.
Added .save_setup and .save_cleanup functions. Map & unmap migration
region from these functions at source during saving or pre-copy phase.

Set VFIO device state depending on VM's state. During live migration, VM is
running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
device. During save-restore, VM is paused, _SAVING state is set for VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
050c588c2e vfio: Add migration state change notifier
Added migration state change notifier to get notification on migration state
change. These states are translated to VFIO device state and conveyed to
vendor driver.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
02a7e71b1e vfio: Add VM state change handler to know state of VM
VM state change handler is called on change in VM's state. Based on
VM state, VFIO device state should be changed.
Added read/write helper functions for migration region.
Added function to set device_state.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: lx -> HWADDR_PRIx, remove redundant parens]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
a9e271ec9b vfio: Add migration region initialization and finalize function
Whether the VFIO device supports migration or not is decided based of
migration region query. If migration region query is successful and migration
region initialization is successful then migration is supported else
migration is blocked.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
c5e2fb3ce4 vfio: Add save and load functions for VFIO PCI devices
Added functions to save and restore PCI device specific data,
specifically config space of PCI device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
e93b733bcf vfio: Add vfio_get_object callback to VFIODeviceOps
Hook vfio_get_object callback for PCI devices.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Kirti Wankhede
0f7a903ba3 vfio: Add function to unmap VFIO region
This function will be used for migration region.
Migration region is mmaped when migration starts and will be unmapped when
migration is complete.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01 12:30:50 -07:00
Cornelia Huck
c8726f7b24 vfio-ccw: plug memory leak while getting region info
vfio_get_dev_region_info() unconditionally allocates memory
for a passed-in vfio_region_info structure (and does not re-use
an already allocated structure). Therefore, we have to free
the structure we pass to that function in vfio_ccw_get_region()
for every region we successfully obtained information for.

Fixes: 8fadea24de ("vfio-ccw: support async command subregion")
Fixes: 46ea3841ed ("vfio-ccw: Add support for the schib region")
Fixes: f030532f2a ("vfio-ccw: Add support for the CRW region and IRQ")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200928101701.13540-1-cohuck@redhat.com>
2020-10-02 13:52:49 +02:00
Eduardo Habkost
8063396bf3 Use OBJECT_DECLARE_SIMPLE_TYPE when possible
This converts existing DECLARE_INSTANCE_CHECKER usage to
OBJECT_DECLARE_SIMPLE_TYPE when possible.

$ ./scripts/codeconverter/converter.py -i \
  --pattern=AddObjectDeclareSimpleType $(git grep -l '' -- '*.[ch]')

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20200916182519.415636-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-18 14:12:32 -04:00
Daniel P. Berrangé
448058aa99 util: rename qemu_open() to qemu_open_old()
We want to introduce a new version of qemu_open() that uses an Error
object for reporting problems and make this it the preferred interface.
Rename the existing method to release the namespace for the new impl.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-16 10:33:48 +01:00
Peter Maydell
f4ef8c9cc1 QOM boilerplate cleanup
Documentation build fix:
 * memory: Remove kernel-doc comment marker (Eduardo Habkost)
 
 QOM cleanups:
 * Rename QOM macros for consistency between
   TYPE_* and type checking constants (Eduardo Habkost)
 
 QOM new macros:
 * OBJECT_DECLARE_* and OBJECT_DEFINE_* macros (Daniel P. Berrangé)
 * DECLARE_*_CHECKER macros (Eduardo Habkost)
 
 Automated QOM boilerplate changes:
 * Automated changes to use DECLARE_*_CHECKER (Eduardo Habkost
 * Automated changes to use OBJECT_DECLARE* (Eduardo Habkost)
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl9abc0UHGVoYWJrb3N0
 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaYU9Q/8CyK1w2SlItxBhos7zojqnZ9TP1Jt
 b1YCApQJ+bKSPAUDyefajQA0D9HeR9bFlreiOprQnmZWOqeOvnRIxNGvelJRqRRu
 KcIA5DIfVMJRkKJQEXairrGdnPmFLWSLEb7AmwxyAhp5G51PCP/3kbudi3T/vrNr
 OaccUejs5UgImPfO8Fm+0zqZPmblq/xmtU0p77FvDxGNFPPG8ddpu7eKksGD7FYd
 5bTJTtUhONYG9EJMUD2TBxnJoy1pi6AYUu4+2T211RpBcxeiyNSSitI8fZTk6BGl
 33VwQib9SXjGaE8VsSvHDHhLLec7sqqr2JH3rfvyKF6BOptKWzmSzFdbo2mrRkSy
 8jfCImQgTBBMAHBWP+MFTeKuzfhikZx2DbBLzpppHMMvCca6Zc+oYgR2FbVwuPsw
 H2YL+8Wx4Ws6RXe147toNDRbv75vnS7F3fU800Pcur5VHJWTgSpT/tggzmVPWsdU
 GeUgceYlXyVk5/fC89ZhhtD9eurfBSzQR4eN7/nie2wD6PFMpZkOjHwLn40uWsyq
 xRO0F4uYghNU1N8z6NBhEYLTBtEcS1HFEisSLQrnTQH9W0I7mBx3MaZib/uK7NLC
 b2gT0hossTT8Z46Z8ynoZarwO5EquAMWEQtc9hfZGWacrQEpjVm2DMYMfu83krWb
 xhgl+mpKqVasAPk=
 =RjXc
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

QOM boilerplate cleanup

Documentation build fix:
* memory: Remove kernel-doc comment marker (Eduardo Habkost)

QOM cleanups:
* Rename QOM macros for consistency between
  TYPE_* and type checking constants (Eduardo Habkost)

QOM new macros:
* OBJECT_DECLARE_* and OBJECT_DEFINE_* macros (Daniel P. Berrangé)
* DECLARE_*_CHECKER macros (Eduardo Habkost)

Automated QOM boilerplate changes:
* Automated changes to use DECLARE_*_CHECKER (Eduardo Habkost
* Automated changes to use OBJECT_DECLARE* (Eduardo Habkost)

# gpg: Signature made Thu 10 Sep 2020 19:17:49 BST
# gpg:                using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6
# gpg:                issuer "ehabkost@redhat.com"
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request: (33 commits)
  virtio-vga: Use typedef name for instance_size
  vhost-user-vga: Use typedef name for instance_size
  xilinx_axienet: Use typedef name for instance_size
  lpc_ich9: Use typedef name for instance_size
  omap_intc: Use typedef name for instance_size
  xilinx_axidma: Use typedef name for instance_size
  tusb6010: Rename TUSB to TUSB6010
  pc87312: Rename TYPE_PC87312_SUPERIO to TYPE_PC87312
  vfio: Rename PCI_VFIO to VFIO_PCI
  usb: Rename USB_SERIAL_DEV to USB_SERIAL
  sabre: Rename SABRE_DEVICE to SABRE
  rs6000_mc: Rename RS6000MC_DEVICE to RS6000MC
  filter-rewriter: Rename FILTER_COLO_REWRITER to FILTER_REWRITER
  esp: Rename ESP_STATE to ESP
  ahci: Rename ICH_AHCI to ICH9_AHCI
  vmgenid: Rename VMGENID_DEVICE to TYPE_VMGENID
  vfio: Rename VFIO_AP_DEVICE_TYPE to TYPE_VFIO_AP_DEVICE
  dev-smartcard-reader: Rename CCID_DEV_NAME to TYPE_USB_CCID_DEV
  ap-device: Rename AP_DEVICE_TYPE to TYPE_AP_DEVICE
  gpex: Fix type checking function name
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-11 19:26:51 +01:00
Eduardo Habkost
01b4606440 vfio: Rename PCI_VFIO to VFIO_PCI
Make the type checking macro name consistent with the TYPE_*
constant.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20200902224311.1321159-56-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 13:20:22 -04:00
Eduardo Habkost
8b3a1ee5f2 vfio: Rename VFIO_AP_DEVICE_TYPE to TYPE_VFIO_AP_DEVICE
This will make the type name constant consistent with the name of
the type checking macro.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200902224311.1321159-9-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 13:20:22 -04:00
Eduardo Habkost
fab2afff61 ap-device: Rename AP_DEVICE_TYPE to TYPE_AP_DEVICE
This will make the type name constant consistent with the name of
the type checking macro.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200902224311.1321159-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 13:20:22 -04:00
Markus Armbruster
b15e402fc8 trace-events: Fix attribution of trace points to source
Some trace points are attributed to the wrong source file.  Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.

Clean up with help of scripts/cleanup-trace-events.pl.  Funnies
requiring manual post-processing:

* accel/tcg/cputlb.c trace points are in trace-events.

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
  guard debug code.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* linux-user/trace-events abbreviates a tedious list of filenames to
  */signal.c.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:58 +01:00
Eduardo Habkost
8110fa1d94 Use DECLARE_*CHECKER* macros
Generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=TypeCheckMacro $(git grep -l '' -- '*.[ch]')

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-12-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-13-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-14-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:27:09 -04:00
Eduardo Habkost
db1015e92e Move QOM typedefs and add missing includes
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.

Patch generated using:

 $ ./scripts/codeconverter/converter.py -i \
   --pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')

which will split "typdef struct { ... } TypedefName"
declarations.

Followed by:

 $ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
    $(git grep -l '' -- '*.[ch]')

which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-09 09:26:43 -04:00
Chen Qun
9b83b0043f vfio/platform: Remove dead assignment in vfio_intp_interrupt()
Clang static code analyzer show warning:
hw/vfio/platform.c:239:9: warning: Value stored to 'ret' is never read
        ret = event_notifier_test_and_clear(intp->interrupt);
        ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20200827110311.164316-8-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-01 12:02:48 +02:00
Eduardo Habkost
42db0fb5e0 vfio/pci: Move QOM macros to header
This will make future conversion to OBJECT_DECLARE* easier.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-43-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-27 14:04:55 -04:00
Pan Nengyuan
0216b18b79 hw/vfio/ap: Plug memleak in vfio_ap_get_group()
Missing g_error_free() in vfio_ap_get_group() error path. Fix that.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20200814160241.7915-3-pannengyuan@huawei.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-08-27 12:37:03 +02:00
Marc-André Lureau
4f780d5629 meson: convert hw/vfio
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:30:26 -04:00
Paolo Bonzini
2becc36a3e meson: infrastructure for building emulators
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:30:17 -04:00
Paolo Bonzini
243af0225a trace: switch position of headers to what Meson requires
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path.  In particular the tracing headers are using
$(build_root)/$(<D).

In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".

This patch is too ugly to be applied to the Makefiles now.  It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:18:24 -04:00
Gerd Hoffmann
8ec1415935 vfio: fix use-after-free in display
Calling ramfb_display_update() might replace the DisplaySurface with the
boot display, which in turn will free the currently active
DisplaySurface.

So clear our DisplaySurface pinter (dpy->region.surface pointer) to (a)
avoid use-after-free and (b) force replacing the boot display with the
real display when switching back.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-id: 20200713124520.23266-1-kraxel@redhat.com
2020-07-16 10:20:12 +02:00
Markus Armbruster
af175e85f9 error: Eliminate error_propagate() with Coccinelle, part 2
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  The previous commit did that with a Coccinelle script I
consider fairly trustworthy.  This commit uses the same script with
the matching of return taken out, i.e. we convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
    }

This is unsound: @err could still be read between afterwards.  I don't
know how to express "no read of @err without an intervening write" in
Coccinelle.  Instead, I manually double-checked for uses of @err.

Suboptimal line breaks tweaked manually.  qdev_realize() simplified
further to placate scripts/checkpatch.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-36-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster
668f62ec62 error: Eliminate error_propagate() with Coccinelle, part 1
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away.  Convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
        return ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
        return ...
    }

where nothing else needs @err.  Coccinelle script:

    @rule1 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
         if (
    (
    -        fun(args, &err, args2)
    +        fun(args, errp, args2)
    |
    -        !fun(args, &err, args2)
    +        !fun(args, errp, args2)
    |
    -        fun(args, &err, args2) op c1
    +        fun(args, errp, args2) op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    )
         }

    @rule2 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    expression var;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
    -    var = fun(args, &err, args2);
    +    var = fun(args, errp, args2);
         ... when != err
         if (
    (
             var
    |
             !var
    |
             var op c1
    )
            )
         {
             ... when != err
                 when != lbl:
                 when strict
    -        error_propagate(errp, err);
             ... when != err
    (
             return;
    |
             return c2;
    |
             return false;
    |
             return var;
    )
         }

    @depends on rule1 || rule2@
    identifier err;
    @@
    -    Error *err = NULL;
         ... when != err

Not exactly elegant, I'm afraid.

The "when != lbl:" is necessary to avoid transforming

         if (fun(args, &err)) {
             goto out
         }
         ...
     out:
         error_propagate(errp, err);

even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().

Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly.  I don't know what exactly "when strict" does, only that
it helps here.

The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err".  For
an example where it's too narrow, see vfio_intx_enable().

Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there.  Converted manually.

Line breaks tidied up manually.  One nested declaration of @local_err
deleted manually.  Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster
62a35aaa31 qapi: Use returned bool to check for failure, Coccinelle part
The previous commit enables conversion of

    visit_foo(..., &err);
    if (err) {
        ...
    }

to

    if (!visit_foo(..., errp)) {
        ...
    }

for visitor functions that now return true / false on success / error.
Coccinelle script:

    @@
    identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*";
    expression list args;
    typedef Error;
    Error *err;
    @@
    -    fun(args, &err);
    -    if (err)
    +    if (!fun(args, &err))
         {
             ...
         }

A few line breaks tidied up manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-19-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
David Hildenbrand
aff92b8286 vfio: Convert to ram_block_discard_disable()
VFIO is (except devices without a physical IOMMU or some mediated devices)
incompatible with discarding of RAM. The kernel will pin basically all VM
memory. Let's convert to ram_block_discard_disable(), which can now
fail, in contrast to qemu_balloon_inhibit().

Leave "x-balloon-allowed" named as it is for now.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-07-02 05:54:59 -04:00
Farhan Ali
f030532f2a vfio-ccw: Add support for the CRW region and IRQ
The crw region can be used to obtain information about
Channel Report Words (CRW) from vfio-ccw driver.

Currently only channel-path related CRWs are passed to
QEMU from vfio-ccw driver.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505125757.98209-7-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-18 12:13:54 +02:00
Eric Farman
690e29b911 vfio-ccw: Refactor ccw irq handler
Make it easier to add new ones in the future.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505125757.98209-5-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-18 12:13:54 +02:00
Farhan Ali
46ea3841ed vfio-ccw: Add support for the schib region
The schib region can be used to obtain the latest SCHIB from the host
passthrough subchannel. Since the guest SCHIB is virtualized,
we currently only update the path related information so that the
guest is aware of any path related changes when it issues the
'stsch' instruction.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505125757.98209-4-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-18 12:13:54 +02:00
Eric Farman
2a3b9cbaa7 vfio-ccw: Refactor cleanup of regions
While we're at it, add a g_free() for the async_cmd_region that
is the last thing currently created.  g_free() knows how to handle
NULL pointers, so this makes it easier to remember what cleanups
need to be performed when new regions are added.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200505125757.98209-3-farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-18 12:13:54 +02:00
Peter Maydell
7d3660e798 * Miscellaneous fixes and feature enablement (many)
* SEV refactoring (David)
 * Hyper-V initial support (Jon)
 * i386 TCG fixes (x87 and SSE, Joseph)
 * vmport cleanup and improvements (Philippe, Liran)
 * Use-after-free with vCPU hot-unplug (Nengyuan)
 * run-coverity-scan improvements (myself)
 * Record/replay fixes (Pavel)
 * -machine kernel_irqchip=split improvements for INTx (Peter)
 * Code cleanups (Philippe)
 * Crash and security fixes (PJP)
 * HVF cleanups (Roman)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7jpdAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfjwf/X7+0euuE9dwKFKDDMmIi+4lRWnq7
 gSOyE1BYSfDIUXRIukf64konXe0VpiotNYlyEaYnnQjkMdGm5E9iXKF+LgEwXj/t
 NSGkfj5J3VeWRG4JJp642CSN/aZWO8uzkenld3myCnu6TicuN351tDJchiFwAk9f
 wsXtgLKd67zE8MLVt8AP0rNTbzMHttPXnPaOXDCuwjMHNvMEKnC93UeOeM0M4H5s
 3Dl2HvsNWZ2SzUG9mAbWp0bWWuoIb+Ep9//87HWANvb7Z8jratRws18i6tYt1sPx
 8zOnUS87sVnh1CQlXBDd9fEcqBUVgR9pAlqaaYavNhFp5eC31euvpDU8Iw==
 =F4sU
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Miscellaneous fixes and feature enablement (many)
* SEV refactoring (David)
* Hyper-V initial support (Jon)
* i386 TCG fixes (x87 and SSE, Joseph)
* vmport cleanup and improvements (Philippe, Liran)
* Use-after-free with vCPU hot-unplug (Nengyuan)
* run-coverity-scan improvements (myself)
* Record/replay fixes (Pavel)
* -machine kernel_irqchip=split improvements for INTx (Peter)
* Code cleanups (Philippe)
* Crash and security fixes (PJP)
* HVF cleanups (Roman)

# gpg: Signature made Fri 12 Jun 2020 16:57:04 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (116 commits)
  target/i386: Remove obsolete TODO file
  stubs: move Xen stubs to accel/
  replay: fix replay shutdown for console mode
  exec/cpu-common: Move MUSB specific typedefs to 'hw/usb/hcd-musb.h'
  hw/usb: Move device-specific declarations to new 'hcd-musb.h' header
  exec/memory: Remove unused MemoryRegionMmio type
  checkpatch: reversed logic with acpi test checks
  target/i386: sev: Unify SEVState and SevGuestState
  target/i386: sev: Remove redundant handle field
  target/i386: sev: Remove redundant policy field
  target/i386: sev: Remove redundant cbitpos and reduced_phys_bits fields
  target/i386: sev: Partial cleanup to sev_state global
  target/i386: sev: Embed SEVState in SevGuestState
  target/i386: sev: Rename QSevGuestInfo
  target/i386: sev: Move local structure definitions into .c file
  target/i386: sev: Remove unused QSevGuestInfoClass
  xen: fix build without pci passthrough
  i386: hvf: Drop HVFX86EmulatorState
  i386: hvf: Move mmio_buf into CPUX86State
  i386: hvf: Move lazy_flags into CPUX86State
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/i386/acpi-build.c
2020-06-12 23:06:22 +01:00
Thomas Huth
643a4eacef hw/vfio/pci-quirks: Fix broken legacy IGD passthrough
The #ifdef CONFIG_VFIO_IGD in pci-quirks.c is not working since the
required header config-devices.h is not included, so that the legacy
IGD passthrough is currently broken. Let's include the right header
to fix this issue.

Buglink: https://bugs.launchpad.net/qemu/+bug/1882784
Fixes: 29d62771c8 ("hw/vfio: Move the IGD quirk code to a separate file")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-11 11:36:40 -06:00
Jon Derrick
ee7932b0bb hw/vfio: Add VMD Passthrough Quirk
The VMD endpoint provides a real PCIe domain to the guest, including
bridges and endpoints. Because the VMD domain is enumerated by the guest
kernel, the guest kernel will assign Guest Physical Addresses to the
downstream endpoint BARs and bridge windows.

When the guest kernel performs MMIO to VMD sub-devices, MMU will
translate from the guest address space to the physical address space.
Because the bridges have been programmed with guest addresses, the
bridges will reject the transaction containing physical addresses.

VMD device 28C0 natively assists passthrough by providing the Host
Physical Address in shadow registers accessible to the guest for bridge
window assignment. The shadow registers are valid if bit 1 is set in VMD
VMLOCK config register 0x70.

In order to support existing VMDs, this quirk provides the shadow
registers in a vendor-specific PCI capability to the vfio-passthrough
device for all VMD device ids which don't natively assist with
passthrough. The Linux VMD driver is updated to check for this new
vendor-specific capability.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-11 11:36:39 -06:00
Peter Xu
97a3757616 vfio/pci: Use kvm_irqchip_add_irqfd_notifier_gsi() for irqfds
VFIO is currently the only one left that is not using the generic
function (kvm_irqchip_add_irqfd_notifier_gsi()) to register irqfds.
Let VFIO use the common framework too.

Follow up patches will introduce extra features for kvm irqfd, so that
VFIO can easily leverage that after the switch.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200318145204.74483-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:28 -04:00
Jared Rossi
24e58a7b1d vfio-ccw: allow non-prefetch ORBs
Remove the explicit prefetch check when using vfio-ccw devices.
This check does not trigger in practice as all Linux channel programs
are intended to use prefetch.

Newer Linux kernel versions do not require to force the PFCH flag with
vfio-ccw devices anymore.

Signed-off-by: Jared Rossi <jrossi@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20200512181535.18630-2-jrossi@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-05 17:13:11 +02:00
Leonardo Bras
9c7c040702 vfio/nvlink: Remove exec permission to avoid SELinux AVCs
If SELinux is setup without 'execmem' permission for qemu, all mmap
with (PROT_WRITE | PROT_EXEC) will fail and print a warning in
SELinux log.

If "nvlink2-mr" memory allocation fails (fist diff), it will cause
guest NUMA nodes to not be correctly configured (V100 memory will
not be visible for guest, nor its NUMA nodes).

Not having 'execmem' permission is intesting for virtual machines to
avoid buffer-overflow based attacks, and it's adopted in distros
like RHEL.

So, removing the PROT_EXEC flag seems the right thing to do.

Browsing some other code that mmaps memory for usage with
memory_region_init_ram_device_ptr, I could notice it's usual to
not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be
no problem around this.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Message-Id: <20200501055448.286518-1-leobras.c@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-27 15:29:36 +10:00
Gerd Hoffmann
2fc979cb9d Revert "hw/display/ramfb: initialize fw-config space with xres/ yres"
This reverts commit f79081b4b7.

Patch has broken byteorder handling: RAMFBCfg fields are in bigendian
byteorder, the reset function doesn't care so native byteorder is used
instead.  Given this went unnoticed so far the feature is obviously
unused, so just revert the patch.

Cc: Hou Qiming <hqm03ster@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 20200429115236.28709-2-kraxel@redhat.com
2020-05-18 15:42:34 +02:00
Markus Armbruster
b69c3c21a5 qdev: Unrealize must not fail
Devices may have component devices and buses.

Device realization may fail.  Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).

When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far.  If any of these unrealizes
failed, the device would be left in an inconsistent state.  Must not
happen.

device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.

Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back?  We'd have to
re-realize, which can fail.  This design is fundamentally broken.

device_set_realized() does not roll back at all.  Instead, it keeps
unrealizing, ignoring further errors.

It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.

bus_set_realized() does not roll back either.  Instead, it stops
unrealizing.

Fortunately, no unrealize method can fail, as we'll see below.

To fix the design error, drop parameter @errp from all the unrealize
methods.

Any unrealize method that uses @errp now needs an update.  This leads
us to unrealize() methods that can fail.  Merely passing it to another
unrealize method cannot cause failure, though.  Here are the ones that
do other things with @errp:

* virtio_serial_device_unrealize()

  Fails when qbus_set_hotplug_handler() fails, but still does all the
  other work.  On failure, the device would stay realized with its
  resources completely gone.  Oops.  Can't happen, because
  qbus_set_hotplug_handler() can't actually fail here.  Pass
  &error_abort to qbus_set_hotplug_handler() instead.

* hw/ppc/spapr_drc.c's unrealize()

  Fails when object_property_del() fails, but all the other work is
  already done.  On failure, the device would stay realized with its
  vmstate registration gone.  Oops.  Can't happen, because
  object_property_del() can't actually fail here.  Pass &error_abort
  to object_property_del() instead.

* spapr_phb_unrealize()

  Fails and bails out when remove_drcs() fails, but other work is
  already done.  On failure, the device would stay realized with some
  of its resources gone.  Oops.  remove_drcs() fails only when
  chassis_from_bus()'s object_property_get_uint() fails, and it can't
  here.  Pass &error_abort to remove_drcs() instead.

Therefore, no unrealize method can fail before this patch.

device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool().  Can't drop @errp there, so pass
&error_abort.

We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors.  Pass &error_abort instead.

Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.

One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().

Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Markus Armbruster
40c2281cc3 Drop more @errp parameters after previous commit
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create().  Drop their @errp
parameter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Markus Armbruster
d2623129a7 qom: Drop parameter @errp of object_property_add() & friends
The only way object_property_add() can fail is when a property with
the same name already exists.  Since our property names are all
hardcoded, failure is a programming error, and the appropriate way to
handle it is passing &error_abort.

Same for its variants, except for object_property_add_child(), which
additionally fails when the child already has a parent.  Parentage is
also under program control, so this is a programming error, too.

We have a bit over 500 callers.  Almost half of them pass
&error_abort, slightly fewer ignore errors, one test case handles
errors, and the remaining few callers pass them to their own callers.

The previous few commits demonstrated once again that ignoring
programming errors is a bad idea.

Of the few ones that pass on errors, several violate the Error API.
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.  ich9_pm_add_properties(), sparc32_ledma_realize(),
sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
are wrong that way.

When the one appropriate choice of argument is &error_abort, letting
users pick the argument is a bad idea.

Drop parameter @errp and assert the preconditions instead.

There's one exception to "duplicate property name is a programming
error": the way object_property_add() implements the magic (and
undocumented) "automatic arrayification".  Don't drop @errp there.
Instead, rename object_property_add() to object_property_try_add(),
and add the obvious wrapper object_property_add().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-15-armbru@redhat.com>
[Two semantic rebase conflicts resolved]
2020-05-15 07:07:58 +02:00
Daniel Brodsky
6e8a355de6 lockable: replaced locks with lock guard macros where appropriate
- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-04 16:07:43 +01:00
Alexey Kardashevskiy
79178edd2a vfio/spapr: Fix page size calculation
Coverity detected an issue (CID 1421903) with potential call of clz64(0)
which returns 64 which make it do "<<" with a negative number.

This checks the mask and avoids undefined behaviour.

In practice pgsizes and memory_region_iommu_get_min_page_size() always
have some common page sizes and even if they did not, the resulting page
size would be 0x8000.0000.0000.0000 (gcc 9.2) and
ioctl(VFIO_IOMMU_SPAPR_TCE_CREATE) would fail anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20200324063912.25063-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-04-07 08:55:10 +10:00
Philippe Mathieu-Daudé
180f3fd2d7 hw/vfio/display: Remove superfluous semicolon
Fixes: 8b818e059b
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200218094402.26625-9-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-02-18 20:20:49 +01:00
Michal Privoznik
b09d51c909 Report stringified errno in VFIO related errors
In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <4949c3ecf1a32189b8a4b5eb4b0fd04c1122501d.1581674006.git.mprivozn@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-02-18 20:20:49 +01:00
Thomas Huth
29d62771c8 hw/vfio: Move the IGD quirk code to a separate file
The IGD quirk code defines a separate device, the so-called
"vfio-pci-igd-lpc-bridge" which shows up as a user-creatable
device in all QEMU binaries that include the vfio code. This
is a little bit unfortunate for two reasons: First, this device
is completely useless in binaries like qemu-system-s390x.
Second we also would like to disable it in downstream RHEL
which currently requires some extra patches there since the
device does not have a proper Kconfig-style switch yet.

So it would be good if the device could be disabled more easily,
thus let's move the code to a separate file instead and introduce
a proper Kconfig switch for it which gets only enabled by default
if we also have CONFIG_PC_PCI enabled.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-02-06 11:55:42 -07:00
Marc-André Lureau
4f67d30b5e qdev: set properties with device_class_set_props()
The following patch will need to handle properties registration during
class_init time. Let's use a device_class_set_props() setter.

spatch --macro-file scripts/cocci-macro-file.h  --sp-file
./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place
--dir .

@@
typedef DeviceClass;
DeviceClass *d;
expression val;
@@
- d->props = val
+ device_class_set_props(d, val)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:15 +01:00
Peter Xu
0446f81217 vfio/pci: Don't remove irqchip notifier if not registered
The kvm irqchip notifier is only registered if the device supports
INTx, however it's unconditionally removed.  If the assigned device
does not support INTx, this will cause QEMU to crash when unplugging
the device from the system.  Change it to conditionally remove the
notifier only if the notify hook is setup.

CC: Eduardo Habkost <ehabkost@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v4.2
Reported-by: yanghliu@redhat.com
Debugged-by: Eduardo Habkost <ehabkost@redhat.com>
Fixes: c5478fea27 ("vfio/pci: Respond to KVM irqchip change notifier")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1782678
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-01-06 14:19:42 -07:00
Vladimir Sementsov-Ogievskiy
b5e45b0f48 hw/vfio/ap: drop local_err from vfio_ap_realize
No reason for local_err here, use errp directly instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20191205174635.18758-21-vsementsov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-12-18 08:43:19 +01:00
Boris Fiuczynski
91f751dc11 vfio-ccw: Fix error message
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20191128143015.5231-1-fiuczy@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-12-14 10:22:38 +01:00
David Gibson
c5478fea27 vfio/pci: Respond to KVM irqchip change notifier
VFIO PCI devices already respond to the pci intx routing notifier, in order
to update kernel irqchip mappings when routing is updated.  However this
won't handle the case where the irqchip itself is replaced by a different
model while retaining the same routing.  This case can happen on
the pseries machine type due to PAPR feature negotiation.

To handle that case, add a handler for the irqchip change notifier, which
does much the same thing as the routing notifier, but is unconditional,
rather than being a no-op when the routing hasn't changed.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 10:11:30 +11:00
David Gibson
ad54dbd89d vfio/pci: Split vfio_intx_update()
This splits the vfio_intx_update() function into one part doing the actual
reconnection with the KVM irqchip (vfio_intx_update(), now taking an
argument with the new routing) and vfio_intx_routing_notifier() which
handles calls to the pci device intx routing notifier and calling
vfio_intx_update() when necessary.  This will make adding support for the
irqchip change notifier easier.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 10:11:30 +11:00
Paolo Bonzini
29b95c992a vfio: vfio-pci requires EDID
hw/vfio/display.c needs the EDID subsystem, select it.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-18 10:41:49 -07:00
Jens Freimann
ed92369a57 vfio: don't ignore return value of migrate_add_blocker
When an error occurs in migrate_add_blocker() it sets a
negative return value and uses error pointer we pass in.
Instead of just looking at the error pointer check for a negative return
value and avoid a coverity error because the return value is
set but never used. This fixes CID 1407219.

Reported-by: Coverity (CID 1407219)
Fixes: f045a0104c ("vfio: unplug failover primary device before migration")
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-18 10:41:48 -07:00
Michal Privoznik
1335d64323 hw/vfio/pci: Fix double free of migration_blocker
When user tries to hotplug a VFIO device, but the operation fails
somewhere in the middle (in my testing it failed because of
RLIMIT_MEMLOCK forbidding more memory allocation), then a double
free occurs. In vfio_realize() the vdev->migration_blocker is
allocated, then something goes wrong which causes control to jump
onto 'error' label where the error is freed. But the pointer is
left pointing to invalid memory. Later, when
vfio_instance_finalize() is called, the memory is freed again.

In my testing the second hunk was sufficient to fix the bug, but
I figured the first hunk doesn't hurt either.

==169952== Invalid read of size 8
==169952==    at 0xA47DCD: error_free (error.c:266)
==169952==    by 0x4E0A18: vfio_instance_finalize (pci.c:3040)
==169952==    by 0x8DF74C: object_deinit (object.c:606)
==169952==    by 0x8DF7BE: object_finalize (object.c:620)
==169952==    by 0x8E0757: object_unref (object.c:1074)
==169952==    by 0x45079C: memory_region_unref (memory.c:1779)
==169952==    by 0x45376B: do_address_space_destroy (memory.c:2793)
==169952==    by 0xA5C600: call_rcu_thread (rcu.c:283)
==169952==    by 0xA427CB: qemu_thread_start (qemu-thread-posix.c:519)
==169952==    by 0x80A8457: start_thread (in /lib64/libpthread-2.29.so)
==169952==    by 0x81C96EE: clone (in /lib64/libc-2.29.so)
==169952==  Address 0x143137e0 is 0 bytes inside a block of size 48 free'd
==169952==    at 0x4A342BB: free (vg_replace_malloc.c:530)
==169952==    by 0xA47E05: error_free (error.c:270)
==169952==    by 0x4E0945: vfio_realize (pci.c:3025)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)
==169952==    by 0x5E81E5: qmp_device_add (qdev-monitor.c:798)
==169952==    by 0x9E18A8: do_qmp_dispatch (qmp-dispatch.c:132)
==169952==  Block was alloc'd at
==169952==    at 0x4A35476: calloc (vg_replace_malloc.c:752)
==169952==    by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==169952==    by 0xA47357: error_setv (error.c:61)
==169952==    by 0xA475D9: error_setg_internal (error.c:97)
==169952==    by 0x4DF8C2: vfio_realize (pci.c:2737)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)

Fixes: f045a0104c ("vfio: unplug failover primary device before migration")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-18 10:41:47 -07:00
Jens Freimann
f045a0104c vfio: unplug failover primary device before migration
As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20191029114905.6856-12-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-29 18:55:26 -04:00
Wei Yang
038adc2f58 core: replace getpagesize() with qemu_real_host_page_size
There are three page size in qemu:

  real host page size
  host page size
  target page size

All of them have dedicate variable to represent. For the last two, we
use the same form in the whole qemu project, while for the first one we
use two forms: qemu_real_host_page_size and getpagesize().

qemu_real_host_page_size is defined to be a replacement of
getpagesize(), so let it serve the role.

[Note] Not fully tested for some arch or device.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26 15:38:06 +02:00
Evgeny Yakovlev
d964d3b5ab hw/vfio/pci: fix double free in vfio_msi_disable
The following guest behaviour patter leads to double free in VFIO PCI:

1. Guest enables MSI interrupts
vfio_msi_enable is called, but fails in vfio_enable_vectors.
In our case this was because VFIO GPU device was in D3 state.
Unhappy path in vfio_msi_enable will g_free(vdev->msi_vectors) but not
set this pointer to NULL

2. Guest still sees MSI an enabled after that because emulated config
write is done in vfio_pci_write_config unconditionally before calling
vfio_msi_enable

3. Guest disables MSI interrupts
vfio_msi_disable is called and tries to g_free(vdev->msi_vectors)
in vfio_msi_disable_common => double free

Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-10-10 11:07:28 -06:00
Eric Auger
549d400587 memory: allow memory_region_register_iommu_notifier() to fail
Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.

So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.

All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.

in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 18:49:18 +02:00
Eric Auger
d7d8783647 vfio: Turn the container error into an Error handle
The container error integer field is currently used to store
the first error potentially encountered during any
vfio_listener_region_add() call. However this fails to propagate
detailed error messages up to the vfio_connect_container caller.
Instead of using an integer, let's use an Error handle.

Messages are slightly reworded to accomodate the propagation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 18:49:18 +02:00
Chen Zhang
f75ca62723 vfio: fix a typo
Signed-off-by: Chen Zhang <tgfbeta@me.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <8E5A9C27-C76D-46CF-85B0-79121A00B05F@me.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-09-19 11:50:37 +02:00
Tony Nguyen
d5d680cacc memory: Access MemoryRegion with endianness
Preparation for collapsing the two byte swaps adjust_endianness and
handle_bswap into the former.

Call memory_region_dispatch_{read|write} with endianness encoded into
the "MemOp op" operand.

This patch does not change any behaviour as
memory_region_dispatch_{read|write} is yet to handle the endianness.

Once it does handle endianness, callers with byte swaps can collapse
them into adjust_endianness.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:39 -07:00
Tony Nguyen
475fbf0a3c hw/vfio: Access MemoryRegion with MemOp
The memory_region_dispatch_{read|write} operand "unsigned size" is
being converted into a "MemOp op".

Convert interfaces by using no-op size_memop.

After all interfaces are converted, size_memop will be implemented
and the memory_region_dispatch_{read|write} operand "unsigned size"
will be converted into a "MemOp op".

As size_memop is a no-op, this patch does not change any behaviour.

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <e70ff5814ac3656974180db6375397c43b0bc8b8.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:38 -07:00
Markus Armbruster
54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster
d5938f29fe Clean up inclusion of sysemu/sysemu.h
In my "build everything" tree, changing sysemu/sysemu.h triggers a
recompile of some 5400 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Almost a third of its inclusions are actually superfluous.  Delete
them.  Downgrade two more to qapi/qapi-types-run-state.h, and move one
from char/serial.h to char/serial.c.

hw/semihosting/config.c, monitor/monitor.c, qdev-monitor.c, and
stubs/semihost.c define variables declared in sysemu/sysemu.h without
including it.  The compiler is cool with that, but include it anyway.

This doesn't reduce actual use much, as it's still included into
widely included headers.  The next commit will tackle that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-27-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2019-08-16 13:31:53 +02:00
Markus Armbruster
a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster
db72581598 Include qemu/main-loop.h less
In my "build everything" tree, changing qemu/main-loop.h triggers a
recompile of some 5600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
which in turn includes qemu/event_notifier.h, qemu/notify.h,
qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
qemu/thread.h, qemu/timer.h, and a few more.

Include qemu/main-loop.h only where it's needed.  Touching it now
recompiles only some 1700 objects.  For block/aio.h and
qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
others, they shrink only slightly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
dc5e9ac716 Include qemu/queue.h slightly less
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-20-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
650d103d3e Include hw/hw.h exactly where needed
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in
hw/hw.h.  This permits dropping most of its inclusions.  Touching it
now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
d645427057 Include migration/vmstate.h less
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get VMStateDescription.  The previous commit made
that unnecessary.

Include migration/vmstate.h only where it's still needed.  Touching it
now recompiles only some 1600 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
64552b6be4 Include hw/irq.h a lot less
In my "build everything" tree, changing hw/irq.h triggers a recompile
of some 5400 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

hw/hw.h supposedly includes it for convenience.  Several other headers
include it just to get qemu_irq and.or qemu_irq_handler.

Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed.  Touching it now recompiles only some 500 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-13-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
71e8a91585 Include sysemu/reset.h a lot less
In my "build everything" tree, changing sysemu/reset.h triggers a
recompile of some 2600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

The main culprit is hw/hw.h, which supposedly includes it for
convenience.

Include sysemu/reset.h only where it's needed.  Touching it now
recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-9-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Alex Williamson
f5cf94cdab vfio-ccw: Test vfio_set_irq_signaling() return value
Coverity doesn't like that most callers of vfio_set_irq_signaling() check
the return value and doesn't understand the equivalence of testing the
error pointer instead.  Test the return value consistently.

Reported-by: Coverity (CID 1402783)
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <156209642116.14915.9598593247782519613.stgit@gimli.home>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-07-08 12:10:37 +02:00
Eric Auger
5053bd7811 vfio/pci: Trace vfio_set_irq_signaling() failure in vfio_msix_vector_release()
Report an error in case we fail to set a trigger action
on any VFIO_PCI_MSIX_IRQ_INDEX subindex. This might be
useful in debugging a device that is not working properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Coverity (CID 1402196)
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-07-02 13:16:29 -06:00
Cornelia Huck
8fadea24de vfio-ccw: support async command subregion
A vfio-ccw device may provide an async command subregion for
issuing halt/clear subchannel requests. If it is present, use
it for sending halt/clear request to the device; if not, fall
back to emulation (as done today).

Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <20190613092542.2834-1-cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-06-24 17:27:57 +02:00
Cornelia Huck
b5e89f044d vfio-ccw: use vfio_set_irq_signaling
Use the new helper.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Message-Id: <20190617101036.4087-1-cohuck@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-06-24 17:27:57 +02:00
Eric Auger
201a733145 vfio/common: Introduce vfio_set_irq_signaling helper
The code used to assign an interrupt index/subindex to an
eventfd is duplicated many times. Let's introduce an helper that
allows to set/unset the signaling for an ACTION_TRIGGER,
ACTION_MASK or ACTION_UNMASK action.

In the error message, we now use errno in case of any
VFIO_DEVICE_SET_IRQS ioctl failure.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:37 -06:00
Alex Williamson
c60807dea5 vfio/pci: Allow MSI-X relocation to fixup bogus PBA
The MSI-X relocation code can sometimes be used to work around bogus
MSI-X capabilities, but this test for whether the PBA is outside of
the specified BAR causes the device to error before we can apply a
relocation.  Let it proceed if we intend to relocate MSI-X anyway.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:36 -06:00
Alex Williamson
3412d8ec98 vfio/pci: Hide Resizable BAR capability
The resizable BAR capability is currently exposed read-only from the
kernel and we don't yet implement a protocol for virtualizing it to
the VM.  Exposing it to the guest read-only introduces poor behavior
as the guest has no reason to test that a control register write is
accepted by the hardware.  This can lead to cases where the guest OS
assumes the BAR has been resized, but it hasn't.  This has been
observed when assigning AMD Vega GPUs.

Note, this does not preclude future enablement of resizable BARs, but
it's currently incorrect to expose this capability as read-only, so
better to not expose it at all.

Reported-by: James Courtier-Dutton <james.dutton@gmail.com>
Tested-by: James Courtier-Dutton <james.dutton@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 09:57:36 -06:00
Markus Armbruster
a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Markus Armbruster
0b8fa32f55 Include qemu/module.h where needed, drop it from qemu-common.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-4-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c
hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c;
ui/cocoa.m fixed up]
2019-06-12 13:18:33 +02:00
Gerd Hoffmann
a6c9d5da08 vfio/display: set dmabuf modifier field
Fill the new QemuDmaBuf->modifier field properly from plane info.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-id: 20190529072144.26737-3-kraxel@redhat.com
2019-06-07 11:52:35 +02:00
Philippe Mathieu-Daudé
a2596aee6c hw/vfio/pci: Use the QOM DEVICE() macro to access DeviceState.qdev
Rather than looking inside the definition of a DeviceState with
"s->qdev", use the QOM prefered style: "DEVICE(s)".

This patch was generated using the following Coccinelle script:

    // Use DEVICE() macros to access DeviceState.qdev
    @use_device_macro_to_access_qdev@
    expression obj;
    identifier dev;
    @@
    -&obj->dev.qdev
    +DEVICE(obj)

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190528164020.32250-10-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-06-06 11:35:21 +02:00
Hou Qiming
f79081b4b7 hw/display/ramfb: initialize fw-config space with xres/ yres
If xres / yres were specified in QEMU command line, write them as an initial
resolution to the fw-config space on guest reset, which a later BIOS / OVMF
patch can take advantage of.

Signed-off-by: HOU Qiming <hqm03ster@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-id: 20190513115731.17588-4-marcel.apfelbaum@gmail.com
[fixed malformed patch]
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2019-05-24 09:10:29 +02:00
Li Qiang
2d9574bdbe pci: msix: move 'MSIX_CAP_LENGTH' to header file
'MSIX_CAP_LENGTH' is defined in two .c file. Move it
to hw/pci/msix.h file to reduce duplicated code.

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-5-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 17:35:27 +02:00
Li Qiang
bf04ef354c vfio: platform: fix a typo
'eventd' should be 'eventfd'.

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-4-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 17:35:27 +02:00
Li Qiang
da56e33006 hw: vfio: drop TYPE_FOO MACRO in VMStateDescription
It's recommended that VMStateDescription names are decoupled from QOM
type names as the latter may freely change without consideration of
migration compatibility.

Link: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg02175.html

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-3-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 17:35:27 +02:00
Li Qiang
0c0c8f8aaf vfio: pci: make "vfio-pci-nohotplug" as MACRO
The QOMConventions recommends we should use TYPE_FOO
for a TypeInfo's name. Though "vfio-pci-nohotplug" is not
used in other parts, for consistency we should make this change.

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-2-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 17:35:27 +02:00
Peter Maydell
9ec34ecc97 ppc patch queue 2019-04-26
Here's the first ppc target pull request for qemu-4.1.  This has a
 number of things that have accumulated while qemu-4.0 was frozen.
 
  * A number of emulated MMU improvements from Ben Herrenschmidt
 
  * Assorted cleanups fro Greg Kurz
 
  * A large set of mostly mechanical cleanups from me to make target/ppc
    much closer to compliant with the modern coding style
 
  * Support for passthrough of NVIDIA GPUs using NVLink2
 
 As well as some other assorted fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlzCnusACgkQbDjKyiDZ
 s5LfhhAAuem5UBGKPKPj33c87HC+GGG+S4y89ic3ebyKplWulGgouHCa4Dnc7Y5m
 9MfIEcljRDpuRJCEONo6yg9aaRb3cW2Go9TpTwxmF8o1suG/v5bIQIdiRbBuMa2t
 yhNujVg5kkWSU1G4mCZjL9FS2ADPsxsKZVd73DPEqjlNJg981+2qtSnfR8SXhfnk
 dSSKxyfC6Hq1+uhGkLI+xtft+BCTWOstjz+efHpZ5l2mbiaMeh7zMKrIXXy/FtKA
 ufIyxbZznMS5MAZk7t90YldznfwOCqfh3di1kx8GTZ40LkBKbuI5LLHTG0sT75z5
 LHwFuLkBgWmS8RyIRRh9opr7ifrayHx8bQFpW368Qu+PbPzUCcTVIrWUfPmaNR74
 CkYJvhiYZfTwKtUeP7b2wUkHpZF4KINI4TKNaS4QAlm3DNbO67DFYkBrytpXsSzv
 smEpe+sqlbY40olw9q4ESP80r+kGdEPLkRjfdj0R7qS4fsqAH1bjuSkNqlPaCTJQ
 hNsoz2D+f56z0bBq4x8FRzDpqnBkdy4x6PlLxkJuAaV7WAtvq7n7tiMA3TRr/rIB
 OYFP2xPNajjP8MfyOB94+S4WDltmsgXoM7HyyvrKp2JBpe7mFjpep5fMp5GUpweV
 OOYrTsN1Nuu3kFpeimEc+IOyp1BWXnJF4vHhKTOqHeqZEs5Fgus=
 =RpAK
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190426' into staging

ppc patch queue 2019-04-26

Here's the first ppc target pull request for qemu-4.1.  This has a
number of things that have accumulated while qemu-4.0 was frozen.

 * A number of emulated MMU improvements from Ben Herrenschmidt

 * Assorted cleanups fro Greg Kurz

 * A large set of mostly mechanical cleanups from me to make target/ppc
   much closer to compliant with the modern coding style

 * Support for passthrough of NVIDIA GPUs using NVLink2

As well as some other assorted fixes.

# gpg: Signature made Fri 26 Apr 2019 07:02:19 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190426: (36 commits)
  target/ppc: improve performance of large BAT invalidations
  ppc/hash32: Rework R and C bit updates
  ppc/hash64: Rework R and C bit updates
  ppc/spapr: Use proper HPTE accessors for H_READ
  target/ppc: Don't check UPRT in radix mode when in HV real mode
  target/ppc/kvm: Convert DPRINTF to traces
  target/ppc/trace-events: Fix trivial typo
  spapr: Drop duplicate PCI swizzle code
  spapr_pci: Get rid of duplicate code for node name creation
  target/ppc: Style fixes for translate/spe-impl.inc.c
  target/ppc: Style fixes for translate/vmx-impl.inc.c
  target/ppc: Style fixes for translate/vsx-impl.inc.c
  target/ppc: Style fixes for translate/fp-impl.inc.c
  target/ppc: Style fixes for translate.c
  target/ppc: Style fixes for translate_init.inc.c
  target/ppc: Style fixes for monitor.c
  target/ppc: Style fixes for mmu_helper.c
  target/ppc: Style fixes for mmu-hash64.[ch]
  target/ppc: Style fixes for mmu-hash32.[ch]
  target/ppc: Style fixes for misc_helper.c
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-04-27 21:34:46 +01:00
Alexey Kardashevskiy
ec132efaa8 spapr: Support NVIDIA V100 GPU with NVLink2
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
implements special regions for such GPUs and emulates an NVLink bridge.
NVLink2-enabled POWER9 CPUs also provide address translation services
which includes an ATS shootdown (ATSD) register exported via the NVLink
bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR;
the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
this to get the MR and map it to the system address space.
Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in
sPAPRPHBState::nvgpus, manage system address space mappings;

2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
"memory-block", "link-speed" to advertise the NVLink2 function to
the guest;

3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;

4. Add new memory blocks (with extra "linux,memory-usable" to prevent
the guest OS from accessing the new memory until it is onlined) and
npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by
adding 2 new parameters to the phb_placement() hook. Older machine types
set these to zero.

This puts new memory nodes in a separate NUMA node to as the GPU RAM
needs to be configured equally distant from any other node in the system.
Unlike the host setup which assigns numa ids from 255 downwards, this
adds new NUMA nodes after the user configures nodes or from 1 if none
were configured.

This adds requirement similar to EEH - one IOMMU group per vPHB.
The reason for this is that ATSD registers belong to a physical NPU
so they cannot invalidate translations on GPUs attached to another NPU.
It is guaranteed by the host platform as it does not mix NVLink bridges
or GPUs from different NPU in the same IOMMU group. If more than one
IOMMU group is detected on a vPHB, this disables ATSD support for that
vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26 10:41:23 +10:00
Cornelia Huck
41c3d4269b Support for booting from a vfio-ccw passthrough dasd device
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJcsHOvAAoJEC7Z13T+cC21IgoP/37yKOLOuagT4an9L573qWPp
 xCS48CJ4rNkpWXP3SHuTe+UHSp20sk+5b6rgM/VkLT2d301WS4gxF/VVu85ZFxGX
 tkqDwnljb87MqwTbH5Yj/U4mmq8tZkNg4CqmWuvJhWv4aFe6T/YtAzUkl7y7YCcT
 QfKqErl363JJKkL7cz+QWopFn5Gl/hy+mvEhbEtexWLIV1UNZ1i2hPWURfkh8FWH
 C1047CAfnC/rEJy0GcwkH4iCem8n4LWkMuf3Zehq+Yx+f2e8FfxMkoOtLJVCoKWj
 qoMidAplGqUxLoamZsbU1wEzwH6YH28X+uNUULsgDIIBuyW35ymbGsGTmKmNm6ED
 zVM1K7badvLeO3PXBxUkviZk7UFjxjXz3xCQMheY47wPoslfX+EN0xUvQ2gW2MvO
 Dhb9oPWsr/PFrMMJ35D2OOFH5kJC8Sj30YiP5lsnRoUBi4ecHCIUSlw6esKuYI+H
 JfDAYzxe6QqoGg5cxSNUXP+vAgU/FQq+nGNGHzHnsIR4Udt/JsAtNwxQ9DCbYR0C
 LA5qxZZTqkDtLPAHynqOzd8m7AoNaBSAgP2qp7Yp8ItXMyemlW2OIYR7yRxuL5bH
 zWO/deGYHp3+j9/Z0quzSUL5G85m0o1xgRJcJe9T2fYjWgsy271WFqaZg1JEvpI6
 zHcXEw71B7WQuAFaFO1+
 =tJvv
 -----END PGP SIGNATURE-----

Merge tag 's390-ccw-bios-2019-04-12' into s390-next-staging

Support for booting from a vfio-ccw passthrough dasd device

# gpg: Signature made Fri 12 Apr 2019 01:17:03 PM CEST
# gpg:                using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [undefined]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [undefined]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]

* tag 's390-ccw-bios-2019-04-12':
  pc-bios/s390: Update firmware images
  s390-bios: Use control unit type to find bootable devices
  s390-bios: Support booting from real dasd device
  s390-bios: Add channel command codes/structs needed for dasd-ipl
  s390-bios: Use control unit type to determine boot method
  s390-bios: Refactor virtio to run channel programs via cio
  s390-bios: Factor finding boot device out of virtio code path
  s390-bios: Extend find_dev() for non-virtio devices
  s390-bios: cio error handling
  s390-bios: Support for running format-0/1 channel programs
  s390-bios: ptr2u32 and u32toptr
  s390-bios: Map low core memory
  s390-bios: Decouple channel i/o logic from virtio
  s390-bios: Clean up cio.h
  s390-bios: decouple common boot logic from virtio
  s390-bios: decouple cio setup from virtio
  s390 vfio-ccw: Add bootindex property and IPLB data

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-25 14:09:20 +02:00
David Hildenbrand
905b7ee4d6 exec: Introduce qemu_maxrampagesize() and rename qemu_getrampagesize()
Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it,
properly rename find_max_supported_pagesize() to
find_min_backend_pagesize().

s390x is actually interested into the maximum ram pagesize, so
introduce and use qemu_maxrampagesize().

Add a TODO, indicating that looking at any mapped memory backends is not
100% correct in some cases.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190417113143.5551-3-david@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-25 13:47:27 +02:00
Markus Armbruster
8f8f588565 vfio: Report warnings with warn_report(), not error_printf()
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190417190641.26814-8-armbru@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-18 22:18:59 +02:00
Jason J. Herne
44445d8668 s390 vfio-ccw: Add bootindex property and IPLB data
Add bootindex property and iplb data for vfio-ccw devices. This allows us to
forward boot information into the bios for vfio-ccw devices.

Refactor s390_get_ccw_device() to return device type. This prevents us from
having to use messy casting logic in several places.

Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1554388475-18329-2-git-send-email-jjherne@linux.ibm.com>
[thuth: fixed "typedef struct VFIOCCWDevice" build failure with clang]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-04-12 12:39:52 +02:00
Daniel P. Berrangé
e1d0b37261 hw/vfio/ccw: avoid taking address members in packed structs
The GCC 9 compiler complains about many places in s390 code
that take the address of members of the 'struct SCHIB' which
is marked packed:

hw/vfio/ccw.c: In function ‘vfio_ccw_io_notifier_handler’:
hw/vfio/ccw.c:133:15: warning: taking address of packed member of ‘struct SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  133 |     SCSW *s = &sch->curr_status.scsw;
      |               ^~~~~~~~~~~~~~~~~~~~~~
hw/vfio/ccw.c:134:15: warning: taking address of packed member of ‘struct SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  134 |     PMCW *p = &sch->curr_status.pmcw;
      |               ^~~~~~~~~~~~~~~~~~~~~~

...snip many more...

Almost all of these are just done for convenience to avoid
typing out long variable/field names when referencing struct
members. We can get most of this convenience by taking the
address of the 'struct SCHIB' instead, avoiding triggering
the compiler warnings.

In a couple of places we copy via a local variable which is
a technique already applied elsewhere in s390 code for this
problem.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20190329111104.17223-12-berrange@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-03 11:19:57 +02:00
Markus Armbruster
dec9776049 trace-events: Fix attribution of trace points to source
Some trace points are attributed to the wrong source file.  Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.

Clean up with help of cleanup-trace-events.pl.  Same funnies as in the
previous commit, of course.  Manually shorten its change to
linux-user/trace-events to */signal.c.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190314180929.27722-6-armbru@redhat.com
Message-Id: <20190314180929.27722-6-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:18:07 +00:00
Markus Armbruster
a9779a3ab0 trace-events: Delete unused trace points
Tracked down with cleanup-trace-events.pl.  Funnies requiring manual
post-processing:

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190314180929.27722-5-armbru@redhat.com
Message-Id: <20190314180929.27722-5-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:18:07 +00:00
Markus Armbruster
500016e5db trace-events: Shorten file names in comments
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to
source files.  That's because when trace-events got split up, the
comments were moved verbatim.

Delete the sub/dir/ part from these comments.  Gets rid of several
misspellings.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-3-armbru@redhat.com
Message-Id: <20190314180929.27722-3-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:18:07 +00:00
Peter Maydell
46316f1dff VFIO updates 2019-03-11
- Resolution support for mdev displays supporting EDID interface
    (Gerd Hoffmann)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJchrRTAAoJECObm247sIsiQwMQAIENkbd41VAcUFkoSLfloebP
 FppvUoCdWKkVLaOeEltlOGJyxh6Ixirv+iUTw3+5kNPyZyXdOaFp0pOr/SCD9p1x
 l2kYUazxCJM495py15o646X/ZKeUHKNU01X+7ZcQqQXwYqnNBVDVClAYw+kNJ2aL
 2QJ2KkaHu/JJJTQqWgOBcAVI10CeaBoTaZT65gmPELSUS3NsjSSjErJiQREzHoyP
 J0LsYuk1UVF80FP7ZNo2kS1UPmDEyQPISaDcgngvxjeR9KWE6TuHcsI9xJz4k4sm
 S/QxAlYWdXYX/I5pC0UlBEvaJxSsXSf4gr+AGYm4ebnptONT5/m82pc6YMitHr1C
 XDkbXkxv2n9q2NG9kbDvUdjlDt99W81XxTOpp/LkrGjrI7RkwowIHVSDNFYzmNFV
 Pma1Q93q1Zr6HiAm7817bEFltO9RjAMmiFCvgAyWWSwCwWI1I19BfCSJ/8aGhrwR
 GIIW+cTHZYXH70os354XrpYsW8sZNFbjtfhOUt5b5H4l341FR9+w6nmB/BJDz5U5
 ijm6ABWnT6UYx2n4lpwO0vuBtOMplL7tkxHI8+iiX4HeKGvhmaxLiNaeIFC5w65J
 MvdlTBMFKJgxXqfKDTlHkxkLoF7vjOUr7vflcSCny53RSUUztajpBy0ByRWIeTPH
 PvmnGbV+Y3EGhRSjpFwU
 =5V6/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20190311.0' into staging

VFIO updates 2019-03-11

 - Resolution support for mdev displays supporting EDID interface
   (Gerd Hoffmann)

# gpg: Signature made Mon 11 Mar 2019 19:17:39 GMT
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex@shazbot.org>" [full]
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>" [full]
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20190311.0:
  vfio/display: delay link up event
  vfio/display: add xres + yres properties
  vfio/display: add edid support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12 13:37:29 +00:00
Alexey Kardashevskiy
013002f0fb vfio: Make vfio_get_region_info_cap public
This makes vfio_get_region_info_cap() to be used in quirks.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190307050518.64968-3-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 16:17:35 +11:00
Alexey Kardashevskiy
3cdd801b0b vfio/spapr: Rename local systempagesize variable
The "systempagesize" name suggests that it is the host system page size
while it is the smallest page size of memory backing the guest RAM so
let's rename it to stop confusion. This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190227085149.38596-4-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 10:50:59 +11:00
Alexey Kardashevskiy
1610799876 vfio/spapr: Fix indirect levels calculation
The current code assumes that we can address more bits on a PCI bus
for DMA than we really can but there is no way knowing the actual limit.

This makes a better guess for the number of levels and if the kernel
fails to allocate that, this increases the level numbers till succeeded
or reached the 64bit limit.

This adds levels to the trace point.

This may cause the kernel to warn about failed allocation:
   [65122.837458] Failed to allocate a TCE memory, level shift=28
which might happen if MAX_ORDER is not large enough as it can vary:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Kconfig?h=v5.0-rc2#n727

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190227085149.38596-3-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12 10:50:59 +11:00
Gerd Hoffmann
8781c70144 vfio/display: delay link up event
Kick the display link up event with a 0.1 sec delay,
so the guest has a chance to notice the link down first.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[update for redefined macro]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11 12:59:59 -06:00
Gerd Hoffmann
c62a0c7ce3 vfio/display: add xres + yres properties
This allows configure the display resolution which the vgpu should use.
The information will be passed to the guest using EDID, so the mdev
driver must support the vfio edid region for this to work.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11 12:59:59 -06:00
Gerd Hoffmann
08479114b0 vfio/display: add edid support.
This patch adds EDID support to the vfio display (aka vgpu) code.
When supported by the mdev driver qemu will generate a EDID blob
and pass it on using the new vfio edid region.  The EDID blob will
be updated on UI changes (i.e. window resize), so the guest can
adapt.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
[remove control flow via macro, use unsigned format specifier]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11 12:59:59 -06:00
Paolo Bonzini
97575f928f vfio-pci: enable by default
CONFIG_VFIO_PCI was not "default y" - and once you do that, it is also
important to disable it if PCI is not there.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11 14:45:10 +01:00
Paolo Bonzini
2ac041c2c3 vfio: express vfio dependencies with Kconfig
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07 21:45:53 +01:00
Paolo Bonzini
e0e312f352 build: switch to Kconfig
The make_device_config.sh script is replaced by minikconf, which
is modified to support the same command line as its predecessor.

The roots of the parsing are default-configs/*.mak, Kconfig.host and
hw/Kconfig.  One difference with make_device_config.sh is that all symbols
have to be defined in a Kconfig file, including those coming from the
configure script.  This is the reason for the Kconfig.host file introduced
in the previous patch. Whenever a file in default-configs/*.mak used
$(...) to refer to a config-host.mak symbol, this is replaced by a
Kconfig dependency; this part must be done already in this patch
for bisectability.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190123065618.3520-28-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07 21:45:53 +01:00
Paolo Bonzini
82f5181777 kconfig: introduce kconfig files
The Kconfig files were generated mostly with this script:

  for i in `grep -ho CONFIG_[A-Z0-9_]* default-configs/* | sort -u`; do
    set fnord `git grep -lw $i -- 'hw/*/Makefile.objs' `
    shift
    if test $# = 1; then
      cat >> $(dirname $1)/Kconfig << EOF
config ${i#CONFIG_}
    bool

EOF
      git add $(dirname $1)/Kconfig
    else
      echo $i $*
    fi
  done
  sed -i '$d' hw/*/Kconfig
  for i in hw/*; do
    if test -d $i && ! test -f $i/Kconfig; then
      touch $i/Kconfig
      git add $i/Kconfig
    fi
  done

Whenever a symbol is referenced from multiple subdirectories, the
script prints the list of directories that reference the symbol.
These symbols have to be added manually to the Kconfig files.

Kconfig.host and hw/Kconfig were created manually.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20190123065618.3520-27-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07 21:45:53 +01:00
Peter Maydell
1ba530a4ec s390x updates:
- tcg: support the floating-point extension facility
 - vfio-ap: support hot(un)plug of vfio-ap device
 - fixes + cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlx9EjsSHGNvaHVja0By
 ZWRoYXQuY29tAAoJEN7Pa5PG8C+vS1EP/2breENmZ8XymtsXtIWwli4WUIidOdvf
 KKXt7sCmWwPr19JhLjtLVfdETT9EVFKmA30pNxB4EzVkD1KCFYMYK8jvTj6FDFd/
 pYXqQGZopwQqr+jhTWtdmSoc15+jkyCD4Gg9+jlLB7XvfY6g1TTttYxT8RxonGd9
 o8Xy4ODY++sEVA1Y9B6QFc9ePbzDpN6jWHJht6ZHGeJUmLexwanKC6LixSQkqhsG
 B6+J+j6NUO3HLvHvIvNpVDQEgFYeCBzhpVF0Atl0O4MmFH7Jke9/iI3b6dX5iGtb
 cWmCffuu91dBjJFitDABS7n3gNRYV/cFAz64y4Zm6KBz2h/xy7lzHQDtdPw4AWOa
 OeiFoWPPXFy8ClbQJvdMNzxgMK5GjejfudAxZ2lMWY+x8ThKnL/g3xv/hlWEEFpq
 tnGbHALKA2qGBlv+6M5q/nk0ZpNdMt7wFzhkbGZeeUepgsyd0JRJIuaFqLaSxawl
 p6pQ4XAsBKH2Q0ZgOak2atZV2coyFqXNPFo3HDfFGP82G9K6sRMFAbNSHJeoSjfE
 4ArNBFxXcs5OTauprw+wqgUhOB8DPtnlQIsqfxuYEP6X0Ayu1BOfrUpEEO/8Mbsh
 j6MxXinpo5NqC5v1vJZsypF7Jr/Rft5lHV30WXGovmUPhPVMQ9aIw+uSdL96CHec
 JK4EuqHMJrij
 =iS1q
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190304' into staging

s390x updates:
- tcg: support the floating-point extension facility
- vfio-ap: support hot(un)plug of vfio-ap device
- fixes + cleanups

# gpg: Signature made Mon 04 Mar 2019 11:55:39 GMT
# gpg:                using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF
# gpg:                issuer "cohuck@redhat.com"
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [unknown]
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full]
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full]
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>" [unknown]
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>" [unknown]
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20190304: (27 commits)
  s390x: Add floating-point extension facility to "qemu" cpu model
  s390x/tcg: Handle all rounding modes overwritten by BFP instructions
  s390x/tcg: Implement rounding mode and XxC for LOAD ROUNDED
  s390x/tcg: Implement XxC and checks for most FP instructions
  s390x/tcg: Prepare for IEEE-inexact-exception control (XxC)
  s390x/tcg: Refactor saving/restoring the bfp rounding mode
  s390x/tcg: Check for exceptions in SET BFP ROUNDING MODE
  s390x/tcg: Handle SET FPC AND LOAD FPC 3-bit BFP rounding modes
  s390x/tcg: Fix simulated-IEEE exceptions
  s390x/tcg: Refactor SET FPC AND SIGNAL handling
  s390x/tcg: Hide IEEE underflows in some scenarios
  s390x/tcg: Fix parts of IEEE exception handling
  s390x/tcg: Factor out conversion of softfloat exceptions
  s390x/tcg: Fix rounding from float128 to uint64_t/uint32_t
  s390x/tcg: Fix TEST DATA CLASS instructions
  s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY
  s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP
  s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address()
  s390x/tcg: Factor out vec_full_reg_offset()
  s390x/tcg: Clarify terminology in vec_reg_offset()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04 13:38:54 +00:00
Peter Maydell
1d31f1872b pci, pc, virtio: fixes, cleanups, tests
Lots of work on tests: BiosTablesTest UEFI app,
 vhost-user testing for non-Linux hosts.
 Misc cleanups and fixes all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJccBqMAAoJECgfDbjSjVRpvSEIAKYPRNdCBX/SSS/L/tmJS5Zt
 8IyU/HW1YJ249vO+aT6z4Q3QPgqNC3KjXC3brx/WRoPZnRroen4rv2Kqnk6SayPa
 a52d2ubXKWxb3swdG1CAVzFRhq/ABpgAPx0dr1JW+RXgo2lxpJ4GNYxKMosQTaPE
 hRNeXl1XlcIK525kJhFH3Hlij9mTRuY6T7ydpPQd8dUq2dBRaL9RrzZRrkZxCy6l
 gQPUqNzPhG0XXyOiJmwYyVX0zGzbYrMLrMQAor2SBIYmU+zv2eZGPJUYxoMTUMzt
 YR0WCpvkvPITlAryaBoozAIDYVz8PxBRT1KRwpDal+2rzlm6o+veKDiF8R46gn0=
 =GzUz
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci, pc, virtio: fixes, cleanups, tests

Lots of work on tests: BiosTablesTest UEFI app,
vhost-user testing for non-Linux hosts.
Misc cleanups and fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 22 Feb 2019 15:51:40 GMT
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (26 commits)
  pci: Sanity test minimum downstream LNKSTA
  hw/smbios: fix offset of type 3 sku field
  pci: Move NVIDIA vendor id to the rest of ids
  virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
  virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
  virtio-balloon: Rework ballon_page() interface
  virtio-balloon: Corrections to address verification
  virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
  i386/kvm: ignore masked irqs when update msi routes
  contrib/vhost-user-blk: fix the compilation issue
  Revert "contrib/vhost-user-blk: fix the compilation issue"
  pc-dimm: use same mechanism for [get|set]_addr
  tests/data: introduce "uefi-boot-images" with the "bios-tables-test" ISOs
  tests/uefi-test-tools: add build scripts
  tests: introduce "uefi-test-tools" with the BiosTablesTest UEFI app
  roms: build the EfiRom utility from the roms/edk2 submodule
  roms: add the edk2 project as a git submodule
  vhost-user-test: create a temporary directory per TestServer
  vhost-user-test: small changes to init_hugepagefs
  vhost-user-test: create a main loop per TestServer
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04 11:04:31 +00:00
Tony Krowiak
374b78e370 s390x/vfio-ap: Implement hot plug/unplug of vfio-ap device
Introduces hot plug/unplug support for the vfio-ap device.

To hot plug a vfio-ap device using the QEMU device_add command:

	(qemu) device_add vfio-ap,sysfsdev=$path-to-mdev

	Where $path-to-mdev is the absolute path to the mediated matrix device
	to which AP resources to be used by the guest have been assigned.

A vfio-ap device can be hot plugged only if:

1. A vfio-ap device has not been attached to the virtual machine's ap-bus
   via the QEMU command line or a prior hot plug action.

2. The guest was started with the CPU model feature for AP enabled
   (e.g., -cpu host,ap=on)

To hot unplug a vfio-ap device using the QEMU device_del command:

	(qemu) device_del vfio-ap,sysfsdev=$path-to-mdev

	Where $path-to-mdev is the absolute path to the mediated matrix device
	specified when the vfio-ap device was attached to the virtual machine's
	ap-bus.

A vfio-ap device can be hot unplugged only if:

1. A vfio-ap device has been attached to the virtual machine's ap-bus
   via the QEMU command line or a prior hot plug action.

2. The guest was started with the CPU model feature for AP enabled
   (e.g., -cpu host,ap=on)

Please note that a hot plug handler is not necessary for the vfio-ap device
because the AP matrix configuration for the guest is performed by the
kernel device driver when the vfio-ap device is realized. The vfio-ap device
represents a VFIO mediated device created in the host sysfs for use by a guest.
The mdev device is configured with an AP matrix (i.e., adapters and domains) via
its sysfs attribute interfaces prior to starting the guest or plugging a vfio-ap
device in. When the device is realized, a file descriptor is opened on the mdev
device which results in a callback to the vfio_ap kernel device driver. The
device driver then configures the AP matrix in the guest's SIE state description
from the AP matrix assigned via the mdev device's sysfs interfaces. The AP
devices will be created for the guest when the AP bus running on the guest
subsequently performs its periodic scan for AP devices.

The qdev_simple_device_unplug_cb() callback function is used for the same
reaons; namely, the vfio_ap kernel device driver will perform the AP resource
de-configuration for the guest when the vfio-ap device is unplugged. When the
vfio-ap device is unrealized, the mdev device file descriptor is closed which
results in a callback to the vfio_ap kernel device driver. The device driver
then clears the AP matrix configuration in the guest's SIE state description
and resets all of the affected queues. The AP devices created for the guest
will be removed when the AP bus running on the guest subsequently performs
its periodic scan and finds there are no longer any AP resources assigned to the
guest.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <1550519397-25359-2-git-send-email-akrowiak@linux.ibm.com>
[CH: adapt to changed qbus_set_hotplug_handler() signature]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-03-04 11:49:31 +01:00
Alexey Kardashevskiy
ee1cd0099a pci: Move NVIDIA vendor id to the rest of ids
sPAPR code will use it too so move it from VFIO to the common code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190214051440.59167-1-aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22 10:51:31 -05:00
Eric Auger
2b6326c0bf hw/vfio/common: Refactor container initialization
We introduce the vfio_init_container_type() helper.
It computes the highest usable iommu type and then
set the container and the iommu type.

Its usage in vfio_connect_container() makes the code
ready for addition of new iommu types.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21 21:07:03 -07:00
Alex Williamson
567d7d3e6b vfio/common: Work around kernel overflow bug in DMA unmap
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
adds a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately due to overflow, the kernel detects an unmap of the last
page in the 64-bit address space as a wrap-around.  In QEMU, a Q35
guest with VT-d emulation and guest IOMMU enabled will attempt to make
such an unmap request during VM system reset, triggering an error:

  qemu-kvm: VFIO_UNMAP_DMA: -22
  qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)

Here the IOVA start address (0xfef00000) and the size parameter
(0xffffffff01100000) add to exactly 2^64, triggering the bug.  A
kernel fix is queued for the Linux v5.0 release to address this.

This patch implements a workaround to retry the unmap, excluding the
final page of the range when we detect an unmap failing which matches
the requirements for this issue.  This is expected to be a safe and
complete workaround as the VT-d address space does not extend to the
full 64-bit space and therefore the last page should never be mapped.

This workaround can be removed once all kernels with this bug are
sufficiently deprecated.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Reported-by: Pei Zhang <pezhang@redhat.com>
Debugged-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21 21:07:03 -07:00
Paolo Bonzini
f5ce0a5f5a hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCI
Make hw/vfio configurable and add new CONFIG_VFIO_* to the
default-configs/s390x*-softmmu.mak.  This allow a finer-grain
selection of the various VFIO backends.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190202072456.6468-28-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 16:50:21 +01:00
Paolo Bonzini
e01ee04f8b vfio: move conditional up to hw/Makefile.objs
Instead of wrapping the entire Makefile.objs with an ifeq/endif, just
include the directory only for Linux.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190202072456.6468-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 16:50:19 +01:00
Daniel P. Berrangé
772f1b3721 trace: forbid use of %m in trace event format strings
The '%m' format instructs glibc's printf()/syslog() implementation to
insert the contents of strerror(errno). Since this is a glibc extension
it should generally be avoided in QEMU due to need for portability to a
variety of platforms.

Even though vfio is Linux-only code that could otherwise use "%m", it
must still be avoided in trace-events files because several of the
backends do not use the format string and so this error information is
invisible to them.

The errno string value should be given as an explicit trace argument
instead, making it accessible to all backends. This also allows it to
work correctly with future patches that use the format string with
systemtap's simple printf code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20190123120016.4538-4-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-24 14:16:56 +00:00
Paolo Bonzini
f481ee2d5e qemu/queue.h: typedef QTAILQ heads
This will be needed when we change the QTAILQ head and elem structs
to unions.  However, it is also consistent with the usage elsewhere
in QEMU for other list head structs (see for example FsMountList).

Note that most QTAILQs only need their name in order to do backwards
walks.  Those do not break with the struct->union change, and anyway
the change will also remove the need to name heads when doing backwards
walks, so those are not touched here.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Paolo Bonzini
10ca76b4d2 vfio: make vfio_address_spaces static
It is not used outside hw/vfio/common.c, so it does not need to
be extern.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:54 +01:00
Peter Maydell
15763776bf pci, pc, virtio: fixes, features
VTD fixes
 IR and split irqchip are now the default for Q35
 ACPI refactoring
 hotplug refactoring
 new names for virtio devices
 multiple pcie link width/speeds
 PCI fixes
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcG967AAoJECgfDbjSjVRpUlMH/1wy8WW7Br/4JxlWUPsfZTqZ
 0Lg2n59wuFzRVS+gLotp6bUaJGR9xn9fKjI1wfD59oVrDTKyauuw82v0OityEs33
 ZquFecuJvP6k5N40DkqA9YJjKSw7nUmLrsyrD0t2H43npikP2aD9f6yootrr3oVV
 nPwBvyT9ykIBQc7IzsHDiLw3EPdIf+9IR7+l+PLptzV0zK43Jfwi/nzHIQq00UMz
 eLM/ejQLIx4BZJnYGS0Cy6v3K7cS3k45LHDY0cGc0id2jHFN2vdQyHCF9I1ps72q
 pSlhMaLEwvQSYyre6iFTG5uuvyIPWv3LOkaBEwMSA5B/HXuEb2RKHThYzS9dc68=
 =OwW7
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci, pc, virtio: fixes, features

VTD fixes
IR and split irqchip are now the default for Q35
ACPI refactoring
hotplug refactoring
new names for virtio devices
multiple pcie link width/speeds
PCI fixes

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (44 commits)
  x86-iommu: turn on IR by default if proper
  x86-iommu: switch intr_supported to OnOffAuto type
  q35: set split kernel irqchip as default
  pci: Adjust PCI config limit based on bus topology
  spapr_pci: perform unplug via the hotplug handler
  pci/shpc: perform unplug via the hotplug handler
  pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge
  pci/pcie: perform unplug via the hotplug handler
  pci/pcihp: perform unplug via the hotplug handler
  pci/pcihp: overwrite hotplug handler recursively from the start
  pci/pcihp: perform check for bus capability in pre_plug handler
  s390x/pci: rename hotplug handler callbacks
  pci/shpc: rename hotplug handler callbacks
  pci/pcie: rename hotplug handler callbacks
  hw/i386: Remove deprecated machines pc-0.10 and pc-0.11
  hw: acpi: Remove AcpiRsdpDescriptor and fix tests
  hw: acpi: Export and share the ARM RSDP build
  hw: arm: Support both legacy and current RSDP build
  hw: arm: Convert the RSDP build to the buid_append_foo() API
  hw: arm: Carry RSDP specific data through AcpiRsdpData
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-21 14:06:01 +00:00
Markus Armbruster
b7d89466dd Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes, with the changes
to the following files manually reverted:

    contrib/libvhost-user/libvhost-user-glib.h
    contrib/libvhost-user/libvhost-user.c
    contrib/libvhost-user/libvhost-user.h
    linux-user/mips64/cpu_loop.c
    linux-user/mips64/signal.c
    linux-user/sparc64/cpu_loop.c
    linux-user/sparc64/signal.c
    linux-user/x86_64/cpu_loop.c
    linux-user/x86_64/signal.c
    target/s390x/gen-features.c
    tests/migration/s390x/a-b-bios.c
    tests/test-rcu-simpleq.c
    tests/test-rcu-tailq.c

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181204172535.2799-1-armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
2018-12-20 10:29:08 +01:00
Alex Williamson
d26e543891 vfio/pci: Remove PCIe Link Status emulation
Now that the downstream port will virtually negotiate itself to the
link status of the downstream device, we can remove this emulation.
It's not clear that it was every terribly useful anyway.

Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19 16:48:16 -05:00
Alex Williamson
d96a0ac71c pcie: Create enums for link speed and width
In preparation for reporting higher virtual link speeds and widths,
create enums and macros to help us manage them.

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19 16:48:16 -05:00
Cornelia Huck
1883e8fc80 vfio-ap: flag as compatible with balloon
vfio-ap devices do not pin any pages in the host. Therefore, they
are compatible with memory ballooning.

Flag them as compatible, so both vfio-ap and a balloon can be
used simultaneously.

Cc: qemu-stable@nongnu.org
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-12-12 10:39:28 +01:00
Cornelia Huck
ab17b1fc75 s390x/vfio-ap: report correct error
If ioctl(..., VFIO_DEVICE_RESET) fails, we want to report errno
instead of ret (which is always -1 on error).

Fixes Coverity issue CID 1396176.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-11-05 09:55:01 +01:00
Markus Armbruster
c3b8e3e0ed vfio: Clean up error reporting after previous commit
The previous commit changed vfio's warning messages from

    vfio warning: DEV-NAME: Could not frobnicate

to

    warning: vfio DEV-NAME: Could not frobnicate

To match this change, change error messages from

    vfio error: DEV-NAME: On fire

to

    vfio DEV-NAME: On fire

Note the loss of "error".  If we think marking error messages that way
is a good idea, we should mark *all* error messages, i.e. make
error_report() print it.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Markus Armbruster
e1eb292ace vfio: Use warn_report() & friends to report warnings
The vfio code reports warnings like

    error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME);

where WARN_PREFIX is defined so the message comes out as

    vfio warning: DEV-NAME: Could not frobnicate

This usage predates the introduction of warn_report() & friends in
commit 97f40301f1.  It's time to convert to that interface.  Since
these functions already prefix the message with "warning: ", replace
WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like

    warning: vfio DEV-NAME: Could not frobnicate

The next commit will replace ERR_PREFIX.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-6-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Markus Armbruster
4b5766488f error: Fix use of error_prepend() with &error_fatal, &error_abort
From include/qapi/error.h:

  * Pass an existing error to the caller with the message modified:
  *     error_propagate(errp, err);
  *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.

Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to
error_propagate_prepend().  If any of these get reached with
&error_fatal or &error_abort, the error messages improve.  I didn't
check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-19 14:51:34 +02:00
Li Qiang
2683ccd5be vfio-pci: make vfio-pci device more QOM conventional
Define a TYPE_VFIO_PCI and drop DO_UPCAST.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15 11:22:29 -06:00
Eric Auger
a49531ebd0 vfio/platform: Make the vfio-platform device non-abstract
Up to now the vfio-platform device has been abstract and could not be
instantiated.  The integration of a new vfio platform device required
creating a dummy derived device which only set the compatible string.

Following the few vfio-platform device integrations we have seen the
actual requested adaptation happens on device tree node creation
(sysbus-fdt).

Hence remove the abstract setting, and read the list of compatible
values from sysfs if not set by a derived device.

Update the amd-xgbe and calxeda-xgmac drivers to fill in the number of
compatible values, as there can now be more than one.

Note that sysbus-fdt does not support the instantiation of the
vfio-platform device yet.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
[geert: Rebase, set user_creatable=true, use compatible values in sysfs
	instead of user-supplied manufacturer/model options, reword]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15 10:52:09 -06:00
Gerd Hoffmann
b290659fc3 hw/vfio/display: add ramfb support
So we have a boot display when using a vgpu as primary display.

ramfb depends on a fw_cfg file.  fw_cfg files can not be added and
removed at runtime, therefore a ramfb-enabled vfio device can't be
hotplugged.

Add a nohotplug variant of the vfio-pci device (as child class).  Add
the ramfb property to the nohotplug variant only.  So to enable the vgpu
display with boot support use this:

  -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=...

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15 10:52:09 -06:00
Tony Krowiak
2fe2942cd6 s390x/vfio: ap: Introduce VFIO AP device
Introduces a VFIO based AP device. The device is defined via
the QEMU command line by specifying:

    -device vfio-ap,sysfsdev=<path-to-mediated-matrix-device>

There may be only one vfio-ap device configured for a guest.

The mediated matrix device is created by the VFIO AP device
driver by writing a UUID to a sysfs attribute file (see
docs/vfio-ap.txt). The mediated matrix device will be named
after the UUID. Symbolic links to the $uuid are created in
many places, so the path to the mediated matrix device $uuid
can be specified in any of the following ways:

/sys/devices/vfio_ap/matrix/$uuid
/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
/sys/bus/mdev/devices/$uuid
/sys/bus/mdev/drivers/vfio_mdev/$uuid

When the vfio-ap device is realized, it acquires and opens the
VFIO iommu group to which the mediated matrix device is
bound. This causes a VFIO group notification event to be
signaled. The vfio_ap device driver's group notification
handler will get called at which time the device driver
will configure the the AP devices to which the guest will
be granted access.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20181010170309.12045-6-akrowiak@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[CH: added missing g_free and device category]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-10-12 11:32:18 +02:00
Cornelia Huck
c55510b722 qemu-error: add {error, warn}_report_once_cond
Add two functions to print an error/warning report once depending
on a passed-in condition variable and flip it if printed. This is
useful if you want to print a message not once-globally, but e.g.
once-per-device.

Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced
with warn_report_once_cond().

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830145902.27376-2-cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Function comments reworded]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-09-24 17:13:07 +02:00
Alex Williamson
8709b3954d vfio/pci: Fix failure to close file descriptor on error
A new error path fails to close the device file descriptor when
triggered by a ballooning incompatibility within the group.  Fix it.

Fixes: 238e917285 ("vfio/ccw/pci: Allow devices to opt-in for ballooning")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23 10:45:58 -06:00
Alex Williamson
a1c0f88649 vfio/pci: Handle subsystem realpath() returning NULL
Fix error reported by Coverity where realpath can return NULL,
resulting in a segfault in strcmp().  This should never happen given
that we're working through regularly structured sysfs paths, but
trivial enough to easily avoid.

Fixes: 238e917285 ("vfio/ccw/pci: Allow devices to opt-in for ballooning")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23 10:45:57 -06:00
Alexey Kardashevskiy
c26bc185b7 vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pages
At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU
pages and POWER8 CPU supports the exact same set of page size so
so far things worked fine.

However POWER9 supports different set of sizes - 4K/64K/2M/1G and
the last two - 2M and 1G - are not even allowed in the paravirt interface
(RTAS DDW) so we always end up using 64K IOMMU pages, although we could
back guest's 16MB IOMMU pages with 2MB pages on the host.

This stores the supported host IOMMU page sizes in VFIOContainer and uses
this later when creating a new DMA window. This uses the system page size
(64k normally, 2M/16M/1G if hugepages used) as the upper limit of
the IOMMU pagesize.

This changes the type of @pagesize to uint64_t as this is what
memory_region_iommu_get_min_page_size() returns and clz64() takes.

There should be no behavioral changes on platforms other than pseries.
The guest will keep using the IOMMU page size selected by the PHB pagesize
property as this only changes the underlying hardware TCE table
granularity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21 14:28:45 +10:00
Alex Williamson
238e917285 vfio/ccw/pci: Allow devices to opt-in for ballooning
If a vfio assigned device makes use of a physical IOMMU, then memory
ballooning is necessarily inhibited due to the page pinning, lack of
page level granularity at the IOMMU, and sufficient notifiers to both
remove the page on balloon inflation and add it back on deflation.
However, not all devices are backed by a physical IOMMU.  In the case
of mediated devices, if a vendor driver is well synchronized with the
guest driver, such that only pages actively used by the guest driver
are pinned by the host mdev vendor driver, then there should be no
overlap between pages available for the balloon driver and pages
actively in use by the device.  Under these conditions, ballooning
should be safe.

vfio-ccw devices are always mediated devices and always operate under
the constraints above.  Therefore we can consider all vfio-ccw devices
as balloon compatible.

The situation is far from straightforward with vfio-pci.  These
devices can be physical devices with physical IOMMU backing or
mediated devices where it is unknown whether a physical IOMMU is in
use or whether the vendor driver is well synchronized to the working
set of the guest driver.  The safest approach is therefore to assume
all vfio-pci devices are incompatible with ballooning, but allow user
opt-in should they have further insight into mediated devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-17 09:27:16 -06:00
Alex Williamson
c65ee43315 vfio: Inhibit ballooning based on group attachment to a container
We use a VFIOContainer to associate an AddressSpace to one or more
VFIOGroups.  The VFIOContainer represents the DMA context for that
AdressSpace for those VFIOGroups and is synchronized to changes in
that AddressSpace via a MemoryListener.  For IOMMU backed devices,
maintaining the DMA context for a VFIOGroup generally involves
pinning a host virtual address in order to create a stable host
physical address and then mapping a translation from the associated
guest physical address to that host physical address into the IOMMU.

While the above maintains the VFIOContainer synchronized to the QEMU
memory API of the VM, memory ballooning occurs outside of that API.
Inflating the memory balloon (ie. cooperatively capturing pages from
the guest for use by the host) simply uses MADV_DONTNEED to "zap"
pages from QEMU's host virtual address space.  The page pinning and
IOMMU mapping above remains in place, negating the host's ability to
reuse the page, but the host virtual to host physical mapping of the
page is invalidated outside of QEMU's memory API.

When the balloon is later deflated, attempting to cooperatively
return pages to the guest, the page is simply freed by the guest
balloon driver, allowing it to be used in the guest and incurring a
page fault when that occurs.  The page fault maps a new host physical
page backing the existing host virtual address, meanwhile the
VFIOContainer still maintains the translation to the original host
physical address.  At this point the guest vCPU and any assigned
devices will map different host physical addresses to the same guest
physical address.  Badness.

The IOMMU typically does not have page level granularity with which
it can track this mapping without also incurring inefficiencies in
using page size mappings throughout.  MMU notifiers in the host
kernel also provide indicators for invalidating the mapping on
balloon inflation, not for updating the mapping when the balloon is
deflated.  For these reasons we assume a default behavior that the
mapping of each VFIOGroup into the VFIOContainer is incompatible
with memory ballooning and increment the balloon inhibitor to match
the attached VFIOGroups.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-17 09:27:16 -06:00