Commit Graph

36775 Commits

Author SHA1 Message Date
Nicholas Piggin
27f61d1b0b ppc/pnv: Implement big-core PVR for Power9/10
Power9/10 CPUs have PVR[51] set in small-core mode and clear in big-core
mode. This is used by skiboot firmware.

PVR is not hypervisor-privileged but it is not so important that spapr
to implement this because it's generally masked out of PVR matching code
in kernels, and only used by firmware.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
cf0eb929e5 ppc/pnv: Add allow for big-core differences in DT generation
device-tree building needs to account for big-core mode, because it is
driven by qemu cores (small cores). Every second core should be skipped,
and every core should describe threads for both small-cores that make
up the big core.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
c26504afd5 ppc/pnv: Add a big-core mode that joins two regular cores
POWER9 and POWER10 machines come in two variants, big-core and
small-core. Big-core machines are SMT8 from software's point of view,
but the low level platform topology ("xscom registers and pervasive
addressing"), these look more like a pair of small cores ganged
together.

Presently the way this is modelled is to create one SMT8 PnvCore and add
special cases to xscom and pervasive for big-core mode that tries to
split this into two small cores, but this is becoming too complicated to
manage.

A better approach is to create 2 core structures and ganging them
together to look like an SMT8 core in TCG. Then the xscom and pervasive
models mostly do not need to differentiate big and small core modes.

This change adds initial mode bits and QEMU topology handling to
split SMT8 cores into 2xSMT4 cores.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
59c921f229 ppc: Add has_smt_siblings property to CPUPPCState
The decision to branch out to a slower SMT path in instruction
emulation will become a bit more complicated with the way that
"big-core" topology that will be implemented in subsequent changes.
Hide these details from the wider CPU emulation code with a bool
has_smt_siblings flag that can be set by machine initialisation.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
feb37fdc82 ppc: Add a core_index to CPUPPCState for SMT vCPUs
The way SMT thread siblings are matched is clunky, using hard-coded
logic that checks the PIR SPR.

Change that to use a new core_index variable in the CPUPPCState,
where all siblings have the same core_index. CPU realize routines have
flexibility in setting core/sibling topology.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
25de28220c ppc/pnv: Extend chip_pir class method to TIR as well
The chip_pir chip class method allows the platform to set the PIR
processor identification register. Extend this to a more general
ID function which also allows the TIR to be set. This is in
preparation for "big core", which is a more complicated topology
of cores and threads.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
d76cb5a53b ppc/pnv: use class attribute to limit SMT threads for different machines
Use a class attribute to specify the number of SMT threads per core
permitted for different machines, 8 for powernv8 and 4 for powernv9/10.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
0ca94b2f11 ppc/pnv: Move timebase state into PnvCore
The timebase state machine is per per-core state and can be driven
by any thread in the core. It is currently implemented as a hack
where the state is in a CPU structure and only thread 0's state is
accessed by the chiptod, which limits programming the timebase
side of the state machine to thread 0 of a core.

Move the state out into PnvCore and share it among all threads.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
060e614367 ppc/pnv: Add pointer from PnvCPUState to PnvCore
This helps move core state from CPU to core structures.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
24bd283bcc ppc/pnv: Implement ADU access to LPC space
One of the functions of the ADU is indirect memory access engines that
send and receive data via ADU registers.

This implements the ADU LPC memory access functionality sufficiently
for IBM proprietary firmware to access the UART and print characters
to the serial port as it does on real hardware.

This requires a linkage between adu and lpc, which allows adu to
perform memory access in the lpc space.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
53f18b3ef2 ppc/pnv: Begin a more complete ADU LPC model for POWER9/10
This implements a framework for an ADU unit model.

The ADU unit actually implements XSCOM, which is the bridge between MMIO
and PIB. However it also includes control and status registers and other
functions that are exposed as PIB (xscom) registers.

To keep things simple, pnv_xscom.c remains the XSCOM bridge
implementation, and pnv_adu.c implements the ADU registers and other
functions.

So far, just the ADU no-op registers in the pnv_xscom.c default handler
are moved over to the adu model.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
24c3caff99 ppc/pnv: Implement POWER9 LPC PSI serirq outputs and auto-clear function
The POWER8 LPC ISA device irqs all get combined and reported to the line
connected the PSI LPCHC irq. POWER9 changed this so only internal LPC
host controller irqs use that line, and the device irqs get routed to
4 new lines connected to PSI SERIRQ0-3.

POWER9 also introduced a new feature that automatically clears the irq
status in the LPC host controller when EOI'ed, so software does not have
to.

The powernv OPAL (skiboot) firmware managed to work because the LPCHC
irq handler scanned all LPC irqs and handled those including clearing
status even on POWER9 systems. So LPC irqs worked despite OPAL thinking
it was running in POWER9 mode. After this change, UART interrupts show
up on serirq1 which is where OPAL routes them to:

 cat /proc/interrupts
 ...
 20:          0  XIVE-IRQ 1048563 Level     opal-psi#0:lpchc
 ...
 25:         34  XIVE-IRQ 1048568 Level     opal-psi#0:lpc_serirq_mux1

Whereas they previously turn up on lpchc.

Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Glenn Miles
c6e07f03f7 ppc/pnv: Fix loss of LPC SERIRQ interrupts
The LPC HC irq status register bits are set when an LPC IRQSER input is
asserted. These irq status bits drive the PSI irq to the CPU interrupt
controller. The LPC HC irq status bits are cleared by software writing
to the register with 1's for the bits to clear.

Existing register write was clearing the irq status bits even when the
input was asserted, this results in interrupts being lost.

This fix changes the behavior to keep track of the device IRQ status
in internal state that is separate from the irq status register, and
only allowing the irq status bits to be cleared if the associated
input is not asserted.

Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
[np: rebased before P9 PSI SERIRQ patch, adjust changelog/comments]
Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Aditya Gupta
977e789c4a ppc/pnv: Update Power10's cfam id to use Power10 DD2
Power10 DD1.0 was dropped in:

    commit 8f054d9ee8 ("ppc: Drop support for POWER9 and POWER10 DD1 chips")

Use the newer Power10 DD2 chips cfam id.

Signed-off-by: Aditya Gupta <adityag@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Akihiko Odaki
785c8637f9 ppc/vof: Fix unaligned FDT property access
FDT properties are aligned by 4 bytes, not 8 bytes.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Akihiko Odaki
8af863f2bd spapr: Free stdout path
This fixes LeakSanitizer warnings.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Nicholas Piggin
1a7a31aec4 spapr: Migrate ail-mode-3 spapr cap
This cap did not add the migration code when it was introduced. This
results in migration failure when changing the default using the
command line.

Cc: qemu-stable@nongnu.org
Fixes: ccc5a4c5e1 ("spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for H_SET_MODE hcall")
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Richard Henderson
6410f877f5 Misc HW patch queue
- Restrict probe_access*() functions to TCG (Phil)
 - Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément)
 - Fixes in Loongson IPI model (Bibo & Phil)
 - Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas)
 - Correct MPC I2C MMIO region size (Zoltan)
 - Remove useless cast in Loongson3 Virt machine (Yao)
 - Various uses of range overlap API (Yao)
 - Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao)
 - Use DMA memory API in Goldfish UART model (Phil)
 - Expose fifo8_pop_buf and introduce fifo8_drop (Phil)
 - MAINTAINERS updates (Zhao, Phil)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t
 wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3
 XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN
 zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6
 ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9
 eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw
 U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS
 edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s
 ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm
 Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0
 F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT
 +dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8=
 =J/I2
 -----END PGP SIGNATURE-----

Merge tag 'hw-misc-20240723' of https://github.com/philmd/qemu into staging

Misc HW patch queue

- Restrict probe_access*() functions to TCG (Phil)
- Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément)
- Fixes in Loongson IPI model (Bibo & Phil)
- Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas)
- Correct MPC I2C MMIO region size (Zoltan)
- Remove useless cast in Loongson3 Virt machine (Yao)
- Various uses of range overlap API (Yao)
- Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao)
- Use DMA memory API in Goldfish UART model (Phil)
- Expose fifo8_pop_buf and introduce fifo8_drop (Phil)
- MAINTAINERS updates (Zhao, Phil)

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t
# wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3
# XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN
# zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6
# ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9
# eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw
# U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS
# edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s
# ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm
# Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0
# F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT
# +dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8=
# =J/I2
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 24 Jul 2024 06:36:47 AM AEST
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]

* tag 'hw-misc-20240723' of https://github.com/philmd/qemu: (28 commits)
  MAINTAINERS: Add myself as a reviewer of machine core
  MAINTAINERS: Cover guest-agent in QAPI schema
  util/fifo8: Introduce fifo8_drop()
  util/fifo8: Expose fifo8_pop_buf()
  util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr()
  util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr()
  util/fifo8: Use fifo8_reset() in fifo8_create()
  util/fifo8: Fix style
  chardev/char-fe: Document returned value on error
  hw/char/goldfish: Use DMA memory API
  hw/nubus/virtio-mmio: Fix missing ERRP_GUARD() in realize handler
  dump: make range overlap check more readable
  crypto/block-luks: make range overlap check more readable
  system/memory_mapping: make range overlap check more readable
  sparc/ldst_helper: make range overlap check more readable
  cxl/mailbox: make range overlap check more readable
  util/range: Make ranges_overlap() return bool
  hw/mips/loongson3_virt: remove useless type cast
  hw/i2c/mpc_i2c: Fix mmio region size
  docs/interop/firmware.json: convert "Example" section
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 15:39:43 +10:00
Richard Henderson
dd4bc5f1cf vfio queue:
* IOMMUFD Dirty Tracking support
 * Fix for a possible SEGV in IOMMU type1 container
 * Dropped initialization of host IOMMU device with mdev devices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
 qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
 BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
 LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
 gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
 YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
 N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
 xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
 UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
 Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
 =cViP
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into staging

vfio queue:

* IOMMUFD Dirty Tracking support
* Fix for a possible SEGV in IOMMU type1 container
* Dropped initialization of host IOMMU device with mdev devices

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
# 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
# qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
# BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
# LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
# 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
# gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
# YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
# N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
# xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
# UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
# Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
# =cViP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu:
  vfio/common: Allow disabling device dirty page tracking
  vfio/migration: Don't block migration device dirty tracking is unsupported
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
  vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
  vfio/{iommufd,container}: Remove caps::aw_bits
  vfio/iommufd: Introduce auto domain creation
  vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
  backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
  vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
  vfio/pci: Extract mdev check into an helper
  hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 12:58:46 +10:00
Richard Henderson
43f59bf765 * target/i386/kvm: support for reading RAPL MSRs using a helper program
* hpet: emulation improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
 qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
 Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
 jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
 jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
 =00mB
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* target/i386/kvm: support for reading RAPL MSRs using a helper program
* hpet: emulation improvements

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
# qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
# Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
# jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
# jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
# 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
# =00mB
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 23 Jul 2024 03:19:58 AM AEST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  hpet: avoid timer storms on periodic timers
  hpet: store full 64-bit target value of the counter
  hpet: accept 64-bit reads and writes
  hpet: place read-only bits directly in "new_val"
  hpet: remove unnecessary variable "index"
  hpet: ignore high bits of comparator in 32-bit mode
  hpet: fix and cleanup persistence of interrupt status
  Add support for RAPL MSRs in KVM/Qemu
  tools: build qemu-vmsr-helper
  qio: add support for SO_PEERCRED for socket channel
  target/i386: do not crash if microvm guest uses SGX CPUID leaves

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 11:25:40 +10:00
Richard Henderson
5885bcef3d virtio,pci,pc: features,fixes
pci: Initial support for SPDM Responders
 cxl: Add support for scan media, feature commands, device patrol scrub
     control, DDR5 ECS control, firmware updates
 virtio: in-order support
 virtio-net: support for SR-IOV emulation (note: known issues on s390,
                                           might get reverted if not fixed)
 smbios: memory device size is now configurable per Machine
 cpu: architecture agnostic code to support vCPU Hotplug
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmae9l8PHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRp8fYH/impBH9nViO/WK48io4mLSkl0EUL8Y/xrMvH
 zKFCKaXq8D96VTt1Z4EGKYgwG0voBKZaCEKYU/0ARGnSlSwxINQ8ROCnBWMfn2sx
 yQt08EXVMznNLtXjc6U5zCoCi6SaV85GH40No3MUFXBQt29ZSlFqO/fuHGZHYBwS
 wuVKvTjjNF4EsGt3rS4Qsv6BwZWMM+dE6yXpKWk68kR8IGp+6QGxkMbWt9uEX2Md
 VuemKVnFYw0XGCGy5K+ZkvoA2DGpEw0QxVSOMs8CI55Oc9SkTKz5fUSzXXGo1if+
 M1CTjOPJu6pMym6gy6XpFa8/QioDA/jE2vBQvfJ64TwhJDV159s=
 =k8e9
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pci,pc: features,fixes

pci: Initial support for SPDM Responders
cxl: Add support for scan media, feature commands, device patrol scrub
    control, DDR5 ECS control, firmware updates
virtio: in-order support
virtio-net: support for SR-IOV emulation (note: known issues on s390,
                                          might get reverted if not fixed)
smbios: memory device size is now configurable per Machine
cpu: architecture agnostic code to support vCPU Hotplug

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmae9l8PHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp8fYH/impBH9nViO/WK48io4mLSkl0EUL8Y/xrMvH
# zKFCKaXq8D96VTt1Z4EGKYgwG0voBKZaCEKYU/0ARGnSlSwxINQ8ROCnBWMfn2sx
# yQt08EXVMznNLtXjc6U5zCoCi6SaV85GH40No3MUFXBQt29ZSlFqO/fuHGZHYBwS
# wuVKvTjjNF4EsGt3rS4Qsv6BwZWMM+dE6yXpKWk68kR8IGp+6QGxkMbWt9uEX2Md
# VuemKVnFYw0XGCGy5K+ZkvoA2DGpEw0QxVSOMs8CI55Oc9SkTKz5fUSzXXGo1if+
# M1CTjOPJu6pMym6gy6XpFa8/QioDA/jE2vBQvfJ64TwhJDV159s=
# =k8e9
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 23 Jul 2024 10:16:31 AM AEST
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (61 commits)
  hw/nvme: Add SPDM over DOE support
  backends: Initial support for SPDM socket support
  hw/pci: Add all Data Object Types defined in PCIe r6.0
  tests/acpi: Add expected ACPI AML files for RISC-V
  tests/qtest/bios-tables-test.c: Enable basic testing for RISC-V
  tests/acpi: Add empty ACPI data files for RISC-V
  tests/qtest/bios-tables-test.c: Remove the fall back path
  tests/acpi: update expected DSDT blob for aarch64 and microvm
  acpi/gpex: Create PCI link devices outside PCI root bridge
  tests/acpi: Allow DSDT acpi table changes for aarch64
  hw/riscv/virt-acpi-build.c: Update the HID of RISC-V UART
  hw/riscv/virt-acpi-build.c: Add namespace devices for PLIC and APLIC
  virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domain
  hw/vfio/common: Add vfio_listener_region_del_iommu trace event
  virtio-iommu: Remove the end point on detach
  virtio-iommu: Free [host_]resv_ranges on unset_iommu_devices
  virtio-iommu: Remove probe_done
  Revert "virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged"
  gdbstub: Add helper function to unregister GDB register space
  physmem: Add helper function to destroy CPU AddressSpace
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 09:32:04 +10:00
Philippe Mathieu-Daudé
e4e9db2562 util/fifo8: Introduce fifo8_drop()
Add the fifo8_drop() helper for clarity.
It is a simple wrapper over fifo8_pop_buf().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20240722160745.67904-8-philmd@linaro.org>
2024-07-23 22:34:54 +02:00
Philippe Mathieu-Daudé
23ad571173 util/fifo8: Expose fifo8_pop_buf()
Extract fifo8_pop_buf() from hw/scsi/esp.c and expose
it as part of the <qemu/fifo8.h> API. This function takes
care of non-contiguous (wrapped) FIFO buffer (which is an
implementation detail).

Suggested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20240722160745.67904-7-philmd@linaro.org>
2024-07-23 22:34:54 +02:00
Philippe Mathieu-Daudé
06252bf512 util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr()
Since fifo8_pop_buf() return a const buffer (which points
directly into the FIFO backing store). Rename it using the
'bufptr' suffix to better reflect that it is a pointer to
the internal buffer that is being returned. This will help
differentiate with methods *copying* the FIFO data.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20240722160745.67904-6-philmd@linaro.org>
2024-07-23 22:34:54 +02:00
Philippe Mathieu-Daudé
06a16e7ba9 util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr()
Since fifo8_peek_buf() return a const buffer (which points
directly into the FIFO backing store). Rename it using the
'bufptr' suffix to better reflect that it is a pointer to
the internal buffer that is being returned. This will help
differentiate with methods *copying* the FIFO data.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20240722160745.67904-5-philmd@linaro.org>
2024-07-23 22:34:54 +02:00
Philippe Mathieu-Daudé
c9e0b9a59c hw/char/goldfish: Use DMA memory API
Rather than using address_space_rw(..., 0 or 1),
use the simpler DMA memory API which expand to
the same code. This allows removing a cast on
the 'buf' variable which is really const. Since
'buf' is only used in the CMD_READ_BUFFER case,
we can reduce its scope.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20240723181850.46000-1-philmd@linaro.org>
2024-07-23 22:34:33 +02:00
Zhao Liu
2f28f28e74 hw/nubus/virtio-mmio: Fix missing ERRP_GUARD() in realize handler
According to the comment in qapi/error.h, dereferencing @errp requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
* - It must not be dereferenced, because it may be null.
...
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.
*
* Using it when it's not needed is safe, but please avoid cluttering
* the source with useless code.

In nubus_virtio_mmio_realize(), @errp is dereferenced without
ERRP_GUARD().

Although nubus_virtio_mmio_realize() - as a DeviceClass.realize()
method - is never passed a null @errp argument, it should follow the
rules on @errp usage.  Add the ERRP_GUARD() there.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240723161802.1377985-1-zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23 22:34:09 +02:00
Yao Xingtao
7b3e371526 cxl/mailbox: make range overlap check more readable
use ranges_overlap() instead of open-coding the overlap check to improve
the readability of the code.

Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240722040742.11513-5-yaoxt.fnst@fujitsu.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23 20:30:36 +02:00
Yao Xingtao
c8f1a322d1 hw/mips/loongson3_virt: remove useless type cast
The type of kernel_entry, kernel_low and kernel_high is uint64_t, cast
the pointer of this type to uint64_t* is useless.

Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240722091728.4334-2-yaoxt.fnst@fujitsu.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23 20:30:36 +02:00
BALATON Zoltan
53858a6a30 hw/i2c/mpc_i2c: Fix mmio region size
The last register of this device is at offset 0x14 occupying 8 bits so
to cover it the mmio region needs to be 0x15 bytes long. Also correct
the name of the field storing this register value to match the
register name.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Fixes: 7abb479c7a ("PPC: E500: Add FSL I2C controller")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240721225506.B32704E6039@zero.eik.bme.hu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23 20:30:36 +02:00
Philippe Mathieu-Daudé
9ea0f206b7 docs: Correct Loongarch -> LoongArch
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240718133312.10324-20-philmd@linaro.org>
2024-07-23 20:30:36 +02:00
Philippe Mathieu-Daudé
13e8ec6cf3 hw/intc/loongson_ipi: Declare QOM types using DEFINE_TYPES() macro
When multiple QOM types are registered in the same file,
it is simpler to use the the DEFINE_TYPES() macro. Replace
the type_init() / type_register_static() combination.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240718133312.10324-2-philmd@linaro.org>
2024-07-23 20:30:36 +02:00
Philippe Mathieu-Daudé
0c2086bc73 hw/intc/loongson_ipi: Fix resource leak
Once initialised, QOM objects can be realized and
unrealized multiple times before being finalized.
Resources allocated in REALIZE must be deallocated
in an equivalent UNREALIZE handler.

Free the CPU array in loongson_ipi_unrealize()
instead of loongson_ipi_finalize().

Cc: qemu-stable@nongnu.org
Fixes: 5e90b8db38 ("hw/loongarch: Set iocsr address space per-board rather than percpu")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20240723111405.14208-3-philmd@linaro.org>
2024-07-23 20:30:36 +02:00
Bibo Mao
2465c89fb9 hw/intc/loongson_ipi: Access memory in little endian
Loongson IPI is only available in little-endian,
so use that to access the guest memory (in case
we run on a big-endian host).

Cc: qemu-stable@nongnu.org
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Fixes: f6783e3438 ("hw/loongarch: Add LoongArch ipi interrupt support")
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240718133312.10324-3-philmd@linaro.org>
2024-07-23 20:30:30 +02:00
Clément Mathieu--Drif
35422553bc hw/i386/intel_iommu: Extract device IOTLB invalidation logic
This piece of code can be shared by both IOTLB invalidation and
PASID-based IOTLB invalidation

No functional changes intended.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-ID: <20240718081636.879544-12-zhenzhong.duan@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23 18:08:44 +02:00
Joao Martins
30b9167785 vfio/common: Allow disabling device dirty page tracking
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.

Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:53 +02:00
Joao Martins
f48b472450 vfio/migration: Don't block migration device dirty tracking is unsupported
By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.

For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.

So starting with IOMMU dirty tracking it can use to accommodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those too.

While at it change the error messages to mention IOMMU dirty tracking as
well.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[ clg: - spelling in commit log ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-07-23 17:14:53 +02:00
Joao Martins
7c30710bd9 vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
52ce88229c vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. The ioctl is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
dddfd8d667 vfio/iommufd: Probe and request hwpt dirty tracking capability
In preparation to using the dirty tracking UAPI, probe whether the IOMMU
supports dirty tracking. This is done via the data stored in
hiod::caps::hw_caps initialized from GET_HW_INFO.

Qemu doesn't know if VF dirty tracking is supported when allocating
hardware pagetable in iommufd_cdev_autodomains_get(). This is because
VFIODevice migration state hasn't been initialized *yet* hence it can't pick
between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports
dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING
even if later on VFIOMigration decides to use VF dirty tracking instead.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in
         iommufd_cdev_autodomains_get()
       - Added warning for heterogeneous dirty page tracking support
	 in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
83a4d596a9 vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
Move the HostIOMMUDevice::realize() to be invoked during the attach of the device
before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use
of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU
behind the device supports dirty tracking.

Note: The HostIOMMUDevice data from legacy backend is static and doesn't
need any information from the (type1-iommu) backend to be initialized.
In contrast however, the IOMMUFD HostIOMMUDevice data requires the
iommufd FD to be connected and having a devid to be able to successfully
GET_HW_INFO. This means vfio_device_hiod_realize() is called in
different places within the backend .attach_device() implementation.

Suggested-by: Cédric Le Goater <clg@redhat.cm>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
[ clg: Fixed error handling in iommufd_cdev_attach() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
21e8d3a3aa vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
Store the value of @caps returned by iommufd_backend_get_device_info()
in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is
whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING).

This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
6c63532642 vfio/{iommufd,container}: Remove caps::aw_bits
Remove caps::aw_bits which requires the bcontainer::iova_ranges being
initialized after device is actually attached. Instead defer that to
.get_cap() and call vfio_device_get_aw_bits() directly.

This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().

Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
5b1e96e654 vfio/iommufd: Introduce auto domain creation
There's generally two modes of operation for IOMMUFD:

1) The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
to VFIO and mainly performs IOAS_MAP and UNMAP.

2) The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.

For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.

Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.

To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimicking kernel
iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
it falls back to IOAS attach.

The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here it is not used in this way given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' of type1 approach that we have
been using so far between container vs device dirty tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
[ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Zhenzhong Duan
8b8705e7f2 vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev
mdevs aren't "physical" devices and when asking for backing IOMMU info,
it fails the entire provisioning of the guest. Fix that by setting
vbasedev->mdev true so skipping HostIOMMUDevice initialization in the
presence of mdevs.

Fixes: 9305895201 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Zhenzhong Duan
c598d65aef vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev
mdevs aren't "physical" devices and when asking for backing IOMMU info,
it fails the entire provisioning of the guest. Fix that by setting
vbasedev->mdev true so skipping HostIOMMUDevice initialization in the
presence of mdevs.

Fixes: 9305895201 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
b07dcb7d4f vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
In preparation to implement auto domains have the attach function
return the errno it got during domain attach instead of a bool.

-EINVAL is tracked to track domain incompatibilities, and decide whether
to create a new IOMMU domain.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
2d1bf25897 backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
The helper will be able to fetch vendor agnostic IOMMU capabilities
supported both by hardware and software. Right now it is only iommu dirty
tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
9f17604195 vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
mdevs aren't "physical" devices and when asking for backing IOMMU info, it
fails the entire provisioning of the guest. Fix that by skipping
HostIOMMUDevice initialization in the presence of mdevs, and skip setting
an iommu device when it is known to be an mdev.

Cc: Zhenzhong Duan <zhenzhong.duan@intel.com>
Fixes: 9305895201 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
13e522f644 vfio/pci: Extract mdev check into an helper
In preparation to skip initialization of the HostIOMMUDevice for mdev,
extract the checks that validate if a device is an mdev into helpers.

A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev
to check if it's mdev or not.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00