Commit Graph

1385 Commits

Author SHA1 Message Date
Peter Xu
5504a81261 KVM: Dynamic sized kvm memslots array
Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
was a separate discussion, however during that I found a regression of
dirty sync slowness when profiling.

Each KVMMemoryListerner maintains an array of kvm memslots.  Currently it's
statically allocated to be the max supported by the kernel.  However after
Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
the max supported memslots reported now grows to some number large enough
so that it may not be wise to always statically allocate with the max
reported.

What's worse, QEMU kvm code still walks all the allocated memslots entries
to do any form of lookups.  It can drastically slow down all memslot
operations because each of such loop can run over 32K times on the new
kernels.

Fix this issue by making the memslots to be allocated dynamically.

Here the initial size was set to 16 because it should cover the basic VM
usages, so that the hope is the majority VM use case may not even need to
grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
it'll consume 9 memslots), however not too large to waste memory.

There can also be even better way to address this, but so far this is the
simplest and should be already better even than before we grow the max
supported memslots.  For example, in the case of above issue when VFIO was
attached on a 32GB system, there are only ~10 memslots used.  So it could
be good enough as of now.

In the above VFIO context, measurement shows that the precopy dirty sync
shrinked from ~86ms to ~3ms after this patch applied.  It should also apply
to any KVM enabled VM even without VFIO.

NOTE: we don't have a FIXES tag for this patch because there's no real
commit that regressed this in QEMU. Such behavior existed for a long time,
but only start to be a problem when the kernel reports very large
nr_slots_max value.  However that's pretty common now (the kernel change
was merged in 2021) so we attached cc:stable because we'll want this change
to be backported to stable branches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-17 19:41:30 +02:00
Akihiko Odaki
8bd6072de3 dma: Fix function names in documentation
Ensure the function names match.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20241012-dma-v2-1-6afddf5f3c8d@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-10-15 15:16:17 +01:00
Peter Maydell
f7214f99ff vl.c: Remove pxa2xx-specific -portrait and -rotate options
The ``-portrait`` and ``-rotate`` options were documented as only
working with the PXA LCD device, and all the machine types using
that display device were removed in 9.2.

These options were intended to simulate a mobile device being
rotated by the user, and had three effects:
 * the display output was rotated by 90, 180 or 270 degrees
   (implemented in the PXA display device models)
 * the mouse/trackpad input was rotated the opposite way
   (implemented in generic code)
 * the machine model would signal to the guest about its
   orientation
   (implemented by e.g. the spitz machine model)

Of these three things, the input-rotation was coded without being
restricted to boards which supported the full set of device-rotation
handling, so in theory the options were usable on other machine
models with odd effects (rotating input but not display output).  But
this was never intended or documented behaviour, so we can reasonably
drop these command line arguments without a formal deprecate-and-drop
cycle for them.

Remove the options, and their implementation and documentation.
Describe the removal in removed-features.rst.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241003140010.1653808-7-peter.maydell@linaro.org
2024-10-15 15:16:17 +01:00
Peter Maydell
b5ab62b3c0 * pc: Add a description for the i8042 property
* kvm: support for nested FRED
 * tests/unit: fix warning when compiling test-nested-aio-poll with LTO
 * kvm: refactoring of VM creation
 * target/i386: expose IBPB-BRTYPE and SBPB CPUID bits to the guest
 * hw/char: clean up serial
 * remove virtfs-proxy-helper
 * target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
 * qom: improvements to object_resolve_path*()
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb++MsUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVnwf/cdvfxvDm22tEdlh8vHlV17HtVdcC
 Hw334M/3PDvbTmGzPBg26lzo4nFS6SLrZ8ETCeqvuJrtKzqVk9bI8ssZW5KA4ijM
 nkxguRPHO8E6U33ZSucc+Hn56+bAx4I2X80dLKXJ87OsbMffIeJ6aHGSEI1+fKVh
 pK7q53+Y3lQWuRBGhDIyKNuzqU4g+irpQwXOhux63bV3ADadmsqzExP6Gmtl8OKM
 DylPu1oK7EPZumlSiJa7Gy1xBqL4Rc4wGPNYx2RVRjp+i7W2/Y1uehm3wSBw+SXC
 a6b7SvLoYfWYS14/qCF4cBL3sJH/0f/4g8ZAhDDxi2i5kBr0/5oioDyE/A==
 =/zo4
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* pc: Add a description for the i8042 property
* kvm: support for nested FRED
* tests/unit: fix warning when compiling test-nested-aio-poll with LTO
* kvm: refactoring of VM creation
* target/i386: expose IBPB-BRTYPE and SBPB CPUID bits to the guest
* hw/char: clean up serial
* remove virtfs-proxy-helper
* target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
* qom: improvements to object_resolve_path*()

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb++MsUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVnwf/cdvfxvDm22tEdlh8vHlV17HtVdcC
# Hw334M/3PDvbTmGzPBg26lzo4nFS6SLrZ8ETCeqvuJrtKzqVk9bI8ssZW5KA4ijM
# nkxguRPHO8E6U33ZSucc+Hn56+bAx4I2X80dLKXJ87OsbMffIeJ6aHGSEI1+fKVh
# pK7q53+Y3lQWuRBGhDIyKNuzqU4g+irpQwXOhux63bV3ADadmsqzExP6Gmtl8OKM
# DylPu1oK7EPZumlSiJa7Gy1xBqL4Rc4wGPNYx2RVRjp+i7W2/Y1uehm3wSBw+SXC
# a6b7SvLoYfWYS14/qCF4cBL3sJH/0f/4g8ZAhDDxi2i5kBr0/5oioDyE/A==
# =/zo4
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 03 Oct 2024 21:04:27 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (23 commits)
  qom: update object_resolve_path*() documentation
  qom: set *ambiguous on all paths
  qom: rename object_resolve_path_type() "ambiguousp"
  target/i386/kvm: Report which action failed in kvm_arch_put/get_registers
  kvm: Allow kvm_arch_get/put_registers to accept Error**
  accel/kvm: refactor dirty ring setup
  minikconf: print error entirely on stderr
  9p: remove 'proxy' filesystem backend driver
  hw/char: Extract serial-mm
  hw/char/serial.h: Extract serial-isa.h
  hw: Remove unused inclusion of hw/char/serial.h
  target/i386: Expose IBPB-BRTYPE and SBPB CPUID bits to the guest
  kvm: refactor core virtual machine creation into its own function
  kvm/i386: replace identity_base variable with a constant
  kvm/i386: refactor kvm_arch_init and split it into smaller functions
  kvm: replace fprintf with error_report()/printf() in kvm_init()
  kvm/i386: fix return values of is_host_cpu_intel()
  kvm/i386: make kvm_filter_msr() and related definitions private to kvm module
  hw/i386/pc: Add a description for the i8042 property
  tests/unit: remove block layer code from test-nested-aio-poll
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/arm/Kconfig
#	hw/arm/pxa2xx.c
2024-10-04 19:28:37 +01:00
Julia Suvorova
a1676bb304 kvm: Allow kvm_arch_get/put_registers to accept Error**
This is necessary to provide discernible error messages to the caller.

Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240927104743.218468-2-jusual@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03 22:04:19 +02:00
Dr. David Alan Gilbert
07bea2d35f block-backend: Remove deadcode
blk_by_public last use was removed in 2017 by
  c61791fc23 ("block: add aio_context field in ThrottleGroupMember")

blk_activate last use was removed earlier this year by
  eef0bae3a7 ("migration: Remove block migration")

blk_add_insert_bs_notifier, blk_op_block_all, blk_op_unblock_all
last uses were removed in 2016 by
  ef8875b549 ("virtio-scsi: Remove op blocker for dataplane")

blk_iostatus_disable last use was removed in 2016 by
  66a0fae438 ("blockjob: Don't touch BDS iostatus")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-10-03 17:26:06 +03:00
Dr. David Alan Gilbert
40ebdc4b60 replay: Remove unused replay_disable_events
replay_disable_events has been unused since 2019's
  c8aa7895eb ("replay: don't drain/flush bdrv queue while RR is working")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-10-03 17:26:06 +03:00
Juraj Marcin
1b063fe2df reset: Use ResetType for qemu_devices_reset() and MachineClass::reset()
Currently, both qemu_devices_reset() and MachineClass::reset() use
ShutdownCause for the reason of the reset. However, the Resettable
interface uses ResetState, so ShutdownCause needs to be translated to
ResetType somewhere. Translating it qemu_devices_reset() makes adding
new reset types harder, as they cannot always be matched to a single
ShutdownCause here, and devices may need to check the ResetType to
determine what to reset and if to reset at all.

This patch moves this translation up in the call stack to
qemu_system_reset() and updates all MachineClass children to use the
ResetType instead.

Message-ID: <20240904103722.946194-2-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-09-24 11:33:34 +02:00
Philippe Mathieu-Daudé
01d01edc9f system: Remove support for CRIS target
We are about to remove the CRIS target, so remove
the sysemu part. This remove the CRIS 'none' machine.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Message-ID: <20240904143603.52934-13-philmd@linaro.org>
2024-09-13 20:11:13 +02:00
Danny Canter
2c760670af hvf: Split up hv_vm_create logic per arch
This is preliminary work to split up hv_vm_create
logic per platform so we can support creating VMs
with > 64GB of RAM on Apple Silicon machines. This
is done via ARM HVF's hv_vm_config_create() (and
other APIs that modify this config that will be
coming in future patches). This should have no
behavioral difference at all as hv_vm_config_create()
just assigns the same default values as if you just
passed NULL to the function.

Signed-off-by: Danny Canter <danny_canter@apple.com>
Message-id: 20240828111552.93482-3-danny_canter@apple.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-13 15:31:46 +01:00
Johannes Stoelp
6a8703aecb kvm: Use 'unsigned long' for request argument in functions wrapping ioctl()
Change the data type of the ioctl _request_ argument from 'int' to
'unsigned long' for the various accel/kvm functions which are
essentially wrappers around the ioctl() syscall.

The correct type for ioctl()'s 'request' argument is confused:
 * POSIX defines the request argument as 'int'
 * glibc uses 'unsigned long' in the prototype in sys/ioctl.h
 * the glibc info documentation uses 'int'
 * the Linux manpage uses 'unsigned long'
 * the Linux implementation of the syscall uses 'unsigned int'

If we wrap ioctl() with another function which uses 'int' as the
type for the request argument, then requests with the 0x8000_0000
bit set will be sign-extended when the 'int' is cast to
'unsigned long' for the call to ioctl().

On x86_64 one such example is the KVM_IRQ_LINE_STATUS request.
Bit requests with the _IOC_READ direction bit set, will have the high
bit set.

Fortunately the Linux Kernel truncates the upper 32bit of the request
on 64bit machines (because it uses 'unsigned int', and see also Linus
Torvalds' comments in
  https://sourceware.org/bugzilla/show_bug.cgi?id=14362 )
so this doesn't cause active problems for us.  However it is more
consistent to follow the glibc ioctl() prototype when we define
functions that are essentially wrappers around ioctl().

This resolves a Coverity issue where it points out that in
kvm_get_xsave() we assign a value (KVM_GET_XSAVE or KVM_GET_XSAVE2)
to an 'int' variable which can't hold it without overflow.

Resolves: Coverity CID 1547759
Signed-off-by: Johannes Stoelp <johannes.stoelp@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240815122747.3053871-1-peter.maydell@linaro.org
[PMM: Rebased patch, adjusted commit message, included note about
 Coverity fix, updated the type of the local var in kvm_get_xsave,
 updated the comment in the KVMState struct definition]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-09-13 15:31:46 +01:00
Markus Armbruster
b1019999e8 qapi/cryptodev: Rename QCryptodevBackendAlgType to *Algo, and drop prefix
QAPI's 'prefix' feature can make the connection between enumeration
type and its constants less than obvious.  It's best used with
restraint.

QCryptodevBackendAlgType has a 'prefix' that overrides the generated
enumeration constants' prefix to QCRYPTODEV_BACKEND_ALG.

We could simply drop 'prefix', but I think the abbreviation "alg" is
less than clear.

Additionally rename the type to QCryptodevBackendAlgoType.  The prefix
becomes QCRYPTODEV_BACKEND_ALGO_TYPE.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240904111836.3273842-19-armbru@redhat.com>
2024-09-10 14:03:30 +02:00
Nicholas Piggin
94962ff00d Revert "replay: stop us hanging in rr_wait_io_event"
This reverts commit 1f881ea4a4.

That commit causes reverse_debugging.py test failures, and does
not seem to solve the root cause of the problem x86-64 still
hangs in record/replay tests.

The problem with short-cutting the iowait that was taken during
record phase is that related events will not get consumed at the
same points (e.g., reading the clock).

A hang with zero icount always seems to be a symptom of an earlier
problem that has caused the recording to become out of synch with
the execution and consumption of events by replay.

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-6-npiggin@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-14-alex.bennee@linaro.org>
2024-08-16 14:04:19 +01:00
Nicholas Piggin
9dbab31d9e replay: allow runstate shutdown->running when replaying trace
When replaying a trace, it is possible to go from shutdown to running
with a reverse-debugging step. This can be useful if the problem being
debugged triggers a reset or shutdown.

This can be tested by making a recording of a machine that shuts down,
then using -action shutdown=pause when replaying it. Continuing to the
end of the trace then reverse-stepping in gdb crashes due to invalid
runstate transition.

Just permitting the transition seems to be all that's necessary for
reverse-debugging to work well in such a state.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-5-npiggin@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-13-alex.bennee@linaro.org>
2024-08-16 14:04:19 +01:00
Harsh Prateek Bora
c6a3d7bc9e accel/kvm: Introduce kvm_create_and_park_vcpu() helper
There are distinct helpers for creating and parking a KVM vCPU.
However, there can be cases where a platform needs to create and
immediately park the vCPU during early stages of vcpu init which
can later be reused when vcpu thread gets initialized. This would
help detect failures with kvm_create_vcpu at an early stage.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:21:06 +10:00
Richard Henderson
dd4bc5f1cf vfio queue:
* IOMMUFD Dirty Tracking support
 * Fix for a possible SEGV in IOMMU type1 container
 * Dropped initialization of host IOMMU device with mdev devices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
 qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
 BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
 LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
 gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
 YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
 N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
 xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
 UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
 Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
 =cViP
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into staging

vfio queue:

* IOMMUFD Dirty Tracking support
* Fix for a possible SEGV in IOMMU type1 container
* Dropped initialization of host IOMMU device with mdev devices

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7
# 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa
# qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh
# BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n
# LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE
# 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH
# gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF
# YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb
# N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE
# xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T
# UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN
# Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo=
# =cViP
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu:
  vfio/common: Allow disabling device dirty page tracking
  vfio/migration: Don't block migration device dirty tracking is unsupported
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device()
  vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
  vfio/{iommufd,container}: Remove caps::aw_bits
  vfio/iommufd: Introduce auto domain creation
  vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev
  vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
  backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
  vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev
  vfio/pci: Extract mdev check into an helper
  hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 12:58:46 +10:00
Richard Henderson
43f59bf765 * target/i386/kvm: support for reading RAPL MSRs using a helper program
* hpet: emulation improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
 qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
 Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
 jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
 jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
 =00mB
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* target/i386/kvm: support for reading RAPL MSRs using a helper program
* hpet: emulation improvements

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
# qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
# Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
# jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
# jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
# 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
# =00mB
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 23 Jul 2024 03:19:58 AM AEST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  hpet: avoid timer storms on periodic timers
  hpet: store full 64-bit target value of the counter
  hpet: accept 64-bit reads and writes
  hpet: place read-only bits directly in "new_val"
  hpet: remove unnecessary variable "index"
  hpet: ignore high bits of comparator in 32-bit mode
  hpet: fix and cleanup persistence of interrupt status
  Add support for RAPL MSRs in KVM/Qemu
  tools: build qemu-vmsr-helper
  qio: add support for SO_PEERCRED for socket channel
  target/i386: do not crash if microvm guest uses SGX CPUID leaves

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24 11:25:40 +10:00
Joao Martins
7c30710bd9 vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23 17:14:52 +02:00
Joao Martins
52ce88229c vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking. The ioctl is used if the hwpt
has been created with dirty tracking supported domain (stored in
hwpt::flags) and it is called on the whole list of iommu domains.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
21e8d3a3aa vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps
Store the value of @caps returned by iommufd_backend_get_device_info()
in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is
whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING).

This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
6c63532642 vfio/{iommufd,container}: Remove caps::aw_bits
Remove caps::aw_bits which requires the bcontainer::iova_ranges being
initialized after device is actually attached. Instead defer that to
.get_cap() and call vfio_device_get_aw_bits() directly.

This is in preparation for HostIOMMUDevice::realize() being called early
during attach_device().

Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
5b1e96e654 vfio/iommufd: Introduce auto domain creation
There's generally two modes of operation for IOMMUFD:

1) The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches
to VFIO and mainly performs IOAS_MAP and UNMAP.

2) The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.

For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.

Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.

To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimicking kernel
iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
it falls back to IOAS attach.

The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here it is not used in this way given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' of type1 approach that we have
been using so far between container vs device dirty tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
[ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ]
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Joao Martins
2d1bf25897 backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
The helper will be able to fetch vendor agnostic IOMMU capabilities
supported both by hardware and software. Right now it is only iommu dirty
tracking.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23 17:14:52 +02:00
Huai-Cheng Kuo
bc419a1cc5 backends: Initial support for SPDM socket support
SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Signed-off-by: Chris Browy <cbrowy@avery-design.com>
Co-developed-by: Jonathan Cameron <Jonathan.cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20240703092027.644758-3-alistair.francis@wdc.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:42 -04:00
Salil Mehta
08c3286822 accel/kvm: Extract common KVM vCPU {creation,parking} code
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.

Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.

Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs
with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally
called. No functional change is intended here.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22 20:15:41 -04:00
Anthony Harivel
0418f90809 Add support for RAPL MSRs in KVM/Qemu
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
interface (Running Average Power Limit) for advertising the accumulated
energy consumption of various power domains (e.g. CPU packages, DRAM,
etc.).

The consumption is reported via MSRs (model specific registers) like
MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are
64 bits registers that represent the accumulated energy consumption in
micro Joules. They are updated by microcode every ~1ms.

For now, KVM always returns 0 when the guest requests the value of
these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle
these MSRs dynamically in userspace.

To limit the amount of system calls for every MSR call, create a new
thread in QEMU that updates the "virtual" MSR values asynchronously.

Each vCPU has its own vMSR to reflect the independence of vCPUs. The
thread updates the vMSR values with the ratio of energy consumed of
the whole physical CPU package the vCPU thread runs on and the
thread's utime and stime values.

All other non-vCPU threads are also taken into account. Their energy
consumption is evenly distributed among all vCPUs threads running on
the same physical CPU package.

To overcome the problem that reading the RAPL MSR requires priviliged
access, a socket communication between QEMU and the qemu-vmsr-helper is
mandatory. You can specified the socket path in the parameter.

This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock

Actual limitation:
- Works only on Intel host CPU because AMD CPUs are using different MSR
  adresses.

- Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at
  the moment.

Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 19:19:37 +02:00
Eric Auger
8fe0ebe15d HostIOMMUDevice: Introduce get_page_size_mask() callback
This callback will be used to retrieve the page size mask supported
along a given Host IOMMU device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 11:50:37 +02:00
Eric Auger
d59ca1ca17 HostIOMMUDevice : remove Error handle from get_iova_ranges callback
The error handle argument is not used anywhere. let's remove it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 11:50:37 +02:00
Thomas Weißschuh
6269086b01 hw/misc/pvpanic: add support for normal shutdowns
Shutdown requests are normally hardware dependent.
By extending pvpanic to also handle shutdown requests, guests can
submit such requests with an easily implementable and cross-platform
mechanism.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-5-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01 17:16:04 -04:00
Richard Henderson
3f044554b9 vfio queue:
* Add a host IOMMU device abstraction
 * VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling
 * QOMify VFIOContainer
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmZ541QACgkQUaNDx8/7
 7KFdnQ/8Dih3HI2qtY93bTxg0lmJ+ZMibojTkEkTu3kSvwoI12wkiSMFKzzTWpZE
 UtGyIqQQij8IfQtIz87uQskv7oFiZKG6JWMTAX4uJ8ZIgZiih29/e/38VGEbogBh
 yO+1Pqr3ETlyLnQcu9ruBTJ293LXovmD4d9feoaVdURBNZ1EqIh7sv/y7YdUsR+i
 tXa6kW1ZIlKBI54o/uuODHWQYyOHs39VtZ6JZvgxVVEQsNikcJsosK9ts9A1EByi
 0roQVXm2QAK/nPXlmMGLvJWzQcdeXQ6W6hzYkO2HqGnCLURnpW+y/ZVbNcxGOOiU
 2G6L0TASlqA3yqCJeLuZZqjM6S2VbnvrA8omyg4QnygIHppYjp2CdcCmUpg6wfze
 rkgbVLNasX+le4ss2emuHPh55dLDP20yW83DeGeqSgE//foaJWhtOK/cnvs04zV2
 D6oSAVsOsZ6ozYlQckYnaxIBANDKLRnzCXVZLUCmHxCUhxHuiNJUsHfZYIv/Zxen
 C5ZjD/JPgx3onkoKbNfTRTgwOCdXhVPjWnnp7Su49jymsekqdk1ntln4ixDT3Vol
 ghQPQLjICBc8qXiOJAcFDwqLf/telPlzUUzvlDeC4BYMnpBAP6rQ3JJ8i0vCCiWv
 zKCtmbcDqDRMDpWyJWM3XA/kVKP9i2tNa1R/ej2SleCFLgRapBw=
 =3koe
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20240624' of https://github.com/legoater/qemu into staging

vfio queue:

* Add a host IOMMU device abstraction
* VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling
* QOMify VFIOContainer

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmZ541QACgkQUaNDx8/7
# 7KFdnQ/8Dih3HI2qtY93bTxg0lmJ+ZMibojTkEkTu3kSvwoI12wkiSMFKzzTWpZE
# UtGyIqQQij8IfQtIz87uQskv7oFiZKG6JWMTAX4uJ8ZIgZiih29/e/38VGEbogBh
# yO+1Pqr3ETlyLnQcu9ruBTJ293LXovmD4d9feoaVdURBNZ1EqIh7sv/y7YdUsR+i
# tXa6kW1ZIlKBI54o/uuODHWQYyOHs39VtZ6JZvgxVVEQsNikcJsosK9ts9A1EByi
# 0roQVXm2QAK/nPXlmMGLvJWzQcdeXQ6W6hzYkO2HqGnCLURnpW+y/ZVbNcxGOOiU
# 2G6L0TASlqA3yqCJeLuZZqjM6S2VbnvrA8omyg4QnygIHppYjp2CdcCmUpg6wfze
# rkgbVLNasX+le4ss2emuHPh55dLDP20yW83DeGeqSgE//foaJWhtOK/cnvs04zV2
# D6oSAVsOsZ6ozYlQckYnaxIBANDKLRnzCXVZLUCmHxCUhxHuiNJUsHfZYIv/Zxen
# C5ZjD/JPgx3onkoKbNfTRTgwOCdXhVPjWnnp7Su49jymsekqdk1ntln4ixDT3Vol
# ghQPQLjICBc8qXiOJAcFDwqLf/telPlzUUzvlDeC4BYMnpBAP6rQ3JJ8i0vCCiWv
# zKCtmbcDqDRMDpWyJWM3XA/kVKP9i2tNa1R/ej2SleCFLgRapBw=
# =3koe
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 24 Jun 2024 02:21:24 PM PDT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20240624' of https://github.com/legoater/qemu: (42 commits)
  vfio/container: Move vfio_container_destroy() to an instance_finalize() handler
  vfio/container: Introduce vfio_iommu_legacy_instance_init()
  vfio/container: Remove vfio_container_init()
  vfio/container: Remove VFIOContainerBase::ops
  vfio/container: Introduce an instance_init() handler
  vfio/container: Switch to QOM
  vfio/container: Change VFIOContainerBase to use QOM
  vfio/container: Discover IOMMU type before creating the container
  vfio/container: Introduce vfio_create_container()
  vfio/container: Introduce vfio_get_iommu_class_name()
  vfio/container: Modify vfio_get_iommu_type() to use a container fd
  vfio/container: Simplify vfio_container_init()
  vfio/container: Introduce vfio_address_space_insert()
  vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
  vfio/common: Move dirty tracking ranges update to helper
  vfio: Remove unused declarations from vfio-common.h
  vfio: Make vfio_devices_dma_logging_start() return bool
  memory: Remove IOMMU MR iommu_set_iova_range API
  hw/vfio: Remove memory_region_iommu_set_iova_ranges() call
  virtio-iommu: Remove the implementation of iommu_set_iova_range
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-06-24 21:30:34 -07:00
Eric Auger
a95264191f HostIOMMUDevice: Store the aliased bus and devfn
Store the aliased bus and devfn in the HostIOMMUDevice.
This will be useful to handle info that are iommu group
specific and not device specific (such as reserved
iova ranges).

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Eric Auger
3ad35d9158 HostIOMMUDevice: Introduce get_iova_ranges callback
Introduce a new HostIOMMUDevice callback that allows to
retrieve the usable IOVA ranges.

Implement this callback in the legacy VFIO and IOMMUFD VFIO
host iommu devices. This relies on the VFIODevice agent's
base container iova_ranges resource.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Eric Auger
dc169694ca HostIOMMUDevice: Store the VFIO/VDPA agent
Store the agent device (VFIO or VDPA) in the host IOMMU device.
This will allow easy access to some of its resources.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Zhenzhong Duan
42965386ea backends/iommufd: Introduce helper function iommufd_backend_get_device_info()
Introduce a helper function iommufd_backend_get_device_info() to get
host IOMMU related information through iommufd uAPI.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Zhenzhong Duan
9005f92844 backends/iommufd: Introduce TYPE_HOST_IOMMU_DEVICE_IOMMUFD[_VFIO] devices
TYPE_HOST_IOMMU_DEVICE_IOMMUFD represents a host IOMMU device under
iommufd backend. It is abstract, because it is going to be derived
into VFIO or VDPA type'd device.

It will have its own .get_cap() implementation.

TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO is a sub-class of
TYPE_HOST_IOMMU_DEVICE_IOMMUFD, represents a VFIO type'd host IOMMU
device under iommufd backend. It will be created during VFIO device
attaching and passed to vIOMMU.

It will have its own .realize() implementation.

Opportunistically, add missed header to include/sysemu/iommufd.h.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Zhenzhong Duan
38998c79a1 backends/host_iommu_device: Introduce HostIOMMUDeviceCaps
HostIOMMUDeviceCaps's elements map to the host IOMMU's capabilities.
Different platform IOMMU can support different elements.

Currently only two elements, type and aw_bits, type hints the host
platform IOMMU type, i.e., INTEL vtd, ARM smmu, etc; aw_bits hints
host IOMMU address width.

Introduce .get_cap() handler to check if HOST_IOMMU_DEVICE_CAP_XXX
is supported.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Zhenzhong Duan
1f94b21801 backends: Introduce HostIOMMUDevice abstract
A HostIOMMUDevice is an abstraction for an assigned device that is protected
by a physical IOMMU (aka host IOMMU). The userspace interaction with this
physical IOMMU can be done either through the VFIO IOMMU type 1 legacy
backend or the new iommufd backend. The assigned device can be a VFIO device
or a VDPA device. The HostIOMMUDevice is needed to interact with the host
IOMMU that protects the assigned device. It is especially useful when the
device is also protected by a virtual IOMMU as this latter use the translation
services of the physical IOMMU and is constrained by it. In that context the
HostIOMMUDevice can be passed to the virtual IOMMU to collect physical IOMMU
capabilities such as the supported address width. In the future, the virtual
IOMMU will use the HostIOMMUDevice to program the guest page tables in the
first translation stage of the physical IOMMU.

Introduce .realize() to initialize HostIOMMUDevice further after instance init.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-24 23:15:30 +02:00
Pierrick Bouvier
d4d133a34b qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.c
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240530220610.1245424-5-pierrick.bouvier@linaro.org>
Message-Id: <20240620152220.2192768-8-alex.bennee@linaro.org>
2024-06-24 10:14:56 +01:00
Alex Bennée
e83e386200 qtest: use cpu interface in qtest_clock_warp
This generalises the qtest_clock_warp code to use the AccelOps
handlers for updating its own sense of time. This will make the next
patch which moves the warp code closer to pure code motion.

From: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20240530220610.1245424-3-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240620152220.2192768-6-alex.bennee@linaro.org>
2024-06-24 10:14:39 +01:00
Alex Bennée
113ac1d212 sysemu: add set_virtual_time to accel ops
We are about to remove direct calls to individual accelerators for
this information and will need a central point for plugins to hook
into time changes.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240530220610.1245424-2-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240620152220.2192768-5-alex.bennee@linaro.org>
2024-06-24 10:14:34 +01:00
Edgar E. Iglesias
9ecdd4bf08 xen: mapcache: Add support for grant mappings
Add a second mapcache for grant mappings. The mapcache for
grants needs to work with XC_PAGE_SIZE granularity since
we can't map larger ranges than what has been granted to us.

Like with foreign mappings (xen_memory), machines using grants
are expected to initialize the xen_grants MR and map it
into their address-map accordingly.

CC: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2024-06-09 20:16:14 +02:00
Edgar E. Iglesias
49a7202979 xen: mapcache: Pass the ram_addr offset to xen_map_cache()
Pass the ram_addr offset to xen_map_cache.
This is in preparation for adding grant mappings that need
to compute the address within the RAMBlock.

No functional changes.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-06-09 20:16:14 +02:00
Phil Dennis-Jordan
a3c67dfc14 hvf: Makes assert_hvf_ok report failed expression
When a macOS Hypervisor.framework call fails which is checked by
assert_hvf_ok(), Qemu exits printing the error value, but not the
location
in the code, as regular assert() macro expansions would.

This change turns assert_hvf_ok() into a macro similar to other
assertions, which expands to a call to the corresponding _impl()
function together with information about the expression that failed
the assertion and its location in the code.

Additionally, stringifying the numeric hv_return_t code is factored
into a helper function that can be reused for diagnostics and debugging
outside of assertions.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Message-ID: <20240605112556.43193-8-phil@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-08 10:33:38 +02:00
Phil Dennis-Jordan
f21f0cbc2c hvf: Consistent types for vCPU handles
macOS Hypervisor.framework uses different types for identifying vCPUs, hv_vcpu_t or hv_vcpuid_t, depending on host architecture. They are not just differently named typedefs for the same primitive type, but reference different-width integers.

Instead of using an integer type and casting where necessary, this change introduces a typedef which resolves the active architecture’s hvf typedef. It also removes a now-unnecessary cast.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Roman Bolshakov <roman@roolebo.dev>
Tested-by: Roman Bolshakov <roman@roolebo.dev>
Message-ID: <20240605112556.43193-4-phil@philjordan.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-08 10:33:38 +02:00
Michal Privoznik
5d9a9a6170 backends/hostmem: Report error when memory size is unaligned
If memory-backend-{file,ram} has a size that's not aligned to
underlying page size it is not only wasteful, but also may lead
to hard to debug behaviour. For instance, in case
memory-backend-file and hugepages, madvise() and mbind() fail.
Rightfully so, page is the smallest unit they can work with. And
even though an error is reported, the root cause it not very
clear:

  qemu-system-x86_64: Couldn't set property 'dump' on 'memory-backend-file': Invalid argument

After this commit:

  qemu-system-x86_64: backend 'memory-backend-file' memory size must be multiple of 2 MiB

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Message-ID: <b5b9f9c6bba07879fb43f3c6f496c69867ae3716.1717584048.git.mprivozn@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-08 10:33:38 +02:00
Edgar E. Iglesias
1be974bc26 xen: Add xen_mr_is_memory()
Add xen_mr_is_memory() to abstract away tests for the
xen_memory MR.

No functional changes.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240529140739.1387692-4-edgar.iglesias@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-06-04 11:53:34 +02:00
Zhenzhong Duan
9067d50dff backends/iommufd: Make iommufd_backend_*() return bool
This is to follow the coding standand to return bool if 'Error **'
is used to pass error.

The changed functions include:

iommufd_backend_connect
iommufd_backend_alloc_ioas

By this chance, simplify the functions a bit by avoiding duplicate
recordings, e.g., log through either error interface or trace, not
both.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16 16:59:20 +02:00
Paolo Bonzini
1935b7ead1 kconfig: allow compiling out QEMU device tree code per target
Introduce a new Kconfig symbol, CONFIG_DEVICE_TREE, that specifies whether
to include the common device tree code in system/device_tree.c and to
link to libfdt.  For now, include it unconditionally if libfdt is
available.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10 15:45:15 +02:00
Richard Henderson
873f9ca385 Accelerator patches
- Extract page-protection definitions to page-protection.h
 - Rework in accel/tcg in preparation of extracting TCG fields from CPUState
 - More uses of get_task_state() in user emulation
 - Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
 - MAINTAINERS updates (Aleksandar and Bin)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
 wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
 IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
 fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
 Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
 U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
 QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
 eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
 =3Qkg
 -----END PGP SIGNATURE-----

Merge tag 'accel-20240506' of https://github.com/philmd/qemu into staging

Accelerator patches

- Extract page-protection definitions to page-protection.h
- Rework in accel/tcg in preparation of extracting TCG fields from CPUState
- More uses of get_task_state() in user emulation
- Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
- MAINTAINERS updates (Aleksandar and Bin)

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
# wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
# IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
# fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
# Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
# 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
# U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
# 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
# 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
# 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
# QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
# eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
# =3Qkg
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 May 2024 05:42:08 AM PDT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]

* tag 'accel-20240506' of https://github.com/philmd/qemu: (28 commits)
  MAINTAINERS: Update my email address
  MAINTAINERS: Update Aleksandar Rikalo email
  system: Pass RAM MemoryRegion and is_write in xen_map_cache()
  xen: mapcache: Break out xen_map_cache_init_single()
  xen: mapcache: Break out xen_invalidate_map_cache_single()
  xen: mapcache: Refactor xen_invalidate_map_cache_entry_unlocked
  xen: mapcache: Refactor xen_replace_cache_entry_unlocked
  xen: mapcache: Break out xen_ram_addr_from_mapcache_single
  xen: mapcache: Refactor xen_remap_bucket for multi-instance
  xen: mapcache: Refactor xen_map_cache for multi-instance
  xen: mapcache: Refactor lock functions for multi-instance
  xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry
  system: let qemu_map_ram_ptr() use qemu_ram_ptr_length()
  user: Use get_task_state() helper
  user: Declare get_task_state() once in 'accel/tcg/vcpu-state.h'
  user: Forward declare TaskState type definition
  accel/tcg: Move @plugin_mem_cbs from CPUState to CPUNegativeOffsetState
  accel/tcg: Restrict cpu_plugin_mem_cbs_enabled() to TCG
  accel/tcg: Restrict qemu_plugin_vcpu_exit_hook() to TCG plugins
  accel/tcg: Update CPUNegativeOffsetState::can_do_io field documentation
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-05-06 10:19:10 -07:00
Edgar E. Iglesias
5a5585f45d system: Pass RAM MemoryRegion and is_write in xen_map_cache()
Propagate MR and is_write to xen_map_cache().
This is in preparation for adding support for grant mappings.

No functional change.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240430164939.925307-14-edgar.iglesias@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-05-06 14:41:39 +02:00