This allows other device classes that will be exposed via PCI to be
able to do so in the appropriate hw/ directory. I resisted the
temptation to re-order headers to be more aesthetically pleasing.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200925125147.26943-4-alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-2-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Coverity rightly reports that is not free in that case.
Fixes: Coverity CID 1487559
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-7-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
The old API took the size of the memory to duplicate as a guint,
whereas most memory functions take memory sizes as a gsize. This
made it easy to accidentally pass a gsize to g_memdup(). For large
values, that would lead to a silent truncation of the size from 64
to 32 bits, and result in a heap area being returned which is
significantly smaller than what the caller expects. This can likely
be exploited in various modules to cause a heap buffer overflow.
Replace g_memdup() by the safer g_memdup2() wrapper.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20220512175747.142058-6-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With the introduction of MQ the index of the vq needs to be calculated
with the device model vq_index.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220512175747.142058-5-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Only the first one of them were properly enqueued back.
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-3-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The device could have access to modify them, and it definitely have
access when we implement packed vq. Harden SVQ maintaining a private
copy of the descriptor chain. Other fields like buffer addresses are
already maintained sepparatedly.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-2-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Unlike most virtio features ACCESS_PLATFORM is considered mandatory by
QEMU, i.e. the driver must accept it if offered by the device. The
virtio specification says that the driver SHOULD accept the
ACCESS_PLATFORM feature if offered, and that the device MAY fail to
operate if ACCESS_PLATFORM was offered but not negotiated.
While a SHOULD ain't exactly a MUST, we are certainly allowed to fail
the device when the driver fences ACCESS_PLATFORM. With commit
2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM") we already made the
decision to do so whenever the get_dma_as() callback is implemented (by
the bus), which in practice means for the entirety of virtio-pci.
That means, if the device needs to translate I/O addresses, then
ACCESS_PLATFORM is mandatory. The aforementioned commit tells us in the
commit message that this is for security reasons. More precisely if we
were to allow a less then trusted driver (e.g. an user-space driver, or
a nested guest) to make the device bypass the IOMMU by not negotiating
ACCESS_PLATFORM, then the guest kernel would have no ability to
control/police (by programming the IOMMU) what pieces of guest memory
the driver may manipulate using the device. Which would break security
assumptions within the guest.
If ACCESS_PLATFORM is offered not because we want the device to utilize
an IOMMU and do address translation, but because the device does not
have access to the entire guest RAM, and needs the driver to grant
access to the bits it needs access to (e.g. confidential guest support),
we still require the guest to have the corresponding logic and to accept
ACCESS_PLATFORM. If the driver does not accept ACCESS_PLATFORM, then
things are bound to go wrong, and we may see failures much less graceful
than failing the device because the driver didn't negotiate
ACCESS_PLATFORM.
So let us make ACCESS_PLATFORM mandatory for the driver regardless
of whether the get_dma_as() callback is implemented or not.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM")
Message-Id: <20220307112939.2780117-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The vsock callbacks .vhost_vsock_set_guest_cid and
.vhost_vsock_set_running are the only ones to be conditional
on #ifdef CONFIG_VHOST_VSOCK. This is different from any other
device-dependent callbacks like .vhost_scsi_set_endpoint, and it
also broke when CONFIG_VHOST_VSOCK was changed to a per-target
symbol.
It would be possible to also use the CONFIG_DEVICES include, but
really there is no reason for most virtio files to be per-target
so just remove the #ifdef to fix the issue.
Reported-by: Dov Murik <dovmurik@linux.ibm.com>
Fixes: 9972ae314f ("build: move vhost-vsock configuration to Kconfig")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The virtio-scsi event virtqueue is not emptied by its handler function.
This is typical for rx virtqueues where the device uses buffers when
some event occurs (e.g. a packet is received, an error condition
happens, etc).
Polling non-empty virtqueues wastes CPU cycles. We are not waiting for
new buffers to become available, we are waiting for an event to occur,
so it's a misuse of CPU resources to poll for buffers.
Introduce the new virtio_queue_aio_attach_host_notifier_no_poll() API,
which is identical to virtio_queue_aio_attach_host_notifier() except
that it does not poll the virtqueue.
Before this patch the following command-line consumed 100% CPU in the
IOThread polling and calling virtio_scsi_handle_event():
$ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \
--object iothread,id=iothread0 \
--device virtio-scsi-pci,iothread=iothread0 \
--blockdev file,filename=test.img,aio=native,cache.direct=on,node-name=drive0 \
--device scsi-hd,drive=drive0
After this patch CPU is no longer wasted.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20220427143541.119567-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
vhost-user-fs is a device and it should be possible to enable/disable
it with --without-default-devices, not --without-default-features.
Compute its default value in Kconfig to obtain the more intuitive
behavior.
In this case the configure options were undocumented, too.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vhost-vsock and vhost-user-vsock are two devices of their own; it should
be possible to enable/disable them with --without-default-devices, not
--without-default-features. Compute their default value in Kconfig to
obtain the more intuitive behavior.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since a sourceset already exists for this, avoid unnecessary repeat
of CONFIG_VIRTIO_PCI.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The correct name of the macro is TARGET_PPC64.
Fixes: 27598393a2 ("Lift max memory slots limit imposed by vhost-user")
Reported-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Peter Turschmid <peter.turschm@nutanix.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220503180108.34506-1-muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The spec clarifies now that QEMU should not send a file descriptor in a
request to remove a memory region. Change it accordingly.
For libvhost-user, this is a bug fix that makes it compatible with
rust-vmm's implementation that doesn't send a file descriptor. Keep
accepting, but ignoring a file descriptor for compatibility with older
QEMU versions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220407133657.155281-4-kwolf@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qemu_*block() functions are meant to be be used with sockets (the
win32 implementation expects SOCKET)
Over time, those functions where used with Win32 SOCKET or
file-descriptors interchangeably. But for portability, they must only be
used with socket-like file-descriptors. FDs can use
g_unix_set_fd_nonblocking() instead.
Rename the functions with "socket" in the name to prevent bad usages.
This is effectively reverting commit f9e8cacc55 ("oslib-posix:
rename socket_set_nonblock() to qemu_set_nonblock()").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Those calls are non-socket fd, or are POSIX-specific. Use the dedicated
GLib API. (qemu_set_nonblock() is for socket-like)
(this is a preliminary patch before renaming qemu_set_nonblock())
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This header only defines the tcg_allowed variable and the tcg_enabled()
function - which are not required in many files that include this
header. Drop the #include statement there.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220315144107.1012530-1-thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* whpx support for breakpoints and stepping
* initial support for Hyper-V Synthetic Debugging
* use monotonic clock for QemuCond and QemuSemaphore
* Remove qemu-common.h include from most units and lots of other clenaups
* do not include headers for all virtio devices in virtio-ccw.h
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJXCQAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNT6wf+NHDJUEdDiwaVGVTGXgHuiaycsymi
FpNPiw/+XxSGN5xF3fkUGgqaDrcwIYwVfnXlghKSz8kp1cP3cjxa5CzNMLGTp5je
N6BxFbD7yC6dhagGm3mj32jlsptv3M38OHqKc3t+RaUAotP5RF2VdCyfUBLG6vU0
aMzvMfMtB5aG0D8Fr5EV63t1JMTceFU0YxsG73UCFs2Yx4Z0cGBbNxMbHweRhd1q
tPeVDS46MFPM3/2cGGHpeeqxkoCTU7A9j1VuNQI3k+Kg+6W5YVxiK/UP7bw77E/a
yAHsmIVTNro8ajMBch73weuHtGtdfFLvCKc6QX6aVjzK4dF1voQ01E7gPQ==
=rMle
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* Add cpu0-id to query-sev-capabilities
* whpx support for breakpoints and stepping
* initial support for Hyper-V Synthetic Debugging
* use monotonic clock for QemuCond and QemuSemaphore
* Remove qemu-common.h include from most units and lots of other clenaups
* do not include headers for all virtio devices in virtio-ccw.h
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJXCQAUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNT6wf+NHDJUEdDiwaVGVTGXgHuiaycsymi
# FpNPiw/+XxSGN5xF3fkUGgqaDrcwIYwVfnXlghKSz8kp1cP3cjxa5CzNMLGTp5je
# N6BxFbD7yC6dhagGm3mj32jlsptv3M38OHqKc3t+RaUAotP5RF2VdCyfUBLG6vU0
# aMzvMfMtB5aG0D8Fr5EV63t1JMTceFU0YxsG73UCFs2Yx4Z0cGBbNxMbHweRhd1q
# tPeVDS46MFPM3/2cGGHpeeqxkoCTU7A9j1VuNQI3k+Kg+6W5YVxiK/UP7bw77E/a
# yAHsmIVTNro8ajMBch73weuHtGtdfFLvCKc6QX6aVjzK4dF1voQ01E7gPQ==
# =rMle
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 13 Apr 2022 10:31:44 AM PDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (53 commits)
target/i386: Remove unused XMMReg, YMMReg types and CPUState fields
target/i386: do not access beyond the low 128 bits of SSE registers
virtio-ccw: do not include headers for all virtio devices
virtio-ccw: move device type declarations to .c files
virtio-ccw: move vhost_ccw_scsi to a separate file
s390x: follow qdev tree to detect SCSI device on a CCW bus
hw: hyperv: Initial commit for Synthetic Debugging device
hyperv: Add support to process syndbg commands
hyperv: Add definitions for syndbg
hyperv: SControl is optional to enable SynIc
thread-posix: optimize qemu_sem_timedwait with zero timeout
thread-posix: implement Semaphore with QemuCond and QemuMutex
thread-posix: use monotonic clock for QemuCond and QemuSemaphore
thread-posix: remove the posix semaphore support
whpx: Added support for breakpoints and stepping
build-sys: simplify AF_VSOCK check
build-sys: drop ntddscsi.h check
Remove qemu-common.h include from most units
qga: remove explicit environ argument from exec/spawn
Move fcntl_setfl() to oslib-posix
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
A potential Use-after-free was reported in virtio_iommu_handle_command
when using virtio-iommu:
> I find a potential Use-after-free in QEMU 6.2.0, which is in
> virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).
>
>
> Specifically, in the loop body, the variable 'buf' allocated at line 639 can be
> freed by g_free() at line 659. However, if the execution path enters the loop
> body again and the if branch takes true at line 616, the control will directly
> jump to 'out' at line 651. At this time, 'buf' is a freed pointer, which is not
> assigned with an allocated memory but used at line 653. As a result, a UAF bug
> is triggered.
>
>
>
> 599 for (;;) {
> ...
> 615 sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
> 616 if (unlikely(sz != sizeof(head))) {
> 617 tail.status = VIRTIO_IOMMU_S_DEVERR;
> 618 goto out;
> 619 }
> ...
> 639 buf = g_malloc0(output_size);
> ...
> 651 out:
> 652 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> 653 buf ? buf : &tail, output_size);
> ...
> 659 g_free(buf);
>
> We can fix it by set ‘buf‘ to NULL after freeing it:
>
>
> 651 out:
> 652 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> 653 buf ? buf : &tail, output_size);
> ...
> 659 g_free(buf);
> +++ buf = NULL;
> 660 }
Fix as suggested by the reporter.
Signed-off-by: Wentao Liang <Wentao_Liang_g@163.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20220407095047.50371-1-mst@redhat.com
Message-ID: <20220406040445-mutt-send-email-mst@kernel.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.
This avoids the need for a constructor initialization too.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace a config-time define with a compile time condition
define (compatible with clang and gcc) that must be declared prior to
its usage. This avoids having a global configure time define, but also
prevents from bad usage, if the config header wasn't included before.
This can help to make some code independent from qemu too.
gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[ For the s390x parts I'm involved in ]
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
GLib g_get_real_time() is an alternative to gettimeofday() which allows
to simplify our code.
For semihosting, a few bits are lost on POSIX host, but this shouldn't
be a big concern.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220307070401.171986-5-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The vhost-shadow-virtqueue.c build requires include files from
linux-headers/, so it cannot be built on non-Linux systems.
Fortunately it is only needed by vhost-vdpa, so move it there.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
*opaque is an alias to *obj. Using the ladder makes the code consistent with
with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It also
makes the cast more typesafe.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220301222301.103821-2-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* VSS header fixes (Marc-André)
* 5-level EPT support (Vitaly)
* AMX support (Jing Liu & Yang Zhong)
* Bundle changes to MSI routes (Longpeng)
* More precise emulation of #SS (Gareth)
* Disable ASAN testing
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIwb5QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOOUQf8DiNcq8XVVMdX946Qwa4pSxc4ZJtF
X+RkNsscluuLJ2vGEFKwPVps6c6UPqAhXUruZOQmcLmma511MsyJrxyfd4iRgPD2
tL1+n4RpfsbnTEGT8c6TFWWMEIOjLTbKmR/SIxuxpeVG3xlk6tlCevykrIdc90gP
vQIByTGFx3GwiPyDo0j92mA/CsWLnfq6zQ2Tox1xCyt8R+QDimqG0KGLc5RAyiyC
ZmilN2yaqizDfkIzinwHG6gP1NGwVUsrUNl4X9C4mwEMFnsXiyKP5n/BlDZ7w4Wb
QXalFpPg1hJxRGGvyta6OF9VmCfmK9Q0FNVWm1lPE5adn3ECHFo6FJKvfg==
=LVgf
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* whpx fixes in preparation for GDB support (Ivan)
* VSS header fixes (Marc-André)
* 5-level EPT support (Vitaly)
* AMX support (Jing Liu & Yang Zhong)
* Bundle changes to MSI routes (Longpeng)
* More precise emulation of #SS (Gareth)
* Disable ASAN testing
# gpg: Signature made Tue 15 Mar 2022 10:51:00 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (22 commits)
gitlab-ci: do not run tests with address sanitizer
KVM: SVM: always set MSR_AMD64_TSC_RATIO to default value
i386: Add Icelake-Server-v6 CPU model with 5-level EPT support
x86: Support XFD and AMX xsave data migration
x86: add support for KVM_CAP_XSAVE2 and AMX state migration
x86: Add AMX CPUIDs enumeration
x86: Add XFD faulting bit for state components
x86: Grant AMX permission for guest
x86: Add AMX XTILECFG and XTILEDATA components
x86: Fix the 64-byte boundary enumeration for extended state
linux-headers: include missing changes from 5.17
target/i386: Throw a #SS when loading a non-canonical IST
target/i386: only include bits in pg_mode if they are not ignored
kvm/msi: do explicit commit when adding msi routes
kvm-irqchip: introduce new API to support route change
update meson-buildoptions.sh
qga/vss: update informative message about MinGW
qga/vss-win32: check old VSS SDK headers
meson: fix generic location of vss headers
vmxcap: Add 5-level EPT bit
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.
This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SVQ is able to log the dirty bits by itself, so let's use it to not
block migration.
Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in the qemu region.
The log region is still allocated. Future changes might skip that, but
this series is already long enough.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Setting the log address would make the device start reporting invalid
dirty memory because the SVQ vrings are located in qemu's memory.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This is needed to achieve migration, so the destination can restore its
index.
Setting base as last used idx, so destination will see as available all
the entries that the device did not use, including the in-flight
processing ones.
This is ok for networking, but other kinds of devices might have
problems with these retransmissions.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Use translations added in VhostIOVATree in SVQ.
Only introduce usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid commiting too many changes at once. These are impossible to take
at the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This tree is able to look for a translated address from an IOVA address.
At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space need more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).
The allocation capability, as "assign a free IOVA address to this chunk
of memory in qemu's address space" allows shadow virtqueue to create a
new address space that is not restricted by guest's addressable one, so
we can allocate shadow vqs vrings outside of it.
It duplicates the tree so it can search efficiently in both directions,
and it will signal overlap if iova or the translated address is present
in any tree.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Initial version of shadow virtqueue that actually forward buffers. There
is no iommu support at the moment, and that will be addressed in future
patches of this series. Since all vhost-vdpa devices use forced IOMMU,
this means that SVQ is not usable at this point of the series on any
device.
For simplicity it only supports modern devices, that expects vring
in little endian, with split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.
It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.
However, forwarding buffers have some particular pieces: One of the most
unexpected ones is that a guest's buffer can expand through more than
one descriptor in SVQ. While this is handled gracefully by qemu's
emulated virtio devices, it may cause unexpected SVQ queue full. This
patch also solves it by checking for this condition at both guest's
kicks and device's calls. The code may be more elegant in the future if
SVQ code runs in its own iocontext.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
First half of the buffers forwarding part, preparing vhost-vdpa
callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
this is effectively dead code at the moment, but it helps to reduce
patch size.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It reports the shadow virtqueue address from qemu virtual address space.
Since this will be different from the guest's vaddr, but the device can
access it, SVQ takes special care about its alignment & lack of garbage
data. It assumes that IOMMU will work in host_page_size ranges for that.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows SVQ to negotiate features with the guest and the device. For
the device, SVQ is a driver. While this function bypasses all
non-transport features, it needs to disable the features that SVQ does
not support when forwarding buffers. This includes packed vq layout,
indirect descriptors or event idx.
Future changes can add support to offer more features to the guest,
since the use of VirtQueue gives this for free. This is left out at the
moment for simplicity.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This will make qemu aware of the device used buffers, allowing it to
write the guest memory with its contents if needed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
At this mode no buffer forwarding will be performed in SVQ mode: Qemu
will just forward the guest's kicks to the device.
Host memory notifiers regions are left out for simplicity, and they will
not be addressed in this series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to commit the
memory it's being dirtied, the same way regular qemu's VirtIO devices
do.
This commit only exposes basic SVQ allocation and free. Next patches of
the series add functionality like notifications and buffers forwarding.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
When ioeventfd is emulated using qemu_pipe(), only EventNotifier's wfd
can be used for writing.
Use the recently introduced event_notifier_get_wfd() function to
obtain the fd that our peer must use to signal the vring.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220304100854.14829-3-slp@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vhost_vsock_common_send_transport_reset(), if an element popped from
the virtqueue is invalid, we should call virtqueue_detach_element() to
detach it from the virtqueue before freeing its memory.
Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
Fixes: CVE-2022-26354
Cc: qemu-stable@nongnu.org
Reported-by: VictorV <vv474172261@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220228095058.27899-1-sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The driver can create a bypass domain by passing the
VIRTIO_IOMMU_ATTACH_F_BYPASS flag on the ATTACH request. Bypass domains
perform slightly better than domains with identity mappings since they
skip translation.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220214124356.872985-4-jean-philippe@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the virtio-iommu device must be programmed before it allows
DMA from any PCI device. This can make the VM entirely unusable when a
virtio-iommu driver isn't present, for example in a bootloader that
loads the OS from storage.
Similarly to the other vIOMMU implementations, default to DMA bypassing
the IOMMU during boot. Add a "boot-bypass" property, defaulting to true,
that lets users change this behavior.
Replace the VIRTIO_IOMMU_F_BYPASS feature, which didn't support bypass
before feature negotiation, with VIRTIO_IOMMU_F_BYPASS_CONFIG.
We add the bypass field to the migration stream without introducing
subsections, based on the assumption that this virtio-iommu device isn't
being used in production enough to require cross-version migration at
the moment (all previous version required workarounds since they didn't
support ACPI and boot-bypass).
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220214124356.872985-3-jean-philippe@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>