The previous fix to virtio_device_started revealed a problem in its
use by both the core and the device code. The core code should be able
to handle the device "starting" while the VM isn't running to handle
the restoration of migration state. To solve this duel use introduce a
new helper for use by the vhost-user backends who all use it to feed a
should_start variable.
We can also pick up a change vhost_user_blk_set_status while we are at
it which follows the same pattern.
Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Message-Id: <20221107121407.1010913-1-alex.bennee@linaro.org>
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.
Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).
Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS
Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.
On reception of the DRIVER_OK message, backend can apply the needed setting
only once (instead of incremental) and also utilize parallelism on enabling
queues.
This improves QEMU's live migration downtime with vhost user backend
implementation by great margin, specially for the large number of VQs of 64
from 800 msec to 250 msec.
[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221017064452.1226514-3-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There were several different ways to deal with the situation where the
vector specified for a msix function is out of bound:
- early return a function and keep progresssing
- propagate the error to the caller
- mark msix unusable
- assert it is in bound
- just ignore
An out-of-bound vector should not be specified if the device
implementation is correct so let msix functions always assert that the
specified vector is in range.
An exceptional case is virtio-pci, which allows the guest to configure
vectors. For virtio-pci, it is more appropriate to introduce its own
checks because it is sometimes too late to check the vector range in
msix functions.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20220829083524.143640-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Signed-off-by: Akihiko Odaki <<a href="mailto:akihiko.odaki@daynix.com" target="_blank">akihiko.odaki@daynix.com</a>><br>
vhost backend sends host notification for every VQ. If backend creates
VQs in parallel, the VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG may
arrive to QEMU in different order than incremental queue index order.
For example VQ 1's message arrive earlier than VQ 0's:
After alloc VhostUserHostNotifier for VQ 1. GPtrArray becomes
[ nil, VQ1 pointer ]
After alloc VhostUserHostNotifier for VQ 0. GPtrArray becomes
[ VQ0 pointer, nil, VQ1 pointer ]
This is wrong. fetch_notifier will return NULL for VQ 1 in
vhost_user_get_vring_base, causes host notifier miss removal(leak).
The fix is to remove current element from GPtrArray, make the right
position for element to insert.
Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221018023651.1359420-1-yajunw@nvidia.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most other virtio-pci devices allow MSI-X, let's have it for rng too.
Signed-off-by: David Daney <david.daney@fungible.com>
Reviewed-by: Marcin Nowakowski <marcin.nowakowski@fungible.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@fungible.com>
Message-Id: <20221014160947.66105-1-philmd@fungible.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Expose vhost_virtqueue_stop(), we need to use it when resetting a
virtqueue.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-9-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Expose vhost_virtqueue_start(), we need to use it when restarting a
virtqueue.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-8-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI devices support device specific vq enable.
Based on this function, the driver can re-enable the virtqueue after the
virtqueue is reset.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-7-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI devices support vq reset.
Based on this function, the driver can adjust the size of the ring, and
quickly recycle the buffer in the ring.
The migration of the virtio devices will not happen during a reset
operation. This is becuase the global iothread lock is held. Migration
thread also needs the lock. As a result, when migration of virtio
devices starts, the 'reset' status of VirtIOPCIQueue will always be 0.
Thus, we do not need to add it in vmstate_virtio_pci_modern_queue_state.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-6-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce the interface queue_enable() in VirtioDeviceClass and the
fucntion virtio_queue_enable() in virtio, it can be called when
VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
started. It only supports the devices of virtio 1 or later. The
not-supported devices can only start the virtqueue when DRIVER_OK.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-4-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a new interface function virtio_queue_reset() to implement
reset for vq.
Add a new callback to VirtioDeviceClass for queue reset operation for
each child device.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-3-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Separate the logic of vq reset. This logic will be called directly
later.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-2-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In theory the virtio-iommu-pci could be plugged anywhere in the PCIe
topology and as long as the dt/acpi info are properly built this should
work. However at the moment we fail to do that because the
virtio-iommu-pci BDF is not computed at plug time and in that case
vms->virtio_iommu_bdf gets an incorrect value.
For instance if the virtio-iommu-pci is plugged onto a pcie root port
and the virtio-iommu protects a virtio-block-pci device the guest does
not boot.
So let's do not pretend we do support this case and fail the initialize()
if we detect the virtio-iommu-pci is plugged anywhere else than on the
root bus. Anyway this ability is not needed.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221012163448.121368-1-eric.auger@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-crypto: Modify the current interface of virtio-crypto
device to support asynchronous mode.
Signed-off-by: lei he <helei.sig11@bytedance.com>
Message-Id: <20221008085030.70212-2-helei.sig11@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enabling all the code path created before.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
So SVQ code knows if an event is needed.
The code is not reachable at the moment.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Actually use the new field of the used ring and tell the device if SVQ
wants to be notified.
The code is not reachable at the moment.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There was not enough room to accomodate them.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
parameter clearer
... and add a function documentation.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While being at it add a #define for the magic 0x1040 number.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221004112100.301935-6-kraxel@redhat.com>
Not needed for a virtio 1.0 device. virtio_pci_device_plugged()
overrides them anyway (so no functional change).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20221004112100.301935-3-kraxel@redhat.com>
Not needed for a virtio 1.0 device. virtio_pci_device_plugged()
overrides them anyway (so no functional change).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221004112100.301935-2-kraxel@redhat.com>
This new command shows the information of a VirtQueue element.
[Note: Up until v10 of this patch series, virtio.json had many (15+)
enums defined (e.g. decoded device features, statuses, etc.). In v10
most of these enums were removed and replaced with string literals.
By doing this we get (1) simpler schema, (2) smaller generated code,
and (3) less maintenance burden for when new things are added (e.g.
devices, device features, etc.).]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-6-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These new commands show the internal status of a VirtIODevice's
VirtQueue and a vhost device's vhost_virtqueue (if active).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-5-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Display feature names instead of bitmaps for host, guest, and
backend for VirtIODevices.
Display status names instead of bitmaps for VirtIODevices.
Display feature names instead of bitmaps for backend, protocol,
acked, and features (hdev->features) for vhost devices.
Decode features according to device ID. Decode statuses
according to configuration status bitmap (config_status_map).
Decode vhost user protocol features according to vhost user
protocol bitmap (vhost_user_protocol_map).
Transport features are on the first line. Undecoded bits (if
any) are stored in a separate field.
[Jonah: Several changes made to this patch from prev. version (v14):
- Moved all device features mappings to hw/virtio/virtio.c
- Renamed device features mappings (less generic)
- Generalized @FEATURE_ENTRY macro for all device mappings
- Virtio device feature map definitions include descriptions of
feature bits
- Moved @VHOST_USER_F_PROTOCOL_FEATURES feature bit from transport
feature map to vhost-user-supported device feature mappings
(blk, fs, i2c, rng, net, gpu, input, scsi, vsock)
- New feature bit added for virtio-vsock: @VIRTIO_VSOCK_F_SEQPACKET
- New feature bit added for virtio-iommu: @VIRTIO_IOMMU_F_BYPASS_CONFIG
- New feature bit added for virtio-mem: @VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
- New virtio transport feature bit added: @VIRTIO_F_IN_ORDER
- Added device feature map definition for virtio-rng
]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-4-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This new command shows the status of a VirtIODevice, including
its corresponding vhost device's status (if active).
Next patch will improve output by decoding feature bits, including
vhost device's feature bits (backend, protocol, acked, and features).
Also will decode status bits of a VirtIODevice.
[Jonah: From patch v12; added a check to @virtio_device_find to ensure
synchronicity between @virtio_list and the devices in the QOM
composition tree.]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-3-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This new command lists all the instances of VirtIODevices with
their canonical QOM path and name.
[Jonah: @virtio_list duplicates information that already exists in
the QOM composition tree. However, extracting necessary information
from this tree seems to be a bit convoluted.
Instead, we still create our own list of realized virtio devices
but use @qmp_qom_get with the device's canonical QOM path to confirm
that the device exists and is realized. If the device exists but
is actually not realized, then we remove it from our list (for
synchronicity to the QOM composition tree).
Also, the QMP command @x-query-virtio is redundant as @qom-list
and @qom-get are sufficient to search '/machine/' for realized
virtio devices. However, @x-query-virtio is much more convenient
in listing realized virtio devices.]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-2-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.
This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.
An example usage would be:
static const VirtIOFeature dev_features[] = {
{.flags = 1ULL << FEATURE_1_BIT,
.end = endof(struct virtio_dev_config, feature_1)},
{.flags = 1ULL << FEATURE_2_BIT,
.end = endof(struct virtio_dev_config, feature_2)},
{}
};
static const VirtIOConfigSizeParams dev_cfg_size_params = {
.min_size = DEV_BASE_CONFIG_SIZE,
.max_size = sizeof(struct virtio_dev_config),
.feature_sizes = dev_features
};
// code inside my_dev_device_realize()
size_t config_size = virtio_get_config_size(&dev_cfg_size_params,
host_features);
virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);
Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.
All of the users of virtio_feature_get_config_size have been converted
to use virtio_get_config_size so it's no longer needed and is removed
with this commit.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Message-Id: <20220906073111.353245-2-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows is to instantiate a vhost-user-gpio device as part of a PCI
bus. It is mostly boilerplate which looks pretty similar to the
vhost-user-fs-pci device.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <5f560cab92d0d789b1c94295ec74b9952907d69d.1641987128.git.viresh.kumar@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-14-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This creates the QEMU side of the vhost-user-gpio device which connects
to the remote daemon. It is based of vhost-user-i2c code.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <5390324a748194a21bc99b1538e19761a8c64092.1641987128.git.viresh.kumar@linaro.org>
[AJB: fixes for qtest, tweaks to feature bits]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <20220802095010.3330793-13-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The `started` field is manipulated internally within the vhost code
except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
dev state in the vhost_migration_log routine). Mark that as a FIXME
because it introduces a potential race. I think the referenced fix
should be tracking its state locally.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-12-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwittz@nutanix.com>
All the boilerplate virtio code does the same thing (or should at
least) of checking to see if the VM is running before attempting to
start VirtIO. Push the logic up to the common function to avoid
getting a copy and paste wrong.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-11-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These are useful for tracing the lifetime of vhost-user connections.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-10-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
If the guest driver attempts to use the UNUSED(30) bit it is
potentially buggy as 6.3 Legacy Interface: Reserved Feature Bits
states it "SHOULD NOT be negotiated". For now just log this guest
error.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220802095010.3330793-9-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220802095010.3330793-8-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
There are some extra bits used over a vhost-user connection which are
hidden from the device itself. We need to set them here to ensure we
enable things like the protocol extensions.
Currently net/vhost-user.c has it's own inscrutable way of persisting
this data but it really should live in the core vhost_user code.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220726192150.2435175-7-alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-2-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
The previous iteration was commit a95942b50c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220923084254.4173111-1-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
GCC issues a false positive warning, resulting in build failure with -Werror:
In file included from /usr/include/glib-2.0/glib.h:114,
from src/include/glib-compat.h:32,
from src/include/qemu/osdep.h:144,
from ../src/hw/virtio/vhost-shadow-virtqueue.c:10:
In function ‘g_autoptr_cleanup_generic_gfree’,
inlined from ‘vhost_handle_guest_kick’ at ../src/hw/virtio/vhost-shadow-virtqueue.c:292:42:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘elem’ may be used uninitialized [-Werror=maybe-uninitialized]
28 | g_free (*pp);
| ^~~~~~~~~~~~
../src/hw/virtio/vhost-shadow-virtqueue.c: In function ‘vhost_handle_guest_kick’:
../src/hw/virtio/vhost-shadow-virtqueue.c:292:42: note: ‘elem’ was declared here
292 | g_autofree VirtQueueElement *elem;
| ^~~~
cc1: all warnings being treated as errors
There is actually no problem since "elem" is initialized in both branches.
Silence the warning by initializig it with "NULL".
$ gcc --version
gcc (GCC) 12.2.0
Fixes: 9c2ab2f1ec ("vhost: stop transfer elem ownership in vhost_handle_guest_kick")
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220910151117.6665-1-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
As the close-on-exec flags is not set on the file descriptors returned
by socketpair() at default, the fds will survive across exec' function.
In the case that exec' function get invoked, such as the live-update feature
which is been developing, it will cause fd leaks.
To address this problem, we should call qemu_socketpair() to create an pair of
connected sockets with the close-on-exec flag set.
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <7002b12a5fb0a30cd878e14e07da61c36da72913.1661240709.git.tugy@chinatelecom.cn>
We can restore the device state in the destination via CVQ now. Remove
the migration blocker.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since QEMU will be able to inject new elements on CVQ to restore the
state, we need not to depend on a VirtQueueElement to know if a new
element has been used by the device or not. Instead of check that, check
if there are new elements only using used idx on vhost_svq_flush.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
As discussed in previous series [1], this memory barrier is useless with
the atomic read of used idx at vhost_svq_more_used. Deleting it.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02616.html
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check if
an element is a valid head checking a different thing than the
VirtQueueElement.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It was easier to allow vhost_svq_add to handle the memory. Now that we
will allow qemu to add elements to a SVQ without the guest's knowledge,
it's better to handle it in the caller.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reduce code duplication.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We can unbind twice a file descriptor if we call twice
vhost_svq_set_svq_kick_fd because of this. Since it comes from vhost and
not from SVQ, that file descriptor could be a different thing that
guest's vhost notifier.
Likewise, it can happens the same if a guest start and stop the device
multiple times.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: dff4426fa6 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Nothing actually reads the return value, but an error in cleaning some
entries could cause device stop to abort, making a restart impossible.
Better ignore explicitely the return value.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Although the device will be reset before usage, the right thing to do is
to clean it.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It's convenient to call iova_tree_remove from a map returned from
iova_tree_find or iova_tree_find_iova. With the current code this is not
possible, since we will free it, and then we will try to search for it
again.
Fix it making accepting the map by value, forcing a copy of the
argument. Not applying a fixes tag, since there is no use like that at
the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If a map fails for whatever reason, it must not be saved in the tree.
Otherwise, qemu will try to unmap it in cleanup, leaving to more errors.
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Next patch will skip the registering of dma maps that the vdpa device
rejects in the iova tree. We need to consider that here or we cause a
SIGSEGV accessing result.
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
virtio level reset should not affect pci express
registers such as PM, error or link.
Fixes: 27ce0f3afc ("hw/virtio: fix Power Management Control Register for PCI Express virtio devices")
Fixes: d584f1b9ca ("hw/virtio: fix Link Control Register for PCI Express virtio devices")
Fixes: c2cabb3422 ("hw/virtio: fix error enabling flags in Device Control register")
Cc: "Marcel Apfelbaum" <marcel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As reads happen in the callback we were never seeing them. We only
really care about the header so move the tracepoint to when the header
is complete.
Fixes: 6ca6d8ee9d (hw/virtio: add vhost_user_[read|write] trace points)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220728135503.1060062-5-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The assert() protecting against leakage is a little aggressive and
causes needless crashes if a device is shutdown without having been
configured. In this case no descriptors are lost because none have
been assigned.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220728135503.1060062-4-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I've noticed asserts firing because we query the status of vdev after
a vhost connection is closed down. Rather than faulting on the NULL
indirect just quietly reply false.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220728135503.1060062-3-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently we only enforce power-of-two mappings (required by the QEMU
notifier) for UNMAP requests. A MAP request not aligned on a
power-of-two may be successfully handled by VFIO, and then the
corresponding UNMAP notify will fail because it will attempt to split
that mapping. Ensure MAP and UNMAP notifications are consistent.
Fixes: dde3f08b5c ("virtio-iommu: Handle non power of 2 range invalidations")
Reported-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220718135636.338264-1-jean-philippe@linaro.org>
Tested-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_vdpa_listener_region_del is always deleting the first iova entry
of the tree, since it's using the needle iova instead of the result's
one.
This was detected using a vga virtual device in the VM using vdpa SVQ.
It makes some extra memory adding and deleting, so the wrong one was
mapped / unmapped. This was undetected before since all the memory was
mappend and unmapped totally without that device, but other conditions
could trigger it too:
* mem_region was with .iova = 0, .translated_addr = (correct GPA).
* iova_tree_find_iova returned right result, but does not update
mem_region.
* iova_tree_remove always removed region with .iova = 0. Right iova were
sent to the device.
* Next map will fill the first region with .iova = 0, causing a mapping
with the same iova and device complains, if the next action is a map.
* Next unmap will cause to try to unmap again iova = 0, causing the
device to complain that no region was mapped at iova = 0.
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The SVQ vring used idx usually match with the guest visible one, as long
as all the guest buffers (GPA) maps to exactly one buffer within qemu's
VA. However, as we can see in virtqueue_map_desc, a single guest buffer
could map to many buffers in SVQ vring.
Also, its also a mistake to rewind them at the source of migration.
Since VirtQueue is able to migrate the inflight descriptors, its
responsability of the destination to perform the rewind just in case it
cannot report the inflight descriptors to the device.
This makes easier to migrate between backends or to recover them in
vhost devices that support set in flight descriptors.
Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since the vhost-vdpa device is exposing _F_LOG, adding a migration blocker if
it uses CVQ.
However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
only add to vhost_dev when used. vhost_dev machinery place the migration
blocker if needed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from
the guest that could set a different state in qemu device model and vdpa
device.
To do so, it needs to be able to map these new buffers to the device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.
When this callback is defined the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor it needs to manually inject it. The
callback is also free to process the command by itself and use the
element with svq_push.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It allows the Shadow Control VirtQueue to wait for the device to use the
available buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows external parts of SVQ to forward custom buffers to the
device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This function allows external SVQ users to return guest's available
buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A guest's buffer continuos on GPA may need multiple descriptors on
qemu's VA, so SVQ should track its length sepparatedly.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This will allow SVQ to add context to the different queue elements.
This patch only store the actual element, no functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
VirtQueueElement comes from the guest, but we're heading SVQ to be able
to modify the element presented to the device without the guest's
knowledge.
To do so, make SVQ accept sg buffers directly, instead of using
VirtQueueElement.
Add vhost_svq_add_element to maintain element convenience.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The series need to expose vhost_svq_add with full functionality,
including checking for full queue.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The series needs to expose vhost_svq_add with full functionality,
including kick
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Future code needs to call it from vhost_svq_add.
No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In the next patch we will allow busypolling of this value. The compiler
have a running path where shadow_used_idx, last_used_idx, and vring used
idx are not modified within the same thread busypolling.
This was not an issue before since we always cleared device event
notifier before checking it, and that could act as memory barrier.
However, the busypoll needs something similar to kernel READ_ONCE.
Let's add it here, sepparated from the polling.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It's done for both in and out descriptors so it's better placed here.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We also need to switch to the right address space on dest side
after loading the device status. DMA to wrong address space is
destructive.
Fixes: 3facd774962fd ("virtio-iommu: Add bypass mode support to assigned device")
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220624093740.3525267-1-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Vhost has error notifications, let's log them like other errors.
For each virt-queue setup eventfd for vring error notifications.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
[vsementsov: rename patch, change commit message and dump error like
other errors in the file]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20220623161325.18813-3-vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Kernel and user vhost may report virtqueue errors via eventfd.
This is only reliable way to get notification about protocol error.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Message-Id: <20220623161325.18813-2-vsementsov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
The structure of probe request doesn't include the tail, this leads
to a few field missed to be copied. Currently this isn't an issue as
those missed field belong to reserved field, just in case reserved
field will be used in the future.
Changed 4th parameter of virtio_iommu_iov_to_req() to receive size
of device-readable part.
Fixes: 1733eebb9e ("virtio-iommu: Implement RESV_MEM probe request")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220623023152.3473231-1-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
When check queue state in the vhost_dev_set_log routine, it miss the error
routine check, this patch also check queue state in error case.
Fixes: 1e5a050f57 ("check queue state in the vhost_dev_set_log routine")
Signed-off-by: Ni Xun <richardni@tencent.com>
Reviewed-by: Zhigang Lu <tonnylu@tencent.com>
Message-Id: <OS0PR01MB57139163F3F3955960675B52EAA79@OS0PR01MB5713.jpnprd01.prod.outlook.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are two parts in this patch:
1, support akcipher service by cryptodev-builtin driver
2, virtio-crypto driver supports akcipher service
In principle, we should separate this into two patches, to avoid
compiling error, merge them into one.
Then virtio-crypto gets request from guest side, and forwards the
request to builtin driver to handle it.
Test with a guest linux:
1, The self-test framework of crypto layer works fine in guest kernel
2, Test with Linux guest(with asym support), the following script
test(note that pkey_XXX is supported only in a newer version of keyutils):
- both public key & private key
- create/close session
- encrypt/decrypt/sign/verify basic driver operation
- also test with kernel crypto layer(pkey add/query)
All the cases work fine.
Run script in guest:
rm -rf *.der *.pem *.pfx
modprobe pkcs8_key_parser # if CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
rm -rf /tmp/data
dd if=/dev/random of=/tmp/data count=1 bs=20
openssl req -nodes -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -subj "/C=CN/ST=BJ/L=HD/O=qemu/OU=dev/CN=qemu/emailAddress=qemu@qemu.org"
openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER -out key.der
openssl x509 -in cert.pem -inform PEM -outform DER -out cert.der
PRIV_KEY_ID=`cat key.der | keyctl padd asymmetric test_priv_key @s`
echo "priv key id = "$PRIV_KEY_ID
PUB_KEY_ID=`cat cert.der | keyctl padd asymmetric test_pub_key @s`
echo "pub key id = "$PUB_KEY_ID
keyctl pkey_query $PRIV_KEY_ID 0
keyctl pkey_query $PUB_KEY_ID 0
echo "Enc with priv key..."
keyctl pkey_encrypt $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.priv
echo "Dec with pub key..."
keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.priv enc=pkcs1 >/tmp/dec
cmp /tmp/data /tmp/dec
echo "Sign with priv key..."
keyctl pkey_sign $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 hash=sha1 > /tmp/sig
echo "Verify with pub key..."
keyctl pkey_verify $PRIV_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
echo "Enc with pub key..."
keyctl pkey_encrypt $PUB_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.pub
echo "Dec with priv key..."
keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.pub enc=pkcs1 >/tmp/dec
cmp /tmp/data /tmp/dec
echo "Verify with pub key..."
keyctl pkey_verify $PUB_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: lei he <helei.sig11@bytedance.com
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20220611064243.24535-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With address space switch supported, dma access translation only
happen after endpoint is attached to a non-bypass domain.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-4-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When switching address space with mutex lock hold, mapping will be
replayed for assigned device. This will trigger relock deadlock.
Also release the mutex resource in unrealize routine.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-3-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently assigned devices can not work in virtio-iommu bypass mode.
Guest driver fails to probe the device due to DMA failure. And the
reason is because of lacking GPA -> HPA mappings when VM is created.
Add a root container memory region to hold both bypass memory region
and iommu memory region, so the switch between them is supported
just like the implementation in virtual VT-d.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220613061010.2674054-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In fetch_or_create_notifier, idx begins with 0. So the GPtrArray size
should be idx + 1 and g_ptr_array_set_size should be called with idx + 1.
This wrong GPtrArray size causes fetch_or_create_notifier return an invalid
address. Passing this invalid pointer to vhost_user_host_notifier_remove
causes assert fail:
qemu/include/qemu/int128.h:27: int128_get64: Assertion `r == a' failed.
shutting down, reason=crashed
Backends like dpdk-vdpa which sends out vhost notifier requests almost always
hit qemu crash.
Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Change-Id: I87e0f7591ca9a59d210879b260704a2d9e9d6bcd
Message-Id: <20220526034851.683258-1-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Eddie Dong <eddie.dong@intel.com>
Make virtio_mmio_soft_reset reset the virtio device, which is performed by
both the "soft" and the "hard" reset; and then call virtio_mmio_soft_reset
from virtio_mmio_reset to emphasize that the latter is a superset of the
former.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All calls to virtio_bus_reset are preceded by virtio_bus_stop_ioeventfd,
move the call in virtio_bus_reset: that makes sense and clarifies
that the vdc->reset function is called with ioeventfd already stopped.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the queue PFN is set to zero on a virtio-mmio device, the device is reset.
In that case however the virtio_bus_stop_ioeventfd function was not
called; add it so that the behavior is similar to when status is set to 0.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Variable `vdev` in `struct vhost_dev` will not be ready
until start the device, so let's not use it for the error
output here.
Fixes: 5653493 ("hw/virtio/vhost-user: don't suppress F_CONFIG when supported")
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Message-Id: <20220525125540.50979-1-changpeng.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 5653493 ("hw/virtio/vhost-user: don't suppress F_CONFIG when supported")
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Fixes: 5653493 ("hw/virtio/vhost-user: don't suppress F_CONFIG when supported")
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
The name vhost_vdpa_one_time_request() was confusing. No
matter whatever it returns, its typical occurrence had
always been at requests that only need to be applied once.
And the name didn't suggest what it actually checks for.
Change it to vhost_vdpa_first_dev() with polarity flipped
for better readibility of code. That way it is able to
reflect what the check is really about.
This call is applicable to request which performs operation
only once, before queues are set up, and usually at the beginning
of the caller function. Document the requirement for it in place.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1651890498-24478-7-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The vhost_vdpa_one_time_request() branch in
vhost_vdpa_set_backend_cap() incorrectly sends down
ioctls on vhost_dev with non-zero index. This may
end up with multiple VHOST_SET_BACKEND_FEATURES
ioctl calls sent down on the vhost-vdpa fd that is
shared between all these vhost_dev's.
To fix it, send down ioctl only once via the first
vhost_dev with index 0. Toggle the polarity of the
vhost_vdpa_one_time_request() test should do the
trick.
Fixes: 4d191cfdc7 ("vhost-vdpa: classify one time request")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <1651890498-24478-6-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds a get_vhost() callback function for VirtIODevices that
returns the device's corresponding vhost_dev structure, if the vhost
device is running. This patch also adds a vhost_started flag for
VirtIODevices.
Previously, a VirtIODevice wouldn't be able to tell if its corresponding
vhost device was active or not.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1648819405-25696-3-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch drops the name parameter for the virtio_init function.
The pair between the numeric device ID and the string device ID
(name) of a virtio device already exists, but not in a way that
lets us map between them.
This patch lets us do this and removes the need for the name
parameter in the virtio_init function.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1648819405-25696-2-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
At a couple of hundred bytes per notifier allocating one for every
potential queue is very wasteful as most devices only have a few
queues. Instead of having this handled statically dynamically assign
them and track in a GPtrArray.
[AJB: it's hard to trigger the vhost notifiers code, I assume as it
requires a KVM guest with appropriate backend]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-14-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Previously we would silently suppress VHOST_USER_PROTOCOL_F_CONFIG
during the protocol negotiation if the QEMU stub hadn't implemented
the vhost_dev_config_notifier. However this isn't the only way we can
handle config messages, the existing vdc->get/set_config can do this
as well.
Lightly re-factor the code to check for both potential methods and
instead of silently squashing the feature error out. It is unlikely
that a vhost-user backend expecting to handle CONFIG messages will
behave correctly if they never get sent.
Fixes: 1c3e5a2617 ("vhost-user: back SET/GET_CONFIG requests with a protocol feature")
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-13-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These are useful when trying to debug the initial vhost-user
negotiation, especially when it hard to get logging from the low level
library on the other side.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-4-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200925125147.26943-6-alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-3-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows other device classes that will be exposed via PCI to be
able to do so in the appropriate hw/ directory. I resisted the
temptation to re-order headers to be more aesthetically pleasing.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200925125147.26943-4-alex.bennee@linaro.org>
Message-Id: <20220321153037.3622127-2-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Coverity rightly reports that is not free in that case.
Fixes: Coverity CID 1487559
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-7-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
The old API took the size of the memory to duplicate as a guint,
whereas most memory functions take memory sizes as a gsize. This
made it easy to accidentally pass a gsize to g_memdup(). For large
values, that would lead to a silent truncation of the size from 64
to 32 bits, and result in a heap area being returned which is
significantly smaller than what the caller expects. This can likely
be exploited in various modules to cause a heap buffer overflow.
Replace g_memdup() by the safer g_memdup2() wrapper.
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20220512175747.142058-6-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With the introduction of MQ the index of the vq needs to be calculated
with the device model vq_index.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220512175747.142058-5-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Only the first one of them were properly enqueued back.
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-3-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The device could have access to modify them, and it definitely have
access when we implement packed vq. Harden SVQ maintaining a private
copy of the descriptor chain. Other fields like buffer addresses are
already maintained sepparatedly.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220512175747.142058-2-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Unlike most virtio features ACCESS_PLATFORM is considered mandatory by
QEMU, i.e. the driver must accept it if offered by the device. The
virtio specification says that the driver SHOULD accept the
ACCESS_PLATFORM feature if offered, and that the device MAY fail to
operate if ACCESS_PLATFORM was offered but not negotiated.
While a SHOULD ain't exactly a MUST, we are certainly allowed to fail
the device when the driver fences ACCESS_PLATFORM. With commit
2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM") we already made the
decision to do so whenever the get_dma_as() callback is implemented (by
the bus), which in practice means for the entirety of virtio-pci.
That means, if the device needs to translate I/O addresses, then
ACCESS_PLATFORM is mandatory. The aforementioned commit tells us in the
commit message that this is for security reasons. More precisely if we
were to allow a less then trusted driver (e.g. an user-space driver, or
a nested guest) to make the device bypass the IOMMU by not negotiating
ACCESS_PLATFORM, then the guest kernel would have no ability to
control/police (by programming the IOMMU) what pieces of guest memory
the driver may manipulate using the device. Which would break security
assumptions within the guest.
If ACCESS_PLATFORM is offered not because we want the device to utilize
an IOMMU and do address translation, but because the device does not
have access to the entire guest RAM, and needs the driver to grant
access to the bits it needs access to (e.g. confidential guest support),
we still require the guest to have the corresponding logic and to accept
ACCESS_PLATFORM. If the driver does not accept ACCESS_PLATFORM, then
things are bound to go wrong, and we may see failures much less graceful
than failing the device because the driver didn't negotiate
ACCESS_PLATFORM.
So let us make ACCESS_PLATFORM mandatory for the driver regardless
of whether the get_dma_as() callback is implemented or not.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM")
Message-Id: <20220307112939.2780117-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
The vsock callbacks .vhost_vsock_set_guest_cid and
.vhost_vsock_set_running are the only ones to be conditional
on #ifdef CONFIG_VHOST_VSOCK. This is different from any other
device-dependent callbacks like .vhost_scsi_set_endpoint, and it
also broke when CONFIG_VHOST_VSOCK was changed to a per-target
symbol.
It would be possible to also use the CONFIG_DEVICES include, but
really there is no reason for most virtio files to be per-target
so just remove the #ifdef to fix the issue.
Reported-by: Dov Murik <dovmurik@linux.ibm.com>
Fixes: 9972ae314f ("build: move vhost-vsock configuration to Kconfig")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The virtio-scsi event virtqueue is not emptied by its handler function.
This is typical for rx virtqueues where the device uses buffers when
some event occurs (e.g. a packet is received, an error condition
happens, etc).
Polling non-empty virtqueues wastes CPU cycles. We are not waiting for
new buffers to become available, we are waiting for an event to occur,
so it's a misuse of CPU resources to poll for buffers.
Introduce the new virtio_queue_aio_attach_host_notifier_no_poll() API,
which is identical to virtio_queue_aio_attach_host_notifier() except
that it does not poll the virtqueue.
Before this patch the following command-line consumed 100% CPU in the
IOThread polling and calling virtio_scsi_handle_event():
$ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \
--object iothread,id=iothread0 \
--device virtio-scsi-pci,iothread=iothread0 \
--blockdev file,filename=test.img,aio=native,cache.direct=on,node-name=drive0 \
--device scsi-hd,drive=drive0
After this patch CPU is no longer wasted.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20220427143541.119567-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
vhost-user-fs is a device and it should be possible to enable/disable
it with --without-default-devices, not --without-default-features.
Compute its default value in Kconfig to obtain the more intuitive
behavior.
In this case the configure options were undocumented, too.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vhost-vsock and vhost-user-vsock are two devices of their own; it should
be possible to enable/disable them with --without-default-devices, not
--without-default-features. Compute their default value in Kconfig to
obtain the more intuitive behavior.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since a sourceset already exists for this, avoid unnecessary repeat
of CONFIG_VIRTIO_PCI.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The correct name of the macro is TARGET_PPC64.
Fixes: 27598393a2 ("Lift max memory slots limit imposed by vhost-user")
Reported-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Peter Turschmid <peter.turschm@nutanix.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20220503180108.34506-1-muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The spec clarifies now that QEMU should not send a file descriptor in a
request to remove a memory region. Change it accordingly.
For libvhost-user, this is a bug fix that makes it compatible with
rust-vmm's implementation that doesn't send a file descriptor. Keep
accepting, but ignoring a file descriptor for compatibility with older
QEMU versions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220407133657.155281-4-kwolf@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qemu_*block() functions are meant to be be used with sockets (the
win32 implementation expects SOCKET)
Over time, those functions where used with Win32 SOCKET or
file-descriptors interchangeably. But for portability, they must only be
used with socket-like file-descriptors. FDs can use
g_unix_set_fd_nonblocking() instead.
Rename the functions with "socket" in the name to prevent bad usages.
This is effectively reverting commit f9e8cacc55 ("oslib-posix:
rename socket_set_nonblock() to qemu_set_nonblock()").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Those calls are non-socket fd, or are POSIX-specific. Use the dedicated
GLib API. (qemu_set_nonblock() is for socket-like)
(this is a preliminary patch before renaming qemu_set_nonblock())
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
This header only defines the tcg_allowed variable and the tcg_enabled()
function - which are not required in many files that include this
header. Drop the #include statement there.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220315144107.1012530-1-thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* whpx support for breakpoints and stepping
* initial support for Hyper-V Synthetic Debugging
* use monotonic clock for QemuCond and QemuSemaphore
* Remove qemu-common.h include from most units and lots of other clenaups
* do not include headers for all virtio devices in virtio-ccw.h
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJXCQAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNT6wf+NHDJUEdDiwaVGVTGXgHuiaycsymi
FpNPiw/+XxSGN5xF3fkUGgqaDrcwIYwVfnXlghKSz8kp1cP3cjxa5CzNMLGTp5je
N6BxFbD7yC6dhagGm3mj32jlsptv3M38OHqKc3t+RaUAotP5RF2VdCyfUBLG6vU0
aMzvMfMtB5aG0D8Fr5EV63t1JMTceFU0YxsG73UCFs2Yx4Z0cGBbNxMbHweRhd1q
tPeVDS46MFPM3/2cGGHpeeqxkoCTU7A9j1VuNQI3k+Kg+6W5YVxiK/UP7bw77E/a
yAHsmIVTNro8ajMBch73weuHtGtdfFLvCKc6QX6aVjzK4dF1voQ01E7gPQ==
=rMle
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* Add cpu0-id to query-sev-capabilities
* whpx support for breakpoints and stepping
* initial support for Hyper-V Synthetic Debugging
* use monotonic clock for QemuCond and QemuSemaphore
* Remove qemu-common.h include from most units and lots of other clenaups
* do not include headers for all virtio devices in virtio-ccw.h
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJXCQAUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNT6wf+NHDJUEdDiwaVGVTGXgHuiaycsymi
# FpNPiw/+XxSGN5xF3fkUGgqaDrcwIYwVfnXlghKSz8kp1cP3cjxa5CzNMLGTp5je
# N6BxFbD7yC6dhagGm3mj32jlsptv3M38OHqKc3t+RaUAotP5RF2VdCyfUBLG6vU0
# aMzvMfMtB5aG0D8Fr5EV63t1JMTceFU0YxsG73UCFs2Yx4Z0cGBbNxMbHweRhd1q
# tPeVDS46MFPM3/2cGGHpeeqxkoCTU7A9j1VuNQI3k+Kg+6W5YVxiK/UP7bw77E/a
# yAHsmIVTNro8ajMBch73weuHtGtdfFLvCKc6QX6aVjzK4dF1voQ01E7gPQ==
# =rMle
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 13 Apr 2022 10:31:44 AM PDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (53 commits)
target/i386: Remove unused XMMReg, YMMReg types and CPUState fields
target/i386: do not access beyond the low 128 bits of SSE registers
virtio-ccw: do not include headers for all virtio devices
virtio-ccw: move device type declarations to .c files
virtio-ccw: move vhost_ccw_scsi to a separate file
s390x: follow qdev tree to detect SCSI device on a CCW bus
hw: hyperv: Initial commit for Synthetic Debugging device
hyperv: Add support to process syndbg commands
hyperv: Add definitions for syndbg
hyperv: SControl is optional to enable SynIc
thread-posix: optimize qemu_sem_timedwait with zero timeout
thread-posix: implement Semaphore with QemuCond and QemuMutex
thread-posix: use monotonic clock for QemuCond and QemuSemaphore
thread-posix: remove the posix semaphore support
whpx: Added support for breakpoints and stepping
build-sys: simplify AF_VSOCK check
build-sys: drop ntddscsi.h check
Remove qemu-common.h include from most units
qga: remove explicit environ argument from exec/spawn
Move fcntl_setfl() to oslib-posix
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
A potential Use-after-free was reported in virtio_iommu_handle_command
when using virtio-iommu:
> I find a potential Use-after-free in QEMU 6.2.0, which is in
> virtio_iommu_handle_command() (./hw/virtio/virtio-iommu.c).
>
>
> Specifically, in the loop body, the variable 'buf' allocated at line 639 can be
> freed by g_free() at line 659. However, if the execution path enters the loop
> body again and the if branch takes true at line 616, the control will directly
> jump to 'out' at line 651. At this time, 'buf' is a freed pointer, which is not
> assigned with an allocated memory but used at line 653. As a result, a UAF bug
> is triggered.
>
>
>
> 599 for (;;) {
> ...
> 615 sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
> 616 if (unlikely(sz != sizeof(head))) {
> 617 tail.status = VIRTIO_IOMMU_S_DEVERR;
> 618 goto out;
> 619 }
> ...
> 639 buf = g_malloc0(output_size);
> ...
> 651 out:
> 652 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> 653 buf ? buf : &tail, output_size);
> ...
> 659 g_free(buf);
>
> We can fix it by set ‘buf‘ to NULL after freeing it:
>
>
> 651 out:
> 652 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> 653 buf ? buf : &tail, output_size);
> ...
> 659 g_free(buf);
> +++ buf = NULL;
> 660 }
Fix as suggested by the reporter.
Signed-off-by: Wentao Liang <Wentao_Liang_g@163.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20220407095047.50371-1-mst@redhat.com
Message-ID: <20220406040445-mutt-send-email-mst@kernel.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.
This avoids the need for a constructor initialization too.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace a config-time define with a compile time condition
define (compatible with clang and gcc) that must be declared prior to
its usage. This avoids having a global configure time define, but also
prevents from bad usage, if the config header wasn't included before.
This can help to make some code independent from qemu too.
gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[ For the s390x parts I'm involved in ]
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
GLib g_get_real_time() is an alternative to gettimeofday() which allows
to simplify our code.
For semihosting, a few bits are lost on POSIX host, but this shouldn't
be a big concern.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220307070401.171986-5-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The vhost-shadow-virtqueue.c build requires include files from
linux-headers/, so it cannot be built on non-Linux systems.
Fortunately it is only needed by vhost-vdpa, so move it there.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
*opaque is an alias to *obj. Using the ladder makes the code consistent with
with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It also
makes the cast more typesafe.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220301222301.103821-2-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* VSS header fixes (Marc-André)
* 5-level EPT support (Vitaly)
* AMX support (Jing Liu & Yang Zhong)
* Bundle changes to MSI routes (Longpeng)
* More precise emulation of #SS (Gareth)
* Disable ASAN testing
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIwb5QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOOUQf8DiNcq8XVVMdX946Qwa4pSxc4ZJtF
X+RkNsscluuLJ2vGEFKwPVps6c6UPqAhXUruZOQmcLmma511MsyJrxyfd4iRgPD2
tL1+n4RpfsbnTEGT8c6TFWWMEIOjLTbKmR/SIxuxpeVG3xlk6tlCevykrIdc90gP
vQIByTGFx3GwiPyDo0j92mA/CsWLnfq6zQ2Tox1xCyt8R+QDimqG0KGLc5RAyiyC
ZmilN2yaqizDfkIzinwHG6gP1NGwVUsrUNl4X9C4mwEMFnsXiyKP5n/BlDZ7w4Wb
QXalFpPg1hJxRGGvyta6OF9VmCfmK9Q0FNVWm1lPE5adn3ECHFo6FJKvfg==
=LVgf
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* whpx fixes in preparation for GDB support (Ivan)
* VSS header fixes (Marc-André)
* 5-level EPT support (Vitaly)
* AMX support (Jing Liu & Yang Zhong)
* Bundle changes to MSI routes (Longpeng)
* More precise emulation of #SS (Gareth)
* Disable ASAN testing
# gpg: Signature made Tue 15 Mar 2022 10:51:00 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (22 commits)
gitlab-ci: do not run tests with address sanitizer
KVM: SVM: always set MSR_AMD64_TSC_RATIO to default value
i386: Add Icelake-Server-v6 CPU model with 5-level EPT support
x86: Support XFD and AMX xsave data migration
x86: add support for KVM_CAP_XSAVE2 and AMX state migration
x86: Add AMX CPUIDs enumeration
x86: Add XFD faulting bit for state components
x86: Grant AMX permission for guest
x86: Add AMX XTILECFG and XTILEDATA components
x86: Fix the 64-byte boundary enumeration for extended state
linux-headers: include missing changes from 5.17
target/i386: Throw a #SS when loading a non-canonical IST
target/i386: only include bits in pg_mode if they are not ignored
kvm/msi: do explicit commit when adding msi routes
kvm-irqchip: introduce new API to support route change
update meson-buildoptions.sh
qga/vss: update informative message about MinGW
qga/vss-win32: check old VSS SDK headers
meson: fix generic location of vss headers
vmxcap: Add 5-level EPT bit
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route
table, which is not efficient if we are adding lots of routes in some cases.
This patch lets callers invoke the kvm_irqchip_commit_routes(), so the
callers can decide how to optimize.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20220222141116.2091-3-longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SVQ is able to log the dirty bits by itself, so let's use it to not
block migration.
Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
enabled. Even if the device supports it, the reports would be nonsense
because SVQ memory is in the qemu region.
The log region is still allocated. Future changes might skip that, but
this series is already long enough.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Setting the log address would make the device start reporting invalid
dirty memory because the SVQ vrings are located in qemu's memory.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This is needed to achieve migration, so the destination can restore its
index.
Setting base as last used idx, so destination will see as available all
the entries that the device did not use, including the in-flight
processing ones.
This is ok for networking, but other kinds of devices might have
problems with these retransmissions.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Use translations added in VhostIOVATree in SVQ.
Only introduce usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid commiting too many changes at once. These are impossible to take
at the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This tree is able to look for a translated address from an IOVA address.
At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space need more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).
The allocation capability, as "assign a free IOVA address to this chunk
of memory in qemu's address space" allows shadow virtqueue to create a
new address space that is not restricted by guest's addressable one, so
we can allocate shadow vqs vrings outside of it.
It duplicates the tree so it can search efficiently in both directions,
and it will signal overlap if iova or the translated address is present
in any tree.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Initial version of shadow virtqueue that actually forward buffers. There
is no iommu support at the moment, and that will be addressed in future
patches of this series. Since all vhost-vdpa devices use forced IOMMU,
this means that SVQ is not usable at this point of the series on any
device.
For simplicity it only supports modern devices, that expects vring
in little endian, with split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.
It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.
However, forwarding buffers have some particular pieces: One of the most
unexpected ones is that a guest's buffer can expand through more than
one descriptor in SVQ. While this is handled gracefully by qemu's
emulated virtio devices, it may cause unexpected SVQ queue full. This
patch also solves it by checking for this condition at both guest's
kicks and device's calls. The code may be more elegant in the future if
SVQ code runs in its own iocontext.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
First half of the buffers forwarding part, preparing vhost-vdpa
callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
this is effectively dead code at the moment, but it helps to reduce
patch size.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It reports the shadow virtqueue address from qemu virtual address space.
Since this will be different from the guest's vaddr, but the device can
access it, SVQ takes special care about its alignment & lack of garbage
data. It assumes that IOMMU will work in host_page_size ranges for that.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows SVQ to negotiate features with the guest and the device. For
the device, SVQ is a driver. While this function bypasses all
non-transport features, it needs to disable the features that SVQ does
not support when forwarding buffers. This includes packed vq layout,
indirect descriptors or event idx.
Future changes can add support to offer more features to the guest,
since the use of VirtQueue gives this for free. This is left out at the
moment for simplicity.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This will make qemu aware of the device used buffers, allowing it to
write the guest memory with its contents if needed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
At this mode no buffer forwarding will be performed in SVQ mode: Qemu
will just forward the guest's kicks to the device.
Host memory notifiers regions are left out for simplicity, and they will
not be addressed in this series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
notifications and buffers, allowing qemu to track them. While qemu is
forwarding the buffers and virtqueue changes, it is able to commit the
memory it's being dirtied, the same way regular qemu's VirtIO devices
do.
This commit only exposes basic SVQ allocation and free. Next patches of
the series add functionality like notifications and buffers forwarding.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
When ioeventfd is emulated using qemu_pipe(), only EventNotifier's wfd
can be used for writing.
Use the recently introduced event_notifier_get_wfd() function to
obtain the fd that our peer must use to signal the vring.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220304100854.14829-3-slp@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vhost_vsock_common_send_transport_reset(), if an element popped from
the virtqueue is invalid, we should call virtqueue_detach_element() to
detach it from the virtqueue before freeing its memory.
Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
Fixes: CVE-2022-26354
Cc: qemu-stable@nongnu.org
Reported-by: VictorV <vv474172261@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220228095058.27899-1-sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The driver can create a bypass domain by passing the
VIRTIO_IOMMU_ATTACH_F_BYPASS flag on the ATTACH request. Bypass domains
perform slightly better than domains with identity mappings since they
skip translation.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220214124356.872985-4-jean-philippe@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>