Next patches in this series will no longer perform an
immediate poll and check of the device's used buffers
for each CVQ state load command. Instead, they will
send CVQ state load commands in parallel by polling
multiple pending buffers at once.
To achieve this, this patch refactoring vhost_svq_poll()
to accept a new argument `num`, which allows vhost_svq_poll()
to wait for the device to use multiple elements,
rather than polling for a single element.
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <950b3bfcfc5d446168b9d6a249d554a013a691d4.1693287885.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Doing that way allows CVQ to be enabled before the dataplane vqs,
restoring the state as MQ or MAC addresses properly in the case of a
migration.
The patch does it by defining a ->load NetClientInfo callback also for
dataplane. Ideally, this should be done by an independent patch, but
the function is already static so it would only add an empty
vhost_vdpa_net_data_load stub.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230822085330.3978829-5-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The vhost-vdpa net backend needs to enable vrings in a different order
than default, so export it.
No functional change intended except for tracing, that now includes the
(virtio) index being enabled and the return value of the ioctl.
Still ignoring return value of this function if called from
vhost_vdpa_dev_start, as reorganize calling code around it is out of
the scope of this series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230822085330.3978829-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Lots of virtio functions that are on a hot path in data transmission
are initializing indirect descriptor cache at the point of stack
allocation. It's a 112 byte structure that is getting zeroed out on
each call adding unnecessary overhead. It's going to be correctly
initialized later via special init function. The only reason to
actually initialize right away is the ability to safely destruct it.
Replacing a designated initializer with a function to only initialize
what is necessary.
Removal of the unnecessary stack initializations improves throughput
of virtio-net devices in terms of 64B packets per second by 6-14 %
depending on the case. Tested with a proposed af-xdp network backend
and a dpdk testpmd application in the guest, but should be beneficial
for other virtio devices as well.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20230811143423.3258788-1-i.maximets@ovn.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To use the generic device the user will need to provide the config
region size via the command line. We also add a notifier so the guest
can be pinged if the remote daemon updates the config.
With these changes:
-device vhost-user-device-pci,virtio-id=41,num_vqs=2,config_size=8
is equivalent to:
-device vhost-user-gpio-pci
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-11-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In theory we shouldn't need to repeat so much boilerplate to support
vhost-user backends. This provides a generic vhost-user-base QOM
object and a derived vhost-user-device for which the user needs to
provide the few bits of information that aren't currently provided by
the vhost-user protocol. This should provide a baseline implementation
from which the other vhost-user stub can specialise.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-8-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Similarly to commit de6cd7599b ("meson: Replace softmmu_ss
-> system_ss"), rename the virtio source set common to all
system emulation as 'system_virtio_ss[]'. This is clearer
because softmmu can be used for user emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710100510.84862-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The previous commit removed the dependencies on the
target-specific TARGET_PAGE_FOO macros. We can now
move vhost-vdpa.c to the 'softmmu_virtio_ss' source
set to build it once for all our targets.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710100432.84819-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Similarly to commit e414ed2c47 ("virtio-iommu: Use
target-agnostic qemu_target_page_mask"), Replace the
target-specific TARGET_PAGE_SIZE and TARGET_PAGE_MASK
definitions by a call to the runtime qemu_target_page_size()
helper which is target agnostic.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230710094931.84402-5-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use TARGET_PAGE_SIZE to calculate TARGET_PAGE_ALIGN
(see the rationale in previous commits).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-4-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Propagate TARGET_PAGE_MASK (see the previous commit for
rationale).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-3-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In order to make vhost-vdpa.c a target-agnostic source unit,
we need to remove the TARGET_PAGE_SIZE / TARGET_PAGE_MASK /
TARGET_PAGE_ALIGN uses. TARGET_PAGE_SIZE will be replaced by
the runtime qemu_target_page_size(). The other ones will be
deduced from TARGET_PAGE_SIZE.
Since the 3 macros are used in 3 related functions (sharing
the same call tree), we'll refactor them to only depend on
TARGET_PAGE_MASK.
Having the following call tree:
vhost_vdpa_listener_region_del()
-> vhost_vdpa_listener_skipped_section()
-> vhost_vdpa_section_end()
The first step is to propagate TARGET_PAGE_MASK to
vhost_vdpa_listener_skipped_section().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-2-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.
However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().
Without this fix, the following crash was reported:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007efc738477f3 in __GI_abort () at abort.c:79
#4 0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
#5 0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
#6 0x0000560aebcd8dd6 in bdrv_register_buf ()
#7 0x0000560aeb97ed97 in ram_block_added.llvm ()
#8 0x0000560aebb8303f in ram_block_add.llvm ()
#9 0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
#10 0x0000560aebb2ac98 in vfio_region_mmap ()
#11 0x0000560aebb3ea0f in vfio_bars_register ()
#12 0x0000560aebb3c628 in vfio_realize ()
#13 0x0000560aeb90f0c2 in pci_qdev_realize ()
#14 0x0000560aebc40305 in device_set_realized ()
#15 0x0000560aebc48e07 in property_set_bool.llvm ()
#16 0x0000560aebc46582 in object_property_set ()
#17 0x0000560aebc4cd58 in object_property_set_qobject ()
#18 0x0000560aebc46ba7 in object_property_set_bool ()
#19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
#20 0x0000560aebb1fbaf in virtio_net_set_features ()
#21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
#22 0x0000560aebb47107 in virtio_load ()
#23 0x0000560aeb9ae7ce in vmstate_load_state ()
#24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
#25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
#26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
#27 0x0000560aebeace56 in coroutine_trampoline.llvm ()
Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905145002.46391-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For symmetric algorithms, the length of ciphertext must be as same
as the plaintext.
The missing verification of the src_len and the dst_len in
virtio_crypto_sym_op_helper() may lead buffer overflow/divulged.
This patch is originally written by Yiming Tao for QEMU-SECURITY,
resend it(a few changes of error message) in qemu-devel.
Fixes: CVE-2023-3180
Fixes: 04b9b37edda("virtio-crypto: add data queue processing handler")
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Yiming Tao <taoym@zju.edu.cn>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230803024314.29962-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the vhost-user reconnect to the backend, the notifer should be
cleanup. Otherwise, the fd resource will be exhausted.
Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20230731121018.2856310-2-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
virtio_queue_packed_set_last_avail_idx() is used by vhost devices to set
the internal queue indices to what has been reported by the vhost
back-end through GET_VRING_BASE. For packed virtqueues, this
32-bit value is expected to contain both the device's internal avail and
used indices, as well as their respective wrap counters.
To get the used index, we shift the 32-bit value right by 16, and then
apply a mask of 0x7ffff. That seems to be a typo, because it should be
0x7fff; first of all, the virtio specification says that the maximum
queue size for packed virt queues is 2^15, so the indices cannot exceed
2^15 - 1 anyway, making 0x7fff the correct mask. Second, the mask
clearly is wrong from context, too, given that (A) `idx & 0x70000` must
be 0 at this point (`idx` is 32 bit and was shifted to the right by 16
already), (B) `idx & 0x8000` is the used_wrap_counter, so should not be
part of the used index, and (C) `vq->used_idx` is a `uint16_t`, so
cannot fit the 0x70000 part of the mask anyway.
This most likely never produced any guest-visible bugs, though, because
for a vhost device, qemu will probably not evaluate the used index
outside of virtio_queue_packed_get_last_avail_idx(), where we
reconstruct the 32-bit value from avail and used indices and their wrap
counters again. There, it does not matter whether the highest bit of
the used_idx is the used index wrap counter, because we put the wrap
counter exactly in that position anyway.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230721134945.26967-1-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20230721072820.75797-1-david.edmondson@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The QEMU CI fails in virtio-scmi test occasionally. As reported by
Thomas Huth, this happens most likely when the system is loaded and it
fails with the following error:
qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
As discovered by Fabiano Rosas, the cause is a duplicate invocation of
msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_disconnect
...
qemu_chr_write_buffer
msix_unset_vector_notifiers
virtio_pci_set_guest_notifiers
vu_scmi_stop
vu_scmi_set_status
...
qemu_cleanup
While vu_scmi_stop calls are protected by vhost_dev_is_started()
check, it's apparently not enough. vhost-user-blk and vhost-user-gpio
use an extra protection, see f5b22d06fb (vhost: recheck dev state in
the vhost_migration_log routine) for the motivation. Let's use the
same in vhost-user-scmi, which fixes the failure above.
Fixes: a5dab090e1 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Message-Id: <20230720101037.2161450-1-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
At several locations we compute the granule from the config
page_size_mask using ctz() and then format it in traces using
BIT(). As the page_size_mask is 64b we should use ctz64 and
BIT_ULL() for formatting. We failed to be consistent.
Note the page_size_mask is garanteed to be non null. The spec
mandates the device to set at least one bit, so ctz64 cannot
return 64. This is garanteed by the fact the device
initializes the page_size_mask to qemu_target_page_mask()
and then the page_size_mask is further constrained by
virtio_iommu_set_page_size_mask() callback which can't
result in a new mask being null. So if Coverity complains
round those ctz64/BIT_ULL with CID 1517772 this is a false
positive
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 94df5b2180 ("virtio-iommu: Fix 64kB host page size VFIO device assignment")
Message-Id: <20230718182136.40096-1-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
In the virtio_iommu_handle_command() when a PROBE request is handled,
output_size takes a value greater than the tail size and on a subsequent
iteration we can get a stack out-of-band access. Initialize the
output_size on each iteration.
The issue was found with ASAN. Credits to:
Yiming Tao(Zhejiang University)
Gaoning Pan(Zhejiang University)
Fixes: 1733eebb9e ("virtio-iommu: Implement RESV_MEM probe request")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20230717162126.11693-1-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's support device unplug by forwarding the unplug_request_check()
callback to the virtio-mem device.
Further, disallow changing the requested-size once an unplug request is
pending.
Disallowing requested-size changes handles corner cases such as
(1) pausing the VM (2) requesting device unplug and (3) adjusting the
requested size. If the VM would plug memory (due to the requested size
change) before processing the unplug request, we would be in trouble.
Message-ID: <20230711153445.514112-8-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
In many cases, blindly unplugging a virtio-mem device is problematic. We
can only safely remove a device once:
* The guest is not expecting to be able to read unplugged memory
(unplugged-inaccessible == on)
* The virtio-mem device does not have memory plugged (size == 0)
* The virtio-mem device does not have outstanding requests to the VM to
plug memory (requested-size == 0)
So let's add a callback to the virtio-mem device class to check for that.
We'll wire-up virtio-mem-pci next.
Message-ID: <20230711153445.514112-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's support unplug requests for virtio-md-pci devices that provide
a unplug_request_check() callback.
We'll wire that up for virtio-mem-pci next.
Message-ID: <20230711153445.514112-6-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While we fence unplug requests from the outside, the VM can still
trigger unplug of virtio based memory devices, for example, in Linux
doing on a virtio-mem-pci device:
# echo 0 > /sys/bus/pci/slots/3/power
While doing that is not really expected to work without harming the
guest OS (e.g., removing a virtio-mem device while it still provides
memory), let's make sure that we properly handle it on the QEMU side.
We'll add support for unplugging of virtio-mem devices in some
configurations next.
Message-ID: <20230711153445.514112-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's factor out (un)plug handling, to be reused from arm/virt code.
Provide stubs for the case that CONFIG_VIRTIO_MD is not selected because
neither virtio-mem nor virtio-pmem is enabled. While this cannot
currently happen for x86, it will be possible for arm/virt.
Message-ID: <20230711153445.514112-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add a new abstract "virtio memory device" type, and use it as
parent class of virtio-mem-pci and virtio-pmem-pci.
Message-ID: <20230711153445.514112-2-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.
As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.
The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
memory is plugged (which is the case before loading the device state).
Note that something like VM templating -- see commit b17fbbe55c
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.
For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
virtio-mem device to the new VM instances using proper own memory
backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
template (requested-size=0) and use private anonymous memory.
Message-ID: <20230706075612.67404-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Already when starting QEMU we perform one system reset that ends up
triggering virtio_mem_unplug_all() with no actual memory plugged yet.
That, in turn will trigger ram_block_discard_range() and perform some
other actions that are not required in that case.
Let's optimize virtio_mem_unplug_all() for the case that no memory is
plugged. This will be beneficial for x-ignore-shared support as well.
Message-ID: <20230706075612.67404-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
QEMU uses vhost_svq_translate_addr() to translate addresses
between the QEMU's virtual address and the SVQ IOVA. In order
to validate this translation, QEMU checks whether the translated
range falls within the mapped range.
Yet the problem is that, the value of `needle_last`, which is calculated
by `needle.translated_addr + iovec[i].iov_len`, should represent the
exclusive boundary of the translated range, rather than the last
inclusive addresses of the range. Consequently, QEMU fails the check
when the translated range matches the size of the mapped range.
This patch solves this problem by fixing the `needle_last` value to
the last inclusive address of the translated range.
Note that this bug cannot be triggered at the moment, because QEMU
is unable to translate such a big range due to the truncation of
the CVQ command in vhost_vdpa_net_handle_ctrl_avail().
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <ee31c5420ffc8e6a29705ddd30badb814ddbae1d.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 544f0278af (virtio: introduce macro VIRTIO_CONFIG_IRQ_IDX)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-4-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The current error messages in virtio_iommu_set_page_size_mask()
sound quite similar for different situations and miss the IOMMU
memory region that causes the issue.
Clarify them and rework the comment.
Also remove the trace when the new page_size_mask is not applied as
the current frozen granule is kept. This message is rather confusing
for the end user and anyway the current granule would have been used
by the driver.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20230705165118.28194-3-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
When running on a 64kB page size host and protecting a VFIO device
with the virtio-iommu, qemu crashes with this kind of message:
qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
with mask 0x20010000
qemu: hardware error: vfio: DMA mapping failed, unable to continue
This is due to the fact the IOMMU MR corresponding to the VFIO device
is enabled very late on domain attach, after the machine init.
The device reports a minimal 64kB page size but it is too late to be
applied. virtio_iommu_set_page_size_mask() fails and this causes
vfio_listener_region_add() to end up with hw_error();
To work around this issue, we transiently enable the IOMMU MR on
machine init to collect the page size requirements and then restore
the bypass state.
Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned device")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20230705165118.28194-2-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
With TPM CRM device, vhost-vdpa reports an error when it tries
to register a listener for a non aligned memory region:
qemu-system-x86_64: vhost_vdpa_listener_region_add received unaligned region
qemu-system-x86_64: vhost_vdpa_listener_region_del received unaligned region
This error can be confusing for the user whereas we only need to skip
the region (as it's already done after the error_report())
Rather than introducing a special case for TPM CRB memory section
to not display the message in this case, simply replace the
error_report() by a trace function (with more information, like the
memory region name).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20230704071931.575888-2-lvivier@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A device reset is issued per device, not per VQ. The legacy device reset
message, VHOST_USER_RESET_OWNER, is already a per device message. Therefore,
this change adds the proper message, VHOST_USER_RESET_DEVICE, to per device
messages.
Signed-off-by: Tom Lonergan <tom.lonergan@nutanix.com>
Message-Id: <20230628163927.108171-3-tom.lonergan@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Some devices, like virtio-scsi, consist of one vhost_dev, while others, like
virtio-net, contain multiple vhost_devs. The QEMU vhost-user code has a
concept of one-time messages which is misleading. One-time messages are sent
once per operation on the device, not once for the lifetime of the device.
Therefore, as discussed in [1], vhost_user_one_time_request should be
renamed to vhost_user_per_device_request and the relevant comments updated
to match the real functionality.
[1] https://lore.kernel.org/qemu-devel/20230127083027-mutt-send-email-mst@kernel.org/
Signed-off-by: Tom Lonergan <tom.lonergan@nutanix.com>
Message-Id: <20230628163927.108171-2-tom.lonergan@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
This allows is to instantiate a vhost-user-scmi device as part of a PCI bus.
It is mostly boilerplate similar to the other vhost-user-*-pci boilerplates
of similar devices.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Message-Id: <20230628100524.342666-3-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This creates the QEMU side of the vhost-user-scmi device which connects to
the remote daemon. It is based on code of similar vhost-user devices.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Message-Id: <20230628100524.342666-2-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The guest can disable or never enable Device-TLB. In these cases,
it can't be used even if enabled in QEMU. So, check Device-TLB state
before registering IOMMU notifier and select unmap flag depending on
that. Also, implement a way to change IOMMU notifier flag if Device-TLB
state is changed.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001312
Signed-off-by: Viktor Prutyanov <viktor@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230626091258.24453-2-viktor@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is always 0 and it is not useful to route call through file
descriptor.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230526153736.472443-1-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value
for memory listener to replace the hard-coded value 10 for the
device backend.
No functional change intended.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Slave/master nomenclature was replaced with backend/frontend in commit
1fc19b6527 ("vhost-user: Adopt new backend naming")
This patch replaces all remaining uses of master and slave in the
codebase.
Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230613080849.2115347-1-manos.pitsidianakis@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
in vhost_dev_enable_notifiers(), if virtio_bus_set_host_notifier(true)
fails, we call vhost_dev_disable_notifiers() that executes
virtio_bus_set_host_notifier(false) on all queues, even on queues that
have failed to be initialized.
This triggers a core dump in memory_region_del_eventfd():
virtio_bus_set_host_notifier: unable to init event notifier: Too many open files (-24)
vhost VQ 1 notifier binding failed: 24
.../softmmu/memory.c:2611: memory_region_del_eventfd: Assertion `i != mr->ioeventfd_nb' failed.
Fix the problem by providing to vhost_dev_disable_notifiers() the
number of queues to disable.
Fixes: 8771589b6f ("vhost: simplify vhost_dev_enable_notifiers")
Cc: longpeng2@huawei.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20230602162735.3670785-1-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The vdpa devices that use va addresses neeeds these maps shared.
Otherwise, vhost_vdpa checks will refuse to accept the maps.
The mmap call will always return a page aligned address, so removing the
qemu_memalign call. Keeping the ROUND_UP for the size as we still need
to DMA-map them in full.
Not applying fixes tag as it never worked with va devices.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230602143854.1879091-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's separate plug and unplug handling to prepare for future changes
and make the code a bit easier to read -- working on block states
(plugged/unplugged) instead of on a bitmap.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230523183036.517957-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_dev_start function does not release virtqueue objects when
event_notifier_init() function fails. Release virtqueue objects
and log a message about function failure.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Message-Id: <20230529114333.31686-3-ppandit@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: qemu-stable@nongnu.org
Acked-by: Jason Wang <jasowang@redhat.com>
vhost_dev_start function does not release memory_listener object
in case of an error. This may crash the guest when vhost is unable
to set memory table:
stack trace of thread 125653:
Program terminated with signal SIGSEGV, Segmentation fault
#0 memory_listener_register (qemu-kvm + 0x6cda0f)
#1 vhost_dev_start (qemu-kvm + 0x699301)
#2 vhost_net_start (qemu-kvm + 0x45b03f)
#3 virtio_net_set_status (qemu-kvm + 0x665672)
#4 qmp_set_link (qemu-kvm + 0x548fd5)
#5 net_vhost_user_event (qemu-kvm + 0x552c45)
#6 tcp_chr_connect (qemu-kvm + 0x88d473)
#7 tcp_chr_new_client (qemu-kvm + 0x88cf83)
#8 tcp_chr_accept (qemu-kvm + 0x88b429)
#9 qio_net_listener_channel_func (qemu-kvm + 0x7ac07c)
#10 g_main_context_dispatch (libglib-2.0.so.0 + 0x54e2f)
Release memory_listener objects in the error path.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Message-Id: <20230529114333.31686-2-ppandit@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Fixes: c471ad0e9b ("vhost_net: device IOTLB support")
Cc: qemu-stable@nongnu.org
Acked-by: Jason Wang <jasowang@redhat.com>
The previous commit remove the unnecessary "virtio-access.h"
header. These files no longer have target-specific dependency.
Move them to the generic 'softmmu_ss' source set.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230524093744.88442-11-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
None of these files use the VirtIO Load/Store API declared
by "hw/virtio/virtio-access.h". This header probably crept
in via copy/pasting, remove it.
Note, "virtio-access.h" is target-specific, so any file
including it also become tainted as target-specific.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230524093744.88442-10-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
In order to have virtio-iommu.c become target-agnostic,
we need to avoid using TARGET_PAGE_MASK. Get it with the
qemu_target_page_mask() helper.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20230524093744.88442-9-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Instead of having "virtio/virtio-bus.h" implicitly included,
explicitly include it, to avoid when rearranging headers:
hw/virtio/vhost-vsock-common.c: In function ‘vhost_vsock_common_start’:
hw/virtio/vhost-vsock-common.c:51:5: error: unknown type name ‘VirtioBusClass’; did you mean ‘VirtioDeviceClass’?
51 | VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
| ^~~~~~~~~~~~~~
| VirtioDeviceClass
hw/virtio/vhost-vsock-common.c:51:25: error: implicit declaration of function ‘VIRTIO_BUS_GET_CLASS’; did you mean ‘VIRTIO_DEVICE_CLASS’? [-Werror=implicit-function-declaration]
51 | VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
| ^~~~~~~~~~~~~~~~~~~~
| VIRTIO_DEVICE_CLASS
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230524093744.88442-8-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Avoid accessing RAMBlock internals, use the provided
qemu_ram_get_fd() getter to get the file descriptor.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230524093744.88442-7-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Instead of adding 'vhost-vsock-common.c' twice (for VHOST_VSOCK
and VHOST_USER_VSOCK), have it depend on VHOST_VSOCK_COMMON,
selected by both symbols.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230524093744.88442-6-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Add asymmetric crypto support in vhost_user backend.
Signed-off-by: Gowrishankar Muthukrishnan <gmuthukrishn@marvell.com>
Message-Id: <20230516083139.2349744-1-gmuthukrishn@marvell.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.
Mechanical change doing:
$ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.
Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().
The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().
Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:
@@
expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
@@
- aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
+ aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
@@
expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
@@
- aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
+ aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-21-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Host notifiers can now use is_external=false since virtio-blk and
virtio-scsi no longer rely on is_external=true for drained sections.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-20-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio_queue_aio_detach_host_notifier() does two things:
1. It removes the fd handler from the event loop.
2. It processes the virtqueue one last time.
The first step can be peformed by any thread and without taking the
AioContext lock.
The second step may need the AioContext lock (depending on the device
implementation) and runs in the thread where request processing takes
place. virtio-blk and virtio-scsi therefore call
virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
AioContext.
The next patch will introduce a .drained_begin() function that needs to
call virtio_queue_aio_detach_host_notifier(). .drained_begin() functions
cannot call aio_poll() to wait synchronously for the BH. It is possible
for a .drained_poll() callback to asynchronously wait for the BH, but
that is more complex than necessary here.
Move the virtqueue processing out to the callers of
virtio_queue_aio_detach_host_notifier() so that the function can be
called from any thread. This is in preparation for the next patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-17-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add virtio-multitouch-pci, a Multitouch-capable input device, to the
list of devices that can be provided by virtio-input-pci.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230526112925.38794-5-slp@redhat.com>
The VirtioInfoList is already allocated by QAPI_LIST_PREPEND and
need not be allocated by the caller.
Fixes Coverity CID 1508724.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.
In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.
To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20230403221421.60877-1-graf@amazon.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
According to PCIe Address Translation Services specification 5.1.3.,
ATS Control Register has Enable bit to enable/disable ATS. Guest may
enable/disable PCI ATS and, accordingly, Device-TLB for the VirtIO PCI
device. So, raise/lower a flag and call a trigger function to pass this
event to a device implementation.
Signed-off-by: Viktor Prutyanov <viktor@daynix.com>
Message-Id: <20230512135122.70403-2-viktor@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
1. The vIOMMU support will make vDPA can work in IOMMU mode. This
will fix security issues while using the no-IOMMU mode.
To support this feature we need to add new functions for IOMMU MR adds and
deletes.
Also since the SVQ does not support vIOMMU yet, add the check for IOMMU
in vhost_vdpa_dev_start, if the SVQ and IOMMU enable at the same time
the function will return fail.
2. Skip the iova_max check vhost_vdpa_listener_skipped_section(). While
MR is IOMMU, move this check to vhost_vdpa_iommu_map_notify()
Verified in vp_vdpa and vdpa_sim_net driver
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20230510054631.2951812-5-lulu@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The unmap ioctl doesn't accept a full 64-bit span. So need to
add check for the section's size in vhost_vdpa_listener_region_del().
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20230510054631.2951812-4-lulu@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In trace_vhost_vdpa_listener_region_del, the value for llend
should change to int128_get64(int128_sub(llend, int128_one()))
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20230510054631.2951812-3-lulu@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To support vIOMMU in vdpa, need to exposed the function
vhost_dev_has_iommu, vdpa will use this function to check
if vIOMMU enable.
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20230510054631.2951812-2-lulu@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype.
Fixes: 0e660a6f90 ("crypto: Introduce RSA algorithm")
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reported-by: Yiming Tao <taoym@zju.edu.cn>
Message-Id: <20230509075317.1132301-1-mcascell@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: zhenwei pi<pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's just support 512 memslots on x86-64 and aarch64 as well. The maximum
number of ACPI slots (256) is no longer completely expressive ever since
we supported virtio-based memory devices. Further, we're completely
ignoring other memslots used outside of memory device context, such as
memslots used for boot memory.
Note that the vhost memslot limit in the kernel is usually configured to
be 509. With this change, we prepare vhost-user on the QEMU side to be
closer to that limit, to eventually support ~512 memslots in most vhost
implementations and have less "surprises" when cold/hotplugging vhost
devices while also consuming more memslots than we're currently used to
by memory devices (e.g., once virtio-mem starts using multiple memslots).
Note that most vhost-user implementations only support a small number of
memslots so far, which we can hopefully improve in the near future.
We'll leave the PPC special-case as is for now.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230503184144.808478-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Allowing guests to read unplugged memory simplified the bring-up of
virtio-mem in Linux guests -- which was limited to x86-64 only. On arm64
(which was added later), we never had legacy guests and don't even allow
to configure it, essentially always having "unplugged-inaccessible=on".
At this point, all guests we care about
should be supporting VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, so let's
change the default for the 8.1 machine.
This change implies that also memory that supports the shared zeropage
(private anonymous memory) will now require
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE in the driver in order to be usable by
the guest -- as default, one can still manually set the
unplugged-inaccessible property.
Disallowing the guest to read unplugged memory will be important for
some future features, such as memslot optimizations or protection of
unplugged memory, whereby we'll actually no longer allow the guest to
even read from unplugged memory.
At some point, we might want to deprecate and remove that property.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230503182352.792458-1-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Setting the VIRTIO Device Status Field to 0 resets the device. The
device's state is lost, including the vring configuration.
vhost-user.c currently sends SET_STATUS 0 before GET_VRING_BASE. This
risks confusion about the lifetime of the vhost-user state (e.g. vring
last_avail_idx) across VIRTIO device reset.
Eugenio Pérez <eperezma@redhat.com> adjusted the order for vhost-vdpa.c
in commit c3716f260b ("vdpa: move vhost reset after get vring base")
and in that commit description suggested doing the same for vhost-user
in the future.
Go ahead and adjust vhost-user.c now. I ran various online code searches
to identify vhost-user backends implementing SET_STATUS. It seems only
DPDK implements SET_STATUS and Yajun Wu <yajunw@nvidia.com> has
confirmed that it is safe to make this change.
Fixes: commit 923b8921d2 ("vhost-user: Support vhost_dev_start")
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Cindy Lu <lulu@redhat.com>
Cc: Yajun Wu <yajunw@nvidia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230501230409.274178-1-stefanha@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
QEMU invokes vhost_svq_add() when adding a guest's element
into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots()
to check whether QEMU can add the element into SVQ. If there is
enough space, then QEMU combines some out descriptors and some
in descriptors into one descriptor chain, and adds it into
`svq->vring.desc` by vhost_svq_vring_write_descs().
Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx`
in vhost_svq_available_slots() returns the number of occupied elements,
or the number of descriptor chains, instead of the number of occupied
descriptors, which may cause wrapping in SVQ descriptor ring.
Here is an example. In vhost_handle_guest_kick(), QEMU forwards
as many available buffers to device by virtqueue_pop() and
vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
and then this element is added into SVQ by vhost_svq_add_element(),
a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and
vhost_svq_add_element() `svq->vring.num` times,
vhost_svq_available_slots() thinks QEMU just ran out of slots and
everything should work fine. But in fact, virtqueue_pop() returns
`svq->vring.num` elements or descriptor chains, more than
`svq->vring.num` descriptors due to guest memory fragmentation,
and this causes wrapping in SVQ descriptor ring.
This bug is valid even before marking the descriptors used.
If the guest memory is fragmented, SVQ must add chains
so it can try to add more descriptors than possible.
This patch solves it by adding `num_free` field in
VhostShadowVirtqueue structure and updating this field
in vhost_svq_add() and vhost_svq_get_buf(), to record
the number of free descriptors.
Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230509084817.3973-1-yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone oparations (open,
close, finish, reset), and Append Zone.
The VIRTIO_BLK_F_ZONED feature bit will only be set if the host does
support zoned block devices. Regular block devices(conventional zones)
will not be set.
The guest os can use blktests, fio to test those commands on zoned devices.
Furthermore, using zonefs to test zone append write is also supported.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-2-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This protects devices from bh->mmio reentrancy issues.
Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-5-alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
- Protect BlockBackend.queued_requests with its own lock
- Switch to AIO_WAIT_WHILE_UNLOCKED() where possible
- AioContext removal: LinuxAioState/LuringState/ThreadPool
- Add more coroutine_fn annotations, use bdrv/blk_co_*
- Fix crash when execute hmp_commit
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmRH0b0RHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9Y0yw/6A/vzA4TGgFUP3WIvH/sQri4/V3gyR+PT
u3hOQUCYZ99nioTpKV91TSuUPuU/Mdspy/0NKM+K92yIXqxa9172A2zLOsGOu21l
qKpse+nBf1zqEgB8YzUHyCBdetPz916C/f9RS26SNUCW85GCHYGHA3u7nKvWLMyV
oKIoTlA8QOglOuEKlRoYh7hCFm7ET51NOSEftm8GsYbsW/I2Vzl8a1SHN1lHufjd
We3+898zUrmFqNMp6Rjdhn+yZmmoGzoZqV4YQi83z7xjiv+Ms4VHVVW7X8d20xRX
5BLFiLHAuZ/1d26HyVhgBUr7KHyf94odocz8BylWKXGl5SXMCZun1Td1vgVKlGK+
GRxzB2cWGWqzC2UmqSTc0Z0aIWbXukKwvcX76uBKsQZ+kB2A7jFobxHiaoQEDJ8B
WRNEMH2+CqCAu9rsrNRinnJKhT2nXcr9F9YfwRIlagdAePGWin+EUW8huf14dDBm
Z2Y34aKW4RQibF8xirMHeRBbOLmcq2VpKLKwNfBHUDgZB8iuD7bLn4n9nwWXMG1w
zgNsTybkv46vLPamTpEaUoNTHfuRDTAuE7Z7lkcc7jF41Z0V1DC/DCCWcL/0LvhP
GIxFdkYug3hetdF2U/OZhUoEfxvkqcuBnrr55LFzqheKEllQpPwPpt7UF0aH8bg3
i/YpjHsf3xU=
=mpYX
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches
- Protect BlockBackend.queued_requests with its own lock
- Switch to AIO_WAIT_WHILE_UNLOCKED() where possible
- AioContext removal: LinuxAioState/LuringState/ThreadPool
- Add more coroutine_fn annotations, use bdrv/blk_co_*
- Fix crash when execute hmp_commit
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmRH0b0RHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9Y0yw/6A/vzA4TGgFUP3WIvH/sQri4/V3gyR+PT
# u3hOQUCYZ99nioTpKV91TSuUPuU/Mdspy/0NKM+K92yIXqxa9172A2zLOsGOu21l
# qKpse+nBf1zqEgB8YzUHyCBdetPz916C/f9RS26SNUCW85GCHYGHA3u7nKvWLMyV
# oKIoTlA8QOglOuEKlRoYh7hCFm7ET51NOSEftm8GsYbsW/I2Vzl8a1SHN1lHufjd
# We3+898zUrmFqNMp6Rjdhn+yZmmoGzoZqV4YQi83z7xjiv+Ms4VHVVW7X8d20xRX
# 5BLFiLHAuZ/1d26HyVhgBUr7KHyf94odocz8BylWKXGl5SXMCZun1Td1vgVKlGK+
# GRxzB2cWGWqzC2UmqSTc0Z0aIWbXukKwvcX76uBKsQZ+kB2A7jFobxHiaoQEDJ8B
# WRNEMH2+CqCAu9rsrNRinnJKhT2nXcr9F9YfwRIlagdAePGWin+EUW8huf14dDBm
# Z2Y34aKW4RQibF8xirMHeRBbOLmcq2VpKLKwNfBHUDgZB8iuD7bLn4n9nwWXMG1w
# zgNsTybkv46vLPamTpEaUoNTHfuRDTAuE7Z7lkcc7jF41Z0V1DC/DCCWcL/0LvhP
# GIxFdkYug3hetdF2U/OZhUoEfxvkqcuBnrr55LFzqheKEllQpPwPpt7UF0aH8bg3
# i/YpjHsf3xU=
# =mpYX
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 25 Apr 2023 02:12:29 PM BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (25 commits)
block/monitor: Fix crash when executing HMP commit
vmdk: make vmdk_is_cid_valid a coroutine_fn
qcow2: mark various functions as coroutine_fn and GRAPH_RDLOCK
tests: mark more coroutine_fns
qemu-pr-helper: mark more coroutine_fns
9pfs: mark more coroutine_fns
nbd: mark more coroutine_fns, do not use co_wrappers
mirror: make mirror_flush a coroutine_fn, do not use co_wrappers
blkdebug: add missing coroutine_fn annotation
vvfat: mark various functions as coroutine_fn
thread-pool: avoid passing the pool parameter every time
thread-pool: use ThreadPool from the running thread
io_uring: use LuringState from the running thread
linux-aio: use LinuxAioState from the running thread
block: add missing coroutine_fn to bdrv_sum_allocated_file_size()
include/block: fixup typos
monitor: convert monitor_cleanup() to AIO_WAIT_WHILE_UNLOCKED()
hmp: convert handle_hmp_command() to AIO_WAIT_WHILE_UNLOCKED()
block: convert bdrv_drain_all_begin() to AIO_WAIT_WHILE_UNLOCKED()
block: convert bdrv_graph_wrlock() to AIO_WAIT_WHILE_UNLOCKED()
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
thread_pool_submit_aio() is always called on a pool taken from
qemu_get_current_aio_context(), and that is the only intended
use: each pool runs only in the same thread that is submitting
work to it, it can't run anywhere else.
Therefore simplify the thread_pool_submit* API and remove the
ThreadPool function parameter.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230203131731.851116-5-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mostly just fixes, cleanups all over the place.
Some optimizations.
More control over slot_reserved_mask.
More feature bits supported for SVQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmRHQvAPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpQc0H/RD+RXy7IAnmhkdCyjj0hM8pftPTwCJfrSCW
DLHP4c5jiKO5ngUoAv3YJdM77TBCXlJn6gceeKBrzhGUTtJ7dTLC+Udeq/jW43EF
/E2ldLLbTNFyUqW8yX7D+EVio7Jy4zXTHpczKCF5vO7MaVWS/b3QdCpmjXpEHLNb
janv24vQHHgmRwK96uIdIauJJT8aqYW0arn1po8anxuFS8ok9Tf8LTEF5uBHokJP
MriTwMaqMgRK+4rzh+b6wc7QC5GqIr44gFrsfFYuNOUY0+BizvGvUAtMt+B/XZwt
OF4RSShUh2bhsQoYwgvShfEsR/vWwOl3yMAhcsB+wMgMzMG8MUQ=
=e8DF
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: fixes, features, cleanups
Mostly just fixes, cleanups all over the place.
Some optimizations.
More control over slot_reserved_mask.
More feature bits supported for SVQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmRHQvAPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpQc0H/RD+RXy7IAnmhkdCyjj0hM8pftPTwCJfrSCW
# DLHP4c5jiKO5ngUoAv3YJdM77TBCXlJn6gceeKBrzhGUTtJ7dTLC+Udeq/jW43EF
# /E2ldLLbTNFyUqW8yX7D+EVio7Jy4zXTHpczKCF5vO7MaVWS/b3QdCpmjXpEHLNb
# janv24vQHHgmRwK96uIdIauJJT8aqYW0arn1po8anxuFS8ok9Tf8LTEF5uBHokJP
# MriTwMaqMgRK+4rzh+b6wc7QC5GqIr44gFrsfFYuNOUY0+BizvGvUAtMt+B/XZwt
# OF4RSShUh2bhsQoYwgvShfEsR/vWwOl3yMAhcsB+wMgMzMG8MUQ=
# =e8DF
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 25 Apr 2023 04:03:12 AM BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (31 commits)
hw/pci-bridge: Make PCIe and CXL PXB Devices inherit from TYPE_PXB_DEV
hw/pci-bridge: pci_expander_bridge fix type in pxb_cxl_dev_reset()
docs/specs: Convert pci-testdev.txt to rst
docs/specs: Convert pci-serial.txt to rst
docs/specs/pci-ids: Convert from txt to rST
acpi: pcihp: allow repeating hot-unplug requests
virtio: i2c: Check notifier helpers for VIRTIO_CONFIG_IRQ_IDX
docs: Remove obsolete descriptions of SR-IOV support
intel_iommu: refine iotlb hash calculation
docs/cxl: Fix sentence
MAINTAINERS: Add Eugenio Pérez as vhost-shadow-virtqueue reviewer
tests: bios-tables-test: replace memset with initializer
hw/acpi: limit warning on acpi table size to pc machines older than version 2.3
Add my old and new work email mapping and use work email to support acpi
vhost-user-blk-server: notify client about disk resize
pci: avoid accessing slot_reserved_mask directly outside of pci.c
hw: Add compat machines for 8.1
hw/i386/amd_iommu: Factor amdvi_pci_realize out of amdvi_sysbus_realize
hw/i386/amd_iommu: Set PCI static/const fields via PCIDeviceClass
hw/i386/amd_iommu: Move capab_offset from AMDVIState to AMDVIPCIState
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since the driver doesn't support interrupts, we must return early when
index is set to VIRTIO_CONFIG_IRQ_IDX.
Fixes: 544f0278af ("virtio: introduce macro VIRTIO_CONFIG_IRQ_IDX")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Message-Id: <d53ec8bc002001eafac597f6bd9a8812df989257.1681790067.git.viresh.kumar@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We move there all capabilities helpers from migration.c.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
Following David advise:
- looked through the history, capabilities are newer than 2012, so we
can remove that bit of the header.
- This part is posterior to Anthony.
Original Author is Orit. Once there,
I put myself. Peter Xu also did quite a bit of work here.
Anyone else wants/needs to be there? I didn't search too hard
because nobody asked before to be added.
What do you think?
Optimize the virtio-balloon feature on the ARM platform by adding
a variable to keep track of the current hot-plugged pc-dimm size,
instead of traversing the virtual machine's memory modules to count
the current RAM size during the balloon inflation or deflation
process. This variable can be updated only when plugging or unplugging
the device, which will result in an increase of approximately 60%
efficiency of balloon process on the ARM platform.
We tested the total amount of time required for the balloon inflation process on ARM:
inflate the balloon to 64GB of a 128GB guest under stress.
Before: 102 seconds
After: 42 seconds
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Ming Yang yangming73@huawei.com
Message-Id: <e13bc78f96774bfab4576814c293aa52@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
These hooks were introduced in:
80a1ea3748 ("memory: move ioeventfd ops to MemoryListener", 2012-02-29)
But they seem to be never used. Drop them.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230306193209.516011-1-peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When a virtqueue size is changed by the guest via
virtio_queue_set_num(), its region cache is not automatically updated.
If the size was increased, this could lead to accessing the cache out
of bounds. For example, in vring_get_used_event():
static inline uint16_t vring_get_used_event(VirtQueue *vq)
{
return vring_avail_ring(vq, vq->vring.num);
}
static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
{
VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
hwaddr pa = offsetof(VRingAvail, ring[i]);
if (!caches) {
return 0;
}
return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
}
vq->vring.num will be greater than caches->avail.len, which will
trigger a failed assertion down the call path of
virtio_lduw_phys_cached().
Fix this by calling virtio_init_region_cache() after
virtio_queue_set_num() if we are not already calling
virtio_queue_set_rings(). In the legacy path this is already done by
virtio_queue_update_rings().
Signed-off-by: Carlos López <clopez@suse.de>
Message-Id: <20230317002749.27379-1-clopez@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In virtqueue_{split,packed}_get_avail_bytes() descriptors are read
in a loop via MemoryRegionCache regions and calls to
vring_{split,packed}_desc_read() - these take a region cache and the
index of the descriptor to be read.
For direct descriptors we use a cache provided by the caller, whose
size matches that of the virtqueue vring. We limit the number of
descriptors we can read by the size of that vring:
max = vq->vring.num;
...
MemoryRegionCache *desc_cache = &caches->desc;
For indirect descriptors, we initialize a new cache and limit the
number of descriptors by the size of the intermediate descriptor:
len = address_space_cache_init(&indirect_desc_cache,
vdev->dma_as,
desc.addr, desc.len, false);
desc_cache = &indirect_desc_cache;
...
max = desc.len / sizeof(VRingDesc);
However, the first initialization of `max` is done outside the loop
where we process guest descriptors, while the second one is done
inside. This means that a sequence of an indirect descriptor followed
by a direct one will leave a stale value in `max`. If the second
descriptor's `next` field is smaller than the stale value, but
greater than the size of the virtqueue ring (and thus the cached
region), a failed assertion will be triggered in
address_space_read_cached() down the call chain.
Fix this by initializing `max` inside the loop in both functions.
Fixes: 9796d0ac8f ("virtio: use address_space_map/unmap to access descriptors")
Signed-off-by: Carlos López <clopez@suse.de>
Message-Id: <20230302100358.3613-1-clopez@suse.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
During protocol negotiation, when we the QEMU
stub does not support a backend with F_CONFIG,
it throws a warning and supresses the
VHOST_USER_PROTOCOL_F_CONFIG bit.
However, the warning uses warn_reportf_err macro
and passes an unitialized errp pointer. However,
the macro tries to edit the 'msg' member of the
unitialized Error and segfaults.
Instead, just use warn_report, which prints a
warning message directly to the output.
Fixes: 5653493 ("hw/virtio/vhost-user: don't suppress F_CONFIG when supported")
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Message-Id: <20230302121719.9390-1-aesteve@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost-vdpa devices can return this feature now that blockers have been
set in case some features are not met.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230303172445.1089785-15-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches enable devices to be migrated even if vdpa netdev has not
been started with x-svq. However, not all devices are migratable, so we
need to block migration if we detect that.
Block migration if we detect the device expose a feature SVQ does not
know how to work with.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-13-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Devices with CVQ need to migrate state beyond vq state. Leaving this to
future series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-11-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Although it does not make a big difference, its more correct and
simplifies the cleanup path in subsequent patches.
Move ram_block_discard_disable(false) call to the top of
vhost_vdpa_cleanup because:
* We cannot use vhost_vdpa_first_dev after dev->opaque = NULL
assignment.
* Improve the stack order in cleanup: since it is the last action taken
in init, it should be the first at cleanup.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-10-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The function vhost.c:vhost_dev_stop calls vhost operation
vhost_dev_start(false). In the case of vdpa it totally reset and wipes
the device, making the fetching of the vring base (virtqueue state) totally
useless.
The kernel backend does not use vhost_dev_start vhost op callback, but
vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa
is desirable, but it can be added on top.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-8-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The function vhost.c:vhost_dev_stop fetches the vring base so the vq
state can be migrated to other devices. However, this is unreliable in
vdpa, since we didn't signal the device to suspend the queues, making
the value fetched useless.
Suspend the device if possible before fetching first and subsequent
vring bases.
Moreover, vdpa totally reset and wipes the device at the last device
before fetch its vrings base, making that operation useless in the last
device. This will be fixed in later patches of this series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-7-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows vhost_vdpa to track if it is safe to get the vring base from
the device or not. If it is not, vhost can fall back to fetch idx from
the guest buffer again.
No functional change intended in this patch, later patches will use this
field.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-6-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
At this moment it is only possible to migrate to a vdpa device running
with x-svq=on. As a protective measure, the rewind of the inflight
descriptors was done at the destination. That way if the source sent a
virtqueue with inuse descriptors they are always discarded.
Since this series allows to migrate also to passthrough devices with no
SVQ, the right thing to do is to rewind at the source so the base of
vrings are correct.
Support for inflight descriptors may be added in the future.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230303172445.1089785-5-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is needed for qemu to know it can suspend the device to retrieve
its status and enable SVQ with it, so all the process is transparent to
the guest.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230303172445.1089785-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As SVQ can be enabled dynamically at any time, it needs to store call fd
always.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230303172445.1089785-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move queue_index, CryptoDevCompletionFunc and opaque into struct
CryptoDevBackendOpInfo, then cryptodev_backend_crypto_operation()
needs an argument CryptoDevBackendOpInfo *op_info only. And remove
VirtIOCryptoReq from cryptodev. It's also possible to hide
VirtIOCryptoReq into virtio-crypto.c in the next step. (In theory,
VirtIOCryptoReq is a private structure used by virtio-crypto only)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230301105847.253084-9-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce cryptodev service type in cryptodev.json, then apply this
to related codes. Now we can remove VIRTIO_CRYPTO_SERVICE_xxx
dependence from QEMU cryptodev.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230301105847.253084-5-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce cryptodev alg type in cryptodev.json, then apply this to
related codes, and drop 'enum CryptoDevBackendAlgType'.
There are two options:
1, { 'enum': 'QCryptodevBackendAlgType',
'prefix': 'CRYPTODEV_BACKEND_ALG',
'data': ['sym', 'asym']}
Then we can keep 'CRYPTODEV_BACKEND_ALG_SYM' and avoid lots of
changes.
2, changes in this patch(with prefix 'QCRYPTODEV_BACKEND_ALG').
To avoid breaking the rule of QAPI, use 2 here.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230301105847.253084-4-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vhost_svq_poll(), if vhost_svq_get_buf() fails due to a device
providing invalid descriptors, len is left uninitialized and returned
to the caller, potentally leaking stack data or causing undefined
behavior.
Fix this by initializing len to 0.
Found with GCC 13 and -fanalyzer (abridged):
../hw/virtio/vhost-shadow-virtqueue.c: In function ‘vhost_svq_poll’:
../hw/virtio/vhost-shadow-virtqueue.c:538:12: warning: use of uninitialized value ‘len’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
538 | return len;
| ^~~
‘vhost_svq_poll’: events 1-4
|
| 522 | size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
| | ^~~~~~~~~~~~~~
| | |
| | (1) entry to ‘vhost_svq_poll’
|......
| 525 | uint32_t len;
| | ~~~
| | |
| | (2) region created on stack here
| | (3) capacity: 4 bytes
|......
| 528 | if (vhost_svq_more_used(svq)) {
| | ~
| | |
| | (4) inlined call to ‘vhost_svq_more_used’ from ‘vhost_svq_poll’
(...)
| 528 | if (vhost_svq_more_used(svq)) {
| | ^~~~~~~~~~~~~~~~~~~~~~~~~
| | ||
| | |(8) ...to here
| | (7) following ‘true’ branch...
|......
| 537 | vhost_svq_get_buf(svq, &len);
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (9) calling ‘vhost_svq_get_buf’ from ‘vhost_svq_poll’
|
+--> ‘vhost_svq_get_buf’: events 10-11
|
| 416 | static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
| | ^~~~~~~~~~~~~~~~~
| | |
| | (10) entry to ‘vhost_svq_get_buf’
|......
| 423 | if (!vhost_svq_more_used(svq)) {
| | ~
| | |
| | (11) inlined call to ‘vhost_svq_more_used’ from ‘vhost_svq_get_buf’
|
(...)
|
‘vhost_svq_get_buf’: event 14
|
| 423 | if (!vhost_svq_more_used(svq)) {
| | ^
| | |
| | (14) following ‘false’ branch...
|
‘vhost_svq_get_buf’: event 15
|
|cc1:
| (15): ...to here
|
<------+
|
‘vhost_svq_poll’: events 16-17
|
| 537 | vhost_svq_get_buf(svq, &len);
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (16) returning to ‘vhost_svq_poll’ from ‘vhost_svq_get_buf’
| 538 | return len;
| | ~~~
| | |
| | (17) use of uninitialized value ‘len’ here
Note by Laurent Vivier <lvivier@redhat.com>:
The return value is only used to detect an error:
vhost_svq_poll
vhost_vdpa_net_cvq_add
vhost_vdpa_net_load_cmd
vhost_vdpa_net_load_mac
-> a negative return is only used to detect error
vhost_vdpa_net_load_mq
-> a negative return is only used to detect error
vhost_vdpa_net_handle_ctrl_avail
-> a negative return is only used to detect error
Fixes: d368c0b052 ("vhost: Do not depend on !NULL VirtQueueElement on vhost_svq_flush")
Signed-off-by: Carlos López <clopez@suse.de>
Message-Id: <20230213085747.19956-1-clopez@suse.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Not stopping them leave the device in a bad state when virtio-net
fronted device is unplugged with device_del monitor command.
This is not triggable in regular poweroff or qemu forces shutdown
because cleanup is called right after vhost_vdpa_dev_start(false). But
devices hot unplug does not call vdpa device cleanups. This lead to all
the vhost_vdpa devices without stop the SVQ but the last.
Fix it and clean the code, making it symmetric with
vhost_vdpa_svqs_start.
Fixes: dff4426fa6 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230209170004.899472-1-eperezma@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The Vhost-user specification changed feature and request
naming from _SLAVE_ to _BACKEND_.
This patch adopts the new naming convention.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20230208203259.381326-4-maxime.coquelin@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_dev_cleanup() clears vhost_dev so back up its vqs member to free
the memory pointed by the member.
Fixes: 821d28b88f ("vhost-user-rng: Add vhost-user-rng implementation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230130140516.78078-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_dev_cleanup() clears vhost_dev so back up its vqs member to free
the memory pointed by the member.
Fixes: 7221d3b634 ("hw/virtio: add boilerplate for vhost-user-i2c device")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230130140435.78049-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_dev_cleanup(), called from vu_gpio_disconnect(), clears vhost_dev
so vhost-user-gpio must set the members of vhost_dev each time
connecting.
do_vhost_user_cleanup() should also acquire the pointer to vqs directly
from VHostUserGPIO instead of referring to vhost_dev as it can be called
after vhost_dev_cleanup().
Fixes: 27ba7b027f ("hw/virtio: add boilerplate for vhost-user-gpio device")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230130140320.77999-1-akihiko.odaki@daynix.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_dev_cleanup() clears vhost_dev so back up its vqs member to free
the memory pointed by the member.
Fixes: 98fc1ada4c ("virtio: add vhost-user-fs base device")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230130140225.77964-1-akihiko.odaki@daynix.com>
Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.
This is especially relevant for the following cases:
(1) User errors
With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.
(2) Excluded memory ranges during migration
For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.
To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.
Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.
Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) are still free after they
were freed ones while discarding RAM. Still, allocating that memory at
least once helps catching some basic setup problems.
Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing to to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:
qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
qemu-system-x86_64: load of migration failed: Cannot allocate memory
And we can even introspect the early migration data, including the
bitmap:
$ ./scripts/analyze-migration.py -f STATEFILE
{
"ram (2)": {
"section sizes": {
"0000:00:03.0/mem0": "0x0000000780000000",
"0000:00:04.0/mem1": "0x0000000780000000",
"pc.ram": "0x0000000100000000",
"/rom@etc/acpi/tables": "0x0000000000020000",
"pc.bios": "0x0000000000040000",
"0000:00:02.0/e1000.rom": "0x0000000000040000",
"pc.rom": "0x0000000000020000",
"/rom@etc/table-loader": "0x0000000000001000",
"/rom@etc/acpi/rsdp": "0x0000000000001000"
}
},
"0000:00:03.0/virtio-mem-device-early (51)": {
"tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
"size": "0x0000000040000000",
"bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
},
"0000:00:04.0/virtio-mem-device-early (53)": {
"tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
"size": "0x00000001fa400000",
"bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
},
[...]
Reported-by: Jing Qi <jinqi@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.
Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.
In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.
Use a new device property to keep behavior of compat machines
unmodified.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.
Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This moves these commands from MAINTAINERS section "Human
Monitor (HMP)" to "virtio".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-20-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
This reverts commit a7f523c7d1.
The nested event loop is broken by design. It's only user was removed.
Drop the code as well so that nobody ever tries to use it again.
I had to fix a couple of trivial conflicts around return values because
of 025faa872b ("vhost-user: stick to -errno error return convention").
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20230119172424.478268-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This reverts commit db8a3772e3.
Motivation : this is breaking vhost-user with DPDK as reported in [0].
Received unexpected msg type. Expected 22 received 40
Fail to update device iotlb
Received unexpected msg type. Expected 40 received 22
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 1 ring restore failed: -71: Protocol error (71)
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 0 ring restore failed: -71: Protocol error (71)
unable to start vhost net: 71: falling back on userspace virtio
The failing sequence that leads to the first error is :
- QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
socket
- QEMU starts a nested event loop in order to wait for the
VHOST_USER_GET_STATUS response and to be able to process messages from
the slave channel
- DPDK sends a couple of legitimate IOTLB miss messages on the slave
channel
- QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
updates on the master socket
- QEMU assumes to receive a response for the latest VHOST_USER_IOTLB_MSG
but it gets the response for the VHOST_USER_GET_STATUS instead
The subsequent errors have the same root cause : the nested event loop
breaks the order by design. It lures QEMU to expect responses to the
latest message sent on the master socket to arrive first.
Since this was only needed for DAX enablement which is still not merged
upstream, just drop the code for now. A working solution will have to
be merged later on. Likely protect the master socket with a mutex
and service the slave channel with a separate thread, as discussed with
Maxime in the mail thread below.
[0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20230119172424.478268-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use the proper QOM type definition instead of magic string.
This also helps during eventual refactor while using git-grep.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230117193014.83502-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
The VHOST_USER_ADD/REM_MEM_REG requests should be categorized into
non-vring specific messages, and should be sent only once.
Signed-off-by: Minghao Yuan <yuanmh12@chinatelecom.cn>
Message-Id: <20230123122119.194347-1-yuanmh12@chinatelecom.cn>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Presumably TARGET_ARM_64 should be a mistake of TARGET_AARCH64.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230109063130.81296-1-akihiko.odaki@daynix.com>
Fixes: 27598393a2 ("Lift max memory slots limit imposed by vhost-user")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We have two inclusion loops:
block/block.h
-> block/block-global-state.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
block/block.h
-> block/block-io.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac.
Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
The 'hwaddr' type is defined in "exec/hwaddr.h" as:
hwaddr is the type of a physical address
(its size can be different from 'target_ulong').
All definitions use the 'HWADDR_' prefix, except TARGET_FMT_plx:
$ fgrep define include/exec/hwaddr.h
#define HWADDR_H
#define HWADDR_BITS 64
#define HWADDR_MAX UINT64_MAX
#define TARGET_FMT_plx "%016" PRIx64
^^^^^^
#define HWADDR_PRId PRId64
#define HWADDR_PRIi PRIi64
#define HWADDR_PRIo PRIo64
#define HWADDR_PRIu PRIu64
#define HWADDR_PRIx PRIx64
#define HWADDR_PRIX PRIX64
Since hwaddr's size can be *different* from target_ulong, it is
very confusing to read one of its format using the 'TARGET_FMT_'
prefix, normally used for the target_long / target_ulong types:
$ fgrep TARGET_FMT_ include/exec/cpu-defs.h
#define TARGET_FMT_lx "%08x"
#define TARGET_FMT_ld "%d"
#define TARGET_FMT_lu "%u"
#define TARGET_FMT_lx "%016" PRIx64
#define TARGET_FMT_ld "%" PRId64
#define TARGET_FMT_lu "%" PRIu64
Apparently this format was missed during commit a8170e5e97
("Rename target_phys_addr_t to hwaddr"), so complete it by
doing a bulk-rename with:
$ sed -i -e s/TARGET_FMT_plx/HWADDR_FMT_plx/g $(git grep -l TARGET_FMT_plx)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230110212947.34557-1-philmd@linaro.org>
[thuth: Fix some warnings from checkpatch.pl along the way]
Signed-off-by: Thomas Huth <thuth@redhat.com>
proxy->vector_irqfd did not free when kvm_virtio_pci_vector_use or
msix_set_vector_notifiers failed in virtio_pci_set_guest_notifiers.
Fixes: 7d37d351
Signed-off-by: Lei Xiang <leixiang@kylinos.cn>
Tested-by: Zeng Chi <zengchi@kylinos.cn>
Suggested-by: Xie Ming <xieming@kylinos.cn>
Message-Id: <20221227081604.806415-1-leixiang@kylinos.cn>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows the vhost-vdpa device to batch the setup of all its MRs of
host notifiers.
This significantly reduces the device starting time, e.g. the time spend
on setup the host notifier MRs reduce from 423ms to 32ms for a VM with
64 vCPUs and 3 vhost-vDPA generic devices (vdpa_sim_blk, 64vq per device).
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-4-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
This allows the vhost device to batch the setup of all its host notifiers.
This significantly reduces the device starting time, e.g. the time spend
on enabling notifiers reduce from 376ms to 9.1ms for a VM with 64 vCPUs
and 3 vhost-vDPA generic devices (vdpa_sim_blk, 64vq per device)
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-3-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Simplify the error path in vhost_dev_enable_notifiers by using
vhost_dev_disable_notifiers directly.
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221227072015.3134-2-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In commit a585fad26b ("vdpa: request iova_range only once") we remove
GET_IOVA_RANGE form vhost_vdpa_init, the generic vdpa device will start
without iova_range populated, so the device won't work. Let's call
GET_IOVA_RANGE ioctl explicitly.
Fixes: a585fad26b ("vdpa: request iova_range only once")
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221224114848.3062-2-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
hw/virtio/virtio.h and hw/virtio/vhost.h include each other. The
former doesn't actually need the latter, so drop that inclusion to
break the loop.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221222120813.727830-2-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar@zeroasic.com>
virtio.c is big enough, extract more QMP related code to virtio-qmp.c.
To do so, expose qmp_find_virtio_device() and declar virtio_list in
the internal virtio-qmp.h header.
Note we have to leave qmp_x_query_virtio_queue_status() and
qmp_x_query_virtio_queue_element(), because they access VirtQueue
internal fields, and VirtQueue is only declared within virtio.c.
Suggested-by: Jonah Palmer <jonah.palmer@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221222080005.27616-3-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To emphasize this function is QMP related, rename it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221222080005.27616-2-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add process to handle the configure interrupt, The function's
logic is the same with vq interrupt.Add extra process to check
the configure interrupt
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-11-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add configure interrupt support in virtio-mmio bus.
add function to set configure guest notifier.
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-10-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add functions to support configure interrupt.
The configure interrupt process will start in vhost_dev_start
and stop in vhost_dev_stop.
Also add the functions to support vhost_config_pending and
vhost_config_mask.
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-8-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add the functions to support the configure interrupt in virtio
The function virtio_config_guest_notifier_read will notify the
guest if there is an configure interrupt.
The function virtio_config_set_guest_notifier_fd_handler is
to set the fd hander for the notifier
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-7-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add new call back function in vhost-vdpa, The function
vhost_set_config_call can set the event fd to kernel.
This function will be called in the vhost_dev_start
and vhost_dev_stop
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-6-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To reuse the interrupt process in configure interrupt
Need to decouple the single vector from the interrupt process.
We add new function kvm_virtio_pci_vector_use_one and _release_one.
These functions are used for the single vector, the whole process will
finish in the loop with vq number.
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-4-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To reuse the notifier process. We add the virtio_pci_get_notifier
to get the notifier and vector. The INPUT for this function is IDX,
The OUTPUT is the notifier and the vector
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-3-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To support configure interrupt for vhost-vdpa
Introduce VIRTIO_CONFIG_IRQ_IDX -1 as configure interrupt's queue index,
Then we can reuse the functions guest_notifier_mask and guest_notifier_pending.
Add the check of queue index in these drivers, if the driver does not support
configure interrupt, the function will just return
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20221222070451.936503-2-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221228130956.80515-1-philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
It should be the variable rdl2 to revert the already-notified listeners.
Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221228090312.17276-1-chenyi.qiang@intel.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
vmem->bitmap indexes the memory region of the virtio-mem backend at a
granularity of block_size. To calculate the index of target section offset,
the block_size should be divided instead of the bitmap_size.
Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20221216062231.11181-1-chenyi.qiang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: David Hildenbrand <david@redhat.com>
The monitor decoders are the only functions using the CONFIG_xxx
definitions declared in the target specific CONFIG_DEVICES header.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-7-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <a class="moz-txt-link-rfc2396E" href="mailto:philmd@linaro.org"><philmd@linaro.org></a>
These config helpers use the target-dependent LD/ST API.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-6-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These arrays are only accessed read-only, move them to .rodata.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-5-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonah Palmer<jonah.palmer@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <a class="moz-txt-link-rfc2396E" href="mailto:philmd@linaro.org"><philmd@linaro.org></a>
Commit f3034ad71f ("qmp: decode feature & status bits in
virtio-status") did not guard all qmp_virtio_feature_map_t
arrays with the corresponding #ifdef'ry used in
qmp_decode_features(). Fix that and reduce the arrays scope
by declaring them static.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-4-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonah Palmer<jonah.palmer@oracle.com>
Suggested-by: Richard Henderson <a class="moz-txt-link-rfc2396E" href="mailto:richard.henderson@linaro.org"><richard.henderson@linaro.org></a>
Signed-off-by: Philippe Mathieu-Daudé <a class="moz-txt-link-rfc2396E" href="mailto:philmd@linaro.org"><philmd@linaro.org></a>
Since virtio_ss[] is added to specific_ss[], rename it as
specific_virtio_ss[] to make it clearer.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-3-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio.c uses target_words_bigendian() which is declared in
"hw/core/cpu.h". Add the missing header to avoid when refactoring:
hw/virtio/virtio.c:2451:9: error: implicit declaration of function 'target_words_bigendian' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (target_words_bigendian()) {
^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221213111707.34921-2-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
GPA. So we need to translate it to GPA before the syncing otherwise we
may hit the following crash since IOVA could be out of the scope of
the GPA log size. This could be noted when using virtio-IOMMU with
vhost using 1G memory.
Fixes: c471ad0e9b ("vhost_net: device IOTLB support")
Cc: qemu-stable@nongnu.org
Tested-by: Lei Yang <leiyang@redhat.com>
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221216033552.77087-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Vhost message VHOST_USER_SET_LOG_BASE is device wide. So only
send it once with the first queue pair.
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221122051447.248462-1-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Isolate control virtqueue in its own group, allowing to intercept control
commands but letting dataplane run totally passthrough to the guest.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20221215113144.322011-13-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The memory listener that thells the device how to convert GPA to qemu's
va is registered against CVQ vhost_vdpa. memory listener translations
are always ASID 0, CVQ ones are ASID 1 if supported.
Let's tell the listener if it needs to register them on iova tree or
not.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-12-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
So the caller can choose which ASID is destined.
No need to update the batch functions as they will always be called from
memory listener updates at the moment. Memory listener updates will
always update ASID 0, as it's the passthrough ASID.
All vhost devices's ASID are 0 at this moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-10-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SVQ may run or not in a device depending on runtime conditions (for
example, if the device can move CVQ to its own group or not).
Allocate the SVQ array unconditionally at startup, since its hard to
move this allocation elsewhere.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-9-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The next patches will start control SVQ if possible. However, we don't
know if that will be possible at qemu boot anymore.
Since the moved checks will be already evaluated at net/ to know if it
is ok to shadow CVQ, move them.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-8-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently iova range is requested once per queue pair in the case of
net. Reduce the number of ioctls asking it once at initialization and
reusing that value for each vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20221215113144.322011-7-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasonwang@redhat.com>
Since we don't know if we will use SVQ at qemu initialization, let's
allocate iova_tree only if needed. To do so, accept it at SVQ start, not
at initialization.
This will avoid to create it if the device does not support SVQ.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-5-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The next patches will start control SVQ if possible. However, we don't
know if that will be possible at qemu boot anymore.
Delay device file descriptors until we know it at device start. This
will avoid to create them if the device does not support SVQ.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
By the end of this series CVQ is shadowed as long as the features
support it.
Since we don't know at the beginning of qemu running if this is
supported, move the event notifier handler setting to the start of the
SVQ, instead of the start of qemu run. This will avoid to create them if
the device does not support SVQ.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-3-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This function used to trust in v->shadow_vqs != NULL to know if it must
start svq or not.
This is not going to be valid anymore, as qemu is going to allocate svq
array unconditionally (but it will only start them conditionally).
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221215113144.322011-2-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The generic vDPA device doesn't support migration currently, so
mark it as unmigratable temporarily.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221215134944.2809-5-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Supports vdpa-dev-pci, we can use the device as follow:
-device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-X
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221215134944.2809-4-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Supports vdpa-dev, we can use the deivce directly:
-M microvm -m 512m -smp 2 -kernel ... -initrd ... -device \
vhost-vdpa-device,vhostdev=/dev/vhost-vdpa-x
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221215134944.2809-3-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add helpers to get the "Transitional PCI Device ID" and "class_id"
of the device specified by the "Virtio Device ID".
These helpers will be used to build the generic vDPA device later.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Message-Id: <20221215134944.2809-2-longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Convert the TYPE_VIRTIO_PCI class to 3-phase reset. This is
necessary so that we can convert the subclass TYPE_VIRTIO_VGA_BASE
also to 3-phase reset.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221125115240.3005559-2-peter.maydell@linaro.org
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmOZ6lYSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT6VEQAKynjWh3AIZ4/qOgrVqsP0oRspevLmfH
BbuGoldjYpEE7RbwuCaZalZ7iy7TcSySxnPfUDVsFHd7NWffJVjwKHifGC0D/Ez0
+Ggyb1CBebN+mS7t+BNFUHdMM+wxFIlHwg4f4aTFbn2o0HKgj2a8tcNzNRonZbfa
xURnvbD4G4u0VZEc3Jak+x193xbOJFsuuWq0BZnDuNk+XqjyW2RwfpXLPJVk+82a
4uy/YgYuqXUqBeULwcJj+shBL4SXR9GyajTFMS64przSUle0ADUmXkPtaS2agV7e
Pym/UQuAcxvNyw34fJsiMZxx6rZI9YU30jQUMRLoYcPRR/Q/aiPeiiHtiD6Kaid7
IfOeH/EArXaQRFpD89xj4YcaTnRLQOEj0NXgXvAbQf6eD8JYyao/S/0lCsPZEoA2
nibLqEQ25ncDNXoSomuwtfjVff3w68lODFbhwqfA0gf3cPtCgVZ6xQ8P/McNY6K6
wqFHXMWTDHk1LOCTucjYz1z2TGzTnSG4iWi5Yt6FSxAc958AO+v5ALn/1pcYun+E
azM/MF0AInKj2aJCT530zT0tpCs/Jo07YKC8k6ubi77S0ZdmGS1XLeXkRXfk1+yI
OhuUgiVlSTHxD69DagT2vbnx1mDMM9X+OBIMvEi5nwvD9A/ghaCgkDeGFvbA1ud0
t0mxPBZJ+tiZ
=JJjG
-----END PGP SIGNATURE-----
Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into staging
Miscellaneous patches for 2022-12-14
# gpg: Signature made Wed 14 Dec 2022 15:23:02 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru:
ppc4xx_sdram: Simplify sdram_ddr_size() to return
block/vmdk: Simplify vmdk_co_create() to return directly
cleanup: Tweak and re-run return_directly.cocci
io: Tidy up fat-fingered parameter name
qapi: Use returned bool to check for failure (again)
sockets: Use ERRP_GUARD() where obviously appropriate
qemu-config: Use ERRP_GUARD() where obviously appropriate
qemu-config: Make config_parse_qdict() return bool
monitor: Use ERRP_GUARD() in monitor_init()
monitor: Simplify monitor_fd_param()'s error handling
error: Move ERRP_GUARD() to the beginning of the function
error: Drop a few superfluous ERRP_GUARD()
error: Drop some obviously superfluous error_propagate()
Drop more useless casts from void * to pointer
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/virtio.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221104160712.3005652-29-armbru@redhat.com>
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with. Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step. This is the step for qapi/machine*.json.
Said commit explains the transformation in more detail. The invariant
violations mentioned there do not occur here.
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221104160712.3005652-16-armbru@redhat.com>
Tweak the semantic patch to drop redundant parenthesis around the
return expression.
Coccinelle drops a comment in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.
Coccinelle messes up vmdk_co_create(), not sure why. Change dropped,
will be done manually in the next commit.
Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.
Whitespace in tools/virtiofsd/fuse_lowlevel.c tidied up manually.
checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c. Preexisting, the patch merely makes
it visible to checkpatch.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221122134917.1217307-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking. Since then a few more uses
in need of the same treatment crept in. Do that. All pretty
mechanical except for
* balloon_stats_get_all()
This is basically the same transformation commit 012d4c96e2 applied
to the virtual walk example in include/qapi/visitor.h.
* set_max_queue_size()
Additionally replace "goto end of function" by return.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
include/qapi/error.h advises to put ERRP_GUARD() right at the
beginning of the function, because only then can it guard the whole
function. Clean up the few spots disregarding the advice.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
..and use for both virtio-user-blk and virtio-user-gpio. This avoids
the circular close by deferring shutdown due to disconnection until a
later point. virtio-user-blk already had this mechanism in place so
generalise it as a vhost-user helper function and use for both blk and
gpio devices.
While we are at it we also fix up vhost-user-gpio to re-establish the
event handler after close down so we can reconnect later.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221130112439.2527228-5-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As per the fix to vhost-user-blk in f5b22d06fb (vhost: recheck dev
state in the vhost_migration_log routine) we really should track the
connection and starting separately.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-4-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c
But most didn't do that, so we would leave the vrings disabled and some
backends would not work. We observed this issue with the rust version of
virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate where requests are not processed if vring is
not enabled.
Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly. Same thing
also in vhost_dev_stop() where we disable vrings.
[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-3-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Virtio 1.0 is pretty clear that features have to be
negotiated before enabling VQs. Unfortunately Seabios
ignored this ever since gaining 1.0 support (UEFI is ok).
Comment the error out for now, and add a TODO.
Fixes: 3c37f8b8d1 ("virtio: introduce virtio_queue_enable()")
Cc: "Kangjie Xu" <kangjie.xu@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221121200339.362452-1-mst@redhat.com>
Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
enabled VIRTIO_F_RING_RESET by default for all virtio devices.
This feature is not currently emulated by QEMU, so for vhost and
vhost-user devices we need to make sure it is supported by the offloaded
device emulation (in-kernel or in another process).
To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
passed to vhost_get_features(). This way it will be masked if the device
does not support it.
This issue was initially discovered with vhost-vsock and vhost-user-vsock,
and then also tested with vhost-user-rng which confirmed the same issue.
They fail when sending features through VHOST_SET_FEATURES ioctl or
VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
by the guest (Linux >= v6.0), but not supported by the device.
Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221121101101.29400-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The previous fix to virtio_device_started revealed a problem in its
use by both the core and the device code. The core code should be able
to handle the device "starting" while the VM isn't running to handle
the restoration of migration state. To solve this duel use introduce a
new helper for use by the vhost-user backends who all use it to feed a
should_start variable.
We can also pick up a change vhost_user_blk_set_status while we are at
it which follows the same pattern.
Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Message-Id: <20221107121407.1010913-1-alex.bennee@linaro.org>
The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.
Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).
Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS
Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.
On reception of the DRIVER_OK message, backend can apply the needed setting
only once (instead of incremental) and also utilize parallelism on enabling
queues.
This improves QEMU's live migration downtime with vhost user backend
implementation by great margin, specially for the large number of VQs of 64
from 800 msec to 250 msec.
[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221017064452.1226514-3-yajunw@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There were several different ways to deal with the situation where the
vector specified for a msix function is out of bound:
- early return a function and keep progresssing
- propagate the error to the caller
- mark msix unusable
- assert it is in bound
- just ignore
An out-of-bound vector should not be specified if the device
implementation is correct so let msix functions always assert that the
specified vector is in range.
An exceptional case is virtio-pci, which allows the guest to configure
vectors. For virtio-pci, it is more appropriate to introduce its own
checks because it is sometimes too late to check the vector range in
msix functions.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20220829083524.143640-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Signed-off-by: Akihiko Odaki <<a href="mailto:akihiko.odaki@daynix.com" target="_blank">akihiko.odaki@daynix.com</a>><br>
vhost backend sends host notification for every VQ. If backend creates
VQs in parallel, the VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG may
arrive to QEMU in different order than incremental queue index order.
For example VQ 1's message arrive earlier than VQ 0's:
After alloc VhostUserHostNotifier for VQ 1. GPtrArray becomes
[ nil, VQ1 pointer ]
After alloc VhostUserHostNotifier for VQ 0. GPtrArray becomes
[ VQ0 pointer, nil, VQ1 pointer ]
This is wrong. fetch_notifier will return NULL for VQ 1 in
vhost_user_get_vring_base, causes host notifier miss removal(leak).
The fix is to remove current element from GPtrArray, make the right
position for element to insert.
Fixes: 503e355465 ("virtio/vhost-user: dynamically assign VhostUserHostNotifiers")
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Acked-by: Parav Pandit <parav@nvidia.com>
Message-Id: <20221018023651.1359420-1-yajunw@nvidia.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most other virtio-pci devices allow MSI-X, let's have it for rng too.
Signed-off-by: David Daney <david.daney@fungible.com>
Reviewed-by: Marcin Nowakowski <marcin.nowakowski@fungible.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@fungible.com>
Message-Id: <20221014160947.66105-1-philmd@fungible.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Expose vhost_virtqueue_stop(), we need to use it when resetting a
virtqueue.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-9-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Expose vhost_virtqueue_start(), we need to use it when restarting a
virtqueue.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-8-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI devices support device specific vq enable.
Based on this function, the driver can re-enable the virtqueue after the
virtqueue is reset.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-7-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI devices support vq reset.
Based on this function, the driver can adjust the size of the ring, and
quickly recycle the buffer in the ring.
The migration of the virtio devices will not happen during a reset
operation. This is becuase the global iothread lock is held. Migration
thread also needs the lock. As a result, when migration of virtio
devices starts, the 'reset' status of VirtIOPCIQueue will always be 0.
Thus, we do not need to add it in vmstate_virtio_pci_modern_queue_state.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-6-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce the interface queue_enable() in VirtioDeviceClass and the
fucntion virtio_queue_enable() in virtio, it can be called when
VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
started. It only supports the devices of virtio 1 or later. The
not-supported devices can only start the virtqueue when DRIVER_OK.
Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-4-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a new interface function virtio_queue_reset() to implement
reset for vq.
Add a new callback to VirtioDeviceClass for queue reset operation for
each child device.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-3-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Separate the logic of vq reset. This logic will be called directly
later.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221017092558.111082-2-xuanzhuo@linux.alibaba.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In theory the virtio-iommu-pci could be plugged anywhere in the PCIe
topology and as long as the dt/acpi info are properly built this should
work. However at the moment we fail to do that because the
virtio-iommu-pci BDF is not computed at plug time and in that case
vms->virtio_iommu_bdf gets an incorrect value.
For instance if the virtio-iommu-pci is plugged onto a pcie root port
and the virtio-iommu protects a virtio-block-pci device the guest does
not boot.
So let's do not pretend we do support this case and fail the initialize()
if we detect the virtio-iommu-pci is plugged anywhere else than on the
root bus. Anyway this ability is not needed.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221012163448.121368-1-eric.auger@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-crypto: Modify the current interface of virtio-crypto
device to support asynchronous mode.
Signed-off-by: lei he <helei.sig11@bytedance.com>
Message-Id: <20221008085030.70212-2-helei.sig11@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enabling all the code path created before.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
So SVQ code knows if an event is needed.
The code is not reachable at the moment.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Actually use the new field of the used ring and tell the device if SVQ
wants to be notified.
The code is not reachable at the moment.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There was not enough room to accomodate them.
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
parameter clearer
... and add a function documentation.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While being at it add a #define for the magic 0x1040 number.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221004112100.301935-6-kraxel@redhat.com>
Not needed for a virtio 1.0 device. virtio_pci_device_plugged()
overrides them anyway (so no functional change).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20221004112100.301935-3-kraxel@redhat.com>
Not needed for a virtio 1.0 device. virtio_pci_device_plugged()
overrides them anyway (so no functional change).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20221004112100.301935-2-kraxel@redhat.com>
This new command shows the information of a VirtQueue element.
[Note: Up until v10 of this patch series, virtio.json had many (15+)
enums defined (e.g. decoded device features, statuses, etc.). In v10
most of these enums were removed and replaced with string literals.
By doing this we get (1) simpler schema, (2) smaller generated code,
and (3) less maintenance burden for when new things are added (e.g.
devices, device features, etc.).]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-6-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These new commands show the internal status of a VirtIODevice's
VirtQueue and a vhost device's vhost_virtqueue (if active).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-5-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Display feature names instead of bitmaps for host, guest, and
backend for VirtIODevices.
Display status names instead of bitmaps for VirtIODevices.
Display feature names instead of bitmaps for backend, protocol,
acked, and features (hdev->features) for vhost devices.
Decode features according to device ID. Decode statuses
according to configuration status bitmap (config_status_map).
Decode vhost user protocol features according to vhost user
protocol bitmap (vhost_user_protocol_map).
Transport features are on the first line. Undecoded bits (if
any) are stored in a separate field.
[Jonah: Several changes made to this patch from prev. version (v14):
- Moved all device features mappings to hw/virtio/virtio.c
- Renamed device features mappings (less generic)
- Generalized @FEATURE_ENTRY macro for all device mappings
- Virtio device feature map definitions include descriptions of
feature bits
- Moved @VHOST_USER_F_PROTOCOL_FEATURES feature bit from transport
feature map to vhost-user-supported device feature mappings
(blk, fs, i2c, rng, net, gpu, input, scsi, vsock)
- New feature bit added for virtio-vsock: @VIRTIO_VSOCK_F_SEQPACKET
- New feature bit added for virtio-iommu: @VIRTIO_IOMMU_F_BYPASS_CONFIG
- New feature bit added for virtio-mem: @VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE
- New virtio transport feature bit added: @VIRTIO_F_IN_ORDER
- Added device feature map definition for virtio-rng
]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-4-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This new command shows the status of a VirtIODevice, including
its corresponding vhost device's status (if active).
Next patch will improve output by decoding feature bits, including
vhost device's feature bits (backend, protocol, acked, and features).
Also will decode status bits of a VirtIODevice.
[Jonah: From patch v12; added a check to @virtio_device_find to ensure
synchronicity between @virtio_list and the devices in the QOM
composition tree.]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-3-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This new command lists all the instances of VirtIODevices with
their canonical QOM path and name.
[Jonah: @virtio_list duplicates information that already exists in
the QOM composition tree. However, extracting necessary information
from this tree seems to be a bit convoluted.
Instead, we still create our own list of realized virtio devices
but use @qmp_qom_get with the device's canonical QOM path to confirm
that the device exists and is realized. If the device exists but
is actually not realized, then we remove it from our list (for
synchronicity to the QOM composition tree).
Also, the QMP command @x-query-virtio is redundant as @qom-list
and @qom-get are sufficient to search '/machine/' for realized
virtio devices. However, @x-query-virtio is much more convenient
in listing realized virtio devices.]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <1660220684-24909-2-git-send-email-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is the first step towards moving all device config size calculation
logic into the virtio core code. In particular, this adds a struct that
contains all the necessary information for common virtio code to be able
to calculate the final config size for a device. This is expected to be
used with the new virtio_get_config_size helper, which calculates the
final length based on the provided host features.
This builds on top of already existing code like VirtIOFeature and
virtio_feature_get_config_size(), but adds additional fields, as well as
sanity checking so that device-specifc code doesn't have to duplicate it.
An example usage would be:
static const VirtIOFeature dev_features[] = {
{.flags = 1ULL << FEATURE_1_BIT,
.end = endof(struct virtio_dev_config, feature_1)},
{.flags = 1ULL << FEATURE_2_BIT,
.end = endof(struct virtio_dev_config, feature_2)},
{}
};
static const VirtIOConfigSizeParams dev_cfg_size_params = {
.min_size = DEV_BASE_CONFIG_SIZE,
.max_size = sizeof(struct virtio_dev_config),
.feature_sizes = dev_features
};
// code inside my_dev_device_realize()
size_t config_size = virtio_get_config_size(&dev_cfg_size_params,
host_features);
virtio_init(vdev, VIRTIO_ID_MYDEV, config_size);
Currently every device is expected to write its own boilerplate from the
example above in device_realize(), however, the next step of this
transition is moving VirtIOConfigSizeParams into VirtioDeviceClass,
so that it can be done automatically by the virtio initialization code.
All of the users of virtio_feature_get_config_size have been converted
to use virtio_get_config_size so it's no longer needed and is removed
with this commit.
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Message-Id: <20220906073111.353245-2-d-tatianin@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows is to instantiate a vhost-user-gpio device as part of a PCI
bus. It is mostly boilerplate which looks pretty similar to the
vhost-user-fs-pci device.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <5f560cab92d0d789b1c94295ec74b9952907d69d.1641987128.git.viresh.kumar@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-14-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This creates the QEMU side of the vhost-user-gpio device which connects
to the remote daemon. It is based of vhost-user-i2c code.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <5390324a748194a21bc99b1538e19761a8c64092.1641987128.git.viresh.kumar@linaro.org>
[AJB: fixes for qtest, tweaks to feature bits]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <20220802095010.3330793-13-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The `started` field is manipulated internally within the vhost code
except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
dev state in the vhost_migration_log routine). Mark that as a FIXME
because it introduces a potential race. I think the referenced fix
should be tracking its state locally.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-12-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwittz@nutanix.com>
All the boilerplate virtio code does the same thing (or should at
least) of checking to see if the VM is running before attempting to
start VirtIO. Push the logic up to the common function to avoid
getting a copy and paste wrong.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-11-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These are useful for tracing the lifetime of vhost-user connections.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-10-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
If the guest driver attempts to use the UNUSED(30) bit it is
potentially buggy as 6.3 Legacy Interface: Reserved Feature Bits
states it "SHOULD NOT be negotiated". For now just log this guest
error.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220802095010.3330793-9-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220802095010.3330793-8-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
There are some extra bits used over a vhost-user connection which are
hidden from the device itself. We need to set them here to ensure we
enable things like the protocol extensions.
Currently net/vhost-user.c has it's own inscrutable way of persisting
this data but it really should live in the core vhost_user code.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220726192150.2435175-7-alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-2-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
The previous iteration was commit a95942b50c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220923084254.4173111-1-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
GCC issues a false positive warning, resulting in build failure with -Werror:
In file included from /usr/include/glib-2.0/glib.h:114,
from src/include/glib-compat.h:32,
from src/include/qemu/osdep.h:144,
from ../src/hw/virtio/vhost-shadow-virtqueue.c:10:
In function ‘g_autoptr_cleanup_generic_gfree’,
inlined from ‘vhost_handle_guest_kick’ at ../src/hw/virtio/vhost-shadow-virtqueue.c:292:42:
/usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: error: ‘elem’ may be used uninitialized [-Werror=maybe-uninitialized]
28 | g_free (*pp);
| ^~~~~~~~~~~~
../src/hw/virtio/vhost-shadow-virtqueue.c: In function ‘vhost_handle_guest_kick’:
../src/hw/virtio/vhost-shadow-virtqueue.c:292:42: note: ‘elem’ was declared here
292 | g_autofree VirtQueueElement *elem;
| ^~~~
cc1: all warnings being treated as errors
There is actually no problem since "elem" is initialized in both branches.
Silence the warning by initializig it with "NULL".
$ gcc --version
gcc (GCC) 12.2.0
Fixes: 9c2ab2f1ec ("vhost: stop transfer elem ownership in vhost_handle_guest_kick")
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220910151117.6665-1-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
As the close-on-exec flags is not set on the file descriptors returned
by socketpair() at default, the fds will survive across exec' function.
In the case that exec' function get invoked, such as the live-update feature
which is been developing, it will cause fd leaks.
To address this problem, we should call qemu_socketpair() to create an pair of
connected sockets with the close-on-exec flag set.
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <7002b12a5fb0a30cd878e14e07da61c36da72913.1661240709.git.tugy@chinatelecom.cn>
We can restore the device state in the destination via CVQ now. Remove
the migration blocker.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since QEMU will be able to inject new elements on CVQ to restore the
state, we need not to depend on a VirtQueueElement to know if a new
element has been used by the device or not. Instead of check that, check
if there are new elements only using used idx on vhost_svq_flush.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
As discussed in previous series [1], this memory barrier is useless with
the atomic read of used idx at vhost_svq_more_used. Deleting it.
[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-07/msg02616.html
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since we're going to allow SVQ to add elements without the guest's
knowledge and without its own VirtQueueElement, it's easier to check if
an element is a valid head checking a different thing than the
VirtQueueElement.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It was easier to allow vhost_svq_add to handle the memory. Now that we
will allow qemu to add elements to a SVQ without the guest's knowledge,
it's better to handle it in the caller.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reduce code duplication.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We can unbind twice a file descriptor if we call twice
vhost_svq_set_svq_kick_fd because of this. Since it comes from vhost and
not from SVQ, that file descriptor could be a different thing that
guest's vhost notifier.
Likewise, it can happens the same if a guest start and stop the device
multiple times.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: dff4426fa6 ("vhost: Add Shadow VirtQueue kick forwarding capabilities")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Nothing actually reads the return value, but an error in cleaning some
entries could cause device stop to abort, making a restart impossible.
Better ignore explicitely the return value.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Although the device will be reset before usage, the right thing to do is
to clean it.
Reported-by: Lei Yang <leiyang@redhat.com>
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It's convenient to call iova_tree_remove from a map returned from
iova_tree_find or iova_tree_find_iova. With the current code this is not
possible, since we will free it, and then we will try to search for it
again.
Fix it making accepting the map by value, forcing a copy of the
argument. Not applying a fixes tag, since there is no use like that at
the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If a map fails for whatever reason, it must not be saved in the tree.
Otherwise, qemu will try to unmap it in cleanup, leaving to more errors.
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Next patch will skip the registering of dma maps that the vdpa device
rejects in the iova tree. We need to consider that here or we cause a
SIGSEGV accessing result.
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
virtio level reset should not affect pci express
registers such as PM, error or link.
Fixes: 27ce0f3afc ("hw/virtio: fix Power Management Control Register for PCI Express virtio devices")
Fixes: d584f1b9ca ("hw/virtio: fix Link Control Register for PCI Express virtio devices")
Fixes: c2cabb3422 ("hw/virtio: fix error enabling flags in Device Control register")
Cc: "Marcel Apfelbaum" <marcel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As reads happen in the callback we were never seeing them. We only
really care about the header so move the tracepoint to when the header
is complete.
Fixes: 6ca6d8ee9d (hw/virtio: add vhost_user_[read|write] trace points)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220728135503.1060062-5-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The assert() protecting against leakage is a little aggressive and
causes needless crashes if a device is shutdown without having been
configured. In this case no descriptors are lost because none have
been assigned.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220728135503.1060062-4-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I've noticed asserts firing because we query the status of vdev after
a vhost connection is closed down. Rather than faulting on the NULL
indirect just quietly reply false.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220728135503.1060062-3-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently we only enforce power-of-two mappings (required by the QEMU
notifier) for UNMAP requests. A MAP request not aligned on a
power-of-two may be successfully handled by VFIO, and then the
corresponding UNMAP notify will fail because it will attempt to split
that mapping. Ensure MAP and UNMAP notifications are consistent.
Fixes: dde3f08b5c ("virtio-iommu: Handle non power of 2 range invalidations")
Reported-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220718135636.338264-1-jean-philippe@linaro.org>
Tested-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_vdpa_listener_region_del is always deleting the first iova entry
of the tree, since it's using the needle iova instead of the result's
one.
This was detected using a vga virtual device in the VM using vdpa SVQ.
It makes some extra memory adding and deleting, so the wrong one was
mapped / unmapped. This was undetected before since all the memory was
mappend and unmapped totally without that device, but other conditions
could trigger it too:
* mem_region was with .iova = 0, .translated_addr = (correct GPA).
* iova_tree_find_iova returned right result, but does not update
mem_region.
* iova_tree_remove always removed region with .iova = 0. Right iova were
sent to the device.
* Next map will fill the first region with .iova = 0, causing a mapping
with the same iova and device complains, if the next action is a map.
* Next unmap will cause to try to unmap again iova = 0, causing the
device to complain that no region was mapped at iova = 0.
Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The SVQ vring used idx usually match with the guest visible one, as long
as all the guest buffers (GPA) maps to exactly one buffer within qemu's
VA. However, as we can see in virtqueue_map_desc, a single guest buffer
could map to many buffers in SVQ vring.
Also, its also a mistake to rewind them at the source of migration.
Since VirtQueue is able to migrate the inflight descriptors, its
responsability of the destination to perform the rewind just in case it
cannot report the inflight descriptors to the device.
This makes easier to migrate between backends or to recover them in
vhost devices that support set in flight descriptors.
Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Since the vhost-vdpa device is exposing _F_LOG, adding a migration blocker if
it uses CVQ.
However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
only add to vhost_dev when used. vhost_dev machinery place the migration
blocker if needed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from
the guest that could set a different state in qemu device model and vdpa
device.
To do so, it needs to be able to map these new buffers to the device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.
When this callback is defined the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor it needs to manually inject it. The
callback is also free to process the command by itself and use the
element with svq_push.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It allows the Shadow Control VirtQueue to wait for the device to use the
available buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This allows external parts of SVQ to forward custom buffers to the
device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This function allows external SVQ users to return guest's available
buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A guest's buffer continuos on GPA may need multiple descriptors on
qemu's VA, so SVQ should track its length sepparatedly.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This will allow SVQ to add context to the different queue elements.
This patch only store the actual element, no functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
VirtQueueElement comes from the guest, but we're heading SVQ to be able
to modify the element presented to the device without the guest's
knowledge.
To do so, make SVQ accept sg buffers directly, instead of using
VirtQueueElement.
Add vhost_svq_add_element to maintain element convenience.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The series need to expose vhost_svq_add with full functionality,
including checking for full queue.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The series needs to expose vhost_svq_add with full functionality,
including kick
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Future code needs to call it from vhost_svq_add.
No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In the next patch we will allow busypolling of this value. The compiler
have a running path where shadow_used_idx, last_used_idx, and vring used
idx are not modified within the same thread busypolling.
This was not an issue before since we always cleared device event
notifier before checking it, and that could act as memory barrier.
However, the busypoll needs something similar to kernel READ_ONCE.
Let's add it here, sepparated from the polling.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It's done for both in and out descriptors so it's better placed here.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>