Modify migrate_add_blocker and migrate_del_blocker to take an Error **
reason. This allows migration to own the Error object, so that if
an error occurs in migrate_add_blocker, migration code can free the Error
and clear the client handle, simplifying client code. It also simplifies
the migrate_del_blocker call site.
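A minimal sketch of the resulting call pattern (signatures hedged from the
description above, not copied from the patch):

  int migrate_add_blocker(Error **reasonp, Error **errp);
  void migrate_del_blocker(Error **reasonp);

  /* Client code: migration owns *reasonp and clears it on failure. */
  error_setg(&s->migration_blocker, "device does not support migration");
  if (migrate_add_blocker(&s->migration_blocker, errp) < 0) {
      /* Error already freed, s->migration_blocker already NULL. */
      return;
  }
  ...
  migrate_del_blocker(&s->migration_blocker); /* frees and clears handle */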
In addition, this is a pre-requisite for a proposed future patch that would
add a mode argument to migration requests to support live update, and
maintain a list of blockers for each mode. A blocker may apply to a single
mode or to multiple modes, and passing Error** will allow one Error object
to be registered for multiple modes.
No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1697634216-84215-1-git-send-email-steven.sistare@oracle.com>
Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.
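Per the virtio specification, the capability layout is roughly as follows
(a hedged sketch in C types, not copied from this patch):

  struct virtio_pci_cap {
      uint8_t  cap_vndr;    /* generic PCI field: PCI_CAP_ID_VNDR */
      uint8_t  cap_next;
      uint8_t  cap_len;
      uint8_t  cfg_type;    /* VIRTIO_PCI_CAP_SHARED_MEMORY_CFG */
      uint8_t  bar;
      uint8_t  id;          /* distinguishes multiple shm regions */
      uint8_t  padding[2];
      uint32_t offset;      /* low 32 bits */
      uint32_t length;      /* low 32 bits */
  };

  struct virtio_pci_cap64 {
      struct virtio_pci_cap cap;
      uint32_t offset_hi;   /* high 32 bits: offsets of 2^32 and more */
      uint32_t length_hi;   /* high 32 bits: sizes of 2^32 and more */
  };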
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Huang Rui <ray.huang@amd.com>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Let's mark the memslot alias memory regions as unmergeable, such that
flatview and vhost won't merge adjacent memory region aliases and we can
atomically map/unmap individual aliases without affecting adjacent
alias memory regions.
This handles vhost and vfio in multiple-memslot mode correctly (which do
not support atomic memslot updates) and avoids the temporary removal of
large memslots, which can be an expensive operation. For example, vfio
might have to unpin + repin a lot of memory, which is undesired.
Message-ID: <20230926185738.277351-19-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's allow for marking memory regions unmergeable, to teach
flatview code and vhost to not merge adjacent aliases to the same memory
region into a larger memory section; instead, we want separate aliases to
stay separate such that we can atomically map/unmap aliases without
affecting other aliases.
This is desired for virtio-mem mapping device memory located on a RAM
memory region via multiple aliases into a memory region container,
resulting in separate memslots that can get (un)mapped atomically.
As an example with virtio-mem, the layout would look something like this:
[...]
0000000240000000-00000020bfffffff (prio 0, i/o): device-memory
0000000240000000-000000043fffffff (prio 0, i/o): virtio-mem
0000000240000000-000000027fffffff (prio 0, ram): alias memslot-0 @mem2 0000000000000000-000000003fffffff
0000000280000000-00000002bfffffff (prio 0, ram): alias memslot-1 @mem2 0000000040000000-000000007fffffff
00000002c0000000-00000002ffffffff (prio 0, ram): alias memslot-2 @mem2 0000000080000000-00000000bfffffff
[...]
Without unmergeable memory regions, all three memslots would get merged into
a single memory section. For example, when mapping another alias (e.g.,
virtio-mem-memslot-3) or when unmapping any of the mapped aliases,
memory listeners will first get notified about the removal of the big
memory section to then get notified about re-adding of the new
(differently merged) memory section(s).
In an ideal world, memory listeners would be able to deal with that
atomically, like KVM nowadays does. However, (a) supporting this for other
memory listeners (vhost-user, vfio) is fairly hard: temporary removal
can result in all kinds of issues on concurrent access to guest memory;
and (b) this handling is undesired, because temporarily removing+readding
can consume quite some time on bigger memslots and is not efficient
(e.g., vfio unpinning and repinning pages ...).
Let's allow for marking a memory region unmergeable, such that we
can atomically (un)map aliases to the same memory region, similar to
(un)mapping individual DIMMs.
Similarly, teach vhost code to not redo what flatview core stopped doing:
don't merge such sections. Merging in vhost code is really only relevant
for handling random holes in boot memory: without this merging, the
vhost-user backend wouldn't be able to mmap() some boot memory backed by
hugetlb.
We'll use this for virtio-mem next.
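A hedged sketch of how a device would use this, assuming a setter named
memory_region_set_unmergeable() matching the description above:

  /* Map one alias per memslot into the device container; marking it
   * unmergeable keeps it a separate flatview section / memslot. */
  memory_region_init_alias(alias, OBJECT(dev), "memslot-0", mem,
                           0, memslot_size);
  memory_region_set_unmergeable(alias, true);
  memory_region_add_subregion(container, 0, alias);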
Message-ID: <20230926185738.277351-18-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Having large virtio-mem devices that only expose little memory to a VM
is currently a problem: we map the whole sparse memory region into the
guest using a single memslot, resulting in one gigantic memslot in KVM.
KVM allocates metadata for the whole memslot, which can result in quite
some memory waste.
Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
allocate metadata for that 1 TiB memslot: on x86, this implies allocating
a significant amount of memory for metadata:
(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
-> For 1 TiB: 2147483648 + 4194304 + 8192 = ~2 GiB (0.2 %)
With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
-> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
-> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
-> For 1 TiB: 536870912 = 64 MiB (0.006 %)
So we primarily care about (1) and (2). The bad thing is that the
memory consumption *doubles* once SMM is enabled, because we create the
memslot once for !SMM and once for SMM.
Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB
Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB
... and that's really something we want to optimize, to be able to just
start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
that can grow very large (e.g., 1 TiB).
Consequently, using multiple memslots and only mapping the memslots we
really need can significantly reduce memory waste and speed up
memslot-related operations. Let's expose the sparse RAM memory region using
multiple memslots, mapping only the memslots we currently need into our
device memory region container.
The feature can be enabled using "dynamic-memslots=on" and requires
"unplugged-inaccessible=on", which is nowadays the default.
Once enabled, we'll auto-detect the number of memslots to use based on the
memslot limit provided by the core. We'll use at most 1 memslot per
gigabyte. Note that our global limit of memslots across all memory devices
is currently set to 256: even with multiple large virtio-mem devices,
we'd still have a sane limit on the number of memslots used.
The default is to not dynamically map memslots for now
("dynamic-memslots=off"). The optimization must be enabled manually,
because some vhost setups (e.g., hotplug of vhost-user devices) might be
problematic until we support more memslots, especially in vhost-user
backends.
Note that "dynamic-memslots=on" is just a hint that multiple memslots
*may* be used for internal optimizations, not that multiple memslots
*must* be used. The actual number of memslots that are used is an
internal detail: for example, once memslot metadata is no longer an
issue, we could simply stop optimizing for that. Migration source and
destination can differ on the setting of "dynamic-memslots".
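As an example, a hypothetical invocation enabling the optimization (object
and device ids are illustrative only):

  -m 4G,maxmem=1028G \
  -object memory-backend-ram,id=mem0,size=1T \
  -device virtio-mem-pci,id=vmem0,memdev=mem0,dynamic-memslots=on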
Message-ID: <20230926185738.277351-17-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
It's cleaner and more future-proof to update other state that depends on
the bitmap state as soon as possible when restoring the bitmap.
So factor out informing the RamDiscardListeners into a function and call it
in case of early migration right after we restored the bitmap.
Message-ID: <20230926185738.277351-16-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's prepare for a user that has to modify the VirtIOMEM device state.
Message-ID: <20230926185738.277351-15-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We want to support memory devices that can automatically decide how many
memslots they will use. In the worst case, they have to use a single
memslot.
The target use cases are virtio-mem and the hyper-v balloon.
Let's calculate a reasonable limit on the memslots such a memory device may
use, and instruct the device to make a decision based on that limit. Use a
simple heuristic that considers:
* A soft limit of 256 memslots across all memory devices, so as to not
  consume too many memslots -- which could harm performance.
* The number of memslots that are actually still free and unreserved.
* The percentage of the remaining device memory region that the memory
  device will occupy.
Further, while we properly check before plugging a memory device whether
there still are free memslots, we have other memslot consumers (such as
boot memory, PCI BARs) that don't perform any checks and might dynamically
consume memslots without any prior reservation. So we might succeed in
plugging a memory device, but once we dynamically map a PCI BAR we would
be in trouble. Doing accounting / reservation / checks for all such
users is problematic (e.g., sometimes we might temporarily split boot
memory into two memslots, triggered by the BIOS).
We use the historic magic memslot number of 509 as orientation: supporting
256 memslots for memory devices (leaving 253 for boot memory and
other devices) has been proven to work reliably. We'll fall back to
suggesting a single memslot if we don't have at least 509 total memslots.
Plugging vhost devices with less than 509 memslots available while we
have memory devices plugged that consume multiple memslots due to
automatic decisions can be problematic. Most configurations might just fail
due to "limit < used + reserved"; however, it can also happen that these
memory devices would suddenly consume memslots that would actually be
required by other memslot consumers (boot, PCI BARs) later. Note that this
has always been sketchy with vhost devices that support only a small number
of memslots; but we don't want to make it any worse. So let's keep it simple
and simply reject plugging such vhost devices in such a configuration.
Eventually, all vhost devices that want to be fully compatible with such
memory devices should support a decent number of memslots (>= 509).
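A hedged sketch of this heuristic (names and the exact scaling are
assumptions, not the actual implementation):

  static unsigned int memslot_limit_sketch(unsigned int total_memslots,
                                           unsigned int free_memslots,
                                           unsigned int reserved_memslots,
                                           uint64_t device_size,
                                           uint64_t remaining_region_size)
  {
      unsigned int available, limit;

      /* Below 509 total memslots, suggest a single memslot only. */
      if (total_memslots < 509) {
          return 1;
      }
      /* Soft limit of 256 memslots for all memory devices combined. */
      available = free_memslots - reserved_memslots;
      if (available > 256) {
          available = 256;
      }
      /* Scale by this device's share of the remaining device memory
       * region (device_size assumed <= remaining_region_size). */
      limit = available * device_size / remaining_region_size;
      return limit ? limit : 1;
  }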
Message-ID: <20230926185738.277351-13-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add vhost_get_max_memslots().
Message-ID: <20230926185738.277351-12-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We want to support memory devices that have a dynamically managed memory
region container as device memory region. This device memory region maps
multiple RAM memory subregions (e.g., aliases to the same RAM memory
region), whereby these subregions can be (un)mapped on demand.
Each RAM subregion will consume a memslot in KVM and vhost, resulting in
such a new device consuming memslots dynamically, usually starting with 0.
We already track the number of used vs. required memslots for all
memslots. From that, we can derive the number of reserved memslots that
must not be used otherwise.
The target use cases are virtio-mem and the hyper-v balloon, which will
dynamically map aliases to a RAM memory region into their device memory
region container.
Properly document what's supported and what's not and extend the vhost
memslot check accordingly.
Message-ID: <20230926185738.277351-10-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the number of free slots instead of only checking if there
is a free slot. This is a preparation for memory devices that consume
multiple memslots.
Message-ID: <20230926185738.277351-6-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Checking whether the memory regions are equal is sufficient: if they are
equal, then most certainly the contained fd is equal.
The whole vhost-user memslot handling is suboptimal and overly
complicated. We shouldn't have to look up a RAM memory region we got
notified about in vhost_user_get_mr_data() using a host pointer. But that
requires a bigger rework -- especially an alternative vhost_set_mem_table()
backend call that simply consumes MemoryRegionSections.
For now, let's just drop vhost_backend_can_merge().
Message-ID: <20230926185738.277351-3-david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Having multiple vhost devices, some filtering out fd-less memslots and
some not, can mess up the "used_memslot" accounting. Consequently our
"free memslot" checks become unreliable and we might run out of free
memslots at runtime later.
An example sequence which can trigger a potential issue that involves
different vhost backends (vhost-kernel and vhost-user) and hotplugged
memory devices can be found at [1].
Let's make the filtering mechanism less generic and distinguish between
backends that support private memslots (without a fd) and ones that only
support shared memslots (with a fd). Track the used_memslots for both
cases separately and use the corresponding value when required.
Note: Most probably we should filter out MAP_PRIVATE fd-based RAM regions
(for example, via memory-backend-memfd,...,shared=off or as default with
memory-backend-file) as well. When not using MAP_SHARED, it might not work
as expected. Add a TODO for now.
[1] https://lkml.kernel.org/r/fad9136f-08d3-3fd9-71a1-502069c000cf@redhat.com
Message-ID: <20230926185738.277351-2-david@redhat.com>
Fixes: 988a27754b ("vhost: allow backends to filter memory sections")
Cc: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Rename a variable in vhost_dev_sync_region() and remove a superfluous
declaration in vhost_commit() to make this code compilable with "-Wshadow".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20231004114809.105672-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
"len" is used as parameter of the functions virtio_write_config()
and virtio_read_config(), and additionally as a local variable,
so this causes a compiler warning when compiling with "-Wshadow"
and can be confusing for the reader. Rename the local variables
to "caplen" to avoid this problem.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20231004095302.99037-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add three new vhost-user protocol VHOST_USER_BACKEND_SHARED_OBJECT_*
messages. These new messages are sent from vhost-user back-ends to
interact with the virtio-dmabuf table in order to add or remove themselves
as virtio exporters, or to look up virtio dma-buf shared objects.
The action taken in the front-end depends on the type stored in the virtio
shared object hash table. When the table holds a pointer to a vhost
backend for a given UUID, the front-end sends a VHOST_USER_GET_SHARED_OBJECT
to the backend holding the shared object.
The messages can only be sent after successfully negotiating a new
VHOST_USER_PROTOCOL_F_SHARED_OBJECT vhost-user protocol feature bit.
Finally, refactor the code for sending a response message so that all
common parts, both for the common REPLY_ACK case and for other data
responses, can call it and avoid code repetition.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Message-Id: <20231002065706.94707-4-aesteve@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The 'next' was converted from a local variable to an output parameter
in commit:
412e0e81b1 ("virtio: handle virtqueue_read_next_desc() errors")
But all the actual uses of the 'i/next' as an output were removed a few
months prior in commit:
aa570d6fb6 ("virtio: combine the read of a descriptor")
Remove the unused argument to simplify the code.
Also, add a comment to the function to describe what it is actually
doing, as it is not obvious that the 'desc' is both an input and an
output argument.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20230927140016.2317404-3-i.maximets@ovn.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It was supposed to be a compiler barrier, and it indeed started out as
one, initially called 'wmb', when virtio core support was introduced.
Later all the instances of 'wmb' were switched to smp_wmb to fix memory
ordering issues on non-x86 platforms. However, this one doesn't need
to be an actual barrier, as its only purpose was to ensure that the
value is not read twice.
And since commit aa570d6fb6 ("virtio: combine the read of a descriptor")
there is no need for a barrier at all, since we're no longer reading
guest memory here, but accessing a local structure.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20230927140016.2317404-2-i.maximets@ovn.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We do not need the most up-to-date number of heads; we only want to
know if there is at least one.
Use the shadow variable as long as it is not equal to the last available
index checked. This avoids an expensive qatomic dereference of the
RCU-protected memory region cache as well as the memory access itself.
The change improves performance of the af-xdp network backend by 2-3%.
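Roughly, the fast path looks like this (a hedged sketch based on the
description above, not the exact patch):

  /* At least one buffer is available if the shadow avail index, read on
   * an earlier ring access, already differs from the last index we
   * processed -- no qatomic/RCU access to the ring is needed. */
  if (vq->shadow_avail_idx != vq->last_avail_idx) {
      return false; /* not empty */
  }
  /* Otherwise, re-read the avail index from the ring as before. */
  return vring_avail_idx(vq) == vq->last_avail_idx;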
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20230927135157.2316982-1-i.maximets@ovn.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move the definition of VhostUserProtocolFeature to
include/hw/virtio/vhost-user.h.
Remove previous definitions in hw/scsi/vhost-user-scsi.c,
hw/virtio/vhost-user.c, and hw/virtio/virtio-qmp.c.
Previously there were three separate definitions of this enum across three
different files; now a single definition is shared by all three.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230926224107.2951144-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add new vhost-user protocol feature to vhost-user protocol feature map
and enumeration:
- VHOST_USER_PROTOCOL_F_STATUS
Add new virtio device features for several virtio devices to their
respective feature mappings:
virtio-blk:
- VIRTIO_BLK_F_SECURE_ERASE
virtio-net:
- VIRTIO_NET_F_NOTF_COAL
- VIRTIO_NET_F_GUEST_USO4
- VIRTIO_NET_F_GUEST_USO6
- VIRTIO_NET_F_HOST_USO
virtio/vhost-user-gpio:
- VIRTIO_GPIO_F_IRQ
- VHOST_USER_F_PROTOCOL_FEATURES
Add support for introspection on vhost-user-gpio devices.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230926224107.2951144-3-jonah.palmer@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The virtio_list duplicates information about virtio devices that already
exist in the QOM composition tree. Instead of creating this list of
realized virtio devices, search the QOM composition tree instead.
This patch modifies the QMP command qmp_x_query_virtio to instead
recursively search the QOM composition tree for devices of type
'TYPE_VIRTIO_DEVICE'. The device is also checked to ensure it's
realized.
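A hedged sketch of such a walk (helper names are illustrative):

  static int virtio_device_find(Object *child, void *opaque)
  {
      GSList **list = opaque;

      if (object_dynamic_cast(child, TYPE_VIRTIO_DEVICE) &&
          object_property_get_bool(child, "realized", NULL)) {
          *list = g_slist_append(*list, child);
      }
      return 0; /* returning 0 continues the recursion */
  }

  GSList *devices = NULL;
  object_child_foreach_recursive(object_get_root(), virtio_device_find,
                                 &devices);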
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230926224107.2951144-2-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches in this series will no longer perform an
immediate poll and check of the device's used buffers
for each CVQ state load command. Instead, they will
send CVQ state load commands in parallel by polling
multiple pending buffers at once.
To achieve this, this patch refactors vhost_svq_poll() to accept a new
argument `num`, which allows vhost_svq_poll() to wait for the device to
use multiple elements, rather than polling for a single element.
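A hedged sketch of the resulting shape (vhost_svq_poll_one() is a
hypothetical stand-in for the existing single-element poll):

  size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
  {
      size_t len = 0;

      while (num--) {
          /* Busy-wait for one used element; accumulate written length. */
          len += vhost_svq_poll_one(svq);
      }
      return len;
  }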
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <950b3bfcfc5d446168b9d6a249d554a013a691d4.1693287885.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Doing it this way allows CVQ to be enabled before the dataplane vqs,
properly restoring state such as MQ or MAC addresses in the case of a
migration.
The patch does it by defining a ->load NetClientInfo callback also for
dataplane. Ideally, this should be done by an independent patch, but
the function is already static so it would only add an empty
vhost_vdpa_net_data_load stub.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230822085330.3978829-5-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The vhost-vdpa net backend needs to enable vrings in a different order
than default, so export it.
No functional change intended except for tracing, that now includes the
(virtio) index being enabled and the return value of the ioctl.
Still ignoring the return value of this function if called from
vhost_vdpa_dev_start, as reorganizing the calling code around it is out of
the scope of this series.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230822085330.3978829-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Lots of virtio functions that are on a hot path in data transmission
initialize the indirect descriptor cache at the point of stack
allocation. It's a 112-byte structure that is getting zeroed out on
each call, adding unnecessary overhead. It's going to be correctly
initialized later via a special init function. The only reason to
actually initialize it right away is the ability to safely destruct it.
Replace the designated initializer with a function that only initializes
what is necessary.
Removal of the unnecessary stack initializations improves throughput
of virtio-net devices in terms of 64B packets per second by 6-14 %
depending on the case. Tested with a proposed af-xdp network backend
and a dpdk testpmd application in the guest, but should be beneficial
for other virtio devices as well.
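A hedged sketch of such a minimal initializer (assuming the destructor only
inspects the contained MemoryRegion pointer):

  static inline void
  memory_region_cache_init_empty(MemoryRegionCache *cache)
  {
      /* Enough for a safe destroy; skip zeroing the other ~100 bytes. */
      cache->mrs.mr = NULL;
  }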
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20230811143423.3258788-1-i.maximets@ovn.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To use the generic device the user will need to provide the config
region size via the command line. We also add a notifier so the guest
can be pinged if the remote daemon updates the config.
With these changes:
-device vhost-user-device-pci,virtio-id=41,num_vqs=2,config_size=8
is equivalent to:
-device vhost-user-gpio-pci
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-11-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In theory we shouldn't need to repeat so much boilerplate to support
vhost-user backends. This provides a generic vhost-user-base QOM
object and a derived vhost-user-device for which the user needs to
provide the few bits of information that aren't currently provided by
the vhost-user protocol. This should provide a baseline implementation
from which the other vhost-user stubs can specialise.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230710153522.3469097-8-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Similarly to commit de6cd7599b ("meson: Replace softmmu_ss
-> system_ss"), rename the virtio source set common to all
system emulation as 'system_virtio_ss[]'. This is clearer
because softmmu can be used for user emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710100510.84862-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The previous commit removed the dependencies on the
target-specific TARGET_PAGE_FOO macros. We can now
move vhost-vdpa.c to the 'softmmu_virtio_ss' source
set to build it once for all our targets.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710100432.84819-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Similarly to commit e414ed2c47 ("virtio-iommu: Use
target-agnostic qemu_target_page_mask"), replace the
target-specific TARGET_PAGE_SIZE and TARGET_PAGE_MASK
definitions by a call to the runtime qemu_target_page_size()
helper which is target agnostic.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230710094931.84402-5-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use TARGET_PAGE_SIZE to calculate TARGET_PAGE_ALIGN
(see the rationale in previous commits).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-4-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Propagate TARGET_PAGE_MASK (see the previous commit for
rationale).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-3-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In order to make vhost-vdpa.c a target-agnostic source unit,
we need to remove the TARGET_PAGE_SIZE / TARGET_PAGE_MASK /
TARGET_PAGE_ALIGN uses. TARGET_PAGE_SIZE will be replaced by
the runtime qemu_target_page_size(). The other ones will be
deduced from TARGET_PAGE_SIZE.
Since the 3 macros are used in 3 related functions (sharing
the same call tree), we'll refactor them to only depend on
TARGET_PAGE_MASK.
Having the following call tree:
  vhost_vdpa_listener_region_del()
  -> vhost_vdpa_listener_skipped_section()
     -> vhost_vdpa_section_end()
The first step is to propagate TARGET_PAGE_MASK to
vhost_vdpa_listener_skipped_section().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230710094931.84402-2-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.
However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().
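A hedged sketch of the mechanism (names are illustrative, not the exact
patch): schedule the call as a bottom half outside coroutine context and
yield until it completes.

  typedef struct {
      Coroutine *co;
      VirtIODevice *vdev;
      uint64_t val;
      int ret;
  } SetFeaturesData;

  static void set_features_bh(void *opaque)
  {
      SetFeaturesData *d = opaque;

      d->ret = virtio_set_features_nocheck(d->vdev, d->val);
      aio_co_wake(d->co); /* resume the coroutine that scheduled us */
  }

  /* In virtio_load(), when running in coroutine context: */
  if (qemu_in_coroutine()) {
      SetFeaturesData d = {
          .co = qemu_coroutine_self(), .vdev = vdev, .val = features,
      };
      aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                              set_features_bh, &d);
      qemu_coroutine_yield();
      ret = d.ret;
  }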
Without this fix, the following crash was reported:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007efc738477f3 in __GI_abort () at abort.c:79
#4 0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
#5 0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
#6 0x0000560aebcd8dd6 in bdrv_register_buf ()
#7 0x0000560aeb97ed97 in ram_block_added.llvm ()
#8 0x0000560aebb8303f in ram_block_add.llvm ()
#9 0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
#10 0x0000560aebb2ac98 in vfio_region_mmap ()
#11 0x0000560aebb3ea0f in vfio_bars_register ()
#12 0x0000560aebb3c628 in vfio_realize ()
#13 0x0000560aeb90f0c2 in pci_qdev_realize ()
#14 0x0000560aebc40305 in device_set_realized ()
#15 0x0000560aebc48e07 in property_set_bool.llvm ()
#16 0x0000560aebc46582 in object_property_set ()
#17 0x0000560aebc4cd58 in object_property_set_qobject ()
#18 0x0000560aebc46ba7 in object_property_set_bool ()
#19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
#20 0x0000560aebb1fbaf in virtio_net_set_features ()
#21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
#22 0x0000560aebb47107 in virtio_load ()
#23 0x0000560aeb9ae7ce in vmstate_load_state ()
#24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
#25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
#26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
#27 0x0000560aebeace56 in coroutine_trampoline.llvm ()
Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905145002.46391-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For symmetric algorithms, the length of the ciphertext must be the same
as that of the plaintext.
The missing verification of src_len and dst_len in
virtio_crypto_sym_op_helper() may lead to a buffer overflow or information
disclosure.
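A hedged sketch of the added check (exact message and placement assumed):

  if (op_info->src_len != op_info->dst_len) {
      virtio_error(vdev, "sym request: dst_len must match src_len");
      return NULL;
  }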
This patch was originally written by Yiming Tao for QEMU-SECURITY and is
resent here (with a few changes to the error message) on qemu-devel.
Fixes: CVE-2023-3180
Fixes: 04b9b37edda("virtio-crypto: add data queue processing handler")
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Yiming Tao <taoym@zju.edu.cn>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230803024314.29962-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When vhost-user reconnects to the backend, the notifier should be
cleaned up. Otherwise, the fd resources will be exhausted.
Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20230731121018.2856310-2-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
virtio_queue_packed_set_last_avail_idx() is used by vhost devices to set
the internal queue indices to what has been reported by the vhost
back-end through GET_VRING_BASE. For packed virtqueues, this
32-bit value is expected to contain both the device's internal avail and
used indices, as well as their respective wrap counters.
To get the used index, we shift the 32-bit value right by 16, and then
apply a mask of 0x7ffff. That seems to be a typo, because it should be
0x7fff; first of all, the virtio specification says that the maximum
queue size for packed virt queues is 2^15, so the indices cannot exceed
2^15 - 1 anyway, making 0x7fff the correct mask. Second, the mask
clearly is wrong from context, too, given that (A) `idx & 0x70000` must
be 0 at this point (`idx` is 32 bit and was shifted to the right by 16
already), (B) `idx & 0x8000` is the used_wrap_counter, so should not be
part of the used index, and (C) `vq->used_idx` is a `uint16_t`, so
cannot fit the 0x70000 part of the mask anyway.
This most likely never produced any guest-visible bugs, though, because
for a vhost device, qemu will probably not evaluate the used index
outside of virtio_queue_packed_get_last_avail_idx(), where we
reconstruct the 32-bit value from avail and used indices and their wrap
counters again. There, it does not matter whether the highest bit of
the used_idx is the used index wrap counter, because we put the wrap
counter exactly in that position anyway.
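The fixed function thus looks roughly like this (a hedged sketch of the
logic described above):

  static void virtio_queue_packed_set_last_avail_idx(VirtIODevice *vdev,
                                                     uint32_t idx, int n)
  {
      VirtQueue *vq = &vdev->vq[n];

      vq->last_avail_idx = idx & 0x7fff;
      vq->last_avail_wrap_counter = !!(idx & 0x8000);
      idx >>= 16;
      vq->used_idx = idx & 0x7fff; /* was 0x7ffff */
      vq->used_wrap_counter = !!(idx & 0x8000);
  }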
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230721134945.26967-1-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20230721072820.75797-1-david.edmondson@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The QEMU CI fails in virtio-scmi test occasionally. As reported by
Thomas Huth, this happens most likely when the system is loaded and it
fails with the following error:
qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
As discovered by Fabiano Rosas, the cause is a duplicate invocation of
msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
  msix_unset_vector_notifiers
  virtio_pci_set_guest_notifiers
  vu_scmi_stop
  vu_scmi_disconnect
  ...
  qemu_chr_write_buffer

  msix_unset_vector_notifiers
  virtio_pci_set_guest_notifiers
  vu_scmi_stop
  vu_scmi_set_status
  ...
  qemu_cleanup
While vu_scmi_stop calls are protected by vhost_dev_is_started()
check, it's apparently not enough. vhost-user-blk and vhost-user-gpio
use an extra protection, see f5b22d06fb (vhost: recheck dev state in
the vhost_migration_log routine) for the motivation. Let's use the
same in vhost-user-scmi, which fixes the failure above.
Fixes: a5dab090e1 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Message-Id: <20230720101037.2161450-1-mzamazal@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
At several locations we compute the granule from the config
page_size_mask using ctz() and then format it in traces using
BIT(). As the page_size_mask is 64b we should use ctz64 and
BIT_ULL() for formatting. We failed to be consistent.
Note the page_size_mask is guaranteed to be non-null. The spec
mandates the device to set at least one bit, so ctz64 cannot
return 64. This is guaranteed by the fact the device
initializes the page_size_mask to qemu_target_page_mask()
and then the page_size_mask is further constrained by the
virtio_iommu_set_page_size_mask() callback, which can't
result in a new mask being null. So if Coverity complains
about those ctz64/BIT_ULL uses (CID 1517772), it is a false
positive.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 94df5b2180 ("virtio-iommu: Fix 64kB host page size VFIO device assignment")
Message-Id: <20230718182136.40096-1-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
In virtio_iommu_handle_command(), when a PROBE request is handled,
output_size takes a value greater than the tail size, and on a subsequent
iteration we can get a stack out-of-bounds access. Initialize
output_size on each iteration.
The issue was found with ASAN. Credits to:
Yiming Tao(Zhejiang University)
Gaoning Pan(Zhejiang University)
Fixes: 1733eebb9e ("virtio-iommu: Implement RESV_MEM probe request")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20230717162126.11693-1-eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's support device unplug by forwarding the unplug_request_check()
callback to the virtio-mem device.
Further, disallow changing the requested-size once an unplug request is
pending.
Disallowing requested-size changes handles corner cases such as
(1) pausing the VM (2) requesting device unplug and (3) adjusting the
requested size. If the VM would plug memory (due to the requested size
change) before processing the unplug request, we would be in trouble.
Message-ID: <20230711153445.514112-8-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
In many cases, blindly unplugging a virtio-mem device is problematic. We
can only safely remove a device once:
* The guest is not expecting to be able to read unplugged memory
(unplugged-inaccessible == on)
* The virtio-mem device does not have memory plugged (size == 0)
* The virtio-mem device does not have outstanding requests to the VM to
plug memory (requested-size == 0)
So let's add a callback to the virtio-mem device class to check for that.
We'll wire up virtio-mem-pci next.
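A hedged sketch of the class hook (type and field names assumed):

  struct VirtIOMEMClass {
      VirtioDeviceClass parent_class;
      /* ... */
      /* Fail with errp if the device cannot currently be unplugged. */
      void (*unplug_request_check)(VirtIOMEM *vmem, Error **errp);
  };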
Message-ID: <20230711153445.514112-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's support unplug requests for virtio-md-pci devices that provide
an unplug_request_check() callback.
We'll wire that up for virtio-mem-pci next.
Message-ID: <20230711153445.514112-6-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
While we fence unplug requests from the outside, the VM can still
trigger unplug of virtio-based memory devices; for example, in Linux,
by doing the following on a virtio-mem-pci device:
# echo 0 > /sys/bus/pci/slots/3/power
While doing that is not really expected to work without harming the
guest OS (e.g., removing a virtio-mem device while it still provides
memory), let's make sure that we properly handle it on the QEMU side.
We'll add support for unplugging of virtio-mem devices in some
configurations next.
Message-ID: <20230711153445.514112-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's factor out (un)plug handling, to be reused from arm/virt code.
Provide stubs for the case that CONFIG_VIRTIO_MD is not selected because
neither virtio-mem nor virtio-pmem is enabled. While this cannot
currently happen for x86, it will be possible for arm/virt.
Message-ID: <20230711153445.514112-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add a new abstract "virtio memory device" type, and use it as
parent class of virtio-mem-pci and virtio-pmem-pci.
Message-ID: <20230711153445.514112-2-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>