Make virtio-fs take into account server capabilities.
Just returning requested features assumes they all of then are implemented
by server and results in setting unsupported configuration if some of them
are absent.
Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With changes suggested by Stefan
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The same thing as for incoming postcopy - we cannot deal with concurrent
RAM discards when using background snapshot feature in outgoing migration.
Fixes: 8518278a6a (migration: implementation
of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit 4c70875372 ("pci: advertise a page aligned ATS") advertises
the page aligned via ATS capability (RO) to unbrek recent Linux IOMMU
drivers since 5.2. But it forgot the compat the capability which
breaks the migration from old machine type:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read:
0 device: 20 cmask: ff wmask: 0 w1cmask:0
This patch introduces a new parameter "x-ats-page-aligned" for
virtio-pci device and turns it on for machine type which is newer than
5.1.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: qemu-stable@nongnu.org
Fixes: 4c70875372 ("pci: advertise a page aligned ATS")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210406040330.11306-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The value is assigned later in this procedure.
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Message-Id: <20210315115937.14286-3-yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1743098
This commit completes the solution of segfault in hot unplug flow
(by commit ccec7e9603).
Added missing check for vdev in virtio_pci_isr_read.
Typical stack of crash:
virtio_pci_isr_read ../hw/virtio/virtio-pci.c:1365 with proxy-vdev = 0
memory_region_read_accessor at ../softmmu/memory.c:442
access_with_adjusted_size at ../softmmu/memory.c:552
memory_region_dispatch_read1 at ../softmmu/memory.c:1420
memory_region_dispatch_read at ../softmmu/memory.c:1449
flatview_read_continue at ../softmmu/physmem.c:2822
flatview_read at ../softmmu/physmem.c:2862
address_space_read_full at ../softmmu/physmem.c:2875
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Message-Id: <20210315115937.14286-2-yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ret in virtio_pmem_resp is a uint32_t variable, which should be assigned
using virtio_stl_p.
The kernel side driver does not guarantee virtio_pmem_resp to be initialized
to zero in advance, So sometimes the flush operation will fail.
Signed-off-by: Wang Liang <wangliangzz@inspur.com>
Message-Id: <20210317024145.271212-1-wangliangzz@126.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that everything is in place, have the nested event loop to monitor
the slave channel. The source in the main event loop is destroyed and
recreated to ensure any pending even for the slave channel that was
previously detected is purged. This guarantees that the main loop
wont invoke slave_read() based on an event that was already handled
by the nested loop.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-7-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
A deadlock condition potentially exists if a vhost-user process needs
to request something to QEMU on the slave channel while processing a
vhost-user message.
This doesn't seem to affect any vhost-user implementation so far, but
this is currently biting the upcoming enablement of DAX with virtio-fs.
The issue is being observed when the guest does an emergency reboot while
a mapping still exits in the DAX window, which is very easy to get with
a busy enough workload (e.g. as simulated by blogbench [1]) :
- QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd.
- In order to complete the request, virtiofsd then asks QEMU to remove
the mapping on the slave channel.
All these dialogs are synchronous, hence the deadlock.
As pointed out by Stefan Hajnoczi:
When QEMU's vhost-user master implementation sends a vhost-user protocol
message, vhost_user_read() does a "blocking" read during which slave_fd
is not monitored by QEMU.
The natural solution for this issue is an event loop. The main event
loop cannot be nested though since we have no guarantees that its
fd handlers are prepared for re-entrancy.
Introduce a new event loop that only monitors the chardev I/O for now
in vhost_user_read() and push the actual reading to a one-shot handler.
A subsequent patch will teach the loop to monitor and process messages
from the slave channel as well.
[1] https://github.com/jedisct1/Blogbench
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-6-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The slave channel is implemented with socketpair() : QEMU creates
the pair, passes one of the socket to virtiofsd and monitors the
other one with the main event loop using qemu_set_fd_handler().
In order to fix a potential deadlock between QEMU and a vhost-user
external process (e.g. virtiofsd with DAX), we want to be able to
monitor and service the slave channel while handling vhost-user
requests.
Prepare ground for this by converting the slave channel to be a
QIOChannelSocket. This will make monitoring of the slave channel
as simple as calling qio_channel_add_watch_source(). Since the
connection is already established between the two sockets, only
incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be
serviced.
This also allows to get rid of the ancillary data parsing since
QIOChannelSocket can do this for us. Note that the MSG_CTRUNC
check is dropped on the way because QIOChannelSocket ignores this
case. This isn't a problem since slave_read() provisions space for
8 file descriptors, but affected vhost-user slave protocol messages
generally only convey one. If for some reason a buggy implementation
passes more file descriptors, no need to break the connection, just
like we don't break it if some other type of ancillary data is
received : this isn't explicitely violating the protocol per-se so
it seems better to ignore it.
The current code errors out on short reads and writes. Use the
qio_channel_*_all() variants to address this on the way.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-5-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-4-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
can convey file descriptors. These must be closed before returning
from slave_read() to avoid being leaked. This can currently be done
in two different places:
[1] just after the request has been processed
[2] on the error path, under the goto label err:
These path are supposed to be mutually exclusive but they are not
actually. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the
sending of the reply fails, both [1] and [2] are performed with the
same descriptor values. This can potentially cause subtle bugs if one
of the descriptor was recycled by some other thread in the meantime.
This code duplication complicates rollback for no real good benefit.
Do the closing in a unique place, under a new fdcleanup: goto label
at the end of the function.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
slave_read() checks EAGAIN when reading or writing to the socket
fails. This gives the impression that the slave channel is in
non-blocking mode, which is certainly not the case with the current
code base. And the rest of the code isn't actually ready to cope
with non-blocking I/O.
Just drop the checks everywhere in this function for the sake of
clarity.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Both functions don't check the personality of the interface (legacy or
modern) before accessing the configuration memory and always use
virtio_config_readX()/virtio_config_writeX().
With this patch, they now check the personality and in legacy mode
call virtio_config_readX()/virtio_config_writeX(), otherwise call
virtio_config_modern_readX()/virtio_config_modern_writeX().
This change has been tested with virtio-mmio guests (virt stretch/armhf and
virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le).
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210314200300.3259170-1-laurent@vivier.eu>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently, the default msix vectors for virtio-net-pci is 3 which is
obvious not suitable for multiqueue guest, so we depends on the user
or management tools to pass a correct vectors parameter. In fact, we
can simplifying this by calculating the number of vectors on realize.
Consider we have N queues, the number of vectors needed is 2*N + 2
(#queue pairs + plus one config interrupt and control vq). We didn't
check whether or not host support control vq because it was added
unconditionally by qemu to avoid breaking legacy guests such as Minix.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Unmap notifiers work with an address mask assuming an
invalidation range of a power of 2. Nothing mandates this
in the VIRTIO-IOMMU spec.
So in case the range is not a power of 2, split it into
several invalidations.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20210309102742.30442-4-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
An assorted set of spelling fixes in various places.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210309111510.79495-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
At the moment the following QEMU command line triggers an assertion
failure On xlnx-versal SOC:
qemu-system-aarch64 \
-machine xlnx-versal-virt -nographic -smp 2 -m 128 \
-fsdev local,id=shareid,path=${HOME}/work,security_model=none \
-device virtio-9p-device,fsdev=shareid,mount_tag=share \
-fsdev local,id=shareid1,path=${HOME}/Music,security_model=none \
-device virtio-9p-device,fsdev=shareid1,mount_tag=share1
qemu-system-aarch64: ../migration/savevm.c:860:
vmstate_register_with_alias_id:
Assertion `!se->compat || se->instance_id == 0' failed.
This problem was fixed on arm virt platform in commit f58b39d2d5
("virtio-mmio: format transport base address in BusClass.get_dev_path")
It works perfectly on arm virt platform. but there is still there on
xlnx-versal SOC.
The main difference between arm virt and xlnx-versal is they use
different way to create virtio-mmio qdev. on arm virt, it calls
sysbus_create_simple("virtio-mmio", base, pic[irq]); which will call
sysbus_mmio_map internally and assign base address to subsys device
mmio correctly. but xlnx-versal's implements won't do this.
However, xlnx-versal can't switch to sysbus_create_simple() to create
virtio-mmio device. It's because xlnx-versal's cpu use
VersalVirt.soc.fpd.apu.mr as it's memory. which is subregion of
system_memory. sysbus_create_simple will add virtio to system_memory,
which can't be accessed by cpu.
Besides, xlnx-versal can't add sysbus_mmio_map api call too, because
this will add memory region to system_memory, and it can't be added
to VersalVirt.soc.fpd.apu.mr again.
We can solve this by assign correct base address offset on dev_path.
This path was test on aarch64 virt & xlnx-versal platform.
Signed-off-by: schspa <schspa@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Requiring a conditional for every goto is tedious:
if (busyloop_timeout) {
goto fail_busyloop;
} else {
goto fail;
}
Move the conditional to into the fail_busyloop label so that it's safe
to jump to this label unconditionally.
This change makes the migrate_add_blocker() error case more consistent.
It jumped to fail_busyloop unconditionally whereas the memslots limits
error case was conditional.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210222114931.272308-1-stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The category of the virtio-pmem device is not set, put it into the 'storage'
category.
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201130083630.2520597-3-ganqixin@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210213130325.14781-2-alex.bennee@linaro.org>
Not checking this can lead to invalid dev->vdev member access in
vhost_device_iotlb_miss if backend issue an iotlb message in a bad
timing, either maliciously or by a bug.
Reproduced rebooting a guest with testpmd in txonly forward mode.
#0 0x0000559ffff94394 in vhost_device_iotlb_miss (
dev=dev@entry=0x55a0012f6680, iova=10245279744, write=1)
at ../hw/virtio/vhost.c:1013
#1 0x0000559ffff9ac31 in vhost_backend_handle_iotlb_msg (
imsg=0x7ffddcfd32c0, dev=0x55a0012f6680)
at ../hw/virtio/vhost-backend.c:411
#2 vhost_backend_handle_iotlb_msg (dev=dev@entry=0x55a0012f6680,
imsg=imsg@entry=0x7ffddcfd32c0)
at ../hw/virtio/vhost-backend.c:404
#3 0x0000559fffeded7b in slave_read (opaque=0x55a0012f6680)
at ../hw/virtio/vhost-user.c:1464
#4 0x000055a0000c541b in aio_dispatch_handler (
ctx=ctx@entry=0x55a0010a2120, node=0x55a0012d9e00)
at ../util/aio-posix.c:329
Fixes: 020e571b8b ("vhost: rework IOTLB messaging")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20210129090728.831208-1-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This property was only required for compatibility reasons in the
pc-1.0 machine type and earlier. Now that these machine types have
been removed, the property is not useful anymore.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210203171832.483176-4-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Previous work on dev-iotlb message broke vhost on either SMMU or virtio-iommu
since dev-iotlb (or PCIe ATS) is not yet supported for those archs.
An initial idea is that we can let IOMMU to export this information to vhost so
that vhost would know whether the vIOMMU would support dev-iotlb, then vhost
can conditionally register to dev-iotlb or the old iotlb way. We can work
based on some previous patch to introduce PCIIOMMUOps as Yi Liu proposed [1].
However it's not as easy as I thought since vhost_iommu_region_add() does not
have a PCIDevice context at all since it's completely a backend. It seems
non-trivial to pass over a PCI device to the backend during init. E.g. when
the IOMMU notifier registered hdev->vdev is still NULL.
To make the fix smaller and easier, this patch goes the other way to leverage
the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can trap the
dev-iotlb registration and fail it. Then vhost could try the fallback solution
as using UNMAP invalidation for it's translations.
[1] https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l.liu@intel.com/
Reported-by: Eric Auger <eric.auger@redhat.com>
Fixes: b68ba1ca57
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210204191228.187550-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds trace events for virtio-pmem functionality.
Adding trace events for virtio pmem request, reponse and host
side fsync functionality.
Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Message-Id: <20201117115705.32195-1-pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Address space is destroyed without proper removal of its listeners with
current code. They are expected to be removed in
virtio_device_instance_finalize [1], but qemu calls it through
object_deinit, after address_space_destroy call through
device_set_realized [2].
Move it to virtio_device_unrealize, called before device_set_realized
[3] and making it symmetric with memory_listener_register in
virtio_device_realize.
v2: Delete no-op call of virtio_device_instance_finalize.
Add backtraces.
[1]
#0 virtio_device_instance_finalize (obj=0x555557de5120)
at /home/qemu/include/hw/virtio/virtio.h:71
#1 0x0000555555b703c9 in object_deinit (type=0x555556639860,
obj=<optimized out>) at ../qom/object.c:671
#2 object_finalize (data=0x555557de5120) at ../qom/object.c:685
#3 object_unref (objptr=0x555557de5120) at ../qom/object.c:1184
#4 0x0000555555b4de9d in bus_free_bus_child (kid=0x555557df0660)
at ../hw/core/qdev.c:55
#5 0x0000555555c65003 in call_rcu_thread (opaque=opaque@entry=0x0)
at ../util/rcu.c:281
Queued by:
#0 bus_remove_child (bus=0x555557de5098,
child=child@entry=0x555557de5120) at ../hw/core/qdev.c:60
#1 0x0000555555b4ee31 in device_unparent (obj=<optimized out>)
at ../hw/core/qdev.c:984
#2 0x0000555555b70465 in object_finalize_child_property (
obj=<optimized out>, name=<optimized out>, opaque=0x555557de5120)
at ../qom/object.c:1725
#3 0x0000555555b6fa17 in object_property_del_child (
child=0x555557de5120, obj=0x555557ddcf90) at ../qom/object.c:645
#4 object_unparent (obj=0x555557de5120) at ../qom/object.c:664
#5 0x0000555555b4c071 in bus_unparent (obj=<optimized out>)
at ../hw/core/bus.c:147
#6 0x0000555555b70465 in object_finalize_child_property (
obj=<optimized out>, name=<optimized out>, opaque=0x555557de5098)
at ../qom/object.c:1725
#7 0x0000555555b6fa17 in object_property_del_child (
child=0x555557de5098, obj=0x555557ddcf90) at ../qom/object.c:645
#8 object_unparent (obj=0x555557de5098) at ../qom/object.c:664
#9 0x0000555555b4ee19 in device_unparent (obj=<optimized out>)
at ../hw/core/qdev.c:981
#10 0x0000555555b70465 in object_finalize_child_property (
obj=<optimized out>, name=<optimized out>, opaque=0x555557ddcf90)
at ../qom/object.c:1725
#11 0x0000555555b6fa17 in object_property_del_child (
child=0x555557ddcf90, obj=0x55555685da10) at ../qom/object.c:645
#12 object_unparent (obj=0x555557ddcf90) at ../qom/object.c:664
#13 0x00005555558dc331 in pci_for_each_device_under_bus (
opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
at ../hw/pci/pci.c:1654
[2]
Optimizer omits pci_qdev_unrealize, called by device_set_realized, and
do_pci_unregister_device, called by pci_qdev_unrealize and caller of
address_space_destroy.
#0 address_space_destroy (as=0x555557ddd1b8)
at ../softmmu/memory.c:2840
#1 0x0000555555b4fc53 in device_set_realized (obj=0x555557ddcf90,
value=<optimized out>, errp=0x7fffeea8f1e0)
at ../hw/core/qdev.c:850
#2 0x0000555555b6eaa6 in property_set_bool (obj=0x555557ddcf90,
v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
errp=0x7fffeea8f1e0) at ../qom/object.c:2255
#3 0x0000555555b70e07 in object_property_set (
obj=obj@entry=0x555557ddcf90,
name=name@entry=0x555555db99df "realized",
v=v@entry=0x7fffe46b7500,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1400
#4 0x0000555555b73c5f in object_property_set_qobject (
obj=obj@entry=0x555557ddcf90,
name=name@entry=0x555555db99df "realized",
value=value@entry=0x7fffe44f6180,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/qom-qobject.c:28
#5 0x0000555555b71044 in object_property_set_bool (
obj=0x555557ddcf90, name=0x555555db99df "realized",
value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1470
#6 0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
dev=0x555557ddcf90,
opaque=<optimized out>) at /home/qemu/include/hw/qdev-core.h:17
#7 0x00005555558dc331 in pci_for_each_device_under_bus (
opaque=<optimized out>, fn=<optimized out>,
bus=<optimized out>) at ../hw/pci/pci.c:1654
[3]
#0 virtio_device_unrealize (dev=0x555557de5120)
at ../hw/virtio/virtio.c:3680
#1 0x0000555555b4fc63 in device_set_realized (obj=0x555557de5120,
value=<optimized out>, errp=0x7fffee28df90)
at ../hw/core/qdev.c:850
#2 0x0000555555b6eab6 in property_set_bool (obj=0x555557de5120,
v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
errp=0x7fffee28df90) at ../qom/object.c:2255
#3 0x0000555555b70e17 in object_property_set (
obj=obj@entry=0x555557de5120,
name=name@entry=0x555555db99ff "realized",
v=v@entry=0x7ffdd8035040,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1400
#4 0x0000555555b73c6f in object_property_set_qobject (
obj=obj@entry=0x555557de5120,
name=name@entry=0x555555db99ff "realized",
value=value@entry=0x7ffdd8035020,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/qom-qobject.c:28
#5 0x0000555555b71054 in object_property_set_bool (
obj=0x555557de5120, name=name@entry=0x555555db99ff "realized",
value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1470
#6 0x0000555555b4edc5 in qdev_unrealize (dev=<optimized out>)
at ../hw/core/qdev.c:403
#7 0x0000555555b4c2a9 in bus_set_realized (obj=<optimized out>,
value=<optimized out>, errp=<optimized out>)
at ../hw/core/bus.c:204
#8 0x0000555555b6eab6 in property_set_bool (obj=0x555557de5098,
v=<optimized out>, name=<optimized out>, opaque=0x555557df04c0,
errp=0x7fffee28e0a0) at ../qom/object.c:2255
#9 0x0000555555b70e17 in object_property_set (
obj=obj@entry=0x555557de5098,
name=name@entry=0x555555db99ff "realized",
v=v@entry=0x7ffdd8034f50,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1400
#10 0x0000555555b73c6f in object_property_set_qobject (
obj=obj@entry=0x555557de5098,
name=name@entry=0x555555db99ff "realized",
value=value@entry=0x7ffdd8020630,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/qom-qobject.c:28
#11 0x0000555555b71054 in object_property_set_bool (
obj=obj@entry=0x555557de5098,
name=name@entry=0x555555db99ff "realized",
value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1470
#12 0x0000555555b4c725 in qbus_unrealize (
bus=bus@entry=0x555557de5098) at ../hw/core/bus.c:178
#13 0x0000555555b4fc00 in device_set_realized (obj=0x555557ddcf90,
value=<optimized out>, errp=0x7fffee28e1e0)
at ../hw/core/qdev.c:844
#14 0x0000555555b6eab6 in property_set_bool (obj=0x555557ddcf90,
v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
errp=0x7fffee28e1e0) at ../qom/object.c:2255
#15 0x0000555555b70e17 in object_property_set (
obj=obj@entry=0x555557ddcf90,
name=name@entry=0x555555db99ff "realized",
v=v@entry=0x7ffdd8020560,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1400
#16 0x0000555555b73c6f in object_property_set_qobject (
obj=obj@entry=0x555557ddcf90,
name=name@entry=0x555555db99ff "realized",
value=value@entry=0x7ffdd8020540,
errp=errp@entry=0x5555565bbf38 <error_abort>)
at ../qom/qom-qobject.c:28
#17 0x0000555555b71054 in object_property_set_bool (
obj=0x555557ddcf90, name=0x555555db99ff "realized",
value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
at ../qom/object.c:1470
#18 0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
dev=0x555557ddcf90, opaque=<optimized out>)
at /home/qemu/include/hw/qdev-core.h:17
#19 0x00005555558dc331 in pci_for_each_device_under_bus (
opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
at ../hw/pci/pci.c:1654
Fixes: c611c76417 ("virtio: add MemoryListener to cache ring translations")
Buglink: https://bugs.launchpad.net/qemu/+bug/1912846
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20210125192505.390554-1-eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In the kernel, virtio_gpu_init() uses virtio_get_shm_region()
since
commit 6076a9711dc5 ("drm/virtio: implement blob resources: probe for host visible region")
but vm_get_shm_region() unconditionally uses VIRTIO_MMIO_SHM_SEL to
get the address and the length of the region.
commit 38e895487afc ("virtio: Implement get_shm_region for MMIO transport"
As this is not implemented in QEMU, address and length are 0 and passed
as is to devm_request_mem_region() that triggers a crash:
[drm:virtio_gpu_init] *ERROR* Could not reserve host visible region
Unable to handle kernel NULL pointer dereference at virtual address (ptrval)
According to the comments in the kernel, a non existent shared region
has a length of (u64)-1.
This is what we return now with this patch to disable the region.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20201220163539.2255963-1-laurent@vivier.eu>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-fs qualifies as a bootable device minimally under OVMF, but
currently the necessary "bootindex" property is missing. Add the property.
Expose the property only in the PCI device, for now. There is no boot
support for virtiofs on s390x (ccw) for the time being [1] [2], so leave
the CCW device unchanged. Add the property to the base device still,
because adding the alias to the CCW device later will be easier this way
[3].
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01745.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01870.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01751.html
Example OpenFirmware device path for the "vhost-user-fs-pci" device in the
"bootorder" fw_cfg file:
/pci@i0cf8/pci-bridge@1,6/pci1af4,105a@0/filesystem@0
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Ján Tomko <jtomko@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtio-fs@redhat.com
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210112131603.12686-1-lersek@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
Commit 8118f0950f "migration: Append JSON description of migration
stream" needs a JSON writer. The existing qobject_to_json() wasn't a
good fit, because it requires building a QObject to convert. Instead,
migration got its very own JSON writer, in commit 190c882ce2 "QJSON:
Add JSON writer". It tacitly limits numbers to int64_t, and strings
contents to characters that don't need escaping, unlike
qobject_to_json().
The previous commit factored the JSON writer out of qobject_to_json().
Replace migration's JSON writer by it.
Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-17-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Generalize the qdev_hotplug variable to the different phases of
machine initialization. We would like to allow different
monitor commands depending on the phase.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Added AER capability for virtio-pci devices.
Also added property for devices, by default AER is disabled.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20201203110713.204938-3-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Removed hardcoded offset for ats. Added cap offset counter
for future capabilities like AER.
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20201203110713.204938-2-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If we find a queue with an inconsistent guest index value, explicitly mark the
device as needing a reset - and broken - via virtio_error().
There's at least one driver implementation - the virtio-win NetKVM driver - that
is able to handle a VIRTIO_CONFIG_S_NEEDS_RESET notification and successfully
restore the device to a working state. Other implementations do not correctly
handle this, but as the VQ is not in a functional state anyway, this is still
worth doing.
Signed-off-by: John Levon <john.levon@nutanix.com>
Message-Id: <20201120185103.GA442386@sent>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows us to differentiate between regular IOMMU map/unmap events
and DEVIOTLB unmap. Doing so, notifiers that only need device IOTLB
invalidations will not receive regular IOMMU unmappings.
Adapt intel and vhost to use it.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20201116165506.31315-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This way we can tell between regular IOMMUTLBEntry (entry of IOMMU
hardware) and notifications.
In the notifications, we set explicitly if it is a MAPs or an UNMAP,
instead of trusting in entry permissions to differentiate them.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20201116165506.31315-3-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently truncates the mmap_offset field when sending
VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages. The struct
layout looks like this:
typedef struct VhostUserMemoryRegion {
uint64_t guest_phys_addr;
uint64_t memory_size;
uint64_t userspace_addr;
uint64_t mmap_offset;
} VhostUserMemoryRegion;
typedef struct VhostUserMemRegMsg {
uint32_t padding;
/* WARNING: there is a 32-bit hole here! */
VhostUserMemoryRegion region;
} VhostUserMemRegMsg;
The payload size is calculated as follows when sending the message in
hw/virtio/vhost-user.c:
msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
sizeof(VhostUserMemoryRegion);
This calculation produces an incorrect result of only 36 bytes.
sizeof(VhostUserMemRegMsg) is actually 40 bytes.
The consequence of this is that the final field, mmap_offset, is
truncated. This breaks x86_64 TCG guests on s390 hosts. Other guest/host
combinations may get lucky if either of the following holds:
1. The guest memory layout does not need mmap_offset != 0.
2. The host is little-endian and mmap_offset <= 0xffffffff so the
truncation has no effect.
Fix this by extending the existing 32-bit padding field to 64-bit. Now
the padding reflects the actual compiler padding. This can be verified
using pahole(1).
Also document the layout properly in the vhost-user specification. The
vhost-user spec did not document the exact layout. It would be
impossible to implement the spec without looking at the QEMU source
code.
Existing vhost-user frontends and device backends continue to work after
this fix has been applied. The only change in the wire protocol is that
QEMU now sets hdr.size to 40 instead of 36. If a vhost-user
implementation has a hardcoded size check for 36 bytes, then it will
fail with new QEMUs. Both QEMU and DPDK/SPDK don't check the exact
payload size, so they continue to work.
Fixes: f1aeb14b08 ("Transmit vhost-user memory regions individually")
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201109174355.1069147-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: f1aeb14b08 ("Transmit vhost-user memory regions individually")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.
Signed-off-by: Jin Yu <jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20201103123617.28256-1-jin.yu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit adb29c0273.
The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.
To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:
$ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r -b /var/tmp/foo.img
$ build/qemu-system-x86_64 \
-device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
-object memory-backend-memfd,id=mem,size=1G,share=on \
-M memory-backend=mem,accel=kvm \
-chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
Segmentation fault (core dumped)
Cc: Jin Yu <jin.yu@intel.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102165709.232180-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The virtio-iommu device can deal with arbitrary page sizes for virtual
endpoints, but for endpoints assigned with VFIO it must follow the page
granule used by the host IOMMU driver.
Implement the interface to set the vIOMMU page size mask, called by VFIO
for each endpoint. We assume that all host IOMMU drivers use the same
page granule (the host page granule). Override the page_size_mask field
in the virtio config space.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-10-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add notify_flag_changed() to notice when memory listeners are added and
removed.
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-7-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement the replay callback to setup all mappings for a new memory
region.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-6-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Call the memory notifiers when attaching an endpoint to a domain, to
replay existing mappings, and when detaching the endpoint, to remove all
mappings.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-5-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extend VIRTIO_IOMMU_T_MAP/UNMAP request to notify memory listeners. It
will call VFIO notifier to map/unmap regions in the physical IOMMU.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-4-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Store the memory region associated to each endpoint into the endpoint
structure, to allow efficient memory notification on map/unmap.
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-3-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Due to an invalid mask, virtio_iommu_mr() may return the wrong memory
region. It hasn't been too problematic so far because the function was
only used to test existence of an endpoint, but that is about to change.
Fixes: cfb42188b2 ("virtio-iommu: Implement attach/detach command")
Cc: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-2-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>