Commit Graph

1073 Commits

Author SHA1 Message Date
Anton Kuchin
ace66791cd vhost-user-fs: fix features handling
Make virtio-fs take into account server capabilities.

Just returning requested features assumes they all of then are implemented
by server and results in setting unsupported configuration if some of them
are absent.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  With changes suggested by Stefan
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-13 16:13:41 +01:00
Andrey Gruzdev
1a8e44a89f migration: Inhibit virtio-balloon for the duration of background snapshot
The same thing as for incoming postcopy - we cannot deal with concurrent
RAM discards when using background snapshot feature in outgoing migration.

Fixes: 8518278a6a (migration: implementation
  of background snapshot thread)
Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Reported-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-06 18:56:01 +01:00
Jason Wang
d83f46d189 virtio-pci: compat page aligned ATS
Commit 4c70875372 ("pci: advertise a page aligned ATS") advertises
the page aligned via ATS capability (RO) to unbrek recent Linux IOMMU
drivers since 5.2. But it forgot the compat the capability which
breaks the migration from old machine type:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read:
0 device: 20 cmask: ff wmask: 0 w1cmask:0

This patch introduces a new parameter "x-ats-page-aligned" for
virtio-pci device and turns it on for machine type which is newer than
5.1.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: qemu-stable@nongnu.org
Fixes: 4c70875372 ("pci: advertise a page aligned ATS")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210406040330.11306-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-04-06 07:11:36 -04:00
Yuri Benditovich
51e0e42cab virtio-pci: remove explicit initialization of val
The value is assigned later in this procedure.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Message-Id: <20210315115937.14286-3-yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-04-01 11:39:12 -04:00
Yuri Benditovich
c3fd706165 virtio-pci: add check for vdev in virtio_pci_isr_read
https://bugzilla.redhat.com/show_bug.cgi?id=1743098
This commit completes the solution of segfault in hot unplug flow
(by commit ccec7e9603).
Added missing check for vdev in virtio_pci_isr_read.
Typical stack of crash:
virtio_pci_isr_read ../hw/virtio/virtio-pci.c:1365 with proxy-vdev = 0
memory_region_read_accessor at ../softmmu/memory.c:442
access_with_adjusted_size at ../softmmu/memory.c:552
memory_region_dispatch_read1 at ../softmmu/memory.c:1420
memory_region_dispatch_read  at ../softmmu/memory.c:1449
flatview_read_continue at ../softmmu/physmem.c:2822
flatview_read at ../softmmu/physmem.c:2862
address_space_read_full at ../softmmu/physmem.c:2875

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Message-Id: <20210315115937.14286-2-yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-04-01 11:39:12 -04:00
Wang Liang
d2adda34a9 virtio-pmem: fix virtio_pmem_resp assign problem
ret in virtio_pmem_resp is a uint32_t variable, which should be assigned
using virtio_stl_p.

The kernel side driver does not guarantee virtio_pmem_resp to be initialized
to zero in advance, So sometimes the flush operation will fail.

Signed-off-by: Wang Liang <wangliangzz@inspur.com>
Message-Id: <20210317024145.271212-1-wangliangzz@126.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
db8a3772e3 vhost-user: Monitor slave channel in vhost_user_read()
Now that everything is in place, have the nested event loop to monitor
the slave channel. The source in the main event loop is destroyed and
recreated to ensure any pending even for the slave channel that was
previously detected is purged. This guarantees that the main loop
wont invoke slave_read() based on an event that was already handled
by the nested loop.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-7-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
a7f523c7d1 vhost-user: Introduce nested event loop in vhost_user_read()
A deadlock condition potentially exists if a vhost-user process needs
to request something to QEMU on the slave channel while processing a
vhost-user message.

This doesn't seem to affect any vhost-user implementation so far, but
this is currently biting the upcoming enablement of DAX with virtio-fs.
The issue is being observed when the guest does an emergency reboot while
a mapping still exits in the DAX window, which is very easy to get with
a busy enough workload (e.g. as simulated by blogbench [1]) :

- QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd.

- In order to complete the request, virtiofsd then asks QEMU to remove
  the mapping on the slave channel.

All these dialogs are synchronous, hence the deadlock.

As pointed out by Stefan Hajnoczi:

When QEMU's vhost-user master implementation sends a vhost-user protocol
message, vhost_user_read() does a "blocking" read during which slave_fd
is not monitored by QEMU.

The natural solution for this issue is an event loop. The main event
loop cannot be nested though since we have no guarantees that its
fd handlers are prepared for re-entrancy.

Introduce a new event loop that only monitors the chardev I/O for now
in vhost_user_read() and push the actual reading to a one-shot handler.
A subsequent patch will teach the loop to monitor and process messages
from the slave channel as well.

[1] https://github.com/jedisct1/Blogbench

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-6-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
57dc02173c vhost-user: Convert slave channel to QIOChannelSocket
The slave channel is implemented with socketpair() : QEMU creates
the pair, passes one of the socket to virtiofsd and monitors the
other one with the main event loop using qemu_set_fd_handler().

In order to fix a potential deadlock between QEMU and a vhost-user
external process (e.g. virtiofsd with DAX), we want to be able to
monitor and service the slave channel while handling vhost-user
requests.

Prepare ground for this by converting the slave channel to be a
QIOChannelSocket. This will make monitoring of the slave channel
as simple as calling qio_channel_add_watch_source(). Since the
connection is already established between the two sockets, only
incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be
serviced.

This also allows to get rid of the ancillary data parsing since
QIOChannelSocket can do this for us. Note that the MSG_CTRUNC
check is dropped on the way because QIOChannelSocket ignores this
case. This isn't a problem since slave_read() provisions space for
8 file descriptors, but affected vhost-user slave protocol messages
generally only convey one. If for some reason a buggy implementation
passes more file descriptors, no need to break the connection, just
like we don't break it if some other type of ancillary data is
received : this isn't explicitely violating the protocol per-se so
it seems better to ignore it.

The current code errors out on short reads and writes. Use the
qio_channel_*_all() variants to address this on the way.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-5-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
de62e49460 vhost-user: Factor out duplicated slave_fd teardown code
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-4-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
9e06080bed vhost-user: Fix double-close on slave_read() error path
Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG,
can convey file descriptors. These must be closed before returning
from slave_read() to avoid being leaked. This can currently be done
in two different places:

[1] just after the request has been processed

[2] on the error path, under the goto label err:

These path are supposed to be mutually exclusive but they are not
actually. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the
sending of the reply fails, both [1] and [2] are performed with the
same descriptor values. This can potentially cause subtle bugs if one
of the descriptor was recycled by some other thread in the meantime.

This code duplication complicates rollback for no real good benefit.
Do the closing in a unique place, under a new fdcleanup: goto label
at the end of the function.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Greg Kurz
a890557d5a vhost-user: Drop misleading EAGAIN checks in slave_read()
slave_read() checks EAGAIN when reading or writing to the socket
fails. This gives the impression that the slave channel is in
non-blocking mode, which is certainly not the case with the current
code base. And the rest of the code isn't actually ready to cope
with non-blocking I/O.

Just drop the checks everywhere in this function for the sake of
clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210312092212.782255-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22 10:17:53 -04:00
Laurent Vivier
0ab8c021c6 virtio: Fix virtio_mmio_read()/virtio_mmio_write()
Both functions don't check the personality of the interface (legacy or
modern) before accessing the configuration memory and always use
virtio_config_readX()/virtio_config_writeX().

With this patch, they now check the personality and in legacy mode
call virtio_config_readX()/virtio_config_writeX(), otherwise call
virtio_config_modern_readX()/virtio_config_modern_writeX().

This change has been tested with virtio-mmio guests (virt stretch/armhf and
virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le).

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210314200300.3259170-1-laurent@vivier.eu>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22 10:17:53 -04:00
Jason Wang
51a81a2118 virtio-net: calculating proper msix vectors on init
Currently, the default msix vectors for virtio-net-pci is 3 which is
obvious not suitable for multiqueue guest, so we depends on the user
or management tools to pass a correct vectors parameter. In fact, we
can simplifying this by calculating the number of vectors on realize.

Consider we have N queues, the number of vectors needed is 2*N + 2
(#queue pairs + plus one config interrupt and control vq). We didn't
check whether or not host support control vq because it was added
unconditionally by qemu to avoid breaking legacy guests such as Minix.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-03-15 16:41:22 +08:00
Eric Auger
dde3f08b5c virtio-iommu: Handle non power of 2 range invalidations
Unmap notifiers work with an address mask assuming an
invalidation range of a power of 2. Nothing mandates this
in the VIRTIO-IOMMU spec.

So in case the range is not a power of 2, split it into
several invalidations.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20210309102742.30442-4-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-12 12:40:10 +00:00
Philippe Mathieu-Daudé
538f049704 sysemu: Let VMChangeStateHandler take boolean 'running' argument
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 23:13:57 +01:00
Michael Tokarev
cba42d61a3 Various spelling fixes
An assorted set of spelling fixes in various places.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210309111510.79495-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 21:19:10 +01:00
schspa
819b349619 virtio-mmio: improve virtio-mmio get_dev_path alog
At the moment the following QEMU command line triggers an assertion
failure On xlnx-versal SOC:
  qemu-system-aarch64 \
      -machine xlnx-versal-virt -nographic -smp 2 -m 128 \
      -fsdev local,id=shareid,path=${HOME}/work,security_model=none \
      -device virtio-9p-device,fsdev=shareid,mount_tag=share \
      -fsdev local,id=shareid1,path=${HOME}/Music,security_model=none \
      -device virtio-9p-device,fsdev=shareid1,mount_tag=share1

  qemu-system-aarch64: ../migration/savevm.c:860:
  vmstate_register_with_alias_id:
  Assertion `!se->compat || se->instance_id == 0' failed.

This problem was fixed on arm virt platform in commit f58b39d2d5
("virtio-mmio: format transport base address in BusClass.get_dev_path")

It works perfectly on arm virt platform. but there is still there on
xlnx-versal SOC.

The main difference between arm virt and xlnx-versal is they use
different way to create virtio-mmio qdev. on arm virt, it calls
sysbus_create_simple("virtio-mmio", base, pic[irq]); which will call
sysbus_mmio_map internally and assign base address to subsys device
mmio correctly. but xlnx-versal's implements won't do this.

However, xlnx-versal can't switch to sysbus_create_simple() to create
virtio-mmio device. It's because xlnx-versal's cpu use
VersalVirt.soc.fpd.apu.mr as it's memory. which is subregion of
system_memory. sysbus_create_simple will add virtio to system_memory,
which can't be accessed by cpu.

Besides, xlnx-versal can't add sysbus_mmio_map api call too, because
this will add memory region to system_memory, and it can't be added
to VersalVirt.soc.fpd.apu.mr again.

We can solve this by assign correct base address offset on dev_path.

This path was test on aarch64 virt & xlnx-versal platform.

Signed-off-by: schspa <schspa@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-05 15:17:34 +00:00
Stefan Hajnoczi
1d8d014e93 vhost: simplify vhost_dev_init() fail_busyloop label
Requiring a conditional for every goto is tedious:

  if (busyloop_timeout) {
      goto fail_busyloop;
  } else {
      goto fail;
  }

Move the conditional to into the fail_busyloop label so that it's safe
to jump to this label unconditionally.

This change makes the migrate_add_blocker() error case more consistent.
It jumped to fail_busyloop unconditionally whereas the memslots limits
error case was conditional.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210222114931.272308-1-stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-02 09:16:18 -05:00
Gan Qixin
d3649bfca7 virtio-pmem: put it into the 'storage' category
The category of the virtio-pmem device is not set, put it into the 'storage'
category.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201130083630.2520597-3-ganqixin@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-02-20 12:36:19 +01:00
Alex Bennée
b74259e3de hw/virtio/pci: include vdev name in registered PCI sections
When viewing/debugging memory regions it is sometimes hard to figure
out which PCI device something belongs to. Make the names unique by
including the vdev name in the name string.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210213130325.14781-2-alex.bennee@linaro.org>
2021-02-18 08:16:42 +00:00
Eugenio Pérez
4d1ccc17f4 vhost: Check for valid vdev in vhost_backend_handle_iotlb_msg
Not checking this can lead to invalid dev->vdev member access in
vhost_device_iotlb_miss if backend issue an iotlb message in a bad
timing, either maliciously or by a bug.

Reproduced rebooting a guest with testpmd in txonly forward mode.
 #0  0x0000559ffff94394 in vhost_device_iotlb_miss (
     dev=dev@entry=0x55a0012f6680, iova=10245279744, write=1)
     at ../hw/virtio/vhost.c:1013
 #1  0x0000559ffff9ac31 in vhost_backend_handle_iotlb_msg (
     imsg=0x7ffddcfd32c0, dev=0x55a0012f6680)
     at ../hw/virtio/vhost-backend.c:411
 #2  vhost_backend_handle_iotlb_msg (dev=dev@entry=0x55a0012f6680,
     imsg=imsg@entry=0x7ffddcfd32c0)
     at ../hw/virtio/vhost-backend.c:404
 #3  0x0000559fffeded7b in slave_read (opaque=0x55a0012f6680)
     at ../hw/virtio/vhost-user.c:1464
 #4  0x000055a0000c541b in aio_dispatch_handler (
     ctx=ctx@entry=0x55a0010a2120, node=0x55a0012d9e00)
     at ../util/aio-posix.c:329

Fixes: 020e571b8b ("vhost: rework IOTLB messaging")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20210129090728.831208-1-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:59 -05:00
Thomas Huth
d8be0c6b68 hw/virtio/virtio-balloon: Remove the "class" property
This property was only required for compatibility reasons in the
pc-1.0 machine type and earlier. Now that these machine types have
been removed, the property is not useful anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210203171832.483176-4-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
2021-02-05 08:52:59 -05:00
Peter Xu
958ec334bc vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support
Previous work on dev-iotlb message broke vhost on either SMMU or virtio-iommu
since dev-iotlb (or PCIe ATS) is not yet supported for those archs.

An initial idea is that we can let IOMMU to export this information to vhost so
that vhost would know whether the vIOMMU would support dev-iotlb, then vhost
can conditionally register to dev-iotlb or the old iotlb way.  We can work
based on some previous patch to introduce PCIIOMMUOps as Yi Liu proposed [1].

However it's not as easy as I thought since vhost_iommu_region_add() does not
have a PCIDevice context at all since it's completely a backend.  It seems
non-trivial to pass over a PCI device to the backend during init.  E.g. when
the IOMMU notifier registered hdev->vdev is still NULL.

To make the fix smaller and easier, this patch goes the other way to leverage
the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can trap the
dev-iotlb registration and fail it.  Then vhost could try the fallback solution
as using UNMAP invalidation for it's translations.

[1] https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l.liu@intel.com/

Reported-by: Eric Auger <eric.auger@redhat.com>
Fixes: b68ba1ca57
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210204191228.187550-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:58 -05:00
Pankaj Gupta
73b123073d virtio-pmem: add trace events
This patch adds trace events for virtio-pmem functionality.
Adding trace events for virtio pmem request, reponse and host
side fsync functionality.

Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Message-Id: <20201117115705.32195-1-pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:58 -05:00
Eugenio Pérez
f6ab64c05f virtio: Add corresponding memory_listener_unregister to unrealize
Address space is destroyed without proper removal of its listeners with
current code. They are expected to be removed in
virtio_device_instance_finalize [1], but qemu calls it through
object_deinit, after address_space_destroy call through
device_set_realized [2].

Move it to virtio_device_unrealize, called before device_set_realized
[3] and making it symmetric with memory_listener_register in
virtio_device_realize.

v2: Delete no-op call of virtio_device_instance_finalize.
    Add backtraces.

[1]

 #0  virtio_device_instance_finalize (obj=0x555557de5120)
     at /home/qemu/include/hw/virtio/virtio.h:71
 #1  0x0000555555b703c9 in object_deinit (type=0x555556639860,
      obj=<optimized out>) at ../qom/object.c:671
 #2  object_finalize (data=0x555557de5120) at ../qom/object.c:685
 #3  object_unref (objptr=0x555557de5120) at ../qom/object.c:1184
 #4  0x0000555555b4de9d in bus_free_bus_child (kid=0x555557df0660)
     at ../hw/core/qdev.c:55
 #5  0x0000555555c65003 in call_rcu_thread (opaque=opaque@entry=0x0)
     at ../util/rcu.c:281

Queued by:

 #0  bus_remove_child (bus=0x555557de5098,
     child=child@entry=0x555557de5120) at ../hw/core/qdev.c:60
 #1  0x0000555555b4ee31 in device_unparent (obj=<optimized out>)
     at ../hw/core/qdev.c:984
 #2  0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557de5120)
     at ../qom/object.c:1725
 #3  0x0000555555b6fa17 in object_property_del_child (
     child=0x555557de5120, obj=0x555557ddcf90) at ../qom/object.c:645
 #4  object_unparent (obj=0x555557de5120) at ../qom/object.c:664
 #5  0x0000555555b4c071 in bus_unparent (obj=<optimized out>)
     at ../hw/core/bus.c:147
 #6  0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557de5098)
     at ../qom/object.c:1725
 #7  0x0000555555b6fa17 in object_property_del_child (
     child=0x555557de5098, obj=0x555557ddcf90) at ../qom/object.c:645
 #8  object_unparent (obj=0x555557de5098) at ../qom/object.c:664
 #9  0x0000555555b4ee19 in device_unparent (obj=<optimized out>)
     at ../hw/core/qdev.c:981
 #10 0x0000555555b70465 in object_finalize_child_property (
     obj=<optimized out>, name=<optimized out>, opaque=0x555557ddcf90)
     at ../qom/object.c:1725
 #11 0x0000555555b6fa17 in object_property_del_child (
     child=0x555557ddcf90, obj=0x55555685da10) at ../qom/object.c:645
 #12 object_unparent (obj=0x555557ddcf90) at ../qom/object.c:664
 #13 0x00005555558dc331 in pci_for_each_device_under_bus (
     opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
     at ../hw/pci/pci.c:1654

[2]

Optimizer omits pci_qdev_unrealize, called by device_set_realized, and
do_pci_unregister_device, called by pci_qdev_unrealize and caller of
address_space_destroy.

 #0  address_space_destroy (as=0x555557ddd1b8)
     at ../softmmu/memory.c:2840
 #1  0x0000555555b4fc53 in device_set_realized (obj=0x555557ddcf90,
      value=<optimized out>, errp=0x7fffeea8f1e0)
     at ../hw/core/qdev.c:850
 #2  0x0000555555b6eaa6 in property_set_bool (obj=0x555557ddcf90,
      v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffeea8f1e0) at ../qom/object.c:2255
 #3  0x0000555555b70e07 in object_property_set (
      obj=obj@entry=0x555557ddcf90,
      name=name@entry=0x555555db99df "realized",
      v=v@entry=0x7fffe46b7500,
      errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #4  0x0000555555b73c5f in object_property_set_qobject (
      obj=obj@entry=0x555557ddcf90,
      name=name@entry=0x555555db99df "realized",
      value=value@entry=0x7fffe44f6180,
      errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #5  0x0000555555b71044 in object_property_set_bool (
      obj=0x555557ddcf90, name=0x555555db99df "realized",
      value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #6  0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
      dev=0x555557ddcf90,
      opaque=<optimized out>) at /home/qemu/include/hw/qdev-core.h:17
 #7  0x00005555558dc331 in pci_for_each_device_under_bus (
      opaque=<optimized out>, fn=<optimized out>,
      bus=<optimized out>) at ../hw/pci/pci.c:1654

[3]

 #0  virtio_device_unrealize (dev=0x555557de5120)
     at ../hw/virtio/virtio.c:3680
 #1  0x0000555555b4fc63 in device_set_realized (obj=0x555557de5120,
     value=<optimized out>, errp=0x7fffee28df90)
     at ../hw/core/qdev.c:850
 #2  0x0000555555b6eab6 in property_set_bool (obj=0x555557de5120,
     v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffee28df90) at ../qom/object.c:2255
 #3  0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557de5120,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8035040,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #4  0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557de5120,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8035020,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #5  0x0000555555b71054 in object_property_set_bool (
     obj=0x555557de5120, name=name@entry=0x555555db99ff "realized",
     value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #6  0x0000555555b4edc5 in qdev_unrealize (dev=<optimized out>)
     at ../hw/core/qdev.c:403
 #7  0x0000555555b4c2a9 in bus_set_realized (obj=<optimized out>,
     value=<optimized out>, errp=<optimized out>)
     at ../hw/core/bus.c:204
 #8  0x0000555555b6eab6 in property_set_bool (obj=0x555557de5098,
     v=<optimized out>, name=<optimized out>, opaque=0x555557df04c0,
     errp=0x7fffee28e0a0) at ../qom/object.c:2255
 #9  0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8034f50,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #10 0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8020630,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #11 0x0000555555b71054 in object_property_set_bool (
     obj=obj@entry=0x555557de5098,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=false, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #12 0x0000555555b4c725 in qbus_unrealize (
     bus=bus@entry=0x555557de5098) at ../hw/core/bus.c:178
 #13 0x0000555555b4fc00 in device_set_realized (obj=0x555557ddcf90,
     value=<optimized out>, errp=0x7fffee28e1e0)
     at ../hw/core/qdev.c:844
 #14 0x0000555555b6eab6 in property_set_bool (obj=0x555557ddcf90,
     v=<optimized out>, name=<optimized out>, opaque=0x555556650ba0,
     errp=0x7fffee28e1e0) at ../qom/object.c:2255
 #15 0x0000555555b70e17 in object_property_set (
     obj=obj@entry=0x555557ddcf90,
     name=name@entry=0x555555db99ff "realized",
     v=v@entry=0x7ffdd8020560,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1400
 #16 0x0000555555b73c6f in object_property_set_qobject (
     obj=obj@entry=0x555557ddcf90,
     name=name@entry=0x555555db99ff "realized",
     value=value@entry=0x7ffdd8020540,
     errp=errp@entry=0x5555565bbf38 <error_abort>)
     at ../qom/qom-qobject.c:28
 #17 0x0000555555b71054 in object_property_set_bool (
     obj=0x555557ddcf90, name=0x555555db99ff "realized",
     value=<optimized out>, errp=0x5555565bbf38 <error_abort>)
     at ../qom/object.c:1470
 #18 0x0000555555921cb7 in pcie_unplug_device (bus=<optimized out>,
     dev=0x555557ddcf90, opaque=<optimized out>)
     at /home/qemu/include/hw/qdev-core.h:17
 #19 0x00005555558dc331 in pci_for_each_device_under_bus (
     opaque=<optimized out>, fn=<optimized out>, bus=<optimized out>)
     at ../hw/pci/pci.c:1654

Fixes: c611c76417 ("virtio: add MemoryListener to cache ring translations")
Buglink: https://bugs.launchpad.net/qemu/+bug/1912846
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20210125192505.390554-1-eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:58 -05:00
Laurent Vivier
2d9e7d4cdd virtio-mmio: fix guest kernel crash with SHM regions
In the kernel, virtio_gpu_init() uses virtio_get_shm_region()
since
commit 6076a9711dc5 ("drm/virtio: implement blob resources: probe for host visible region")
but vm_get_shm_region() unconditionally uses VIRTIO_MMIO_SHM_SEL to
get the address and the length of the region.

commit 38e895487afc ("virtio: Implement get_shm_region for MMIO transport"

As this is not implemented in QEMU, address and length are 0 and passed
as is to devm_request_mem_region() that triggers a crash:

  [drm:virtio_gpu_init] *ERROR* Could not reserve host visible region
  Unable to handle kernel NULL pointer dereference at virtual address (ptrval)

According to the comments in the kernel, a non existent shared region
has a length of (u64)-1.

This is what we return now with this patch to disable the region.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20201220163539.2255963-1-laurent@vivier.eu>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:58 -05:00
Philippe Mathieu-Daudé
a82185880e hw/virtio-pci: Replace error_report() by qemu_log_mask(GUEST_ERROR)
Replace I/O write error reported with error_report() by
qemu_log_mask(GUEST_ERROR) which allow filtering.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20201210172834.178052-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-01-18 11:51:26 +01:00
Laszlo Ersek
6da32fe5ef vhost-user-fs: add the "bootindex" property
virtio-fs qualifies as a bootable device minimally under OVMF, but
currently the necessary "bootindex" property is missing. Add the property.

Expose the property only in the PCI device, for now. There is no boot
support for virtiofs on s390x (ccw) for the time being [1] [2], so leave
the CCW device unchanged. Add the property to the base device still,
because adding the alias to the CCW device later will be easier this way
[3].

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01745.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01870.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg01751.html

Example OpenFirmware device path for the "vhost-user-fs-pci" device in the
"bootorder" fw_cfg file:

  /pci@i0cf8/pci-bridge@1,6/pci1af4,105a@0/filesystem@0

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Ján Tomko <jtomko@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtio-fs@redhat.com
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210112131603.12686-1-lersek@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-01-13 09:06:37 -05:00
Peter Maydell
729cc68373 Remove superfluous timer_del() calls
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-08 15:13:38 +00:00
Peter Maydell
41192db338 Machine queue, 2020-12-23
Cleanup:
 * qdev code cleanup (Eduardo Habkost)
 
 Bug fix:
 * hostmem: Free host_nodes list right after visited (Keqian Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl/jteYUHGVoYWJrb3N0
 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaZUHw//c40nRlYdGSV5j6w3ZCSlmZRFxZTU
 UiLK51Z3hI9Q9kyLcoIQitEYlQTIbgp0qlIJ6evDd/HvQvZ+P4P0Lzm1UGOZhD0h
 nJk5+bBkP/mzMh0P9oiN20DSLk6a3Wvdiu/bQR8gm/WdLvTM1Zjek1ns5tL06ZvA
 MziG6gIypgScu2FeNxD0zC8sDO16oVrzKq7mjZcQe6XYFRsJmYjZw84v+uu/Bdf7
 MBxolkA8vYwwBJNdVsAf7I0Gw3BeArgPUOwbWyt8/tuGIOZxYjdKIj55S7j2fuju
 524sg8Di+YzxmLZaNAGksEBMj9uY39nwdHGhNElMtWCM9oOPumlps9eyLtpTagfM
 wmiVrMGWVlXV6c4kZo8R2NSF8hcDr02S7eyrUpITrh09p4nT6fBGG2ufEYiCyNao
 o9ZqMf7NUO5J60zM5EOfdGxpaN2O0M5pXCCN48NtmqvO0wIAfTc9l/OkCrrfVbEO
 Q/X1jqbj6ZcilSIl9OeLAPi7Xjx26jMeeLPUQtoZnkqDvpk/Vz6Ka1RgGG86QA5z
 2W/KCAoVrg6dO4f9vY3x84rf0Ta5kJtp2LezPgG8d++4bMSf2jN00wYvAQuCyqqW
 zbm8f57YST3vm8XMHPlmtnlKfiLI4wbVUmrDYu3rNI+JgdvhdXseGoErt15ejAcL
 B5IH2SK4AwMpSsk=
 =bnjc
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost-gl/tags/machine-next-pull-request' into staging

Machine queue, 2020-12-23

Cleanup:
* qdev code cleanup (Eduardo Habkost)

Bug fix:
* hostmem: Free host_nodes list right after visited (Keqian Zhu)

# gpg: Signature made Wed 23 Dec 2020 21:25:58 GMT
# gpg:                using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6
# gpg:                issuer "ehabkost@redhat.com"
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost-gl/tags/machine-next-pull-request:
  bugfix: hostmem: Free host_nodes list right after visited
  qdev: Avoid unnecessary DeviceState* variable at set_prop_arraylen()
  qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr()
  qdev: Move qdev_prop_tpm declaration to tpm_prop.h
  qdev: Make qdev_class_add_property() more flexible
  qdev: Make PropertyInfo.create return ObjectProperty*
  qdev: Move dev->realized check to qdev_property_set()
  qdev: Wrap getters and setters in separate helpers
  qdev: Add name argument to PropertyInfo.create method
  qdev: Add name parameter to qdev_class_add_property()
  qdev: Avoid using prop->name unnecessarily
  qdev: Get just property name at error_set_from_qdev_prop_error()
  sparc: Use DEFINE_PROP for nwindows property
  qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros
  qdev: Move softmmu properties to qdev-properties-system.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 22:57:15 +00:00
Markus Armbruster
3ddba9a9e9 migration: Replace migration's JSON writer by the general one
Commit 8118f0950f "migration: Append JSON description of migration
stream" needs a JSON writer.  The existing qobject_to_json() wasn't a
good fit, because it requires building a QObject to convert.  Instead,
migration got its very own JSON writer, in commit 190c882ce2 "QJSON:
Add JSON writer".  It tacitly limits numbers to int64_t, and strings
contents to characters that don't need escaping, unlike
qobject_to_json().

The previous commit factored the JSON writer out of qobject_to_json().
Replace migration's JSON writer by it.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-17-armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-19 10:39:16 +01:00
Eduardo Habkost
ce35e2295e qdev: Move softmmu properties to qdev-properties-system.h
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18 15:20:17 -05:00
Paolo Bonzini
2f181fbd5a machine: introduce MachineInitPhase
Generalize the qdev_hotplug variable to the different phases of
machine initialization.  We would like to allow different
monitor commands depending on the phase.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 12:51:52 -05:00
Paolo Bonzini
b326b6ea79 make ram_size local to vl.c
Use the machine properties for the leftovers too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:10 -05:00
Andrew Melnychenko
fdfa3b1d6f hw/virtio-pci Added AER capability.
Added AER capability for virtio-pci devices.
Also added property for devices, by default AER is disabled.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20201203110713.204938-3-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
Andrew Melnychenko
06e9744242 hw/virtio-pci Added counter for pcie capabilities offsets.
Removed hardcoded offset for ats. Added cap offset counter
for future capabilities like AER.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20201203110713.204938-2-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
John Levon
4aedda25e8 virtio: reset device on bad guest index in virtio_load()
If we find a queue with an inconsistent guest index value, explicitly mark the
device as needing a reset - and broken - via virtio_error().

There's at least one driver implementation - the virtio-win NetKVM driver - that
is able to handle a VIRTIO_CONFIG_S_NEEDS_RESET notification and successfully
restore the device to a working state. Other implementations do not correctly
handle this, but as the VQ is not in a functional state anyway, this is still
worth doing.

Signed-off-by: John Levon <john.levon@nutanix.com>
Message-Id: <20201120185103.GA442386@sent>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:57 -05:00
Eugenio Pérez
b68ba1ca57 memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationType
This allows us to differentiate between regular IOMMU map/unmap events
and DEVIOTLB unmap. Doing so, notifiers that only need device IOTLB
invalidations will not receive regular IOMMU unmappings.

Adapt intel and vhost to use it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20201116165506.31315-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:57 -05:00
Eugenio Pérez
5039caf3c4 memory: Add IOMMUTLBEvent
This way we can tell between regular IOMMUTLBEntry (entry of IOMMU
hardware) and notifications.

In the notifications, we set explicitly if it is a MAPs or an UNMAP,
instead of trusting in entry permissions to differentiate them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20201116165506.31315-3-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-08 13:48:57 -05:00
Stefan Hajnoczi
3009edff81 vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation
QEMU currently truncates the mmap_offset field when sending
VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages. The struct
layout looks like this:

  typedef struct VhostUserMemoryRegion {
      uint64_t guest_phys_addr;
      uint64_t memory_size;
      uint64_t userspace_addr;
      uint64_t mmap_offset;
  } VhostUserMemoryRegion;

  typedef struct VhostUserMemRegMsg {
      uint32_t padding;
      /* WARNING: there is a 32-bit hole here! */
      VhostUserMemoryRegion region;
  } VhostUserMemRegMsg;

The payload size is calculated as follows when sending the message in
hw/virtio/vhost-user.c:

  msg->hdr.size = sizeof(msg->payload.mem_reg.padding) +
      sizeof(VhostUserMemoryRegion);

This calculation produces an incorrect result of only 36 bytes.
sizeof(VhostUserMemRegMsg) is actually 40 bytes.

The consequence of this is that the final field, mmap_offset, is
truncated. This breaks x86_64 TCG guests on s390 hosts. Other guest/host
combinations may get lucky if either of the following holds:
1. The guest memory layout does not need mmap_offset != 0.
2. The host is little-endian and mmap_offset <= 0xffffffff so the
   truncation has no effect.

Fix this by extending the existing 32-bit padding field to 64-bit. Now
the padding reflects the actual compiler padding. This can be verified
using pahole(1).

Also document the layout properly in the vhost-user specification.  The
vhost-user spec did not document the exact layout. It would be
impossible to implement the spec without looking at the QEMU source
code.

Existing vhost-user frontends and device backends continue to work after
this fix has been applied. The only change in the wire protocol is that
QEMU now sets hdr.size to 40 instead of 36. If a vhost-user
implementation has a hardcoded size check for 36 bytes, then it will
fail with new QEMUs. Both QEMU and DPDK/SPDK don't check the exact
payload size, so they continue to work.

Fixes: f1aeb14b08 ("Transmit vhost-user memory regions individually")
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201109174355.1069147-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: f1aeb14b08 ("Transmit vhost-user memory regions individually")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2020-11-12 09:19:40 -05:00
Jin Yu
1b0063b304 vhost-blk: set features before setting inflight feature
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.

Signed-off-by: Jin Yu <jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20201103123617.28256-1-jin.yu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
b7c1bd9d78 Revert "vhost-blk: set features before setting inflight feature"
This reverts commit adb29c0273.

The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.

To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:

  $ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r -b /var/tmp/foo.img
  $ build/qemu-system-x86_64 \
        -device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
        -object memory-backend-memfd,id=mem,size=1G,share=on \
        -M memory-backend=mem,accel=kvm \
        -chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
  Segmentation fault (core dumped)

Cc: Jin Yu <jin.yu@intel.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102165709.232180-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Bharat Bhushan
5c3cfe33f4 virtio-iommu: Set supported page size mask
The virtio-iommu device can deal with arbitrary page sizes for virtual
endpoints, but for endpoints assigned with VFIO it must follow the page
granule used by the host IOMMU driver.

Implement the interface to set the vIOMMU page size mask, called by VFIO
for each endpoint. We assume that all host IOMMU drivers use the same
page granule (the host page granule). Override the page_size_mask field
in the virtio config space.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-10-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Bharat Bhushan
6978bfaa68 virtio-iommu: Add notify_flag_changed() memory region callback
Add notify_flag_changed() to notice when memory listeners are added and
removed.

Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-7-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Bharat Bhushan
308e5e1b5f virtio-iommu: Add replay() memory region callback
Implement the replay callback to setup all mappings for a new memory
region.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-6-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Bharat Bhushan
2f6eeb5f0b virtio-iommu: Call memory notifiers in attach/detach
Call the memory notifiers when attaching an endpoint to a domain, to
replay existing mappings, and when detaching the endpoint, to remove all
mappings.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-5-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Bharat Bhushan
15e4c8f01b virtio-iommu: Add memory notifiers for map/unmap
Extend VIRTIO_IOMMU_T_MAP/UNMAP request to notify memory listeners. It
will call VFIO notifier to map/unmap regions in the physical IOMMU.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-4-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Jean-Philippe Brucker
31aa323fb9 virtio-iommu: Store memory region in endpoint struct
Store the memory region associated to each endpoint into the endpoint
structure, to allow efficient memory notification on map/unmap.

Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-3-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00
Jean-Philippe Brucker
bfe7a96173 virtio-iommu: Fix virtio_iommu_mr()
Due to an invalid mask, virtio_iommu_mr() may return the wrong memory
region. It hasn't been too problematic so far because the function was
only used to test existence of an endpoint, but that is about to change.

Fixes: cfb42188b2 ("virtio-iommu: Implement attach/detach command")
Cc: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-2-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 07:19:27 -05:00