Commit Graph

131 Commits

Author SHA1 Message Date
Jason Wang
f7ef7e6e3b vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM
We turn on device IOTLB via VIRTIO_F_IOMMU_PLATFORM unconditionally on
platform without IOMMU support. This can lead unnecessary IOTLB
transactions which will damage the performance.

Fixing this by check whether the device is backed by IOMMU and disable
device IOTLB.

Reported-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20200302042454.24814-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-03-08 09:27:09 -04:00
Philippe Mathieu-Daudé
b897a47450 hw/virtio: Let vhost_memory_map() use a boolean 'is_write' argument
The 'is_write' argument is either 0 or 1.
Convert it to a boolean type.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-02-20 14:47:08 +01:00
Michael S. Tsirkin
8347505640 vhost: coding style fix
Drop a trailing whitespace. Make line shorter.

Fixes: 7652511473 ("vhost: Only align sections for vhost-user")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-01-23 02:08:15 -05:00
Dr. David Alan Gilbert
7652511473 vhost: Only align sections for vhost-user
I added hugepage alignment code in c1ece84e7c to deal with
vhost-user + postcopy which needs aligned pages when using userfault.
However, on x86 the lower 2MB of address space tends to be shotgun'd
with small fragments around the 512-640k range - e.g. video RAM, and
with HyperV synic pages tend to sit around there - again splitting
it up.  The alignment code complains with a 'Section rounded to ...'
error and gives up.

Since vhost-user already filters out devices without an fd
(see vhost-user.c vhost_user_mem_section_filter) it shouldn't be
affected by those overlaps.

Turn the alignment off on vhost-kernel so that it doesn't try
and align, and thus won't hit the rounding issues.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200116202414.157959-3-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-22 00:50:03 -05:00
Dr. David Alan Gilbert
ff4776147e vhost: Add names to section rounded warning
Add the memory region names to section rounding/alignment
warnings.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200116202414.157959-2-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-01-22 00:50:03 -05:00
Dr. David Alan Gilbert
7a064bcc66 virtio/vhost: Use auto_rcu_read macros
Use RCU_READ_LOCK_GUARD instead of manual rcu_read_(un)lock

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20191025103403.120616-2-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-29 18:56:45 -04:00
Eric Auger
549d400587 memory: allow memory_region_register_iommu_notifier() to fail
Currently, when a notifier is attempted to be registered and its
flags are not supported (especially the MAP one) by the IOMMU MR,
we generally abruptly exit in the IOMMU code. The failure could be
handled more nicely in the caller and especially in the VFIO code.

So let's allow memory_region_register_iommu_notifier() to fail as
well as notify_flag_changed() callback.

All sites implementing the callback are updated. This patch does
not yet remove the exit(1) in the amd_iommu code.

in SMMUv3 we turn the warning message into an error message saying
that the assigned device would not work properly.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04 18:49:18 +02:00
Dr. David Alan Gilbert
3fc4a64cba vhost: Fix memory region section comparison
Using memcmp to compare structures wasn't safe,
as I found out on ARM when I was getting falce miscompares.

Use the helper function for comparing the MRSs.

Fixes: ade6d081fc ("vhost: Regenerate region list from changed sections list")
Cc: qemu-stable@nongnu.org
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190814175535.2023-4-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-25 10:16:39 -04:00
Markus Armbruster
650d103d3e Include hw/hw.h exactly where needed
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in
hw/hw.h.  This permits dropping most of its inclusions.  Touching it
now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster
ca77ee28e0 Include migration/qemu-file-types.h a lot less
In my "build everything" tree, changing migration/qemu-file-types.h
triggers a recompile of some 2600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).

The culprit is again hw/hw.h, which supposedly includes it for
convenience.

Include migration/qemu-file-types.h only where it's needed.  Touching
it now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Li Hangjing
240e647a14 vhost: fix vhost_log size overflow during migration
When a guest which doesn't support multiqueue is migrated with a multi queues
vhost-user-blk deivce, a crash will occur like:

0 qemu_memfd_alloc (name=<value optimized out>, size=562949953421312, seals=<value optimized out>, fd=0x7f87171fe8b4, errp=0x7f87171fe8a8) at util/memfd.c:153
1 0x00007f883559d7cf in vhost_log_alloc (size=70368744177664, share=true) at hw/virtio/vhost.c:186
2 0x00007f88355a0758 in vhost_log_get (listener=0x7f8838bd7940, enable=1) at qemu-2-12/hw/virtio/vhost.c:211
3 vhost_dev_log_resize (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:263
4 vhost_migration_log (listener=0x7f8838bd7940, enable=1) at hw/virtio/vhost.c:787
5 0x00007f88355463d6 in memory_global_dirty_log_start () at memory.c:2503
6 0x00007f8835550577 in ram_init_bitmaps (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2173
7 ram_init_all (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2192
8 ram_save_setup (f=0x7f88384ce600, opaque=0x7f8836024098) at migration/ram.c:2219
9 0x00007f88357a419d in qemu_savevm_state_setup (f=0x7f88384ce600) at migration/savevm.c:1002
10 0x00007f883579fc3e in migration_thread (opaque=0x7f8837530400) at migration/migration.c:2382
11 0x00007f8832447893 in start_thread () from /lib64/libpthread.so.0
12 0x00007f8832178bfd in clone () from /lib64/libc.so.6

This is because vhost_get_log_size() returns a overflowed vhost-log size.
In this function, it uses the uninitialized variable vqs->used_phys and
vqs->used_size to get the vhost-log size.

Signed-off-by: Li Hangjing <lihangjing@baidu.com>
Reviewed-by: Xie Yongji <xieyongji@baidu.com>
Reviewed-by: Chai Wen <chaiwen@baidu.com>
Message-Id: <20190603061524.24076-1-lihangjing@baidu.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-16 16:16:52 -04:00
Jie Wang
31618958cc vhost: fix incorrect print type
fix incorrect print type in vhost_virtqueue_stop

Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1556605773-42019-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-05-29 18:00:57 -04:00
Jie Wang
c39eb88da1 vhost: remove the dead code
remove the dead code

Signed-off-by: Jie Wang <wangjie88@huawei.com>
Message-Id: <1556604614-32081-1-git-send-email-wangjie88@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-05-29 18:00:57 -04:00
Xie Yongji
5ad204bf2a vhost-user: Support transferring inflight buffer between qemu and backend
This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD to support transferring a shared
buffer between qemu and backend.

Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.

This shared buffer is used to track inflight I/O by backend.
Qemu should retrieve a new one when vm reset.

Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Chai Wen <chaiwen@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Message-Id: <20190228085355.9614-2-xieyongji@baidu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12 22:31:21 -04:00
Paolo Bonzini
18658a3ced vhost: restrict Linux dependency to kernel vhost
vhost-user does not depend on Linux; it can run on any POSIX system.  Restrict
vhost-kernel to Linux in hw/virtio/vhost-backend.c, everything else can be
compiled on all POSIX systems.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1543851204-41186-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1550165756-21617-4-git-send-email-pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-21 12:28:01 -05:00
Yury Kotov
fa4ae4be15 vhost: fix invalid downcast
virtio_queue_get_desc_addr returns 64-bit hwaddr while int is usually 32-bit.
If returned hwaddr is not equal to 0 but least-significant 32 bits are
equal to 0 then this code will not actually stop running queue.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Acked-by: Jia He <hejianet@gmail.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-09-07 17:05:18 -04:00
Tiwei Bie
388a86df9c vhost: check region type before casting
Check region type first before casting the memory region
to IOMMUMemoryRegion. Otherwise QEMU will abort with below
error message when casting non-IOMMU memory region:

vhost_iommu_region_add: Object 0x561f28bce4f0 is not an
instance of type qemu:iommu-memory-region

Fixes: cb1efcf462 ("iommu: Add IOMMU index argument to notifier APIs")
Cc: Peter Maydell <peter.maydell@linaro.org>

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-03 11:35:21 +03:00
Peter Maydell
cb1efcf462 iommu: Add IOMMU index argument to notifier APIs
Add support for multiple IOMMU indexes to the IOMMU notifier APIs.
When initializing a notifier with iommu_notifier_init(), the caller
must pass the IOMMU index that it is interested in. When a change
happens, the IOMMU implementation must pass
memory_region_notify_iommu() the IOMMU index that has changed and
that notifiers must be called for.

IOMMUs which support only a single index don't need to change.
Callers which only really support working with IOMMUs with a single
index can use the result of passing MEMTXATTRS_UNSPECIFIED to
memory_region_iommu_attrs_to_index().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
2018-06-15 15:23:34 +01:00
Peter Maydell
f67c9b693a acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
 cleanups, NFIT ACPI table.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
 LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
 EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
 odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
 a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
 KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
 =3+V0
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

acpi, vhost, misc: fixes, features

vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  vhost-blk: turn on pre-defined RO feature bit
  ACPI testing: test NFIT platform capabilities
  nvdimm, acpi: support NFIT platform capabilities
  tests/.gitignore: add entry for generated file
  arch_init: sort architectures
  ui: use local path for local headers
  qga: use local path for local headers
  colo: use local path for local headers
  migration: use local path for local headers
  usb: use local path for local headers
  sd: fix up include
  vhost-scsi: drop an unused include
  ppc: use local path for local headers
  rocker: drop an unused include
  e1000e: use local path for local headers
  ioapic: fix up includes
  ide: use local path for local headers
  display: use local path for local headers
  trace: use local path for local headers
  migration: drop an unused include
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04 10:15:16 +01:00
Peter Maydell
7446eb07c1 Make address_space_get_iotlb_entry() take a MemTxAttrs argument
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_get_iotlb_entry().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-12-peter.maydell@linaro.org
2018-05-31 16:32:35 +01:00
Tiwei Bie
988a27754b vhost: allow backends to filter memory sections
This patch introduces a vhost op for vhost backends to allow
them to filter the memory sections that they can handle.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-24 21:14:10 +03:00
Peter Xu
ffcbbe722f vhost: add trace for IOTLB miss
Add some trace points for IOTLB translation for vhost. After vhost-user
is setup, the only IO path that QEMU will participate should be the
IOMMU translation, so it'll be good we can track this with explicit
timestamps when needed to see how long time we take to do the
translation, and whether there's anything stuck inside.  It might be
useful for triaging vhost-user problems.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 03:14:41 +03:00
Jason Wang
aebbdbee55 vhost: do not verify ring mappings when IOMMU is enabled
When IOMMU is enabled, we store virtqueue metadata as iova (though it
may has _phys suffix) and access them through dma helpers. Any
translation failures could be reported by IOMMU.

In this case, trying to validate iova against gpa won't work and will
cause a false error reporting. So this patch bypasses the ring
verification if IOMMU is enabled which is similar to the behavior
before 0ca1fd2d68 that calls vhost_memory_map() which is a nop when
IOMMU is enabled.

Fixes: 0ca1fd2d68 ("vhost: Simplify ring verification checks")
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-04-16 19:11:38 +03:00
Peter Maydell
915d34c5f9 Miscellaneous bugfixes, including crash fixes from Alexey, Peter M. and
Thomas.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay3qbAAoJEL/70l94x66D7LwIAIDjHDULzCy/u+m/uFyTn7rD
 zyDhQTWgHP6OQ+TqixIDDszeasev/PWmiC6Bp+NG6ZIG102+XTREciSW+X7B6mct
 OqI/5xpjoqzKj2LrTeCnm754Xv7Ilz9kxZ1MKlGqjnRzdmykDRx7RNLqGBohL4EI
 nnF3iiOiT4ECY/aLgeRLfufJqj9zHr8hQ3om+2zMqntPfqc3Eg0eCpgb7uGMRDq8
 nWLecnDtqmBWhXDJCPngxDavBQqHDAmq1aj9ppJPLS+nB6pez0DvHMI6Gg3K4fIl
 2ybJse5FbOj/+PsM1Ae5g8TcWz607mVgtE+crKxLDmffg+YjbO9raqWigZoIw2Y=
 =aMIC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Miscellaneous bugfixes, including crash fixes from Alexey, Peter M. and
Thomas.

# gpg: Signature made Mon 09 Apr 2018 15:37:15 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  Add missing bit for SSE instr in VEX decoding
  maint: Add .mailmap entries for patches claiming list authorship
  dump: Fix build with newer gcc
  device-crash-test: Remove fixed isa-fdc entry
  qemu-pr-helper: Write pidfile more often
  qemu-pr-helper: Daemonize before dropping privileges
  virtio-serial: fix heapover-flow
  kvmclock: fix clock_is_reliable on migration from QEMU < 2.9
  hw/dma/i82374: Avoid double creation of the 82374 controller
  hw/scsi: support SCSI-2 passthrough without PI
  scsi-disk: allow customizing the SCSI version
  scsi-disk: Don't enlarge min_io_size to max_io_size
  configure: Add missing configure options to help text
  i386/hyperv: error out if features requested but unsupported
  i386/hyperv: add hv-frequencies cpu property
  target/i386: WHPX: set CPUID_EXT_HYPERVISOR bit
  memfd: fix vhost-user-test on non-memfd capable host
  scripts/checkpatch.pl: Bug fix
  target/i386: Fix andn instruction
  sys_membarrier: fix up include directives

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-04-09 17:29:10 +01:00
Dr. David Alan Gilbert
e7b94a84b6 vhost: Allow adjoining regions
My rework of section adding combines overlapping or adjoining regions,
but checks they're actually the same underlying RAM block.
Fix the case where two blocks adjoin but don't overlap; that new region
should get added (but not combined), but my previous patch was disallowing it.

Fixes: c1ece84e7c

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-04-09 17:35:46 +03:00
Maxime Coquelin
bc6abcff7c vhost-user-blk: set config ops before vhost-user init
As soon as vhost-user init is done, the backend may send
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, so let's set the
notification callback before it.

Also, it will be used to know whether the device supports
the config feature to advertize it or not.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Changpeng Liu <changpeng.liu@intel.com>
2018-04-09 17:35:45 +03:00
Marc-André Lureau
648abbfbaa memfd: fix vhost-user-test on non-memfd capable host
On RHEL7, memfd is not supported, and vhost-user-test fails:
TEST: tests/vhost-user-test... (pid=10248)
  /x86_64/vhost-user/migrate:
  qemu-system-x86_64: -object memory-backend-memfd,id=mem,size=2M,: failed to create memfd
FAIL

There is a qemu_memfd_check() to prevent running memfd path, but it
also checks for fallback implementation. Let's specialize
qemu_memfd_check() to check memfd only, while qemu_memfd_alloc_check()
checks for the qemu_memfd_alloc() API.

Reported-by: Miroslav Rezanina <mrezanin@redhat.com>
Tested-by: Miroslav Rezanina <mrezanin@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180328121804.16203-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2018-04-09 12:57:06 +02:00
Dr. David Alan Gilbert
c1ece84e7c vhost: Huge page align and merge
Align RAMBlocks to page size alignment, and adjust the merging code
to deal with partial overlap due to that alignment.

This is needed for postcopy so that we can place/fetch whole hugepages
when under userfault.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-20 16:40:37 +02:00
Jia He
9fac50c88d vhost: fix incorrect check in vhost_verify_ring_mappings
In commit 0ca1fd2d68 ("vhost: Simplify ring verification checks"),
it checks the virtqueue desc mapping for 3 times.

Fixed: commit 0ca1fd2d68 ("vhost: Simplify ring verification checks")
Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-03-01 18:17:47 +02:00
Jia He
fb20fbb764 vhost: avoid to start/stop virtqueue which is not ready
In our Armv8a server, we try to configure the vhost scsi but fail
to boot up the guest (-machine virt-2.10). The guest's boot failure
is very early, even earlier than grub.

There are 3 virtqueues (ctrl, event and cmd) for virtio scsi device,
but ovmf and seabios will only set the physical address for the 3rd
one (cmd). Then in vhost_virtqueue_start(), virtio_queue_get_desc_addr
will be 0 for ctrl and event vq when qemu negotiates with ovmf. So
vhost_memory_map fails with ENOMEM.

This patch just fixs it by early quitting the virtqueue start/stop
when virtio_queue_get_desc_addr is 0.

Btw, after guest kernel starts, all the 3 queues will be initialized
and set address correctly.

Already tested on Arm64 and X86_64 qemu.

Signed-off-by: Jia He <jia.he@hxt-semitech.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-01 18:17:47 +02:00
Jay Zhou
9e2a2a3e08 vhost: fix memslot limit check
Since used_memslots will be updated to the actual value after
registering memory listener for the first time, move the
memslots limit checking to the right place.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-01 16:25:37 +02:00
Dr. David Alan Gilbert
aa3c40f6bf vhost: Move log_dirty check
Move the log_dirty check into vhost_section.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:41 +02:00
Dr. David Alan Gilbert
938eeb640c vhost: Merge and delete unused callbacks
Now that the olf vhost_set_memory code is gone, the _nop and _add
callbacks are identical and can be merged.  The _del callback is
no longer needed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:41 +02:00
Dr. David Alan Gilbert
06709c120c vhost: Clean out old vhost_set_memory and friends
Remove the old update mechanism, vhost_set_memory, and the functions
and flags it used.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:40 +02:00
Dr. David Alan Gilbert
ade6d081fc vhost: Regenerate region list from changed sections list
Compare the sections list that's just been generated, and if it's
different from the old one regenerate the region list.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2018-02-08 21:06:40 +02:00
Dr. David Alan Gilbert
48d7c97577 vhost: Merge sections added to temporary list
As sections are reported by the listener to the _nop and _add
methods, add them to the temporary section list but now merge them
with the previous section if the new one abuts and the backend allows.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:40 +02:00
Dr. David Alan Gilbert
0ca1fd2d68 vhost: Simplify ring verification checks
vhost_verify_ring_mappings() were used to verify that
rings are still accessible and related memory hasn't
been moved after flatview is updated.

It was doing checks by mapping ring's GPA+len and
checking that HVA hadn't changed with new memory map.
To avoid maybe expensive mapping call, we were
identifying address range that changed and were doing
mapping only if ring was in changed range.

However it's not neccessary to perform ring's GPA
mapping as we already have its current HVA and all
we need is to verify that ring's GPA translates to
the same HVA in updated flatview.

This will allow the following patches to simplify the range
comparison that was previously needed to avoid expensive
verify_ring_mapping calls.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
with modifications by:
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:40 +02:00
Dr. David Alan Gilbert
c44317efec vhost: Build temporary section list and deref after commit
Igor spotted that there's a race, where a region that's unref'd
in a _del callback might be free'd before the set_mem_table call in
the _commit callback, and thus the vhost might end up using free memory.

Fix this by building a complete temporary sections list, ref'ing every
section (during add and nop) and then unref'ing the whole list right
at the end of commit.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:40 +02:00
Gal Hammer
76143618a5 virtio: remove event notifier cleanup call on de-assign
The virtio_bus_set_host_notifier function no longer calls
event_notifier_cleanup when a event notifier is removed.

The commit updates the code to match the new behavior and calls
virtio_bus_cleanup_host_notifier after the notifier was de-assign
and no longer in use.

This change is a preparation to allow executing the
virtio_bus_set_host_notifier function in a memory region
transaction.

Signed-off-by: Gal Hammer <ghammer@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 21:06:26 +02:00
Michael S. Tsirkin
f41d912023 Revert "vhost: add traces for memory listeners"
This reverts commit 0750b06021.

Follow up patches are reworking the memory listeners, the new mechanism
will add its own set of traces.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-08 19:26:51 +02:00
Marc-André Lureau
0f2956f915 memfd: add error argument, instead of perror()
This will allow callers to silence error report when the call is
allowed to failed.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180201132757.23063-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-07 14:09:25 +01:00
Peter Xu
d25836cafd memory: do explicit cleanup when remove listeners
When unregister memory listeners, we should call, e.g.,
region_del() (and possibly other undo operations) on every existing
memory region sections there, otherwise we may leak resources that are
held during the region_add(). This patch undo the stuff for the
listeners, which emulates the case when the address space is set from
current to an empty state.

I found this problem when debugging a refcount leak issue that leads to
a device unplug event lost (please see the "Bug:" line below).  In that
case, the leakage of resource is the PCI BAR memory region refcount.
And since memory regions are not keeping their own refcount but onto
their owners, so the vfio-pci device's (who is the owner of the PCI BAR
memory regions) refcount is leaked, and event missing.

We had encountered similar issues before and fixed in other
way (ee4c112846, "vhost: Release memory references on cleanup"). This
patch can be seen as a more high-level fix of similar problems that are
caused by the resource leaks from memory listeners. So now we can remove
the explicit unref of memory regions since that'll be done altogether
during unregistering of listeners now.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1531393
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180122060244.29368-5-peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-07 14:09:24 +01:00
Peter Xu
0750b06021 vhost: add traces for memory listeners
Trace these operations on two memory listeners.  It helps to verify the
new memory listener fix, and good to keep them there.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180122060244.29368-2-peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-07 14:09:24 +01:00
Changpeng Liu
4c3e257b5e vhost-user: add new vhost user messages to support virtio config space
Add VHOST_USER_GET_CONFIG/VHOST_USER_SET_CONFIG messages which can be
used for live migration of vhost user devices, also vhost user devices
can benefit from the messages to get/set virtio config space from/to the
I/O target. For the purpose to support virtio config space change,
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG message is added as the event notifier
in case virtio config space change in the slave I/O target.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-18 21:52:37 +02:00
Greg Kurz
2fe45ec3bf vhost: fix error check in vhost_verify_ring_mappings()
Since commit f1f9e6c5 "vhost: adapt vhost_verify_ring_mappings() to
virtio 1 ring layout", we check the mapping of each part (descriptor
table, available ring and used ring) of each virtqueue separately.

The checking of a part is done by the vhost_verify_ring_part_mapping()
function: it returns either 0 on success or a negative errno if the
part cannot be mapped at the same place.

Unfortunately, the vhost_verify_ring_mappings() function checks its
return value the other way round. It means that we either:
- only verify the descriptor table of the first virtqueue, and if it
  is valid we ignore all the other mappings
- or ignore all broken mappings until we reach a valid one

ie, we only raise an error if all mappings are broken, and we consider
all mappings are valid otherwise (false success), which is obviously
wrong.

This patch ensures that vhost_verify_ring_mappings() only returns
success if ALL mappings are okay.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-12-01 19:05:58 +02:00
Maxime Coquelin
2ae39a113a vhost: restore avail index from vring used index on disconnection
vhost_virtqueue_stop() gets avail index value from the backend,
except if the backend is not responding.

It happens when the backend crashes, and in this case, internal
state of the virtio queue is inconsistent, making packets
to corrupt the vring state.

With a Linux guest, it results in following error message on
backend reconnection:

[   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
[   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
[   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5

Fixes: 283e2c2adc ("net: virtio-net discards TX data after link down")
Cc: qemu-stable@nongnu.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-12-01 19:05:58 +02:00
Felipe Franciosi
5c0ba1be37 virtio/vhost: reset dev->log after syncing
vhost_log_put() is called to decomission the dirty log between qemu and
a vhost device when stopping the device. Such a call can happen from
migration_completion().

Present code sets dev->log_size to zero too early in vhost_log_put(),
causing the sync check to always return false. As a consequence, the
last pass on the dirty bitmap never happens at the end of migration.

If a vhost device was busy (writing to guest memory) until the last
moments before vhost_virtqueue_stop(), this error will result in guest
memory corruption (at least) following migrations.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15 05:54:41 +03:00
Alex Williamson
ee4c112846 vhost: Release memory references on cleanup
vhost registers a MemoryListener where it adds and removes references
to MemoryRegions as the MemoryRegionSections pass through.  The
region_add callback is invoked for each existing section when the
MemoryListener is registered, but unregistering the MemoryListener
performs no reciprocal region_del callback.  It's therefore the
owner of the MemoryListener's responsibility to cleanup any persistent
changes, such as these memory references, after unregistering.

The consequence of this bug is that if we have both a vhost device
and a vfio device, the vhost device will reference any mmap'd MMIO of
the vfio device via this MemoryListener.  If the vhost device is then
removed, those references remain outstanding.  If we then attempt to
remove the vfio device, it never gets finalized and the only way to
release the kernel file descriptors is to terminate the QEMU process.

Fixes: dfde4e6e1a ("memory: add ref/unref calls")
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org # v1.6.0+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08 16:15:17 +03:00
Marc-André Lureau
33c5793bd9 vhost: use QEMU_ALIGN_DOWN
I used the clang-tidy qemu-round check to generate the fix:
https://github.com/elmarco/clang-tools-extra

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-08-31 12:29:07 +02:00
Maxime Coquelin
020e571b8b vhost: rework IOTLB messaging
This patch reworks IOTLB messaging to prepare for vhost-user
device IOTLB support.

IOTLB messages handling is extracted from vhost-kernel backend,
so that only the messages transport remains backend specifics.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-06-02 18:57:17 +03:00