For UNMAP-only IOMMU notifiers, we don't need to walk the page tables.
Fasten that procedure by skipping the page table walk. That should
boost performance for UNMAP-only notifiers like vhost.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SECURITY IMPLICATION: this patch fixes a potential race when multiple
threads access the IOMMU IOTLB cache.
Add a per-iommu big lock to protect IOMMU status. Currently the only
thing to be protected is the IOTLB/context cache, since that can be
accessed even without BQL, e.g., in IO dataplane.
Note that we don't need to protect device page tables since that's fully
controlled by the guest kernel. However there is still possibility that
malicious drivers will program the device to not obey the rule. In that
case QEMU can't really do anything useful, instead the guest itself will
be responsible for all uncertainties.
CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
That is not really necessary. Removing that node struct and put the
list entry directly into VTDAddressSpace. It simplfies the code a lot.
Since at it, rename the old notifiers_list into vtd_as_with_notifiers.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SECURITY IMPLICATION: without this patch, any guest with both assigned
device and a vIOMMU might encounter stale IO page mappings even if guest
has already unmapped the page, which may lead to guest memory
corruption. The stale mappings will only be limited to the guest's own
memory range, so it should not affect the host memory or other guests on
the host.
During IOVA page table walking, there is a special case when the PSI
covers one whole PDE (Page Directory Entry, which contains 512 Page
Table Entries) or more. In the past, we skip that entry and we don't
notify the IOMMU notifiers. This is not correct. We should send UNMAP
notification to registered UNMAP notifiers in this case.
For UNMAP only notifiers, this might cause IOTLBs cached in the devices
even if they were already invalid. For MAP/UNMAP notifiers like
vfio-pci, this will cause stale page mappings.
This special case doesn't trigger often, but it is very easy to be
triggered by nested device assignments, since in that case we'll
possibly map the whole L2 guest RAM region into the device's IOVA
address space (several GBs at least), which is far bigger than normal
kernel driver usages of the device (tens of MBs normally).
Without this patch applied to L1 QEMU, nested device assignment to L2
guests will dump some errors like:
qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
0x7f89a920d000) = -17 (File exists)
CC: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Jason Wang <jasowang@redhat.com>
[peterx: rewrite the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: commit da6789c27c ("nvdimm: add a macro for property "label-size"")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch reports the protocol feature that is only advertised by
QEMU if the device implements the config ops.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The warning is
hw/virtio/vhost-user.c:1319:26: error: suggest braces
around initialization of subobject [-Werror,-Wmissing-braces]
VhostUserMsg msg = { 0 };
^
{}
While the original code is correct, and technically exactly correct
as per ISO C89, both GCC and Clang support plain empty set of braces
as an extension.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The response to a VHOST_USER_POSTCOPY_ADVISE contains a fd but doesn't
actually contain any data. FIx vu_message_write so that it doesn't
do a 0-byte write() call, since this was ending up with rc=0
that was confusing the error handling code.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use qemu_set_nonblock rather than a simple fcntl; cleaner
and I have no reason to change other flags.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch introduces the support for setting memory region
based host notifiers for virtio device. This is helpful when
using a hardware accelerator for a virtio device, because
hardware heavily depends on the notification, this will allow
the guest driver in the VM to notify the hardware directly.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We are going to introduce a shared vhost user state which
will be named as 'VhostUserState'. So add 'Net' prefix to
the existing internal state structure in the vhost-user
netdev to avoid conflict.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rework the update script slightly, add the unistd.h header and its
dependencies on all architectures.
This also removes the IA64 and MIPS from a KVM blacklist:
Linux dropped IA64, and there was never a reason to
exclude MIPS from kvm specifically - it was
excluded due to dependency of its unistd.h on sgidefs.h,
which we also import.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Switch to the header we imported from Linux,
this allows us to drop a hack in kvm_i386.h.
More code will be dropped in the next patch.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It turns out (as will be clear from follow-up patches)
we do not really need any kvm para macros host side
for now, except on x86, and there we need it
unconditionally whether we run on kvm or we don't.
Import the x86 asm/kvm_para.h into standard-headers,
follow-up patches remove a bunch of code using this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add some trace points for IOTLB translation for vhost. After vhost-user
is setup, the only IO path that QEMU will participate should be the
IOMMU translation, so it'll be good we can track this with explicit
timestamps when needed to see how long time we take to do the
translation, and whether there's anything stuck inside. It might be
useful for triaging vhost-user problems.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
qemu should read and report hugetlb page allocation
counts exported in the following kernel patch:
commit 4c3ca37c4a4394978fd0f005625f6064ed2b9a64
Author: Jonathan Helman <jonathan.helman@oracle.com>
Date: Mon Mar 19 11:00:35 2018 -0700
virtio_balloon: export hugetlb page allocation counts
Export the number of successful and failed hugetlb page
allocations via the virtio balloon driver. These 2 counts
come directly from the vm_events HTLB_BUDDY_PGALLOC and
HTLB_BUDDY_PGALLOC_FAIL.
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
During smram region initialization some addresses are hardcoded,
replace them with macro to be more clear to readers.
Previous patch forgets about one value and exceeds the line
limit of 90 characters. The v2 breaks a few long lines
Signed-off-by: Zihan Yang <whois.zihan.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
AMD Zen expose the Intel equivalant to Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.
This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-4-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to guest OS to allow
them to protect against CVE-2018-3639.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Message-Id: <20180521215424.13520-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAlsBEgAPHG1qdEB0bHMu
bXNrLnJ1AAoJEHAbT2saaT5Z74oH/1rKw64XSout5P3RNbhy8LjDfhu/wxtda8oR
I3e/KbAOsmdU956Bvzk9eiL6gYyG2LrXgY5mJYaY5b+IwRZZEoywRvUX042fSEU0
Hdkrjx8MpAVPN0RztLM2UDQta2NPv9uBc/X9tMR+brRnLID6qXDdd1tjKG5e8vDt
iAfEgGdSg1opJlEyhOI3EPokdGa0QnrBgdPnEqhxtTWNhWWPxbdRVVTeyzJYOztC
sM647Ca6P5y+0lBQNB4CXydiAR0GnE/SS1krFWbrpuI6rbOwzBOsPTi84JtwQzv+
VWjmXfjnXy5c0+DG93e0sS6vqJF4V2ZoHKca2VVzLcBFPUxmZPw=
=MQI4
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging
trivial patches for 2018-05-20
# gpg: Signature made Sun 20 May 2018 07:13:20 BST
# gpg: using RSA key 701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59
* remotes/mjt/tags/trivial-patches-fetch: (22 commits)
acpi: fix a comment about aml_call0()
qapi/net.json: Fix the version number of the "vlan" removal
gdbstub: Handle errors in gdb_accept()
gdbstub: Use qemu_set_cloexec()
replace functions which are only available in glib-2.24
typedefs: Remove PcGuestInfo from qemu/typedefs.h
qemu-options: Allow -no-user-config again
hw/timer/mt48t59: Fix bit-rotten NVRAM_PRINTF format strings
Remove unnecessary variables for function return value
trivial: Do not include pci.h if it is not necessary
tests: fix tpm-crb tpm-tis tests race
hw/ide/ahci: Keep ALLWINNER_AHCI() macro internal
qemu-img-cmds.hx: add passive-aggressive note
qemu-img: Make documentation between .texi and .hx consistent
qemu-img: remove references to GEN_DOCS
qemu-img.texi: fix command ordering
qemu-img-commands.hx: argument ordering fixups
HACKING: document preference for g_new instead of g_malloc
qemu-option-trace: -trace enable= is a pattern, not a file
slirp/debug: Print IP addresses in human readable form
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
"vlan" will be dropped in 2.13, not in 2.12. And while we're at it,
use the better wording "dropped in" instead of "removed with" (also
for the "dump" removal).
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
In gdb_accept(), we both fail to check all errors (notably
that from socket_set_nodelay(), as Coverity notes in CID 1005666),
and fail to return an error status back to our caller. Correct
both of these things, so that errors in accept() result in our
stopping with a useful error message rather than ignoring it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Use the utility routine qemu_set_cloexec() rather than
manually calling fcntl(). This lets us drop the #ifndef _WIN32
guards and also means Coverity doesn't complain that we're
ignoring the fcntl error return (CID 1005665, CID 1005667).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Currently the minimal supported version of glib is 2.22.
Since testing is done with a glib that claims to be 2.22, but in fact
has APIs from newer version of glib, this bug was not caught during
submit of the patch referenced below.
Replace g_realloc_n, which is available only since 2.24, with g_renew.
Fixes commit 418026ca43 ("util: Introduce vfio helpers")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
CC: qemu-stable@nongnu.org
It is long gone since e4e8ba04c2 ...
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
After 1217d6ca2b we error out
explicitly if an unknown -option was passed on the command line.
However, we are doing two pass command line option parsing. In
the first pass we just look for -no-user-config or -nodefconfig
being present which determines whether we load user config or
not. Then in the second pass we finally parse everything else
throwing an error if an unsupported -option was found. Problem is
that in the second pass -no-user-config and -nodefconfig are not
handled explicitly which makes us throw the unsupported option
error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When compiling with NVRAM_PRINTF enabled, gcc currently bails out with:
CC hw/timer/m48t59.o
CC hw/timer/m48t59-isa.o
hw/timer/m48t59.c: In function ‘NVRAM_writeb’:
hw/timer/m48t59.c:460:5: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘hwaddr’ [-Werror=format=]
NVRAM_PRINTF("%s: 0x%08x => 0x%08x\n", __func__, addr, val);
^
hw/timer/m48t59.c:460:5: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 4 has type ‘uint64_t’ [-Werror=format=]
hw/timer/m48t59.c: In function ‘NVRAM_readb’:
hw/timer/m48t59.c:492:5: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘hwaddr’ [-Werror=format=]
NVRAM_PRINTF("%s: 0x%08x <= 0x%08x\n", __func__, addr, retval);
Fix it by using the correct format strings and while we're at it,
also change the definition of NVRAM_PRINTF so that this can not
bit-rot so easily again.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
There is no need to include pci.h in these files.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
No need to close the TPM data socket on the emulator end, qemu will
close it after a SHUTDOWN. This avoids a race between close() and
read() in the TPM data thread.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The ALLWINNER_AHCI() macro is only used in ahci-allwinner.c.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
I'm kidding. It's very easy to forget there are per-command sections
in the texi, and insane that we don't autogenerate those, too.
Until then, leave a little post-it note in this .hx file until I
find a way to delete it.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
These are also different and out of order for whatever reason.
I'd like to automate this in the future, but for now let's put
on the band-aid.
In the case of resize, there were options missing from all
three docstrings; the new string is based on the code.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Nothing seemingly uses this.
(jcody: commit 77bd1119ba even mentions that it appears unused)
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This should match the summary ordering, which is alphabetical.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The TEXI and string versions are actually identical, except for markup.
We can probably automate this... but make the ordering the same until
then.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch documents the preference for g_new instead of g_malloc. The
reasons were adapted from commit b45c03f585.
Discussion in QEMU's mailing list:
http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg03238.html
Cc: qemu-devel@nongnu.org
Cc: David Hildenbrand <david@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The trace events all use a uint64_t data type, so should be using the
corresponding PRIx64 format, not HWADDR_PRIx which is intended for use
with the 'hwaddr' type.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>