Commit Graph

29 Commits

Author SHA1 Message Date
Liu, Yi L
fb43cf739e intel_iommu: scalable mode emulation
Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
replace extended context mode. This patch extends current emulator to
support Scalable Mode which includes root table, context table and new
pasid table format change. Now intel_iommu emulates both legacy mode
and scalable mode (with legacy-equivalent capability set).

The key points are below:
1. Extend root table operations to support both legacy mode and scalable
   mode.
2. Extend context table operations to support both legacy mode and
   scalable mode.
3. Add pasid tabled operations to support scalable mode.

Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to contribute much to refine the whole commit.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2019-03-12 22:31:21 -04:00
Peter Xu
095955b24d intel_iommu: convert invalid traces into error reports
Report more *_invalid() tracepoints to error_report_once() so that we
can detect issues even without tracing enabled.  Drop those tracepoints.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19 16:48:16 -05:00
Singh, Brijesh
135f866e60 x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
Emulate the interrupt remapping support when guest virtual APIC is
enabled.

For more information refer: IOMMU spec rev 3.0 (section 2.2.5.2)

When VAPIC is enabled, it uses interrupt remapping as defined in
Table 22 and Figure 17 from IOMMU spec.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05 13:24:02 -05:00
Singh, Brijesh
b44159fe00 x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05 13:24:02 -05:00
Singh, Brijesh
577c470f43 x86_iommu/amd: Prepare for interrupt remap support
Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

amd-iommu-ir is set to higher priority to ensure that this region won't
be masked out by other memory regions.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05 13:24:02 -05:00
Peter Xu
4e4abd111a intel-iommu: replace more vtd_err_* traces
Replace all the trace_vtd_err_*() hooks with the new error_report_once()
since they are similar to trace_vtd_err() - dumping the first error
would be mostly enough, then we have them on by default too.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815095328.32414-4-peterx@redhat.com>
[Use "%x" instead of "%" PRIx16 to print uint16_t, whitespace tidied up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-27 15:09:20 +02:00
Peter Xu
1376211f77 intel-iommu: start to use error_report_once
Replace existing trace_vtd_err() with error_report_once() then stderr
will capture something if any of the error happens, meanwhile we don't
suffer from any DDOS.  Then remove the trace point.  Since at it,
provide more information where proper (now we can pass parameters into
the report function).

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815095328.32414-3-peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[Two format strings fixed, whitespace tidied up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-27 15:00:45 +02:00
Peter Xu
63b88968f1 intel-iommu: rework the page walk logic
This patch fixes a potential small window that the DMA page table might
be incomplete or invalid when the guest sends domain/context
invalidations to a device.  This can cause random DMA errors for
assigned devices.

This is a major change to the VT-d shadow page walking logic. It
includes but is not limited to:

- For each VTDAddressSpace, now we maintain what IOVA ranges we have
  mapped and what we have not.  With that information, now we only send
  MAP or UNMAP when necessary.  Say, we don't send MAP notifies if we
  know we have already mapped the range, meanwhile we don't send UNMAP
  notifies if we know we never mapped the range at all.

- Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call
  in any places to resync the shadow page table for a device.

- When we receive domain/context invalidation, we should not really run
  the replay logic, instead we use the new sync shadow page table API to
  resync the whole shadow page table without unmapping the whole
  region.  After this change, we'll only do the page walk once for each
  domain invalidations (before this, it can be multiple, depending on
  number of notifiers per address space).

While at it, the page walking logic is also refactored to be simpler.

CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 17:34:05 +03:00
Peter Xu
d118c06ebb intel-iommu: trace domain id during page walk
This patch only modifies the trace points.

Previously we were tracing page walk levels.  They are redundant since
we have page mask (size) already.  Now we trace something much more
useful which is the domain ID of the page walking.  That can be very
useful when we trace more than one devices on the same system, so that
we can know which map is for which domain.

CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-05-23 17:33:58 +03:00
Philippe Mathieu-Daudé
7299e1a411 hw/i386/vmport: replace fprintf() by trace events or LOG_UNIMP
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20171215034356.4449-2-f4bug@amsat.org>
[Replace unknown command tracepoint with LOG_UNIMP, add
 generic tracepoint for vmport commands. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21 09:30:32 +01:00
Philippe Mathieu-Daudé
87e0331c5a docs: fix broken paths to docs/devel/tracing.txt
With the move of some docs/ to docs/devel/ on ac06724a71,
no references were updated.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-07-31 13:12:53 +03:00
Ladi Prosek
8991c460be intel_iommu: relax iq tail check on VTD_GCMD_QIE enable
The VT-d spec (section 6.5.2) prescribes software to zero the
Invalidation Queue Tail Register before enabling the VTD_GCMD_QIE
Global Command Register bit. Windows Server 2012 R2 and possibly
other older Windows versions violate the protocol and set a
non-zero queue tail first, which in effect makes them crash early
on boot with -device intel-iommu,intremap=on.

This commit relaxes the check and instead of failing to enable
VTD_GCMD_QIE with vtd_err_qi_enable, it behaves as if the tail
register was set just after enabling VTD_GCMD_QIE
(see vtd_handle_iqt_write).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-03 22:29:48 +03:00
Peter Xu
b9313021f3 intel_iommu: cleanup vtd_{do_}iommu_translate()
First, let vtd_do_iommu_translate() return a status, so that we
explicitly knows whether error occured. Meanwhile, we make sure that
IOMMUTLBEntry is filled in in that.

Then, cleanup vtd_iommu_translate a bit. So even with PT we'll get a log
now. Also, remove useless assignments.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-06-16 18:44:55 +03:00
Peter Xu
7feb51b709 intel_iommu: switching the rest DPRINTF to trace
We have converted many of the DPRINTF() into traces. This patch does the
last 100+ ones.

To debug VT-d when error happens, let's try enable:

  -trace enable="vtd_err*"

This should works just like the old GENERAL but of course better, since
we don't need to recompile.

Similar rules apply to the other modules. I was trying to make the
prefix good enough for sub-module debugging.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-06-16 18:44:55 +03:00
Peter Xu
dbaabb25f4 intel_iommu: support passthrough (PT)
Hardware support for VT-d device passthrough. Although current Linux can
live with iommu=pt even without this, but this is faster than when using
software passthrough.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25 21:25:27 +03:00
Peter Xu
dd4d607e40 intel_iommu: enable remote IOTLB
This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch
upstream:

  "IOMMU: enable intel_iommu map and unmap notifiers"
  https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html

However I removed/fixed some content, and added my own codes.

Instead of translate() every page for iotlb invalidations (which is
slower), we walk the pages when needed and notify in a hook function.

This patch enables vfio devices for VT-d emulation.

And, since we already have vhost DMAR support via device-iotlb, a
natural benefit that this patch brings is that vt-d enabled vhost can
live even without ATS capability now. Though more tests are needed.

Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20 15:22:41 -03:00
Peter Xu
558e0024a4 intel_iommu: allow dynamic switch of IOMMU region
This is preparation work to finally enabled dynamic switching ON/OFF for
VT-d protection. The old VT-d codes is using static IOMMU address space,
and that won't satisfy vfio-pci device listeners.

Let me explain.

vfio-pci devices depend on the memory region listener and IOMMU replay
mechanism to make sure the device mapping is coherent with the guest
even if there are domain switches. And there are two kinds of domain
switches:

  (1) switch from domain A -> B
  (2) switch from domain A -> no domain (e.g., turn DMAR off)

Case (1) is handled by the context entry invalidation handling by the
VT-d replay logic. What the replay function should do here is to replay
the existing page mappings in domain B.

However for case (2), we don't want to replay any domain mappings - we
just need the default GPA->HPA mappings (the address_space_memory
mapping). And this patch helps on case (2) to build up the mapping
automatically by leveraging the vfio-pci memory listeners.

Another important thing that this patch does is to seperate
IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not
depend on the DMAR region (like before this patch). It should be a
standalone region, and it should be able to be activated without
DMAR (which is a common behavior of Linux kernel - by default it enables
IR while disabled DMAR).

Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20 15:22:41 -03:00
Peter Xu
f06a696dc9 intel_iommu: provide its own replay() callback
The default replay() don't work for VT-d since vt-d will have a huge
default memory region which covers address range 0-(2^64-1). This will
normally consumes a lot of time (which looks like a dead loop).

The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we found that the page directories are empty.
It'll greatly reduce the time to walk the whole region.

To achieve this, we provided a page walk helper to do that, invoking
corresponding hook function when we found an page we are interested in.
vtd_page_walk_level() is the core logic for the page walking. It's
interface is designed to suite further use case, e.g., to invalidate a
range of addresses.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20 15:22:41 -03:00
Eric Blake
0d3ef78829 trace: Avoid abuse of amdvi_mmio_read
hw/i386/trace-events has an amdvi_mmio_read trace that is used for
both normal reads (listing the register name, address, size, and
offset) and for an error case (abusing the register name to show
an error message, the address to show the maximum value supported,
then shoehorning address and size into the size and offset
parameters).  The change from a wide address to a narrower size
parameter could truncate a (rather-large) bogus read attempt, so
it's better to create a separate dedicated trace with correct types,
rather than abusing the trace mechanism.  Broken since its
introduction in commit d29a09c.

[Change trace event argument type from hwaddr to uint64_t since
user-defined types should not be used for trace events.  This fixes a
build failure with LTTng UST.
--Stefan]

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-03-24 09:21:42 +00:00
Peter Xu
6c441e1d61 intel_iommu: convert dbg macros to trace for trans
Another patch to convert the DPRINTF() stuffs. This patch focuses on the
address translation path and caching.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-17 21:52:31 +02:00
Peter Xu
bc535e59c4 intel_iommu: convert dbg macros to traces for inv
VT-d codes are still using static DEBUG_INTEL_IOMMU macro. That's not
good, and we should end the day when we need to recompile the code
before getting useful debugging information for vt-d. Time to switch to
the trace system. This is the first patch to do it.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-17 21:52:31 +02:00
Stefan Hajnoczi
7f4076c1bb trace: clean up trace-events files
There are a number of unused trace events that
scripts/cleanup-trace-events.pl finds.  The "hw/vfio/pci-quirks.c"
filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/
directory prefix.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170126171613.1399-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-31 17:12:15 +00:00
Daniel P. Berrange
9c5826306d trace: move hw/i386/xen events to correct subdir
The trace-events for a given source file should generally
always live in the same directory as the source file.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170125161417.31949-5-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-31 17:11:18 +00:00
Daniel P. Berrange
e97eb6f7f0 trace: move hw/mem/pc-dimm.c trace points into correct file
The trace points for hw/mem/pc-dimm.c were mistakenly put
in the hw/i386/trace-events file, instead of hw/mem/trace-events
in

  commit 5eb76e480b
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Thu Jun 16 09:40:10 2016 +0100

    trace: split out trace events for hw/i386/ directory

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1473872624-23285-4-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28 19:17:54 +01:00
David Kiarie
d61e45ecc7 hw/i386/trace-events: Add AMD IOMMU trace events
Signed-off-by: David Kiarie <davidkiarie4@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-23 19:03:56 +03:00
Laurent Vivier
deff0ddb86 trace-event: display "%d" instead of "0x%d"
Display the slot number of mhp_pc_dimm_assigned_slot()
using "%d" without the "0x".

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-09-13 18:12:34 +03:00
Laurent Vivier
e723b87103 trace-events: fix first line comment in trace-events
Documentation is docs/tracing.txt instead of docs/trace-events.txt.

find . -name trace-events -exec \
     sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \
     {} \;

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-12 10:36:01 +01:00
Peter Xu
02a2cbc872 x86-iommu: introduce IEC notifiers
This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When vIOMMU receives IEC invalidate
request, all the registered units will be notified with specific
invalidation requests.

Intel IOMMU is the first provider that generates such a event.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-07-21 20:43:49 +03:00
Daniel P. Berrange
5eb76e480b trace: split out trace events for hw/i386/ directory
Move all trace-events for files in the hw/i386/ directory to
their own file.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1466066426-16657-25-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 17:22:16 +01:00