Back in 2016, we discussed[1] rules for headers, and these were
generally liked:
1. Have a carefully curated header that's included everywhere first. We
got that already thanks to Peter: osdep.h.
2. Headers should normally include everything they need beyond osdep.h.
If exceptions are needed for some reason, they must be documented in
the header. If all that's needed from a header is typedefs, put
those into qemu/typedefs.h instead of including the header.
3. Cyclic inclusion is forbidden.
This patch gets include/ closer to obeying 2.
It's actually extracted from my "[RFC] Baby steps towards saner
headers" series[2], which demonstrates a possible path towards
checking 2 automatically. It passes the RFC test there.
[1] Message-ID: <87h9g8j57d.fsf@blackfin.pond.sub.org>
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html
[2] Message-Id: <20190711122827.18970-1-armbru@redhat.com>
https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg02715.html
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-2-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
This will make unversioned CPU models behavior depend on the
machine type:
* "pc-*-4.0" and older will not report them as aliases.
This is done to keep compatibility with older QEMU versions
after management software starts translating aliases.
* "pc-*-4.1" will translate unversioned CPU models to -v1.
This is done to keep compatibility with existing management
software, that still relies on CPU model runnability promises.
* "none" will translate unversioned CPU models to their latest
version. This is planned become the default in future machine
types (probably in pc-*-4.3).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190628002844.24894-8-ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
To make smp_parse() more flexible and expansive, a smp_parse function
pointer is added to MachineClass that machine types could override.
The generic smp_parse() code in vl.c is moved to hw/core/machine.c, and
become the default implementation of MachineClass::smp_parse. A PC-specific
function called pc_smp_parse() has been added to hw/i386/pc.c, which in
this patch changes nothing against the default one .
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190620054525.37188-3-like.xu@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
In new sockets/dies/cores/threads model, the apicid of logical cpu could
imply die level info of guest cpu topology thus x86_apicid_from_cpu_idx()
need to be refactored with #dies value, so does apicid_*_offset().
To keep semantic compatibility, the legacy pkg_offset which helps to
generate CPUIDs such as 0x3 for L3 cache should be mapping to die_offset.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20190612084104.34984-5-like.xu@linux.intel.com>
[ehabkost: squash unit test patch]
Message-Id: <20190612084104.34984-6-like.xu@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The field die_id (default as 0) and has_die_id are introduced to X86CPU.
Following the legacy smp check rules, the die_id validity is added to
the same contexts as leagcy smp variables such as hmp_hotpluggable_cpus(),
machine_set_cpu_numa_node(), cpu_slot_to_string() and pc_cpu_pre_plug().
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20190612084104.34984-4-like.xu@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The die-level as the first PC-specific cpu topology is added to the leagcy
cpu topology model, which has one die per package implicitly and only the
numbers of sockets/cores/threads are configurable.
In the new model with die-level support, the total number of logical
processors (including offline) on board will be calculated as:
#cpus = #sockets * #dies * #cores * #threads
and considering compatibility, the default value for #dies would be
initialized to one in x86_cpu_initfn() and pc_machine_initfn().
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20190612084104.34984-2-like.xu@linux.intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
To get rid of the global smp_* variables we're currently using, it's recommended
to pass MachineState in the list of incoming parameters for functions that use
global smp variables, thus some redundant parameters are dropped. It's applied
for legacy smbios_*(), *_machine_reset(), hot_add_cpu() and mips *_create_cpu().
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190518205428.90532-3-like.xu@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Replace the static variable with a PCMachineClass field. This
will help us eventually get rid of the pc_compat_*() init
functions.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190628200227.1053-1-ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit c87759ce87 fixed a regression affecting pc-q35 machines by
introducing a new pc-q35-4.0.1 machine version to be used instead
of pc-q35-4.0. The only purpose was to revert the default behaviour
of not using split irqchip, but the change also introduced the usual
hw_compat and pc_compat bits, and wired them for pc-q35 only.
This raises questions when it comes to add new compat properties for
4.0* machine versions of any architecture. Where to add them ? In
4.0, 4.0.1 or both ? Error prone. Another possibility would be to teach
all other architectures about 4.0.1. This solution isn't satisfying,
especially since this is a pc-q35 specific issue.
It turns out that the split irqchip default is handled in the machine
option function and doesn't involve compat lists at all.
Drop all the 4.0.1 compat lists and use the 4.0 ones instead in the 4.0.1
machine option function.
Move the compat props that were added to the 4.0.1 since c87759ce87 to
4.0.
Even if only hw_compat_4_0_1 had an impact on other architectures,
drop pc_compat_4_0_1 as well for consistency.
Fixes: c87759ce87 "q35: Revert to kernel irqchip"
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <156051774276.244890.8660277280145466396.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
Commit b2fc91db84 ("q35: set split kernel irqchip as default") changed
the default for the pc-q35-4.0 machine type to use split irqchip, which
turned out to have disasterous effects on vfio-pci INTx support. KVM
resampling irqfds are registered for handling these interrupts, but
these are non-functional in split irqchip mode. We can't simply test
for split irqchip in QEMU as userspace handling of this interrupt is a
significant performance regression versus KVM handling (GeForce GPUs
assigned to Windows VMs are non-functional without forcing MSI mode or
re-enabling kernel irqchip).
The resolution is to revert the change in default irqchip mode in the
pc-q35-4.1 machine and create a pc-q35-4.0.1 machine for the 4.0-stable
branch. The qemu-q35-4.0 machine type should not be used in vfio-pci
configurations for devices requiring legacy INTx support without
explicitly modifying the VM configuration to use kernel irqchip.
Link: https://bugs.launchpad.net/qemu/+bug/1826422
Fixes: b2fc91db84 ("q35: set split kernel irqchip as default")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <155786484688.13873.6037015630912983760.stgit@gimli.home>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was found that Hyper-V 2016 on KVM in some configurations (q35 machine +
piix4-usb-uhci) hangs on boot. Root-cause was that one of Hyper-V
level-triggered interrupt handler performs EOI before fixing the cause of
the interrupt. This results in IOAPIC keep re-raising the level-triggered
interrupt after EOI because irq-line remains asserted.
Gory details: https://www.spinics.net/lists/kvm/msg184484.html
(the whole thread).
Turns out we were dealing with similar issues before; in-kernel IOAPIC
implementation has commit 184564efae4d ("kvm: ioapic: conditionally delay
irq delivery duringeoi broadcast") which describes a very similar issue.
Steal the idea from the above mentioned commit for IOAPIC implementation in
QEMU. SUCCESSIVE_IRQ_MAX_COUNT, delay and the comment are borrowed as well.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20190402080215.10747-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When building with CONFIG_Q35=n, we get:
LINK x86_64-softmmu/qemu-system-x86_64
/usr/bin/ld: hw/i386/acpi-build.o: in function `acpi_get_misc_info':
/source/qemu/hw/i386/acpi-build.c:243: undefined reference to `ich9_lpc_find'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:204: qemu-system-x86_64] Error 1
This is due to a dependency in acpi-build.c on the ICH9_LPC
(via ich9_lpc_find) and PIIX4_PM (via piix4_pm_find) devices.
To allow better modularity (compile acpi-build.c with only
Q35/ICH9 or ISAPC/PIIX4), refactor the similar helper as
object_resolve_type_unambiguous(). This way we relax the
linker dependencies and can build the x86 targets with a
selection of machines (instead of all of them).
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190427144025.22880-3-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Header guard symbols should match their file name to make guard
collisions less likely.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190315145123.28030-6-armbru@redhat.com>
[Rebase to master: update include/hw/net/ne2000-isa.h]
VTD_RTADDR_RTT is dropped even by the VT-d spec, so QEMU should
probably do the same thing (after all we never really implemented it).
Since we've had a field for that in the migration stream, to keep
compatibility we need to fill the hole up.
Please refer to VT-d spec 10.4.6.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190329061422.7926-3-peterx@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Previously we have per-device system memory aliases when DMAR is
disabled by the system. It will slow the system down if there are
lots of devices especially when DMAR is disabled, because each of the
aliased system address space will contain O(N) slots, and rendering
such N address spaces will be O(N^2) complexity.
This patch introduces a shared nodmar memory region and for each
device we only create an alias to the shared memory region. With the
aliasing, QEMU memory core API will be able to detect when devices are
sharing the same address space (which is the nodmar address space)
when rendering the FlatViews and the total number of FlatViews can be
dramatically reduced when there are a lot of devices.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190313094323.18263-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch adds an option to provide flexibility for user to expose
Scalable Mode to guest. User could expose Scalable Mode to guest by
the config as below:
"-device intel-iommu,caching-mode=on,scalable-mode=on"
The Linux iommu driver has supported scalable mode. Please refer below
patch set:
https://www.spinics.net/lists/kernel/msg2985279.html
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-4-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per Intel(R) VT-d 3.0, the qi_desc is 256 bits in Scalable
Mode. This patch adds emulation of 256bits qi_desc.
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to rebase and refine the patch.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1551753295-30167-3-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
replace extended context mode. This patch extends current emulator to
support Scalable Mode which includes root table, context table and new
pasid table format change. Now intel_iommu emulates both legacy mode
and scalable mode (with legacy-equivalent capability set).
The key points are below:
1. Extend root table operations to support both legacy mode and scalable
mode.
2. Extend context table operations to support both legacy mode and
scalable mode.
3. Add pasid tabled operations to support scalable mode.
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to contribute much to refine the whole commit.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
The PC machines put firmware in ROM by default. To get it put into
flash memory (required by OVMF), you have to use -drive
if=pflash,unit=0,... and optionally -drive if=pflash,unit=1,...
Why two -drive? This permits setting up one part of the flash memory
read-only, and the other part read/write. It also makes upgrading
firmware on the host easier. Below the hood, it creates two separate
flash devices, because we were too lazy to improve our flash device
models to support sector protection.
The problem at hand is to do the same with -blockdev somehow, as one
more step towards deprecating -drive.
Mapping -drive if=none,... to -blockdev is a solved problem. With
if=T other than if=none, -drive additionally configures a block device
frontend. For non-onboard devices, that part maps to -device. Also a
solved problem. For onboard devices such as PC flash memory, we have
an unsolved problem.
This is actually an instance of a wider problem: our general device
configuration interface doesn't cover onboard devices. Instead, we have
a zoo of ad hoc interfaces that are much more limited. One of them is
-drive, which we'd rather deprecate, but can't until we have suitable
replacements for all its uses.
Sadly, I can't attack the wider problem today. So back to the narrow
problem.
My first idea was to reduce it to its solved buddy by using pluggable
instead of onboard devices for the flash memory. Workable, but it
requires some extra smarts in firmware descriptors and libvirt. Paolo
had an idea that is simpler for libvirt: keep the devices onboard, and
add machine properties for their block backends.
The implementation is less than straightforward, I'm afraid.
First, block backend properties are *qdev* properties. Machines can't
have those, as they're not devices. I could duplicate these qdev
properties as QOM properties, but I hate that.
More seriously, the properties do not belong to the machine, they
belong to the onboard flash devices. Adding them to the machine would
then require bad magic to somehow transfer them to the flash devices.
Fortunately, QOM provides the means to handle exactly this case: add
alias properties to the machine that forward to the onboard devices'
properties.
Properties need to be created in .instance_init() methods. For PC
machines, that's pc_machine_initfn(). To make alias properties work,
we need to create the onboard flash devices there, too. Requires
several bug fixes, in the previous commits. We also have to realize
the devices. More on that below.
If the user sets pflash0, firmware resides in flash memory.
pc_system_firmware_init() maps and realizes the flash devices.
Else, firmware resides in ROM. The onboard flash devices aren't used
then. pc_system_firmware_init() destroys them unrealized, along with
the alias properties.
The existing code to pick up drives defined with -drive if=pflash is
replaced by code to desugar into the machine properties.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <87ftrtux81.fsf@dusky.pond.sub.org>
pc_system_firmware_init() parameter @isapc_ram_fw is PCMachineState
member pci_enabled negated. The next commit will need more of
PCMachineState. To prepare for that, pass a PCMachineState *, and
drop the now redundant parameter @isapc_ram_fw.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190308131445.17502-11-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
As NVDIMM support is looming for ARM and SPAPR, let's
move the acpi_nvdimm_state to the generic machine struct
instead of duplicating the same code in several machines.
It is also renamed into nvdimms_state and becomes a pointer.
nvdimm and nvdimm-persistence become generic machine options.
They become guarded by a nvdimm_supported machine class member.
We also add a description for those options.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190308182053.5487-3-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
As we intend to migrate the acpi_nvdimm_state into
the base machine with a new dimms_state name, let's
also rename the datatype.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308182053.5487-2-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Function pc_acpi_init() is not used anymore.
Remove the definition and declaration.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190214084939.20640-2-richardw.yang@linux.intel.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
In order to avoid migration issues, we enable PVH only for
machine type >= 4.0
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Files requiring PCMachineClass already include "hw/i386/pc.h".
To clean "qemu/typedefs.h", move the declaration to "hw/i386/pc.h"
(removing the forward declaration).
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Instead of verbose arrays with 4 lines for each entry, make each
entry take only one line. This makes long arrays that couldn't
fit in the screen become short and readable.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-4-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Make them more QOMConventional.
Cc:qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20190105023831.66910-1-liq3ea@163.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_2() function with pc_compat_2_2_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_3() function with pc_compat_2_3_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Switch the intr_supported variable from a boolean to OnOffAuto type so
that we can know whether the user specified it or not. With that
we'll have a chance to help the user to choose more wisely where
possible. Introduce x86_iommu_ir_supported() to mask these changes.
No functional change at all.
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Support DMA read/write draining should be easy for existing VT-d
emulation since the emulation itself does not have any request queue
there so we don't need to do anything to flush the un-commited queue.
What we need to do is to declare the support.
These capabilities are required to pass Windows SVVP test program. It
is verified that when with parameters "x-aw-bits=48,caching-mode=off"
we can pass the Windows SVVP test with this patch applied. Otherwise
we'll fail with:
IOMMU[0] - DWD (DMA write draining) not supported
IOMMU[0] - DWD (DMA read draining) not supported
Segment 0 has no DMA remapping capable IOMMU units
However since these bits are not declared support for QEMU<=3.1, we'll
need a compatibility bit for it and we turn this on by default only
for QEMU>=4.0.
Please refer to VT-d spec 6.5.4 for more information.
CC: Yu Wang <wyu@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1654550
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Including all machine types that might have a pcie-root-port.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <154394083644.28192.8501647946108201466.stgit@gimli.home>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[ehabkost: fixed accidental recursion at spapr_machine_3_1_class_options()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This makes their function more clear and prevents conflicts when adding
the actual devices to the machine state, if necessary.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181107152434.22219-1-minyard@acm.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The vtd_generate_msi_message() in intel-iommu is used to construct a MSI
Message from IRQ. A similar function will be needed when we add interrupt
remapping support in amd-iommu. Moving the function in common file to
avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message().
There is no logic changes in the code flow.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As the release document ref below link (page 13):
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf
PKU is supported in Skylake Server (Only Server) and later, and
on Intel(R) Xeon(R) Processor Scalable Family. So PKU is supposed
to be in Skylake-Server CPU model. And PKU's CPUID has been
exposed to QEMU. But PKU can't be find in Skylake-Server CPU
model in the code. So this patch will fix this issue in
Skylake-Server CPU model.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <5014b57f834dcfa8fd3781504d98dcf063d54fde.1540801392.git.tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Certain configurations do not allow SynIC to be used in QEMU. In
particular,
- when hyperv_vpindex is off, SINT routes can't be used as they refer to
the destination vCPU by vp_index
- older KVM (which doesn't expose KVM_CAP_HYPERV_SYNIC2) zeroes out
SynIC message and event pages on every msr load, breaking migration
OTOH in-KVM users of SynIC -- SynIC timers -- do work in those
configurations, and we shouldn't stop the guest from using them.
To cover both scenarios, introduce an X86CPU property that makes CPU
init code to skip creation of the SynIC object (and thus disables any
SynIC use in QEMU) but keeps the KVM part of the SynIC working.
The property is clear by default but is set via compat logic for older
machine types.
As a result, when hv_synic and a modern machine type are specified, QEMU
will refuse to run unless vp_index is on and the kernel is recent
enough. OTOH with an older machine type QEMU will run fine with
hv_synic=on against an older kernel and/or without vp_index enabled but
will disallow the in-QEMU uses of SynIC (in e.g. VMBus).
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20180921082217.29481-4-rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MSR_SMI_COUNT started being migrated in QEMU 2.12. Do not migrate it
on older machine types, or the subsection causes a load failure for
guests that use SMM.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's start to use "info pic" just like other platforms. For now we
keep the command for a while so that old users can know what is the new
command to use.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20171229073104.3810-6-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This include both userspace and in-kernel ioapic. Note that the numbers
can be inaccurate for kvm-ioapic. One reason is the same with
kvm-i8259, that when irqfd is used, irqs can be delivered all inside
kernel without our notice. Meanwhile, kvm-ioapic is specially treated
when irq numbers <ISA_NUM_IRQS, those irqs will be delivered in kernel
too via kvm-i8259 (please refer to kvm_pc_gsi_handler).
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20171229073104.3810-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable TOPOEXT feature on EPYC CPU. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.
Disable topoext on PC_COMPAT_2_12 and keep xlevel 0x8000000a.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <1529443919-67509-3-git-send-email-babu.moger@amd.com>
[ehabkost: Added EPYC-IBPB.xlevel to PC_COMPAT_2_12]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Replace the "nvdimm-cap" option which took numeric arguments such as "2"
with a more user friendly "nvdimm-persistence" option which takes symbolic
arguments "cpu" or "mem-ctrl".
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a machine command line option to allow the user to control the Platform
Capabilities Structure in the virtualized NFIT. This Platform Capabilities
Structure was added in ACPI 6.2 Errata A.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch fixes a potential small window that the DMA page table might
be incomplete or invalid when the guest sends domain/context
invalidations to a device. This can cause random DMA errors for
assigned devices.
This is a major change to the VT-d shadow page walking logic. It
includes but is not limited to:
- For each VTDAddressSpace, now we maintain what IOVA ranges we have
mapped and what we have not. With that information, now we only send
MAP or UNMAP when necessary. Say, we don't send MAP notifies if we
know we have already mapped the range, meanwhile we don't send UNMAP
notifies if we know we never mapped the range at all.
- Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call
in any places to resync the shadow page table for a device.
- When we receive domain/context invalidation, we should not really run
the replay logic, instead we use the new sync shadow page table API to
resync the whole shadow page table without unmapping the whole
region. After this change, we'll only do the page walk once for each
domain invalidations (before this, it can be multiple, depending on
number of notifiers per address space).
While at it, the page walking logic is also refactored to be simpler.
CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Tested-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables.
Fasten that procedure by skipping the page table walk. That should
boost performance for UNMAP-only notifiers like vhost.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SECURITY IMPLICATION: this patch fixes a potential race when multiple
threads access the IOMMU IOTLB cache.
Add a per-iommu big lock to protect IOMMU status. Currently the only
thing to be protected is the IOTLB/context cache, since that can be
accessed even without BQL, e.g., in IO dataplane.
Note that we don't need to protect device page tables since that's fully
controlled by the guest kernel. However there is still possibility that
malicious drivers will program the device to not obey the rule. In that
case QEMU can't really do anything useful, instead the guest itself will
be responsible for all uncertainties.
CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
That is not really necessary. Removing that node struct and put the
list entry directly into VTDAddressSpace. It simplfies the code a lot.
Since at it, rename the old notifiers_list into vtd_as_with_notifiers.
CC: QEMU Stable <qemu-stable@nongnu.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The property legacy-cache will be used to control the cache information.
If user passes "-cpu legacy-cache" then older information will
be displayed even if the hardware supports new information. Otherwise
use the statically loaded cache definitions if available.
Renamed the previous cache structures to legacy_*. If there is any change in
the cache information, then it needs to be initialized in builtin_x86_defs.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Tested-by: Geoffrey McRae <geoff@hostfission.com>
Message-Id: <20180514164156.27034-3-babu.moger@amd.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
By default MachineClass::get_hotplug_handler is NULL and concrete board
should set it to it's own handler.
Considering there isn't any default handler, drop saving empty
MachineClass::get_hotplug_handler in child class and make PC code
consistent with spapr/s390x boards.
We can bring this back when actual usecase surfaces and do it
consistently across boards that use get_hotplug_handler().
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 1525691524-32265-2-git-send-email-imammedo@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's make it clear that we are dealing with device memory. That it can
be used for memory hotplug is just a special case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-10-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.
This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).
We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
Message-Id: <1517367668-25048-1-git-send-email-wangxinxin.wang@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (hw/ppc)
Message-Id: <20180308223946.26784-4-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Again... (after 07dc788054 and 9157eee1b1).
We now extract the ISA bus specific helpers.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180308223946.26784-2-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The e1000 NIC is getting old and is not a very good default for a
PCIe machine type. Change it to e1000e, which should be supported
by a good number of guests.
In particular, drivers for 82574 were added first to Linux 2.6.27 (2008)
and Windows 2008 R2. This does mean that Windows 2008 will not work
anymore with Q35 machine types and a default "-net nic -net xxx" network
configuration; it did work before because it does have an AHCI driver.
However, Windows 2008 has been declared out of main stream support
in 2015. It will get out of extended support in 2020. Windows 2008
R2 has the same end of support dates and, since the two are basically
Vista vs. Windows 7, R2 probably is more popular.
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current implementation of Intel IOMMU code only supports 39 bits
iova address width. This patch provides a new parameter (x-aw-bits)
for intel-iommu to extend its address width to 48 bits but keeping the
default the same (39 bits). The reason for not changing the default
is to avoid potential compatibility problems with live migration of
intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits'
parameter are 39 and 48.
After enabling larger address width (48), we should be able to map
larger iova addresses in the guest. For example, a QEMU guest that
is configured with large memory ( >=1TB ). To check whether 48 bits
aw is enabled, we can grep in the guest dmesg output with line:
"DMAR: Host address width 48".
Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The current implementation of Intel IOMMU code only supports 39 bits
host/iova address width so number of macros use hard coded values based
on that. This patch is to redefine them so they can be used with
variable address widths. This patch doesn't add any new functionality
but enables adding support for 48 bit address width.
Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
CPUID_7_0_EBX_CLFLUSHOPT is missed in current "Skylake-Server" cpu
model. Add it to "Skylake-Server" cpu model on pc-i440fx-2.12 and
pc-q35-2.12. Keep it disabled in "Skylake-Server" cpu model on older
machine types.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171219033730.12748-3-haozhong.zhang@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch adds the function apic_get_highest_priority_irr to
apic.c and exports it through the interface in apic.h for use by hvf.
Signed-off-by: Sergio Andres Gomez Del Real <Sergio.G.DelReal@gmail.com>
Message-Id: <20170913090522.4022-8-Sergio.G.DelReal@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have PCI_DEVFN_MAX now.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
and remove the old i386/pc dependency.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
enable_tco is specific to i386/pc.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
this allows to remove the old i386/pc dependency on acpi/core.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
According to SDM 10.11.1, only [19:12] bits of MSI address are
Destination ID, change the mask to avoid ambiguity for VT-d spec
has used the bit 4 to indicate a remappable interrupt request.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently there is no MMIO range over 4G
reserved for PCI hotplug. Since the 32bit PCI hole
depends on the number of cold-plugged PCI devices
and other factors, it is very possible is too small
to hotplug PCI devices with large BARs.
Fix it by reserving 2G for I4400FX chipset
in order to comply with older Win32 Guest OSes
and 32G for Q35 chipset.
Even if the new defaults of pci-hole64-size will appear in
"info qtree" also for older machines, the property was
not implemented so no changes will be visible to guests.
Note this is a regression since prev QEMU versions had
some range reserved for 64bit PCI hotplug.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Starting with Windows Server 2012 and Windows 8, if
CPUID.40000005.EAX contains a value of -1, Windows assumes specific
limit to the number of VPs. In this case, Windows Server 2012
guest VMs may use more than 64 VPs, up to the maximum supported
number of processors applicable to the specific Windows
version being used.
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
For compatibility, Let's introduce a new property for X86CPU,
named "x-hv-max-vps" as Eduardo's suggestion, and set it
to 0x40 before machine 2.10.
(The "x-" prefix indicates that the property is not supposed to
be a stable user interface.)
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1505143227-14324-1-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It was cached by read/write separately. Let's merge them.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently when running KVM, we expose "KVMKVMKVM\0\0\0" in
the 0x40000000 CPUID leaf. Other hypervisors (VMWare,
HyperV, Xen, BHyve) all do the same thing, which leaves
TCG as the odd one out.
The CPUID signature is used by software to detect which
virtual environment they are running in and (potentially)
change behaviour in certain ways. For example, systemd
supports a ConditionVirtualization= setting in unit files.
The virt-what command can also report the virt type it is
running on
Currently both these apps have to resort to custom hacks
like looking for 'fw-cfg' entry in the /proc/device-tree
file to identify TCG.
This change thus proposes a signature "TCGTCGTCGTCG" to be
reported when running under TCG.
To hide this, the -cpu option tcg-cpuid=off can be used.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170509132736.10071-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This finishes QOM'fication of IOMMUMemoryRegion by introducing
a IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().
This makes IOMMUMemoryRegion an abstract class.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.
This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.
This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pc.h and sysemu/kvm.h are also included from common code (where
CONFIG_KVM is not available), so the #defines that depend on CONFIG_KVM
should not be declared here to avoid that anybody is using them in a
wrong way. Since we're also going to poison CONFIG_KVM for common code,
let's move them to kvm_i386.h instead. Most of the dummy definitions
from sysemu/kvm.h are also unused since the code that uses them is
only compiled for CONFIG_KVM (e.g. target/i386/kvm.c), so the unused
defines are also simply dropped here instead of being moved.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-3-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The q35 machine type currently lets the guest firmware select a 1MB, 2MB
or 8MB TSEG (basically, SMRAM) size. In edk2/OVMF, we use 8MB, but even
that is not enough when a lot of VCPUs (more than approx. 224) are
configured -- SMRAM footprint scales largely proportionally with VCPU
count.
Introduce a new property for "mch" called "extended-tseg-mbytes", which
expresses (in megabytes) the user's choice of TSEG (SMRAM) size.
Invent a new, QEMU-specific register in the config space of the DRAM
Controller, at offset 0x50, in order to allow guest firmware to query the
TSEG (SMRAM) size.
According to Intel Document Number 316966-002, Table 5-1 "DRAM Controller
Register Address Map (D0:F0)":
Warning: Address locations that are not listed are considered Intel
Reserved registers locations. Reads to Reserved registers may
return non-zero values. Writes to reserved locations may
cause system failures.
All registers that are defined in the PCI 2.3 specification,
but are not necessary or implemented in this component are
simply not included in this document. The
reserved/unimplemented space in the PCI configuration header
space is not documented as such in this summary.
Offsets 0x50 and 0x51 are not listed in Table 5-1. They are also not part
of the standard PCI config space header. And they precede the capability
list as well, which starts at 0xe0 for this device.
When the guest writes value 0xffff to this register, the value that can be
read back is that of "mch.extended-tseg-mbytes" -- unless it remains
0xffff. The guest is required to write 0xffff first (as opposed to a
read-only register) because PCI config space is generally not cleared on
QEMU reset, and after S3 resume or reboot, new guest firmware running on
old QEMU could read a guest OS-injected value from this register.
After reading the available "extended" TSEG size, the guest firmware may
actually request that TSEG size by writing pattern 11b to the ESMRAMC
register's TSEG_SZ bit-field. (The Intel spec referenced above defines
only patterns 00b (1MB), 01b (2MB) and 10b (8MB); 11b is reserved.)
On the QEMU command line, the value can be set with
-global mch.extended-tseg-mbytes=N
The default value for 2.10+ q35 machine types is 16. The value is limited
to 0xfff (4095) at the moment, purely so that the product (4095 MB) can be
stored to the uint32_t variable "tseg_size" in mch_update_smram(). Users
are responsible for choosing sensible TSEG sizes.
On 2.9 and earlier q35 machine types, the default value is 0. This lets
the 11b bit pattern in ESMRAMC.TSEG_SZ, and the register at offset 0x50,
keep their original behavior.
When "extended-tseg-mbytes" is nonzero, the new register at offset 0x50 is
set to that value on reset, for completeness.
PCI config space is migrated automatically, so no VMSD changes are
necessary.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1447027
Ref: https://lists.01.org/pipermail/edk2-devel/2017-May/010456.html
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since the automatic cpuid-level code was introduced in commit
c39c0edf9b ("target-i386: Automatically
set level/xlevel/xlevel2 when needed"), the CPU model tables just define
the default CPUID level code (set using "min-level"). Setting
"[x]level" forces CPUID level to a specific value and disable the
automatic-level logic.
But the PC compat code was not updated and the existing "[x]level"
compat properties broke compatibility for people using features that
triggered the auto-level code. To keep previous behavior, we should set
"min-[x]level" instead of "[x]level" on compat_props.
This was not a problem for most cases, because old machine-types don't
have full-cpuid-auto-level enabled. The only common use case it broke
was the CPUID[7] auto-level code, that was already enabled since the
first CPUID[7] feature was introduced (in QEMU 1.4.0).
This causes the regression reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=1454641
Change the PC compat code to use "min-[x]level" instead of "[x]level" on
compat_props, and add new test cases to ensure we don't break this
again.
Reported-by: "Guo, Zhiyi" <zhguo@redhat.com>
Fixes: c39c0edf9b ("target-i386: Automatically set level/xlevel/xlevel2 when needed")
Cc: qemu-stable@nongnu.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Hardware support for VT-d device passthrough. Although current Linux can
live with iommu=pt even without this, but this is faster than when using
software passthrough.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>