Commit Graph

68 Commits

Author SHA1 Message Date
Dorjoy Chowdhury
f1826463d2 machine/nitro-enclave: New machine type for AWS Nitro Enclaves
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating
isolated execution environments, called enclaves, from Amazon EC2
instances which are used for processing highly sensitive data. Enclaves
have no persistent storage and no external networking. The enclave VMs
are based on the Firecracker microvm with a vhost-vsock device for
communication with the parent EC2 instance that spawned it and a Nitro
Secure Module (NSM) device for cryptographic attestation. The parent
instance VM always has CID 3 while the enclave VM gets a dynamic CID.

An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave
virtual machine. This commit adds support for AWS nitro enclave emulation
using a new machine type option '-M nitro-enclave'. This new machine type
is based on the 'microvm' machine type, similar to how real nitro enclave
VMs are based on Firecracker microvm. For nitro-enclave to boot from an
EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel
and a temporary initrd file which are then hooked into the regular x86
boot mechanism along with the extracted cmdline. The EIF file path should
be provided using the '-kernel' QEMU option.

In QEMU, the vsock emulation for nitro enclave is added using vhost-user-
vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM
communication which is needed for nitro enclaves. So for the vsock
communication to CID 3 to work, another process that does the vsock
emulation in  userspace must be run, for example, vhost-device-vsock[4]
from rust-vmm, with necessary vsock communication support in another
guest VM with CID 3. Using vhost-user-vsock also enables the possibility
to implement some proxying support in the vhost-user-vsock daemon that
will forward all the packets to the host machine instead of CID 3 so
that users of nitro-enclave can run the necessary applications in their
host machine instead of running another whole VM with CID 3. The following
mandatory nitro-enclave machine option has been added related to the
vhost-user-vsock device.
  - 'vsock': The chardev id from the '-chardev' option for the
vhost-user-vsock device.

AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which
has been added using the virtio-nsm device added in a previous commit.
In Nitro Enclaves, all the PCRs start in a known zero state and the first
16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8
contain the SHA384 hashes related to the EIF file used to boot the VM
for validation. The following optional nitro-enclave machine options
have been added related to the NSM device.
  - 'id': Enclave identifier, reflected in the module-id of the NSM
device. If not provided, a default id will be set.
  - 'parent-role': Parent instance IAM role ARN, reflected in PCR3
of the NSM device.
  - 'parent-id': Parent instance identifier, reflected in PCR4 of the
NSM device.

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://aws.amazon.com/ec2/
[3] https://github.com/aws/aws-nitro-enclaves-image-format
[4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock

Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31 18:28:33 +01:00
Bernhard Beschow
37b724cdef hw/char/serial.h: Extract serial-isa.h
The includes where updated based on compile errors. Now, the inclusion of the
header roughly matches Kconfig dependencies:

  # grep -r -e "select SERIAL_ISA"
  hw/ppc/Kconfig:    select SERIAL_ISA
  hw/isa/Kconfig:    select SERIAL_ISA
  hw/sparc64/Kconfig:    select SERIAL_ISA
  hw/i386/Kconfig:    select SERIAL_ISA
  hw/i386/Kconfig:    select SERIAL_ISA # for serial_hds_isa_init()

Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Link: https://lore.kernel.org/r/20240905073832.16222-3-shentey@gmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03 19:33:23 +02:00
Juraj Marcin
1b063fe2df reset: Use ResetType for qemu_devices_reset() and MachineClass::reset()
Currently, both qemu_devices_reset() and MachineClass::reset() use
ShutdownCause for the reason of the reset. However, the Resettable
interface uses ResetState, so ShutdownCause needs to be translated to
ResetType somewhere. Translating it qemu_devices_reset() makes adding
new reset types harder, as they cannot always be matched to a single
ShutdownCause here, and devices may need to check the ResetType to
determine what to reset and if to reset at all.

This patch moves this translation up in the call stack to
qemu_system_reset() and updates all MachineClass children to use the
ResetType instead.

Message-ID: <20240904103722.946194-2-jmarcin@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Juraj Marcin <jmarcin@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-09-24 11:33:34 +02:00
David Woodhouse
93c76555d8 hw/i386/fw_cfg: Add etc/e820 to fw_cfg late
In e820_add_entry() the e820_table is reallocated with g_renew() to make
space for a new entry. However, fw_cfg_arch_create() just uses the
existing e820_table pointer. This leads to a use-after-free if anything
adds a new entry after fw_cfg is set up.

Shift the addition of the etc/e820 file to the machine done notifier, via
a new fw_cfg_add_e820() function.

Also make e820_table private and use an e820_get_table() accessor function
for it, which sets a flag that will trigger an assert() for any *later*
attempts to add to the table.

Make e820_add_entry() return void, as most callers don't check for error
anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <a2708734f004b224f33d3b4824e9a5a262431568.camel@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-03 18:14:06 -04:00
Bernhard Beschow
8483518401 hw/i386: Have x86_bios_rom_init() take X86MachineState rather than MachineState
The function creates and leaks two MemoryRegion objects regarding the BIOS which
will be moved into X86MachineState in the next steps to avoid the leakage.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240430150643.111976-3-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-05-08 19:43:23 +02:00
Bernhard Beschow
9b0c44334c hw/i386/x86: Let ioapic_init_gsi() take parent as pointer
Rather than taking a QOM name which has to be resolved, let's pass the parent
directly as pointer. This simplifies the code.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240224135851.100361-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-27 09:37:30 +01:00
Philippe Mathieu-Daudé
bec4be77ea hw/acpi: Realize ACPI_GED sysbus device before accessing it
sysbus_mmio_map() should not be called on unrealized device.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231018141151.87466-7-philmd@linaro.org>
2023-10-19 23:13:28 +02:00
Bernhard Beschow
c9c8ba69d5 hw/i386: Remove now redundant TYPE_ACPI_GED_X86
Now that TYPE_ACPI_GED_X86 doesn't assign AcpiDeviceIfClass::madt_cpu any more
it is the same as TYPE_ACPI_GED.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-6-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 18:15:05 -04:00
Philippe Mathieu-Daudé
a09ef8ff0a hw/i386: Rename 'hw/kvm/clock.h' -> 'hw/i386/kvm/clock.h'
kvmclock_create() is only implemented in hw/i386/kvm/clock.h.
Restrict the "hw/kvm/clock.h" header to i386 by moving it to
hw/i386/.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230620083228.88796-3-philmd@linaro.org>
2023-08-31 19:47:43 +02:00
Philippe Mathieu-Daudé
b797c98de4 hw/i386: Remove unuseful kvmclock_create() stub
We shouldn't call kvmclock_create() when KVM is not available
or disabled:
 - check for kvm_enabled() before calling it
 - assert KVM is enabled once called
Since the call is elided when KVM is not available, we can
remove the stub (it is never compiled).

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230620083228.88796-2-philmd@linaro.org>
2023-08-31 19:47:43 +02:00
Philippe Mathieu-Daudé
a5c80ab847 hw/i386/microvm: Simplify using object_dynamic_cast()
Use object_dynamic_cast() to determine if 'dev' is a TYPE_VIRTIO_MMIO.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-09 23:38:16 +03:00
Michael S. Tsirkin
167f487358 Revert "hw/i386: pass RNG seed via setup_data entry"
This reverts commit 67f7e426e5.

Additionally to the automatic revert, I went over the code
and dropped all mentions of legacy_no_rng_seed manually,
effectively reverting a combination of 2 additional commits:

    commit ffe2d2382e
    Author: Jason A. Donenfeld <Jason@zx2c4.com>
    Date:   Wed Sep 21 11:31:34 2022 +0200

        x86: re-enable rng seeding via SetupData

    commit 3824e25db1
    Author: Gerd Hoffmann <kraxel@redhat.com>
    Date:   Wed Aug 17 10:39:40 2022 +0200

        x86: disable rng seeding via setup_data

Fixes: 67f7e426e5 ("hw/i386: pass RNG seed via setup_data entry")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
b34f2fd17e Revert "x86: don't let decompressed kernel image clobber setup_data"
This reverts commit eac7a7791b.

Fixes: eac7a7791b ("x86: don't let decompressed kernel image clobber setup_data")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Philippe Mathieu-Daudé
2d4bd81e39 hw/rtc: Rename rtc_[get|set]_memory -> mc146818rtc_[get|set]_cmos_data
rtc_get_memory() and rtc_set_memory() helpers only work with
TYPE_MC146818_RTC devices. 'memory' in their name refer to
the CMOS region. Rename them as mc146818rtc_get_cmos_data()
and mc146818rtc_set_cmos_data() to be explicit about what
they are doing.

Mechanical change doing:

  $ sed -i -e 's/rtc_set_memory/mc146818rtc_set_cmos_data/g' \
        $(git grep -wl rtc_set_memory)
  $ sed -i -e 's/rtc_get_memory/mc146818rtc_get_cmos_data/g' \
        $(git grep -wl rtc_get_memory)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-4-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
55c86cb803 hw/rtc/mc146818rtc: Pass MC146818RtcState instead of ISADevice argument
rtc_get_memory() and rtc_set_memory() methods can not take any
TYPE_ISA_DEVICE object. They expect a TYPE_MC146818_RTC one.

Simplify the API by passing a MC146818RtcState.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-3-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
7067887ea1 hw/isa: Rename isa_bus_irqs() -> isa_bus_register_input_irqs()
isa_bus_irqs() register an array of input IRQs on
the ISA bus. Rename it as isa_bus_register_input_irqs().

Mechanical change using:

 $ sed -i -e 's/isa_bus_irqs/isa_bus_register_input_irqs/g' \
   $(git grep -wl isa_bus_irqs)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210163744.32182-10-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Jason A. Donenfeld
eac7a7791b x86: don't let decompressed kernel image clobber setup_data
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
2023-01-28 06:21:29 -05:00
Ani Sinha
4ad08e8a57 hw/i386/e820: remove legacy reserved entries for e820
e820 reserved entries were used before the dynamic entries with fw config files
were intoduced. Please see the following change:
7d67110f2d9a6("pc: add etc/e820 fw_cfg file")

Identical support was introduced into seabios as well with the following commit:
ce39bd4031820 ("Add support for etc/e820 fw_cfg file")

Both the above commits are now quite old. QEMU machines 1.7 and newer no longer
use the reserved entries. Seabios uses fw config files and
dynamic e820 entries by default and only falls back to using reserved entries
when it has to work with old qemu (versions earlier than 1.7). Please see
functions qemu_cfg_e820() and qemu_early_e820(). It is safe to remove legacy
FW_CFG_E820_TABLE and associated code now as QEMU 7.0 has deprecated i440fx
machines 1.7 and older. It would be incredibly rare to run the latest qemu
version with a very old version of seabios that did not support fw config files
for e820.

As far as I could see, edk2/ovfm never supported reserved entries and uses fw
config files from the beginning. So there should be no incompatibilities with
ovfm as well.

CC: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220831045311.33083-1-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-11-02 06:56:31 -04:00
Jason A. Donenfeld
7966d70f6f reset: allow registering handlers that aren't called by snapshot loading
Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when reseting for snapshots.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
 checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27 11:34:31 +01:00
Maciej S. Szmigiero
ec19444a53 hyperv: fix SynIC SINT assertion failure on guest reset
Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
assertion failure:
hw/hyperv/hyperv.c:131: synic_reset: Assertion `QLIST_EMPTY(&synic->sint_routes)' failed.

This happens both on normal guest reboot or when using "system_reset" HMP
command.

The failing assertion was introduced by commit 64ddecc88b ("hyperv: SControl is optional to enable SynIc")
to catch dangling SINT routes on SynIC reset.

The root cause of this problem is that the SynIC itself is reset before
devices using SINT routes have chance to clean up these routes.

Since there seems to be no existing mechanism to force reset callbacks (or
methods) to be executed in specific order let's use a similar method that
is already used to reset another interrupt controller (APIC) after devices
have been reset - by invoking the SynIC reset from the machine reset
handler via a new x86_cpu_after_reset() function co-located with
the existing x86_cpu_reset() in target/i386/cpu.c.
Opportunistically move the APIC reset handler there, too.

Fixes: 64ddecc88b ("hyperv: SControl is optional to enable SynIc") # exposed the bug
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cb57cee2e29b20d06f81dce054cbcea8b5d497e8.1664552976.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Jason A. Donenfeld
ffe2d2382e x86: re-enable rng seeding via SetupData
This reverts 3824e25db1 ("x86: disable rng seeding via setup_data"), but
for 7.2 rather than 7.1, now that modifying setup_data is safe to do.

Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-4-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Gerd Hoffmann
3824e25db1 x86: disable rng seeding via setup_data
Causes regressions when doing direct kernel boots with OVMF.

At this point in the release cycle the only sensible action
is to just disable this for 7.1 and sort it properly in the
7.2 devel cycle.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220817083940.3174933-1-kraxel@redhat.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-08-17 07:07:37 -04:00
Jason A. Donenfeld
67f7e426e5 hw/i386: pass RNG seed via setup_data entry
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

At Paolo's request, we don't pass these to versioned machine types ≤7.0.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220721125636.446842-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-22 19:26:34 +02:00
Gerd Hoffmann
3ef1497b46 microvm: turn off io reservations for pcie root ports
The pcie host bridge has no io window on microvm,
so io reservations will not work.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220701091516.43489-1-kraxel@redhat.com>
2022-07-19 14:35:06 +02:00
Xiaoyao Li
c300bbe8d2 hw/i386: Make pic a property of common x86 base machine type
Legacy PIC (8259) cannot be supported for TDX guests since TDX module
doesn't allow directly interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Make PIC the property of common x86 machine type. Hence all x86
machines, including microvm, can disable it.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-3-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
Xiaoyao Li
9dee7e5109 hw/i386: Make pit a property of common x86 base machine type
Both pc and microvm have pit property individually. Let's just make it
the property of common x86 base machine type.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-2-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
Richard Henderson
b1fd92137e * Build system fixes and cleanups
* DMA support in the multiboot option ROM
 * Rename default-bus-bypass-iommu
 * Deprecate -watchdog and cleanup -watchdog-action
 * HVF fix for <PAGE_SIZE regions
 * Support TSC scaling for AMD nested virtualization
 * Fix for ESP fuzzing bug
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBUeEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOh+Qf+OMRhRiv6dYjbK/5zXrx81AgxYAY3
 dBUSr8v16LyrMl1U3DZWzhD+MzQsC83m/Xsh4lGxlHDWtkK9QQA5xDG95JZdY26i
 MGCbbjnFHISbyBQV9Y724gPfPjOOODuoFbzafSx6VLITOcyv1ye0cm7TOjOPB+tt
 E4c3JqTZ7g8a5yMe8ItkVhz5pPY+oVw8dxMNRp6Sup5Dbfx0DjacIwLasLsHfPL7
 qBADfqB20ovHUzLjXu7oWgEd4KxJ6kiSCaJJu/KD36hg0wB8+WVP1o43j4PkczHT
 QjU7eZaeaTrN5Cf34ttPge6QReMi5SFNCaA9O9/HLqrQgdEtt/diZWuqjQ==
 =a2mC
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Build system fixes and cleanups
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug

# gpg: Signature made Tue 02 Nov 2021 10:57:37 AM EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]

* remotes/bonzini/tags/for-upstream: (27 commits)
  configure: fix --audio-drv-list help message
  configure: Remove the check for the __thread keyword
  Move the l2tpv3 test from configure to meson.build
  meson: remove unnecessary coreaudio test program
  meson: remove pointless warnings
  meson.build: Allow to disable OSS again
  meson: bump submodule to 0.59.3
  qtest/am53c974-test: add test for cancelling in-flight requests
  esp: ensure in-flight SCSI requests are always cancelled
  KVM: SVM: add migration support for nested TSC scaling
  hw/i386: fix vmmouse registration
  watchdog: remove select_watchdog_action
  vl: deprecate -watchdog
  watchdog: add information from -watchdog help to -device help
  hw/i386: Rename default_bus_bypass_iommu
  hvf: Avoid mapping regions < PAGE_SIZE as ram
  configure: do not duplicate CPU_CFLAGS into QEMU_LDFLAGS
  configure: remove useless NPTL probe
  target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
  optionrom: add a DMA-enabled multiboot ROM
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-03 13:07:30 -04:00
Gerd Hoffmann
f5918a9928 microvm: add device tree support.
Allows edk2 detect virtio-mmio devices and pcie ecam.
See comment in hw/i386/microvm-dt.c for more details.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20211014193617.2475578-1-kraxel@redhat.com>
2021-11-02 17:24:17 +01:00
Paolo Bonzini
f014c97459 target/i386: move linuxboot_dma_enabled to X86MachineState
This removes a parameter from x86_load_linux, and will avoid code
duplication between the linux and multiboot cases once multiboot
starts to support DMA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Markus Armbruster
0d9a665451 microvm: Drop dead error handling in microvm_machine_state_init()
Stillborn in commit 0ebf007dda "hw/i386: Introduce the microvm machine
type".

Cc: Sergio Lopez <slp@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210720125408.387910-12-armbru@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-26 17:15:28 +02:00
Thomas Huth
2068cabd3f Do not include cpu.h if it's not really necessary
Stop including cpu.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:51 +02:00
Marian Postevca
d07b22863b acpi: Move setters/getters of oem fields to X86MachineState
The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.

Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210221001737.24499-2-posteuca@mutex.one>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22 18:58:19 -04:00
Michael S. Tsirkin
43e229a52b acpi: use constants as strncpy limit
gcc is not smart enough to figure out length was validated before use as
strncpy limit, resulting in this warning:

inlined from ‘virt_set_oem_table_id’ at ../../hw/arm/virt.c:2197:5:
/usr/include/aarch64-linux-gnu/bits/string_fortified.h:106:10: error:
‘__builtin_strncpy’ specified bound depends on the length of the
source argument [-Werror=stringop-overflow=]

Simplify things by using a constant limit instead.

Fixes: 97fc5d507fca ("acpi: Permit OEM ID and OEM table ID fields to be changed")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:59 -05:00
Marian Postevca
602b458201 acpi: Permit OEM ID and OEM table ID fields to be changed
Qemu's ACPI table generation sets the fields OEM ID and OEM table ID
to "BOCHS " and "BXPCxxxx" where "xxxx" is replaced by the ACPI
table name.

Some games like Red Dead Redemption 2 seem to check the ACPI OEM ID
and OEM table ID for the strings "BOCHS" and "BXPC" and if they are
found, the game crashes(this may be an intentional detection
mechanism to prevent playing the game in a virtualized environment).

This patch allows you to override these default values.

The feature can be used in this manner:
qemu -machine oem-id=ABCDEF,oem-table-id=GHIJKLMN

The oem-id string can be up to 6 bytes in size, and the
oem-table-id string can be up to 8 bytes in size. If the string are
smaller than their respective sizes they will be padded with space.
If either of these parameters is not set, the current default values
will be used for the one missing.

Note that the the OEM Table ID field will not be extended with the
name of the table, but will use either the default name or the user
provided one.

This does not affect the -acpitable option (for user-defined ACPI
tables), which has precedence over -machine option.

Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210119003216.17637-3-posteuca@mutex.one>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-02-05 08:52:59 -05:00
Claudio Fontana
a9dc68d9b2 i386: move kvm accel files into kvm/
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201212155530.23098-2-cfontana@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-16 14:06:52 -05:00
Paolo Bonzini
7d435078af i386: remove bios_name
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20201026143028.3034018-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:05 -05:00
Gerd Hoffmann
4d01b8994c microvm: add second ioapic
Create second ioapic, route virtio-mmio IRQs to it,
allow more virtio-mmio devices (24 instead of 8).

Needs ACPI, enabled by default, can be turned off
using -machine ioapic2=off

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20201203105423.10431-8-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
e57e9ae799 microvm: drop microvm_gsi_handler()
With the improved gsi_handler() we don't need
our private version any more.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20201203105423.10431-7-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
3d09c00704 microvm: make pcie irq base runtime changeable
Allows to move them in case we have enough
irq lines available.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20201203105423.10431-6-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
c214a7bcb6 microvm: make number of virtio transports runtime changeable
This will allow to increase the number of transports in
case we have enough irq lines available for them all.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20201203105423.10431-5-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
d4a42e8581 microvm: add usb support
Wire up "usb=on" machine option, when enabled add
a sysbus xhci controller with 8 ports.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20201020074844.5304-6-kraxel@redhat.com
2020-10-21 11:36:19 +02:00
Gerd Hoffmann
64b070dad3 microvm: set pci_irq_mask
Makes sure the PCI interrupt overrides are added to the
APIC table in case PCIe is enabled.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20201016113835.17465-5-kraxel@redhat.com
2020-10-21 11:36:05 +02:00
Eduardo Habkost
eafa08683f i386/kvm: Delete kvm_allows_irq0_override()
As IRQ routing is always available on x86,
kvm_allows_irq0_override() will always return true, so we don't
need the function anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200922201922.2153598-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-14 15:28:54 -04:00
Peter Maydell
b23317eec4 microvm: add pcie support.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCgAGBQJfdMT5AAoJEEy22O7T6HE4QHkQAKBLDfVAoogJTQgKcgKKVAfb
 vxH+c0zIX4bXlh+/+aAShXf/1To1BkZtbIxYJX2hx9oec3zO+DK+p1YrAK8O0Lcz
 hleEyVpYhhX90y0HDzFlF9q05O90vYP+hzj8VW+IgkOJ7nWG+KdkiRBkxlwvn0PJ
 Zw4qw9fjZ/MW0Ml2UVQv2lfAaTc8XiasZo1ZEfZ8rK/a0ut+0wLefzWzqm//bJD+
 Ek2x9Om3okg2emeuBkeSWLlZ40fMGfEXn4UQkE7ZCLN6Q/LqSdEIn00MSjJa8C4T
 Z3CVNeHRlgG9C80tbM6rs+2YbWhBj0RPa7woNGZmVJaLIsBrMSC5s9ifvvnamtnE
 wzBm9Qayv67BcQHZOgEgxrSrNc7/tibwvcpGfiT9ONz/PVbMO7eTlRGFnwNGh2Fv
 0caPb8Ge9PLyfc7BXLday/0RM91lu3zTOlnfm6U/KFWPucF+zMFN5KCAGyqComxk
 g+1VxPPpXtCcIFwGYZ1yesKTW6VHFUEb6v5+gkU1UUJhSoz6141AR72DNFm2NA0j
 gk9GJ5ZZzMlFQV6YcrGkpFo0q0DKqSMy3dU1HjT7zMbh09hhJqdT1dyIBEfxJpgu
 LvDI318bvBjwqkdnlRxwQ01GZ3HGGkga0UHjz1LbeYlR59UC2wJWtCoMRYt9Oms4
 d+b7Fmbec2tU18uVtSOP
 =BHn7
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kraxel/tags/microvm-20200930-pull-request' into staging

microvm: add pcie support.

# gpg: Signature made Wed 30 Sep 2020 18:48:41 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/microvm-20200930-pull-request:
  tests/acpi: update expected data files
  acpi/gpex: no reason to use a method for _CRS
  tests/acpi: add microvm pcie test
  tests/acpi: factor out common microvm test setup
  tests/acpi: add empty tests/data/acpi/microvm/DSDT.pcie file
  tests/acpi: allow updates for expected data files
  microvm/pcie: add 64bit mmio window
  microvm: add pcie support
  microvm: add irq table
  arm: use acpi_dsdt_add_gpex
  acpi: add acpi_dsdt_add_gpex
  move MemMapEntry

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-01 15:28:55 +01:00
Vitaly Kuznetsov
8700a98443 target/i386: always create kvmclock device
QEMU's kvmclock device is only created when KVM PV feature bits for
kvmclock (KVM_FEATURE_CLOCKSOURCE/KVM_FEATURE_CLOCKSOURCE2) are
exposed to the guest. With 'kvm=off' cpu flag the device is not
created and we don't call KVM_GET_CLOCK/KVM_SET_CLOCK upon migration.
It was reported that without these call at least Hyper-V TSC page
clocksouce (which can be enabled independently) gets broken after
migration.

Switch to creating kvmclock QEMU device unconditionally, it seems
to always make sense to call KVM_GET_CLOCK/KVM_SET_CLOCK on migration.
Use KVM_CAP_ADJUST_CLOCK check instead of CPUID feature bits.

Reported-by: Antoine Damhet <antoine.damhet@blade-group.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200922151934.899555-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30 19:11:36 +02:00
Gerd Hoffmann
8c2d9f9a38 microvm/pcie: add 64bit mmio window
Place the 64bit window at the top of the physical address space, assign
25% of the avaiable address space.  Force cpu.host-phys-bits=on for
microvm machine typs so this actually works reliable.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20200928104256.9241-7-kraxel@redhat.com
2020-09-30 11:29:56 +02:00
Gerd Hoffmann
24db877ab6 microvm: add pcie support
Uses the existing gpex device which is also used as pcie host bridge on
arm/aarch64.  For now only a 32bit mmio window and no ioport support.

It is disabled by default, use "-machine microvm,pcie=on" to enable.
ACPI support must be enabled too because the bus is declared in the
DSDT table.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20200928104256.9241-6-kraxel@redhat.com
2020-09-30 11:29:56 +02:00
Gerd Hoffmann
63bcfe7be0 microvm: enable ramfb
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20200915120909.20838-22-kraxel@redhat.com
2020-09-17 14:16:19 +02:00
Gerd Hoffmann
e3ab9873d2 microvm: wire up hotplug
The cpu hotplug code handles the initialization of coldplugged cpus
too, so it is needed even in case cpu hotplug is not supported.

Wire cpu hotplug up for microvm.
Without this we get a broken MADT table.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200915120909.20838-17-kraxel@redhat.com
2020-09-17 14:16:19 +02:00
Gerd Hoffmann
50aef13181 x86: move acpi_dev from pc/microvm
Both pc and microvm machine types have a acpi_dev field.
Move it to the common base type.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20200915120909.20838-15-kraxel@redhat.com
2020-09-17 14:16:19 +02:00