Commit Graph

79 Commits

Author SHA1 Message Date
Michael S. Tsirkin
fdc27ced04 Revert "x86: reinitialize RNG seed on system reboot"
This reverts commit 763a2828bf.

Fixes: 763a2828bf ("x86: reinitialize RNG seed on system reboot")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
b4bfa0a31d Revert "x86: re-initialize RNG seed when selecting kernel"
This reverts commit cc63374a5a.

Fixes: cc63374a5a ("x86: re-initialize RNG seed when selecting kernel")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
ef82d893de Revert "x86: do not re-randomize RNG seed on snapshot load"
This reverts commit 14b29fea74.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 14b29fea74 ("x86: do not re-randomize RNG seed on snapshot load")
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Michael S. Tsirkin
b34f2fd17e Revert "x86: don't let decompressed kernel image clobber setup_data"
This reverts commit eac7a7791b.

Fixes: eac7a7791b ("x86: don't let decompressed kernel image clobber setup_data")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02 03:10:46 -05:00
Philippe Mathieu-Daudé
2d4bd81e39 hw/rtc: Rename rtc_[get|set]_memory -> mc146818rtc_[get|set]_cmos_data
rtc_get_memory() and rtc_set_memory() helpers only work with
TYPE_MC146818_RTC devices. 'memory' in their name refer to
the CMOS region. Rename them as mc146818rtc_get_cmos_data()
and mc146818rtc_set_cmos_data() to be explicit about what
they are doing.

Mechanical change doing:

  $ sed -i -e 's/rtc_set_memory/mc146818rtc_set_cmos_data/g' \
        $(git grep -wl rtc_set_memory)
  $ sed -i -e 's/rtc_get_memory/mc146818rtc_get_cmos_data/g' \
        $(git grep -wl rtc_get_memory)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-4-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
55c86cb803 hw/rtc/mc146818rtc: Pass MC146818RtcState instead of ISADevice argument
rtc_get_memory() and rtc_set_memory() methods can not take any
TYPE_ISA_DEVICE object. They expect a TYPE_MC146818_RTC one.

Simplify the API by passing a MC146818RtcState.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230210233116.80311-3-philmd@linaro.org>
2023-02-27 22:29:02 +01:00
Philippe Mathieu-Daudé
892afa04e6 hw/i386/x86: Reduce init_topo_info() scope
This function is not used anywhere outside this file, so
we can delete the prototype from include/hw/i386/x86.h and
make the function "static void".

This fixes when building with -Wall and using Clang
("Apple clang version 14.0.0 (clang-1400.0.29.202)"):

  ../hw/i386/x86.c:70:24: error: static function 'MACHINE' is used in an inline function with external linkage [-Werror,-Wstatic-in-inline]
      MachineState *ms = MACHINE(x86ms);
                         ^
  include/hw/i386/x86.h:101:1: note: use 'static' to give inline function 'init_topo_info' internal linkage
  void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
  ^
  static
  include/hw/boards.h:24:49: note: 'MACHINE' declared here
  OBJECT_DECLARE_TYPE(MachineState, MachineClass, MACHINE)
                                                  ^

Reported-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221216220158.6317-6-philmd@linaro.org>
2023-02-27 22:29:01 +01:00
Markus Armbruster
6f1e91f716 error: Drop superfluous #include "qapi/qmp/qerror.h"
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
2023-02-23 13:56:14 +01:00
Jason A. Donenfeld
eac7a7791b x86: don't let decompressed kernel image clobber setup_data
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
2023-01-28 06:21:29 -05:00
Zeng Guang
19e2a9fb9d target/i386: Set maximum APIC ID to KVM prior to vCPU creation
Specify maximum possible APIC ID assigned for current VM session to KVM
prior to the creation of vCPUs. By this setting, KVM can set up VM-scoped
data structure indexed by the APIC ID, e.g. Posted-Interrupt Descriptor
pointer table to support Intel IPI virtualization, with the most optimal
memory footprint.

It can be achieved by calling KVM_ENABLE_CAP for KVM_CAP_MAX_VCPU_ID
capability once KVM has enabled it. Ignoring the return error if KVM
doesn't support this capability yet.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220825025246.26618-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-31 09:46:34 +01:00
Jason A. Donenfeld
14b29fea74 x86: do not re-randomize RNG seed on snapshot load
Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-4-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27 11:34:31 +01:00
Jason A. Donenfeld
cc63374a5a x86: re-initialize RNG seed when selecting kernel
We don't want it to be possible to re-read the RNG seed after ingesting
it, because this ruins forward secrecy. Currently, however, the setup
data section can just be re-read. Since the kernel is always read after
the setup data, use the selection of the kernel as a trigger to
re-initialize the RNG seed, just like we do on reboot, to preserve
forward secrecy.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220922152847.3670513-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-01 21:16:36 +02:00
Jason A. Donenfeld
763a2828bf x86: reinitialize RNG seed on system reboot
Since this is read from fw_cfg on each boot, the kernel zeroing it out
alone is insufficient to prevent it from being used twice. And indeed on
reboot we always want a new seed, not the old one. So re-fill it in this
circumstance.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-3-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Jason A. Donenfeld
eebb38a563 x86: use typedef for SetupData struct
The preferred style is SetupData as a typedef, not setup_data as a plain
struct.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-2-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Jason A. Donenfeld
e935b73508 x86: return modified setup_data only if read as memory, not as file
If setup_data is being read into a specific memory location, then
generally the setup_data address parameter is read first, so that the
caller knows where to read it into. In that case, we should return
setup_data containing the absolute addresses that are hard coded and
determined a priori. This is the case when kernels are loaded by BIOS,
for example. In contrast, when setup_data is read as a file, then we
shouldn't modify setup_data, since the absolute address will be wrong by
definition. This is the case when OVMF loads the image.

This allows setup_data to be used like normal, without crashing when EFI
tries to use it.

(As a small development note, strangely, fw_cfg_add_file_callback() was
exported but fw_cfg_add_bytes_callback() wasn't, so this makes that
consistent.)

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-27 11:30:59 +02:00
Joao Martins
4ab4c33014 hw/i386: add 4g boundary start to X86MachineState
Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState field @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to be
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-2-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-07-26 10:40:58 -04:00
Jason A. Donenfeld
67f7e426e5 hw/i386: pass RNG seed via setup_data entry
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

At Paolo's request, we don't pass these to versioned machine types ≤7.0.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220721125636.446842-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-22 19:26:34 +02:00
Jamie Iles
af9751316e hw/core/loader: return image sizes as ssize_t
Various loader functions return an int which limits images to 2GB which
is fine for things like a BIOS/kernel image, but if we want to be able
to load memory images or large ramdisks then any file over 2GB would
silently fail to load.

Cc: Luc Michel <lmichel@kalray.eu>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Luc Michel <lmichel@kalray.eu>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211111141141.3295094-2-jamie@nuviainc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2022-06-10 09:31:42 +10:00
Xiaoyao Li
c300bbe8d2 hw/i386: Make pic a property of common x86 base machine type
Legacy PIC (8259) cannot be supported for TDX guests since TDX module
doesn't allow directly interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Make PIC the property of common x86 machine type. Hence all x86
machines, including microvm, can disable it.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-3-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
Xiaoyao Li
9dee7e5109 hw/i386: Make pit a property of common x86 base machine type
Both pc and microvm have pit property individually. Let's just make it
the property of common x86 base machine type.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-2-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 16:15:40 -04:00
David Woodhouse
dc89f32d92 target/i386: Fix sanity check on max APIC ID / X2APIC enablement
The check on x86ms->apic_id_limit in pc_machine_done() had two problems.

Firstly, we need KVM to support the X2APIC API in order to allow IRQ
delivery to APICs >= 255. So we need to call/check kvm_enable_x2apic(),
which was done elsewhere in *some* cases but not all.

Secondly, microvm needs the same check. So move it from pc_machine_done()
to x86_cpus_init() where it will work for both.

The check in kvm_cpu_instance_init() is now redundant and can be dropped.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220314142544.150555-1-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-16 04:38:39 -04:00
Gerd Hoffmann
a8152c4e46 i386: firmware parsing and sev setup for -bios loaded firmware
Don't register firmware as rom, not needed (see comment).
Add x86_firmware_configure() call for proper sev initialization.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-4-kraxel@redhat.com>
2022-04-27 07:51:01 +02:00
Gerd Hoffmann
2aa6a39bc8 i386: move bios load error message
Switch to usual goto-end-of-function error handling style.
No functional change.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-2-kraxel@redhat.com>
2022-04-27 07:51:01 +02:00
Marc-André Lureau
0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Igor Mammedov
e6895f04c8 x86: cleanup unused compat_apic_id_mode
commit
  f862ddbb1a (hw/i386: Remove the deprecated pc-1.x machine types)
removed the last user of broken APIC ID compat knob,
but compat_apic_id_mode itself was forgotten.
Clean it up and simplify x86_cpu_apic_id_from_index()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220228131634.3389805-1-imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 05:08:23 -05:00
Paolo Bonzini
3ca8ce720f target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
As long as fw_cfg supports DMA, the new ROM can be used also on older
machine types because it has the same size as the existing one.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Paolo Bonzini
f014c97459 target/i386: move linuxboot_dma_enabled to X86MachineState
This removes a parameter from x86_load_linux, and will avoid code
duplication between the linux and multiboot cases once multiboot
starts to support DMA.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02 15:57:27 +01:00
Philippe Mathieu-Daudé
93777de365 target/i386/sev: Rename sev_i386.h -> sev.h
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.

Patch created mechanically using:

  $ git mv target/i386/sev_i386.h target/i386/sev.h
  $ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-13 10:47:49 +02:00
Dov Murik
c0c2d319d6 x86/sev: generate SEV kernel loader hashes in x86_load_linux
If SEV is enabled and a kernel is passed via -kernel, pass the hashes of
kernel/initrd/cmdline in an encrypted guest page to OVMF for SEV
measured boot.

Co-developed-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210930054915.13252-3-dovmurik@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-05 12:47:24 +02:00
Sean Christopherson
dfce81f1b9 vl: Add sgx compound properties to expose SGX EPC sections to guest
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be
realized prior to realizing the vCPUs themselves, i.e. long before
generic devices are parsed and realized.  From a virtualization
perspective, the CPUID aspect also means that EPC sections cannot be
hotplugged without paravirtualizing the guest kernel (hardware does
not support hotplugging as EPC sections must be locked down during
pre-boot to provide EPC's security properties).

So even though EPC sections could be realized through the generic
-devices command, they need to be created much earlier for them to
actually be usable by the guest.  Place all EPC sections in a
contiguous block, somewhat arbitrarily starting after RAM above 4g.
Ensuring EPC is in a contiguous region simplifies calculations, e.g.
device memory base, PCI hole, etc..., allows dynamic calculation of the
total EPC size, e.g. exposing EPC to guests does not require -maxmem,
and last but not least allows all of EPC to be enumerated in a single
ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8.

The new compound properties command for sgx like below:
 ......
 -object memory-backend-epc,id=mem1,size=28M,prealloc=on \
 -object memory-backend-epc,id=mem2,size=10M \
 -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 14:50:20 +02:00
Paolo Bonzini
67872eb8ed machine: move dies from X86MachineState to CpuTopology
In order to make SMP configuration a Machine property, we need a getter as
well as a setter.  To simplify the implementation put everything that the
getter needs in the CpuTopology struct.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-7-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 16:13:48 +02:00
Chenyi Qiang
035d1ef265 i386: Add ratelimit for bus locks acquired in guest
A bus lock is acquired through either split locked access to writeback
(WB) memory or any locked access to non-WB memory. It is typically >1000
cycles slower than an atomic operation within a cache and can also
disrupts performance on other cores.

Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack coming from the
VMs, bus lock VM exit is introduced in KVM and it can report the bus
locks detected in guest. If enabled in KVM, it would exit to the
userspace to let the user enforce throttling policies once bus locks
acquired in VMs.

The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.

This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.

Introduce a new field "bus_lock_ratelimit" to record the limited speed
of bus locks in the target VM. The user can specify it through the
"bus-lock-ratelimit" as a machine property. In current implementation,
the default value of the speed is 0 per second, which means no
restrictions on the bus locks.

As for ratelimit on detected bus locks, simply set the ratelimit
interval to 1s and restrict the quota of bus lock occurence to the value
of "bus_lock_ratelimit". A potential alternative is to introduce the
time slice as a property which can help the user achieve more precise
control.

The detail of bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-17 14:11:06 -04:00
Marian Postevca
d07b22863b acpi: Move setters/getters of oem fields to X86MachineState
The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.

Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210221001737.24499-2-posteuca@mutex.one>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22 18:58:19 -04:00
David Edmondson
e20e182ea0 x86/pvh: extract only 4 bytes of start address for 32 bit kernels
When loading the PVH start address from a 32 bit ELF note, extract
only the appropriate number of bytes.

Fixes: ab969087da ("pvh: Boot uncompressed kernel using direct boot ABI")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210302090315.3031492-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 11:41:54 +01:00
Claudio Fontana
a9dc68d9b2 i386: move kvm accel files into kvm/
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201212155530.23098-2-cfontana@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-16 14:06:52 -05:00
Paolo Bonzini
2c65db5e58 vl: extract softmmu/datadir.c
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:18 -05:00
Paolo Bonzini
7d435078af i386: remove bios_name
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20201026143028.3034018-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:05 -05:00
Sunil Muthuswamy
faf20793b5 WHPX: support for the kernel-irqchip on/off
This patch adds support the kernel-irqchip option for
WHPX with on or off value. 'split' value is not supported
for the option. The option only works for the latest version
of Windows (ones that are coming out on Insiders). The
change maintains backward compatibility on older version of
Windows where this option is not supported.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Message-Id: <SN4PR2101MB0880B13258DA9251F8459F4DC0170@SN4PR2101MB0880.namprd21.prod.outlook.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:01 -05:00
Gerd Hoffmann
94c5a60637 x86: add support for second ioapic
Add ioapic_init_secondary to initialize it, wire up
in gsi handling and acpi apic table creation.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20201203105423.10431-4-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
ceea95cd88 x86: rewrite gsi_handler()
Rewrite function to use switch() for IRQ number mapping.
Check i8259_irq exists before raising it so the function
also works in case no i8259 (aka pic) is present.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20201203105423.10431-3-kraxel@redhat.com
2020-12-10 08:47:44 +01:00
Gerd Hoffmann
1b2802c49f x86: make pci irqs runtime configurable
Add a variable to x86 machine state instead of
hard-coding the PCI interrupts.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20201016113835.17465-4-kraxel@redhat.com
2020-10-21 11:36:05 +02:00
Claudio Fontana
430065dab0 cpus: prepare new CpusAccel cpu accelerator interface
The new interface starts unused, will start being used by the
next patches.

It provides methods for each accelerator to start a vcpu, kick a vcpu,
synchronize state, get cpu virtual clock and elapsed ticks.

In qemu_wait_io_event, make it clear that APC is used only for HAX
on Windows.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-05 16:41:22 +02:00
Claudio Fontana
740b175973 cpu-timers, icount: new modules
refactoring of cpus.c continues with cpu timer state extraction.

cpu-timers: responsible for the softmmu cpu timers state,
            including cpu clocks and ticks.

icount: counts the TCG instructions executed. As such it is specific to
the TCG accelerator. Therefore, it is built only under CONFIG_TCG.

One complication is due to qtest, which uses an icount field to warp time
as part of qtest (qtest_clock_warp).

In order to solve this problem, provide a separate counter for qtest.

This requires fixing assumptions scattered in the code that
qtest_enabled() implies icount_enabled(), checking each specific case.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[remove redundant initialization with qemu_spice_init]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[fix lingering calls to icount_get]
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-05 16:41:22 +02:00
Igor Mammedov
c5be7517d6 x86: cpuhp: prevent guest crash on CPU hotplug when broadcast SMI is in use
There were reports of guest crash on CPU hotplug, when using q35 machine
type and OVMF with SMM, due to hotplugged CPU trying to process SMI at
default SMI handler location without it being relocated by firmware first.

Fix it by refusing hotplug if firmware hasn't negotiated CPU hotplug with
SMI support while SMI broadcast is in use.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200923094650.1301166-3-imammedo@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-09-29 02:15:24 -04:00
Gerd Hoffmann
0cca1a918b x86: move cpu hotplug from pc to x86
The cpu hotplug code handles the initialization of coldplugged cpus
too, so it is needed even in case cpu hotplug is not supported.

Move the code from pc to x86, so microvm can use it.

Move both plug and unplug to keep everything in one place, even
though microvm needs plug only.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20200915120909.20838-16-kraxel@redhat.com
2020-09-17 14:16:19 +02:00
Gerd Hoffmann
9927a6329a x86: constify x86_machine_is_*_enabled
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200915120909.20838-14-kraxel@redhat.com
2020-09-17 14:16:19 +02:00
Babu Moger
0a48666a31 Revert "hw/i386: Update structures to save the number of nodes per package"
This reverts commit c24a41bb53.

Remove the EPYC specific apicid decoding and use the generic
default decoding.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <159889937478.21294.4192291354416942986.stgit@naples-babu.amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-02 07:29:26 -04:00
Babu Moger
0a714bff6c Revert "hw/i386: Introduce apicid functions inside X86MachineState"
This reverts commit 6121c7fbfd.

Remove the EPYC specific apicid decoding and use the generic
default decoding.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <159889935648.21294.8095493980805969544.stgit@naples-babu.amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-02 07:29:25 -04:00
Babu Moger
dfe7ed0a89 Revert "hw/i386: Move arch_id decode inside x86_cpus_init"
This reverts commit 2e26f4ab3b.

Remove the EPYC specific apicid decoding and use the generic
default decoding.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <159889934379.21294.15323080164340490855.stgit@naples-babu.amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-02 07:29:25 -04:00
Paolo Bonzini
2becc36a3e meson: infrastructure for building emulators
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:30:17 -04:00