When unplugging a device, at one point the device will be destroyed
via object_unparent(). This will, one the one hand, unrealize the
removed device hierarchy, and on the other hand, destroy/free the
device hierarchy.
When chaining hotplug handlers, we want to overwrite a bus hotplug
handler by the machine hotplug handler, to be able to perform
some part of the plug/unplug and to forward the calls to the bus hotplug
handler.
For now, the bus hotplug handler would trigger an object_unparent(), not
allowing us to perform some unplug action on a device after we forwarded
the call to the bus hotplug handler. The device would be gone at that
point.
machine_unplug_handler(dev)
/* eventually do unplug stuff */
bus_unplug_handler(dev)
/* dev is gone, we can't do more unplug stuff */
So move the object_unparent() to the original caller of the unplug. For
now, keep the unrealize() at the original places of the
object_unparent(). For implicitly chained hotplug handlers (e.g. pc
code calling acpi hotplug handlers), the object_unparent() has to be
done by the outermost caller. So when calling hotplug_handler_unplug()
from inside an unplug handler, nothing is to be done.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> calls unrealize(dev)
/* we can do more unplug stuff but device already unrealized */
}
object_unparent(dev)
In the long run, every unplug action should be factored out of the
unrealize() function into the unplug handler (especially for PCI). Then
we can get rid of the additonal unrealize() calls and object_unparent()
will properly unrealize the device hierarchy after the device has been
unplugged.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> only unplugs, does not unrealize
/* we can do more unplug stuff */
}
object_unparent(dev) -> will unrealize
The original approach was suggested by Igor Mammedov for the PCI
part, but I extended it to all hotplug handlers. I consider this one
step into the right direction.
To summarize:
- object_unparent() on synchronous unplugs is done by common code
-- "Caller of hotplug_handler_unplug"
- object_unparent() on asynchronous unplugs ("unplug requests") has to
be done manually
-- "Caller of hotplug_handler_unplug"
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190228122849.4296-2-david@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Function pc_acpi_init() is not used anymore.
Remove the definition and declaration.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190214084939.20640-2-richardw.yang@linux.intel.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Let's avoid manually looking up the hotplug handler class. Use the
existing wrappers instead.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181212095707.19358-1-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since linux commit: cf8fa920cb42 ("i386: handle an initrd in highmem (version 2)")
linux has supported initrd up to 4 GB, but the header field
ramdisk_max is still set to 2 GB to avoid "possible bootloader bugs".
When use '-kernel vmlinux -initrd initrd.cgz' to launch a VM,
the firmware(it could be linuxboot_dma.bin) helps to read initrd
contents into guest memory(below ramdisk_max) and jump to kernel.
that's similar with what bootloader does, like grub.
In addition, initrd_max is uint32_t simply because QEMU doesn't support
the 64-bit boot protocol (specifically the ext_ramdisk_image field).
Therefore here just limit initrd_max to UINT32_MAX simply as well to
allow initrd to be loaded below 4 GB.
NOTE: it's possible that linux protocol within [0x208, 0x20c]
supports up to 4 GB initrd as well.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
it's from v4.20-rc5.
CC: Stefano Garzarella <sgarzare@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to avoid migration issues, we enable PVH only for
machine type >= 4.0
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use pvh.bin option rom when we are booting an uncompressed
kernel using the x86/HVM direct boot ABI.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Based-on: <1547554687-12687-1-git-send-email-liam.merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When initrd is specified, load and expose it to the guest firmware
through fw_cfg. The firmware will fill the hvm_start_info for the
kernel.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Based-on: <1545422632-24444-5-git-send-email-liam.merwick@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These changes (along with corresponding Linux kernel and qboot changes)
enable a guest to be booted using the x86/HVM direct boot ABI.
This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.
The address for the kernel entry point is read from an ELF Note
in the uncompressed kernel binary by a helper routine passed
to load_elf().
Co-developed-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modern AMD CPUs support NPT and NRIPSAVE features and KVM exposes these
when present. NRIPSAVE apeared somewhere in Opteron_G3 lifetime (e.g.
QuadCore AMD Opteron 2378 has is but QuadCore AMD Opteron HE 2344 doesn't),
NPT was introduced a bit earlier.
Add the FEAT_SVM leaf to Opteron_G4/G5 and EPYC/EPYC-IBPB cpu models.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20190121155051.5628-1-vkuznets@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Update the stepping from 5 to 6, in order that
the Cascadelake-Server CPU model can support AVX512VNNI
and MSR based features exposed by ARCH_CAPABILITIES.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20181227024304.12182-2-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
MPX support is being phased out by Intel; GCC has dropped it, Linux
is also going to do that. Even though KVM will have special code
to support MPX after the kernel proper stops enabling it in XCR0,
we probably also want to deprecate that in a few years. As a start,
do not enable it by default for any named CPU model starting with
the 4.0 machine types; this include Skylake, Icelake and Cascadelake.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181220121100.21554-1-pbonzini@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The missing functionality was added ~3 years ago with the Linux commit
46896c73c1a4 ("KVM: svm: add support for RDTSCP")
so reenable RDTSCP support on those CPU models.
Opteron_G2 - being family 15, model 6, doesn't have RDTSCP support
(the real hardware doesn't have it. K8 got RDTSCP support with the NPT
models, i.e., models >= 0x40).
Document the host's minimum required kernel version, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Message-ID: <20181212200803.GG6653@zn.tnic>
[ehabkost: moved compat properties code to pc.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of verbose arrays with 4 lines for each entry, make each
entry take only one line. This makes long arrays that couldn't
fit in the screen become short and readable.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-4-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
stringify() is useful when we need to use macros in compat_props
(like when we set virtio-baloon-pci.class=PCI_CLASS_MEMORY_RAM at
pc_i440fx_1_0_machine_options()), but it is pointless when we are
already providing a number literal.
Replace stringify() with string literals when appropriate.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-3-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Make them more QOMConventional.
Cc:qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20190105023831.66910-1-liq3ea@163.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_2() function with pc_compat_2_2_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_3() function with pc_compat_2_3_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Switch the intr_supported variable from a boolean to OnOffAuto type so
that we can know whether the user specified it or not. With that
we'll have a chance to help the user to choose more wisely where
possible. Introduce x86_iommu_ir_supported() to mask these changes.
No functional change at all.
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SMBIOS is just another firmware interface used by some QEMU models.
We will later introduce more firmware interfaces in this subdirectory.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Use the glib g_file_get_contents() function instead, which does
the whole "allocate memory for the file and read it in" operation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-6-peter.maydell@linaro.org
This makes their function more clear and prevents conflicts when adding
the actual devices to the machine state, if necessary.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181107152434.22219-1-minyard@acm.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We're plugging/unplugging a PCDIMMDevice, so directly pass this type
instead of a more generic DeviceState.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181005092024.14344-5-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Previously, if the size of initrd >=2G, qemu exits with error:
root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic
qemu: error reading initrd large.cgz: No such file or directory
root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz
2.5G large.cgz
this patch changes the caller side that use this function to calculate
size of initrd file as well.
v2: update error message and int64_t printing format
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <1536833233-14121-1-git-send-email-lizhijian@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can assign and verify the address before realizing and trying to plug.
reading/writing the address property should never fail for DIMMs, so let's
reduce error handling a bit by using &error_abort. Getting access to the
memory region now might however fail. So forward errors from
get_memory_region() properly.
As all memory devices should use the alignment of the underlying memory
region for guest physical address asignment, do detection of the
alignment in pc_dimm_pre_plug(), but allow pc.c to overwrite the
alignment for compatibility handling.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-5-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All applicable memory regions always have an alignment > 0. All memory
backends result in file_ram_alloc() or qemu_anon_ram_alloc() getting
called, setting the alignment to > 0.
So a PCDIMM memory region always has an alignment > 0. NVDIMM copy the
alignment of the original memory memory region into the handcrafted memory
region that will be used at this place.
So the check for 0 can be dropped and we can reduce the special
handling.
Dropping this check makes factoring out of alignment handling easier as
compat handling only has to look at pcmc->enforce_aligned_dimm and not
care about the alignment of the memory region.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can assign and verify the slot before realizing and trying to plug.
reading/writing the slot property should never fail, so let's reduce
error handling a bit by using &error_abort.
To do this during pre_plug, add and use (x86, ppc) pc_dimm_pre_plug().
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-2-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V identifies vCPUs by Virtual Processor (VP) index which can be
queried by the guest via HV_X64_MSR_VP_INDEX msr. It is defined by the
spec as a sequential number which can't exceed the maximum number of
vCPUs per VM.
It has to be owned by QEMU in order to preserve it across migration.
However, the initial implementation in KVM didn't allow to set this
msr, and KVM used its own notion of VP index. Fortunately, the way
vCPUs are created in QEMU/KVM makes it likely that the KVM value is
equal to QEMU cpu_index.
So choose cpu_index as the value for vp_index, and push that to KVM on
kernels that support setting the msr. On older ones that don't, query
the kernel value and assert that it's in sync with QEMU.
Besides, since handling errors from vCPU init at hotplug time is
impossible, disable vCPU hotplug.
This patch also introduces accessor functions to encapsulate the mapping
between a vCPU and its vp_index.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20180702134156.13404-3-rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It eases code review, unit is explicit.
Patch generated using:
$ git grep -E '[<>][<>]=? ?[1-5]0' hw/ include/hw/
and modified manually.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's try to reduce error handling a bit. In the plug/unplug case, the
device was realized and therefore we can assume that getting access to
the memory region will not fail.
For get_vmstate_memory_region() this is already handled that way.
Document both cases.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-13-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can perform these checks before the device is actually realized.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-6-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's rename it to make it look more consistent.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a similar naming scheme as spapr. This way, we can go ahead and
rename e.g. pc_dimm_memory_plug to pc_dimm_plug, which avoids
confusion.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-3-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>