qemu/hw
David Hildenbrand d71920d425 virtio-mem: Proper support for preallocation with migration
Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) are still free after they
were freed ones while discarding RAM. Still, allocating that memory at
least once helps catching some basic setup problems.

Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing to to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

And we can even introspect the early migration data, including the
bitmap:
  $ ./scripts/analyze-migration.py -f STATEFILE
  {
  "ram (2)": {
      "section sizes": {
          "0000:00:03.0/mem0": "0x0000000780000000",
          "0000:00:04.0/mem1": "0x0000000780000000",
          "pc.ram": "0x0000000100000000",
          "/rom@etc/acpi/tables": "0x0000000000020000",
          "pc.bios": "0x0000000000040000",
          "0000:00:02.0/e1000.rom": "0x0000000000040000",
          "pc.rom": "0x0000000000020000",
          "/rom@etc/table-loader": "0x0000000000001000",
          "/rom@etc/acpi/rsdp": "0x0000000000001000"
      }
  },
  "0000:00:03.0/virtio-mem-device-early (51)": {
      "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x0000000040000000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  "0000:00:04.0/virtio-mem-device-early (53)": {
      "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
      "size": "0x00000001fa400000",
      "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
  },
  [...]

Reported-by: Jing Qi <jinqi@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>S
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:56 +01:00
..
9pfs coroutine: Split qemu/coroutine-core.h off qemu/coroutine.h 2023-01-20 07:21:46 +01:00
acpi acpi: Move the QMP command from monitor/ to hw/acpi/ 2023-02-04 07:56:54 +01:00
adc hw/adc: Make adci[*] R/W in NPCM7XX ADC 2022-07-18 13:20:14 +01:00
alpha include/hw/pci: Break inclusion loop pci_bridge.h and cxl.h 2023-01-08 01:54:22 -05:00
arm sbsa-ref: remove cortex-a76 from list of supported cpus 2023-02-03 12:59:22 +00:00
audio include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
avr Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
block pflash: Only read non-zero parts of backend image 2023-01-24 18:26:41 +01:00
char hw/char/pl011: better handling of FIFO flags on LCR reset 2023-02-03 12:59:22 +00:00
core virtio-mem: Migrate immutable properties early 2023-02-06 19:22:56 +01:00
cpu hw/cpu: Mark arm11 and realview mpcore as target-independent code 2023-01-16 17:51:20 +01:00
cris Do not include exec/address-spaces.h if it's not really necessary 2021-05-02 17:24:51 +02:00
cxl hw/cxl/cxl-host: Fix an error message typo 2023-01-17 10:02:37 +01:00
display hw/display/sm501: Code style fix 2023-02-05 06:40:28 -03:00
dma bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
gpio hw/gpio/omap_gpio: Use CamelCase for TYPE_OMAP2_GPIO type name 2023-01-12 17:15:09 +00:00
hppa hw: Remove unused MAX_IDE_BUS define 2022-10-31 11:32:07 +01:00
hyperv hw/hyperv/vmbus: Use device_cold_reset() and bus_cold_reset() 2022-12-16 15:55:32 +00:00
i2c hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml() 2023-01-27 11:47:02 -05:00
i386 pcihp: generate populated non-hotpluggble slot descriptions on non-hotplug path 2023-01-28 06:21:30 -05:00
ide virtio,pc,pci: features, cleanups, fixes 2023-01-09 10:07:12 +00:00
input hw/input/tsc2xxx: Constify set_transform()'s MouseTransformInfo arg 2023-01-05 14:11:15 +00:00
intc target/arm: Mark up sysregs for HFGRTR bits 36..63 2023-02-03 12:59:23 +00:00
ipack include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
ipmi include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
isa hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml() 2023-01-27 11:47:02 -05:00
loongarch hw/intc/loongarch_pch: Change default irq number of pch irq controller 2023-01-06 14:12:43 +08:00
m68k hw: Add compat machines for 8.0 2022-12-21 06:35:28 -05:00
mem hw/cxl/device: Add Flex Bus Port DVSEC 2022-12-21 07:32:24 -05:00
microblaze hw/microblaze: pass random seed to fdt 2022-09-21 19:59:56 +02:00
mips hw/mips/boston: Rename MachineState 'mc' pointer to 'ms' 2023-01-13 16:22:57 +01:00
misc hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none' 2023-01-26 13:25:07 +01:00
net rocker: Move HMP commands from monitor to hw/net/rocker/ 2023-02-04 07:56:54 +01:00
nios2 hw/nios2: set machine->fdt in nios2_load_dtb() 2022-10-17 16:15:10 -03:00
nubus qbus: Rename qbus_create_inplace() to qbus_init() 2021-09-30 13:42:10 +01:00
nvme hw/nvme updates 2023-01-11 16:41:13 +00:00
nvram x86: don't let decompressed kernel image clobber setup_data 2023-01-28 06:21:29 -05:00
openrisc openrisc: re-randomize rng-seed on reboot 2022-10-27 11:34:31 +01:00
pci pci: make sure pci_bus_is_express() won't error out with "discards ‘const’ qualifier" 2023-01-28 06:21:29 -05:00
pci-bridge pci: acpi hotplug: rename x-native-hotplug to x-do-not-expose-native-hotplug-cap 2023-01-28 06:21:29 -05:00
pci-host ppc/pnv/pci: Fix PHB xscom registers memory region name 2023-02-05 06:40:28 -03:00
pcmcia hw/pcmcia: Do not register PCMCIA type if not required 2021-05-02 17:24:50 +02:00
ppc hw/ppc/pegasos2: Fix a typo in a comment 2023-02-05 06:40:28 -03:00
rdma hw/pvrdma: Protect against buggy or malicious guest driver 2023-01-16 18:49:38 +01:00
remote hw/pci/pci: Factor out pci_bus_map_irqs() from pci_bus_irqs() 2023-01-13 16:22:57 +01:00
riscv hw/riscv/virt.c: move create_fw_cfg() back to virt_machine_init() 2023-01-20 10:14:14 +10:00
rtc bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
rx rx: re-randomize rng-seed on reboot 2022-10-27 11:34:31 +01:00
s390x migration: Remove unused threshold_size parameter 2023-02-06 19:22:56 +01:00
scsi block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed 2023-02-01 16:52:32 +01:00
sd hw/arm/omap: Drop useless casts from void * to pointer 2023-01-12 17:15:09 +00:00
sensor hw/sensor: Add Renesas ISL69259 device model 2022-07-14 16:24:38 +02:00
sh4 bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
smbios include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
sparc machine: make memory-backend a link property 2022-05-12 12:29:44 +02:00
sparc64 hw/sparc64/niagara: Use blk_name() instead of open-coding it 2023-01-20 07:25:01 +01:00
ssi trivial branch pull request 20230118 2023-01-19 15:05:29 +00:00
timer bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
tpm hw/tpm: Move tpm_ppi.c out of target-specific source set 2023-01-16 17:51:20 +01:00
tricore hw/tricore: fix inclusion of tricore_testboard 2021-07-20 20:10:21 +02:00
usb ccid-card-emulated: fix cast warning/error 2023-01-16 18:46:03 +01:00
vfio migration: Remove unused threshold_size parameter 2023-02-06 19:22:56 +01:00
virtio virtio-mem: Proper support for preallocation with migration 2023-02-06 19:22:56 +01:00
watchdog include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
xen bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx 2023-01-18 11:14:34 +01:00
xenpv Warn user if the vga flag is passed but no vga device is created 2022-05-09 08:21:14 +02:00
xtensa hw/xtensa: fix reset value of MIROUT register of MX PIC 2022-05-06 15:27:40 -07:00
Kconfig hw/loongarch: Add support loongson3 virt machine type. 2022-06-06 18:09:03 +00:00
meson.build hw/loongarch: Add support loongson3 virt machine type. 2022-06-06 18:09:03 +00:00