qemu/hw
Maciej S. Szmigiero 0d9e8c0b67 Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) base
This driver is like virtio-balloon on steroids: it allows both changing the
guest memory allocation via ballooning and (in the next patch) inserting
pieces of extra RAM into it on demand from a provided memory backend.

The actual resizing is done via ballooning interface (for example, via
the "balloon" HMP command).
This includes resizing the guest past its boot size - that is, hot-adding
additional memory in granularity limited only by the guest alignment
requirements, as provided by the next patch.

In contrast with ACPI DIMM hotplug where one can only request to unplug a
whole DIMM stick this driver allows removing memory from guest in single
page (4k) units via ballooning.

After a VM reboot the guest is back to its original (boot) size.

In the future, the guest boot memory size might be changed on reboot
instead, taking into account the effective size that VM had before that
reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few
range trees, as a series of (start, count) ranges.
Each time a new page range is inserted into such tree its neighbors are
checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page
ranges as the data structure in its messages, so relevant pages need to be
merged into such ranges anyway.

One has to be careful when tracking the guest-released pages, since the
guest can maliciously report returning pages outside its current address
space, which later clash with the address range of newly added memory.
Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when
using virtio-balloon with the same guest: 230 GB / minute with this driver
versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of time is spent waiting for the guest
to come up with newly freed page ranges, processing the received ranges on
the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous:
thanks to the merging of the ballooned out page ranges 200 GB of memory can
be returned to the guest in about 1 second.
With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a
Xeon E5-2699, after dirtying the whole memory inside guest before each
balloon operation.

Using a range tree instead of a bitmap to track the removed memory also
means that the solution scales well with the guest size: even a 1 TB range
takes just a few bytes of such metadata.

Since the required GTree operations aren't present in every Glib version
a check for them was added to the meson build script, together with new
"--enable-hv-balloon" and "--disable-hv-balloon" configure arguments.
If these GTree operations are missing in the system's Glib version this
driver will be skipped during QEMU build.

An optional "status-report=on" device parameter requests memory status
events from the guest (typically sent every second), which allow the host
to learn both the guest memory available and the guest memory in use
counts.

Following commits will add support for their external emission as
"HV_BALLOON_STATUS_REPORT" QMP events.

The driver is named hv-balloon since the Linux kernel client driver for
the Dynamic Memory Protocol is named as such and to follow the naming
pattern established by the virtio-balloon driver.
The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016
and Windows Server 2019 guests and obeys the guest alignment requirements
reported to the host via DM_CAPABILITIES_REPORT message.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-06 14:08:10 +01:00
..
9pfs migration: simplify blockers 2023-10-20 08:51:41 +02:00
acpi virtio,pc,pci: features, cleanups 2023-10-23 14:45:29 -07:00
adc meson: Replace softmmu_ss -> system_ss 2023-06-20 10:01:30 +02:00
alpha hw/alpha: Use MachineClass->default_nic in the alpha machine 2023-05-26 09:10:49 +02:00
arm hw/arm: xlnx-versal-virt: Add AMD/Xilinx TRNG device 2023-11-02 14:42:03 +00:00
audio hw/audio/es1370: trace lost interrupts 2023-10-10 12:31:05 +00:00
avr
block virtio-blk: remove batch notification BH 2023-10-31 15:42:17 +01:00
char target-arm queue: 2023-11-03 10:04:12 +08:00
core target-arm queue: 2023-11-03 10:04:12 +08:00
cpu hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
cris
cxl hw/cxl: Fix local variable shadowing of cap_hdrs 2023-10-06 10:56:54 +02:00
display target-arm queue: 2023-11-03 10:04:12 +08:00
dma hw/dma: Declare link using static DEFINE_PROP_LINK() macro 2023-10-19 23:13:28 +02:00
gpio hw/gpio/nrf51: implement DETECT signal 2023-08-22 17:30:59 +01:00
hppa hw/hppa: Add new HP C3700 machine 2023-10-20 00:47:38 +02:00
hyperv Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) base 2023-11-06 14:08:10 +01:00
i2c target-arm queue: 2023-11-03 10:04:12 +08:00
i386 * Fix global variable shadowing in test code 2023-10-30 08:08:18 +09:00
ide migration: Use vmstate_register_any() for isa-ide 2023-11-01 16:13:58 +01:00
input target-arm queue: 2023-11-03 10:04:12 +08:00
intc Migration Pull request (20231102) 2023-11-03 09:57:32 +08:00
ipack meson: Replace softmmu_ss -> system_ss 2023-06-20 10:01:30 +02:00
ipmi hw/ipmi: Don't call vmstate_register() from instance_init() functions 2023-11-01 16:13:58 +01:00
isa virtio,pc,pci: features, cleanups 2023-10-23 14:45:29 -07:00
loongarch hw/acpi: Realize ACPI_GED sysbus device before accessing it 2023-10-19 23:13:28 +02:00
m68k m68k: Instantiate the ESP SCSI controller for the NeXTcube machine 2023-11-02 07:26:06 +01:00
mem memory-device: Drop size alignment check 2023-11-06 13:54:57 +01:00
microblaze hw/microblaze: Clean up local variable shadowing 2023-09-29 10:07:16 +02:00
mips virtio,pc,pci: features, cleanups 2023-10-23 14:45:29 -07:00
misc hw/misc: Introduce AMD/Xilix Versal TRNG device 2023-11-02 14:42:03 +00:00
net migration: Use vmstate_register_any() 2023-11-01 16:13:58 +01:00
nios2 hw/nios2: Clean up local variable shadowing 2023-09-29 10:07:16 +02:00
nubus trace-events: Fix the name of the tracing.rst file 2023-09-08 13:08:51 +03:00
nvme hw/nvme: Clean up local variable shadowing in nvme_ns_init() 2023-09-29 10:07:20 +02:00
nvram migration: Use vmstate_register_any() for eeprom93xx 2023-11-01 16:13:58 +01:00
openrisc
pci migration: Use vmstate_register_any() 2023-11-01 16:13:58 +01:00
pci-bridge hw/pci-bridge/cxl-upstream: Add serial number extended capability support 2023-10-04 18:15:06 -04:00
pci-host target/hppa: Add emulation of a C3700 HP-PARISC workstation 2023-10-20 06:46:26 -07:00
pcmcia hw/pcmcia/pxa2xx: Inline pxa2xx_pcmcia_init() 2023-10-27 12:48:57 +01:00
ppc migration: Hack to maintain backwards compatibility for ppc 2023-11-01 16:13:58 +01:00
rdma hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port() 2023-10-21 15:00:22 +03:00
remote migration: simplify blockers 2023-10-20 08:51:41 +02:00
riscv target/riscv: move KVM only files to kvm subdir 2023-10-12 12:20:24 +10:00
rtc hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
rx hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
s390x hw/s390x/s390-stattrib: Don't call register_savevm_live() during instance_init() 2023-11-01 16:13:58 +01:00
scsi cpr: relax vhost migration blockers 2023-11-01 16:13:59 +01:00
sd hw/sd/pxa2xx: Do not open-code sysbus_create_simple() 2023-10-27 12:48:57 +01:00
sensor hw/i2c: spelling fixes 2023-08-31 19:47:43 +02:00
sh4 hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
smbios hw/acpi: changes towards enabling -Wshadow=local 2023-09-29 10:07:18 +02:00
sparc other architectures: spelling fixes 2023-07-25 17:14:07 +03:00
sparc64 hw/sparc64/ebus: Access memory regions via pci_address_space_io() 2023-10-19 23:13:28 +02:00
ssi hw/other: spelling fixes 2023-09-21 11:31:16 +03:00
timer migration: Use vmstate_register_any() 2023-11-01 16:13:58 +01:00
tpm hw/tpm: spelling fixes 2023-09-20 07:54:34 +03:00
tricore hw/tricore: Log failing test in testdevice 2023-09-29 08:28:02 +02:00
ufs hw/ufs: Modify lu.c to share codes with SCSI subsystem 2023-10-30 10:28:04 +09:00
usb hw/usb: Silence compiler warnings in USB code when compiling with -Wshadow 2023-10-06 13:27:48 +02:00
vfio migration: simplify notifiers 2023-10-20 08:51:41 +02:00
virtio Revert "hw/virtio/virtio-pmem: Replace impossible check by assertion" 2023-11-06 13:53:59 +01:00
watchdog hw/watchdog/wdt_imx2: Trace timer activity 2023-11-02 13:36:45 +00:00
xen hw/xen: cleanup sourcesets 2023-10-18 10:01:01 +02:00
xenpv
xtensa trivial: Simplify the spots that use TARGET_BIG_ENDIAN as a numeric value 2023-09-08 13:08:52 +03:00
Kconfig hw/ufs: Initial commit for emulated Universal-Flash-Storage 2023-09-07 14:01:29 -04:00
meson.build hw/ufs: Initial commit for emulated Universal-Flash-Storage 2023-09-07 14:01:29 -04:00