qemu/hw
David Gibson ed48c59875 virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.

Currently, we handle this very badly: we silently ignore balloon requests that
aren't host page aligned, and for requests that are host page aligned we
discard the entire host page.  The latter can corrupt guest memory if the
guest page size is smaller than the host's.

The obvious choice would be to disable the balloon if the host page size is
not 4kiB.  However, that would break the special case where host and guest
have the same page size, but that's larger than 4kiB.  That case currently
works by accident[1] - and is used in practice on many production POWER
systems where 64kiB has long been the Linux default page size on both host
and guest.

To make the balloon safe, without breaking that useful special case, we
need to accumulate 4kiB balloon requests until we have a whole contiguous
host page to discard.

We could in principle do that across all guest memory, but it would require
a large bitmap to track.  This patch represents a compromise: we track
ballooned subpages for a single contiguous host page at a time.  This means
that if the guest discards all 4kiB chunks of a host page in succession,
we will discard it.  This is the expected behaviour in the (host page) ==
(guest page) != 4kiB case we want to support.

If the guest scatters 4kiB requests across different host pages, we don't
discard anything, and issue a warning.  Not ideal, but at least we don't
corrupt guest memory as the previous version could.
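
The accumulation scheme described above can be sketched roughly as follows.
This is an illustrative sketch only, not the code from this patch: the
PartialPage struct, the balloon_subpage() helper and the 64-subpage limit are
hypothetical names and assumptions made for the example, and the host page
size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BALLOON_PAGE_SIZE 4096u

/* Hypothetical tracker for the single host page currently being filled. */
typedef struct {
    bool active;    /* true while a partially ballooned page is tracked */
    uint64_t base;  /* host-page-aligned address of the tracked page */
    uint64_t mask;  /* bit i set => 4kiB subpage i has been ballooned */
} PartialPage;

/*
 * Record one 4kiB balloon request at 'addr'.  Returns true once every
 * subpage of the containing host page has been seen, i.e. when the whole
 * host page could safely be discarded.  Assumes host_page_size is a power
 * of two between 4kiB and 256kiB (at most 64 subpages fit in the mask).
 */
static bool balloon_subpage(PartialPage *pp, uint64_t addr,
                            uint64_t host_page_size)
{
    uint64_t nsub = host_page_size / BALLOON_PAGE_SIZE;
    assert(nsub >= 1 && nsub <= 64);

    uint64_t base = addr & ~(host_page_size - 1);
    uint64_t idx = (addr - base) / BALLOON_PAGE_SIZE;
    uint64_t full = (nsub == 64) ? UINT64_MAX : ((1ULL << nsub) - 1);

    if (!pp->active || pp->base != base) {
        /* Request is for a different host page: forget the old progress. */
        pp->active = true;
        pp->base = base;
        pp->mask = 0;
    }

    pp->mask |= 1ULL << idx;

    if (pp->mask == full) {
        /* Whole host page ballooned; reset and let the caller discard it. */
        pp->active = false;
        pp->mask = 0;
        return true;
    }
    return false;
}
```

With a 64kiB host page, sixteen consecutive calls covering one host page
return false fifteen times and true on the last; requests that alternate
between two host pages keep resetting the tracker and never return true.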

Warning reporting is kind of a compromise here.  Determining whether we're
in a problematic state at realize() time is tricky, because we'd have to
look at the host pagesizes of all memory backends, but we can't really know
if some of those backends could be for special purpose memory that's not
subject to ballooning.

Reporting only when the guest tries to balloon a partial page also isn't
great because if the guest page size happens to line up it won't indicate
that we're in a non-ideal situation.  It could also cause alarming repeated
warnings whenever a migration is attempted.

So, what we do is warn the first time the guest attempts to balloon a partial
host page, whether or not it will end up ballooning the rest of the page
immediately afterwards.

[1] Because when the guest attempts to balloon a page, it will submit
    requests for each 4kiB subpage.  Most will be ignored, but the one
    which happens to be host page aligned will discard the whole lot.
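
    The arithmetic behind this can be checked with a small sketch.  The
    helper name aligned_requests() is hypothetical and exists only for this
    illustration; it counts how many of the 4kiB requests covering one guest
    page are host page aligned, assuming a power-of-two host page size.

```c
#include <stdint.h>

#define BALLOON_PAGE_SIZE 4096u

/*
 * Count the 4kiB balloon requests covering one guest page that fall on a
 * host page boundary.  Under the old behaviour, each such request discarded
 * a whole host page and all other requests were silently ignored.
 */
static unsigned aligned_requests(uint64_t guest_page_addr,
                                 uint64_t guest_page_size,
                                 uint64_t host_page_size)
{
    unsigned n = 0;

    for (uint64_t off = 0; off < guest_page_size; off += BALLOON_PAGE_SIZE) {
        if (((guest_page_addr + off) & (host_page_size - 1)) == 0) {
            n++;
        }
    }
    return n;
}
```

    For a 64kiB guest page on a 64kiB host, exactly one of the sixteen
    requests is aligned, so the page is discarded once.  For a 4kiB guest
    page on a 64kiB host, an aligned request would have discarded 64kiB even
    though the guest only gave up 4kiB.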

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22 10:51:31 -05:00
9pfs xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
acpi qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
adc
alpha hw/alpha/Makefile.objs: Create CONFIG_* for alpha 2019-02-05 16:50:20 +01:00
arm hw/arm/armsse: Fix miswiring of expansion IRQs 2019-02-15 09:56:39 +00:00
audio audio: fix pc speaker init 2019-01-24 13:10:19 +01:00
block virtio-blk: set correct config size for the host driver 2019-02-13 16:18:17 +08:00
bt char: allow specifying a GMainContext at opening time 2019-02-13 14:23:39 +01:00
char qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
core qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
cpu hw/cpu/cluster: Mark the cpu-cluster device with user_creatable = false 2019-02-06 15:55:56 +01:00
cris hw/cris/Makefile.objs: Create CONFIG_* for cris 2019-02-05 16:50:20 +01:00
display hw/display/milkymist-tmu2: Move inlined code from header to source 2019-02-01 11:58:50 +01:00
dma hw/dma/i8257: Use qemu_log_mask(UNIMP) instead of fprintf 2019-02-14 11:46:30 +01:00
gpio trace: enforce that every trace-events file has a final newline 2019-01-24 14:16:56 +00:00
hppa hw/hppa: forward requests to CPU HPA 2019-02-12 08:59:21 -08:00
hyperv
i2c hw/i2c/Makefile.objs: Create new CONFIG_* variables for EEPROM and ACPI controller 2019-02-05 16:50:21 +01:00
i386 * cpu-exec fixes (Emilio, Laurent) 2019-02-05 19:39:22 +00:00
ide ide: split ioport registration to a separate file 2019-02-05 16:50:19 +01:00
input pckbd: Convert DPRINTF->trace 2019-02-14 11:46:30 +01:00
intc xics: Drop the KVM ICS class 2019-02-18 10:52:08 +11:00
ipack
ipmi
isa char: allow specifying a GMainContext at opening time 2019-02-13 14:23:39 +01:00
lm32 hw/lm32/Makefile.objs: Conditionally build lm32 and milkmyst 2019-02-05 16:50:20 +01:00
m68k hw/m68k/Makefile.objs: Conditionally build boards 2019-02-05 16:50:19 +01:00
mem pc-dimm: use same mechanism for [get|set]_addr 2019-02-21 12:28:41 -05:00
microblaze hw/microblaze/Makefile.objs: Create configs for petalogix and xilinx boards 2019-02-05 16:50:19 +01:00
mips hw/mips_int: hold BQL for all interrupt requests 2019-02-14 17:47:28 +01:00
misc cuda: decrease time delay before raising VIA SR interrupt and remove fast path 2019-02-17 21:54:02 +11:00
moxie hw/moxie/Makefile.objs: Conditionally build moxie 2019-02-05 16:50:20 +01:00
net vhost-net: compile it on all targets that have virtio-net. 2019-02-21 12:28:01 -05:00
nios2 hw/nios2/Makefile.objs: Conditionally build nios2 2019-02-05 16:50:20 +01:00
nvram fw_cfg: fix the life cycle and the name of "qemu_extra_params_fw" 2019-02-05 10:58:33 -05:00
openrisc hw/openrisc/Makefile.objs: Create CONFIG_* for openrisc 2019-02-05 16:50:21 +01:00
pci qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
pci-bridge pci/shpc: perform unplug via the hotplug handler 2018-12-20 11:19:12 -05:00
pci-host build: actually use CONFIG_PAM 2019-02-05 16:50:19 +01:00
pcmcia
ppc ppc patch queue 2019-02-19 2019-02-18 16:20:13 +00:00
rdma hw/rdma: modify struct initialization 2019-01-19 11:01:33 +02:00
riscv riscv: Ensure the kernel start address is correctly cast 2019-02-11 15:56:22 -08:00
s390x ppc patch queue 2019-02-19 2019-02-18 16:20:13 +00:00
scsi qdev: pass an Object * to qbus_set_hotplug_handler() 2019-02-17 21:54:02 +11:00
sd hw: sd: set category of the sd memory card 2019-01-30 10:24:20 +01:00
sh4 * cpu-exec fixes (Emilio, Laurent) 2019-02-05 19:39:22 +00:00
smbios hw/smbios: Move to the hw/firmware/ subdirectory 2018-12-19 16:48:16 -05:00
sparc qemu-sparc queue 2019-02-07 16:49:30 +00:00
sparc64 hw/sparc64: Explicitly set default_display = "std" 2019-02-14 11:46:30 +01:00
ssi aspeed/smc: snoop SPI transfers to fake dummy cycles 2019-01-29 11:46:05 +00:00
timer qapi: move RTC_CHANGE to the target schema 2019-02-18 14:44:05 +01:00
tpm tpm: clear RAM when "memory overwrite" requested 2019-01-17 21:10:57 -05:00
tricore hw/tricore/Makefile.objs: Create CONFIG_* for tricore 2019-02-05 16:50:21 +01:00
unicore32 hw/unicore32/puv3: Drop useless inclusion of "hw/i386/pc.h" 2019-02-06 15:54:12 +01:00
usb usb: remove unnecessary NULL device check from usb_ep_get() 2019-02-20 09:41:23 +01:00
vfio hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCI 2019-02-05 16:50:21 +01:00
virtio virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size 2019-02-22 10:51:31 -05:00
watchdog hw/watchdog/wdt_i6300esb: remove a unnecessary comment 2019-01-11 15:46:55 +01:00
xen xen: fix xen-bus state model to allow frontend re-connection 2019-02-04 11:04:49 +00:00
xenpv xen: Replace few mentions of xend by libxl 2019-01-14 13:45:40 +00:00
xtensa hw/xtensa/Makefile.objs: Build xtensa_sim and xtensa_fpga conditionally 2019-02-05 16:50:20 +01:00
Makefile.objs hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCI 2019-02-05 16:50:21 +01:00