qemu/hw/hyperv/hv-balloon-page_range_tree.h


Add Hyper-V Dynamic Memory Protocol driver (hv-balloon) base

This driver is like virtio-balloon on steroids: it allows both changing the guest memory allocation via ballooning and (in the next patch) inserting pieces of extra RAM into it on demand from a provided memory backend.

The actual resizing is done via the ballooning interface (for example, via the "balloon" HMP command). This includes resizing the guest past its boot size, that is, hot-adding additional memory in granularity limited only by the guest alignment requirements, as provided by the next patch.

In contrast with ACPI DIMM hotplug, where one can only request to unplug a whole DIMM stick, this driver allows removing memory from the guest in single-page (4k) units via ballooning.

After a VM reboot the guest is back to its original (boot) size. In the future, the guest boot memory size might be changed on reboot instead, taking into account the effective size that the VM had before that reboot (much like Hyper-V does).

For performance reasons, the guest-released memory is tracked in a few range trees, as a series of (start, count) ranges. Each time a new page range is inserted into such a tree, its neighbors are checked as candidates for possible merging with it.

Besides performance reasons, the Dynamic Memory protocol itself uses page ranges as the data structure in its messages, so relevant pages need to be merged into such ranges anyway.

One has to be careful when tracking the guest-released pages, since the guest can maliciously report returning pages outside its current address space, which later clash with the address range of newly added memory. Similarly, the guest can report freeing the same page twice.

The above design results in much better ballooning performance than when using virtio-balloon with the same guest: 230 GB / minute with this driver versus 70 GB / minute with virtio-balloon.

During a ballooning operation most of the time is spent waiting for the guest to come up with newly freed page ranges; processing the received ranges on the host side (in QEMU and KVM) is nearly instantaneous.

The unballoon operation is also pretty much instantaneous: thanks to the merging of the ballooned-out page ranges, 200 GB of memory can be returned to the guest in about 1 second. With virtio-balloon this operation takes about 2.5 minutes.

These tests were done against a Windows Server 2019 guest running on a Xeon E5-2699, after dirtying the whole memory inside the guest before each balloon operation.

Using a range tree instead of a bitmap to track the removed memory also means that the solution scales well with the guest size: even a 1 TB range takes just a few bytes of such metadata.

Since the required GTree operations aren't present in every GLib version, a check for them was added to the meson build script, together with new "--enable-hv-balloon" and "--disable-hv-balloon" configure arguments. If these GTree operations are missing in the system's GLib version, this driver will be skipped during the QEMU build.

An optional "status-report=on" device parameter requests memory status events from the guest (typically sent every second), which let the host learn both the available and the in-use guest memory counts. Following commits will add support for their external emission as "HV_BALLOON_STATUS_REPORT" QMP events.

The driver is named hv-balloon since the Linux kernel client driver for the Dynamic Memory Protocol is named as such, and to follow the naming pattern established by the virtio-balloon driver. The whole protocol runs over Hyper-V VMBus.

The driver was tested against Windows Server 2012 R2, Windows Server 2016 and Windows Server 2019 guests and obeys the guest alignment requirements reported to the host via the DM_CAPABILITIES_REPORT message.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-06-12 17:00:54 +03:00
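The commit message describes tracking guest-released pages as (start, count) ranges that are merged with their neighbors on every insertion, while pages freed twice are counted separately. The following is a minimal sketch of that merging idea, under stated assumptions: it swaps the GLib `GTree` behind `PageRangeTree` for a hypothetical fixed-capacity sorted array (`RangeSet`), and `rangeset_insert()` is an invented name. It is not QEMU's `hvb_page_range_tree_insert()`, whose implementation is not part of this header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as PageRange in hv-balloon-page_range_tree.h. */
typedef struct PageRange {
    uint64_t start;
    uint64_t count;
} PageRange;

/*
 * Hypothetical stand-in for PageRangeTree: a sorted array of disjoint,
 * non-adjacent ranges instead of a GLib GTree, with a fixed capacity.
 */
#define MAX_RANGES 64

typedef struct RangeSet {
    PageRange r[MAX_RANGES];
    size_t n;
} RangeSet;

/*
 * Insert pages [start, start + count), merging the new range with every
 * stored range that overlaps it or touches it on either side.
 * Pages that were already present are added to *dupcount.
 * Returns 0 on success, -1 if a new slot is needed but the array is full.
 */
static int rangeset_insert(RangeSet *s, uint64_t start, uint64_t count,
                           uint64_t *dupcount)
{
    uint64_t end = start + count;
    uint64_t mstart = start, mend = end;
    size_t i = 0, first, absorbed;

    assert(end >= start);               /* no uint64_t wrap-around */
    if (count == 0) {
        return 0;
    }

    /* Skip stored ranges that end strictly before the new range starts. */
    while (i < s->n && s->r[i].start + s->r[i].count < start) {
        i++;
    }
    first = i;

    /* Absorb stored ranges that overlap or are adjacent to [start, end). */
    while (i < s->n && s->r[i].start <= end) {
        uint64_t rs = s->r[i].start;
        uint64_t re = rs + s->r[i].count;
        uint64_t os = rs > start ? rs : start;  /* overlap start */
        uint64_t oe = re < end ? re : end;      /* overlap end */

        if (oe > os) {
            *dupcount += oe - os;               /* pages freed twice */
        }
        mstart = rs < mstart ? rs : mstart;
        mend = re > mend ? re : mend;
        i++;
    }
    absorbed = i - first;

    if (absorbed == 0) {
        /* No neighbor to merge with: open a new slot at index first. */
        if (s->n == MAX_RANGES) {
            return -1;
        }
        memmove(&s->r[first + 1], &s->r[first],
                (s->n - first) * sizeof(PageRange));
        s->n++;
    } else if (absorbed > 1) {
        /* Collapse all absorbed ranges into the slot at index first. */
        memmove(&s->r[first + 1], &s->r[i], (s->n - i) * sizeof(PageRange));
        s->n -= absorbed - 1;
    }
    s->r[first].start = mstart;
    s->r[first].count = mend - mstart;
    return 0;
}
```

For example, inserting (10, 5) and (20, 5) leaves two ranges; inserting (15, 5) then bridges them into the single range {10, 15}, and a later (12, 2) adds 2 to `*dupcount` without changing the stored range. A balanced tree such as GTree keeps the neighbor lookup logarithmic, whereas this array shifts entries with `memmove()` on insert and merge, so it only illustrates the merging logic.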
/*
 * QEMU Hyper-V Dynamic Memory Protocol driver
 *
 * Copyright (C) 2020-2023 Oracle and/or its affiliates.
 *
 * This work is licensed under the terms of the GNU GPL, version 2 or later.
 * See the COPYING file in the top-level directory.
 */

#ifndef HW_HYPERV_HV_BALLOON_PAGE_RANGE_TREE_H
#define HW_HYPERV_HV_BALLOON_PAGE_RANGE_TREE_H

#include "qemu/osdep.h"

/* PageRange */
typedef struct PageRange {
    uint64_t start;
    uint64_t count;
} PageRange;

/*
 * return just the part of range before (start);
 * out->count is 0 if range has no pages before start
 */
static inline void page_range_part_before(const PageRange *range,
                                          uint64_t start, PageRange *out)
{
    uint64_t endr = range->start + range->count;
    uint64_t end = MIN(endr, start);

    out->start = range->start;
    if (end > out->start) {
        out->count = end - out->start;
    } else {
        out->count = 0;
    }
}
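
/*
 * Example for page_range_part_before(): with *range = {10, 8} (pages
 * 10..17), start = 14 gives {10, 4} (pages 10..13), start = 30 gives the
 * whole {10, 8}, and any start <= 10 gives {10, 0}.
 */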

/*
 * return just the part of range after (start, count);
 * out->count is 0 if range has no pages after start + count
 */
static inline void page_range_part_after(const PageRange *range,
                                         uint64_t start, uint64_t count,
                                         PageRange *out)
{
    uint64_t end = range->start + range->count;
    uint64_t ends = start + count;

    out->start = MAX(range->start, ends);
    if (end > out->start) {
        out->count = end - out->start;
    } else {
        out->count = 0;
    }
}
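
/*
 * Example for page_range_part_after(): with *range = {10, 8} (pages
 * 10..17), (start, count) = (12, 3) gives {15, 3} (pages 15..17), while
 * any start + count >= 18 gives an empty result whose start may lie past
 * the range, e.g. (16, 5) gives {21, 0}.
 */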

/*
 * return the intersection of range with (start, count);
 * out->count is 0 if they are disjoint
 */
static inline void page_range_intersect(const PageRange *range,
                                        uint64_t start, uint64_t count,
                                        PageRange *out)
{
    uint64_t end1 = range->start + range->count;
    uint64_t end2 = start + count;
    uint64_t end = MIN(end1, end2);

    out->start = MAX(range->start, start);
    out->count = out->start < end ? end - out->start : 0;
}

/* return the number of pages that range shares with (start, count) */
static inline uint64_t page_range_intersection_size(const PageRange *range,
                                                    uint64_t start,
                                                    uint64_t count)
{
    PageRange trange;

    page_range_intersect(range, start, count, &trange);
    return trange.count;
}
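
/*
 * Example for page_range_intersect(): with *range = {10, 8} (pages 10..17),
 * (14, 10) gives {14, 4}, so page_range_intersection_size() returns 4;
 * the disjoint (20, 5) gives {20, 0} and a size of 0.
 */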

/* true if (start, count) ends exactly where range begins */
static inline bool page_range_joinable_left(const PageRange *range,
                                            uint64_t start, uint64_t count)
{
    return start + count == range->start;
}

/* true if (start, count) begins exactly where range ends */
static inline bool page_range_joinable_right(const PageRange *range,
                                             uint64_t start, uint64_t count)
{
    return range->start + range->count == start;
}

/* true if (start, count) is directly adjacent to range on either side */
static inline bool page_range_joinable(const PageRange *range,
                                       uint64_t start, uint64_t count)
{
    return page_range_joinable_left(range, start, count) ||
           page_range_joinable_right(range, start, count);
}
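
/*
 * Example for the joinable checks: with *range = {10, 8} (pages 10..17),
 * (5, 5) is joinable on the left (5 + 5 == 10) and (18, 2) on the right
 * (10 + 8 == 18). These test exact adjacency only, so an overlapping
 * (9, 2) is not joinable.
 */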

/* PageRangeTree */
/* type safety */
typedef struct PageRangeTree {
    GTree *t;
} PageRangeTree;

static inline bool page_range_tree_is_empty(PageRangeTree tree)
{
    guint nnodes = g_tree_nnodes(tree.t);

    return nnodes == 0;
}

void hvb_page_range_tree_init(PageRangeTree *tree);
void hvb_page_range_tree_destroy(PageRangeTree *tree);

bool hvb_page_range_tree_intree_any(PageRangeTree tree,
                                    uint64_t start, uint64_t count);

bool hvb_page_range_tree_pop(PageRangeTree tree, PageRange *out,
                             uint64_t maxcount);

void hvb_page_range_tree_insert(PageRangeTree tree,
                                uint64_t start, uint64_t count,
                                uint64_t *dupcount);

#endif
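The helpers `page_range_part_before()` and `page_range_part_after()` compose naturally: removing a sub-range from a `PageRange` leaves at most two pieces. The following is a small self-contained sketch of that composition. Because "qemu/osdep.h" is only available inside the QEMU tree, it defines its own `MIN()`/`MAX()` and copies the two helpers with the same logic; `page_range_subtract()` is a hypothetical name used for illustration, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the MIN()/MAX() macros that "qemu/osdep.h" provides. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

typedef struct PageRange {
    uint64_t start;
    uint64_t count;
} PageRange;

/* Same logic as page_range_part_before() in the header. */
static inline void page_range_part_before(const PageRange *range,
                                          uint64_t start, PageRange *out)
{
    uint64_t endr = range->start + range->count;
    uint64_t end = MIN(endr, start);

    out->start = range->start;
    out->count = end > out->start ? end - out->start : 0;
}

/* Same logic as page_range_part_after() in the header. */
static inline void page_range_part_after(const PageRange *range,
                                         uint64_t start, uint64_t count,
                                         PageRange *out)
{
    uint64_t end = range->start + range->count;
    uint64_t ends = start + count;

    out->start = MAX(range->start, ends);
    out->count = end > out->start ? end - out->start : 0;
}

/*
 * Hypothetical helper (not part of QEMU): remove pages
 * [start, start + count) from *range, store the non-empty leftover
 * pieces in out[] in ascending order and return how many there are (0-2).
 */
static int page_range_subtract(const PageRange *range,
                               uint64_t start, uint64_t count,
                               PageRange out[2])
{
    PageRange part;
    int n = 0;

    page_range_part_before(range, start, &part);
    if (part.count > 0) {
        out[n++] = part;
    }
    page_range_part_after(range, start, count, &part);
    if (part.count > 0) {
        out[n++] = part;
    }
    return n;
}
```

For `*range = {100, 50}`, removing (120, 10) leaves {100, 20} and {130, 20}, removing (90, 20) leaves only {110, 40}, and removing (0, 500) leaves nothing.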