In the past, QEMU would create memory regions that could partially cover
hugetlb pages, making mmap() fail if we would use the mmap_offset as an
fd_offset. For that reason, we never used the mmap_offset as an offset into
the fd and instead always mapped the fd from the very start.
However, that can easily result in us mmap'ing a lot of unnecessary
parts of an fd, possibly repeatedly.
QEMU nowadays does not create memory regions that partially cover huge
pages -- it never really worked with postcopy. QEMU handles merging of
regions that partially cover huge pages (due to holes in boot memory) since
2018 in c1ece84e7c ("vhost: Huge page align and merge").
Let's be a bit careful and not unconditionally convert the
mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb
size and pass as much as we can as fd_offset, making sure that we call
mmap() with a properly aligned offset.
With QEMU and a virtio-mem device that is fully plugged (50GiB using 50
memslots) the qemu-storage daemon process consumes in the VA space
1281GiB before this change and 58GiB after this change.
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags: 0x9
Size: 40
Fds: 59
Adding region 4
guest_phys_addr: 0x0000000200000000
memory_size: 0x0000000040000000
userspace_addr: 0x00007fb73bffe000
old mmap_offset: 0x0000000080000000
fd_offset: 0x0000000080000000
new mmap_offset: 0x0000000000000000
mmap_addr: 0x00007f02f1bdc000
Successfully added new region
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags: 0x9
Size: 40
Fds: 59
Adding region 5
guest_phys_addr: 0x0000000240000000
memory_size: 0x0000000040000000
userspace_addr: 0x00007fb77bffe000
old mmap_offset: 0x00000000c0000000
fd_offset: 0x00000000c0000000
new mmap_offset: 0x0000000000000000
mmap_addr: 0x00007f0284000000
Successfully added new region
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-12-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>