In the PC machine, the PIC is created in board code to allow it to be
virtualized with various virtualization techniques. So explicitly disable its
creation in the PC machine via a property which defaults to enabled. Once the
PIIX implementations are consolidated this default will keep Malta working
without further ado.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20231007123843.127151-21-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PIIX4 has its own, private PIIX4State structure. PIIX3 has almost the
same structure, provided in a public header. So reuse it and add a
cpu_intr attribute to it which is only used by PIIX4.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20231007123843.127151-19-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
TYPE_PIIX3_PCI_DEVICE was the former base class of the Xen and non-Xen variants
of the PIIX3 ISA device models. It will become the base class for the PIIX3 and
PIIX4 device models, so drop the "3" from the type names.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231007123843.127151-15-shentey@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The power management controller is an integral part of PIIX3 (function 3). So
create it as part of the south bridge.
Note that the ACPI function is optional in QEMU. This is why it gets
object_initialize_child()'ed in realize rather than in instance_init.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231007123843.127151-14-shentey@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The USB controller is an integral part of PIIX3 (function 2). So create
it as part of the south bridge.
Note that the USB function is optional in QEMU. This is why it gets
object_initialize_child()'ed in realize rather than in instance_init.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231007123843.127151-13-shentey@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The IDE controller is an integral part of PIIX3 (function 1). So create it as
part of the south bridge.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231007123843.127151-12-shentey@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
TYPE_PIIX3_DEVICE doesn't instantiate a PIC since it relies on the board to do
so. The "pic" attribute, however, suggests that there is one. Rename the
attribute to reflect that it represents ISA interrupt lines. Use the same naming
convention as in the VIA south bridges as well as in TYPE_I82378.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20231007123843.127151-8-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Avoid assigning the private member of struct PIIX3State from outside which goes
against best QOM practices. Instead, implement best QOM practice by adding an
"isa-irqs" array property to TYPE_PIIX3_DEVICE and assign it in board code, i.e.
from outside.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20231007123843.127151-6-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PIIX_NUM_PIC_IRQS is assumed to be the same as ISA_NUM_IRQS, otherwise
inconsistencies can occur.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231007123843.127151-5-shentey@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost-user-scsi has a VirtioDeviceClass->reset() function that calls
->vhost_reset_device(). The other vhost devices don't notify the vhost
device upon reset.
Stateful vhost devices may need to handle device reset in order to free
resources or prevent stale device state from interfering after reset.
Call ->vhost_device_reset() from virtio_reset() so that that vhost
devices are notified of device reset.
This patch affects behavior as follows:
- vhost-kernel: No change in behavior since ->vhost_reset_device() is
not implemented.
- vhost-user: back-ends that negotiate
VHOST_USER_PROTOCOL_F_RESET_DEVICE now receive a
VHOST_USER_DEVICE_RESET message upon device reset. Otherwise there is
no change in behavior. DPDK, SPDK, libvhost-user, and the
vhost-user-backend crate do not negotiate
VHOST_USER_PROTOCOL_F_RESET_DEVICE automatically.
- vhost-vdpa: an extra SET_STATUS 0 call is made during device reset.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231004014532.1228637-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Coverity scan reports multiple false-positive "defects" for the
following series of actions in virtio.c:
MemoryRegionCache indirect_desc_cache;
address_space_cache_init_empty(&indirect_desc_cache);
address_space_cache_destroy(&indirect_desc_cache);
For some reason it's unable to recognize the dependency between 'mrs.mr'
and 'fv' and insists that '!mrs.mr' check in address_space_cache_destroy
may take a 'false' branch, even though it is explicitly initialized to
NULL in the address_space_cache_init_empty():
*** CID 1522371: Memory - illegal accesses (UNINIT)
/qemu/hw/virtio/virtio.c: 1627 in virtqueue_split_pop()
1621 }
1622
1623 vq->inuse++;
1624
1625 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
1626 done:
>>> CID 1522371: Memory - illegal accesses (UNINIT)
>>> Using uninitialized value "indirect_desc_cache.fv" when
>>> calling "address_space_cache_destroy".
1627 address_space_cache_destroy(&indirect_desc_cache);
1628
1629 return elem;
1630
1631 err_undo_map:
1632 virtqueue_undo_map_desc(out_num, in_num, iov);
** CID 1522370: Memory - illegal accesses (UNINIT)
Instead of trying to silence these false positive reports in 4
different places, initializing 'fv' as well, as this doesn't result
in any noticeable performance impact.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Message-Id: <20231009104322.3085887-1-i.maximets@ovn.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now), it's
easier to do the revert manually.
For snapshots, the bdrv_writev_vmstate() function is used during setup
(in QIOChannelBlock backing the QEMUFile), but not holding the BQL
while calling it could lead to an assertion failure. To understand
how, first note the following:
1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().
An example leading to the assertion failure is as follows:
main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.
vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
3. qemu_savevm_state() is called.
4. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
5. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE. There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.
To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration for consistency.
The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, the multifd_save_setup() function calls
socket_send_channel_create() using multifd_new_send_channel_async() as a
callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result() which is scheduled for the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and the callback meaning there would be no
signal via the channels_ready semaphore.
The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed, because it referred to the qemu_mutex_lock_iothread()
call.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>
-----BEGIN PGP SIGNATURE-----
iLMEAAEKAB0WIQS4/x2g0v3LLaCcbCxAov/yOSY+3wUCZSimNQAKCRBAov/yOSY+
33XwBADF9ZKlESDBDa/huNFAKD7BsUIdglHfz9lHnLY+kQbCun4HyTLtp2IBsySu
mZTjdfU/LnaBidFLjEnmZZMPyiI3oV1ruSzT53egSDaxrFUXGpc9oxtMNLsyfk9P
swdngG13Fc9sWVKC7IJeYDYXgkvHY7NxsiV8U9vdqXOyw2uoHA==
=ufPc
-----END PGP SIGNATURE-----
Merge tag 'pull-loongarch-20231013' of https://gitlab.com/gaosong/qemu into staging
pull-loongarch-20231013
# -----BEGIN PGP SIGNATURE-----
#
# iLMEAAEKAB0WIQS4/x2g0v3LLaCcbCxAov/yOSY+3wUCZSimNQAKCRBAov/yOSY+
# 33XwBADF9ZKlESDBDa/huNFAKD7BsUIdglHfz9lHnLY+kQbCun4HyTLtp2IBsySu
# mZTjdfU/LnaBidFLjEnmZZMPyiI3oV1ruSzT53egSDaxrFUXGpc9oxtMNLsyfk9P
# swdngG13Fc9sWVKC7IJeYDYXgkvHY7NxsiV8U9vdqXOyw2uoHA==
# =ufPc
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Oct 2023 22:06:45 EDT
# gpg: using RSA key B8FF1DA0D2FDCB2DA09C6C2C40A2FFF239263EDF
# gpg: Good signature from "Song Gao <m17746591750@163.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B8FF 1DA0 D2FD CB2D A09C 6C2C 40A2 FFF2 3926 3EDF
* tag 'pull-loongarch-20231013' of https://gitlab.com/gaosong/qemu:
LoongArch: step down as general arch maintainer
hw/loongarch/virt: Remove unused 'loongarch_virt_pm' region
hw/loongarch/virt: Remove unused ISA Bus
hw/loongarch/virt: Remove unused ISA UART
hw/loongarch: remove global loaderparams variable
target/loongarch: Add preldx instruction
target/loongarch: fix ASXE flag conflict
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- Clean up coroutine versions of bdrv_{is_allocated,block_status}*
- Graph locking part 5 (protect children/parent links)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmUoHL8RHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9b4uRAAjryVAaA5jXZ3mdGB80nhGtARZlIaIVO/
tlXk065q2Cj+98f+fBPCPWvmEz28vJwBhJUsFwpHzLZrxecBpwZp0MPAkFBNkouq
+AiO9xyTAqccEp/dnIys4Bun9Rp0Jq9lk9y29zzEmQuK5uCB56lpx2cDn/JkzSQt
ZFtnxxTwi3MDTNvXATub8Ia/1suui0zvESS7J/NBxQNI3cFaQszp1vMwlRIoPiWo
15YZFPZZQ2pvu6/1nL1Vl9OLbPAVcEGJpjHZv0XhudYOwRiDvjYnwfPL7BuwYEsU
Dos4mZZd/KMU695s7OzlVYi1q4ATKUTUxyyylVhXZrFBXSE5ntnfoHTKHEruTyPb
G31h5mribSTWjdvY5HewHbSSPjByAWsSQg9yzcHybhORiqGQCpcGQ8zuW7oNKMPV
JicWdoRVY4U4hR0nRdDxz9zdpQ8QYok/ginBxFaOzrCfClUB7ZOBxwRMclIghuRH
FV+ZJk0ylVOz2tbfNxUa3KhUgTPd8jgCHFI7xak5EBRtTJiJjE03Xag1Fdxy5/D5
tRsBBW4sOFygAhjN/xyeaRv9L8rAv3x/akriFjPUbOMLkPcJpe/DTWsP8+5LaZF8
GkQvjsg5UvmfcJ3LFtecXxfYH4UWhDmyAjF+BswiRqafDDi2CCUmdwDnzEPbwuWO
x1y7cgxe9SE=
=4d/s
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches
- Clean up coroutine versions of bdrv_{is_allocated,block_status}*
- Graph locking part 5 (protect children/parent links)
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmUoHL8RHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9b4uRAAjryVAaA5jXZ3mdGB80nhGtARZlIaIVO/
# tlXk065q2Cj+98f+fBPCPWvmEz28vJwBhJUsFwpHzLZrxecBpwZp0MPAkFBNkouq
# +AiO9xyTAqccEp/dnIys4Bun9Rp0Jq9lk9y29zzEmQuK5uCB56lpx2cDn/JkzSQt
# ZFtnxxTwi3MDTNvXATub8Ia/1suui0zvESS7J/NBxQNI3cFaQszp1vMwlRIoPiWo
# 15YZFPZZQ2pvu6/1nL1Vl9OLbPAVcEGJpjHZv0XhudYOwRiDvjYnwfPL7BuwYEsU
# Dos4mZZd/KMU695s7OzlVYi1q4ATKUTUxyyylVhXZrFBXSE5ntnfoHTKHEruTyPb
# G31h5mribSTWjdvY5HewHbSSPjByAWsSQg9yzcHybhORiqGQCpcGQ8zuW7oNKMPV
# JicWdoRVY4U4hR0nRdDxz9zdpQ8QYok/ginBxFaOzrCfClUB7ZOBxwRMclIghuRH
# FV+ZJk0ylVOz2tbfNxUa3KhUgTPd8jgCHFI7xak5EBRtTJiJjE03Xag1Fdxy5/D5
# tRsBBW4sOFygAhjN/xyeaRv9L8rAv3x/akriFjPUbOMLkPcJpe/DTWsP8+5LaZF8
# GkQvjsg5UvmfcJ3LFtecXxfYH4UWhDmyAjF+BswiRqafDDi2CCUmdwDnzEPbwuWO
# x1y7cgxe9SE=
# =4d/s
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Oct 2023 12:20:15 EDT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (26 commits)
block: Add assertion for bdrv_graph_wrlock()
block: Protect bs->children with graph_lock
block: Protect bs->parents with graph_lock
block: Mark bdrv_get_specific_info() and callers GRAPH_RDLOCK
block: Mark bdrv_apply_auto_read_only() and callers GRAPH_RDLOCK
block: Mark bdrv_op_is_blocked() and callers GRAPH_RDLOCK
qcow2: Mark check_constraints_on_bitmap() GRAPH_RDLOCK
qcow2: Mark qcow2_inactivate() and callers GRAPH_RDLOCK
qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK
block: Mark bdrv_amend_options() and callers GRAPH_RDLOCK
block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK
block: Mark bdrv_primary_child() and callers GRAPH_RDLOCK
block: Mark bdrv_refresh_filename() and callers GRAPH_RDLOCK
block: Mark bdrv_get_xdbg_block_graph() and callers GRAPH_RDLOCK
block: Take graph rdlock in parts of reopen
block: Mark bdrv_snapshot_fallback() and callers GRAPH_RDLOCK
block: Mark bdrv_parent_cb_resize() and callers GRAPH_RDLOCK
block: Mark drain related functions GRAPH_RDLOCK
block: Mark bdrv_first_blk() and bdrv_is_root_node() GRAPH_RDLOCK
block: Take graph rdlock in bdrv_inactivate_all()
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Support memory devices with multiple memslots
- Support memory devices that dynamically consume memslots
- Support memory devices that can automatically decide on the number of
memslots to use
- virtio-mem support for exposing memory dynamically via multiple
memslots
- Some required cleanups/refactorings
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUn+XMRHGRhdmlkQHJl
ZGhhdC5jb20ACgkQTd4Q9wD/g1qDHA//T01suTa+uzrcoJHoMWN11S47WnAmbuTo
vVakucLBPMJAa9xZeCy3OavXaVGpHkw+t6g3OFknof0LfQ5/j9iE3Q1PxURN7g5j
SJ2WJXCoceM6T4TMhPvVvgEaYjFmESqZB5FZgedMT0QRyhAxMuF9pCkWhk1O3OAV
JqQKqLFiGcv60AEuBYGZGzgiOUv8EJ5gKwRF4VOdyHIxqZDw1aZXzlcd4TzFZBQ7
rwW/3ef+sFmUJdmfrSrqcIlQSRrqZ2w95xATDzLTIEEUT3SWqh/E95EZWIz1M0oQ
NgWgFiLCR1KOj7bWFhLXT7IfyLh0mEysD+P/hY6QwQ4RewWG7EW5UK+JFswssdcZ
rEj5XpHZzev/wx7hM4bWsoQ+VIvrH7j3uYGyWkcgYRbdDEkWDv2rsT23lwGYNhht
oBsrdEBELRw6v4C8doq/+sCmHmuxUMqTGwbArCQVnB1XnLxOEkuqlnfq5MORkzNF
fxbIRx+LRluOllC0HVaDQd8qxRq1+UC5WIpAcDcrouy4HGgi1onWKrXpgjIAbVyH
M6cENkK7rnRk96gpeXdmrf0h9HqRciAOY8oUsFsvLyKBOCPBWDrLyOQEY5UoSdtD
m4QpEVgywCy2z1uU/UObeT/UxJy/9EL/Zb+DHoEK06iEhwONoUJjEBYMJD38RMkk
mwPTB4UAk9g=
=s69t
-----END PGP SIGNATURE-----
Merge tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu into staging
Hi,
"Host Memory Backends" and "Memory devices" queue ("mem"):
- Support memory devices with multiple memslots
- Support memory devices that dynamically consume memslots
- Support memory devices that can automatically decide on the number of
memslots to use
- virtio-mem support for exposing memory dynamically via multiple
memslots
- Some required cleanups/refactorings
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUn+XMRHGRhdmlkQHJl
# ZGhhdC5jb20ACgkQTd4Q9wD/g1qDHA//T01suTa+uzrcoJHoMWN11S47WnAmbuTo
# vVakucLBPMJAa9xZeCy3OavXaVGpHkw+t6g3OFknof0LfQ5/j9iE3Q1PxURN7g5j
# SJ2WJXCoceM6T4TMhPvVvgEaYjFmESqZB5FZgedMT0QRyhAxMuF9pCkWhk1O3OAV
# JqQKqLFiGcv60AEuBYGZGzgiOUv8EJ5gKwRF4VOdyHIxqZDw1aZXzlcd4TzFZBQ7
# rwW/3ef+sFmUJdmfrSrqcIlQSRrqZ2w95xATDzLTIEEUT3SWqh/E95EZWIz1M0oQ
# NgWgFiLCR1KOj7bWFhLXT7IfyLh0mEysD+P/hY6QwQ4RewWG7EW5UK+JFswssdcZ
# rEj5XpHZzev/wx7hM4bWsoQ+VIvrH7j3uYGyWkcgYRbdDEkWDv2rsT23lwGYNhht
# oBsrdEBELRw6v4C8doq/+sCmHmuxUMqTGwbArCQVnB1XnLxOEkuqlnfq5MORkzNF
# fxbIRx+LRluOllC0HVaDQd8qxRq1+UC5WIpAcDcrouy4HGgi1onWKrXpgjIAbVyH
# M6cENkK7rnRk96gpeXdmrf0h9HqRciAOY8oUsFsvLyKBOCPBWDrLyOQEY5UoSdtD
# m4QpEVgywCy2z1uU/UObeT/UxJy/9EL/Zb+DHoEK06iEhwONoUJjEBYMJD38RMkk
# mwPTB4UAk9g=
# =s69t
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Oct 2023 09:49:39 EDT
# gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
# gpg: issuer "david@redhat.com"
# gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
# gpg: aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
# gpg: aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A
* tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu:
virtio-mem: Mark memslot alias memory regions unmergeable
memory,vhost: Allow for marking memory device memory regions unmergeable
virtio-mem: Expose device memory dynamically via multiple memslots if enabled
virtio-mem: Update state to match bitmap as soon as it's been migrated
virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb
memory: Clarify mapping requirements for RamDiscardManager
memory-device,vhost: Support automatic decision on the number of memslots
vhost: Add vhost_get_max_memslots()
kvm: Add stub for kvm_get_max_memslots()
memory-device,vhost: Support memory devices that dynamically consume memslots
memory-device: Track required and actually used memslots in DeviceMemoryState
stubs: Rename qmp_memory_device.c to memory_device.c
memory-device: Support memory devices with multiple memslots
vhost: Return number of free memslots
kvm: Return number of free memslots
softmmu/physmem: Fixup qemu_ram_block_from_host() documentation
vhost: Remove vhost_backend_can_merge() callback
vhost: Rework memslot filtering and fix "used_memslot" tracking
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This modifies the common virtio-gpu.h file have the fields and
defintions needed by gfxstream/rutabaga, by VirtioGpuRutabaga.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Antonio Caggiano <quic_acaggian@quicinc.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Huang Rui <ray.huang@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Huang Rui <ray.huang@amd.com>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
This patch fixes invalid ufs register fields.
This fixes an issue reported by Bin Meng that
caused ufs to fail over riscv.
Fixes: bc4e68d362 ("hw/ufs: Initial commit for emulated Universal-Flash-Storage")
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
Reported-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Bin Meng <bmeng@tinylab.org>
Tested-by: Bin Meng <bmeng@tinylab.org>
The LoongArch 'virt' machine doesn't use its ISA I/O region.
If a ISA device were to be mapped there, there is no support
for ISA IRQ. Unlikely useful. Simply remove.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20231010135342.40219-3-philmd@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
bdrv_graph_wrlock() can't run in a coroutine (because it polls) and
requires holding the BQL. We already have GLOBAL_STATE_CODE() to assert
the latter. Assert the former as well and add a no_coroutine_fn marker.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-23-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Almost all functions that access the child links already take the graph
lock now. Add locking to the remaining users and finally annotate the
struct field itself as protected by the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-22-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Almost all functions that access the parent link already take the graph
lock now. Add locking to the remaining user in a test case and finally
annotate the struct field itself as protected by the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-21-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_get_specific_info() need to hold a reader lock for the graph.
This removes an assume_graph_lock() call in vmdk's implementation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-20-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_apply_auto_read_only() need to hold a reader lock for the graph
because it calls bdrv_can_set_read_only(), which indirectly accesses the
parents list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-19-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_op_is_blocked() need to hold a reader lock for the graph
because it calls bdrv_get_device_or_node_name(), which accesses the
parents list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-18-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
qcow2_signal_corruption() need to hold a reader lock for the graph
because it calls bdrv_get_node_name(), which accesses the parents list
of a node.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-15-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_amend_options() need to hold a reader lock for the graph. This
removes an assume_graph_lock() call in crypto's implementation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_get_parent_name() need to hold a reader lock for the graph
because it accesses the parents list of a node.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_primary_child() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-12-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_refresh_filename() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_get_xdbg_block_graph() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-10-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reopen isn't easy with respect to locking because many of its functions
need to iterate the graph, some change it, and then you get some drains
in the middle where you can't hold any locks.
Therefore just documents most of the functions to be unlocked, and take
locks internally before accessing the graph.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-9-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_snapshot_fallback() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-8-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Draining recursively traverses the graph, therefore we need to make sure
that also such accesses to the graph are protected by the graph rdlock.
There are 3 different drain callers to consider:
1. drain in the main loop: no issue at all, rdlock is nop.
2. drain in an iothread: rdlock only works in main loop or coroutines,
so disallow it.
3. drain in a coroutine (regardless of AioContext): the drain mechanism
takes care of scheduling a BH in the bs->aio_context that will
then take care of perform the actual draining. This is wrong,
because as pointed in (2) if bs->aio_context is an iothread then
rdlock won't work. Therefore change bdrv_co_yield_to_drain to
schedule the BH in the main loop.
Caller (2) also implies that we need to modify test-bdrv-drain.c to
disallow draining in the iothreads.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_first_blk() and bdrv_is_root_node() need to hold a reader lock
for the graph. These functions are the only functions in block-backend.c
that access the parent list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new wrapper type for GRAPH_RDLOCK functions that should be called
from coroutine context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230904100306.156197-4-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Include both coroutine and non-coroutine versions, the latter being
co_wrapper_mixed_bdrv_rdlock of the former.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230904100306.156197-3-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Add support for the max CPU
* Detect user choice in TCG
* Clear CSR values at reset and sync MPSTATE with host
* Fix the typo of inverted order of pmpaddr13 and pmpaddr14
* Split TCG/KVM accelerators from cpu.c
* Add extension properties for all cpus
* Replace GDB exit calls with proper shutdown
* Support KVM_GET_REG_LIST
* Remove RVG warning
* Use env_archcpu for better performance
* Deprecate capital 'Z' CPU properties
* Fix vfwmaccbf16.vf
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmUncYAACgkQr3yVEwxT
gBPQ3g/9Fi4uYRK7dymHHAQbOO9NPlmVPPSxmQ8fNUhoZUkbHfm56JEl42Xr02rA
Lg2ORRQxJhAinANV8CotnbyLRHNCAvouCMCQEjHo1YEHzdXc0tQzp+rIOHT7v9rH
6OQpI6RuCjO+0LQPMgzJx8yokMw/9b0uma3+RkNKod1XsSySo6JvDkMZGGZZWuVX
Que3TMHzc4513PWEwRS9NaAHqRdy/ax0aPu9khswTYBxeJ/mBTLvGj4wBq5wnS7+
JPvq0M5ScUMl4K5o884wsAzOdxRk8QZOMx3duMCbqXw0xFmYZj/EzcIeHdnXwuDB
lcANd6LcESMNUb8iDBaFRjLnZ/gNiu20/P/LPWyTirfoZXzZ+h6WPnSeli36xtzO
KKWtvS1YggCjsDvh9/PLYAvUGBcS/kUhIynN10YKnoKB+wSDxxyvBS1GU6c8czgc
WDf3V4P3Z8oPKDA/24Qd9Uiho1Gq9FED4eBQPb9PuvkfboKE/g7lUp708XXDFVld
hkJMsYROSRvk54RHITrD9Z+XFQ2TfC8wHLH0IwlyynQnc1sKvXaR6U1hZTAVtE4f
yley/xCQ7OUV+hrx1sQLURcN6A+SPummOY5jdHiD29QcJnOZnkSy5j2KOlnHSa5i
6v/6EFCgxwr69N6Q6X34VDv6+DZqLO2dNncQCInYFfupRhQ7t1E=
=SUon
-----END PGP SIGNATURE-----
Merge tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu into staging
Second RISC-V PR for 8.2
* Add support for the max CPU
* Detect user choice in TCG
* Clear CSR values at reset and sync MPSTATE with host
* Fix the typo of inverted order of pmpaddr13 and pmpaddr14
* Split TCG/KVM accelerators from cpu.c
* Add extension properties for all cpus
* Replace GDB exit calls with proper shutdown
* Support KVM_GET_REG_LIST
* Remove RVG warning
* Use env_archcpu for better performance
* Deprecate capital 'Z' CPU properties
* Fix vfwmaccbf16.vf
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmUncYAACgkQr3yVEwxT
# gBPQ3g/9Fi4uYRK7dymHHAQbOO9NPlmVPPSxmQ8fNUhoZUkbHfm56JEl42Xr02rA
# Lg2ORRQxJhAinANV8CotnbyLRHNCAvouCMCQEjHo1YEHzdXc0tQzp+rIOHT7v9rH
# 6OQpI6RuCjO+0LQPMgzJx8yokMw/9b0uma3+RkNKod1XsSySo6JvDkMZGGZZWuVX
# Que3TMHzc4513PWEwRS9NaAHqRdy/ax0aPu9khswTYBxeJ/mBTLvGj4wBq5wnS7+
# JPvq0M5ScUMl4K5o884wsAzOdxRk8QZOMx3duMCbqXw0xFmYZj/EzcIeHdnXwuDB
# lcANd6LcESMNUb8iDBaFRjLnZ/gNiu20/P/LPWyTirfoZXzZ+h6WPnSeli36xtzO
# KKWtvS1YggCjsDvh9/PLYAvUGBcS/kUhIynN10YKnoKB+wSDxxyvBS1GU6c8czgc
# WDf3V4P3Z8oPKDA/24Qd9Uiho1Gq9FED4eBQPb9PuvkfboKE/g7lUp708XXDFVld
# hkJMsYROSRvk54RHITrD9Z+XFQ2TfC8wHLH0IwlyynQnc1sKvXaR6U1hZTAVtE4f
# yley/xCQ7OUV+hrx1sQLURcN6A+SPummOY5jdHiD29QcJnOZnkSy5j2KOlnHSa5i
# 6v/6EFCgxwr69N6Q6X34VDv6+DZqLO2dNncQCInYFfupRhQ7t1E=
# =SUon
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 Oct 2023 00:09:36 EDT
# gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013
* tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu: (54 commits)
target/riscv: Fix vfwmaccbf16.vf
target/riscv: deprecate capital 'Z' CPU properties
target/riscv: Use env_archcpu for better performance
target/riscv/tcg: remove RVG warning
target/riscv/kvm: support KVM_GET_REG_LIST
target/riscv/kvm: improve 'init_multiext_cfg' error msg
gdbstub: replace exit calls with proper shutdown for softmmu
hw/char: riscv_htif: replace exit calls with proper shutdown
hw/misc/sifive_test.c: replace exit calls with proper shutdown
softmmu: pass the main loop status to gdb "Wxx" packet
softmmu: add means to pass an exit code when requesting a shutdown
target/riscv/tcg-cpu.c: add extension properties for all cpus
target/riscv: add riscv_cpu_get_name()
target/riscv/cpu: move priv spec functions to tcg-cpu.c
target/riscv/cpu.c: export isa_edata_arr[]
target/riscv/tcg: move riscv_cpu_add_misa_properties() to tcg-cpu.c
target/riscv/cpu.c: make misa_ext_cfgs[] 'const'
target/riscv/tcg: introduce tcg_cpu_instance_init()
target/riscv/cpu.c: export set_misa()
target/riscv/kvm: do not use riscv_cpu_add_misa_properties()
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Let's allow for marking memory regions unmergeable, to teach
flatview code and vhost to not merge adjacent aliases to the same memory
region into a larger memory section; instead, we want separate aliases to
stay separate such that we can atomically map/unmap aliases without
affecting other aliases.
This is desired for virtio-mem mapping device memory located on a RAM
memory region via multiple aliases into a memory region container,
resulting in separate memslots that can get (un)mapped atomically.
As an example with virtio-mem, the layout would look something like this:
[...]
0000000240000000-00000020bfffffff (prio 0, i/o): device-memory
0000000240000000-000000043fffffff (prio 0, i/o): virtio-mem
0000000240000000-000000027fffffff (prio 0, ram): alias memslot-0 @mem2 0000000000000000-000000003fffffff
0000000280000000-00000002bfffffff (prio 0, ram): alias memslot-1 @mem2 0000000040000000-000000007fffffff
00000002c0000000-00000002ffffffff (prio 0, ram): alias memslot-2 @mem2 0000000080000000-00000000bfffffff
[...]
Without unmergable memory regions, all three memslots would get merged into
a single memory section. For example, when mapping another alias (e.g.,
virtio-mem-memslot-3) or when unmapping any of the mapped aliases,
memory listeners will first get notified about the removal of the big
memory section to then get notified about re-adding of the new
(differently merged) memory section(s).
In an ideal world, memory listeners would be able to deal with that
atomically, like KVM nowadays does. However, (a) supporting this for other
memory listeners (vhost-user, vfio) is fairly hard: temporary removal
can result in all kinds of issues on concurrent access to guest memory;
and (b) this handling is undesired, because temporarily removing+readding
can consume quite some time on bigger memslots and is not efficient
(e.g., vfio unpinning and repinning pages ...).
Let's allow for marking a memory region unmergeable, such that we
can atomically (un)map aliases to the same memory region, similar to
(un)mapping individual DIMMs.
Similarly, teach vhost code to not redo what flatview core stopped doing:
don't merge such sections. Merging in vhost code is really only relevant
for handling random holes in boot memory where; without this merging,
the vhost-user backend wouldn't be able to mmap() some boot memory
backed on hugetlb.
We'll use this for virtio-mem next.
Message-ID: <20230926185738.277351-18-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Having large virtio-mem devices that only expose little memory to a VM
is currently a problem: we map the whole sparse memory region into the
guest using a single memslot, resulting in one gigantic memslot in KVM.
KVM allocates metadata for the whole memslot, which can result in quite
some memory waste.
Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
allocate metadata for that 1 TiB memslot: on x86, this implies allocating
a significant amount of memory for metadata:
(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
-> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %)
With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
-> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
-> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
-> For 1 TiB: 536870912 = 64 MiB (0.006 %)
So we primarily care about (1) and (2). The bad thing is, that the
memory consumption *doubles* once SMM is enabled, because we create the
memslot once for !SMM and once for SMM.
Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB
Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB
... and that's really something we want to optimize, to be able to just
start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
that can grow very large (e.g., 1 TiB).
Consequently, using multiple memslots and only mapping the memslots we
really need can significantly reduce memory waste and speed up
memslot-related operations. Let's expose the sparse RAM memory region using
multiple memslots, mapping only the memslots we currently need into our
device memory region container.
The feature can be enabled using "dynamic-memslots=on" and requires
"unplugged-inaccessible=on", which is nowadays the default.
Once enabled, we'll auto-detect the number of memslots to use based on the
memslot limit provided by the core. We'll use at most 1 memslot per
gigabyte. Note that our global limit of memslots accross all memory devices
is currently set to 256: even with multiple large virtio-mem devices,
we'd still have a sane limit on the number of memslots used.
The default is to not dynamically map memslot for now
("dynamic-memslots=off"). The optimization must be enabled manually,
because some vhost setups (e.g., hotplug of vhost-user devices) might be
problematic until we support more memslots especially in vhost-user backends.
Note that "dynamic-memslots=on" is just a hint that multiple memslots
*may* be used for internal optimizations, not that multiple memslots
*must* be used. The actual number of memslots that are used is an
internal detail: for example, once memslot metadata is no longer an
issue, we could simply stop optimizing for that. Migration source and
destination can differ on the setting of "dynamic-memslots".
Message-ID: <20230926185738.277351-17-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
We really only care about the RAM memory region not being mapped into
an address space yet as long as we're still setting up the
RamDiscardManager. Once mapped into an address space, memory notifiers
would get notified about such a region and any attempts to modify the
RamDiscardManager would be wrong.
While "mapped into an address space" is easy to check for RAM regions that
are mapped directly (following the ->container links), it's harder to
check when such regions are mapped indirectly via aliases. For now, we can
only detect that a region is mapped through an alias (->mapped_via_alias),
but we don't have a handle on these aliases to follow all their ->container
links to test if they are eventually mapped into an address space.
So relax the assertion in memory_region_set_ram_discard_manager(),
remove the check in memory_region_get_ram_discard_manager() and clarify
the doc.
Message-ID: <20230926185738.277351-14-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>