Commit Graph

20 Commits

Author SHA1 Message Date
Xiaoyao Li
b2e9426c04 physmem: Introduce ram_block_discard_guest_memfd_range()
When memory page is converted from private to shared, the original
private memory is back'ed by guest_memfd. Introduce
ram_block_discard_guest_memfd_range() for discarding memory in
guest_memfd.

Based on a patch by Isaku Yamahata <isaku.yamahata@intel.com>.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-12-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Paolo Bonzini
852f0048f3 RAMBlock: make guest_memfd require uncoordinated discard
Some subsystems like VFIO might disable ram block discard, but guest_memfd
uses discard operations to implement conversions between private and
shared memory.  Because of this, sequences like the following can result
in stale IOMMU mappings:

1. allocate shared page
2. convert page shared->private
3. discard shared page
4. convert page private->shared
5. allocate shared page
6. issue DMA operations against that shared page

This is not a use-after-free, because after step 3 VFIO is still pinning
the page.  However, DMA operations in step 6 will hit the old mapping
that was allocated in step 1.

Address this by taking ram_block_discard_is_enabled() into account when
deciding whether or not to discard pages.

Since kvm_convert_memory()/guest_memfd doesn't implement a
RamDiscardManager handler to convey and replay discard operations,
this is a case of uncoordinated discard, which is blocked/released
by ram_block_discard_require().  Interestingly, this function had
no use so far.

Alternative approaches would be to block discard of shared pages, but
this would cause guests to consume twice the memory if they use VFIO;
or to implement a RamDiscardManager and only block uncoordinated
discard, i.e. use ram_block_coordinated_discard_require().

[Commit message mostly by Michael Roth <michael.roth@amd.com>]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Xiaoyao Li
15f7a80c49 RAMBlock: Add support of KVM private guest memfd
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.

Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.

Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type because
in the future we may have the case that not all the memory of a VM need
guest memfd. As a benefit, it also avoid getting MachineState in memory
subsystem.

Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Nicholas Piggin
86a9ae80cc physmem: Factor cpu_physical_memory_dirty_bits_cleared() out
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240219061731.232570-1-npiggin@gmail.com>
[PMD: Split patch in 2: part 1/2]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240312201458.79532-3-philmd@linaro.org
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-12 17:39:40 -04:00
Philippe Mathieu-Daudé
7e8ccf99ed physmem: Expose tlb_reset_dirty_range_all()
In order to call tlb_reset_dirty_range_all() outside of
system/physmem.c, expose its prototype.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240312201458.79532-2-philmd@linaro.org
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-12 17:39:40 -04:00
Jonathan Cameron
47293c922c physmem: Fix wrong address in large address_space_read/write_cached_slow()
If the access is bigger than the MemoryRegion supports,
flatview_read/write_continue() will attempt to update the Memory Region.
but the address passed to flatview_translate() is relative to the cache, not
to the FlatView.

On arm/virt with interleaved CXL memory emulation and virtio-blk-pci this
lead to the first part of descriptor being read from the CXL memory and the
second part from PA 0x8 which happens to be a blank region
of a flash chip and all ffs on this particular configuration.
Note this test requires the out of tree ARM support for CXL, but
the problem is more general.

Avoid this by adding new address_space_read_continue_cached()
and address_space_write_continue_cached() which share all the logic
with the flatview versions except for the MemoryRegion lookup which
is unnecessary as the MemoryRegionCache only covers one MemoryRegion.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240307153710.30907-5-Jonathan.Cameron@huawei.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:41 -04:00
Jonathan Cameron
e7927d33cf physmem: Factor out body of flatview_read/write_continue() loop
This code will be reused for the address_space_cached accessors
shortly.

Also reduce scope of result variable now we aren't directly
calling this in the loop.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240307153710.30907-4-Jonathan.Cameron@huawei.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Jonathan Cameron
bcfd8ba4f5 physmem: Reduce local variable scope in flatview_read/write_continue()
Precursor to factoring out the inner loops for reuse.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240307153710.30907-3-Jonathan.Cameron@huawei.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Jonathan Cameron
4c7c856319 physmem: Rename addr1 to more informative mr_addr in flatview_read/write() and similar
The calls to flatview_read/write[_continue]() have parameters addr and
addr1 but the names give no indication of what they are addresses of.
Rename addr1 to mr_addr to reflect that it is the translated address
offset within the MemoryRegion returned by flatview_translate().
Similarly rename the parameter in address_space_read/write_cached_slow()

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240307153710.30907-2-Jonathan.Cameron@huawei.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Philippe Mathieu-Daudé
5d5bb9c8fd system/physmem: Do not include 'hw/xen/xen.h' but 'sysemu/xen.h'
physmem.c doesn't use any declaration from "hw/xen/xen.h",
it only requires "sysemu/xen.h" and "system/xen-mapcache.h".

Suggested-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20231114143816.71079-5-philmd@linaro.org>
2024-03-09 18:51:45 +01:00
Richard Henderson
9260bd4013 softmmu/physmem: Remove HOST_PAGE_ALIGN
Align allocation sizes to the maximum of host and target page sizes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-15-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Richard Henderson
80c3aeef7f softmmu/physmem: Remove qemu_host_page_size
Use qemu_real_host_page_size() instead.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-14-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Manos Pitsidianakis
aab4631a4a system/physmem: remove redundant arg reassignment
Arguments `ram_block` are reassigned to local declarations `block`
without further use. Remove re-assignment to reduce noise.

Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-02-20 20:34:21 +03:00
Xiaoyao Li
ea18be78a6 physmem: replace function name with __func__ in ram_block_discard_range()
Use __func__ to avoid hard-coded function name.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240125023328.2520888-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-16 13:56:08 +01:00
Ilya Leoshkevich
f3d57a8200 target: Make qemu_target_page_mask() available for *-user
Currently qemu_target_page_mask() is usable only from the softmmu
code. Make it possible to use it from the *-user code as well.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20231208003754.3688038-2-iii@linux.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124075609.14756-2-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[rth: Split out change to accel/tcg/perf.c]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Stefan Hajnoczi
a4a411fbaf Replace "iothread lock" with "BQL" in comments
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Vladimir Sementsov-Ogievskiy
adff55b520 coverity: physmem: use simple assertions instead of modelling
Unfortunately Coverity doesn't follow the logic aroung "len" and "l"
variables in stacks finishing with flatview_{read,write}_continue() and
generate a lot of OVERRUN false-positives. When small buffer (2 or 4
bytes) is passed to mem read/write path, Coverity assumes the worst
case of sz=8 in stn_he_p()/ldn_he_p() (defined in
include/qemu/bswap.h), and reports buffer overrun.

To silence these false-positives we have model functions, which hide
real logic from Coverity.

However, it turned out that these new two assertions are enough to
quiet Coverity.

Assertions are better than hiding the logic, so let's drop the
modelling and move to assertions for memory r/w call stacks.

After patch, the sequence

 cov-make-library --output-file /tmp/master.xmldb \
    scripts/coverity-scan/model.c
 cov-build --dir ~/covtmp/master make -j9
 cov-analyze --user-model-file /tmp/master.xmldb \
    --dir ~/covtmp/master --all --strip-path "$(pwd)
 cov-format-errors --dir ~/covtmp/master \
    --html-output ~/covtmp/master_html_report

Generate for me the same big set of CIDs excepept for 6 disappeared (so
it becomes even better).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: David Hildenbrand <david@redhat.com>
Message-ID: <20231005140326.332830-1-vsementsov@yandex-team.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-24 12:07:47 +01:00
David Hildenbrand
022f033bd7 softmmu/physmem: Fixup qemu_ram_block_from_host() documentation
Let's fixup the documentation (e.g., removing traces of the ram_addr
parameter that no longer exists) and move it to the header file while at
it.

Message-ID: <20230926185738.277351-4-david@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:21 +02:00
Philippe Mathieu-Daudé
8d7f2e767d system: Rename softmmu/ directory as system/
The softmmu/ directory contains files specific to system
emulation. Rename it as system/. Update meson rules, the
MAINTAINERS file and all the documentation and comments.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004090629.37473-14-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-08 21:08:08 +02:00