Commit Graph

1240 Commits

Author SHA1 Message Date
Philippe Mathieu-Daudé
aacfd8bbaf exec: Move CPUTLBEntry helpers to cputlb.c
The following CPUTLBEntry helpers are only used in accel/tcg/cputlb.c:
  - tlb_index()
  - tlb_entry()
  - tlb_read_idx()
  - tlb_addr_write()

Move them to this file, allowing to remove the huge "cpu.h" header
inclusion from "exec/cpu_ldst.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240418192525.97451-13-philmd@linaro.org>
2024-04-26 17:03:05 +02:00
Philippe Mathieu-Daudé
e620363687 accel/hvf: Use accel-specific per-vcpu @dirty field
HVF has a specific use of the CPUState::vcpu_dirty field
(CPUState::vcpu_dirty is not used by common code).
To make this field accel-specific, add and use a new
@dirty variable in the AccelCPUState structure.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240424174506.326-4-philmd@linaro.org>
2024-04-26 17:03:00 +02:00
Philippe Mathieu-Daudé
e4751d340a accel/tcg: Rename load-extract/store-insert headers using .h.inc suffix
Since commit 139c1837db ("meson: rename included C source files
to .c.inc"), QEMU standard procedure for included C files is to
use *.c.inc.

Besides, since commit 6a0057aa22 ("docs/devel: make a statement
about includes") this is documented in the Coding Style:

  If you do use template header files they should be named with
  the ``.c.inc`` or ``.h.inc`` suffix to make it clear they are
  being included for expansion.

Therefore rename 'store-insert-al16.h' as 'store-insert-al16.h.inc'
and 'load-extract-al16-al8.h' as 'load-extract-al16-al8.h.inc'.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240424173333.96148-3-philmd@linaro.org>
2024-04-26 15:31:37 +02:00
Philippe Mathieu-Daudé
893b4bde88 accel/tcg: Include missing headers in 'tb-jmp-cache.h'
Due to missing headers, when including "tb-jmp-cache.h" we might get:

  accel/tcg/tb-jmp-cache.h:21:21: error: field ‘rcu’ has incomplete type
     21 |     struct rcu_head rcu;
        |                     ^~~
  accel/tcg/tb-jmp-cache.h:24:9: error: unknown type name ‘vaddr’
     24 |         vaddr pc;
        |         ^~~~~

Add the missing "qemu/rcu.h" and "exec/cpu-common.h" headers.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240111162442.43755-1-philmd@linaro.org>
2024-04-26 15:31:37 +02:00
Philippe Mathieu-Daudé
6bba316e23 accel/tcg: Include missing 'hw/core/cpu.h' header
tcg_cpu_init_cflags() accesses CPUState fields, so requires
"hw/core/cpu.h" to get its structure definition.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20231212123401.37493-12-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-26 15:31:37 +02:00
Philippe Mathieu-Daudé
3b28c27067 accel/tcg: Un-inline retaddr helpers to 'user-retaddr.h'
set_helper_retaddr() is only used in accel/tcg/user-exec.c.

clear_helper_retaddr() is only used in accel/tcg/cpu-exec.c
and accel/tcg/user-exec.c.

No need to expose their definitions to all user-emulation
files including "exec/cpu_ldst.h", move them to a new
"user-retaddr.h" header (restricted to accel/tcg/).

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231211212003.21686-19-philmd@linaro.org>
2024-04-26 15:31:37 +02:00
Philippe Mathieu-Daudé
4597463b38 accel: Include missing 'exec/cpu_ldst.h' header
Theses files call cpu_ldl_code() which is declared
in "exec/cpu_ldst.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231211212003.21686-5-philmd@linaro.org>
2024-04-26 15:31:37 +02:00
Philippe Mathieu-Daudé
51579d40f9 exec: Reduce tlb_set_dirty() declaration scope
tlb_set_dirty() is only used in accel/tcg/cputlb.c,
where it is defined. Declare it statically, removing
the stub.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240418192525.97451-11-philmd@linaro.org>
2024-04-26 15:28:11 +02:00
Zhao Liu
94da7b6e9a accel/tcg/icount-common: Consolidate the use of warn_report_once()
Use warn_report_once() to get rid of the static local variable "notified".

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240418100716.1085491-1-zhao1.liu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Isaku Yamahata
565f4768bb kvm/tdx: Ignore memory conversion to shared of unassigned region
TDX requires vMMIO region to be shared.  For KVM, MMIO region is the region
which kvm memslot isn't assigned to (except in-kernel emulation).
qemu has the memory region for vMMIO at each device level.

While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO
region, qemu doesn't find corresponding vMMIO region because it's before
PCI device allocation and memory_region_find() finds the device region, not
PCI bus region.  It's safe to ignore MapGPA(to-shared) because when guest
accesses those region they use GPA with shared bit set for vMMIO.  Ignore
memory conversion request of non-assigned region to shared and return
success.  Otherwise OVMF is confused and panics there.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-35-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Isaku Yamahata
c5d9425ef4 kvm/tdx: Don't complain when converting vMMIO region to shared
Because vMMIO region needs to be shared region, guest TD may explicitly
convert such region from private to shared.  Don't complain such
conversion.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-34-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Chao Peng
c15e568407 kvm: handle KVM_EXIT_MEMORY_FAULT
Upon an KVM_EXIT_MEMORY_FAULT exit, userspace needs to do the memory
conversion on the RAMBlock to turn the memory into desired attribute,
switching between private and shared.

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
guest_memfd memory backend.

Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.

When page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it's not back'ed by hugetlb because
hugetlb is supposed to be pre-allocated and no need for discarding.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Message-ID: <20240320083945.991426-13-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Xiaoyao Li
bd3bcf6962 kvm/memory: Make memory type private by default if it has guest memfd backend
KVM side leaves the memory to shared by default, which may incur the
overhead of paging conversion on the first visit of each page. Because
the expectation is that page is likely to private for the VMs that
require private memory (has guest memfd).

Explicitly set the memory to private when memory region has valid
guest memfd backend.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-16-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Chao Peng
ce5a983233 kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
backend'ed both by hva-based shared memory and guest memfd based private
memory.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-10-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
15f7a80c49 RAMBlock: Add support of KVM private guest memfd
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.

Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.

Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type because
in the future we may have the case that not all the memory of a VM need
guest memfd. As a benefit, it also avoid getting MachineState in memory
subsystem.

Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
0811baed49 kvm: Introduce support for memory_attributes
Introduce the helper functions to set the attributes of a range of
memory to private or shared.

This is necessary to notify KVM the private/shared attribute of each gpa
range. KVM needs the information to decide the GPA needs to be mapped at
hva-based shared memory or guest_memfd based private memory.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-11-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
72853afc63 trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()
The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
a99c0c66eb KVM: remove kvm_arch_cpu_check_are_resettable
Board reset requires writing a fresh CPU state.  As far as KVM is
concerned, the only thing that blocks reset is that CPU state is
encrypted; therefore, kvm_cpus_are_resettable() can simply check
if that is the case.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
5c3131c392 KVM: track whether guest state is encrypted
So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
guest state is encrypted, in which case they do nothing.  For the new
API using VM types, instead, the ioctls will fail which is a safer and
more robust approach.

The new API will be the only one available for SEV-SNP and TDX, but it
is also usable for SEV and SEV-ES.  In preparation for that, require
architecture-specific KVM code to communicate the point at which guest
state is protected (which must be after kvm_cpu_synchronize_post_init(),
though that might change in the future in order to suppor migration).
From that point, skip reading registers so that cpu->vcpu_dirty is
never true: if it ever becomes true, kvm_arch_put_registers() will
fail miserably.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
1e1e48792a kvm: use configs/ definition to conditionalize debug support
If an architecture adds support for KVM_CAP_SET_GUEST_DEBUG but QEMU does not
have the necessary code, QEMU will fail to build after updating kernel headers.
Avoid this by using a #define in config-target.h instead of KVM_CAP_SET_GUEST_DEBUG.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-18 11:17:27 +02:00
Richard Henderson
dcd092a063 accel/tcg: Improve can_do_io management
We already attempted to set and clear can_do_io before the first
and last insns, but only used the initial value of max_insns and
the call to translator_io_start to find those insns.

Now that we track insn_start in DisasContextBase, and now that
we have emit_before_op, we can wait until we have finished
translation to identify the true first and last insns and emit
the sets of can_do_io at that time.

This fixes the case of a translation block which crossed a page
boundary, and for which the second page turned out to be mmio.
In this case we truncate the block, and the previous logic for
can_do_io could leave a block with a single insn with can_do_io
set to false, which would fail an assertion in cpu_io_recompile.

Reported-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09 07:45:10 -10:00
Richard Henderson
e7face702a accel/tcg: Add insn_start to DisasContextBase
This is currently target-specific for many; begin making it
target independent.

Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09 07:45:05 -10:00
Igor Mammedov
e34f4d87e8 kvm: error out of kvm_irqchip_add_msi_route() in case of full route table
subj is calling kvm_add_routing_entry() which simply extends
  KVMState::irq_routes::entries[]
but doesn't check if number of routes goes beyond limit the kernel
is willing to accept. Which later leads toi the assert

  qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

typically it happens during guest boot for large enough guest

Reproduced with:
  ./qemu --enable-kvm -m 8G -smp 64 -machine pc \
     `for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b ";
        for i in {0..31}; do touch /tmp/vblk$b$i;
           echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw
                    -device virtio-blk-pci,drive=drive$b$i,bus=pci$b ";
      done; done`

While crash at boot time is bad, the same might happen at hotplug time
which is unacceptable.
So instead calling kvm_add_routing_entry() unconditionally, check first
that number of routes won't exceed KVM_CAP_IRQ_ROUTING. This way virtio
device insteads killin qemu, will gracefully fail to initialize device
as expected with following warnings on console:
    virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set.
    virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-ID: <20240408110956.451558-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-08 21:22:00 +02:00
Philippe Mathieu-Daudé
93019696aa accel/tcg/plugin: Remove CONFIG_SOFTMMU_GATE definition
The CONFIG_SOFTMMU_GATE definition was never used, remove it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240313213339.82071-2-philmd@linaro.org>
2024-04-02 14:54:35 +02:00
Richard Henderson
dafa0ecc97 accel/tcg: Use CPUState.get_pc in cpu_io_recompile
Using log_pc produces the pc at the beginning of TB,
not the actual pc installed by cpu_restore_state_from_tb,
which could be any of the guest instructions within TB.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-03-29 12:16:00 -10:00
Philippe Mathieu-Daudé
94956d7b51 bulk: Call in place single use cpu_env()
Avoid CPUArchState local variable when cpu_env() is used once.

Mechanical patch using the following Coccinelle spatch script:

 @@
 type CPUArchState;
 identifier env;
 expression cs;
 @@
  {
 -    CPUArchState *env = cpu_env(cs);
      ... when != env
 -     env
 +     cpu_env(cs)
      ... when != env
  }

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240129164514.73104-5-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-12 11:46:16 +01:00
Philippe Mathieu-Daudé
f28b958cbf hw/xen: Extract 'xen_igd.h' from 'xen_pt.h'
"hw/xen/xen_pt.h" requires "hw/xen/xen_native.h" which is target
specific. It also declares IGD methods, which are not target
specific.

Target-agnostic code can use IGD methods. To allow that, extract
these methos into a new "hw/xen/xen_igd.h" header.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20231114143816.71079-18-philmd@linaro.org>
2024-03-09 18:51:45 +01:00
Pierrick Bouvier
3077be2545 plugins: cleanup codepath for previous inline operation
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20240304130036.124418-13-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240305121005.3528075-26-alex.bennee@linaro.org>
2024-03-06 12:35:50 +00:00
Pierrick Bouvier
0bcebaba45 plugins: add inline operation per vcpu
Extends API with three new functions:
qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline_per_vcpu().

Those functions takes a qemu_plugin_u64 as input.

This allows to have a thread-safe and type-safe version of inline
operations.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20240304130036.124418-5-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240305121005.3528075-18-alex.bennee@linaro.org>
2024-03-06 12:35:29 +00:00
Pierrick Bouvier
62f92b8d97 plugins: implement inline operation relative to cpu_index
Instead of working on a fixed memory location, allow to address it based
on cpu_index, an element size and a given offset.
Result address: ptr + offset + cpu_index * element_size.

With this, we can target a member in a struct array from a base pointer.

Current semantic is not modified, thus inline operation still targets
always the same memory location.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20240304130036.124418-4-pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240305121005.3528075-17-alex.bennee@linaro.org>
2024-03-06 12:35:26 +00:00
Richard Henderson
49fa457ca5 accel/tcg: Add TLB_CHECK_ALIGNED
This creates a per-page method for checking of alignment.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301204110.656742-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-05 13:22:56 +00:00
Richard Henderson
a0ff4a879c accel/tcg: Add tlb_fill_flags to CPUTLBEntryFull
Allow the target to set tlb flags to apply to all of the
comparators.  Remove MemTxAttrs.byte_swap, as the bit is
not relevant to memory transactions, only the page mapping.
Adjust target/sparc to set TLB_BSWAP directly.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301204110.656742-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-05 13:22:56 +00:00
Richard Henderson
33402cea1f accel/tcg: Disconnect TargetPageDataNode from page size
Dynamically size the node for the runtime target page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-29-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson
8c45039f9e cpu: Remove page_size_init
Move qemu_host_page_{size,mask} and HOST_PAGE_ALIGN into bsd-user.
It should be removed from bsd-user as well, but defer that cleanup.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-28-richard.henderson@linaro.org>
2024-02-29 11:35:37 -10:00
Richard Henderson
a372d483f1 accel/tcg: Remove qemu_host_page_size from page_protect/page_unprotect
Use qemu_real_host_page_size instead.  Except for the final mprotect
within page_protect, we already handled host < target page size.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-2-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Jonathan Cameron
6aba908d2b tcg: Avoid double lock if page tables happen to be in mmio memory.
On i386, after fixing the page walking code to work with pages in
MMIO memory (specifically CXL emulated interleaved memory),
a crash was seen in an interrupt handling path.

Useful part of backtrace

7  0x0000555555ab1929 in bql_lock_impl (file=0x555556049122 "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
8  bql_lock_impl (file=file@entry=0x555556049122 "../../accel/tcg/cputlb.c", line=line@entry=2033) at ../../system/cpus.c:520
9  0x0000555555c9f7d6 in do_ld_mmio_beN (cpu=0x5555578e0cb0, full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2033
10 0x0000555555ca0fbd in do_ld_8 (cpu=cpu@entry=0x5555578e0cb0, p=p@entry=0x7ffff4efd1d0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
11 0x0000555555ca341f in do_ld8_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
12 0x0000555555ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:169
13 cpu_ldq_le_mmuidx_ra (env=0x5555578e3470, addr=19595792376, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:301
14 0x0000555555b4b5fc in ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:98
15 ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:93
16 mmu_translate (env=env@entry=0x5555578e3470, in=0x7ffff4efd3e0, out=0x7ffff4efd3b0, err=err@entry=0x7ffff4efd3c0, ra=ra@entry=0) at ../../target/i386/tcg/sysemu/excp_helper.c:174
17 0x0000555555b4c4b3 in get_physical_address (ra=0, err=0x7ffff4efd3c0, out=0x7ffff4efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, addr=18446741874686299840, env=0x5555578e3470) at ../../target/i386/tcg/sysemu/excp_helper.c:580
18 x86_cpu_tlb_fill (cs=0x5555578e0cb0, addr=18446741874686299840, size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:606
19 0x0000555555ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, access_type=MMU_DATA_LOAD, size=<optimized out>, addr=18446741874686299840, cpu=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1315
20 mmu_lookup1 (cpu=cpu@entry=0x5555578e0cb0, data=data@entry=0x7ffff4efd540, mmu_idx=0, access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:1713
21 0x0000555555ca2c61 in mmu_lookup (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1803
22 0x0000555555ca3165 in do_ld4_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2416
23 0x0000555555ca5ef9 in cpu_ldl_mmu (ra=0, oi=32, addr=18446741874686299840, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:158
24 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555578e3470, addr=addr@entry=18446741874686299840, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:294
25 0x0000555555bb6cdd in do_interrupt64 (is_hw=1, next_eip=18446744072399775809, error_code=0, is_int=0, intno=236, env=0x5555578e3470) at ../../target/i386/tcg/seg_helper.c:889
26 do_interrupt_all (cpu=cpu@entry=0x5555578e0cb0, intno=236, is_int=is_int@entry=0, error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1130
27 0x0000555555bb87da in do_interrupt_x86_hardirq (env=env@entry=0x5555578e3470, intno=<optimized out>, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1162
28 0x0000555555b5039c in x86_cpu_exec_interrupt (cs=0x5555578e0cb0, interrupt_request=<optimized out>) at ../../target/i386/tcg/sysemu/seg_helper.c:197
29 0x0000555555c94480 in cpu_handle_interrupt (last_tb=<synthetic pointer>, cpu=0x5555578e0cb0) at ../../accel/tcg/cpu-exec.c:844

Peter identified this as being due to the BQL already being
held when the page table walker encounters MMIO memory and attempts
to take the lock again.  There are other examples of similar paths
TCG, so this follows the approach taken in those of simply checking
if the lock is already held and if it is, don't take it again.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240219173153.12114-4-Jonathan.Cameron@huawei.com>
[rth: Use BQL_LOCK_GUARD]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Peter Maydell
62bcba836c accel/tcg: Set can_do_io at at start of lookup_tb_ptr helper
If a page table is in IO memory and lookup_tb_ptr probes
the TLB it can result in a page table walk for the instruction
fetch.  If this hits IO memory and io_prepare falsely assumes
it needs to do a TLB recompile.

Avoid that by setting can_do_io at the start of lookup_tb_ptr.

Link: https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=Epz3_+CheAT8Crsk9mOu894wbNW_FywamkZiw@mail.gmail.com/#t

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240219173153.12114-2-Jonathan.Cameron@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Alex Bennée
c006147122 plugins: create CPUPluginState and migrate plugin_mask
As we expand the per-vCPU data for plugins we don't want to pollute
CPUState. For now this just moves the plugin_mask (renamed to
event_mask) as the memory callbacks are accessed directly by TCG
generated code.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240227144335.1196131-23-alex.bennee@linaro.org>
2024-02-28 09:11:42 +00:00
Akihiko Odaki
33a277fec0 plugins: Use different helpers when reading registers
This avoids optimizations incompatible when reading registers.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-Id: <20231213-gdb-v17-12-777047380591@daynix.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240227144335.1196131-21-alex.bennee@linaro.org>
2024-02-28 09:11:42 +00:00
Manos Pitsidianakis
431eddb69a accel/tcg: correct typos
Correct typos automatically found with the `typos` tool
<https://crates.io/crates/typos>

Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-02-20 22:21:18 +03:00
Paolo Bonzini
d8c7f1334f i386: xen: fix compilation --without-default-devices
The xenpv machine type requires XEN_BUS, so select it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-16 13:56:09 +01:00
William Roche
06152b89db migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:41:58 +08:00
Richard Henderson
3b91614004 include/exec: Change cpu_mmu_index argument to CPUState
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 16:46:10 +10:00
Philippe Mathieu-Daudé
ec1d32af12 target/i386: Extract x86_cpu_exec_halt() from accel/tcg/
Move this x86-specific code out of the generic accel/tcg/.

Reported-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Philippe Mathieu-Daudé
aa6fb65746 accel/tcg: Introduce TCGCPUOps::cpu_exec_halt() handler
In order to make accel/tcg/ target agnostic,
introduce the cpu_exec_halt() handler.

Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-9-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Richard Henderson
b7e9a4a9b0 accel/tcg: Inline need_replay_interrupt
The function is now trivial, and with inlining we can
re-use the calling function's tcg_ops variable.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Philippe Mathieu-Daudé
6ae754815f target/i386: Extract x86_need_replay_interrupt() from accel/tcg/
Move this x86-specific code out of the generic accel/tcg/.

Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-8-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Philippe Mathieu-Daudé
0fdc69b76e accel/tcg: Introduce TCGCPUOps::need_replay_interrupt() handler
In order to make accel/tcg/ target agnostic,
introduce the need_replay_interrupt() handler.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <20240124101639.30056-7-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Richard Henderson
991bd65ddd accel/tcg: Use CPUState.cc instead of CPU_GET_CLASS in cpu-exec.c
CPU_GET_CLASS does runtime type checking; use the cached
copy of the class instead.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Philippe Mathieu-Daudé
93c6091bfa accel/tcg: Un-inline icount_exit_request() for clarity
Convert packed logic to dumb icount_exit_request() helper.
No functional change intended.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20240124101639.30056-5-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00