replay API is used deeply within TCG common code (common to user
and system emulation). Unfortunately "sysemu/replay.h" requires
some QAPI headers for few system-specific declarations, example:
void replay_input_event(QemuConsole *src, InputEvent *evt);
Since commit c2651c0eaa ("qapi/meson: Restrict UI module to system
emulation and tools") the QAPI header defining the InputEvent is
not generated anymore.
To keep it simple, extract the 'core' replay prototypes to a new
"exec/replay-core.h" header which we include in the TCG code that
doesn't need the rest of the replay API.
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <20221219170806.60580-5-philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Silent when compiling with -Wextra:
../accel/kvm/kvm-all.c:2291:17: warning: missing field 'num' initializer [-Wmissing-field-initializers]
{ NULL, }
^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221220143532.24958-3-philmd@linaro.org>
Unused since introduction in commit 04b0de0ee8
("xen: factor out common functions").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20230215153451.30626-1-philmd@linaro.org>
Since commit a0e61807a3 ("qapi: Remove QMP events and commands from
user-mode builds") we don't generate the "qapi-commands-machine.h"
header in a user-emulation-only build.
Rename 'hmp.c' as 'monitor.c' and move the QMP functions from
cpu-exec.c (which is always compiled) to monitor.c (which is only
compiled when system-emulation is selected).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221219170806.60580-4-philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221219170806.60580-2-philmd@linaro.org>
Both insert/remove_breakpoint() handlers are used in system and
user emulation. We can not use the 'hwaddr' type on user emulation,
we have to use 'vaddr' which is defined as "wide enough to contain
any #target_ulong virtual address".
gdbstub.c doesn't require to include "exec/hwaddr.h" anymore.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221216215519.5522-4-philmd@linaro.org>
If an instruction straddles a page boundary, and the first page
was ram, but the second page was MMIO, we would abort. Handle
this as if both pages are MMIO, by setting the ram_addr_t for
the first page to -1.
Reported-by: Sid Manning <sidneym@quicinc.com>
Reported-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This commit was created with scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230202133830.2152150-19-armbru@redhat.com>
This will allow targets to avoid rolling their own.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These are not yet considering atomicity of the 16-byte value;
this is a direct replacement for the current target code which
uses a pair of 8-byte operations.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
After commit 4e4fa6c12d ("accel/tcg: Complete cpu initialization
before registration"), it looks the CPUJumpCache pointer can be NULL.
This causes a SIGSEV when running debug-wp-migration kvm unit test.
At the first place it should be clarified why this TCG code is called
with KVM acceleration. This may hide another bug.
Fixes: 4e4fa6c12d ("accel/tcg: Complete cpu initialization before registration")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20230203171510.2867451-1-eric.auger@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This moves these commands from MAINTAINERS section "QMP" to new
section "Stats". Status is Orphan. Volunteers welcome!
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-23-armbru@redhat.com>
There are actually a whole bunch of helpers that don't affect memory
that we shouldn't instrument. They are helpfully identified by the
TCG_CALL_NO_SIDE_EFFECTS flag which marks out lookup_tb_ptr as well as
a lot of the maths helpers. To avoid the string compare we introduce a
new flag for plugin internals so we skip that too.
Related: #1381
Signed-off-by: Emilio Cota <cota@braap.org>
Message-Id: <20230108164731.61469-4-cota@braap.org>
[AJB: updated to skip all no SE plugins, add flag for plugin helper]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230124180127.1881110-34-alex.bennee@linaro.org>
We were mistakenly checking tcg_ctx->plugin_insn as a canary to know
whether the TB had emitted helpers that might have accessed memory.
The problem is that tcg_ctx->plugin_insn gets updated on every
instruction in the TB, which results in us wrongly performing the
optimization (i.e. not clearing cpu->plugin_mem_cbs) way too often,
since it's not rare that the last instruction in the TB doesn't
use helpers.
Fix it by tracking a per-TB canary.
While at it, expand documentation.
Related: #1381
Signed-off-by: Emilio Cota <cota@braap.org>
Message-Id: <20230108164731.61469-2-cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230124180127.1881110-32-alex.bennee@linaro.org>
Fixes the appended use-after-free. The root cause is that
during tb invalidation we use CPU_FOREACH, and therefore
to safely free a vCPU we must wait for an RCU grace period
to elapse.
$ x86_64-linux-user/qemu-x86_64 tests/tcg/x86_64-linux-user/munmap-pthread
=================================================================
==1800604==ERROR: AddressSanitizer: heap-use-after-free on address 0x62d0005f7418 at pc 0x5593da6704eb bp 0x7f4961a7ac70 sp 0x7f4961a7ac60
READ of size 8 at 0x62d0005f7418 thread T2
#0 0x5593da6704ea in tb_jmp_cache_inval_tb ../accel/tcg/tb-maint.c:244
#1 0x5593da6704ea in do_tb_phys_invalidate ../accel/tcg/tb-maint.c:290
#2 0x5593da670631 in tb_phys_invalidate__locked ../accel/tcg/tb-maint.c:306
#3 0x5593da670631 in tb_invalidate_phys_page_range__locked ../accel/tcg/tb-maint.c:542
#4 0x5593da67106d in tb_invalidate_phys_range ../accel/tcg/tb-maint.c:614
#5 0x5593da6a64d4 in target_munmap ../linux-user/mmap.c:766
#6 0x5593da6dba05 in do_syscall1 ../linux-user/syscall.c:10105
#7 0x5593da6f564c in do_syscall ../linux-user/syscall.c:13329
#8 0x5593da49e80c in cpu_loop ../linux-user/x86_64/../i386/cpu_loop.c:233
#9 0x5593da6be28c in clone_func ../linux-user/syscall.c:6633
#10 0x7f496231cb42 in start_thread nptl/pthread_create.c:442
#11 0x7f49623ae9ff (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff)
0x62d0005f7418 is located 28696 bytes inside of 32768-byte region [0x62d0005f0400,0x62d0005f8400)
freed by thread T148 here:
#0 0x7f49627b6460 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x5593da5ac057 in cpu_exec_unrealizefn ../cpu.c:180
#2 0x5593da81f851 (/home/cota/src/qemu/build/qemu-x86_64+0x484851)
Signed-off-by: Emilio Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230111151628.320011-2-cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230124180127.1881110-27-alex.bennee@linaro.org>
qemu/coroutine.h and qemu/lockable.h include each other.
They need each other only in macro expansions, so we could simply drop
both inclusions to break the loop, and add suitable includes to files
that expand the macros.
Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h
so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and
qemu/lockable.h only needs qemu/coroutine-core.h. Result:
qemu/coroutine.h includes qemu/lockable.h includes
qemu/coroutine-core.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221131435.3851212-5-armbru@redhat.com>
[Semantic rebase conflict with 7c10cb38cc "accel/tcg: Add debuginfo
support" resolved]
The 'hwaddr' type is defined in "exec/hwaddr.h" as:
hwaddr is the type of a physical address
(its size can be different from 'target_ulong').
All definitions use the 'HWADDR_' prefix, except TARGET_FMT_plx:
$ fgrep define include/exec/hwaddr.h
#define HWADDR_H
#define HWADDR_BITS 64
#define HWADDR_MAX UINT64_MAX
#define TARGET_FMT_plx "%016" PRIx64
^^^^^^
#define HWADDR_PRId PRId64
#define HWADDR_PRIi PRIi64
#define HWADDR_PRIo PRIo64
#define HWADDR_PRIu PRIu64
#define HWADDR_PRIx PRIx64
#define HWADDR_PRIX PRIX64
Since hwaddr's size can be *different* from target_ulong, it is
very confusing to read one of its format using the 'TARGET_FMT_'
prefix, normally used for the target_long / target_ulong types:
$ fgrep TARGET_FMT_ include/exec/cpu-defs.h
#define TARGET_FMT_lx "%08x"
#define TARGET_FMT_ld "%d"
#define TARGET_FMT_lu "%u"
#define TARGET_FMT_lx "%016" PRIx64
#define TARGET_FMT_ld "%" PRId64
#define TARGET_FMT_lu "%" PRIu64
Apparently this format was missed during commit a8170e5e97
("Rename target_phys_addr_t to hwaddr"), so complete it by
doing a bulk-rename with:
$ sed -i -e s/TARGET_FMT_plx/HWADDR_FMT_plx/g $(git grep -l TARGET_FMT_plx)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230110212947.34557-1-philmd@linaro.org>
[thuth: Fix some warnings from checkpatch.pl along the way]
Signed-off-by: Thomas Huth <thuth@redhat.com>
We now have the option to generate direct or indirect
goto_tb depending on the dynamic displacement, thus
the define is no longer necessary or completely accurate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Replace 'tc_ptr' and 'addr' with 'tb' and 'n'.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Stop overloading jmp_target_arg for both offset and address,
depending on TCG_TARGET_HAS_direct_jump. Instead, add a new
field to hold the jump insn offset and always set the target
address in jmp_target_addr[]. This will allow a tcg backend
to use either direct or indirect depending on displacement.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This can replace four other variables that are references
into the TranslationBlock structure.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This will shortly be used for more than reset.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Recently the g_assert(cpu == current_cpu) test has been
intermittently failing with gcc. Reorg the code around
the setjmp to minimize the lifetime of the cpu variable
affected by the setjmp.
This appears to fix the existing issue with clang as well.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1147
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add ability to dump /tmp/perf-<pid>.map and jit-<pid>.dump.
The first one allows the perf tool to map samples to each individual
translation block. The second one adds the ability to resolve symbol
names, line numbers and inspect JITed code.
Example of use:
perf record qemu-x86_64 -perfmap ./a.out
perf report
or
perf record -k 1 qemu-x86_64 -jitdump ./a.out
DEBUGINFOD_URLS= perf inject -j -i perf.data -o perf.data.jitted
perf report -i perf.data.jitted
Co-developed-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Co-developed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230112152013.125680-4-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add libdw-based functions for loading and querying debuginfo. Load
debuginfo from the system and the linux-user loaders.
This is useful for the upcoming perf support, which can then put
human-readable guest symbols instead of raw guest PCs into perfmap and
jitdump files.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230112152013.125680-3-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
If we update an existing memslot (e.g., resize, split), we temporarily
remove the memslot to re-add it immediately afterwards. These updates
are not atomic, especially not for KVM VCPU threads, such that we can
get spurious faults.
Let's inhibit most KVM ioctls while performing relevant updates, such
that we can perform the update just as if it would happen atomically
without additional kernel support.
We capture the add/del changes and apply them in the notifier commit
stage instead. There, we can check for overlaps and perform the ioctl
inhibiting only if really required (-> overlap).
To keep things simple we don't perform additional checks that wouldn't
actually result in an overlap -- such as !RAM memory regions in some
cases (see kvm_set_phys_mem()).
To minimize cache-line bouncing, use a separate indicator
(in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock
while performing both actions (removing+re-adding).
We have to wait until all IOCTLs were exited and block new ones from
getting executed.
This approach cannot result in a deadlock as long as the inhibitor does
not hold any locks that might hinder an IOCTL from getting finished and
exited - something fairly unusual. The inhibitor will always hold the BQL.
AFAIKs, one possible candidate would be userfaultfd. If a page cannot be
placed (e.g., during postcopy), because we're waiting for a lock, or if the
userfaultfd thread cannot process a fault, because it is waiting for a
lock, there could be a deadlock. However, the BQL is not applicable here,
because any other guest memory access while holding the BQL would already
result in a deadlock.
Nothing else in the kernel should block forever and wait for userspace
intervention.
Note: pause_all_vcpus()/resume_all_vcpus() or
start_exclusive()/end_exclusive() cannot be used, as they either drop
the BQL or require to be called without the BQL - something inhibitors
cannot handle. We need a low-level locking mechanism that is
deadlock-free even when not releasing the BQL.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using the new accel-blocker API, mark where ioctls are being called
in KVM. Next, we will implement the critical section that will take
care of performing memslots modifications atomically, therefore
preventing any new ioctl from running and allowing the running ones
to finish.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221111154758.1372674-3-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This API allows the accelerators to prevent vcpus from issuing
new ioctls while execting a critical section marked with the
accel_ioctl_inhibit_begin/end functions.
Note that all functions submitting ioctls must mark where the
ioctl is being called with accel_{cpu_}ioctl_begin/end().
This API requires the caller to always hold the BQL.
API documentation is in sysemu/accel-blocker.h
Internally, it uses a QemuLockCnt together with a per-CPU QemuLockCnt
(to minimize cache line bouncing) to keep avoid that new ioctls
run when the critical section starts, and a QemuEvent to wait
that all running ioctls finish.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221111154758.1372674-2-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Always send errors to logfile when daemonized (Greg)
* Add support for IDE CompactFlash card (Lubomir)
* First round of build system cleanups (myself)
* First round of feature removals (myself)
* Reduce "qemu/accel.h" inclusion (Philippe)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmO3Ym0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNYmwf+LHEw+4T0fk1+2NfgIzH3+8s1EqDm
Ai56EjxO/p5NUptflXAnhn4P3LawswmmNE0ZIFFFBgwG5E9L+Jj/u5efuLu4uYPg
bboEBDn8nxSNN2l08u9TyS6kSWSxbwwrs7i2+V+4uQIlVIcCHu+A0vpXns4vWwY0
zZGF8CgJKDQdPIxdXrH8+6/xtadQ8uDkYsAWDiY/nhozCsCUTAZGTXWEQbHJLARI
Z4X+Cmz/NFB9G4ka6K/y0HbQw99KA8G/EMPUSglN0ya10yjpyzrmeI7IlIves+5U
8lhCZXyBhaV9GXlIK1vIgEXlHf83C19a+v0DpW0bpxK631n2VR5y3CArBg==
=2Koq
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* Atomic memslot updates for KVM (Emanuele, David)
* Always send errors to logfile when daemonized (Greg)
* Add support for IDE CompactFlash card (Lubomir)
* First round of build system cleanups (myself)
* First round of feature removals (myself)
* Reduce "qemu/accel.h" inclusion (Philippe)
# gpg: Signature made Thu 05 Jan 2023 23:51:09 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (24 commits)
i386: SGX: remove deprecated member of SGXInfo
target/i386: Add SGX aex-notify and EDECCSSA support
util: remove support -chardev tty and -chardev parport
util: remove support for hex numbers with a scaling suffix
KVM: remove support for kernel-irqchip=off
docs: do not talk about past removal as happening in the future
meson: accept relative symlinks in "meson introspect --installed" data
meson: cleanup compiler detection
meson: support meson 0.64 -Doptimization=plain
configure: test all warnings
tests/qapi-schema: remove Meson workaround
meson: cleanup dummy-cpus.c rules
meson: tweak hardening options for Windows
configure: remove backwards-compatibility and obsolete options
configure: preserve qemu-ga variables
configure: cleanup $cpu tests
configure: remove dead function
configure: remove useless write_c_skeleton
ide: Add "ide-cf" driver, a CompactFlash card
ide: Add 8-bit data mode
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that qtest is available on all targets including Windows, dummy-cpus.c
is included unconditionally in the build. It also does not need to be
compiled per-target.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As in page_get_flags, we need to try again with the mmap
lock held if we fail a page lookup.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Because we allow lockless lookups, we have to be careful
when it is freed. Use rcu to delay the free until safe.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When called from syscall(), we are not within a TB and pc == 0.
We can skip the check for invalidating the current TB.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have been allocating a worst case number of arguments
to support calls. Instead, allow the size to vary.
By default leave space for 4 args, to maximize reuse,
but allow calls to increase the number of args to 32.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMD: Split patch in two]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221218211832.73312-3-philmd@linaro.org>
In order to have variable size allocated TCGOp, pass the number
of arguments we use (and would allocate) up to tcg_op_alloc().
This alters tcg_emit_op(), tcg_op_insert_before() and
tcg_op_insert_after() prototypes.
In tcg_op_alloc() ensure the number of arguments is in range.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMD: Extracted from bigger patch]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221218211832.73312-2-philmd@linaro.org>
Better to re-use the existing function for copying ops.
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We copied all of the arguments in copy_op_nocheck.
We only need to replace the one argument that we change.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The function pointer is immediately after the output and input
operands; no need to search.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Narrow the scope of the lock to the actual read/write,
moving the cpu_transation_failed call outside the lock.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Only the system emulation part of TB maintainance uses the
page_collection structure. Restrict its declaration (and the
functions requiring it) to tb-maint.c.
Convert the 'len' argument of tb_invalidate_phys_page_fast__locked()
from signed to unsigned.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-6-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-5-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Emphasize this function is called with pages locked.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-4-philmd@linaro.org>
[rth: Use "__locked" suffix, to match other instances.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit d9bb58e510 ("tcg: move tcg related files into accel/tcg/
subdirectory") introduced accel/tcg/trace-events, so we don't
need to use the root trace-events anymore.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-3-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Missed in commit 6526919224 ("accel/tcg: Restrict cpu_io_recompile()
from other accelerators").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20221209093649.43738-2-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The only thing that still touches PageDesc in translate-all.c
are some locking routines related to tb-maint.c which have not
yet been moved. Do so now.
Move some code up in tb-maint.c as well, to untangle the maze
of ifdefs, and allow a sensible final ordering.
Move some declarations from exec/translate-all.h to internal.h,
as they are only used within accel/tcg/.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that PageDesc is not used for user-only, and for system
it is only used for tb maintenance, move the implementation
into tb-main.c appropriately ifdefed.
We have not yet eliminated all references to PageDesc for
user-only, so retain a typedef to the structure without definition.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This page tracking implementation is specific to user-only,
since the system softmmu version is in cputlb.c. Move it
out of translate-all.c to user-exec.c.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Make bsd-user match linux-user in not marking host pages
as reserved. This isn't especially effective anyway, as
it doesn't take into account any heap memory that qemu
may allocate after startup.
Reviewed-by: Warner Losh <imp@bsdimp.com>
Tested-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Continue weaning user-only away from PageDesc.
Use an interval tree to record target data.
Chunk the data, to minimize allocation overhead.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Begin weaning user-only away from PageDesc.
Since, for user-only, all TB (and page) manipulation is done with
a single mutex, and there is no virtual/physical discontinuity to
split a TB across discontinuous pages, place all of the TBs into
a single IntervalTree. This makes it trivial to find all of the
TBs intersecting a range.
Retain the existing PageDesc + linked list implementation for
system mode. Move the portion of the implementation that overlaps
the new user-only code behind the common ifdef.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to tb_remove_all, to remove the PageDesc "page" from the name,
and to avoid suggesting a "flush" in the icache sense.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking. Since then a few more uses
in need of the same treatment crept in. Do that. All pretty
mechanical except for
* balloon_stats_get_all()
This is basically the same transformation commit 012d4c96e2 applied
to the virtual walk example in include/qapi/visitor.h.
* set_max_queue_size()
Additionally replace "goto end of function" by return.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Isolate the code protected by setjmp. Fixes:
translate-all.c: In function ‘tb_gen_code’:
translate-all.c:748:51: error: argument ‘cflags’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
if QEMU is configured with modules enabled, it is possible that the
load of an accelerator module will fail.
Exit in this case, relying on module_object_class_by_name to report
the specific load error if any.
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio: changed abort() to exit(1)]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220929093035.4231-6-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Delay cpu_list_add until realize is complete, so that cross-cpu
interaction does not happen with incomplete cpu state. For this,
we must delay plugin initialization out of tcg_exec_realizefn,
because no cpu_index has been assigned.
Fixes a problem with cross-cpu jump cache flushing, when the
jump cache has not yet been allocated.
Fixes: a976a99a29 ("include/hw/core: Create struct CPUJumpCache")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The value passed is always true, and if the target's
synchronize_from_tb hook is non-trivial, not exiting
may be erroneous.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a way to examine the unwind data without actually
restoring the data back into env.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Currently signal SIGIPI [=SIGUSR1] is used to kick the dummy CPU
when qtest accelerator is used. However SIGUSR1 is unsupported on
Windows. To support Windows, we add a QemuSemaphore CPUState::sem
to kick the dummy CPU instead for Windows.
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20221028045736.679903-2-bin.meng@windriver.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
All targets have been updated. Use the tcg_ops target hook
exclusively, which allows the compat code to be removed.
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add a tcg_ops hook to replace the restore_state_to_opc
function call. Because these generic hooks cannot depend
on target-specific types, temporarily, copy the current
target_ulong data[] into uint64_t d64[].
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since the only user, Arm MTE, always requires allocation,
merge the get and alloc functions to always produce a
non-null result. Also assume that the user has already
checked page validity.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since "target data" is always user-only, move it out of
translate-all.c to user-exec.c.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Flush translation blocks in bulk, rather than page-by-page.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use the existing function for clearing target data.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When PAGE_RESET is set, we are replacing pages with new
content, which means that we need to invalidate existing
cached data, such as TranslationBlocks. Perform the
reset invalidate while we're doing other invalidates,
which allows us to remove the separate invalidates from
the user-only mmap/munmap/mprotect routines.
In addition, restrict invalidation to PAGE_EXEC pages.
Since cdf7130851, we have validated PAGE_EXEC is present
before translation, which means we can assume that if the
bit is not present, there are no translations to invalidate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We do not require detection of overlapping TBs here,
so use the more appropriate function.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We missed this function when we introduced tb_page_addr_t.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This function is is never called with a real range,
only for a single page. Drop the second parameter
and rename to tb_invalidate_phys_page.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Rename to tb_invalidate_phys_page_unwind to emphasize that
we also detect invalidating the current TB, and also to free
up that name for other usage.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This data structure will be replaced for user-only: add accessors.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When we added the fast path, we initialized page_addr[] early.
These stores in and around tb_page_add() are redundant; remove them.
Fixes: 50627f1b7b ("accel/tcg: Add fast path for translator_ld*")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The results of the calls to cpu_get_tb_cpu_state,
current_{pc,cs_base,flags}, are not used.
In tb_invalidate_phys_page, use bool for current_tb_modified.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There are no users outside of accel/tcg; this function
does not need to be defined in exec-all.h.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move all of the TranslationBlock flushing and page linking
code from translate-all.c to tb-maint.c.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These items printf, and could be replaced with proper
tracepoints if we really cared.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use a constant target data allocation size for all pages.
This will be necessary to reduce overhead of page tracking.
Since TARGET_PAGE_DATA_SIZE is now required, we can use this
to omit data tracking for targets that don't require it.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Profiling QEMU during Fedora 35 for PPC64 boot revealed that a
considerable amount of time was being spent in
check_for_breakpoints() (0.61% of total time on PPC64 and 2.19% on
amd64), even though it was just checking that its queue was empty
and returning, when no breakpoints were set. It turns out this
function is not inlined by the compiler and it's always called by
helper_lookup_tb_ptr(), one of the most called functions.
By leaving only the check for empty queue in
check_for_breakpoints() and moving the remaining code to
check_for_breakpoints_slow(), called only when the queue is not
empty, it's possible to avoid the call overhead. An improvement of
about 3% in total time was measured on POWER9.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221025202424.195984-2-leandro.lupori@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit a82fd5a4ec was intended to be a code cleanup, but
unfortunately it has a bug. It moves the initialization of the
TCG cflags from the "start a new vcpu" function to the
thread handler; this is fine when each vcpu has its own thread,
but when we are doing round-robin of vcpus on a single thread
we end up only initializing the cflags for CPU 0, not for any
of the others.
The most obvious effect of this bug is that running in icount
mode with more than one CPU is broken; typically the guest
hangs shortly after it brings up the secondary CPUs.
This reverts commit a82fd5a4ec.
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20221021163409.3674911-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There are cases that malicious virtual machine can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs. Notify VM exit is introduced to mitigate such kind of
attacks, which will generate a VM exit if no event window occurs in VM
non-root mode for a specified amount of time (notify window).
A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
Bit 63:32 - notify window specified in qemu command
Bit 31:0 - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
enable the feature.)
Users can configure the feature by a new (x86 only) accel property:
qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n
The default option of notify-vmexit is run, which will enable the
capability and do nothing if the exit happens. The internal-error option
raises a KVM internal error if it happens. The disable option does not
enable the capability. The default value of notify-window is 0. It is valid
only when notify-vmexit is not disabled. The valid range of notify-window
is non-negative. It is even safe to set it to zero since there's an
internal hardware threshold to be added to ensure no false positive.
Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit
qualification (no cases are anticipated that would set this bit), which
means VM context is corrupted. It would be reflected in the flags of
KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM
internal error unconditionally.
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose struct KVMState out of kvm-all.c so that the field of struct
KVMState can be accessed when defining target-specific accelerator
properties.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Several hypervisor capabilities in KVM are target-specific. When exposed
to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
should not be available for all targets.
Add a hook for targets to add their own properties to -accel kvm, for
now no such property is defined.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This removes the final hard coding of kvm_enabled() in gdbstub and
moves the check to an AccelOps.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Message-Id: <20220929114231.583801-46-alex.bennee@linaro.org>
As HW virtualization requires specific support to handle breakpoints
lets push out special casing out of the core gdbstub code and into
AccelOpsClass. This will make it easier to add other accelerator
support and reduces some of the stub shenanigans.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Message-Id: <20220929114231.583801-45-alex.bennee@linaro.org>
The support of single-stepping is very much dependent on support from
the accelerator we are using. To avoid special casing in gdbstub move
the probing out to an AccelClass function so future accelerators can
put their code there.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Mads Ynddal <mads@ynddal.dk>
Message-Id: <20220929114231.583801-44-alex.bennee@linaro.org>
Prepare for targets to be able to produce TBs that can
run in more than one virtual context.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The availability of tb->pc will shortly be conditional.
Introduce accessor functions to minimize ifdefs.
Pass around a known pc to places like tcg_gen_code,
where the caller must already have the value.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Wrap the bare TranslationBlock pointer into a structure.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This function has two users, who use it incompatibly.
In tlb_flush_page_by_mmuidx_async_0, when flushing a
single page, we need to flush exactly two pages.
In tlb_flush_range_by_mmuidx_async_0, when flushing a
range of pages, we need to flush N+1 pages.
This avoids double-flushing of jmp cache pages in a range.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>