The version of tcg_gen_ld8s_i64 for 32-bit systems does a load into
the low part of the return value - then attempts a sign extension into
the high part, but wrongly sets the high part to a sign extension of
itself rather than of the low part. This results in TCG internal
errors from the use of the uninitialized high part (in some GCC tests
of AArch64 NEON shift intrinsics, in particular). This patch corrects
the sign-extension logic, making it match other functions such as
tcg_gen_ld16s_i64.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.20.1610272333560.22353@digraph.polyomino.org.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The typedefs we use for the TCGv_i32, TCGv_i64 and TCGv_ptr
types are somewhat confusing, because we define them as
pointers to structs, but the structs themselves are never
defined. Explain in the comments a bit more clearly why
this is OK and what is going on under the hood.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1477067922-26202-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This multiply has one signed input and one unsigned input,
producing the full double-width result.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1475011433-24456-2-git-send-email-rth@twiddle.net>
Reuse the existing locking provided by stdio to keep in_asm, cpu,
op, op_opt, op_ind, and out_asm as contiguous blocks.
While it isn't possible to interleave e.g. in_asm or op_opt logs
because of the TB lock protecting all code generation, it is
possible to interleave cpu logs, or to interleave a cpu dump with
an out_asm dump.
For mingw32, we appear to have no viable solution for this. The locking
functions are not properly exported from the system runtime library.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks. Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.
This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.
Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock. The next one will rectify this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow qemu to build on 32-bit hosts without 64-bit atomic ops.
Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation. Do so by dropping back to single-step.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Force the use of cmpxchg16b on x86_64.
Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction. Further, it's required by Windows 8 so no new cpus
will ever omit it.
If we truely care about these, then we could check this at startup time
and then avoid executing paths that use it.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This comes from free from unifying tcg_reg_alloc_mov and
tcg_reg_alloc_movi's handling of TEMP_VAL_CONST. It triggers
often on moves to cc_dst, such as the following translation
of "sub $0x3c,%esp":
before: after:
subl $0x3c,%ebp subl $0x3c,%ebp
movl %ebp,0x10(%r14) movl %ebp,0x10(%r14)
movl $0x3c,%ebx movl $0x3c,0x2c(%r14)
movl %ebx,0x2c(%r14)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1473945360-13663-1-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is to appease sanitizer builds which complain that:
"error: control reaches end of non-void function"
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20160930213106.20186-5-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TARGET_PAGE_MASK, as defined, has type "int". We need to extend
that to the proper target width before oring in an "unsigned".
Signed-off-by: Richard Henderson <rth@twiddle.net>
This commit optimizes fence instructions. Two optimizations are
currently implemented: (1) unnecessary duplicate fence instructions,
and (2) merging weaker fences into a stronger fence.
[rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only
loop over the opcode stream once. Merge "unrelated" weaker barriers
into one stronger barrier.]
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-11-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of
mfence which has similar ordering semantics.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This commit introduces the TCGOpcode for memory barrier instruction.
This opcode takes an argument which is the type of memory barrier
which should be generated.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160714202026.9727-2-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Previously we allowed fully unaligned operations, but not operations
that are aligned but with less alignment than the operation size.
In addition, arm32, ia64, mips, and sparc had been omitted from the
previous overalignment patch, which would have led to that alignment
being enforced.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
host-utils.h and timer.h are included twice in tcg.c.
One time should be enough.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The HPPA backend has been removed by the following commit:
802b508123
tcg-hppa: Remove tcg backend
But some small pieces of the HPPA backend still survived until
today. Since we also do not have support for a HPPA target in
QEMU, we can nowadays safely remove the remaining HPPA parts
(like the disassembler code, or the detection of HPPA in the
configure script).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Rather than rely on recursion during the middle of register allocation,
lower indirect registers to loads and stores off the indirect base into
plain temps.
For an x86_64 host, with sufficient registers, this results in identical
code, modulo the actual register assignments.
For an i686 host, with insufficient registers, this means that temps can
be (temporarily) spilled to the stack in order to satisfy an allocation.
This as opposed to the possibility of not being able to spill, to allocate
a register for the indirect base, in order to perform a spill.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We only need two bits per temporary. Fold the two bytes into one,
and reduce the memory and cachelines required during compilation.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reduce the size of other bitfields to make room.
This reduces the cache footprint of compilation.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This reduces both memory usage and per-insn cacheline usage
during code generation.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Assertions help both Coverity and the clang static analyzer avoid
false positives, but on the other hand both are confused when
the condition is compiled as (void)(x != FOO). Always expand
assertion macros when using Coverity or clang, through a new
QEMU_STATIC_ANALYSIS preprocessor symbol.
This fixes a couple false positives in TCG.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These use guard symbols like TCG_TARGET_$target.
scripts/clean-header-guards.pl doesn't like them because they don't
match their file name (they should, to make guard collisions less
likely).
Clean them up: use guard symbol $target_TCG_TARGET_H for
tcg/$target/tcg-target.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Some architectures (e.g. ARMv8) need the address which is aligned
to a size more than the size of the memory access.
To support such check it's enough the current costless alignment
check implementation in QEMU, but we need to support
an alignment size specifying.
Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Assert in tcg_canonicalize_memop. Leave get_alignment_bits
available for, though unused by, user-mode. Retain logging difference
based on ALIGNED_ONLY.]
While we can store constants via constrants on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.
Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail). Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The event is described in "trace-events". Note that the "MO_AMASK" flag
is not traced, since it does not seem to affect the visible semantics of
instructions.
[s/inline inline/inline/ to fix clang build.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 146549350711.18437.726780393247474362.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Information is tracked inside the TCGContext structure, and later used
by tracing events with the 'tcg' and 'vcpu' properties.
The 'cpu' field is used to check tracing of translation-time
events ("*_trans"). The 'tcg_env' field is used to pass it to
execution-time events ("*_exec").
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 146549350162.18437.3033661139638458143.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
exec-all.h contains TCG-specific definitions. It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.
One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TCG backends do not need most of exec-all.h; extract what they actually
need to a separate file or move it directly to tcg.h. The next patch
will stop including exec-all.h from everywhere.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB
attempted to execute. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit' which
are a pointer to the last TB attempted to execute and the TB exit
reason, correspondingly. This simplifies the code and improves its
readability.
Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
In user mode, there's only a static address translation, TBs are always
invalidated properly and direct jumps are reset when mapping change.
Thus the destination address is always valid for direct jumps and
there's no need to restrict it to the pages the TB resides in.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>