In particular, make sure the memory is memset before use.
Continues the increased use of TCGTemp pointers instead of
integer indices where appropriate.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Chain the temporaries together via pointers intstead of indices.
The mem_reg value is now mem_base->reg. This will be important later.
This does require that the frame pointer have a global temporary
allocated for it. This is simple bar the existing reserved_regs check.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Thus, use cpu_env as the parameter, not TCG_AREG0 directly.
Update all uses in the translators.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Undo the workaround at b17a6d3390.
If there are lots of memory operations in a TB, the slow path code
can exceed the highwater reservation. Add a check within the loop.
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Split the bits that require it to exec/log.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1452174932-28657-8-git-send-email-den@openvz.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org
If there are a lot of guest memory ops in the TB, the amount of
code generated by tcg_out_tb_finalize could be well more than 1k.
In the short term, increase the reservation larger than any TB
seen in practice.
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
A simple typo in the variable to use when comparing vs the highwater mark.
Reports are that qemu can in fact segfault occasionally due to this mistake.
Signed-off-by: John Clarke <johnc@kirriwa.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We currently pre-compute an worst case code size for any TB, which
works out to be 122kB. Since the average TB size is near 1kB, this
wastes quite a lot of storage.
Instead, check for overflow in between generating code for each opcode.
The overhead of the check isn't measurable and wastage is minimized.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
By putting the prologue at the end, we risk overwriting the
prologue should our estimate of maximum TB size. Given the
two different placements of the call to tcg_prologue_init,
move the high water mark computation into tcg_prologue_init.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It's no longer used, so tidy up everything reached by it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can now restore state without retranslation.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The gen_opc_* arrays are already redundant with the data stored in
the insn_start arguments. Transition restore_state_to_opc to use
data from the latter.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With an eye toward having this data replace the gen_opc_* arrays
that each target collects in order to enable restore_state_from_tb.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With an eye toward making it mandatory.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
tcg_op_defs (and the _max) are both needed by the TCI disassembler. For
multi-arch, tcg.c will be multiple-compiled (arch-obj) with its symbols
hidden from common code. So split the definition off to new file,
tcg-common.c which will remain a regular obj-y for use by both the TCI
disas as well as the multiple tcg.c's.
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <4b607425886d85aee65878e4935dfad46b3e6085.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a constant has to be loaded in a mov op, we fail to set
mem_coherent = 0. This patch fixes that.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configure with --enable-debug-tcg:
qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.
Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When the same temp is used twice or more as an input argument to a TCG
instruction, the dead computation code doesn't recognize the second use
as a dead temp. This is because the temp is marked as live in the same
loop where dead inputs are checked.
The fix is to split the loop in two parts. This avoid emitting a move
and using a register for the movcond instruction when used as "move if
true" on x86-64. This might bring more improvements on RISC TCG targets
which don't have outputs aliased to inputs.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For TCG ops with two outputs registers (add2, sub2, div2, div2u), when
the same input temp is used for the two inputs aliased to the two
outputs, and when these inputs are both dead, the register allocation
code wrongly assigned the same register to the same output.
This happens for example with sub2 t1, t2, t3, t3, t4, t5, when t3 is
not used anymore after the TCG op. In that case the same register is
used for t1, t2 and t3.
The fix is to look for already allocated aliased input when allocating
a dead aliased input and check that the register is not already
used.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the tcg opcode level, not at the tcg-op.h generator level.
This requires minor changes through all of the tcg backends,
but none of the cpu translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pre-allocating 512 of them per TB is a waste.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is less about improved type checking than enabling a
subsequent change to the representation of labels.
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is improved type checking for the translators -- it's no longer
possible to accidentally swap arguments to the branch functions.
Note that the code generating backends still manipulate labels as int.
With notable exceptions, the scope of the change is just a few lines
for each target, so it's not worth building extra machinery to do this
change in per-target increments.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Anthony Green <green@moxielogic.com>
Cc: Jia Liu <proljc@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We no longer need INDEX_op_end to terminate the list, nor do we
need 5 forms of nop, since we just remove the TCGOp instead.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With the linked list scheme we need not leave nops in the stream
that we need to process later.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Some of these functions are really quite large. We have a number of
things that ought to be circularly dependent, but we duplicated code
to break that chain for the inlines.
This saved 25% of the code size of one of the translators I examined.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Currently 'info jit' outputs half of the information to monitor and the
rest to qemu log. Dumping opcode counts to monitor as a part of 'info
jit' command doesn't sound useful. Add new monitor command 'info
opcount' that only dumps opcode counters.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
fopen() may fail and it does not check its return vaule here,
it is better to dump op count to the normal log file.
Signed-off-by: Li Liu <john.liuli@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The tcg_out* and tcg_patch* functions are utility routines that may or
may not be used by a particular backend; mark them with the 'unused'
attribute to suppress spurious warnings if they aren't used.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
As a "utility", it only supported ppc, and in a way that other
tcg backends provided directly in tcg-target.h. Removing this
disparity is easier now that the two ppc backends are merged.
Tested-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since all backends have been converted, remove the compatibility code.
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Adjust the FDE to point to the code_buffer after we've copied it
to the image, rather than requiring that the backend set it prior.
This allows the backend to use read-only storage for its data.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This will let us find all the info from the hash table.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather than special casing them, use the standard mechanisms
for tcg helper generation.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather than include helper.h with N values of GEN_HELPER, include a
secondary file that sets up the macros to include helper.h. This
minimizes the files that must be rebuilt when changing the macros
for file N.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Commit af3cbfbe80 hoisted some "common"
loads of the temporary type, forgetting that the types could differ
during truncating moves. This affects the correctness of the memory
offset on big-endian hosts.
Tested-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The move opcodes are special in that their constraints must cover
all available registers. So instead of checking the constraints,
just use the available registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoid allocating a tcg temporary to hold the constant address,
and instead place it directly into the op_call arguments.
At the same time, convert to the newly introduced tcg_out_call
backend function, rather than invoking tcg_out_op for the call.
Signed-off-by: Richard Henderson <rth@twiddle.net>
To be defined by the tcg backend based on the elemental unit of the ISA.
During the transition, allow TCG_TARGET_INSN_UNIT_SIZE to be undefined,
which allows us to default tcg_insn_unit to the current uint8_t.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
To avoid C undefined behaviour when patching generated code,
provide wrappers tcg_patch8/16/32/64 which use the usual memcpy
trick, and use them in the i386 backend.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoid stores to unaligned addresses in TCG code generation, by using the
usual memcpy() approach. (Using bswap.h would drag a lot of QEMU baggage
into TCG, so it's simpler just to do direct memcpy() here.)
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Quite a lot of effort was spent composing and decomposing 64-bit
quantities in registers, when we should just create them and leave
them as one 64-bit register.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Most 64-bit targets need to be able to ignore the high bits
of a TCG_TYPE_I32 value.
Suggested-by: Stuart Brady <sdb@zubnet.me.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The second half register of a 64-bit temp on a 32-bit host
was allocated with the wrong base_type.
The base_type of the second half register is never checked,
but for consistency it should be the same as the first half.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We have cache pools of temporaries that we can reuse later when they've
already been allocated before.
These cache pools differenciate between the target TCG variable type they
contain. So we have one pool for I32 and one pool for I64 variables.
On a 32bit system, we can't work with 64bit registers though. So instead we
spawn two I32 temporaries for every I64 temporary we create. All caching
works the same way as on a real 64-bit system though: We create a cache entry
in the 64bit array for the first i32 index.
However, when we free such a temporary we free it to the pool of its type
(which is always i32 on 32bit systems) rather than its base_type (which is
i64 or i32 depending on the variable). This means we put a temporary that
is of base_type == i64 into the i32 preallocated temporary pool.
Eventually, this results in failures like this on 32bit hosts:
qemu-system-ppc64: tcg/tcg.c:515: tcg_temp_new_internal: Assertion `ts->base_type == type' failed.
This patch makes the free routine use the base_type instead for the free case,
so it's consistent with the temporary allocation. It fixes the above failure
for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1390146811-59936-1-git-send-email-agraf@suse.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We previously allocated 32-bits per temp for the next_free_temp entry.
We now allocate 4 bits per temp across the 4 bitmaps.
Using a linked list meant that if a translator is tweeked, resulting in
temps being freed in a different order, that would have follow-on effects
throughout the TB. Always allocating the lowest free temp means that
follow-on effects are minimized, which can make it easier to diff output
when debugging the translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step two in the transition, adding the new ldst opcodes. Keep the old
opcodes around until all backends support the new opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
For the few targets that actually use these, we'd not report
them symbolicly in the tcg opcode logs.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One call inside of a loop to tcg_register_helper instead of hundreds
of sequential calls.
Presumably more icache and branch prediction friendly; resulting binary
size mostly unchanged on x86_64, as we're trading 32-bit rip-relative
references in .text for full 64-bit pointers in .rodata.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Slightly changes the interface, in that we now return name
instead of a TCGHelperInfo structure, which goes away.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
No point in splitting the write into 32-bit pieces.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Aliasing was forcing s->code_ptr to be re-read after the store.
Keep the pointer in a local variable to help the compiler.
Signed-off-by: Richard Henderson <rth@twiddle.net>
These will necessarily be the same layout for all hosts. This limits
the amount of boilerplate required to implement jit debug for a host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
I don't think the debugger actually looks at this for anything,
using the correct .debug_frame contents, but might as well get
it all correct.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Expand the definition of "not present" to include "should not be present".
This means we can simplify the logic surrounding the generic tcg opcodes
for which the host backend ought not be providing definitions.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This makes it easier to verify changes to the code
generating the prologue.
[Aurelien: change the format from %i to %zu]
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Commit 7f6f0ae5b9 added two assertions.
One of these assertions is not needed:
The pointer ts is never NULL because it is initialized with the
address of an array element.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In dead_temp, local temps should always be marked as back to memory,
even if they have not been allocated (i.e. they are discared before
cross a basic block).
It fixes the following assertion in target-xtensa:
qemu-system-xtensa: tcg/tcg.c:1665: temp_save: Assertion `s->temps[temp].val_type == 2 || s->temps[temp].fixed_reg' failed.
Aborted
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Commit 9c43b68de6 do not correctly check
for dead outputs when they need to be synced to memory in case of
half-dead operations.
Fix that by applying the same pattern than for the default case.
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be
confusing and doesn't provide enough granularity for some helpers (FP
helpers for example).
This patch changes them into the following helpers flags:
- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
either directly or via an exception. They will not be saved to their
canonical location before calling the helper.
- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any
globals. They will only be saved to their canonical locations before
calling helpers, but they won't be reloaded afterwise.
- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is
removed if the return value is not used.
It provides convenience flags, to avoid helper definitions longer than
80 characters. It also provides compatibility flags, and updates the
documentation.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Operations with side effects (in practice qemu_ld/st ops), only need to
synchronize globals to make sure the CPU state is consistent in case of
exception.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The liveness analysis ensures that globals and temps are at the correct
state at a basic block end or with an op with side effects. Avoid
looping on all temps, this can be time consuming on targets with a lot
of globals. Keep an assert in debug mode.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Start with local temps in TEMP_VAL_MEM state, to make possible a later
check that all the temps are correctly saved back to memory.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Always mark dead input arguments as dead, even if the op is at the basic
block end. This will allow to check that all temps are correctly saved.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that the liveness analysis provides more information, rewrite
tcg_reg_alloc_mov(). This changes the behaviour about propagating
constants and memory accesses. We now take the assumption that once
a value is loaded into a register (from memory or from a constant),
it's better to keep it there than to reload it later. This assumption
is now always almost correct given that we are now sure the
corresponding temp is going to be used later (otherwise it would have
been synchronized and marked as dead already). The assumption is wrong
if one of the op after clobbers some registers including the one
of the holding the temp (this can be avoided by allocating clobbered
registers last, which is what most TCG target do), or in case of lack
of available register.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that the liveness analysis might mark some output temps as dead, call
temp_dead() if needed.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Rework the liveness analysis by tracking temps that need to go back to
memory in addition to dead temps tracking. This allows to mark output
arguments as "need sync", and to synchronize them back to memory as soon
as they are not written anymore. This way even arguments mapping to
globals can be marked as "dead", avoiding moves to a new register when
input and outputs are aliased.
In addition it means that registers are freed as soon as temps are not
used anymore, instead of waiting for a basic block end or an op with side
effects. This reduces register spilling especially on CPUs with few
registers, and spread the mov over all the TB, increasing the
performances on in-order CPUs.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Synchronize an output argument when requested by the liveness analysis.
This is needed so that the temp can be declared dead later.
For that, add a new op_sync_args table in which each bit tells if the
corresponding output argument needs to be synchronized with the memory.
Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
synchronize the argument before marking it as dead, and we have to make
sure all the infos about the temp are correctly filled.
At the same time change some types from unsigned int to uint16_t when
passing op_dead_args.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a new function temp_sync() to synchronize the canonical location
of a temp with the value in the corresponding register, but without
freeing the associated register. Rewrite temp_save() to call
temp_sync() followed by temp_dead().
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a new function tcg_reg_sync() to synchronize the canonical location
of a temp with the value in the associated register, but without freeing
it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
the register.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
A lot of code is duplicated to mark a temporary as dead. Replace it
by temp_dead(), which in addition marks the temp as saved in memory
for globals and local temps, instead of doing this a posteriori in
temp_save().
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Like add2, do operand ordering, constant folding, and dead operand
elimination. The latter happens about 15% of all mulu2 during an
x86_64 bios boot.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit
add is dead. When the host is 32-bit, we can simplify to 32-bit
arithmetic.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
GUEST_BASE support is now supported by all TCG backends, and is
now mandatory. Drop the now-pointless TCG_TARGET_HAS_GUEST_BASE
define (set by every backend) and the error if it is unset.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The pointer entry 'temps' always refers to the array entry 'static_temps'.
Removing the pointer and renaming 'static_temps' to 'temps' reduces the
size of TCGContext (4 or 8 byte) and allows better code generation.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Checking that we don't try for idx != [01] is trivial. Checking
that we don't issue more than one of any index requires a tad
more data and some ifdefs protecting that new variable.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* 'tcg-sparc' of git://repo.or.cz/qemu/rth:
tcg-sparc: Preserve branch destinations during retranslation
tcg-sparc: Fix and enable direct TB chaining.
tcg-sparc: Add %g/%o registers to alloc_order
tcg-sparc: Use defines for temporaries.
tcg-sparc: Mask shift immediates to avoid illegal insns.
tcg-sparc: Clean up cruft stemming from attempts to use global registers.
tcg-sparc: Change AREG0 in generated code to %i0.
tcg-sparc: Support GUEST_BASE.
tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
tcg-sparc: Don't MAP_FIXED on top of the program
tcg-sparc: Fix ADDX opcode.
tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
linux-user: Use memcpy in get_user/put_user.
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once. Therefore the patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 25c4d9cc changed all TCGOpcode enums to be available, so we don't
need to #ifdef #endif the one that are available only on some targets.
This makes the code easier to read.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Current code doesn't actually work in 32-bit mode at all. Since
no one really noticed, drop the complication of v7 and v8 cpus.
Eliminate the --sparc_cpu configure option and standardize macro
testing on TCG_TARGET_REG_BITS / HOST_LONG_BITS
Signed-off-by: Richard Henderson <rth@twiddle.net>
Implemented with setcond if the target does not provide
the optional opcode.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
set_label is effectively the end of a basic block, as no optimization
can be made accross it. It was treated as such in the liveness analysis
code, but as a special case.
Mark it with TCG_OPF_BB_END flag so that this information can be used
by other parts of the TCG code, and remove the special case in the liveness
analysis code.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that there are two passes of optimization (optimize.c, liveness)
there is no point of outputing the statistics of the liveness part
only. Update the code to take into account both optimizations.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Don't use global variables directly but via accessor functions. Rename globals.
Convert macros to functions, add GCC format attributes.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This allows us to actually supply a function name in softmmu builds;
gdb doesn't pick up the minimal symbol table otherwise. Also add a
bit of documentation and statically generate more of the ELF image.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This allows us to generate unwind info for the dynamicly generated
code in the code_gen_buffer. Only i386 is converted at this point.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack based calling convention (GCC default) for interfacing with
generated code instead of register based convention (regparm(3)).
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
An attempt to allocate a large memory chunk after a small one resulted in
circular links in list of pools. It caused the same memory being
allocated twice for different arrays.
Now pools for large memory chunks are kept in separate list and are
freed during pool reset because current allocator can not reuse them.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
tcg_out_label is always called with a third argument of pointer type
which was casted to tcg_target_long.
These casts can be avoided by changing the prototype of tcg_out_label.
There was also a cast to long. For most hosts with
sizeof(long) == sizeof(tcg_target_long) == sizeof(void *) this did not
matter, but for w64 it was wrong. This is fixed now.
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The standard include files are already included in qemu-common.h.
malloc.h and alloca.h were needed for alloca() which was removed
from TCG code some years ago when switching from dyngen to TCG
(see commit 49516bc0d6).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Including tcg_out_ld, tcg_out_st, tcg_out_mov, tcg_out_movi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
tcg_op_defs was already a global array.
The tci disassembler also needs ARRAY_SIZE(tcg_op_defs),
so add a new global constant with this value.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
These functions are defined in the tcg target specific file
tcg-target.c.
The forward declarations assert that every tcg target uses
the same function prototype.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
By always defining these symbols, we can eliminate a lot of ifdefs.
To allow this to be checked reliably, the semantics of the
TCG_TARGET_HAS_* macros must be changed from def/undef to true/false.
This allows even more ifdefs to be removed, converting them into
C if statements.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This allows the simplification of the op_bits function from
tcg/optimize.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Added file tcg/optimize.c to hold TCG optimizations. Function tcg_optimize
is called from tcg_gen_code_common. It calls other functions performing
specific optimizations. Stub for constant folding was added.
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack instead of temp_buf array in CPUState for TCG temps.
On Sparc64, stack pointer is not aligned but there is a fixed bias of 2047,
so don't try to enforce alignment.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The code for stack allocation for call arguments is way too simplistic
to actually work on targets with non-trivial stack allocation policies,
e.g. ppc64. We've also already allocated TCG_STATIC_CALL_ARGS_SIZE worth
of stack for calls which should be well more than any helper needs.
Remove broken dynamic stack allocation code and replace it with an assert.
Should dynamic stack allocation ever be needed again, target specific
functions should be added.
Thanks to Richard Henderson for the analysis.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Based on a patch from Hans de Goede <hdegoede@redhat.com>
This warning is new in gcc 4.6.
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If an op with dead outputs is not removed, because it has side effects
or has multiple output and only one dead, mark the registers as dead
instead of saving them. This avoid a few register spills on TCG targets
with low register count, especially with div2 and mul2 ops, or when a
qemu_ld* result is not used (prefetch emulation for example).
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If an op is not removed and has dead output arguments, mark it
in op_dead_args similarly to what is done for input arguments.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Allow all args to be dead by replacing the input specific op_dead_iargs
variable by op_dead_args. Note this is a purely mechanical change.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add support (if CONFIG_DEBUG_TCG is defined) for debugging leakage
of temporary variables. Generally any temporaries created by
a target while it is translating an instruction should be freed
by the end of that instruction; otherwise carefully crafted
guest code could cause TCG to run out of temporaries and assert.
By calling tcg_check_temp_count() after each instruction we can
check that we are not leaking temporaries in this way.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
fprintf_function uses format checking with GCC_FMT_ATTR.
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When qemu is configured with --enable-debug-tcg,
gcc throws this warning (or error with -Werror):
tcg/tcg.c:1030: error: comparison of unsigned expression >= 0 is always true
Fix it by removing the >= 0 part.
The type cast to 'unsigned' catches negative values of op
(which should never happen).
This is a modification of Hollis Blanchard's patch.
Cc: Hollis Blanchard <hollis@penguinppc.org>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Some hosts (amd64, ia64) have an ABI that ignores the high bits
of the 64-bit register when passing 32-bit arguments. Others
require the value to be properly sign-extended for the type.
I.e. "int32_t" must be sign-extended and "uint32_t" must be
zero-extended to 64-bits.
To effect this, extend the "sizemask" parameter to tcg_gen_callN
to include the signedness of the type of each parameter. If the
tcg target requires it, extend each 32-bit argument into a 64-bit
temp and pass that to the function call.
This ABI feature is required by sparc64, ppc64 and s390x.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>