Now that the code_gen_buffer is constrained to not cross 256mb
regions, we are assured that we can use J to reach another TB.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use the same table to fold comparisons as with setcond.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Emitting a single branch instead of (up to) 3, using setcond2
to generate the composite compare.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The original code results in one too many insns per zero
present in the input. And since comparing 64-bit numbers
vs zero is common...
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using tcg_unsigned_cond and tcg_high_cond.
Also, move the function up in the file for future cleanups.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use the same table to fold comparisons as with setcond.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use a table to fold comparisons to less-than.
Also, move the function up in the file for futher simplifications.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Most opcodes fall in to one of a couple of patterns.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we must use ADDUI, we would generate incorrect code for -32768.
Leaving off subtract of +32768 makes things easier for a follow-on patch.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the same time, tidy deposit by introducing tcg_out_opc_bf.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
T0 is an argument register for the n32 and n64 abis. T9 is the call
address register for the abis, and is more directly under the control
of the backend.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use these instead of hard-coding the registers to use for temporaries.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use FP (also known as S8) as a normal call-saved register.
Include T0 in the allocation order and call-clobbered list
even though it's currently used as a TCG temporary.
Put the argument registers at the end of the allocation order.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
In addition, fill delay slots calling the helpers and tail
call to the store helpers.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At the same time, tidy up the call helpers, avoiding a memory reference.
Split out several subroutines. Use TCGMemOp constants. Make endianness
selectable at runtime.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For userland builds calls will normally be in range,
and for the exit_tb opcode the branch to the epilogue.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Broken since dddbb2e1e3.
Do all the rest of the things that tcg_out_op did before
and after the big switch statement.
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are a variety of common cases for which we can use carry tricks to
avoid a conditional branch. On very new hardware, use LOAD ON CONDITION
instead of a conditional branch.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Elides two insns from the sequence. The resulting tlb compare
sequence is satisfyingly minimal:
risbg %r2,%r8,51,186,56
risbg %r3,%r8,61,178,0
cg %r3,904(%r10,%r2)
lg %r2,920(%r10,%r2)
jlh tlb_miss
Signed-off-by: Richard Henderson <rth@twiddle.net>
Commit af3cbfbe80 hoisted some "common"
loads of the temporary type, forgetting that the types could differ
during truncating moves. This affects the correctness of the memory
offset on big-endian hosts.
Tested-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The INDEX_op_call case has just been obsoleted; the mov and movi
cases have not been reachable for years. Attempt to document this
both in each tcg_out_op switch, and via TCG_OPF_NOT_PRESENT.
Because of the TCG_OPF_NOT_PRESENT change, this must be done for
all targets in a single commit.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The move opcodes are special in that their constraints must cover
all available registers. So instead of checking the constraints,
just use the available registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoid allocating a tcg temporary to hold the constant address,
and instead place it directly into the op_call arguments.
At the same time, convert to the newly introduced tcg_out_call
backend function, rather than invoking tcg_out_op for the call.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Now that all backends do define TCG_TARGET_INSN_UNIT_SIZE,
remove the fallback definition.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using a 16-byte aligned structure achieves best results, both for code
cleanliness and compiled code size. However, this means that we can't
use the trick of encoding the slot number into the low 2 bits.
Thankfully, we only ever use slot2, so make that explicit in the names
of the relocation functions, and drop the code for other slots.
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
To be defined by the tcg backend based on the elemental unit of the ISA.
During the transition, allow TCG_TARGET_INSN_UNIT_SIZE to be undefined,
which allows us to default tcg_insn_unit to the current uint8_t.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
To avoid C undefined behaviour when patching generated code,
provide wrappers tcg_patch8/16/32/64 which use the usual memcpy
trick, and use them in the i386 backend.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoid stores to unaligned addresses in TCG code generation, by using the
usual memcpy() approach. (Using bswap.h would drag a lot of QEMU baggage
into TCG, so it's simpler just to do direct memcpy() here.)
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use TCGReg everywhere appropriate. Use int32_t for all arguments
that may be registers or immediate constants. Merge tcg_out_addi
into its only caller.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use sextract instead of raw bit shifting for the tests. Introduce
a new check_fit_ptr macro to make it clear we're looking at pointers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using the 32-bit SMUL is a tad more efficient than
resorting to extending and using the 64-bit MULX.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Quite a lot of effort was spent composing and decomposing 64-bit
quantities in registers, when we should just create them and leave
them as one 64-bit register.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Replace with SPARC64 define. Soon even sparcv8plus will use
64-bit register as far as TCG is concerned.
Signed-off-by: Richard Henderson <rth@twiddle.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTVthUAAoJEK0ScMxN0Ceb5OUH/0Ri4engOQim3mV1jAOr2Vca
3zg+Hl8b+UioXD0se24Iipr6s+02G1DApbbLPX7DZAnoh9jEBvDtHOdde3pNbMkQ
jcTpShTyT7OKSsklRN19ckvk0ffBch5W3Ekkw6/Hg6ys2HIvirRpEL6R58oJNlP6
xcCkQZISZVkakbv5xft8YQo1v8wnU5q2l85OaC1aaDB6g+Y6ZgoA1qkWjqlHkmQk
1asflfbC0r5ke+yx7vz6310f5xBDLSVv17dqsDUr70o1m/6bem6wQXMczwmYUfk5
99OCPiqdiCZLJyVFvIvfwSakL9Bq/nvnmywXTkrB7rovk5VZz3gc4sJkuUkL+KM=
=URV1
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/rth/tags/tcg-next-20140422' into staging
Pull tcg 2014-04-22
# gpg: Signature made Tue 22 Apr 2014 22:00:04 BST using RSA key ID 4DD0279B
# gpg: Can't check signature: public key not found
* remotes/rth/tags/tcg-next-20140422:
tcg: Use HOST_WORDS_BIGENDIAN
tcg: Fix fallback from muls2_i64 to mulu2_i64
tcg: Use tcg_gen_mulu2_i32 in tcg_gen_muls2_i32
tcg: Relax requirement for mulu2_i32 on 32-bit hosts
tcg-s390: Remove W constraint
tcg-sparc: Use the type parameter to tcg_target_const_match
tcg-ppc64: Use the type parameter to tcg_target_const_match
tcg-aarch64: Remove w constraint
tcg: Add TCGType parameter to tcg_target_const_match
tcg: Fix out of range shift in deposit optimizations
tci: Mask shift counts to avoid undefined behavior
tcg: Mask shift quantities while folding
tcg: Use "unspecified behavior" for shifts
tcg: Fix warning (1 bit signed bitfield entry) and replace int by bool
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Brown Bag sez, don't put the fallback code into the wrong function.
Also, check for muluh_i64 and use tcg_gen_mulu2_i64 instead of raw ops.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead require either mulu2_i32 or muluh_i32. The code in tcg-op.h
already supports looking for both. Previous incomplete conversion?
Signed-off-by: Richard Henderson <rth@twiddle.net>
Most 64-bit targets need to be able to ignore the high bits
of a TCG_TYPE_I32 value.
Suggested-by: Stuart Brady <sdb@zubnet.me.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64)
and everything else falls apart. But we can reuse the existing deposit
logic to get this right.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The TCG result would be undefined, but we can at least produce one
plausible result and avoid triggering the wrath of analysis tools.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Change the definition such that shifts are not allowed to crash
for any input.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Static code analyzers complain about signed bitfields with only a single
bit. is_ld is used as a boolean value, so make it bool.
ppc64 already used bool for the 2nd argument is_ld of the local function
add_qemu_ldst_label. Modify all other TCG targets to do follow this
example.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Still inline, but updated to the new routines. Always use the LE
helpers, reusing the bswap between the fast and slot paths.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This sequencing requires 5 stop bits instead of 6, and has room left
over to pre-load the tlb addend, and bswap data prior to being stored.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The assembler seems to prefer them, perhaps we should too.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Replace aarch64_ldst_op_data with AArch64LdstType, as it wasn't encoded
for the proper shift for the field and was confusing.
Merge aarch64_ldst_op_data, AArch64LdstType, and a few stray opcode bits
into a single I3312_* argument, eliminating some magic numbers from the
helper functions.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cleaning up the implementation of REV and REV16 at the same time.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>