Still inline, but updated to the new routines. Always use the LE
helpers, reusing the bswap between the fast and slot paths.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This sequencing requires 5 stop bits instead of 6, and has room left
over to pre-load the tlb addend, and bswap data prior to being stored.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The assembler seems to prefer them, perhaps we should too.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Replace aarch64_ldst_op_data with AArch64LdstType, as it wasn't encoded
for the proper shift for the field and was confusing.
Merge aarch64_ldst_op_data, AArch64LdstType, and a few stray opcode bits
into a single I3312_* argument, eliminating some magic numbers from the
helper functions.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cleaning up the implementation of REV and REV16 at the same time.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Making the bswap conditional on the memop instead of a compile-time test.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
In some cases, a direct branch will be in range.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Some guest env are small enough to reach the tlb with only a 12-bit addition.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Combines 4 other inline functions and tidies the prologue.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It's obviously call-clobbered, but is otherwise unused.
Repurpose it as the TCG temporary.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
A compare and branch against zero happens at the start of
every single TB.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rearrange code to put the compare and branch in the same place.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The subset of logical immediates that we support is quite quick to test,
and such constants are quite common to want to load.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When profitable, initialize the register with MOVN instead of MOVZ,
before setting the remaining lanes with MOVK.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Rather than raw constants that could mean anything.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The arm ldrd/strd insns must cause alignment traps, whereas
at least for armv7 ldr/str must handle unaligned operations.
While this is hardly the only problem facing user-only emu,
this solves one problem for i386 on armv7 emulation.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
All of the helpers with the explicit big/little endian option
require the return address as a parameter. Acquire this via
a trampoline.
Move the load of areg0 into the trampoline.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pass address registers explicitly, rather than as indicies of args[].
It's two argument registers either way. Use more TCGReg as appropriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We were computing the full address into %o0 and then not using it.
Adjust some of the computation to rely less on having to pull immediate
values into registers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Cleaning up the implementation of tcg_out_movi at the same time.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Clean up multiply at the same time.
For remainder, generic code will produce mul+sub,
whereas we can implement with msub.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Also tidy the implementation of ubfm, sbfm, extr in order to share code.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Handle a simplified set of logical immediates for the moment.
The way gcc and binutils do it, with 52k worth of tables, and
a binary search depth of log2(5334) = 13, seems slow for the
most common cases.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Avoid the magic numbers in the current implementation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
This merges the implementation of tcg_out_addi and tcg_out_subi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Converting the add/sub (3.5.2) and logical shifted (3.5.10) instruction
groups to the new scheme.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Tested-by: Claudio Fontana <claudio.fontana@huawei.com>
Commit 023261ef85 failed to remove a
nop that's no longer required.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
At first glance the code appears to be using 1's compliment encoding,
a-la AArch32. Except that the constant is "off", creating a complicated
split field 2's compliment encoding.
Much clearer to just use a normal mask and shift.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It was unused. Let's not overcomplicate things before we need them.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This reduces the code size of the function significantly.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We assert that the values for _I32 and _I64 are 0 and 1 respectively.
This will make a couple of functions declared by tcg.c cleaner.
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Removed from other targets in 56bbc2f967.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Win32 doesn't have a cpuid.h, and MacOSX may have one but without
the __cpuid() function we use, which means that commit 9d2eec20
broke the build for those platforms. Fix this by tightening up
our configure cpuid.h check to test that the functions we need
are present, and adding some missing #ifdef guards in
tcg/i386/tcg-target.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
These three-operand shift instructions do not require the shift count
to be placed into ECX. This reduces the number of mov insns required,
with the mere addition of a new register constraint.
Don't attempt to get rid of the matching constraint, as that's impossible
to manipulate with just a new constraint. In addition, constant shifts
still need the matching constraint.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Note that the optimizer cannot simplify ANDC X,Y,C to AND X,Y,~C
so we must handle constants in the implementation of andc.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
These are not needed by users of tcg-target.h. No need to recompile
when we adjust them.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Recognize 0 operand to andc, and -1 operands to and, orc, eqv.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Like we already do for SUB and XOR.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Given, of course, an appropriate constant. These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The shl_i32 op might set some bits of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store which operate on tl values.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Known-zero bits optimization is a great idea that helps to generate more
optimized code. However the current implementation only works in very few
cases as the computed mask is not saved.
Fix this to make it really working.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.
Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
It's this that should be subtracted from 0x20 when converting to a right rotate.
Cc: qemu-stable@nongnu.org
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The second half register of a 64-bit temp on a 32-bit host
was allocated with the wrong base_type.
The base_type of the second half register is never checked,
but for consistency it should be the same as the first half.
Signed-off-by: Richard Henderson <rth@twiddle.net>
We have macros for marking TCGv values as unused, checking if they
are unused and comparing them to each other. However these only exist
for TCGv_i32 and TCGv_i64; add them for TCGv_ptr as well.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Commit c9baa30f42 failed to
delete all of the relevant code, leading to Werrors about
unused symbols.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
We have cache pools of temporaries that we can reuse later when they've
already been allocated before.
These cache pools differenciate between the target TCG variable type they
contain. So we have one pool for I32 and one pool for I64 variables.
On a 32bit system, we can't work with 64bit registers though. So instead we
spawn two I32 temporaries for every I64 temporary we create. All caching
works the same way as on a real 64-bit system though: We create a cache entry
in the 64bit array for the first i32 index.
However, when we free such a temporary we free it to the pool of its type
(which is always i32 on 32bit systems) rather than its base_type (which is
i64 or i32 depending on the variable). This means we put a temporary that
is of base_type == i64 into the i32 preallocated temporary pool.
Eventually, this results in failures like this on 32bit hosts:
qemu-system-ppc64: tcg/tcg.c:515: tcg_temp_new_internal: Assertion `ts->base_type == type' failed.
This patch makes the free routine use the base_type instead for the free case,
so it's consistent with the temporary allocation. It fixes the above failure
for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1390146811-59936-1-git-send-email-agraf@suse.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
TCG_TARGET_HAS_movcond_i32 is always defined to 1 in tcg-target.h, so
remove the corresponding #ifdef #endif sequence, left from a previous
refactoring.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The movbe instruction has been added on some Intel Atom CPUs and on
recent Intel Haswell CPUs. It allows to load/store a value and at the
same time bswap it.
This patch detects the avaibility of this instruction and when available
use it in the qemu load/store routines in replacement of load/store +
bswap. Note that for 16-bit unsigned loads, movbe + movzw is basically the
same as movzw + bswap, so the patch doesn't touch this case.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Reduced the number of conditionals using "movop".]
Signed-off-by: Richard Henderson <rth@twiddle.net>
Add support for three-byte opcodes, starting with the 0x0f 0x38 prefix.
Use P_EXT38 as the new constant, and shift all other constants so that
P_EXT and P_EXT38 have neighbouring values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Changed the name from P_EXT2 to P_EXT38.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
P_REXW is defined has a constant at the beginning of i386/tcg-target.c,
but the corresponding bit is later used in a harcoded way, which defeat
the purpose of a constant.
Fix that by using a conditional expression operator instead of a shift.
On x86 this actually makes the code slightly smaller as GCC does in
practice (opc >> 8) & 8 instead of (opc & 0x800) >> 8 so the constants
are smaller to load.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The comments apply to 8-bit stores, not 8-byte stores.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We previously allocated 32-bits per temp for the next_free_temp entry.
We now allocate 4 bits per temp across the 4 bitmaps.
Using a linked list meant that if a translator is tweeked, resulting in
temps being freed in a different order, that would have follow-on effects
throughout the TB. Always allocating the lowest free temp means that
follow-on effects are minimized, which can make it easier to diff output
when debugging the translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
No need to set up a SIGILL signal handler for detection anymore.
Remove a ton of sanity checks that must be true, given that we're
requiring a 64-bit build (the note about 31-bit KVM is satisfied
by configuring with TCI).
Signed-off-by: Richard Henderson <rth@twiddle.net>
Allow host detection on linux systems without glibc 2.16 or later.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Allow host detection on linux systems without glibc 2.16 or later.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Being able to "extend" from 64-bits (with a mov) simplifies
a few places where the conditional breaks the train of thought.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can and/or/xor/andcm small constants, saving one cycle.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can subtract from more small constants that just 0 with one insn,
and we can add the negative for most small constants.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Avoids a wasted cycle loading up small constants.
Simplify the code assuming the tcg optimizer is going to work
and don't expect the first operand of the add to be constant.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
When performing an operation with two input registers, we'd leave
the stop bit (and thus an extra cycle) that's only needed when one
or the other input is a constant.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since the move away from the global areg0, we're no longer globally
reserving areg0. Which means our use of R7 clobbers a call-saved
register. Shift areg0 into the windowed registers. Indeed, choose
the incoming parameter register that it comes to us by.
This requires moving the register holding the return address elsewhere.
Choose R33 for tidiness.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There was a misconception that a stop bit is required between a compare
and the branch that uses the predicate set by the compare. This lead to
the usage of an extra bundle in which to perform the compare. The extra
bundle left room for constants to be loaded for use with the compare insn.
If we pack the compare and the branch together in the same bundle, then
there's no longer any room for non-zero constants. At which point we
can eliminate half the function by not handling them.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using only indirect calls results in 3 bundles (one to load the
descriptor address), and 4 stop bits. By looking through the
descriptor to the constants, we can perform the call with 2
bundles and only 1 stop bit.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There's no need to go through the full opcode-to-insn function call
to generate nops. This makes the source a bit more readable.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
If we pull the code to emit the actual load/store into a subroutine,
we can share the reg+reg addressing mode code between softmmu and
usermode. This lets us load GUEST_BASE into a temporary register
rather than attempting to add it piece-wise to the address.
Which lets us use movw+movt for armv7, rather than (up to) 4 adds.
Code size for pre-armv7 stays the same.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size. This one is fairly easy
to work around, since eax is available as a scratch register.
When storing variable data, this tends to merely exchange one mov
for another. E.g.
-: mov %esi,%ecx
...
-: mov %cl,(%edx)
+: mov %esi,%eax
+: mov %al,(%edx)
Where we do have a regression is when storing constant data, in which
we may load the constant into edi, when only ecx/ebx ought to be used.
The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, must less the wrong register. TBD.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.
Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.
Use TCGReg enum in preference to int where apprpriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step three in the transition: helpers not tied to the target
"default" endianness. To be used when the guest uses a memory
operation with non-default endianness.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step two in the transition, adding the new ldst opcodes. Keep the old
opcodes around until all backends support the new opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is a no-op backend data implementation, for those targets that
are not currently using the load/store optimization path.
This is prepatory to always requiring these functions in all backends.
Signed-off-by: Richard Henderson <rth@twiddle.net>
A minimal update to use the new helpers with the return address argument.
Tested-by: Claudio Fontana <claudio.fontana@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For the few targets that actually use these, we'd not report
them symbolicly in the tcg opcode logs.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One call inside of a loop to tcg_register_helper instead of hundreds
of sequential calls.
Presumably more icache and branch prediction friendly; resulting binary
size mostly unchanged on x86_64, as we're trading 32-bit rip-relative
references in .text for full 64-bit pointers in .rodata.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Slightly changes the interface, in that we now return name
instead of a TCGHelperInfo structure, which goes away.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
# By Richard Henderson
# Via Richard Henderson
* rth/tcg-arm-pull:
tcg-arm: Move the tlb addend load earlier
tcg-arm: Remove restriction on qemu_ld output register
tcg-arm: Return register containing tlb addend
tcg-arm: Move load of tlb addend into tcg_out_tlb_read
tcg-arm: Use QEMU_BUILD_BUG_ON to verify constraints on tlb
tcg-arm: Use strd for tcg_out_arg_reg64
tcg-arm: Rearrange slow-path qemu_ld/st
tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
Message-id: 1380663109-14434-1-git-send-email-rth@twiddle.net
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
# By Stefan Weil
# Via Stefan Weil
* sweil/tci:
misc: Use new rotate functions
bitops: Add rotate functions (rol8, ror8, ...)
tci: Add implementation of rotl_i64, rotr_i64
Message-id: 1380137693-3729-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
There are free scheduling slots between the sequence of
comparison instructions. This requires changing the
register in use to avoid conflict with those compares.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The main intent of the patch is to allow the tlb addend register
to be changed, without tying that change to the constraint. But
the most common side-effect seems to be to enable usage of ldrd
with the r0,r1 pair.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This allows us to make more intelligent decisions about the relative
offsets of the tlb comparator and the addend, avoiding any need of
writeback addressing.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One of the two constraints we already checked via #if, but
the tlb offset distance was only checked at runtime.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use the new helper_ret_*_mmu routines. Use a conditional call
to arrange for a tail-call from the store path, and to load the
return address for the helper for the load path.
Signed-off-by: Richard Henderson <rth@twiddle.net>
It is used by qemu-ppc64 when running Debian's busybox-static.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Less conditional compilation. Merge an add insn with the indexed
memory load insn. Load the tlb addend earlier. Avoid the address
update memory form.
Fix a bug in not allowing large enough tlb offsets for some guests.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Previously we'd only handle 16-bit offsets from memory operand without falling
back to indexed, but it's easy to use ADDIS to handle full 32-bit offsets.
This also lets us unify code that existed inline in tcg_out_op for handling
addition of large constants.
The new R2 temporary was marked reserved for the AIX calling convention, but
the register really is call-clobbered and since tcg generated code has no use
for a TOC, it's available for use.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Remove conditionalization from tcg_target_reg_alloc_order, relying on
reserved_regs to prevent register allocation that shouldn't happen.
So R11 is now present in reg_alloc_order for __APPLE__, but also now
reserved.
Sort reg_alloc_order into call-saved, call-clobbered, and parameters.
This reduces the effect of values getting spilled and reloaded before
function calls.
Whether or not it is reserved, R2 (TOC) is always call-clobbered.
Signed-off-by: Richard Henderson <rth@twiddle.net>
While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Instead of bare N, for clarity. The only (intentional) exception made
is for insns that encode R|0, i.e. when R0 encoded into the insn is
interpreted as zero not the contents of the register.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The fix is that sparc has so many mmu modes that the last one overflowed
the 16-bit signed offset we assumed would fit. Handle this, and check
the new assumption at compile time.
Load the tlb addend earlier for the fast path.
Remove the explicit address + addend and make use of index addressing.
Adjust constraints for qemu_ld64 such that we don't clobber the address
register or tlb addend before loading both values.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Coding style fixes. Use TCGReg enumeration values instead of raw
numbers. Don't needlessly pull the whole TCGLabelQemuLdst struct
into local variables. Less conditional compilation.
No functional changes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.
Signed-off-by: Richard Henderson <rth@twiddle.net>
These use a 32-bit load-of-immediate to save a mflr+addi+mtlr sequence.
Tested with a Windows 98 guest (pretty much the most recent thing I
could run on my PPC machine) and kvm-unit-tests's sieve.flat. The
speed up for sieve.flat is as high as 10% for qemu-system-i386, 25%
(no kidding) for qemu-system-x86_64 on my PowerBook G4.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For the AIX ABI, the function pointer and small area pointer need
to be loaded in the trampoline. The trampoline instead is called
with a normal BL instruction.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
error: suggest parentheses around comparison in operand of ‘&’ [-Werror=parentheses]
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
# By Stefan Weil (6) and others
# Via Michael Tokarev
* mjt/trivial-patches:
aio / timers: use g_usleep() not sleep()
adlib: sort offsets in portio registration
qmp: fix integer usage in examples
tci: Remove function tcg_out64 (fix broken build)
target-arm: Report unimplemented opcodes (LOG_UNIMP)
pflash_cfi02.c: fix debug macro
configure: Remove unneeded redirections of stderr (pkg-config --exists)
configure: Remove unneeded redirections of stderr (pkg-config --cflags, --libs)
configure: Don't write .pyc files by default (python -B)
curl: qemu_bh_new() can never return NULL
slirp/arp_table.c: Avoid shifting into sign bit of signed integers
configure: disable clang -Wstring-plus-int warning
rdma: silly ipv6 bugfix
misc: Fix some typos in names and comments
slirp: Port redirection option behave differently on Linux and Windows
Message-id: 1378119695-14568-1-git-send-email-mjt@msgid.tls.msk.ru
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
* 'tcg-next' of git://github.com/rth7680/qemu: (29 commits)
tcg-i386: Make use of zero-extended memory helper routines
tcg: Introduce zero and sign-extended versions of load helpers
exec: Split softmmu_defs.h
target: Include softmmu_exec.h where forgotten
exec: Rename USUFFIX to LSUFFIX
tcg-i386: Don't perform GETPC adjustment in TCG code
exec: Reorganize the GETRA/GETPC macros
configure: Allow x32 as a host
tcg-i386: Adjust tcg_out_tlb_load for x32
tcg-i386: Use intptr_t appropriately
tcg: Fix jit debug for x32
tcg: Use appropriate types in tcg_reg_alloc_call
tcg: Change tcg_out_ld/st offset to intptr_t
tcg: Change tcg_gen_exit_tb argument to uintptr_t
tcg: Use uintptr_t in TCGHelperInfo
tcg: Change relocation offsets to intptr_t
tcg: Change memory offsets to intptr_t
tcg: Change frame pointer offsets to intptr_t
tcg: Define TCG_ptr properly
tcg: Define TCG_TYPE_PTR properly
...
On MIPS ext8s and ext16s ops are implemented with a dedicated
instruction only on MIPS32R2, otherwise the same kind of implementation
than at TCG level (shift left followed by shift right) is used.
Change that by only implementing the ext8s and ext16s ops on MIPS32R2 so
that optimizations can be done by the optimizer. Use an inline version to
avoid having to test again for MIPS32R2 instructions. Keep the shift
implementation for the ld/st routines.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use an inline version for the bswap16 and bswap32 ops to avoid
testing for MIPS32R2 instructions availability, as these ops are
only available in that case.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that TCG supports enabling and disabling ops at runtime, it's
possible to detect the available host instructions at runtime, and
enable the corresponding ops accordingly.
Unfortunately it's not easy to probe for available instructions on
MIPS, the information is partially available in /proc/cpuinfo, and
not available in AUXV. This patch therefore probes for the instructions
by trying to execute them and by catching a possible SIGILL signal.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
For 8 and 16-bit unsigned loads, rely on the zero-extension
from the helper and use a smaller 32-bit move insn.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The _cmmu helpers can be moved to exec-all.h. The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.
This requires minor include adjustments to all TCG backends.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we now perform it inside the helper, no need to do it here.
This also lets us perform a tail-call from the store slow path to
the helper.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There are several hosts for which it would be useful to use the
available 64-bit registers in a 32-bit pointer environment.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Using these instead of mulu2 and muls2 lets us avoid having to argument
overlap analysis in the backend. Normal register allocation will DTRT.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
With the optimization in tcg_liveness_analysis,
we can avoid the MFLO when it is unused.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Commit ac26eb69a3 added tcg_out64 to tcg/tcg.c.
tcg/tci/tcg-target.c already had a nearly identical implementation which is
now removed to fix a compiler error.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Discontinue the jump-around-jump-to-jump scheme, trading it for a single
immediate move instruction. The two extra jumps always consume 7 bytes,
whereas the immediate move is either 5 or 7 bytes depending on where the
code_gen_buffer gets located.
Signed-off-by: Richard Henderson <rth@twiddle.net>