Using only indirect calls results in 3 bundles (one to load the
descriptor address), and 4 stop bits. By looking through the
descriptor to the constants, we can perform the call with 2
bundles and only 1 stop bit.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
There's no need to go through the full opcode-to-insn function call
to generate nops. This makes the source a bit more readable.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
If we pull the code to emit the actual load/store into a subroutine,
we can share the reg+reg addressing mode code between softmmu and
usermode. This lets us load GUEST_BASE into a temporary register
rather than attempting to add it piece-wise to the address.
Which lets us use movw+movt for armv7, rather than (up to) 4 adds.
Code size for pre-armv7 stays the same.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size. This one is fairly easy
to work around, since eax is available as a scratch register.
When storing variable data, this tends to merely exchange one mov
for another. E.g.
-: mov %esi,%ecx
...
-: mov %cl,(%edx)
+: mov %esi,%eax
+: mov %al,(%edx)
Where we do have a regression is when storing constant data, in which
we may load the constant into edi, when only ecx/ebx ought to be used.
The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, must less the wrong register. TBD.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.
Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.
Use TCGReg enum in preference to int where apprpriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step three in the transition: helpers not tied to the target
"default" endianness. To be used when the guest uses a memory
operation with non-default endianness.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Step two in the transition, adding the new ldst opcodes. Keep the old
opcodes around until all backends support the new opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This is a no-op backend data implementation, for those targets that
are not currently using the load/store optimization path.
This is prepatory to always requiring these functions in all backends.
Signed-off-by: Richard Henderson <rth@twiddle.net>
A minimal update to use the new helpers with the return address argument.
Tested-by: Claudio Fontana <claudio.fontana@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For the few targets that actually use these, we'd not report
them symbolicly in the tcg opcode logs.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One call inside of a loop to tcg_register_helper instead of hundreds
of sequential calls.
Presumably more icache and branch prediction friendly; resulting binary
size mostly unchanged on x86_64, as we're trading 32-bit rip-relative
references in .text for full 64-bit pointers in .rodata.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Slightly changes the interface, in that we now return name
instead of a TCGHelperInfo structure, which goes away.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
# By Richard Henderson
# Via Richard Henderson
* rth/tcg-arm-pull:
tcg-arm: Move the tlb addend load earlier
tcg-arm: Remove restriction on qemu_ld output register
tcg-arm: Return register containing tlb addend
tcg-arm: Move load of tlb addend into tcg_out_tlb_read
tcg-arm: Use QEMU_BUILD_BUG_ON to verify constraints on tlb
tcg-arm: Use strd for tcg_out_arg_reg64
tcg-arm: Rearrange slow-path qemu_ld/st
tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
Message-id: 1380663109-14434-1-git-send-email-rth@twiddle.net
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
# By Stefan Weil
# Via Stefan Weil
* sweil/tci:
misc: Use new rotate functions
bitops: Add rotate functions (rol8, ror8, ...)
tci: Add implementation of rotl_i64, rotr_i64
Message-id: 1380137693-3729-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
There are free scheduling slots between the sequence of
comparison instructions. This requires changing the
register in use to avoid conflict with those compares.
Signed-off-by: Richard Henderson <rth@twiddle.net>
The main intent of the patch is to allow the tlb addend register
to be changed, without tying that change to the constraint. But
the most common side-effect seems to be to enable usage of ldrd
with the r0,r1 pair.
Signed-off-by: Richard Henderson <rth@twiddle.net>
This allows us to make more intelligent decisions about the relative
offsets of the tlb comparator and the addend, avoiding any need of
writeback addressing.
Signed-off-by: Richard Henderson <rth@twiddle.net>
One of the two constraints we already checked via #if, but
the tlb offset distance was only checked at runtime.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Use the new helper_ret_*_mmu routines. Use a conditional call
to arrange for a tail-call from the store path, and to load the
return address for the helper for the load path.
Signed-off-by: Richard Henderson <rth@twiddle.net>
It is used by qemu-ppc64 when running Debian's busybox-static.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Less conditional compilation. Merge an add insn with the indexed
memory load insn. Load the tlb addend earlier. Avoid the address
update memory form.
Fix a bug in not allowing large enough tlb offsets for some guests.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Previously we'd only handle 16-bit offsets from memory operand without falling
back to indexed, but it's easy to use ADDIS to handle full 32-bit offsets.
This also lets us unify code that existed inline in tcg_out_op for handling
addition of large constants.
The new R2 temporary was marked reserved for the AIX calling convention, but
the register really is call-clobbered and since tcg generated code has no use
for a TOC, it's available for use.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Remove conditionalization from tcg_target_reg_alloc_order, relying on
reserved_regs to prevent register allocation that shouldn't happen.
So R11 is now present in reg_alloc_order for __APPLE__, but also now
reserved.
Sort reg_alloc_order into call-saved, call-clobbered, and parameters.
This reduces the effect of values getting spilled and reloaded before
function calls.
Whether or not it is reserved, R2 (TOC) is always call-clobbered.
Signed-off-by: Richard Henderson <rth@twiddle.net>