The INDEX_op_call case has just been obsoleted; the mov and movi
cases have not been reachable for years. Attempt to document this
both in each tcg_out_op switch, and via TCG_OPF_NOT_PRESENT.
Because of the TCG_OPF_NOT_PRESENT change, this must be done for
all targets in a single commit.
Signed-off-by: Richard Henderson <rth@twiddle.net>
And use tcg pointer differencing functions as appropriate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Most 64-bit targets need to be able to ignore the high bits
of a TCG_TYPE_I32 value.
Suggested-by: Stuart Brady <sdb@zubnet.me.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Static code analyzers complain about signed bitfields with only a single
bit. is_ld is used as a boolean value, so make it bool.
ppc64 already used bool for the 2nd argument is_ld of the local function
add_qemu_ldst_label. Modify all other TCG targets to do follow this
example.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The fix is that sparc has so many mmu modes that the last one overflowed
the 16-bit signed offset we assumed would fit. Handle this, and check
the new assumption at compile time.
Load the tlb addend earlier for the fast path.
Remove the explicit address + addend and make use of index addressing.
Adjust constraints for qemu_ld64 such that we don't clobber the address
register or tlb addend before loading both values.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Coding style fixes. Use TCGReg enumeration values instead of raw
numbers. Don't needlessly pull the whole TCGLabelQemuLdst struct
into local variables. Less conditional compilation.
No functional changes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.
Signed-off-by: Richard Henderson <rth@twiddle.net>
These use a 32-bit load-of-immediate to save a mflr+addi+mtlr sequence.
Tested with a Windows 98 guest (pretty much the most recent thing I
could run on my PPC machine) and kvm-unit-tests's sieve.flat. The
speed up for sieve.flat is as high as 10% for qemu-system-i386, 25%
(no kidding) for qemu-system-x86_64 on my PowerBook G4.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For the AIX ABI, the function pointer and small area pointer need
to be loaded in the trampoline. The trampoline instead is called
with a normal BL instruction.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The _cmmu helpers can be moved to exec-all.h. The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.
This requires minor include adjustments to all TCG backends.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
mmu access looks something like:
<check tlb>
if miss goto slow_path
<fast path>
done:
...
; end of the TB
slow_path:
<pre process>
mr r3, r27 ; move areg0 to r3
; (r3 holds the first argument for all the PPC32 ABIs)
<call mmu_helper>
b $+8
.long done
<post process>
b done
On ppc32 <call mmu_helper> is:
(SysV and Darwin)
mmu_helper is most likely not within direct branching distance from
the call site, necessitating
a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes
b. moving GPR to CTR/LR ; 4 bytes
c. (finally) branching to CTR/LR ; 4 bytes
r3 setting - 4 bytes
call - 16 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr - 4 bytes
Total overhead - 28 bytes
(PowerOpen (AIX))
a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
b. loading 32 bit function pointer into GPR2 ; 4 bytes
c. moving GPR2 to CTR/LR ; 4 bytes
d. loading 32 bit small area pointer into R2 ; 4 bytes
e. (finally) branching to CTR/LR ; 4 bytes
r3 setting - 4 bytes
call - 24 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr - 4 bytes
Total overhead - 36 bytes
Following is done to trim the code size of slow path sections:
In tcg_target_qemu_prologue trampolines are emitted that look like this:
trampoline:
mfspr r3, LR
addi r3, 4
mtspr LR, r3 ; fixup LR to point over embedded retaddr
mr r3, r27
<jump mmu_helper> ; tail call of sorts
And slow path becomes:
slow_path:
<pre process>
<call trampoline>
.long done
<post process>
b done
call - 4 bytes (trampoline is within code gen buffer
and most likely accessible via
direct branch)
embedded retaddr - 4 bytes
Total overhead - 8 bytes
In the end the icache pressure is decreased by 20/28 bytes at the cost
of an extra jump to trampoline and adjusting LR (to skip over embedded
retaddr) once inside.
Signed-off-by: malc <av1474@comtv.ru>
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil<sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once. Therefore the patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.
Remove dyngen-exec.h and all references to it. Although included by
hw/spapr_hcall.c, it does not seem to use it.
Remove unused HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Commit eeacee4d86 changed the syntax of tcg_dump_ops, but didn't convert
all users (notably missing the ppc ones) to it. Fix them to the new syntax.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: malc <av1474@comtv.ru>
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
does not define _CALL_DARWIN, leading to unexpected behavior w.r.t.
register clobbering and stack frame layout.
Since _CALL_DARWIN is a reserved identifier, define a custom
TCG_TARGET_CALL_DARWIN based on either _CALL_DARWIN or __APPLE__.
Signed-off-by: Andreas F?rber <andreas.faerber@web.de>
Signed-off-by: malc <av1474@comtv.ru>
Adjust the tcg_out_qemu_{ld,st}() slow paths to pass AREG0 in r3,
based on patches by malc.
Also adjust the registers clobbered, based on patch by Alex.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
[AF: Do not hardcode r3 for AREG0, requested by Alex]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This accounts for the additional addr_reg2 register.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Also assure i64 alignment where necessary.
Alignment code optimization suggested by malc.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For targets where TARGET_LONG_BITS != 32, i.e. 64-bit guests,
addr_reg is moved to r4. For hosts without TCG_TARGET_CALL_ALIGN_ARGS
either data_reg2 or data_reg or a masked version thereof would overwrite
r4. Place it in r5 instead, matching TCG_TARGET_CALL_ALIGN_ARGS hosts.
This fixes immediate crashes of 64-bit guests observed on Darwin/ppc but
not on Darwin/ppc64.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Acked-by: malc <av1474@comtv.ru>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Optionally, make memory access helpers take a parameter for CPUState
instead of relying on global env.
On most targets, perform simple moves to reorder registers. On i386,
switch from regparm(3) calling convention to standard stack-based
version.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scripted conversion:
for file in *.[hc] hw/*.[hc] hw/kvm/*.[hc] linux-user/*.[hc] linux-user/m68k/*.[hc] bsd-user/*.[hc] darwin-user/*.[hc] tcg/*/*.[hc] target-*/cpu.h; do
sed -i "s/CPUState/CPUArchState/g" $file
done
All occurrences of CPUArchState are expected to be replaced by QOM CPUState,
once all targets are QOM'ified and common fields have been extracted.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Including tcg_out_ld, tcg_out_st, tcg_out_mov, tcg_out_movi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Move the declaration and initialisation of some variables in
tcg_out_qemu_ld and tcg_out_qemu_st inside CONFIG_SOFTMMU, to
avoid the "variable set but not used" warning of gcc 4.6.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: malc <av1474@comtv.ru>
Make functions take a parameter for CPUState instead of relying
on global env. Pass CPUState pointer to TCG prologue, which moves
it to AREG0.
Thanks to Peter Maydell and Laurent Desnogues for the ARM prologue
change.
Revert the hacks to avoid AREG0 use on Sparc hosts.
Move cpu_has_work() and cpu_pc_from_tb() from exec.h to cpu.h.
Compile the file without HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We need not reserve the register unless we're going to use it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: malc <av1474@comtv.ru>
Both tcg_target_init and tcg_target_qemu_prologue
are unused outside of tcg.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Mirror tcg_out_movi in having a TYPE parameter. This allows x86_64
to perform the move at the proper width, which may elide a REX prefix.
Introduce a TCG_TYPE_REG enumerator to represent the "native width"
of the host register, and to distinguish the usage from "pointer data"
as represented by the existing TCG_TYPE_PTR.
Update all targets to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>