The bswap16 TCG opcode assumes that the high bytes of the temp equal
to 0 before calling it. The ARM backend implementation takes this
assumption to slightly optimize the generated code.
The same implementation is called for implementing the cross-endian
qemu_st16 opcode, where this assumption is not true anymore. One way to
fix that would be to zero the high bytes before calling it. Given the
store instruction just ignore them, it is possible to provide a slightly
more optimized version. With ARMv6+ the rev16 instruction does the work
correctly. For lower ARM versions the patch provides a version which
behaves correctly with non-zero high bytes, but fill them with junk.
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG arm backend considers likely that the offset to the TLB
entries does not exceed 12 bits for mem_index = 0. In practice this is
not true for at least the MIPS target.
The current patch fixes that by loading the bits 23-12 with a separate
instruction, and using loads with address writeback, independently of
the value of mem_idx. In total this allow a 24-bit offset, which is a
lot more than needed.
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement movcond_i32 for ARM, as the sequence
mov dst, v2 (implicitly done by the tcg common code)
cmp c1, c2
movCC dst, v1
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The code to emit either an immediate cmp or a register cmp insn is
duplicated in several places; factor it out into its own function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* 'trivial-patches' of git://github.com/stefanha/qemu:
versatilepb: Use symbolic indices for ARM PIC
qdev: kill bogus comment
qemu-barrier: Fix compiler version check for future gcc versions
hw: Add missing 'static' attribute for QEMUMachine
cleanup useless return sentence
qemu-sockets: Fix compiler warning (regression for MinGW)
vnc: Fix spelling (hellmen -> hellman) in comment
slirp: Fix spelling in comment (enought -> enough, insure -> ensure)
tcg/arm: Use tcg_out_mov_reg rather than inline equivalent code
cpu: Add missing 'static' attribute to qemu_global_mutex
configure: Support empty target list (--target-list=)
hw: Fix return value check for bdrv_read, bdrv_write
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil<sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use the recently introduced tcg_out_mov_reg() function rather than
the equivalent inline code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once. Therefore the patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.
Remove dyngen-exec.h and all references to it. Although included by
hw/spapr_hcall.c, it does not seem to use it.
Remove unused HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The CONFIG_TCG_PASS_AREG0 code for calling ld/st helpers was
broken in that it did not respect the ABI requirement that 64
bit values were passed in even-odd register pairs. The simplest
way to fix this is to implement some new utility functions
for marshalling function arguments into the correct registers
and stack, so that the code which sets up the address and
data arguments does not need to care whether there has been
a preceding env argument.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Optionally, make memory access helpers take a parameter for CPUState
instead of relying on global env.
On most targets, perform simple moves to reorder registers. On i386,
switch from regparm(3) calling convention to standard stack-based
version.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scripted conversion:
for file in *.[hc] hw/*.[hc] hw/kvm/*.[hc] linux-user/*.[hc] linux-user/m68k/*.[hc] bsd-user/*.[hc] darwin-user/*.[hc] tcg/*/*.[hc] target-*/cpu.h; do
sed -i "s/CPUState/CPUArchState/g" $file
done
All occurrences of CPUArchState are expected to be replaced by QOM CPUState,
once all targets are QOM'ified and common fields have been extracted.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
ARM still doesn't support 16GB buffers in 32-bit modes, replace the
16GB by 16MB in the comment.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
On ARM, don't map the code buffer at a fixed location, and fix up the
call/goto tcg routines to let it do long jumps.
Mapping the code buffer at a fixed address could sometimes result in it being
mapped over the top of the heap with pretty random results.
Signed-off-by: Dr. David Alan Gilbert <david.gilbert@linaro.org>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Including tcg_out_ld, tcg_out_st, tcg_out_mov, tcg_out_movi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Remove the unused function tcg_out_addi() from the ARM TCG backend;
this fixes a compilation failure on ARM hosts with newer gcc.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make functions take a parameter for CPUState instead of relying
on global env. Pass CPUState pointer to TCG prologue, which moves
it to AREG0.
Thanks to Peter Maydell and Laurent Desnogues for the ARM prologue
change.
Revert the hacks to avoid AREG0 use on Sparc hosts.
Move cpu_has_work() and cpu_pc_from_tb() from exec.h to cpu.h.
Compile the file without HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Although the TCG generated code is always in ARM mode, it is possible
that the host code was compiled by gcc in Thumb mode (this is often the
default for Linux distributions targeting ARM v7 only). Handle this
by using BLX imm when doing a call from ARM into Thumb mode.
Since BLX imm is not a conditionalisable instruction, we make
tcg_out_call() no longer take a condition code; we were only ever
using it with COND_AL anyway.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrzej Zaborowski <andrew.zaborowski@intel.com>
Add a comment about cache coherency and retranslation, so that people
developping new targets based on existing ones are warned of the issue.
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Improve constant loading in two ways:
- On all ARM versions, it's possible to load 0xffffff00 = -0x100 using
the mvn rd, #0. Fix the conditions.
- On <= ARMv6 versions, where movw and movt are not available, load the
constants using mov and orr with rotations depending on the constant
to load. This is very useful for example to load constants where the
low byte is 0. This reduce the generated code size by about 7%.
Also fix the coding style at the same time.
Cc: Andrzej Zaborowski <balrog@zabor.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Due to a typo, qemu_st64 doesn't properly byteswap the 32-bit low word of
a 64 bit word before saving it. This patch fixes that.
Acked-by: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
QEMU uses code retranslation to restore the CPU state when an exception
happens. For it to work the retranslation must not modify the generated
code. This is what is currently implemented in ARM TCG.
However on CPU that don't have icache/dcache/memory synchronised like
ARM, this requirement is stronger and code retranslation must not modify
the generated code "atomically", as the cache line might be flushed
at any moment (interrupt, exception, task switching), even if not
triggered by QEMU. The probability for this to happen is very low, and
depends on cache size and associativiy, machine load, interrupts, so the
symptoms are might happen randomly.
This requirement is currently not followed in tcg/arm, for the
load/store code, which basically has the following structure:
1) tlb access code is written
2) conditional fast path code is written
3) branch is written with a temporary target
4) slow path code is written
5) branch target is updated
The cache lines corresponding to the retranslated code is not flushed
after code retranslation as the generated code is supposed to be the
same. However if the cache line corresponding to the branch instruction
is flushed between step 3 and 5, and is not flushed again before the
code is executed again, the branch target is wrong. In the guest, the
symptoms are MMU page fault at a random addresses, which leads to
kernel page fault or segmentation faults.
The patch fixes this issue by avoiding writing the branch target until
it is known, that is by writing only the branch instruction first, and
later only the offset.
This fixes booting linux guests on ARM hosts (tested: arm, i386, mips,
mipsel, sh4, sparc).
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Both tcg_target_init and tcg_target_qemu_prologue
are unused outside of tcg.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Mirror tcg_out_movi in having a TYPE parameter. This allows x86_64
to perform the move at the proper width, which may elide a REX prefix.
Introduce a TCG_TYPE_REG enumerator to represent the "native width"
of the host register, and to distinguish the usage from "pointer data"
as represented by the existing TCG_TYPE_PTR.
Update all targets to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The beginning of the register allocation order list on the TCG arm
target matches the list of clobbered registers. This means that when an
helper is called, there is almost always clobbered registers that have
to be spilled.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
64-bit arguments should be aligned on an even register as specified
by the "Procedure Call Standard for the ARM Architecture".
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
addr_reg, data_reg and data_reg2 can't be register r0 or r1 du to the
constraints. Don't check if they equals these registers.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On big endian targets, data arguments of qemu_ld/st ops have to be
byte swapped. Two temporary registers are needed for qemu_st to do
the bswap. r0 and r1 are used in system mode, do the same in user
mode, which implies reworking the constraints.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While it make sense to pass a conditional argument to tcg_out_*()
functions as the ARM architecture allows that, it doesn't make sense
for qemu_ld/st functions. These functions use comparison instructions
and conditional execution already, so it is not possible to use a
second level of conditional execution.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add an bswap16 and bswap32 ops, either using the rev and rev16
instructions on ARMv6+ or shifts and logical operations on previous
ARM versions. In both cases the result use less instructions than
the pure TCG version.
These ops are also needed by the qemu_ld/st functions.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add an ext16u op, either using the uxth instruction on ARMv6+ or two
shifts on previous ARM versions. In both cases the result use the same
number or less instructions than the pure TCG version.
Also move all sign extension code to separate functions, so that they
can be reused in other parts of the code.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use a set of variables to define the allowed ARM instructions, depending
on the __ARM_ARCH_*__ GCC defines.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG ARM backends uses integer values to refer to both immediate
values and register number. This makes the code difficult to read.
The patch below replaces all (if I haven't miss any ;-) integer values
representing register number by TCG_REG_* enum values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Instead of writing very compact code, declare all registers that are
clobbered or reserved one by one. This makes the code easier to read.
Also declare all the 16 registers to TCG, and mark pc as reserved.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is no need to save the LR register (r14) before a call to a
subroutine. According to the "Procedure Call Standard for the ARM
Architecture", it is the job of the callee to save this register.
Moreover, this register is already saved in the prologue/epilogue.
This patch removes the disabled SAVE_LR code, as there is no need to
reenable later.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit 86feb1c860
did not change all occurrences of INDEX_op_qemu_ld32u
for tcg/arm.
Please note that I could not test this patch
(I have currently no arm system available).
Cc: Richard Henderson <rth@twiddle.net>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Some targets (e.g. Alpha and MIPS64) need to keep 32-bit operands
sign-extended in 64-bit registers (regardless of the "real" sign
of the operand). For that, we need to be able to distinguish
between a 32-bit load with a 32-bit result and a 32-bit load with
a given extension to a 64-bit result. This distinction already
exists for the ld* loads, but not the qemu_ld* loads.
Reserve qemu_ld32u for 64-bit outputs and introduce qemu_ld32 for
32-bit outputs. Adjust all code generators to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Give the enumeration formed from tcg-opc.h a name: TCGOpcode.
Use that enumeration type instead of "int" whereever appropriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There is no need to save r7, it is used to store the address
of the env structure and is not modified by GCC.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>