Existing compile-time detection is spotty at best. Convert
it all to runtime detection instead.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When we allocate a reserved_va for the guest, the kernel will likely
choose an address well above 4G. At which point we must use a pair
of movabsq+addq to form the host address. If we have OS support,
set up a segment register to point to guest_base instead.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On x86_64, remove the constraint on the third argument register which
is not needed:
- For loads the helper arguments are env, addr, mem_idx. The addr
value should not be in the two first argument registers as they are
used in tcg_out_tlb_load().
- For stores the helper arguments are env, addr, data, mem_idx.
The addr and data values should not be in the two first argument
registers as they are used in tcg_out_tlb_load(). The data value
should also not be in the two first argument registers, but could
be in the third argument register in which case it would be already
loaded at the right location.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get
an optimal code for the load/store functions.
First swap the two registers used in tcg_out_tlb_load() so that the
address end-up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil<sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The movcond_i32 op has to be protected with TCG_TARGET_HAS_movcond_i32
to fix the build with -march < i686.
Thanks to Richard Henderson for the hint.
Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once. Therefore the patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
32 bit x86 hosts don't need registers for helper function arguments
because they use the default stack based calling convention.
Removing the registers allows simpler code for function
tcg_target_get_call_iarg_regs_count.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While 64 bit hosts use the first three registers which are also used
as function input parameters, 32 bit hosts use TCG_REG_EAX and
TCG_REG_EDX which are not used in parameter passing.
After defining new register macros for the registers used in L
constraint, the patch replaces most occurrences of
tcg_target_call_iarg_regs[0], tcg_target_call_iarg_regs[1] and
tcg_target_call_iarg_regs[2] by those new macros.
tcg_target_call_iarg_regs remains unchanged when it is used for input
arguments (only with 64 bit hosts) before tcg_out_calli.
A comment related to those registers was fixed, too.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
[aurel32: build fix on i386, small optimization for i386 in the prologue]
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.
Commit 2999a0b200 increased the number
of arguments for some important helper functions from 4 to 5
which triggered a bug for W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because function
tcg_target_get_call_iarg_regs_count always returned 6.
As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On x86, it is possible to move a constant value to memory. Add code to
handle a constant argument to load/store ops.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.
Remove dyngen-exec.h and all references to it. Although included by
hw/spapr_hcall.c, it does not seem to use it.
Remove unused HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
w64 uses the registers rcx, rdx, r8 and r9 for function arguments,
so it needs a different declaration of tcg_target_call_iarg_regs.
rax, rcx, rdx, r8, r9, r10 and r11 may be changed by function calls.
rbx, rbp, rdi, rsi, r12, r13, r14 and r15 remain unchanged by function calls.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Not all i386 / x86_64 hosts use ELF.
Ask the compiler whether ELF is used.
On w64, gdb crashes when ELF_HOST_MACHINE is defined.
Cc: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
This allows us to generate unwind info for the dynamicly generated
code in the code_gen_buffer. Only i386 is converted at this point.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Optionally, make memory access helpers take a parameter for CPUState
instead of relying on global env.
On most targets, perform simple moves to reorder registers. On i386,
switch from regparm(3) calling convention to standard stack-based
version.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Use stack based calling convention (GCC default) for interfacing with
generated code instead of register based convention (regparm(3)).
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scripted conversion:
for file in *.[hc] hw/*.[hc] hw/kvm/*.[hc] linux-user/*.[hc] linux-user/m68k/*.[hc] bsd-user/*.[hc] darwin-user/*.[hc] tcg/*/*.[hc] target-*/cpu.h; do
sed -i "s/CPUState/CPUArchState/g" $file
done
All occurrences of CPUArchState are expected to be replaced by QOM CPUState,
once all targets are QOM'ified and common fields have been extracted.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
tcg_out_label is always called with a third argument of pointer type
which was casted to tcg_target_long.
These casts can be avoided by changing the prototype of tcg_out_label.
There was also a cast to long. For most hosts with
sizeof(long) == sizeof(tcg_target_long) == sizeof(void *) this did not
matter, but for w64 it was wrong. This is fixed now.
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Including tcg_out_ld, tcg_out_st, tcg_out_mov, tcg_out_movi.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
x86 cannot provide an optimized generic deposit implementation. But at
least for a few special cases, namely for writing bits 0..7, 8..15, and
0..15, versions using only a single instruction are feasible.
Introducing such limited support improves emulating 16-bit x86 code on
x86, but also rarer cases where 32-bit or 64-bit code accesses bytes or
words.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The second register is only needed for 32 bit hosts.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Except for specific cases where the use of %esp changes the encoding of
the instruction, it's cleaner to use TCG_REG_CALL_STACK instead of
TCG_REG_ESP.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make functions take a parameter for CPUState instead of relying
on global env. Pass CPUState pointer to TCG prologue, which moves
it to AREG0.
Thanks to Peter Maydell and Laurent Desnogues for the ARM prologue
change.
Revert the hacks to avoid AREG0 use on Sparc hosts.
Move cpu_has_work() and cpu_pc_from_tb() from exec.h to cpu.h.
Compile the file without HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Both tcg_target_init and tcg_target_qemu_prologue
are unused outside of tcg.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Mirror tcg_out_movi in having a TYPE parameter. This allows x86_64
to perform the move at the proper width, which may elide a REX prefix.
Introduce a TCG_TYPE_REG enumerator to represent the "native width"
of the host register, and to distinguish the usage from "pointer data"
as represented by the existing TCG_TYPE_PTR.
Update all targets to match.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Make fallthru be TLB hit and branch be TLB miss. Doing this
both improves branch prediction and will allow further cleanup.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Splitting out these functions will allow further cleanups.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Include it in the opcode as an extension, as with P_EXT
or the REX bits in the x86-64 port.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The result is shorter than the mov+add that TCG would
otherwise generate for us.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Implement full modrm+sib addressing mode processing.
Use that in qemu_ld/st to output the LEA.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>