Win32 doesn't have a cpuid.h, and MacOSX may have one but without
the __cpuid() function we use, which means that commit 9d2eec20
broke the build for those platforms. Fix this by tightening up
our configure cpuid.h check to test that the functions we need
are present, and adding some missing #ifdef guards in
tcg/i386/tcg-target.c.
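The configure probe amounts to compiling a small program along these
lines (illustrative, not the exact test added):

    /* probe: fails to compile on hosts whose cpuid.h lacks the
       functions we rely on */
    #include <cpuid.h>
    int main(void)
    {
        unsigned a, b, c, d;
        int max = __get_cpuid_max(0, 0);
        if (max >= 1) {
            __cpuid(1, a, b, c, d);
        }
        if (max >= 7) {
            __cpuid_count(7, 0, a, b, c, d);
        }
        return 0;
    }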
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
These three-operand shift instructions do not require the shift count
to be placed into ECX. This reduces the number of mov insns required,
with the mere addition of a new register constraint.
Don't attempt to get rid of the matching constraint, as that cannot be
expressed with just a new constraint. In addition, constant shifts
still need the matching constraint.
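For example, a variable 32-bit shift could change as follows (a
hypothetical before/after, assuming the BMI2 shlx instruction):

-: mov %esi,%ecx
-: shl %cl,%eax
+: shlx %esi,%eax,%eax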
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Note that the optimizer cannot simplify ANDC X,Y,C to AND X,Y,~C
so we must handle constants in the implementation of andc.
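A sketch of how the constant case can be folded at code-gen time
(helper names follow tcg/i386/tcg-target.c, but treat the details as
illustrative):

    case INDEX_op_andc_i32:
        if (const_args[2]) {
            /* fold the complement into the constant now */
            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
            tgen_arithi(s, ARITH_AND, args[0], ~args[2], 0);
        } else {
            /* andn: args[0] = ~args[2] & args[1] */
            tcg_out_vex_modrm(s, OPC_ANDN, args[0], args[2], args[1]);
        }
        break;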
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
These are not needed by users of tcg-target.h. No need to recompile
when we adjust them.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
TCG_TARGET_HAS_movcond_i32 is always defined to 1 in tcg-target.h, so
remove the corresponding #ifdef #endif sequence, left from a previous
refactoring.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The movbe instruction has been added on some Intel Atom CPUs and on
recent Intel Haswell CPUs. It allows a value to be loaded/stored and
byte-swapped at the same time.
This patch detects the availability of this instruction and, when it is
available, uses it in the qemu load/store routines in place of load/store +
bswap. Note that for 16-bit unsigned loads, movbe + movzw is basically the
same as movzw + bswap, so the patch doesn't touch this case.
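For example, a byte-swapped 32-bit load could change as follows
(hypothetical before/after):

-: mov (%rdx),%eax
-: bswap %eax
+: movbe (%rdx),%eax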
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Reduced the number of conditionals using "movop".]
Signed-off-by: Richard Henderson <rth@twiddle.net>
Add support for three-byte opcodes, starting with the 0x0f 0x38 prefix.
Use P_EXT38 as the new constant, and shift all other constants so that
P_EXT and P_EXT38 have neighbouring values.
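The prefix constants then look roughly like this (values illustrative):

    #define P_EXT     0x100   /* 0x0f opcode prefix */
    #define P_EXT38   0x200   /* 0x0f 0x38 opcode prefix */
    #define P_DATA16  0x400   /* 0x66 opcode prefix */
    /* ... remaining P_* constants shifted up accordingly ... */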
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Changed the name from P_EXT2 to P_EXT38.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
P_REXW is defined as a constant at the beginning of i386/tcg-target.c,
but the corresponding bit is later used in a hardcoded way, which defeats
the purpose of the constant.
Fix that by using a conditional expression operator instead of a shift.
On x86 this actually makes the code slightly smaller, as in practice GCC
emits (opc >> 8) & 8 instead of (opc & 0x800) >> 8, so the constants to
load are smaller.
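Roughly the following, assuming P_REXW == 0x800 (values illustrative):

    /* before: hardcodes that P_REXW is bit 11 */
    rex |= (opc & P_REXW) >> 8;         /* REX.W */
    /* after: independent of P_REXW's actual value */
    rex |= (opc & P_REXW) ? 0x8 : 0;    /* REX.W */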
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The comments apply to 8-bit stores, not 8-byte stores.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size. This one is fairly easy
to work around, since eax is available as a scratch register.
When storing variable data, this tends to merely exchange one mov
for another. E.g.
-: mov %esi,%ecx
...
-: mov %cl,(%edx)
+: mov %esi,%eax
+: mov %al,(%edx)
Where we do have a regression is when storing constant data, in which
case we may load the constant into edi, when only ecx/ebx ought to be used.
The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, much less the wrong register. TBD.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.
Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.
Use the TCGReg enum in preference to int where appropriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
For 8 and 16-bit unsigned loads, rely on the zero-extension
from the helper and use a smaller 32-bit move insn.
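A sketch of the slow-path epilogue (size codes and names illustrative):

    /* the helper already zero-extended the result into EAX, so a
       plain 32-bit mov suffices even for the 8/16-bit cases */
    case 0: /* 8-bit unsigned load */
    case 1: /* 16-bit unsigned load */
    case 2: /* 32-bit load */
        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
        break;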
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The _cmmu helpers can be moved to exec-all.h. The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.
This requires minor include adjustments to all TCG backends.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Since we now perform it inside the helper, no need to do it here.
This also lets us perform a tail-call from the store slow path to
the helper.
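A sketch of the resulting slow-path tail (names as in the backend,
details illustrative):

    /* nothing remains to do after the helper returns, so jump to it
       instead of calling: its return goes straight back to the
       translated code */
    tcg_out_jmp(s, (tcg_target_long)qemu_st_helpers[s_bits]);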
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Discontinue the jump-around-jump-to-jump scheme, trading it for a single
immediate move instruction. The two extra jumps always consume 7 bytes,
whereas the immediate move is either 5 or 7 bytes depending on where the
code_gen_buffer gets located.
Signed-off-by: Richard Henderson <rth@twiddle.net>
No point in splitting the write into 32-bit pieces.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We can check the condition at compile time, rather than run time.
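E.g. with QEMU's build-time assertion macro (the condition here is
illustrative only):

    /* fails to compile, rather than abort at run time, if violated */
    QEMU_BUILD_BUG_ON(sizeof(tcg_target_long) != sizeof(void *));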
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
These will necessarily have the same layout on all hosts. This limits
the amount of boilerplate required to implement jit debug for a host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
We've got a compile-time check for the condition in exec/cpu-defs.h.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Existing compile-time detection is spotty at best. Convert
it all to runtime detection instead.
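Runtime detection boils down to a cpuid query at startup, e.g. (a
sketch using GCC's cpuid.h, which provides __get_cpuid and bit_CMOV):

    #include <stdbool.h>
    #include <cpuid.h>

    static bool have_cmov;

    static void query_cpu_features(void)
    {
        unsigned a, b, c, d;
        if (__get_cpuid(1, &a, &b, &c, &d)) {
            have_cmov = (d & bit_CMOV) != 0;
        }
    }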
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add optimized TCG qemu_ld/st generation which places the code for the
TLB miss cases at the end of the block, after the other IR ops have
been generated.
Currently, this optimization supports only i386 and x86_64 hosts.
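The generated code then ends up laid out roughly like this
(illustrative):

    /* fast path, inline */
        cmp   <tlb entry>, <addr>
        jne   slow_path_N        /* forward branch, rarely taken */
        <direct load/store>
    back_N:
        <next op>
        ...
    /* TLB miss cases, gathered at the end of the block */
    slow_path_N:
        call  <qemu_ld/st helper>
        jmp   back_N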
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When we allocate a reserved_va for the guest, the kernel will likely
choose an address well above 4G, at which point we must use a pair
of movabsq+addq to form the host address. If we have OS support,
set up a segment register to point to guest_base instead.
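On Linux this amounts to pointing %gs at guest_base, roughly (a
sketch; guest_base_flags/P_GS name the backend's 0x65 segment-override
machinery):

    #include <asm/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* reserve %gs for guest_base; guest accesses then need a single
       gs-prefixed instruction instead of movabsq+addq */
    if (guest_base && syscall(SYS_arch_prctl, ARCH_SET_GS, guest_base) == 0) {
        guest_base_flags = P_GS;
    }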
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On x86_64, remove the constraint on the third argument register which
is not needed:
- For loads the helper arguments are env, addr, mem_idx. The addr
value should not be in the first two argument registers, as they are
used in tcg_out_tlb_load().
- For stores the helper arguments are env, addr, data, mem_idx.
The addr and data values should not be in the first two argument
registers, as those are used in tcg_out_tlb_load(). The data value
could, however, be in the third argument register, in which case it
is already loaded at the right location (see the sketch below).
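A sketch of the relaxed 'L' constraint (register-set helpers as in the
backend; details illustrative):

    case 'L': /* qemu_ld/st data/address constraint */
        ct->ct |= TCG_CT_REG;
        tcg_regset_set32(ct->u.regs, 0, 0xffff);
        /* exclude only the registers used by tcg_out_tlb_load() */
        tcg_regset_reset_reg(ct->u.regs, tcg_target_call_iarg_regs[0]);
        tcg_regset_reset_reg(ct->u.regs, tcg_target_call_iarg_regs[1]);
        /* the third argument register is no longer excluded */
        break;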
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to
generate optimal code for the load/store functions.
First swap the two registers used in tcg_out_tlb_load() so that the
address ends up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
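In a backend the two out-of-band values typically fold away like this
(a sketch in the style of tcg/i386):

    if (cond == TCG_COND_ALWAYS) {
        tcg_out_jxx(s, JCC_JMP, label_index, small);  /* plain jump */
    } else if (cond != TCG_COND_NEVER) {
        tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
        tcg_out_jxx(s, tcg_cond_to_jcc[cond], label_index, small);
    }
    /* TCG_COND_NEVER: emit nothing at all */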
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The movcond_i32 op has to be protected with TCG_TARGET_HAS_movcond_i32
to fix the build with -march < i686.
Thanks to Richard Henderson for the hint.
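The guard looks like this (a sketch; argument order as in the backend):

    #if TCG_TARGET_HAS_movcond_i32
        case INDEX_op_movcond_i32:
            tcg_out_movcond32(s, args[5], args[0], args[1],
                              args[2], const_args[2], args[3]);
            break;
    #endif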
Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The TCG targets no longer need individual implementations.
Since commit 6a18ae2d29,
'flags' is no longer used in tcg_target_get_call_iarg_regs_count.
The remaining tcg_target_get_call_iarg_regs_count is trivial and only
called once, so this patch eliminates it completely.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
32 bit x86 hosts don't need registers for helper function arguments
because they use the default stack based calling convention.
Removing the registers allows simpler code for function
tcg_target_get_call_iarg_regs_count.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
While 64 bit hosts use the first three registers, which are also used
as function input parameters, 32 bit hosts use TCG_REG_EAX and
TCG_REG_EDX, which are not used in parameter passing.
After defining new register macros for the registers used in the L
constraint, the patch replaces most occurrences of
tcg_target_call_iarg_regs[0], tcg_target_call_iarg_regs[1] and
tcg_target_call_iarg_regs[2] by those new macros.
tcg_target_call_iarg_regs remains unchanged when it is used for input
arguments (only with 64 bit hosts) before tcg_out_calli.
A comment related to those registers was fixed, too.
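The macros end up along these lines (a sketch; the L0-L2 naming is
assumed):

    #if TCG_TARGET_REG_BITS == 64
    # define TCG_REG_L0 tcg_target_call_iarg_regs[0]
    # define TCG_REG_L1 tcg_target_call_iarg_regs[1]
    # define TCG_REG_L2 tcg_target_call_iarg_regs[2]
    #else
    # define TCG_REG_L0 TCG_REG_EAX
    # define TCG_REG_L1 TCG_REG_EDX
    #endif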
Signed-off-by: Stefan Weil <sw@weilnetz.de>
[aurel32: build fix on i386, small optimization for i386 in the prologue]
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
TCG uses 6 registers for function arguments on 64 bit Linux hosts,
but only 4 registers on W64 hosts.
Commit 2999a0b200 increased the number
of arguments for some important helper functions from 4 to 5,
which triggered a bug on W64 hosts: QEMU aborts when executing
helper_lcall_real in the guest's BIOS because the function
tcg_target_get_call_iarg_regs_count always returned 6.
As W64 has only 4 registers for arguments, the 5th argument must be
passed on the stack using a correct stack offset.
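A sketch of the fix (hypothetical names; on W64 the first stack slot
for arguments sits above the callee's 32-byte shadow space):

    if (arg_idx >= 4) {
        /* W64: no more argument registers, spill to the stack */
        tcg_out_st(s, TCG_TYPE_I64, arg_reg, TCG_REG_ESP,
                   0x20 + (arg_idx - 4) * 8);
    }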
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
On x86, it is possible to move a constant value to memory. Add code to
handle a constant argument to load/store ops.
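A sketch of the store-immediate case (opcode macro as in the backend;
base/ofs illustrative):

    if (const_args[0]) {
        /* movl $imm, ofs(%base): no scratch register needed */
        tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, base, ofs);
        tcg_out32(s, args[0]);
    }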
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets,
remove dead code and support for !CONFIG_TCG_PASS_AREG0 case.
Remove dyngen-exec.h and all references to it. Although
hw/spapr_hcall.c includes it, that file does not seem to use it.
Remove unused HELPER_CFLAGS.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
w64 uses the registers rcx, rdx, r8 and r9 for function arguments,
so it needs a different declaration of tcg_target_call_iarg_regs.
rax, rcx, rdx, r8, r9, r10 and r11 may be changed by function calls.
rbx, rbp, rdi, rsi, r12, r13, r14 and r15 remain unchanged by function calls.
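The declaration becomes conditional, roughly (a sketch matching the
description above):

    static const int tcg_target_call_iarg_regs[] = {
    #if defined(_WIN64)
        TCG_REG_RCX, TCG_REG_RDX,
    #else
        TCG_REG_RDI, TCG_REG_RSI, TCG_REG_RDX, TCG_REG_RCX,
    #endif
        TCG_REG_R8, TCG_REG_R9,
    };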
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>