Commit Graph

1297 Commits

Author SHA1 Message Date
Laurent Vivier
090d0bfd94 s390: fix softmmu compilation
guest_base must be used only in linux-user mode.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-id: 1440757421-9674-1-git-send-email-laurent@vivier.eu
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-08-28 16:05:24 +01:00
Laurent Vivier
b76f21a707 linux-user: remove useless macros GUEST_BASE and RESERVED_VA
As we have removed CONFIG_USE_GUEST_BASE, we always use a guest base
and the macros GUEST_BASE and RESERVED_VA become useless: replace
them by their values.

Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1440420834-8388-1-git-send-email-laurent@vivier.eu>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:14:30 -07:00
Laurent Vivier
4cbea59869 linux-user: remove --enable-guest-base/--disable-guest-base
All tcg host architectures now support the guest base and as
there is no real performance lost, it can be always enabled.

Anyway, guest base use can be disabled lively by setting guest
base to 0.

CONFIG_USE_GUEST_BASE is defined as (USE_GUEST_BASE && USER_ONLY),
it should have to be replaced by CONFIG_USER_ONLY in non CONFIG_USER_ONLY
parts, but as some other parts are using !CONFIG_SOFTMMU I have chosen to
use !CONFIG_SOFTMMU instead.

Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <1440373328-9788-2-git-send-email-laurent@vivier.eu>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:14:17 -07:00
Richard Henderson
9ee14902bf tcg/aarch64: Use softmmu fast path for unaligned accesses
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Richard Henderson
a5e39810b9 tcg/s390: Use softmmu fast path for unaligned accesses
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Benjamin Herrenschmidt
68d45bb61c tcg/ppc: Improve unaligned load/store handling on 64-bit backend
Currently, we get to the slow path for any unaligned access in the
backend, because we effectively preserve the bottom address bits
below the alignment requirement when comparing with the TLB entry,
so any non-0 bit there will cause the compare to fail.

For the same number of instructions, we can instead add the access
size - 1 to the address and stick to clearing all the bottom bits.

That means that normal unaligned accesses will not fallback (the HW
will handle them fine). Only when crossing a page boundary well we
end up having a mismatch because we'll end up pointing to the next
page which cannot possibly be in that same TLB entry.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Message-Id: <1437455978.5809.2.camel@kernel.crashing.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
8cc580f6a0 tcg/i386: use softmmu fast path for unaligned accesses
Softmmu unaligned load/stores currently goes through through the slow
path for two reasons:
  - to support unaligned access on host with strict alignement
  - to correctly handle accesses crossing pages

x86 is only concerned by the second reason. Unaligned accesses are
avoided by compilers, but are not uncommon. We therefore would like
to see them going through the fast path, if they don't cross pages.

For that we can use the fact that two adjacent TLB entries can't contain
the same page. Therefore accessing the TLB entry corresponding to the
first byte, but comparing its content to page address of the last byte
ensures that we don't cross pages. We can do this check without adding
more instructions in the TLB code (but increasing its length by one
byte) by using the LEA instruction to combine the existing move with the
size addition.

On an x86-64 host, this gives a 3% boot time improvement for a powerpc
guest and 4% for an x86-64 guest.

[rth: Tidied calculation of the offset mask]

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436467197-2183-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Richard Henderson
ecc7b3aa71 tcg: Remove tcg_gen_trunc_i64_i32
Replacing it with tcg_gen_extrl_i64_i32.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Richard Henderson
609ad70562 tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts.  This is all that was being used anyway.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
870ad1547a tcg: update README about size changing ops
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
8bcb5c8f34 tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
4f2331e5b6 tcg: implement real ext_i32_i64 and extu_i32_i64 ops
Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always converted to a 64-bit value and not propagated
through the register allocator or the optimizer.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Stefan Weil <sw@weilnetz.de>
Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
6acd2558fd tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
0632e555fc tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
299f801304 tcg/optimize: allow constant to have copies
Now that copies and constants are tracked separately, we can allow
constant to have copies, deferring the choice to use a register or a
constant to the register allocation pass. This prevent this kind of
regular constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
b41059dd9d tcg/optimize: track const/copy status separately
Instead of using an enum which could be either a copy or a const, track
them separately. This will be used in the next patch.

Constants are tracked through a bool. Copies are tracked by initializing
temp's next_copy and prev_copy to itself, allowing to simplify the code
a bit.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
d9c769c609 tcg/optimize: add temp_is_const and temp_is_copy functions
Add two accessor functions temp_is_const and temp_is_copy, to make the
code more readable and make code change easier.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
1208d7dd5f tcg/optimize: optimize temps tracking
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have initialize more than 2kB at least twice
per TB, often more when there is a few goto_tb.

Instead used a TCGTempSet bit array to track which temps are in used in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves out tcg_optimize from the the top of the profiler list.

[rth: Handle TCG_CALL_DUMMY_ARG]

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:30 -07:00
Aurelien Jarno
29f3ff8d6c tcg/optimize: fix constant signedness
By convention, on a 64-bit host TCG internally stores 32-bit constants
as sign-extended. This is not the case in the optimizer when a 32-bit
constant is folded.

This doesn't seem to have more consequences than suboptimal code
generation. For instance the x86 backend assumes sign-extended constants,
and in some rare cases uses a 32-bit unsigned immediate 0xffffffff
instead of a 8-bit signed immediate 0xff for the constant -1. This is
with a ppc guest:

before
------

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7fd8c7dfe90c:  xor    %ebp,%ebp
0x7fd8c7dfe90e:  mov    %ebp,%r11d
0x7fd8c7dfe911:  mov    0x18(%r14),%r9d
0x7fd8c7dfe915:  add    %r9d,%r10d
0x7fd8c7dfe918:  adc    %ebp,%r11d
0x7fd8c7dfe91b:  add    $0xffffffff,%r10d
0x7fd8c7dfe922:  adc    %ebp,%r11d
0x7fd8c7dfe925:  mov    %r11d,0x134(%r14)
0x7fd8c7dfe92c:  mov    %r10d,0x28(%r14)

after
-----

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffffffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7f37010d490c:  xor    %ebp,%ebp
0x7f37010d490e:  mov    %ebp,%r11d
0x7f37010d4911:  mov    0x18(%r14),%r9d
0x7f37010d4915:  add    %r9d,%r10d
0x7f37010d4918:  adc    %ebp,%r11d
0x7f37010d491b:  add    $0xffffffffffffffff,%r10d
0x7f37010d491f:  adc    %ebp,%r11d
0x7f37010d4922:  mov    %r11d,0x134(%r14)
0x7f37010d4929:  mov    %r10d,0x28(%r14)

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:08 -07:00
Aurelien Jarno
c99d69694a tcg/mips: fix add2
The add2 code in the tcg_out_addsub2 function doesn't take into account
the case where rl == al == bl. In that case we can't compute the carry
after the addition. As it corresponds to a multiplication by 2, the
carry bit is the bit 31.

While this is a corner case, this prevents x86-64 guests to boot on a
MIPS host.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01 09:39:50 +02:00
Aurelien Jarno
3c8691f568 tcg/s390x: Mask TCGMemOp appropriately for indexing
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition,
but two cases were forgotten in the TCG S390 backend.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01 09:39:37 +02:00
Aurelien Jarno
4214a8cb7c tcg/mips: Mask TCGMemOp appropriately for indexing
Commit 2b7ec66f fixed TCGMemOp masking following the MO_AMASK addition,
but two cases were forgotten in the TCG MIPS backend.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01 09:39:33 +02:00
Aurelien Jarno
e72c4fb81d tcg/mips: fix TLB loading for BE host with 32-bit guests
For 32-bit guest, we load a 32-bit address from the TLB, so there is no
need to compensate for the low or high part. This fixes 32-bit guests on
big-endian hosts.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2015-08-01 09:38:36 +02:00
Aurelien Jarno
bbeb82395e tcg: mark temps as mem_coherent = 0 for mov with a constant
When a constant has to be loaded in a mov op, we fail to set
mem_coherent = 0. This patch fixes that.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-27 07:25:40 -07:00
Aurelien Jarno
7df69deadf tcg: correctly mark dead inputs for mov with a constant
When tcg_reg_alloc_mov propagate a constant, we failed to correctly mark
a temp as dead if the liveness analysis hints so. This fixes the
following assert when configure with --enable-debug-tcg:

  qemu-x86_64: tcg/tcg.c:1827: tcg_reg_alloc_bb_end: Assertion `ts->val_type == TEMP_VAL_DEAD' failed.

Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1437994568-7825-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-27 07:25:40 -07:00
Aurelien Jarno
961521261a tcg/optimize: fix tcg_opt_gen_movi
Due to a copy&paste, the new op value is tested against mov_i32 instead
of movi_i32. The test is therefore always false. Fix that.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 20:37:12 -07:00
Richard Henderson
80adb8fcad tcg/aarch64: use 32-bit offset for 32-bit softmmu emulation
Similar to the same fix for user-mode, except this instance
occurs on the softmmu path.  Again, the tlb addend must be
the base register, while the guest address is the index.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 20:19:44 -07:00
Paolo Bonzini
ffc6372851 tcg/aarch64: use 32-bit offset for 32-bit user-mode emulation
Thanks to the previous patch, it is now easy for tcg_out_qemu_ld and
tcg_out_qemu_st to use a 32-bit zero extended offset.  However, the
guest base register x28 must be the base and addr_reg must be the
index.

Reported-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 15:09:12 -07:00
Paolo Bonzini
6c0f0c0f12 tcg/aarch64: add ext argument to tcg_out_insn_3310
The new argument lets you pick uxtw or uxtx mode for the offset
register.  For now, all callers pass TCG_TYPE_I64 so that uxtx
is generated.  The bits for uxtx are removed from I3312_TO_I3310.

Reported-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1436974021-28978-2-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 15:09:04 -07:00
Richard Henderson
ee8ba9e4d8 tcg/i386: Extend addresses for 32-bit guests
Removing the ??? comment explaining why it (mostly) worked.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1437081950-7206-2-git-send-email-rth@twiddle.net>
2015-07-23 15:09:04 -07:00
Stefan Weil
6e3c0c6edb tci: Fix regression with INDEX_op_qemu_st_i32, INDEX_op_qemu_st_i64
Commit 59227d5d45 did not update the
code in tcg/tci/tcg-target.c for those two cases.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1436556159-3002-1-git-send-email-sw@weilnetz.de
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-13 10:07:38 +01:00
James Hogan
a8f13961fd tcg/mips: Fix build error from merged memop+mmu_idx parameter
Commit 3972ef6f83 ("tcg: Push merged memop+mmu_idx parameter to
softmmu routines") caused the following build errors when building TCG
for MIPS:

In file included from tcg/tcg.c:258:0:
tcg/mips/tcg-target.c In function ‘tcg_out_qemu_ld_slow_path’:
tcg/mips/tcg-target.c:1015:22: error: ‘lb’ undeclared (first use in this function)
tcg/mips/tcg-target.c In function ‘tcg_out_qemu_st_slow_path’:
tcg/mips/tcg-target.c:1058:22: error: ‘lb’ undeclared (first use in this function)

It looks like lb was meant to refer to the TCGLabelQemuLdst *l
parameter, so fix both references to lb to refer to just l.

Fixes: 3972ef6f83 ("tcg: Push merged memop+mmu_idx parameter to softmmu routines")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Message-id: 1436433435-24898-2-git-send-email-james.hogan@imgtec.com
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-09 13:51:27 +01:00
Aurelien Jarno
cd3b29b745 tcg/s390: fix branch target change during code retranslation
Make sure to not modify the branch target. This ensure that the
branch target is not corrupted during partial retranslation.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 17:51:47 +02:00
Peter Crosthwaite
6e0b07306d cpu-defs: Move CPU_TEMP_BUF_NLONGS to tcg
The usages of this define are pure TCG and there is no architecture
specific variation of the value. Localise it to the TCG engine to
remove another architecture agnostic piece from cpu-defs.h.

This follows on from a28177820a where
temp_buf was moved out of the CPU_COMMON obsoleting the need for
the super early definition.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <498e8e5325c1a1aff79e5bcfc28cb760ef6b214e.1433052532.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26 16:00:50 +02:00
Aurelien Jarno
36e60ef6ac tcg/optimize: rename tcg_constant_folding
The tcg_constant_folding folding ends up doing all the optimizations
(which is a good thing to avoid looping on all ops multiple time), so
make it clear and just rename it tcg_optimize.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
97a79eb70d tcg/optimize: fold constant test in tcg_opt_gen_mov
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if
the source temp is a constant. Fold that into the tcg_opt_gen_mov
function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
5365718a9a tcg/optimize: fold temp copies test in tcg_opt_gen_mov
Each call to tcg_opt_gen_mov is preceeded by a test to check if the
source and destination temps are copies. Fold that into the
tcg_opt_gen_mov function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
8d6a91602e tcg/optimize: remove opc argument from tcg_opt_gen_mov
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
ebd27391b0 tcg/optimize: remove opc argument from tcg_opt_gen_movi
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
c19f47bf5e tcg: fix dead computation for repeated input arguments
When the same temp is used twice or more as an input argument to a TCG
instruction, the dead computation code doesn't recognize the second use
as a dead temp. This is because the temp is marked as live in the same
loop where dead inputs are checked.

The fix is to split the loop in two parts. This avoid emitting a move
and using a register for the movcond instruction when used as "move if
true" on x86-64. This might bring more improvements on RISC TCG targets
which don't have outputs aliased to inputs.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 06:42:27 -07:00
Aurelien Jarno
7e1df267a7 tcg: fix register allocation with two aliased dead inputs
For TCG ops with two outputs registers (add2, sub2, div2, div2u), when
the same input temp is used for the two inputs aliased to the two
outputs, and when these inputs are both dead, the register allocation
code wrongly assigned the same register to the same output.

This happens for example with sub2 t1, t2, t3, t3, t4, t5, when t3 is
not used anymore after the TCG op.  In that case the same register is
used for t1, t2 and t3.

The fix is to look for already allocated aliased input when allocating
a dead aliased input and check that the register is not already
used.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447228-29425-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 06:42:27 -07:00
Richard Henderson
59c4b7e8df tcg: Handle MO_AMASK in tcg_dump_ops
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Tested-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 06:35:53 -07:00
Richard Henderson
2b7ec66f02 tcg: Mask TCGMemOp appropriately for indexing
The addition of MO_AMASK means that places that used inverted masks
need to be changed to use positive masks, and places that failed to
mask the intended bits need updating.

Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Tested-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 06:35:29 -07:00
Paolo Bonzini
006f8638c6 tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS
This will be used to size the TLB when more than 8 MMU modes are
used by the target.  Limitations come from the limited size of
the immediate fields (which sometimes, as in the case of Aarch64,
extend to instructions that shift the immediate).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424436345-37924-2-git-send-email-pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:56 +02:00
Paolo Bonzini
5a58e884d1 tci: do not use CPUArchState in tcg-target.h
tcg-target.h does not use any QEMU-specific symbols, save for tci's usage
of CPUArchState.  Pull that up to tcg/tcg.h.

This will make it possible to include tcg-target.h in cpu-defs.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 23:56:55 +02:00
Richard Henderson
dfb3630562 tcg: Add MO_ALIGN, MO_UNALN
These modifiers control, on a per-memory-op basis, whether
unaligned memory accesses are allowed.  The default setting
reflects the target's definition of ALIGNED_ONLY.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14 12:15:18 -07:00
Richard Henderson
3972ef6f83 tcg: Push merged memop+mmu_idx parameter to softmmu routines
The extra information is not yet used but it is now available.
This requires minor changes through all of the tcg backends.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14 12:15:14 -07:00
Richard Henderson
59227d5d45 tcg: Merge memop and mmu_idx parameters to qemu_ld/st
At the tcg opcode level, not at the tcg-op.h generator level.
This requires minor changes through all of the tcg backends,
but none of the cpu translators.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14 12:14:55 -07:00
Emilio G. Cota
00c8fa9ffe tcg: optimise memory layout of TCGTemp
This brings down the size of the struct from 56 to 32 bytes on 64-bit,
and to 20 bytes on 32-bit. This leads to memory savings:

Before:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
  37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
  39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
  40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
  39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o

After:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  40883   29800      88   70771   11473 ./aarch64-softmmu/tcg/tcg.o
  37473   29416      96   66985   105a9 ./x86_64-linux-user/tcg/tcg.o
  38858   28816      96   67770   108ba ./arm-linux-user/tcg/tcg.o
  40554   29096      88   69738   1106a ./arm-softmmu/tcg/tcg.o
  39169   29672      88   68929   10d41 ./x86_64-softmmu/tcg/tcg.o

Note that using an entire byte for some enums that need less than
that wastes a few bits (noticeable in 32 bits, where we use
20 bytes instead of 16) but avoids extraction code, which overall
is a win--I've tested several variations of the patch, and the appended
is the best performer for OpenSSL's bntest by a very small margin:

Before:
$ taskset -c 0 perf stat -r 15 -- x86_64-linux-user/qemu-x86_64 img/bntest-x86_64 >/dev/null
[...]
 Performance counter stats for 'x86_64-linux-user/qemu-x86_64 img/bntest-x86_64' (15 runs):

      10538.479833 task-clock (msec)  # 0.999 CPUs utilized  ( +-  0.38% )
               772 context-switches   # 0.073 K/sec          ( +-  2.03% )
                 0 cpu-migrations     # 0.000 K/sec          ( +-100.00% )
             2,207 page-faults        # 0.209 K/sec          ( +-  0.08% )
      10.552871687 seconds time elapsed                      ( +-  0.39% )

After:
$ taskset -c 0 perf stat -r 15 -- x86_64-linux-user/qemu-x86_64 img/bntest-x86_64 >/dev/null
 Performance counter stats for 'x86_64-linux-user/qemu-x86_64 img/bntest-x86_64' (15 runs):

      10459.968847 task-clock (msec)  # 0.999 CPUs utilized  ( +-  0.30% )
               739 context-switches   # 0.071 K/sec          ( +-  1.71% )
                 0 cpu-migrations     # 0.000 K/sec          ( +- 68.14% )
             2,204 page-faults        # 0.211 K/sec          ( +-  0.10% )
      10.473900411 seconds time elapsed                      ( +-  0.30% )

Suggested-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-05 08:44:46 -07:00
Peter Crosthwaite
fee068e4f1 tcg: Delete unused cpu_pc_from_tb()
No code uses the cpu_pc_from_tb() function. Delete from tricore and
arm which each provide an unused implementation. Update the comment
in tcg.h to reflect that this is obsoleted by synchronize_from_tb.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-04-30 16:06:18 +03:00