Commit Graph

226 Commits

Author SHA1 Message Date
Richard Henderson
c56caea3b2 tcg: Revert "tcg/optimize: Flush data at labels not TCG_OPF_BB_END"
This reverts commit cd0372c515.

The patch is incorrect in that it retains copies between globals and
non-local temps, and non-local temps still die at the end of the BB.

Failing test case for hppa:

	.globl	_start
_start:
	cmpiclr,=	0x24,%r19,%r0
	cmpiclr,<>	0x2f,%r19,%r19

 ---- 00010057 0001005b
 movi_i32 tmp0,$0x24
 sub_i32 tmp1,tmp0,r19
 mov_i32 tmp2,tmp0
 mov_i32 tmp3,r19
 movi_i32 tmp1,$0x0

 ---- 0001005b 0001005f
 brcond_i32 tmp2,tmp3,eq,$L1
 movi_i32 tmp0,$0x2f
 sub_i32 tmp1,tmp0,r19
 mov_i32 tmp2,tmp0
 mov_i32 tmp3,r19
 movi_i32 tmp1,$0x0
 mov_i32 r19,tmp1
 setcond_i32 psw_n,tmp2,tmp3,ne
 set_label $L1

In this case, both copies of "mov_i32 tmp3,r19" are removed.  The
second because opt thought it was redundant.  The first is removed
later by liveness because tmp3 is known to be dead.  This leaves
the setcond_i32 with an uninitialized input.

Revert the entire patch for 5.2, and a proper optimization across
the branch may be considered for the next development cycle.

Reported-by: qemu@igor2.repo.hu
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-11-04 10:35:40 -08:00
Richard Henderson
cd0372c515 tcg/optimize: Flush data at labels not TCG_OPF_BB_END
We can easily propagate temp values through the entire extended
basic block (in this case, the set of blocks connected by fallthru),
simply by not discarding the register state at the branch.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-27 09:48:07 -07:00
Richard Henderson
1dc4fe7012 tcg/optimize: Fold dup2_vec
When the two arguments are identical, this can be reduced to
dup_vec or to mov_vec from a tcg_constant_vec.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Philippe Mathieu-Daudé
dcb32f1d8f tcg: Search includes from the project root source directory
We currently search both the root and the tcg/ directories for tcg
files:

  $ git grep '#include "tcg/' | wc -l
  28

  $ git grep '#include "tcg[^/]' | wc -l
  94

To simplify the preprocessor search path, unify by expliciting the
tcg/ directory.

Patch created mechanically by running:

  $ for x in \
      tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
      tcg-op-gvec.h tcg-gvec-desc.h; do \
    sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
      $(git grep -l "#include \"$x\""); \
    done

Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-01-15 15:13:10 -10:00
Tony Nguyen
14776ab5a1 tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target dependant attributes are conditionalized upon NEED_CPU_H.

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:38 -07:00
Markus Armbruster
6a0acfff99 Clean up inclusion of exec/cpu-common.h
migration/qemu-file.h neglects to include it even though it needs
ram_addr_t.  Fix that.  Drop a few superfluous inclusions elsewhere.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-14-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Richard Henderson
80f4d7c3ae tcg: Fix constant folding of INDEX_op_extract2_i32
On a 64-bit host, discard any replications of the 32-bit
sign bit when performing the shift and merge.

Fixes: https://bugs.launchpad.net/bugs/1834496
Tested-by: Christophe Lyon <christophe.lyon@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14 12:19:00 +02:00
Markus Armbruster
a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Richard Henderson
ac383dde33 tcg: Do not recreate INDEX_op_neg_vec unless supported
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
Richard Henderson
fce1296f13 tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Emilio G. Cota
ac1043f6d6 tcg: Drop nargs from tcg_op_insert_{before,after}
It's unused since 75e8b9b7aa.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181209193749.12277-9-cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
6498594c8e tcg/optimize: Optimize bswap
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
1fb57da72a tcg/optimize: Do not skip default processing of dup_vec
If we do not opimize away dup_vec, we must mark its output as changed.

Fixes: 170ba88f45
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20180805233258.31892-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-06 14:57:48 +01:00
Richard Henderson
170ba88f45 tcg/optimize: Handle vector opcodes during optimize
Trivial move and constant propagation.  Some identity and constant
function folding, but nothing that requires knowledge of the size
of the vector element.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
cd9090aa9d tcg: Generalize TCGOp parameters
We had two fields specific to INDEX_op_call.  Rename these and
add some macros so that the fields may be reused for other opcodes.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Richard Henderson
15fa08f845 tcg: Dynamically allocate TCGOps
With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.

Use QTAILQ to link the ops together.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Emilio G. Cota
34184b0718 tcg: allocate optimizer temps with tcg_malloc
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                                  0.
 tcg_malloc   20.441130078                           1.1270214
 TCGContext   20.477846517                           1.3086662
 g_malloc     20.780527895                           2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Richard Henderson
6349039d0b tcg: Use per-temp state data in optimize
While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:45:07 +02:00
Richard Henderson
fa477d2547 tcg: Add temp_global bit to TCGTemp
This avoids needing to test the index of a temp against nb_globals.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:50 +02:00
Richard Henderson
434391390b tcg: Introduce arg_temp
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:36 +02:00
Richard Henderson
acd937019b tcg: Propagate args to op->args in optimizer
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
75e8b9b7aa tcg: Merge opcode arguments into TCGOp
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
a768e4e992 tcg: Add opcode for ctpop
The number of actual invocations of ctpop itself does not warrent
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:48:56 -08:00
Richard Henderson
0e28d0063b tcg: Add clz and ctz opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
333b21b809 tcg/optimize: Fold movcond 0/1 into setcond
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:10 -08:00
Richard Henderson
7ec8bab3de tcg: Add field extraction primitives
Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 07:59:11 -08:00
Alex Bennée
550276ae0a tcg/optimize: move default return out of if statement
This is to appease sanitizer builds which complain that:

  "error: control reaches end of non-void function"

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20160930213106.20186-5-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-04 10:00:25 +02:00
Pranith Kumar
34f939218c tcg: Optimize fence instructions
This commit optimizes fence instructions.  Two optimizations are
currently implemented: (1) unnecessary duplicate fence instructions,
and (2) merging weaker fences into a stronger fence.

[rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only
loop over the opcode stream once.  Merge "unrelated" weaker barriers
into one stronger barrier.]

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-09-16 08:12:12 -07:00
Richard Henderson
5a18407f55 tcg: Lower indirect registers in a separate pass
Rather than rely on recursion during the middle of register allocation,
lower indirect registers to loads and stores off the indirect base into
plain temps.

For an x86_64 host, with sufficient registers, this results in identical
code, modulo the actual register assignments.

For an i686 host, with insufficient registers, this means that temps can
be (temporarily) spilled to the stack in order to satisfy an allocation.
This as opposed to the possibility of not being able to spill, to allocate
a register for the indirect base, in order to perform a spill.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05 21:44:40 +05:30
Richard Henderson
dcb8e75870 tcg: Reorg TCGOp chaining
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05 21:44:18 +05:30
Paolo Bonzini
00f6da6a1a exec: extract exec/tb-context.h
TCG backends do not need most of exec-all.h; extract what they actually
need to a separate file or move it directly to tcg.h.  The next patch
will stop including exec-all.h from everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Aurelien Jarno
eabb7b91b3 tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 15:41:47 +01:00
Peter Maydell
757e725b58 tcg: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Richard Henderson
609ad70562 tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts.  This is all that was being used anyway.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
8bcb5c8f34 tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
0632e555fc tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
299f801304 tcg/optimize: allow constant to have copies
Now that copies and constants are tracked separately, we can allow
constant to have copies, deferring the choice to use a register or a
constant to the register allocation pass. This prevent this kind of
regular constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
b41059dd9d tcg/optimize: track const/copy status separately
Instead of using an enum which could be either a copy or a const, track
them separately. This will be used in the next patch.

Constants are tracked through a bool. Copies are tracked by initializing
temp's next_copy and prev_copy to itself, allowing to simplify the code
a bit.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
d9c769c609 tcg/optimize: add temp_is_const and temp_is_copy functions
Add two accessor functions temp_is_const and temp_is_copy, to make the
code more readable and make code change easier.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
1208d7dd5f tcg/optimize: optimize temps tracking
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have initialize more than 2kB at least twice
per TB, often more when there is a few goto_tb.

Instead used a TCGTempSet bit array to track which temps are in used in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves out tcg_optimize from the the top of the profiler list.

[rth: Handle TCG_CALL_DUMMY_ARG]

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:30 -07:00
Aurelien Jarno
29f3ff8d6c tcg/optimize: fix constant signedness
By convention, on a 64-bit host TCG internally stores 32-bit constants
as sign-extended. This is not the case in the optimizer when a 32-bit
constant is folded.

This doesn't seem to have more consequences than suboptimal code
generation. For instance the x86 backend assumes sign-extended constants,
and in some rare cases uses a 32-bit unsigned immediate 0xffffffff
instead of a 8-bit signed immediate 0xff for the constant -1. This is
with a ppc guest:

before
------

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7fd8c7dfe90c:  xor    %ebp,%ebp
0x7fd8c7dfe90e:  mov    %ebp,%r11d
0x7fd8c7dfe911:  mov    0x18(%r14),%r9d
0x7fd8c7dfe915:  add    %r9d,%r10d
0x7fd8c7dfe918:  adc    %ebp,%r11d
0x7fd8c7dfe91b:  add    $0xffffffff,%r10d
0x7fd8c7dfe922:  adc    %ebp,%r11d
0x7fd8c7dfe925:  mov    %r11d,0x134(%r14)
0x7fd8c7dfe92c:  mov    %r10d,0x28(%r14)

after
-----

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffffffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7f37010d490c:  xor    %ebp,%ebp
0x7f37010d490e:  mov    %ebp,%r11d
0x7f37010d4911:  mov    0x18(%r14),%r9d
0x7f37010d4915:  add    %r9d,%r10d
0x7f37010d4918:  adc    %ebp,%r11d
0x7f37010d491b:  add    $0xffffffffffffffff,%r10d
0x7f37010d491f:  adc    %ebp,%r11d
0x7f37010d4922:  mov    %r11d,0x134(%r14)
0x7f37010d4929:  mov    %r10d,0x28(%r14)

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:08 -07:00
Aurelien Jarno
961521261a tcg/optimize: fix tcg_opt_gen_movi
Due to a copy&paste, the new op value is tested against mov_i32 instead
of movi_i32. The test is therefore always false. Fix that.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 20:37:12 -07:00
Aurelien Jarno
36e60ef6ac tcg/optimize: rename tcg_constant_folding
The tcg_constant_folding folding ends up doing all the optimizations
(which is a good thing to avoid looping on all ops multiple time), so
make it clear and just rename it tcg_optimize.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
97a79eb70d tcg/optimize: fold constant test in tcg_opt_gen_mov
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if
the source temp is a constant. Fold that into the tcg_opt_gen_mov
function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
5365718a9a tcg/optimize: fold temp copies test in tcg_opt_gen_mov
Each call to tcg_opt_gen_mov is preceeded by a test to check if the
source and destination temps are copies. Fold that into the
tcg_opt_gen_mov function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
8d6a91602e tcg/optimize: remove opc argument from tcg_opt_gen_mov
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
ebd27391b0 tcg/optimize: remove opc argument from tcg_opt_gen_movi
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Richard Henderson
59227d5d45 tcg: Merge memop and mmu_idx parameters to qemu_ld/st
At the tcg opcode level, not at the tcg-op.h generator level.
This requires minor changes through all of the tcg backends,
but none of the cpu translators.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14 12:14:55 -07:00
Richard Henderson
2374c4b837 tcg/optimize: Handle or r,a,a with constant a
As seen with ubuntu-5.10-live-powerpc.iso.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-03-16 08:46:13 -07:00
Richard Henderson
a4ce099a7a tcg: Implement insert_op_before
Rather reserving space in the op stream for optimization,
let the optimizer add ops as necessary.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
0c627cdca2 tcg: Remove opcodes instead of noping them out
With the linked list scheme we need not leave nops in the stream
that we need to process later.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
c45cb8bb89 tcg: Put opcodes in a linked list
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
bc8d688ff3 tcg/optimize: Don't special case TCG_OPF_CALL_CLOBBER
With the "old" ldst ops we didn't know the real width of the
result of the load, but with the "new" ldst ops we do.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-18 11:39:02 -07:00
Richard Henderson
3d1b2ff62c tcg: Remove TCG_TARGET_HAS_new_ldst
Since all backends have been converted, remove the compatibility code.

Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-04 14:10:26 -07:00
Richard Henderson
24666baf1f tcg/optimize: Remember garbage high bits for 32-bit ops
For a 64-bit host, the high bits of a register after a 32-bit operation
are undefined.  Adjust the temps mask for all 32-bit ops to reflect that.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:56 -07:00
Richard Henderson
a62f6f5600 tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov*
No functional change, just reduce a bit of redundancy.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:56 -07:00
Richard Henderson
a763551ad5 tcg: Optimize brcond2 and setcond2 ne/eq
If either the high or low pair can be resolved, we can
simplify to either a constant or to a 32-bit comparison.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:53 -07:00
Richard Henderson
cf06667428 tcg: Make call address a constant parameter
Avoid allocating a tcg temporary to hold the constant address,
and instead place it directly into the op_call arguments.

At the same time, convert to the newly introduced tcg_out_call
backend function, rather than invoking tcg_out_op for the call.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-12 11:13:12 -07:00
Richard Henderson
4bb7a41ed6 tcg: Add INDEX_op_trunc_shr_i32
Let the backend do something special for truncation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-28 11:06:34 -07:00
Richard Henderson
d998e555d2 tcg: Fix out of range shift in deposit optimizations
By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64)
and everything else falls apart.  But we can reuse the existing deposit
logic to get this right.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18 16:57:36 -07:00
Richard Henderson
50c5c4d125 tcg: Mask shift quantities while folding
The TCG result would be undefined, but we can at least produce one
plausible result and avoid triggering the wrath of analysis tools.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18 16:57:36 -07:00
Richard Henderson
464a1441c1 tcg/optimize: Add more identity simplifications
Recognize 0 operand to andc, and -1 operands to and, orc, eqv.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
e64e958e20 tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0
Like we already do for SUB and XOR.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
e201b56418 tcg/optimize: Simply some logical ops to NOT
Given, of course, an appropriate constant.  These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
23ec69ed37 tcg/optimize: Handle known-zeros masks for ANDC
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Aurelien Jarno
c8d7027253 tcg/optimize: add known-zero bits compute for load ops
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
f096dc9618 tcg/optimize: improve known-zero bits for 32-bit ops
The shl_i32 op might set some bits of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store which operate on tl values.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
3031244b01 tcg/optimize: fix known-zero bits optimization
Known-zero bits optimization is a great idea that helps to generate more
optimized code. However the current implementation only works in very few
cases as the computed mask is not saved.

Fix this to make it really working.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
e46b225a31 tcg/optimize: fix known-zero bits for right shift ops
32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.

Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Stefan Weil
3df2b8fde9 misc: Use new rotate functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-09-25 21:23:05 +02:00
Richard Henderson
01547f7f92 tcg: Constant fold div, rem
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson
03271524b6 tcg: Add muluh and mulsh opcodes
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Aurelien Jarno
66e61b55f1 tcg/optimize: fix setcond2 optimization
When setcond2 is rewritten into setcond, the state of the destination
temp should be reset, so that a copy of the previous value is not
used instead of the result.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-09 16:14:58 +02:00
Richard Henderson
2d497542e1 tcg-optimize: Fold sub r,0,x to neg r,x
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-23 14:31:03 +00:00
Richard Henderson
4d3203fd0b tcg: Add signed multiword multiplication operations
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
d7156f7ce4 tcg: Add 64-bit multiword arithmetic operations
Matching the 32-bit multiword arithmetic that we already have.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Paolo Bonzini
633f650254 optimize: optimize using nonzero bits
This adds two optimizations using the non-zero bit mask.  In some cases
involving shifts or ANDs the value can become zero, and can thus be
optimized to a move of zero.  Second, useless zero-extension or an
AND with constant can be detected that would only zero bits that are
already zero.

The main advantage of this optimization is that it turns zero-extensions
into moves, thus enabling much better copy propagation (around 1% code
reduction).  Here is for example a "test $0xff0000,%ecx + je" before
optimization:

 mov_i64 tmp0,rcx
 movi_i64 tmp1,$0xff0000
 discard cc_src
 and_i64 cc_dst,tmp0,tmp1
 movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0

and after (without patch on the left, with on the right):

 movi_i64 tmp1,$0xff0000                 movi_i64 tmp1,$0xff0000
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rcx,tmp1                 and_i64 cc_dst,rcx,tmp1
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit
(after optimization, without patch on the left, with on the right):

 discard cc_src                          discard cc_src
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,ne,$0x0           brcond_i64 rax,tmp12,ne,$0x0

"test $0x1, %dl + je":

 movi_i64 tmp1,$0x1                      movi_i64 tmp1,$0x1
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rdx,tmp1                 and_i64 cc_dst,rdx,tmp1
 movi_i32 cc_op,$0x1a                    movi_i32 cc_op,$0x1a
 ext8u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

In some cases TCG even outsmarts GCC. :)  Here the input code has
"and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer,
thanks to copy propagation, does the following:

 movi_i64 tmp12,$0x2                     movi_i64 tmp12,$0x2
 and_i64 rax,rax,tmp12                   and_i64 rax,rax,tmp12
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 ext32s_i64 tmp0,rax                  -> nop
 mov_i64 rbx,tmp0                     -> mov_i64 rbx,cc_dst
 and_i64 cc_dst,rbx,rbx               -> nop

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:16 +00:00
Paolo Bonzini
3a9d8b179b optimize: track nonzero bits of registers
Add a "mask" field to the tcg_temp_info struct.  A bit that is zero
in "mask" will always be zero in the corresponding temporary.
Zero bits in the mask can be produced from moves of immediates,
zero-extensions, ANDs with constants, shifts; they can then be
be propagated by logical operations, shifts, sign-extensions,
negations, deposit operations, and conditional moves.  Other
operations will just reset the mask to all-ones, i.e. unknown.

[rth: s/target_ulong/tcg_target_ulong/]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:14 +00:00
Paolo Bonzini
d193a14a2c optimize: only write to state when clearing optimizer data
The next patch will add to the TCG optimizer a field that should be
non-zero in the default case.  Thus, replace the memset of the
temps array with a loop.  Only the state field has to be up-to-date,
because others are not used except if the state is TCG_TEMP_COPY
or TCG_TEMP_CONST.

[rth: Extracted the loop to a function.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:13 +00:00
Evgeny Voevodin
92414b31e7 TCG: Use gen_opc_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:36 +00:00
Aurelien Jarno
7850527966 tcg: rework TCG helper flags
The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be
confusing and doesn't provide enough granularity for some helpers (FP
helpers for example).

This patch changes them into the following helpers flags:
- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
  either directly or via an exception. They will not be saved to their
  canonical location before calling the helper.
- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any
  globals. They will only be saved to their canonical locations before
  calling helpers, but they won't be reloaded afterwise.
- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is
  removed if the return value is not used.

It provides convenience flags, to avoid helper definitions longer than
80 characters. It also provides compatibility flags, and updates the
documentation.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:23 +01:00
Richard Henderson
1414968a6a tcg: Optimize mulu2
Like add2, do operand ordering, constant folding, and dead operand
elimination.  The latter happens about 15% of all mulu2 during an
x86_64 bios boot.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:39 +02:00
Richard Henderson
212c328d61 tcg: Constant fold add2 and sub2
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:37 +02:00
Richard Henderson
6c4382f8f4 tcg: Do constant folding on double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:35 +02:00
Richard Henderson
9519da7e39 tcg: Split out subroutines from do_constant_folding_cond
We can re-use these for implementing double-word folding.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:32 +02:00
Richard Henderson
bc1473eff4 tcg: Optimize double-word comparisons against zero
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:29 +02:00
Richard Henderson
6e14e91b66 tcg: Use common code when failing to optimize
This saves a whole lot of repetitive code sequences.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:01 +02:00
Richard Henderson
0bfcb86538 tcg: Swap commutative double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:57 +02:00
Richard Henderson
1e484e61e2 tcg: Canonicalize add2 operand ordering
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:53 +02:00
Richard Henderson
24c9ae4eba tcg: Split out swap_commutative as a subroutine
Reduces code duplication and prefers

  movcond d, c1, c2, const, s
to
  movcond d, c1, c2, s, const

It also prefers

  add r, r, c
over
  add r, c, r

when both inputs are known constants.  This doesn't matter for true add, as
we will fully constant fold that.  But it matters for a follow-on patch using
this routine for add2 which may not be fully foldable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:30:40 +02:00
Richard Henderson
0aed257f08 tcg: Add TCG_COND_NEVER, TCG_COND_ALWAYS
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions.  It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-06 18:48:40 +02:00
Aurelien Jarno
7ef55fc919 tcg/optimize: add constant folding for deposit
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
c2b0e2fea2 tcg/optimize: prefer the "op a, a, b" form for commutative ops
The "op a, a, b" form is better handled on non-RISC host than the "op
a, b, a" form, so swap the arguments to this form when possible, and
when b is not a constant.

This reduces the number of generated instructions by a tiny bit.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
b336ceb691 tcg/optimize: further optimize brcond/movcond/setcond
When both argument of brcond/movcond/setcond are the same or when one
of the two values is a constant equal to zero, it's possible to do
further optimizations.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
3c94193e0b tcg/optimize: optimize "op r, a, a => movi r, 0"
Now that it's possible to detect copies, we can optimize the case
the "op r, a, a => movi r, 0". This helps in the computation of
overflow flags when one of the two args is 0.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
0aba1c7376 tcg/optimize: optimize "op r, a, a => mov r, a"
Now that we can easily detect all copies, we can optimize the
"op r, a, a => mov r, a" case a bit more.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
1ff8c5418a tcg/optimize: do copy propagation for all operations
It is possible to due copy propagation for all operations, even the one
that have side effects or clobber arguments (it only concerns input
arguments). That said, the call operation should be handled differently
due to the variable number of arguments.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
e590d4e6b3 tcg/optimize: rework copy progagation
The copy propagation pass tries to keep track what is a copy of what
and what has copy of what, and in addition it keep a circular list of
of all the copies. Unfortunately this doesn't fully work: a mov from
a temp which has a state "COPY" changed it into a state "HAS_COPY".
Later when this temp is used again, it is considered has not having
copy and thus no propagation is done.

This patch fixes that by removing the hiearchy between copies, and thus
only keeping a "COPY" state both meaning "is a copy" and "has a copy".
The decision of which copy to use is deferred to the actual temp
replacement. At this stage there is not one best choice to do, but only
better choices than others. For doing the best choice the operation
would have to be parsed in reversed to know if a temp is going to be
used later or not. That what is done by the liveness analysis. At this
stage it is known that globals will be always live, that local temps
will be dead at the end of the translation block, and that the temps
will be dead at the end of the basic block. This means that this stage
should try to replace temps by local temps or globals and local temps
by globals.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:21 +02:00
Aurelien Jarno
b80bb016d8 tcg/optimize: check types in copy propagation
The copy propagation doesn't check the types of the temps during copy
propagation. However TCG is using the mov_i32 for the i64 to i32
conversion and thus the two are not equivalent.

With this patch tcg_opt_gen_mov() doesn't consider two temps of
different type as copies anymore.

So far it seems the optimization was not aggressive enough to trigger
this bug, but it will be triggered later in this series once the copy
propagation is improved.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:20 +02:00
Aurelien Jarno
48b56ce168 tcg/optimize: remove TCG_TEMP_ANY
TCG_TEMP_ANY has no different meaning than TCG_TEMP_UNDEF, so use
the later instead.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-22 15:10:20 +02:00
Richard Henderson
5d8f536300 tcg: Optimize two-address commutative operations
While swapping constants to the second operand, swap
sources matching destinations to the first operand.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-21 19:53:17 +02:00
Richard Henderson
fa01a2084e tcg: Optimize movcond for constant comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-21 19:53:17 +02:00
Aurelien Jarno
a255066039 tcg/optimize: fix end of basic block detection
Commit e31b0a7c05 fixed copy propagation on
32-bit host by restricting the copy between different types. This was the
wrong fix.

The real problem is that the all temps states should be reset at the end
of a basic block. This was done by adding such operations in the switch,
but brcond2 was forgotten (that's why the crash was only observed on 32-bit
hosts).

Fix that by looking at the TCG_OPF_BB_END instead. We need to keep the case
for op_set_label as temps might be modified through another path.

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-19 21:53:46 +02:00
Aurelien Jarno
d104bebd07 revert "TCG: fix copy propagation"
Given the copy propagation breakage on 32-bit hosts has been fixed
commit e31b0a7c05 can be reverted.

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-19 21:40:47 +02:00
Aurelien Jarno
fedc0da251 tcg/optimize: fix if/else/break coding style
optimizer.c contains some cases were the break is appearing in both the
if and the else parts. Fix that by moving it to the outer part. Also
move some common code there.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:06:04 +02:00
Aurelien Jarno
fbeaa26c4c tcg/optimize: add constant folding for brcond
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:06:03 +02:00
Aurelien Jarno
f8dd19e5c7 tcg/optimize: add constant folding for setcond
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:06:01 +02:00
Aurelien Jarno
65a7cce17d tcg/optimize: swap brcond/setcond arguments when possible
brcond and setcond ops are not commutative, but it's easy to compute the
new condition after swapping the arguments. Try to always put the constant
argument in second position like for commutative ops, to help backends to
generate better code.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:05:59 +02:00
Aurelien Jarno
01ee5282ea tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases
shift/rot r, 0, a is equivalent to movi r, 0.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:05:58 +02:00
Aurelien Jarno
61251c0c79 tcg/optimize: simplify and r, a, 0 cases
and r, a, 0 is equivalent to a movi r, 0.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:05:57 +02:00
Aurelien Jarno
38ee188b1b tcg/optimize: simplify or/xor r, a, 0 cases
or/xor r, a, 0 is equivalent to a mov r, a.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:05:56 +02:00
Aurelien Jarno
56e4943825 tcg/optimize: split expression simplification
Split expression simplification in multiple parts so that a given op
can appear multiple times. This patch should not change anything.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-09-11 18:05:55 +02:00
Blue Swirl
fe0de7aa5e TCG: improve optimizer debugging
Use enum TCGOpcode instead of plain old int so that the name of
current op can be seen in GDB. Add a default case to switch
so that GCC does not complain about unhandled enum cases.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-08-28 07:17:27 +00:00
Richard Henderson
cb25c80a9b tcg: Constant fold neg, andc, orc, eqv, nand, nor.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-08-21 18:52:25 +00:00
Richard Henderson
25c4d9cc84 tcg: Always define all of the TCGOpcode enum members.
By always defining these symbols, we can eliminate a lot of ifdefs.

To allow this to be checked reliably, the semantics of the
TCG_TARGET_HAS_* macros must be changed from def/undef to true/false.
This allows even more ifdefs to be removed, converting them into
C if statements.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-08-21 18:52:24 +00:00
Richard Henderson
8399ad59e7 tcg: Add and use TCG_OPF_64BIT.
This allows the simplification of the op_bits function from
tcg/optimize.c.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-08-21 18:52:22 +00:00
Blue Swirl
e31b0a7c05 TCG: fix copy propagation
Copy propagation introduced in 22613af4a6
considered only global registers. However, register temps and stack
allocated locals must be handled differently because register temps
don't survive across brcond.

Fix by propagating only within same class of temps.

Tested-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-08-07 09:33:20 +00:00
Blue Swirl
2ec00650f6 TCG: fix breakage by previous patch
Fix incorrect logic and typos in previous commit
1bfd07bdfe.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 18:54:23 +00:00
Blue Swirl
1bfd07bdfe TCG: fix breakage on some RISC hosts
Fix breakage by a640f03178
and 55c0975c5b.

Some TCG targets don't implement all TCG ops, so make
optimizing those conditional.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 12:21:33 +00:00
Kirill Batuzov
a640f03178 Do constant folding for unary operations.
Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:30 +00:00
Kirill Batuzov
55c0975c5b Do constant folding for shift operations.
Perform constant forlding for SHR, SHL, SAR, ROTR, ROTL operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:29 +00:00
Kirill Batuzov
9a81090b12 Do constant folding for boolean operations.
Perform constant folding for AND, OR, XOR operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:29 +00:00
Kirill Batuzov
53108fb574 Do constant folding for basic arithmetic operations.
Perform actual constant folding for ADD, SUB and MUL operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:28 +00:00
Kirill Batuzov
22613af4a6 Add copy and constant propagation.
Make tcg_constant_folding do copy and constant propagation. It is a
preparational work before actual constant folding.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:27 +00:00
Kirill Batuzov
8f2e8c07a6 Add TCG optimizations stub
Added file tcg/optimize.c to hold TCG optimizations. Function tcg_optimize
is called from tcg_gen_code_common. It calls other functions performing
specific optimizations. Stub for constant folding was added.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-07-30 10:51:25 +00:00