Commit Graph

195 Commits

Author SHA1 Message Date
Richard Henderson
d0ed5151b1 tcg/optimize: Move prev_mb into OptContext
This will expose the variable to subroutines that
will be broken out of tcg_optimize.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
dc84988a5f tcg/optimize: Change tcg_opt_gen_{mov,movi} interface
Adjust the interface to take the OptContext parameter instead
of TCGContext or both.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
b10f38339b tcg/optimize: Remove do_default label
Break the final cleanup clause out of the main switch
statement.  When fully folding an opcode to mov/movi,
use "continue" to process the next opcode, else break
to fall into the final cleanup.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
3b3f847d75 tcg/optimize: Split out OptContext
Provide what will become a larger context for splitting
the very large tcg_optimize function.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
b1fde411d0 tcg/optimize: Rename "mask" to "z_mask"
Prepare for tracking different masks by renaming this one.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27 17:11:22 -07:00
Richard Henderson
9002ffcb72 tcg: Rename TCGMemOpIdx to MemOpIdx
We're about to move this out of tcg.h, so rename it
as we did when moving MemOp.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-05 16:53:17 -07:00
Richard Henderson
0b76ff8f1b tcg: Handle new bswap flags during optimize
Notice when the input is known to be zero-extended and force
the TCG_BSWAP_IZ flag on.  Honor the TCG_BSWAP_OS bit during
constant folding.  Propagate the input to the output mask.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29 10:04:57 -07:00
Richard Henderson
90163900e3 tcg: Add tcg_call_flags
We're going to change how to look up the call flags from a TCGop,
so extract it as a helper.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-19 08:51:11 -07:00
Richard Henderson
c58f4c97b2 tcg: Remove movi and dupi opcodes
These are now completely covered by mov from a
TYPE_CONST temporary.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
0b4286dd15 tcg: Convert tcg_gen_dupi_vec to TCG_CONST
Because we now store uint64_t in TCGTemp, we can now always
store the full 64-bit duplicate immediate.  So remove the
difference between 32- and 64-bit hosts.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
8fe35e0444 tcg/optimize: Use tcg_constant_internal with constant folding
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
8f17a975e6 tcg/optimize: Adjust TempOptInfo allocation
Do not allocate a large block for indexing.  Instead, allocate
for each temporary as they are seen.

In general, this will use less memory, if we consider that most
TBs do not touch every target register.  This also allows us to
allocate TempOptInfo for new temps created during optimization.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
4c868ce645 tcg/optimize: Improve find_better_copy
Prefer TEMP_CONST over anything else.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
c0522136ad tcg: Introduce TYPE_CONST temporaries
These will hold a single constant for the duration of the TB.
They are hashed, so that each value has one temp across the TB.

Not used yet, this is all infrastructure.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
54795544e4 tcg: Expand TempOptInfo to 64-bits
This propagates the extended value of TCGTemp.val that we did before.
In addition, it will be required for vector constants.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
6fcb98eda1 tcg: Rename struct tcg_temp_info to TempOptInfo
Fix this name vs our coding style.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
ee17db83d2 tcg: Consolidate 3 bits into enum TCGTempKind
The temp_fixed, temp_global, temp_local bits are all related.
Combine them into a single enumeration.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13 08:39:08 -10:00
Richard Henderson
07ce0b0530 tcg: Introduce INDEX_op_qemu_st8_i32
Enable this on i386 to restrict the set of input registers
for an 8-bit store, as required by the architecture.  This
removes the last use of scratch registers for user-only mode.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-07 05:09:06 -10:00
Thomas Huth
d84568b773 tcg/optimize: Add fallthrough annotations
To be able to compile this file with -Werror=implicit-fallthrough,
we need to add some fallthrough annotations to the case statements
that might fall through. Unfortunately, the typical "/* fallthrough */"
comments do not work here as expected since some case labels are
wrapped in macros and the compiler fails to match the comments in
this case. But using __attribute__((fallthrough)) seems to work fine,
so let's use that instead (by introducing a new QEMU_FALLTHROUGH
macro in our compiler.h header file).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201211152426.350966-11-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-12-18 09:14:23 +01:00
Richard Henderson
c56caea3b2 tcg: Revert "tcg/optimize: Flush data at labels not TCG_OPF_BB_END"
This reverts commit cd0372c515.

The patch is incorrect in that it retains copies between globals and
non-local temps, and non-local temps still die at the end of the BB.

Failing test case for hppa:

	.globl	_start
_start:
	cmpiclr,=	0x24,%r19,%r0
	cmpiclr,<>	0x2f,%r19,%r19

 ---- 00010057 0001005b
 movi_i32 tmp0,$0x24
 sub_i32 tmp1,tmp0,r19
 mov_i32 tmp2,tmp0
 mov_i32 tmp3,r19
 movi_i32 tmp1,$0x0

 ---- 0001005b 0001005f
 brcond_i32 tmp2,tmp3,eq,$L1
 movi_i32 tmp0,$0x2f
 sub_i32 tmp1,tmp0,r19
 mov_i32 tmp2,tmp0
 mov_i32 tmp3,r19
 movi_i32 tmp1,$0x0
 mov_i32 r19,tmp1
 setcond_i32 psw_n,tmp2,tmp3,ne
 set_label $L1

In this case, both copies of "mov_i32 tmp3,r19" are removed.  The
second because opt thought it was redundant.  The first is removed
later by liveness because tmp3 is known to be dead.  This leaves
the setcond_i32 with an uninitialized input.

Revert the entire patch for 5.2, and a proper optimization across
the branch may be considered for the next development cycle.

Reported-by: qemu@igor2.repo.hu
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-11-04 10:35:40 -08:00
Richard Henderson
cd0372c515 tcg/optimize: Flush data at labels not TCG_OPF_BB_END
We can easily propagate temp values through the entire extended
basic block (in this case, the set of blocks connected by fallthru),
simply by not discarding the register state at the branch.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-27 09:48:07 -07:00
Richard Henderson
1dc4fe7012 tcg/optimize: Fold dup2_vec
When the two arguments are identical, this can be reduced to
dup_vec or to mov_vec from a tcg_constant_vec.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08 05:57:32 -05:00
Philippe Mathieu-Daudé
dcb32f1d8f tcg: Search includes from the project root source directory
We currently search both the root and the tcg/ directories for tcg
files:

  $ git grep '#include "tcg/' | wc -l
  28

  $ git grep '#include "tcg[^/]' | wc -l
  94

To simplify the preprocessor search path, unify by expliciting the
tcg/ directory.

Patch created mechanically by running:

  $ for x in \
      tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
      tcg-op-gvec.h tcg-gvec-desc.h; do \
    sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
      $(git grep -l "#include \"$x\""); \
    done

Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-01-15 15:13:10 -10:00
Tony Nguyen
14776ab5a1 tcg: TCGMemOp is now accelerator independent MemOp
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.

Target dependant attributes are conditionalized upon NEED_CPU_H.

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:38 -07:00
Markus Armbruster
6a0acfff99 Clean up inclusion of exec/cpu-common.h
migration/qemu-file.h neglects to include it even though it needs
ram_addr_t.  Fix that.  Drop a few superfluous inclusions elsewhere.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-14-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
Richard Henderson
80f4d7c3ae tcg: Fix constant folding of INDEX_op_extract2_i32
On a 64-bit host, discard any replications of the 32-bit
sign bit when performing the shift and merge.

Fixes: https://bugs.launchpad.net/bugs/1834496
Tested-by: Christophe Lyon <christophe.lyon@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14 12:19:00 +02:00
Markus Armbruster
a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Richard Henderson
ac383dde33 tcg: Do not recreate INDEX_op_neg_vec unless supported
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 14:26:28 -07:00
Richard Henderson
fce1296f13 tcg: Add INDEX_op_extract2_{i32,i64}
This will let backends implement the double-word shift operation.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Emilio G. Cota
ac1043f6d6 tcg: Drop nargs from tcg_op_insert_{before,after}
It's unused since 75e8b9b7aa.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181209193749.12277-9-cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
6498594c8e tcg/optimize: Optimize bswap
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17 06:04:44 +03:00
Richard Henderson
1fb57da72a tcg/optimize: Do not skip default processing of dup_vec
If we do not opimize away dup_vec, we must mark its output as changed.

Fixes: 170ba88f45
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20180805233258.31892-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-06 14:57:48 +01:00
Richard Henderson
170ba88f45 tcg/optimize: Handle vector opcodes during optimize
Trivial move and constant propagation.  Some identity and constant
function folding, but nothing that requires knowledge of the size
of the vector element.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-02-08 15:54:06 +00:00
Richard Henderson
cd9090aa9d tcg: Generalize TCGOp parameters
We had two fields specific to INDEX_op_call.  Rename these and
add some macros so that the fields may be reused for other opcodes.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Richard Henderson
15fa08f845 tcg: Dynamically allocate TCGOps
With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.

Use QTAILQ to link the ops together.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29 12:43:39 -08:00
Emilio G. Cota
34184b0718 tcg: allocate optimizer temps with tcg_malloc
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                                  0.
 tcg_malloc   20.441130078                           1.1270214
 TCGContext   20.477846517                           1.3086662
 g_malloc     20.780527895                           2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24 13:53:42 -07:00
Richard Henderson
6349039d0b tcg: Use per-temp state data in optimize
While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:45:07 +02:00
Richard Henderson
fa477d2547 tcg: Add temp_global bit to TCGTemp
This avoids needing to test the index of a temp against nb_globals.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:50 +02:00
Richard Henderson
434391390b tcg: Introduce arg_temp
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:43:36 +02:00
Richard Henderson
acd937019b tcg: Propagate args to op->args in optimizer
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
75e8b9b7aa tcg: Merge opcode arguments into TCGOp
Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24 21:34:47 +02:00
Richard Henderson
a768e4e992 tcg: Add opcode for ctpop
The number of actual invocations of ctpop itself does not warrent
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:48:56 -08:00
Richard Henderson
0e28d0063b tcg: Add clz and ctz opcodes
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:11 -08:00
Richard Henderson
333b21b809 tcg/optimize: Fold movcond 0/1 into setcond
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:06:10 -08:00
Richard Henderson
7ec8bab3de tcg: Add field extraction primitives
Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 07:59:11 -08:00
Alex Bennée
550276ae0a tcg/optimize: move default return out of if statement
This is to appease sanitizer builds which complain that:

  "error: control reaches end of non-void function"

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20160930213106.20186-5-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-04 10:00:25 +02:00
Pranith Kumar
34f939218c tcg: Optimize fence instructions
This commit optimizes fence instructions.  Two optimizations are
currently implemented: (1) unnecessary duplicate fence instructions,
and (2) merging weaker fences into a stronger fence.

[rth: Merge tcg_optimize_mb back into tcg_optimize, so that we only
loop over the opcode stream once.  Merge "unrelated" weaker barriers
into one stronger barrier.]

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20160823134825.32578-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-09-16 08:12:12 -07:00
Richard Henderson
5a18407f55 tcg: Lower indirect registers in a separate pass
Rather than rely on recursion during the middle of register allocation,
lower indirect registers to loads and stores off the indirect base into
plain temps.

For an x86_64 host, with sufficient registers, this results in identical
code, modulo the actual register assignments.

For an i686 host, with insufficient registers, this means that temps can
be (temporarily) spilled to the stack in order to satisfy an allocation.
This as opposed to the possibility of not being able to spill, to allocate
a register for the indirect base, in order to perform a spill.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05 21:44:40 +05:30
Richard Henderson
dcb8e75870 tcg: Reorg TCGOp chaining
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-08-05 21:44:18 +05:30
Paolo Bonzini
00f6da6a1a exec: extract exec/tb-context.h
TCG backends do not need most of exec-all.h; extract what they actually
need to a separate file or move it directly to tcg.h.  The next patch
will stop including exec-all.h from everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Aurelien Jarno
eabb7b91b3 tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 15:41:47 +01:00
Peter Maydell
757e725b58 tcg: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Richard Henderson
609ad70562 tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts.  This is all that was being used anyway.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
8bcb5c8f34 tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
0632e555fc tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size changing op.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
299f801304 tcg/optimize: allow constant to have copies
Now that copies and constants are tracked separately, we can allow
constant to have copies, deferring the choice to use a register or a
constant to the register allocation pass. This prevent this kind of
regular constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:54 -07:00
Aurelien Jarno
b41059dd9d tcg/optimize: track const/copy status separately
Instead of using an enum which could be either a copy or a const, track
them separately. This will be used in the next patch.

Constants are tracked through a bool. Copies are tracked by initializing
temp's next_copy and prev_copy to itself, allowing to simplify the code
a bit.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
d9c769c609 tcg/optimize: add temp_is_const and temp_is_copy functions
Add two accessor functions temp_is_const and temp_is_copy, to make the
code more readable and make code change easier.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:53 -07:00
Aurelien Jarno
1208d7dd5f tcg/optimize: optimize temps tracking
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have initialize more than 2kB at least twice
per TB, often more when there is a few goto_tb.

Instead used a TCGTempSet bit array to track which temps are in used in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves out tcg_optimize from the the top of the profiler list.

[rth: Handle TCG_CALL_DUMMY_ARG]

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:30 -07:00
Aurelien Jarno
29f3ff8d6c tcg/optimize: fix constant signedness
By convention, on a 64-bit host TCG internally stores 32-bit constants
as sign-extended. This is not the case in the optimizer when a 32-bit
constant is folded.

This doesn't seem to have more consequences than suboptimal code
generation. For instance the x86 backend assumes sign-extended constants,
and in some rare cases uses a 32-bit unsigned immediate 0xffffffff
instead of a 8-bit signed immediate 0xff for the constant -1. This is
with a ppc guest:

before
------

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7fd8c7dfe90c:  xor    %ebp,%ebp
0x7fd8c7dfe90e:  mov    %ebp,%r11d
0x7fd8c7dfe911:  mov    0x18(%r14),%r9d
0x7fd8c7dfe915:  add    %r9d,%r10d
0x7fd8c7dfe918:  adc    %ebp,%r11d
0x7fd8c7dfe91b:  add    $0xffffffff,%r10d
0x7fd8c7dfe922:  adc    %ebp,%r11d
0x7fd8c7dfe925:  mov    %r11d,0x134(%r14)
0x7fd8c7dfe92c:  mov    %r10d,0x28(%r14)

after
-----

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffffffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7f37010d490c:  xor    %ebp,%ebp
0x7f37010d490e:  mov    %ebp,%r11d
0x7f37010d4911:  mov    0x18(%r14),%r9d
0x7f37010d4915:  add    %r9d,%r10d
0x7f37010d4918:  adc    %ebp,%r11d
0x7f37010d491b:  add    $0xffffffffffffffff,%r10d
0x7f37010d491f:  adc    %ebp,%r11d
0x7f37010d4922:  mov    %r11d,0x134(%r14)
0x7f37010d4929:  mov    %r10d,0x28(%r14)

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-24 11:10:08 -07:00
Aurelien Jarno
961521261a tcg/optimize: fix tcg_opt_gen_movi
Due to a copy&paste, the new op value is tested against mov_i32 instead
of movi_i32. The test is therefore always false. Fix that.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-07-23 20:37:12 -07:00
Aurelien Jarno
36e60ef6ac tcg/optimize: rename tcg_constant_folding
The tcg_constant_folding folding ends up doing all the optimizations
(which is a good thing to avoid looping on all ops multiple time), so
make it clear and just rename it tcg_optimize.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
97a79eb70d tcg/optimize: fold constant test in tcg_opt_gen_mov
Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if
the source temp is a constant. Fold that into the tcg_opt_gen_mov
function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
5365718a9a tcg/optimize: fold temp copies test in tcg_opt_gen_mov
Each call to tcg_opt_gen_mov is preceeded by a test to check if the
source and destination temps are copies. Fold that into the
tcg_opt_gen_mov function.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
8d6a91602e tcg/optimize: remove opc argument from tcg_opt_gen_mov
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Aurelien Jarno
ebd27391b0 tcg/optimize: remove opc argument from tcg_opt_gen_movi
We can get the opcode using the TCGOp pointer. It needs to be
dereferenced, but it's anyway done a few lines below to write
the new value.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-09 07:00:56 -07:00
Richard Henderson
59227d5d45 tcg: Merge memop and mmu_idx parameters to qemu_ld/st
At the tcg opcode level, not at the tcg-op.h generator level.
This requires minor changes through all of the tcg backends,
but none of the cpu translators.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-05-14 12:14:55 -07:00
Richard Henderson
2374c4b837 tcg/optimize: Handle or r,a,a with constant a
As seen with ubuntu-5.10-live-powerpc.iso.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-03-16 08:46:13 -07:00
Richard Henderson
a4ce099a7a tcg: Implement insert_op_before
Rather reserving space in the op stream for optimization,
let the optimizer add ops as necessary.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
0c627cdca2 tcg: Remove opcodes instead of noping them out
With the linked list scheme we need not leave nops in the stream
that we need to process later.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
c45cb8bb89 tcg: Put opcodes in a linked list
The previous setup required ops and args to be completely sequential,
and was error prone when it came to both iteration and optimization.

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-02-12 21:21:38 -08:00
Richard Henderson
bc8d688ff3 tcg/optimize: Don't special case TCG_OPF_CALL_CLOBBER
With the "old" ldst ops we didn't know the real width of the
result of the load, but with the "new" ldst ops we do.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-18 11:39:02 -07:00
Richard Henderson
3d1b2ff62c tcg: Remove TCG_TARGET_HAS_new_ldst
Since all backends have been converted, remove the compatibility code.

Acked-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-04 14:10:26 -07:00
Richard Henderson
24666baf1f tcg/optimize: Remember garbage high bits for 32-bit ops
For a 64-bit host, the high bits of a register after a 32-bit operation
are undefined.  Adjust the temps mask for all 32-bit ops to reflect that.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:56 -07:00
Richard Henderson
a62f6f5600 tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov*
No functional change, just reduce a bit of redundancy.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:56 -07:00
Richard Henderson
a763551ad5 tcg: Optimize brcond2 and setcond2 ne/eq
If either the high or low pair can be resolved, we can
simplify to either a constant or to a 32-bit comparison.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-28 09:33:53 -07:00
Richard Henderson
cf06667428 tcg: Make call address a constant parameter
Avoid allocating a tcg temporary to hold the constant address,
and instead place it directly into the op_call arguments.

At the same time, convert to the newly introduced tcg_out_call
backend function, rather than invoking tcg_out_op for the call.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-05-12 11:13:12 -07:00
Richard Henderson
4bb7a41ed6 tcg: Add INDEX_op_trunc_shr_i32
Let the backend do something special for truncation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-28 11:06:34 -07:00
Richard Henderson
d998e555d2 tcg: Fix out of range shift in deposit optimizations
By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64)
and everything else falls apart.  But we can reuse the existing deposit
logic to get this right.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18 16:57:36 -07:00
Richard Henderson
50c5c4d125 tcg: Mask shift quantities while folding
The TCG result would be undefined, but we can at least produce one
plausible result and avoid triggering the wrath of analysis tools.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-04-18 16:57:36 -07:00
Richard Henderson
464a1441c1 tcg/optimize: Add more identity simplifications
Recognize 0 operand to andc, and -1 operands to and, orc, eqv.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
e64e958e20 tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0
Like we already do for SUB and XOR.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
e201b56418 tcg/optimize: Simply some logical ops to NOT
Given, of course, an appropriate constant.  These could be generated
from the "canonical" operation for inversion on the guest, or via
other optimizations.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Richard Henderson
23ec69ed37 tcg/optimize: Handle known-zeros masks for ANDC
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:29 -06:00
Aurelien Jarno
c8d7027253 tcg/optimize: add known-zero bits compute for load ops
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
f096dc9618 tcg/optimize: improve known-zero bits for 32-bit ops
The shl_i32 op might set some bits of the unused 32 high bits of the
mask. Fix that by clearing the unused 32 high bits for all 32-bit ops
except load/store which operate on tl values.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
3031244b01 tcg/optimize: fix known-zero bits optimization
Known-zero bits optimization is a great idea that helps to generate more
optimized code. However the current implementation only works in very few
cases as the computed mask is not saved.

Fix this to make it really working.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Aurelien Jarno
e46b225a31 tcg/optimize: fix known-zero bits for right shift ops
32-bit versions of sar and shr ops should not propagate known-zero bits
from the unused 32 high bits. For sar it could even lead to wrong code
being generated.

Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-02-17 10:12:28 -06:00
Stefan Weil
3df2b8fde9 misc: Use new rotate functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-09-25 21:23:05 +02:00
Richard Henderson
01547f7f92 tcg: Constant fold div, rem
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson
03271524b6 tcg: Add muluh and mulsh opcodes
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Aurelien Jarno
66e61b55f1 tcg/optimize: fix setcond2 optimization
When setcond2 is rewritten into setcond, the state of the destination
temp should be reset, so that a copy of the previous value is not
used instead of the result.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-09 16:14:58 +02:00
Richard Henderson
2d497542e1 tcg-optimize: Fold sub r,0,x to neg r,x
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-23 14:31:03 +00:00
Richard Henderson
4d3203fd0b tcg: Add signed multiword multiplication operations
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
d7156f7ce4 tcg: Add 64-bit multiword arithmetic operations
Matching the 32-bit multiword arithmetic that we already have.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Paolo Bonzini
633f650254 optimize: optimize using nonzero bits
This adds two optimizations using the non-zero bit mask.  In some cases
involving shifts or ANDs the value can become zero, and can thus be
optimized to a move of zero.  Second, useless zero-extension or an
AND with constant can be detected that would only zero bits that are
already zero.

The main advantage of this optimization is that it turns zero-extensions
into moves, thus enabling much better copy propagation (around 1% code
reduction).  Here is for example a "test $0xff0000,%ecx + je" before
optimization:

 mov_i64 tmp0,rcx
 movi_i64 tmp1,$0xff0000
 discard cc_src
 and_i64 cc_dst,tmp0,tmp1
 movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0

and after (without patch on the left, with on the right):

 movi_i64 tmp1,$0xff0000                 movi_i64 tmp1,$0xff0000
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rcx,tmp1                 and_i64 cc_dst,rcx,tmp1
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit
(after optimization, without patch on the left, with on the right):

 discard cc_src                          discard cc_src
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,ne,$0x0           brcond_i64 rax,tmp12,ne,$0x0

"test $0x1, %dl + je":

 movi_i64 tmp1,$0x1                      movi_i64 tmp1,$0x1
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rdx,tmp1                 and_i64 cc_dst,rdx,tmp1
 movi_i32 cc_op,$0x1a                    movi_i32 cc_op,$0x1a
 ext8u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

In some cases TCG even outsmarts GCC. :)  Here the input code has
"and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer,
thanks to copy propagation, does the following:

 movi_i64 tmp12,$0x2                     movi_i64 tmp12,$0x2
 and_i64 rax,rax,tmp12                   and_i64 rax,rax,tmp12
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 ext32s_i64 tmp0,rax                  -> nop
 mov_i64 rbx,tmp0                     -> mov_i64 rbx,cc_dst
 and_i64 cc_dst,rbx,rbx               -> nop

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:16 +00:00
Paolo Bonzini
3a9d8b179b optimize: track nonzero bits of registers
Add a "mask" field to the tcg_temp_info struct.  A bit that is zero
in "mask" will always be zero in the corresponding temporary.
Zero bits in the mask can be produced from moves of immediates,
zero-extensions, ANDs with constants, shifts; they can then be
be propagated by logical operations, shifts, sign-extensions,
negations, deposit operations, and conditional moves.  Other
operations will just reset the mask to all-ones, i.e. unknown.

[rth: s/target_ulong/tcg_target_ulong/]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:14 +00:00
Paolo Bonzini
d193a14a2c optimize: only write to state when clearing optimizer data
The next patch will add to the TCG optimizer a field that should be
non-zero in the default case.  Thus, replace the memset of the
temps array with a loop.  Only the state field has to be up-to-date,
because others are not used except if the state is TCG_TEMP_COPY
or TCG_TEMP_CONST.

[rth: Extracted the loop to a function.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:13 +00:00
Evgeny Voevodin
92414b31e7 TCG: Use gen_opc_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:36 +00:00