mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Paolo Bonzini	33c11879fd	qemu-common: push cpu.h inclusion out of qemu-common.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Aurelien Jarno	eabb7b91b3	tcg: use tcg_debug_assert instead of assert (fix performance regression) The TCG code is quite performance sensitive, but at the same time can also be quite tricky. That is why asserts that can be enabled with the --enable-debug-tcg configure option. This used to work the following way: \| #include "config.h" \| \| ... \| \| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG) \| /* define it to suppress various consistency checks (faster) */ \| #define NDEBUG \| #endif \| \| ... \| \| #include <assert.h> Since commit `757e725b` (tcg: Clean up includes) "config.h" as been replaced by "qemu/osdep.h" which itself includes <assert.h>. As a consequence the assertions are always enabled, even when using --disable-debug-tcg, causing a performance regression, especially on targets with many registers. For instance on qemu-system-ppc the speed difference is about 15%. tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already uses in some places. This patch replaces all the calls to assert into calss to tcg_debug_assert. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-04-21 15:41:47 +01:00
Peter Maydell	757e725b58	tcg: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:23 +00:00
Richard Henderson	609ad70562	tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	8bcb5c8f34	tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops They behave the same as ext32s_i64 and ext32u_i64 from the constant folding and zero propagation point of view, except that they can't be replaced by a mov, so we don't compute the affected value. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	0632e555fc	tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	299f801304	tcg/optimize: allow constant to have copies Now that copies and constants are tracked separately, we can allow constant to have copies, deferring the choice to use a register or a constant to the register allocation pass. This prevent this kind of regular constant reloading: -OUT: [size=338] +OUT: [size=298] mov -0x4(%r14),%ebp test %ebp,%ebp jne 0x7ffbe9cb0ed6 mov $0x40002219f8,%rbp mov %rbp,(%r14) - mov $0x40002219f8,%rbp mov $0x4000221a20,%rbx mov %rbp,(%rbx) mov $0x4000000000,%rbp mov %rbp,(%r14) - mov $0x4000000000,%rbp mov $0x4000221d38,%rbx mov %rbp,(%rbx) mov $0x40002221a8,%rbp mov %rbp,(%r14) - mov $0x40002221a8,%rbp mov $0x4000221d40,%rbx mov %rbp,(%rbx) mov $0x4000019170,%rbp mov %rbp,(%r14) - mov $0x4000019170,%rbp mov $0x4000221d48,%rbx mov %rbp,(%rbx) mov $0x40000049ee,%rbp mov %rbp,0x80(%r14) mov %r14,%rdi callq 0x7ffbe99924d0 mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp mov $0x4000001680,%rbp mov %rbp,0x30(%r14) mov 0x10(%r14),%rbp shl $0x20,%rbp mov (%r14),%rbx mov %ebx,%ebx mov %rbx,(%r14) or %rbx,%rbp mov %rbp,0x10(%r14) mov %rbp,0x90(%r14) mov 0x60(%r14),%rbx mov %rbx,0x38(%r14) mov 0x28(%r14),%rbx mov $0x4000220e60,%r12 mov %rbx,(%r12) mov $0x40002219c8,%rbx mov %rbp,(%rbx) mov 0x20(%r14),%rbp sub $0x8,%rbp mov $0x4000004a16,%rbx mov %rbx,0x0(%rbp) mov %rbp,0x20(%r14) mov $0x19,%ebp mov %ebp,0xa8(%r14) mov $0x4000015110,%rbp mov %rbp,0x80(%r14) xor %eax,%eax jmpq 0x7ffbebcae426 lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3 jmpq 0x7ffbebcae426 Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	b41059dd9d	tcg/optimize: track const/copy status separately Instead of using an enum which could be either a copy or a const, track them separately. This will be used in the next patch. Constants are tracked through a bool. Copies are tracked by initializing temp's next_copy and prev_copy to itself, allowing to simplify the code a bit. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:53 -07:00
Aurelien Jarno	d9c769c609	tcg/optimize: add temp_is_const and temp_is_copy functions Add two accessor functions temp_is_const and temp_is_copy, to make the code more readable and make code change easier. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:53 -07:00
Aurelien Jarno	1208d7dd5f	tcg/optimize: optimize temps tracking The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate vector registers on most guests, it's not uncommon to have more than 100 used temps. This means we have initialize more than 2kB at least twice per TB, often more when there is a few goto_tb. Instead used a TCGTempSet bit array to track which temps are in used in the current basic block. This means there are only around 16 bytes to initialize. This improves the boot time of a MIPS guest on an x86-64 host by around 7% and moves out tcg_optimize from the the top of the profiler list. [rth: Handle TCG_CALL_DUMMY_ARG] Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:30 -07:00
Aurelien Jarno	29f3ff8d6c	tcg/optimize: fix constant signedness By convention, on a 64-bit host TCG internally stores 32-bit constants as sign-extended. This is not the case in the optimizer when a 32-bit constant is folded. This doesn't seem to have more consequences than suboptimal code generation. For instance the x86 backend assumes sign-extended constants, and in some rare cases uses a 32-bit unsigned immediate 0xffffffff instead of a 8-bit signed immediate 0xff for the constant -1. This is with a ppc guest: before ------ ---- 0x9f29cc movi_i32 tmp1,$0xffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7fd8c7dfe90c: xor %ebp,%ebp 0x7fd8c7dfe90e: mov %ebp,%r11d 0x7fd8c7dfe911: mov 0x18(%r14),%r9d 0x7fd8c7dfe915: add %r9d,%r10d 0x7fd8c7dfe918: adc %ebp,%r11d 0x7fd8c7dfe91b: add $0xffffffff,%r10d 0x7fd8c7dfe922: adc %ebp,%r11d 0x7fd8c7dfe925: mov %r11d,0x134(%r14) 0x7fd8c7dfe92c: mov %r10d,0x28(%r14) after ----- ---- 0x9f29cc movi_i32 tmp1,$0xffffffffffffffff movi_i32 tmp2,$0x0 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2 mov_i32 r10,tmp0 0x7f37010d490c: xor %ebp,%ebp 0x7f37010d490e: mov %ebp,%r11d 0x7f37010d4911: mov 0x18(%r14),%r9d 0x7f37010d4915: add %r9d,%r10d 0x7f37010d4918: adc %ebp,%r11d 0x7f37010d491b: add $0xffffffffffffffff,%r10d 0x7f37010d491f: adc %ebp,%r11d 0x7f37010d4922: mov %r11d,0x134(%r14) 0x7f37010d4929: mov %r10d,0x28(%r14) Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:08 -07:00
Aurelien Jarno	961521261a	tcg/optimize: fix tcg_opt_gen_movi Due to a copy&paste, the new op value is tested against mov_i32 instead of movi_i32. The test is therefore always false. Fix that. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436544211-2769-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-07-23 20:37:12 -07:00
Aurelien Jarno	36e60ef6ac	tcg/optimize: rename tcg_constant_folding The tcg_constant_folding folding ends up doing all the optimizations (which is a good thing to avoid looping on all ops multiple time), so make it clear and just rename it tcg_optimize. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-6-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 07:00:56 -07:00
Aurelien Jarno	97a79eb70d	tcg/optimize: fold constant test in tcg_opt_gen_mov Most of the calls to tcg_opt_gen_mov are preceeded by a test to check if the source temp is a constant. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433495958-9508-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 07:00:56 -07:00
Aurelien Jarno	5365718a9a	tcg/optimize: fold temp copies test in tcg_opt_gen_mov Each call to tcg_opt_gen_mov is preceeded by a test to check if the source and destination temps are copies. Fold that into the tcg_opt_gen_mov function. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-4-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 07:00:56 -07:00
Aurelien Jarno	8d6a91602e	tcg/optimize: remove opc argument from tcg_opt_gen_mov We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-3-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 07:00:56 -07:00
Aurelien Jarno	ebd27391b0	tcg/optimize: remove opc argument from tcg_opt_gen_movi We can get the opcode using the TCGOp pointer. It needs to be dereferenced, but it's anyway done a few lines below to write the new value. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1433447607-31184-2-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 07:00:56 -07:00
Richard Henderson	59227d5d45	tcg: Merge memop and mmu_idx parameters to qemu_ld/st At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-05-14 12:14:55 -07:00
Richard Henderson	2374c4b837	tcg/optimize: Handle or r,a,a with constant a As seen with ubuntu-5.10-live-powerpc.iso. Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-03-16 08:46:13 -07:00
Richard Henderson	a4ce099a7a	tcg: Implement insert_op_before Rather reserving space in the op stream for optimization, let the optimizer add ops as necessary. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-02-12 21:21:38 -08:00
Richard Henderson	0c627cdca2	tcg: Remove opcodes instead of noping them out With the linked list scheme we need not leave nops in the stream that we need to process later. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-02-12 21:21:38 -08:00
Richard Henderson	c45cb8bb89	tcg: Put opcodes in a linked list The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-02-12 21:21:38 -08:00
Richard Henderson	bc8d688ff3	tcg/optimize: Don't special case TCG_OPF_CALL_CLOBBER With the "old" ldst ops we didn't know the real width of the result of the load, but with the "new" ldst ops we do. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-06-18 11:39:02 -07:00
Richard Henderson	3d1b2ff62c	tcg: Remove TCG_TARGET_HAS_new_ldst Since all backends have been converted, remove the compatibility code. Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-06-04 14:10:26 -07:00
Richard Henderson	24666baf1f	tcg/optimize: Remember garbage high bits for 32-bit ops For a 64-bit host, the high bits of a register after a 32-bit operation are undefined. Adjust the temps mask for all 32-bit ops to reflect that. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-28 09:33:56 -07:00
Richard Henderson	a62f6f5600	tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov* No functional change, just reduce a bit of redundancy. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-28 09:33:56 -07:00
Richard Henderson	a763551ad5	tcg: Optimize brcond2 and setcond2 ne/eq If either the high or low pair can be resolved, we can simplify to either a constant or to a 32-bit comparison. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-28 09:33:53 -07:00
Richard Henderson	cf06667428	tcg: Make call address a constant parameter Avoid allocating a tcg temporary to hold the constant address, and instead place it directly into the op_call arguments. At the same time, convert to the newly introduced tcg_out_call backend function, rather than invoking tcg_out_op for the call. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-12 11:13:12 -07:00
Richard Henderson	4bb7a41ed6	tcg: Add INDEX_op_trunc_shr_i32 Let the backend do something special for truncation. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-28 11:06:34 -07:00
Richard Henderson	d998e555d2	tcg: Fix out of range shift in deposit optimizations By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64) and everything else falls apart. But we can reuse the existing deposit logic to get this right. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-18 16:57:36 -07:00
Richard Henderson	50c5c4d125	tcg: Mask shift quantities while folding The TCG result would be undefined, but we can at least produce one plausible result and avoid triggering the wrath of analysis tools. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-18 16:57:36 -07:00
Richard Henderson	464a1441c1	tcg/optimize: Add more identity simplifications Recognize 0 operand to andc, and -1 operands to and, orc, eqv. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:29 -06:00
Richard Henderson	e64e958e20	tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0 Like we already do for SUB and XOR. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:29 -06:00
Richard Henderson	e201b56418	tcg/optimize: Simply some logical ops to NOT Given, of course, an appropriate constant. These could be generated from the "canonical" operation for inversion on the guest, or via other optimizations. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:29 -06:00
Richard Henderson	23ec69ed37	tcg/optimize: Handle known-zeros masks for ANDC Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:29 -06:00
Aurelien Jarno	c8d7027253	tcg/optimize: add known-zero bits compute for load ops Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:28 -06:00
Aurelien Jarno	f096dc9618	tcg/optimize: improve known-zero bits for 32-bit ops The shl_i32 op might set some bits of the unused 32 high bits of the mask. Fix that by clearing the unused 32 high bits for all 32-bit ops except load/store which operate on tl values. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:28 -06:00
Aurelien Jarno	3031244b01	tcg/optimize: fix known-zero bits optimization Known-zero bits optimization is a great idea that helps to generate more optimized code. However the current implementation only works in very few cases as the computed mask is not saved. Fix this to make it really working. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:28 -06:00
Aurelien Jarno	e46b225a31	tcg/optimize: fix known-zero bits for right shift ops 32-bit versions of sar and shr ops should not propagate known-zero bits from the unused 32 high bits. For sar it could even lead to wrong code being generated. Cc: qemu-stable@nongnu.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-02-17 10:12:28 -06:00
Stefan Weil	3df2b8fde9	misc: Use new rotate functions Signed-off-by: Stefan Weil <sw@weilnetz.de>	2013-09-25 21:23:05 +02:00
Richard Henderson	01547f7f92	tcg: Constant fold div, rem Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2013-09-02 09:08:29 -07:00
Richard Henderson	03271524b6	tcg: Add muluh and mulsh opcodes Use them in places where mulu2 and muls2 are used. Optimize mulx2 with dead low part to mulxh. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2013-09-02 09:08:29 -07:00
Aurelien Jarno	66e61b55f1	tcg/optimize: fix setcond2 optimization When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2013-05-09 16:14:58 +02:00
Richard Henderson	2d497542e1	tcg-optimize: Fold sub r,0,x to neg r,x Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-03-23 14:31:03 +00:00
Richard Henderson	4d3203fd0b	tcg: Add signed multiword multiplication operations Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-02-23 17:25:28 +00:00
Richard Henderson	d7156f7ce4	tcg: Add 64-bit multiword arithmetic operations Matching the 32-bit multiword arithmetic that we already have. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-02-23 17:25:28 +00:00
Paolo Bonzini	633f650254	optimize: optimize using nonzero bits This adds two optimizations using the non-zero bit mask. In some cases involving shifts or ANDs the value can become zero, and can thus be optimized to a move of zero. Second, useless zero-extension or an AND with constant can be detected that would only zero bits that are already zero. The main advantage of this optimization is that it turns zero-extensions into moves, thus enabling much better copy propagation (around 1% code reduction). Here is for example a "test $0xff0000,%ecx + je" before optimization: mov_i64 tmp0,rcx movi_i64 tmp1,$0xff0000 discard cc_src and_i64 cc_dst,tmp0,tmp1 movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 and after (without patch on the left, with on the right): movi_i64 tmp1,$0xff0000 movi_i64 tmp1,$0xff0000 discard cc_src discard cc_src and_i64 cc_dst,rcx,tmp1 and_i64 cc_dst,rcx,tmp1 movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit (after optimization, without patch on the left, with on the right): discard cc_src discard cc_src mov_i64 cc_dst,rax mov_i64 cc_dst,rax movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,ne,$0x0 brcond_i64 rax,tmp12,ne,$0x0 "test $0x1, %dl + je": movi_i64 tmp1,$0x1 movi_i64 tmp1,$0x1 discard cc_src discard cc_src and_i64 cc_dst,rdx,tmp1 and_i64 cc_dst,rdx,tmp1 movi_i32 cc_op,$0x1a movi_i32 cc_op,$0x1a ext8u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 In some cases TCG even outsmarts GCC. :) Here the input code has "and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer, thanks to copy propagation, does the following: movi_i64 tmp12,$0x2 movi_i64 tmp12,$0x2 and_i64 rax,rax,tmp12 and_i64 rax,rax,tmp12 mov_i64 cc_dst,rax mov_i64 cc_dst,rax ext32s_i64 tmp0,rax -> nop mov_i64 rbx,tmp0 -> mov_i64 rbx,cc_dst and_i64 cc_dst,rbx,rbx -> nop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:16 +00:00
Paolo Bonzini	3a9d8b179b	optimize: track nonzero bits of registers Add a "mask" field to the tcg_temp_info struct. A bit that is zero in "mask" will always be zero in the corresponding temporary. Zero bits in the mask can be produced from moves of immediates, zero-extensions, ANDs with constants, shifts; they can then be be propagated by logical operations, shifts, sign-extensions, negations, deposit operations, and conditional moves. Other operations will just reset the mask to all-ones, i.e. unknown. [rth: s/target_ulong/tcg_target_ulong/] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:14 +00:00
Paolo Bonzini	d193a14a2c	optimize: only write to state when clearing optimizer data The next patch will add to the TCG optimizer a field that should be non-zero in the default case. Thus, replace the memset of the temps array with a loop. Only the state field has to be up-to-date, because others are not used except if the state is TCG_TEMP_COPY or TCG_TEMP_CONST. [rth: Extracted the loop to a function.] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:13 +00:00
Evgeny Voevodin	92414b31e7	TCG: Use gen_opc_buf from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:36 +00:00
Aurelien Jarno	7850527966	tcg: rework TCG helper flags The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be confusing and doesn't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helpers flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwise. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:23 +01:00
Richard Henderson	1414968a6a	tcg: Optimize mulu2 Like add2, do operand ordering, constant folding, and dead operand elimination. The latter happens about 15% of all mulu2 during an x86_64 bios boot. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:39 +02:00
Richard Henderson	212c328d61	tcg: Constant fold add2 and sub2 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:37 +02:00
Richard Henderson	6c4382f8f4	tcg: Do constant folding on double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:35 +02:00
Richard Henderson	9519da7e39	tcg: Split out subroutines from do_constant_folding_cond We can re-use these for implementing double-word folding. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:32 +02:00
Richard Henderson	bc1473eff4	tcg: Optimize double-word comparisons against zero Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:29 +02:00
Richard Henderson	6e14e91b66	tcg: Use common code when failing to optimize This saves a whole lot of repetitive code sequences. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:01 +02:00
Richard Henderson	0bfcb86538	tcg: Swap commutative double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:57 +02:00
Richard Henderson	1e484e61e2	tcg: Canonicalize add2 operand ordering Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:53 +02:00
Richard Henderson	24c9ae4eba	tcg: Split out swap_commutative as a subroutine Reduces code duplication and prefers movcond d, c1, c2, const, s to movcond d, c1, c2, s, const It also prefers add r, r, c over add r, c, r when both inputs are known constants. This doesn't matter for true add, as we will fully constant fold that. But it matters for a follow-on patch using this routine for add2 which may not be fully foldable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:30:40 +02:00
Richard Henderson	0aed257f08	tcg: Add TCG_COND_NEVER, TCG_COND_ALWAYS There are several cases that can be handled easier inside both translators and code generators if we have out-of-band values for conditions. It's easy enough to handle ALWAYS and NEVER in the natural way inside the tcg middle-end. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-06 18:48:40 +02:00
Aurelien Jarno	7ef55fc919	tcg/optimize: add constant folding for deposit Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	c2b0e2fea2	tcg/optimize: prefer the "op a, a, b" form for commutative ops The "op a, a, b" form is better handled on non-RISC host than the "op a, b, a" form, so swap the arguments to this form when possible, and when b is not a constant. This reduces the number of generated instructions by a tiny bit. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	b336ceb691	tcg/optimize: further optimize brcond/movcond/setcond When both argument of brcond/movcond/setcond are the same or when one of the two values is a constant equal to zero, it's possible to do further optimizations. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	3c94193e0b	tcg/optimize: optimize "op r, a, a => movi r, 0" Now that it's possible to detect copies, we can optimize the case the "op r, a, a => movi r, 0". This helps in the computation of overflow flags when one of the two args is 0. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	0aba1c7376	tcg/optimize: optimize "op r, a, a => mov r, a" Now that we can easily detect all copies, we can optimize the "op r, a, a => mov r, a" case a bit more. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	1ff8c5418a	tcg/optimize: do copy propagation for all operations It is possible to due copy propagation for all operations, even the one that have side effects or clobber arguments (it only concerns input arguments). That said, the call operation should be handled differently due to the variable number of arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	e590d4e6b3	tcg/optimize: rework copy progagation The copy propagation pass tries to keep track what is a copy of what and what has copy of what, and in addition it keep a circular list of of all the copies. Unfortunately this doesn't fully work: a mov from a temp which has a state "COPY" changed it into a state "HAS_COPY". Later when this temp is used again, it is considered has not having copy and thus no propagation is done. This patch fixes that by removing the hiearchy between copies, and thus only keeping a "COPY" state both meaning "is a copy" and "has a copy". The decision of which copy to use is deferred to the actual temp replacement. At this stage there is not one best choice to do, but only better choices than others. For doing the best choice the operation would have to be parsed in reversed to know if a temp is going to be used later or not. That what is done by the liveness analysis. At this stage it is known that globals will be always live, that local temps will be dead at the end of the translation block, and that the temps will be dead at the end of the basic block. This means that this stage should try to replace temps by local temps or globals and local temps by globals. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:21 +02:00
Aurelien Jarno	b80bb016d8	tcg/optimize: check types in copy propagation The copy propagation doesn't check the types of the temps during copy propagation. However TCG is using the mov_i32 for the i64 to i32 conversion and thus the two are not equivalent. With this patch tcg_opt_gen_mov() doesn't consider two temps of different type as copies anymore. So far it seems the optimization was not aggressive enough to trigger this bug, but it will be triggered later in this series once the copy propagation is improved. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:20 +02:00
Aurelien Jarno	48b56ce168	tcg/optimize: remove TCG_TEMP_ANY TCG_TEMP_ANY has no different meaning than TCG_TEMP_UNDEF, so use the later instead. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-22 15:10:20 +02:00
Richard Henderson	5d8f536300	tcg: Optimize two-address commutative operations While swapping constants to the second operand, swap sources matching destinations to the first operand. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-21 19:53:17 +02:00
Richard Henderson	fa01a2084e	tcg: Optimize movcond for constant comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-21 19:53:17 +02:00
Aurelien Jarno	a255066039	tcg/optimize: fix end of basic block detection Commit `e31b0a7c05` fixed copy propagation on 32-bit host by restricting the copy between different types. This was the wrong fix. The real problem is that the all temps states should be reset at the end of a basic block. This was done by adding such operations in the switch, but brcond2 was forgotten (that's why the crash was only observed on 32-bit hosts). Fix that by looking at the TCG_OPF_BB_END instead. We need to keep the case for op_set_label as temps might be modified through another path. Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-19 21:53:46 +02:00
Aurelien Jarno	d104bebd07	revert "TCG: fix copy propagation" Given the copy propagation breakage on 32-bit hosts has been fixed commit `e31b0a7c05` can be reverted. Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-19 21:40:47 +02:00
Aurelien Jarno	fedc0da251	tcg/optimize: fix if/else/break coding style optimizer.c contains some cases were the break is appearing in both the if and the else parts. Fix that by moving it to the outer part. Also move some common code there. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:06:04 +02:00
Aurelien Jarno	fbeaa26c4c	tcg/optimize: add constant folding for brcond Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:06:03 +02:00
Aurelien Jarno	f8dd19e5c7	tcg/optimize: add constant folding for setcond Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:06:01 +02:00
Aurelien Jarno	65a7cce17d	tcg/optimize: swap brcond/setcond arguments when possible brcond and setcond ops are not commutative, but it's easy to compute the new condition after swapping the arguments. Try to always put the constant argument in second position like for commutative ops, to help backends to generate better code. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:05:59 +02:00
Aurelien Jarno	01ee5282ea	tcg/optimize: simplify shift/rot r, 0, a => movi r, 0 cases shift/rot r, 0, a is equivalent to movi r, 0. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:05:58 +02:00
Aurelien Jarno	61251c0c79	tcg/optimize: simplify and r, a, 0 cases and r, a, 0 is equivalent to a movi r, 0. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:05:57 +02:00
Aurelien Jarno	38ee188b1b	tcg/optimize: simplify or/xor r, a, 0 cases or/xor r, a, 0 is equivalent to a mov r, a. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:05:56 +02:00
Aurelien Jarno	56e4943825	tcg/optimize: split expression simplification Split expression simplification in multiple parts so that a given op can appear multiple times. This patch should not change anything. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-09-11 18:05:55 +02:00
Blue Swirl	fe0de7aa5e	TCG: improve optimizer debugging Use enum TCGOpcode instead of plain old int so that the name of current op can be seen in GDB. Add a default case to switch so that GCC does not complain about unhandled enum cases. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-08-28 07:17:27 +00:00
Richard Henderson	cb25c80a9b	tcg: Constant fold neg, andc, orc, eqv, nand, nor. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-08-21 18:52:25 +00:00
Richard Henderson	25c4d9cc84	tcg: Always define all of the TCGOpcode enum members. By always defining these symbols, we can eliminate a lot of ifdefs. To allow this to be checked reliably, the semantics of the TCG_TARGET_HAS_* macros must be changed from def/undef to true/false. This allows even more ifdefs to be removed, converting them into C if statements. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-08-21 18:52:24 +00:00
Richard Henderson	8399ad59e7	tcg: Add and use TCG_OPF_64BIT. This allows the simplification of the op_bits function from tcg/optimize.c. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-08-21 18:52:22 +00:00
Blue Swirl	e31b0a7c05	TCG: fix copy propagation Copy propagation introduced in `22613af4a6` considered only global registers. However, register temps and stack allocated locals must be handled differently because register temps don't survive across brcond. Fix by propagating only within same class of temps. Tested-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-08-07 09:33:20 +00:00
Blue Swirl	2ec00650f6	TCG: fix breakage by previous patch Fix incorrect logic and typos in previous commit `1bfd07bdfe`. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 18:54:23 +00:00
Blue Swirl	1bfd07bdfe	TCG: fix breakage on some RISC hosts Fix breakage by `a640f03178` and `55c0975c5b`. Some TCG targets don't implement all TCG ops, so make optimizing those conditional. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 12:21:33 +00:00
Kirill Batuzov	a640f03178	Do constant folding for unary operations. Perform constant folding for NOT and EXT{8,16,32}{S,U} operations. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:30 +00:00
Kirill Batuzov	55c0975c5b	Do constant folding for shift operations. Perform constant forlding for SHR, SHL, SAR, ROTR, ROTL operations. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:29 +00:00
Kirill Batuzov	9a81090b12	Do constant folding for boolean operations. Perform constant folding for AND, OR, XOR operations. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:29 +00:00
Kirill Batuzov	53108fb574	Do constant folding for basic arithmetic operations. Perform actual constant folding for ADD, SUB and MUL operations. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:28 +00:00
Kirill Batuzov	22613af4a6	Add copy and constant propagation. Make tcg_constant_folding do copy and constant propagation. It is a preparational work before actual constant folding. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:27 +00:00
Kirill Batuzov	8f2e8c07a6	Add TCG optimizations stub Added file tcg/optimize.c to hold TCG optimizations. Function tcg_optimize is called from tcg_gen_code_common. It calls other functions performing specific optimizations. Stub for constant folding was added. Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2011-07-30 10:51:25 +00:00

1 2 3 4

195 Commits