mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Aurelien Jarno	e5138db510	tcg: mark local temps as MEM in dead_temp() In dead_temp, local temps should always be marked as back to memory, even if they have not been allocated (i.e. they are discarded before crossing a basic block). It fixes the following assertion in target-xtensa: qemu-system-xtensa: tcg/tcg.c:1665: temp_save: Assertion `s->temps[temp].val_type == 2 || s->temps[temp].fixed_reg' failed. Aborted Reported-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:24:13 +01:00
Aurelien Jarno	7aab08aa78	tcg/arm: fix cross-endian qemu_st16 The bswap16 TCG opcode assumes that the high bytes of the temp equal to 0 before calling it. The ARM backend implementation takes this assumption to slightly optimize the generated code. The same implementation is called for implementing the cross-endian qemu_st16 opcode, where this assumption is not true anymore. One way to fix that would be to zero the high bytes before calling it. Given the store instruction just ignore them, it is possible to provide a slightly more optimized version. With ARMv6+ the rev16 instruction does the work correctly. For lower ARM versions the patch provides a version which behaves correctly with non-zero high bytes, but fill them with junk. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:19:53 +01:00
Aurelien Jarno	d17bd1d8cc	tcg/arm: fix TLB access in qemu-ld/st ops The TCG arm backend considers it likely that the offset to the TLB entries does not exceed 12 bits for mem_index = 0. In practice this is not true for at least the MIPS target. The current patch fixes that by loading the bits 23-12 with a separate instruction, and using loads with address writeback, independently of the value of mem_idx. In total this allows a 24-bit offset, which is a lot more than needed. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-stable@nongnu.org Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:19:53 +01:00
malc	ecf51c9abe	tcg/ppc: Fix !softmmu case Signed-off-by: malc <av1474@comtv.ru>	2012-11-21 10:56:22 +04:00
malc	ecdffbccd7	tcg/ppc: Remove unused s_bits variable Thanks to Alexander Graf for heads up. Signed-off-by: malc <av1474@comtv.ru>	2012-11-19 22:22:24 +04:00
Stefan Weil	e24dc9feb0	tci: Support deposit operations The operations for INDEX_op_deposit_i32 and INDEX_op_deposit_i64 are now supported and enabled by default. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-18 20:40:08 +00:00
Evgeny Voevodin	83eeb39669	TCG: Remove unused global variables Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:38 +00:00
Evgeny Voevodin	1ff0a2c594	TCG: Use gen_opparam_buf from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:37 +00:00
Evgeny Voevodin	92414b31e7	TCG: Use gen_opc_buf from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:36 +00:00
Evgeny Voevodin	c4afe5c4d3	TCG: Use gen_opparam_ptr from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:34 +00:00
Evgeny Voevodin	efd7f48600	TCG: Use gen_opc_ptr from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:27 +00:00
Evgeny Voevodin	8232a46a16	tcg/tcg.h: Duplicate global TCG variables in TCGContext Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:26 +00:00
Kirill Batuzov	3c5645fab3	tcg: properly check that op's output needs to be synced to memory Fix typo introduced in `b3a1be87ba`. Reported-by: Ruslan Savchenko <ruslan.savchenko@gmail.com> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-11 16:06:46 +01:00
malc	c878da3b27	tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors mmu access looks something like: <check tlb> if miss goto slow_path <fast path> done: ... ; end of the TB slow_path: <pre process> mr r3, r27 ; move areg0 to r3 ; (r3 holds the first argument for all the PPC32 ABIs) <call mmu_helper> b $+8 .long done <post process> b done On ppc32 <call mmu_helper> is: (SysV and Darwin) mmu_helper is most likely not within direct branching distance from the call site, necessitating a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes b. moving GPR to CTR/LR ; 4 bytes c. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 16 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 28 bytes (PowerOpen (AIX)) a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes b. loading 32 bit function pointer into GPR2 ; 4 bytes c. moving GPR2 to CTR/LR ; 4 bytes d. loading 32 bit small area pointer into R2 ; 4 bytes e. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 24 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 36 bytes Following is done to trim the code size of slow path sections: In tcg_target_qemu_prologue trampolines are emitted that look like this: trampoline: mfspr r3, LR addi r3, 4 mtspr LR, r3 ; fixup LR to point over embedded retaddr mr r3, r27 <jump mmu_helper> ; tail call of sorts And slow path becomes: slow_path: <pre process> <call trampoline> .long done <post process> b done call - 4 bytes (trampoline is within code gen buffer and most likely accessible via direct branch) embedded retaddr - 4 bytes Total overhead - 8 bytes In the end the icache pressure is decreased by 20/28 bytes at the cost of an extra jump to trampoline and adjusting LR (to skip over embedded retaddr) once inside. Signed-off-by: malc <av1474@comtv.ru>	2012-11-06 04:37:57 +04:00
malc	ed224a56b3	tcg/ppc: ld/st optimization Signed-off-by: malc <av1474@comtv.ru>	2012-11-03 20:17:54 +04:00
Yeongkyoon Lee	b76f0d8c2e	tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Add optimized TCG qemu_ld/st generation which locates the code of TLB miss cases at the end of a block after generating the other IRs. Currently, this optimization supports only i386 and x86_64 hosts. Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-03 09:44:21 +00:00
Aurelien Jarno	b3a1be87ba	tcg: don't remove op if output needs to be synced to memory Commit `9c43b68de6` does not correctly check for dead outputs when they need to be synced to memory in case of half-dead operations. Fix that by applying the same pattern as for the default case. Tested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-31 22:20:45 +01:00
Aurelien Jarno	3585317f6f	tcg/mips: use MUL instead of MULT on MIPS32 and above MIPS32 and later instruction sets have a multiplication instruction directly operating on GPRs. It only produces a 32-bit result but it is exactly what is needed by QEMU. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-30 00:34:48 +01:00
Richard Henderson	44b37ace06	tcg-i386: Use %gs prefixes for x86_64 GUEST_BASE When we allocate a reserved_va for the guest, the kernel will likely choose an address well above 4G. At which point we must use a pair of movabsq+addq to form the host address. If we have OS support, set up a segment register to point to guest_base instead. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:25 +01:00
Aurelien Jarno	b393ab4228	tcg: remove compatibility call flags Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:24 +01:00
Aurelien Jarno	7850527966	tcg: rework TCG helper flags The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE, might be confusing and don't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helper flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwards. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:23 +01:00
Aurelien Jarno	3d5c5f876d	tcg: synchronize globals for ops with side effects Operations with side effects (in practice qemu_ld/st ops), only need to synchronize globals to make sure the CPU state is consistent in case of exception. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	b202d41ee7	tcg: forbid ld/st function to modify globals Mapping a memory address using a global and accessing it through ld/st operations is currently broken. As it doesn't make any sense to do that performance wise, let's forbid that. Update the TCG documentation, and remove partial support for that. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	344028ba0f	tcg: fix some op flags Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some others are not. In practice they don't need to be, as they are all marked with TCG_OPF_BB_END, which is handled specifically in all the code. The call op is marked as TCG_OPF_SIDE_EFFECTS, which might not be true as there are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for specifying that. On the other hand it always clobbers arguments, so mark it as such even if the call op is handled in a different code path. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	2c0366f036	tcg: don't explicitly save globals and temps The liveness analysis ensures that globals and temps are at the correct state at a basic block end or with an op with side effects. Avoid looping on all temps, this can be time consuming on targets with a lot of globals. Keep an assert in debug mode. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	7dfd8c6aa1	tcg: start with local temps in TEMP_VAL_MEM state Start with local temps in TEMP_VAL_MEM state, to make possible a later check that all the temps are correctly saved back to memory. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	a52ad07e7c	tcg: always mark dead input arguments as dead Always mark dead input arguments as dead, even if the op is at the basic block end. This will allow to check that all temps are correctly saved. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	c29c1d7edf	tcg: rewrite tcg_reg_alloc_mov() Now that the liveness analysis provides more information, rewrite tcg_reg_alloc_mov(). This changes the behaviour about propagating constants and memory accesses. We now take the assumption that once a value is loaded into a register (from memory or from a constant), it's better to keep it there than to reload it later. This assumption is now always almost correct given that we are now sure the corresponding temp is going to be used later (otherwise it would have been synchronized and marked as dead already). The assumption is wrong if one of the op after clobbers some registers including the one of the holding the temp (this can be avoided by allocating clobbered registers last, which is what most TCG target do), or in case of lack of available register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	4c4e1ab26b	tcg: improve tcg_reg_alloc_movi() Now that the liveness analysis might mark some output temps as dead, call temp_dead() if needed. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	9c43b68de6	tcg: rework liveness analysis Rework the liveness analysis by tracking temps that need to go back to memory in addition to dead temps tracking. This allows to mark output arguments as "need sync", and to synchronize them back to memory as soon as they are not written anymore. This way even arguments mapping to globals can be marked as "dead", avoiding moves to a new register when input and outputs are aliased. In addition it means that registers are freed as soon as temps are not used anymore, instead of waiting for a basic block end or an op with side effects. This reduces register spilling especially on CPUs with few registers, and spread the mov over all the TB, increasing the performances on in-order CPUs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	ec7a869d31	tcg: sync output arguments on liveness request Synchronize an output argument when requested by the liveness analysis. This is needed so that the temp can be declared dead later. For that, add a new op_sync_args table in which each bit tells if the corresponding output argument needs to be synchronized with the memory. Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to synchronize the argument before marking it as dead, and we have to make sure all the infos about the temp are correctly filled. At the same time change some types from unsigned int to uint16_t when passing op_dead_args. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	1ad80729be	tcg: add temp_sync() Add a new function temp_sync() to synchronize the canonical location of a temp with the value in the corresponding register, but without freeing the associated register. Rewrite temp_save() to call temp_sync() followed by temp_dead(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	7f6ceedf9c	tcg: add tcg_reg_sync() Add a new function tcg_reg_sync() to synchronize the canonical location of a temp with the value in the associated register, but without freeing it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free the register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	639368dd68	tcg: add temp_dead() A lot of code is duplicated to mark a temporary as dead. Replace it by temp_dead(), which in addition marks the temp as saved in memory for globals and local temps, instead of doing this a posteriori in temp_save(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	17b914912d	tcg/i386: remove ld/st third argument register constraint On x86_64, remove the constraint on the third argument register which is not needed: - For loads the helper arguments are env, addr, mem_idx. The addr value should not be in the two first argument registers as they are used in tcg_out_tlb_load(). - For stores the helper arguments are env, addr, data, mem_idx. The addr and data values should not be in the two first argument registers as they are used in tcg_out_tlb_load(). The data value should also not be in the two first argument registers, but could be in the third argument register in which case it would be already loaded at the right location. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:15 +01:00
Aurelien Jarno	166792f7bb	tcg/i386: remove suboptimal register shifting Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get an optimal code for the load/store functions. First swap the two registers used in tcg_out_tlb_load() so that the address end-up in the second register instead of the first one. Adjust tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct registers. Then replace the register shifting by direct load of the arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:05 +01:00
Richard Henderson	4438c8a946	exec: Allocate code_gen_prologue from code_gen_buffer We had a hack for arm and sparc, allocating code_gen_prologue to a special section. Which, honestly does no good under certain cases. We've already got limits on code_gen_buffer_size to ensure that all TBs can use direct branches between themselves; reuse this limit to ensure the prologue is also reachable. As a bonus, we get to avoid marking a page of the main executable's data segment as executable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-20 07:54:04 +00:00
Aurelien Jarno	41a05a4576	Merge branch 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu * 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu: linux-user: register align p{read, write}64 linux-user: ppc: mark as long long aligned tcg: Remove TCG_TARGET_HAS_GUEST_BASE define configure: Remove unnecessary host_guest_base code linux-user: If loading fails, print error as string, not number linux-user: Fix siginfo handling alpha-linux-user: Fix sigaltstack structure definition linux-user: Implement gethostname linux-user: Perform more checks on iovec lists linux-user: fix multi-threaded /proc/self/maps linux-user: fix statfs	2012-10-19 20:28:22 +02:00
Richard Henderson	1414968a6a	tcg: Optimize mulu2 Like add2, do operand ordering, constant folding, and dead operand elimination. The latter happens about 15% of all mulu2 during an x86_64 bios boot. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:39 +02:00
Richard Henderson	1305c451e6	tcg: Optimize half-dead add2/sub2 When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit add is dead. When the host is 32-bit, we can simplify to 32-bit arithmetic. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:37 +02:00
Richard Henderson	212c328d61	tcg: Constant fold add2 and sub2 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:37 +02:00
Richard Henderson	6c4382f8f4	tcg: Do constant folding on double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:35 +02:00
Richard Henderson	9519da7e39	tcg: Split out subroutines from do_constant_folding_cond We can re-use these for implementing double-word folding. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:32 +02:00
Richard Henderson	bc1473eff4	tcg: Optimize double-word comparisons against zero Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:29 +02:00
Richard Henderson	6e14e91b66	tcg: Use common code when failing to optimize This saves a whole lot of repetitive code sequences. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:01 +02:00
Richard Henderson	0bfcb86538	tcg: Swap commutative double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:57 +02:00
Richard Henderson	1e484e61e2	tcg: Canonicalize add2 operand ordering Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:53 +02:00
Richard Henderson	24c9ae4eba	tcg: Split out swap_commutative as a subroutine Reduces code duplication and prefers movcond d, c1, c2, const, s to movcond d, c1, c2, s, const It also prefers add r, r, c over add r, c, r when both inputs are known constants. This doesn't matter for true add, as we will fully constant fold that. But it matters for a follow-on patch using this routine for add2 which may not be fully foldable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:30:40 +02:00
Richard Henderson	c7d4475a70	tcg-ia64: Implement deposit Note that in the general reg=reg,reg case we're restricted to 16-bit insertions. This makes it easy to allow "any" constant as input, as post-truncation it will fit into the constant load insn for which we have room in the bundle. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:43 +02:00
Aurelien Jarno	63975ea7df	tcg/ia64: slightly optimize TLB access code It is possible to slightly optimize the TLB access code, by replacing the movi + and instructions by a deposit instruction. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:43 +02:00
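
The high-byte assumption described in commit 7aab08aa78 (tcg/arm: fix cross-endian qemu_st16) can be illustrated with a small model. This is only a sketch in Python of the arithmetic involved, not the ARM instruction sequence the backend emits; the function names and the 32-bit mask are assumptions made for illustration. It shows a byte swap that is correct only when bits 31..16 of the input are zero, and a store-oriented variant that yields the right low 16 bits for any input while leaving junk in the high bits.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit host register (assumption for this sketch)


def bswap16_assume_zero_high(x):
    """Swap the two low bytes; only correct when bits 31..16 of x are zero."""
    # With non-zero high bytes, (x >> 8) drags bits 23..16 into bits 15..8.
    return ((x >> 8) | ((x & 0xFF) << 8)) & MASK32


def bswap16_for_store(x):
    """Swap the two low bytes for any x; bits 31..16 of the result are junk."""
    # (x << 8) places the low byte in bits 15..8; the masked shift supplies bits 7..0.
    return ((x << 8) | ((x >> 8) & 0xFF)) & MASK32


def store16(value):
    """Model a 16-bit store, which ignores everything above bit 15."""
    return value & 0xFFFF


if __name__ == "__main__":
    clean = 0x00001234   # high bytes already zero
    dirty = 0xABCD1234   # high bytes hold unrelated data
    print(hex(store16(bswap16_assume_zero_high(clean))))
    print(hex(store16(bswap16_assume_zero_high(dirty))))
    print(hex(store16(bswap16_for_store(dirty))))
```

Because the 16-bit store discards the upper half of the register, the second variant does not need to clear it, which is the reasoning the commit message gives for avoiding an extra zeroing step.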


736 Commits