Commit Graph

825 Commits

Author SHA1 Message Date
Richard Henderson
a22971f99f tcg-s390: Fix movi
The code to load the high 64 bits assumed that the insn used to
load the low 64 bits zero-extended.  Enforce that.
2013-04-05 13:35:39 -05:00
Aurelien Jarno
174d4d215f tcg/mips: Implement muls2_i32
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-01 18:49:17 +02:00
Richard Henderson
2d497542e1 tcg-optimize: Fold sub r,0,x to neg r,x
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-23 14:31:03 +00:00
陳韋任 (Wei-Ren Chen)
294e4669a5 Use proper term in TCG README
In TCG, "target" means the host architecture for which TCG generates
the code. Using "guest" rather than "target" to make the document more
consistent.

Signed-off-by: Chen Wei-Ren <chenwj@iis.sinica.edu.tw>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22 15:55:03 +01:00
Peter Maydell
378df4b237 Handle CPU interrupts by inline checking of a flag
Fix some of the nasty TCG race conditions and crashes by implementing
cpu_exit() as setting a flag which is checked at the start of each TB.
This avoids crashes if a thread or signal handler calls cpu_exit()
while the execution thread is itself modifying the TB graph (which
may happen in system emulation mode as well as in linux-user mode
with a multithreaded guest binary).

This fixes the crashes seen in LP:668799; however there are another
class of crashes described in LP:1098729 which stem from the fact
that in linux-user with a multithreaded guest all threads will
use and modify the same global TCG date structures (including the
generated code buffer) without any kind of locking. This means that
multithreaded guest binaries are still in the "unsupported"
category.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 14:28:47 +00:00
Peter Maydell
0980011b4f tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses
Document tcg_qemu_tb_exec(). In particular, its return value is a
combination of a pointer to the next translation block and some
extra information in the low two bits. Provide some #defines for
the values passed in these bits to improve code clarity.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 14:28:19 +00:00
Blue Swirl
07ca08bac8 tcg-sparc: fix build
Fix build breakage by 803d805bce:
make tcg_out_addsub2() always available.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 08:25:50 +00:00
Peter Maydell
989b697ddd qemu-log: default to stderr for logging output
Switch the default for qemu_log logging output from "/tmp/qemu.log"
to stderr. This is an incompatible change in some sense, but logging
is mostly used for debugging purposes so it shouldn't affect production
use. The previous behaviour can be obtained by adding "-D /tmp/qemu.log"
to the command line.

This change requires us to:
 * update all the documentation/help text (we take the opportunity
   to smooth out minor inconsistencies between the phrasing in
   linux-user/bsd-user/system help messages)
 * make linux-user and bsd-user defer to qemu-log for the default
   logging destination rather than overriding it themselves
 * ensure that all logfile closing is done via qemu_log_close()
   and that that function doesn't close stderr
as well as the obvious change to the behaviour of do_qemu_set_log()
when no logfile name has been specified.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1361901160-28729-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-26 13:31:47 -06:00
Richard Henderson
f1fae40c61 tcg: Apply life analysis to 64-bit multiword arithmetic ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson
f402f38f43 tcg: Implement muls2 with mulu2
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson
d693e14733 tcg-arm: Implement muls2_i32
We even had the encoding of smull already handy...

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson
624988a53b tcg-i386: Implement multiword arithmetic ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson
f6953a7399 tcg: Implement multiword addition helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
696a8be6a0 tcg: Implement multiword multiply helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
3c51a98507 tcg: Implement a 64-bit to 32-bit extraction helper
We're going to have use for this shortly in implementing other helpers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
4d3203fd0b tcg: Add signed multiword multiplication operations
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
d7156f7ce4 tcg: Add 64-bit multiword arithmetic operations
Matching the 32-bit multiword arithmetic that we already have.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
803d805bce tcg-sparc: Always implement 32-bit multiword ops
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
bbc863bfec tcg-i386: Always implement 32-bit multiword ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson
e6a7273454 tcg: Make 32-bit multiword operations optional for 64-bit hosts
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Andreas Färber
be96bd3fbf tcg/ppc: Fix build of tcg_qemu_tb_exec()
Commit 0b0d3320db (TCG: Final globals
clean-up) moved code_gen_prologue but forgot to update ppc code.
This broke the build on 32-bit ppc. ppc64 is unaffected.

Cc: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-17 14:27:36 +00:00
Peter Maydell
24537a0191 qemu-log: Rename the public-facing cpu_set_log function to qemu_set_log
Rename the public-facing function cpu_set_log to qemu_set_log. This
requires us to rename the internal-only qemu_set_log() to
do_qemu_set_log().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:44:44 +00:00
Evgeny Voevodin
5e5f07e08f TCG: Move translation block variables to new context inside tcg_ctx: tb_ctx
It's worth to clean-up translation blocks variables and move them
into one context as was suggested by Swirl.
Also if we use this context directly inside tcg_ctx, then it
speeds up code generation a bit.

Signed-off-by: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:41:16 +00:00
Evgeny Voevodin
0b0d3320db TCG: Final globals clean-up
Signed-off-by: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:40:56 +00:00
Peter Maydell
5256a7208a tcg/target-arm: Add missing parens to assertions
Silence a (legitimate) complaint about missing parentheses:

tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_ld’:
tcg/arm/tcg-target.c:1148:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]
tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_st’:
tcg/arm/tcg-target.c:1357:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]

which meant that we would mistakenly always assert if running
a QEMU built with debug enabled on ARM.

Signed-off-by: Peter Maydell <peter.maydelL@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:27:45 +00:00
Paolo Bonzini
633f650254 optimize: optimize using nonzero bits
This adds two optimizations using the non-zero bit mask.  In some cases
involving shifts or ANDs the value can become zero, and can thus be
optimized to a move of zero.  Second, useless zero-extension or an
AND with constant can be detected that would only zero bits that are
already zero.

The main advantage of this optimization is that it turns zero-extensions
into moves, thus enabling much better copy propagation (around 1% code
reduction).  Here is for example a "test $0xff0000,%ecx + je" before
optimization:

 mov_i64 tmp0,rcx
 movi_i64 tmp1,$0xff0000
 discard cc_src
 and_i64 cc_dst,tmp0,tmp1
 movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0

and after (without patch on the left, with on the right):

 movi_i64 tmp1,$0xff0000                 movi_i64 tmp1,$0xff0000
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rcx,tmp1                 and_i64 cc_dst,rcx,tmp1
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit
(after optimization, without patch on the left, with on the right):

 discard cc_src                          discard cc_src
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,ne,$0x0           brcond_i64 rax,tmp12,ne,$0x0

"test $0x1, %dl + je":

 movi_i64 tmp1,$0x1                      movi_i64 tmp1,$0x1
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rdx,tmp1                 and_i64 cc_dst,rdx,tmp1
 movi_i32 cc_op,$0x1a                    movi_i32 cc_op,$0x1a
 ext8u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

In some cases TCG even outsmarts GCC. :)  Here the input code has
"and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer,
thanks to copy propagation, does the following:

 movi_i64 tmp12,$0x2                     movi_i64 tmp12,$0x2
 and_i64 rax,rax,tmp12                   and_i64 rax,rax,tmp12
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 ext32s_i64 tmp0,rax                  -> nop
 mov_i64 rbx,tmp0                     -> mov_i64 rbx,cc_dst
 and_i64 cc_dst,rbx,rbx               -> nop

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:16 +00:00
Paolo Bonzini
3a9d8b179b optimize: track nonzero bits of registers
Add a "mask" field to the tcg_temp_info struct.  A bit that is zero
in "mask" will always be zero in the corresponding temporary.
Zero bits in the mask can be produced from moves of immediates,
zero-extensions, ANDs with constants, shifts; they can then be
be propagated by logical operations, shifts, sign-extensions,
negations, deposit operations, and conditional moves.  Other
operations will just reset the mask to all-ones, i.e. unknown.

[rth: s/target_ulong/tcg_target_ulong/]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:14 +00:00
Paolo Bonzini
d193a14a2c optimize: only write to state when clearing optimizer data
The next patch will add to the TCG optimizer a field that should be
non-zero in the default case.  Thus, replace the memset of the
temps array with a loop.  Only the state field has to be up-to-date,
because others are not used except if the state is TCG_TEMP_COPY
or TCG_TEMP_CONST.

[rth: Extracted the loop to a function.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:13 +00:00
Paolo Bonzini
163fa4b09d tcg-i386: use LEA for 3-operand 64-bit addition
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-12 12:45:56 +00:00
Stefan Weil
9a8a5ae69d tcg: Remove unneeded assertion
Commit 7f6f0ae5b9 added two assertions.

One of these assertions is not needed:
The pointer ts is never NULL because it is initialized with the
address of an array element.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-01-02 11:23:21 -06:00
Richard Henderson
753d99d38b tcg-hppa: Fix typo in brcond2
Reported-by: Stuart Brady <sdb@zubnet.me.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:21:53 +00:00
Richard Henderson
76a347e1cd tcg-i386: Perform cmov detection at runtime for 32-bit.
Existing compile-time detection is spotty at best.  Convert
it all to runtime detection instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:21:16 +00:00
Richard Henderson
afcb92beac tcg: Add TCGV_IS_UNUSED_*
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:14:07 +00:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
022c62cbbc exec: move include files to include/exec/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
cb9c377f54 janitor: add guards to headers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Anthony Liguori
7c12fd9b29 Merge remote-tracking branch 'stefanha/trivial-patches' into staging
* stefanha/trivial-patches:
  pc_sysfw: Plug memory leak on pc_fw_add_pflash_drv() error path
  qemu-options: Fix space at EOL
  Fix spelling in comments and documentation
  Clean up pci_drive_hot_add()'s use of BlockInterfaceType
  arm: a9mpcore: remove un-used ptimer_iomem field
  target-sparc: Remove t0, t1 from CPUSPARCState
  target-m68k: Remove t1 from CPUM68KState
  target-alpha: Remove t0, t1 from CPUAlphaState
  s390x: Spelling fixes (endianess -> endianness, occured -> occurred)
  Fix comments (adress -> address, layed -> laid, wierd -> weird)
  Fix spelling (prefered -> preferred)
  configure: Remove stray debug output
  sd: Send debug printfery to stderr not stdout

Conflicts:
	configure

Resolve spelling conflict in configure.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-12-10 08:34:29 -06:00
Evgeny Voevodin
c3a43607d9 tcg/tcg.h: Duplicate global TCG gen_opc_ arrays into TCGContext.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-08 14:24:41 +00:00
Stefan Weil
a93cf9dfba Fix comments (adress -> address, layed -> laid, wierd -> weird)
Remove also a duplicated 'the'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-07 12:34:11 +01:00
Aurelien Jarno
e5138db510 tcg: mark local temps as MEM in dead_temp()
In dead_temp, local temps should always be marked as back to memory,
even if they have not been allocated (i.e. they are discared before
cross a basic block).

It fixes the following assertion in target-xtensa:

    qemu-system-xtensa: tcg/tcg.c:1665: temp_save: Assertion `s->temps[temp].val_type == 2 || s->temps[temp].fixed_reg' failed.
    Aborted

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:24:13 +01:00
Aurelien Jarno
7aab08aa78 tcg/arm: fix cross-endian qemu_st16
The bswap16 TCG opcode assumes that the high bytes of the temp equal
to 0 before calling it. The ARM backend implementation takes this
assumption to slightly optimize the generated code.

The same implementation is called for implementing the cross-endian
qemu_st16 opcode, where this assumption is not true anymore. One way to
fix that would be to zero the high bytes before calling it. Given the
store instruction just ignore them, it is possible to provide a slightly
more optimized version. With ARMv6+ the rev16 instruction does the work
correctly. For lower ARM versions the patch provides a version which
behaves correctly with non-zero high bytes, but fill them with junk.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:19:53 +01:00
Aurelien Jarno
d17bd1d8cc tcg/arm: fix TLB access in qemu-ld/st ops
The TCG arm backend considers likely that the offset to the TLB
entries does not exceed 12 bits for mem_index = 0. In practice this is
not true for at least the MIPS target.

The current patch fixes that by loading the bits 23-12 with a separate
instruction, and using loads with address writeback, independently of
the value of mem_idx. In total this allow a 24-bit offset, which is a
lot more than needed.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:19:53 +01:00
malc
ecf51c9abe tcg/ppc: Fix !softmmu case
Signed-off-by: malc <av1474@comtv.ru>
2012-11-21 10:56:22 +04:00
malc
ecdffbccd7 tcg/ppc: Remove unused s_bits variable
Thanks to Alexander Graf for heads up.

Signed-off-by: malc <av1474@comtv.ru>
2012-11-19 22:22:24 +04:00
Stefan Weil
e24dc9feb0 tci: Support deposit operations
The operations for INDEX_op_deposit_i32 and INDEX_op_deposit_i64
are now supported and enabled by default.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-18 20:40:08 +00:00
Evgeny Voevodin
83eeb39669 TCG: Remove unused global variables
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:38 +00:00
Evgeny Voevodin
1ff0a2c594 TCG: Use gen_opparam_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:37 +00:00
Evgeny Voevodin
92414b31e7 TCG: Use gen_opc_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:36 +00:00
Evgeny Voevodin
c4afe5c4d3 TCG: Use gen_opparam_ptr from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:34 +00:00
Evgeny Voevodin
efd7f48600 TCG: Use gen_opc_ptr from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:27 +00:00
Evgeny Voevodin
8232a46a16 tcg/tcg.h: Duplicate global TCG variables in TCGContext
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:26 +00:00
Kirill Batuzov
3c5645fab3 tcg: properly check that op's output needs to be synced to memory
Fix typo introduced in b3a1be87ba.

Reported-by: Ruslan Savchenko <ruslan.savchenko@gmail.com>
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-11 16:06:46 +01:00
malc
c878da3b27 tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors
mmu access looks something like:

<check tlb>
if miss goto slow_path
<fast path>
done:
...

; end of the TB
slow_path:
 <pre process>
 mr r3, r27         ; move areg0 to r3
                    ; (r3 holds the first argument for all the PPC32 ABIs)
 <call mmu_helper>
 b $+8
 .long done
 <post process>
 b done

On ppc32 <call mmu_helper> is:

(SysV and Darwin)

mmu_helper is most likely not within direct branching distance from
the call site, necessitating

a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes
b. moving GPR to CTR/LR                          ; 4 bytes
c. (finally) branching to CTR/LR                 ; 4 bytes

r3 setting              - 4 bytes
call                    - 16 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 28 bytes

(PowerOpen (AIX))
a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
b. loading 32 bit function pointer into GPR2            ; 4 bytes
c. moving GPR2 to CTR/LR                                ; 4 bytes
d. loading 32 bit small area pointer into R2            ; 4 bytes
e. (finally) branching to CTR/LR                        ; 4 bytes

r3 setting              - 4 bytes
call                    - 24 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 36 bytes

Following is done to trim the code size of slow path sections:

In tcg_target_qemu_prologue trampolines are emitted that look like this:

trampoline:
mfspr r3, LR
addi  r3, 4
mtspr LR, r3      ; fixup LR to point over embedded retaddr
mr    r3, r27
<jump mmu_helper> ; tail call of sorts

And slow path becomes:

slow_path:
 <pre process>
 <call trampoline>
 .long done
 <post process>
 b done

call                    - 4 bytes (trampoline is within code gen buffer
                                   and most likely accessible via
                                   direct branch)
embedded retaddr        - 4 bytes
         Total overhead - 8 bytes

In the end the icache pressure is decreased by 20/28 bytes at the cost
of an extra jump to trampoline and adjusting LR (to skip over embedded
retaddr) once inside.

Signed-off-by: malc <av1474@comtv.ru>
2012-11-06 04:37:57 +04:00
malc
ed224a56b3 tcg/ppc: ld/st optimization
Signed-off-by: malc <av1474@comtv.ru>
2012-11-03 20:17:54 +04:00
Yeongkyoon Lee
b76f0d8c2e tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-03 09:44:21 +00:00
Aurelien Jarno
b3a1be87ba tcg: don't remove op if output needs to be synced to memory
Commit 9c43b68de6 do not correctly check
for dead outputs when they need to be synced to memory in case of
half-dead operations.

Fix that by applying the same pattern than for the default case.

Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-31 22:20:45 +01:00
Aurelien Jarno
3585317f6f tcg/mips: use MUL instead of MULT on MIPS32 and above
MIPS32 and later instruction sets have a multiplication instruction
directly operating on GPRs. It only produces a 32-bit result but
it is exactly what is needed by QEMU.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-30 00:34:48 +01:00
Richard Henderson
44b37ace06 tcg-i386: Use %gs prefixes for x86_64 GUEST_BASE
When we allocate a reserved_va for the guest, the kernel will likely
choose an address well above 4G.  At which point we must use a pair
of movabsq+addq to form the host address.  If we have OS support,
set up a segment register to point to guest_base instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:25 +01:00
Aurelien Jarno
b393ab4228 tcg: remove compatiblity call flags
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:24 +01:00
Aurelien Jarno
7850527966 tcg: rework TCG helper flags
The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be
confusing and doesn't provide enough granularity for some helpers (FP
helpers for example).

This patch changes them into the following helpers flags:
- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
  either directly or via an exception. They will not be saved to their
  canonical location before calling the helper.
- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any
  globals. They will only be saved to their canonical locations before
  calling helpers, but they won't be reloaded afterwise.
- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is
  removed if the return value is not used.

It provides convenience flags, to avoid helper definitions longer than
80 characters. It also provides compatibility flags, and updates the
documentation.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:23 +01:00
Aurelien Jarno
3d5c5f876d tcg: synchronize globals for ops with side effects
Operations with side effects (in practice qemu_ld/st ops), only need to
synchronize globals to make sure the CPU state is consistent in case of
exception.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
b202d41ee7 tcg: forbid ld/st function to modify globals
Mapping a memory address using a global and accessing it through
ld/st operations is currently broken. As it doesn't make any sense
to do that performance wise, let's forbid that.

Update the TCG documentation, and remove partial support for that.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
344028ba0f tcg: fix some op flags
Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other
not. In practice they don't need to, as they are all marked with
TCG_OPF_BB_END, which is handled specifically in all the code.

The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true
as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for
specifying that. On the other hand it always clobber arguments, so mark
it as such even if the call op is handled in a different code path.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
2c0366f036 tcg: don't explicitly save globals and temps
The liveness analysis ensures that globals and temps are at the correct
state at a basic block end or with an op with side effects. Avoid
looping on all temps, this can be time consuming on targets with a lot
of globals. Keep an assert in debug mode.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
7dfd8c6aa1 tcg: start with local temps in TEMP_VAL_MEM state
Start with local temps in TEMP_VAL_MEM state, to make possible a later
check that all the temps are correctly saved back to memory.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
a52ad07e7c tcg: always mark dead input arguments as dead
Always mark dead input arguments as dead, even if the op is at the basic
block end. This will allow to check that all temps are correctly saved.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
c29c1d7edf tcg: rewrite tcg_reg_alloc_mov()
Now that the liveness analysis provides more information, rewrite
tcg_reg_alloc_mov(). This changes the behaviour about propagating
constants and memory accesses. We now take the assumption that once
a value is loaded into a register (from memory or from a constant),
it's better to keep it there than to reload it later. This assumption
is now always almost correct given that we are now sure the
corresponding temp is going to be used later (otherwise it would have
been synchronized and marked as dead already). The assumption is wrong
if one of the op after clobbers some registers including the one
of the holding the temp (this can be avoided by allocating clobbered
registers last, which is what most TCG target do), or in case of lack
of available register.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno
4c4e1ab26b tcg: improve tcg_reg_alloc_movi()
Now that the liveness analysis might mark some output temps as dead, call
temp_dead() if needed.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
9c43b68de6 tcg: rework liveness analysis
Rework the liveness analysis by tracking temps that need to go back to
memory in addition to dead temps tracking. This allows to mark output
arguments as "need sync", and to synchronize them back to memory as soon
as they are not written anymore. This way even arguments mapping to
globals can be marked as "dead", avoiding moves to a new register when
input and outputs are aliased.

In addition it means that registers are freed as soon as temps are not
used anymore, instead of waiting for a basic block end or an op with side
effects. This reduces register spilling especially on CPUs with few
registers, and spread the mov over all the TB, increasing the
performances on in-order CPUs.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
ec7a869d31 tcg: sync output arguments on liveness request
Synchronize an output argument when requested by the liveness analysis.
This is needed so that the temp can be declared dead later.

For that, add a new op_sync_args table in which each bit tells if the
corresponding output argument needs to be synchronized with the memory.
Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
synchronize the argument before marking it as dead, and we have to make
sure all the infos about the temp are correctly filled.

At the same time change some types from unsigned int to uint16_t when
passing op_dead_args.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
1ad80729be tcg: add temp_sync()
Add a new function temp_sync() to synchronize the canonical location
of a temp with the value in the corresponding register, but without
freeing the associated register. Rewrite temp_save() to call
temp_sync() followed by temp_dead().

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
7f6ceedf9c tcg: add tcg_reg_sync()
Add a new function tcg_reg_sync() to synchronize the canonical location
of a temp with the value in the associated register, but without freeing
it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
the register.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
639368dd68 tcg: add temp_dead()
A lot of code is duplicated to mark a temporary as dead. Replace it
by temp_dead(), which in addition marks the temp as saved in memory
for globals and local temps, instead of doing this a posteriori in
temp_save().

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno
17b914912d tcg/i386: remove ld/st third argument register constraint
On x86_64, remove the constraint on the third argument register which
is not needed:
 - For loads the helper arguments are env, addr, mem_idx. The addr
   value should not be in the two first argument registers as they are
   used in tcg_out_tlb_load().
 - For stores the helper arguments are env, addr, data, mem_idx.
   The addr and data values should not be in the two first argument
   registers as they are used in tcg_out_tlb_load(). The data value
   should also not be in the two first argument registers, but could
   be in the third argument register in which case it would be already
   loaded at the right location.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:15 +01:00
Aurelien Jarno
166792f7bb tcg/i386: remove suboptimal register shifting
Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get
an optimal code for the load/store functions.

First swap the two registers used in tcg_out_tlb_load() so that the
address end-up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:05 +01:00
Richard Henderson
4438c8a946 exec: Allocate code_gen_prologue from code_gen_buffer
We had a hack for arm and sparc, allocating code_gen_prologue to a
special section.  Which, honestly does no good under certain cases.
We've already got limits on code_gen_buffer_size to ensure that all
TBs can use direct branches between themselves; reuse this limit to
ensure the prologue is also reachable.

As a bonus, we get to avoid marking a page of the main executable's
data segment as executable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-20 07:54:04 +00:00
Aurelien Jarno
41a05a4576 Merge branch 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu
* 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu:
  linux-user: register align p{read, write}64
  linux-user: ppc: mark as long long aligned
  tcg: Remove TCG_TARGET_HAS_GUEST_BASE define
  configure: Remove unnecessary host_guest_base code
  linux-user: If loading fails, print error as string, not number
  linux-user: Fix siginfo handling
  alpha-linux-user: Fix sigaltstack structure definition
  linux-user: Implement gethostname
  linux-user: Perform more checks on iovec lists
  linux-user: fix multi-threaded /proc/self/maps
  linux-user: fix statfs
2012-10-19 20:28:22 +02:00
Richard Henderson
1414968a6a tcg: Optimize mulu2
Like add2, do operand ordering, constant folding, and dead operand
elimination.  The latter happens about 15% of all mulu2 during an
x86_64 bios boot.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:39 +02:00
Richard Henderson
1305c451e6 tcg: Optimize half-dead add2/sub2
When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit
add is dead.  When the host is 32-bit, we can simplify to 32-bit
arithmetic.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:37 +02:00
Richard Henderson
212c328d61 tcg: Constant fold add2 and sub2
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:37 +02:00
Richard Henderson
6c4382f8f4 tcg: Do constant folding on double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:35 +02:00
Richard Henderson
9519da7e39 tcg: Split out subroutines from do_constant_folding_cond
We can re-use these for implementing double-word folding.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:32 +02:00
Richard Henderson
bc1473eff4 tcg: Optimize double-word comparisons against zero
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:29 +02:00
Richard Henderson
6e14e91b66 tcg: Use common code when failing to optimize
This saves a whole lot of repetitive code sequences.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:01 +02:00
Richard Henderson
0bfcb86538 tcg: Swap commutative double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:57 +02:00
Richard Henderson
1e484e61e2 tcg: Canonicalize add2 operand ordering
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:53 +02:00
Richard Henderson
24c9ae4eba tcg: Split out swap_commutative as a subroutine
Reduces code duplication and prefers

  movcond d, c1, c2, const, s
to
  movcond d, c1, c2, s, const

It also prefers

  add r, r, c
over
  add r, c, r

when both inputs are known constants.  This doesn't matter for true add, as
we will fully constant fold that.  But it matters for a follow-on patch using
this routine for add2 which may not be fully foldable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:30:40 +02:00
Richard Henderson
c7d4475a70 tcg-ia64: Implement deposit
Note that in the general reg=reg,reg case we're restricted
to 16-bit insertions.  This makes it easy to allow "any"
constant as input, as post-truncation it will fit into the
constant load insn for which we have room in the bundle.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno
63975ea7df tcg/ia64: slightly optimize TLB access code
It is possible to slightly optimize the TLB access code, by replacing
the movi + and instructions by a deposit instruction.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno
2174d1e1ff tcg/ia64: remove suboptimal register shifting in qemu_ld/st ops
Remove suboptimal register shifting in qemu_ld/st ops, introduced at the
CONFIG_TCG_PASS_AREG0 time.

As mem_idx is now loaded in register R58/R59 for the slow path, we have
to make sure to do it last, to not add additional register constraints.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno
b90cf71692 tcg/ia64: implement movcond_i32/64
Implement movcond_i32/64 on ia64 hosts. It is not possible to have
immediate compare arguments without adding a new bundle, but it is
possible to have 22-bit immediate value arguments.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:42 +02:00
Blue Swirl
da897bf5ae tcg/ia64: use stack for TCG temps
Use stack instead of temp_buf array in CPUState for TCG temps.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:42 +02:00
Peter Maydell
4a1d241e3c tcg/arm: Implement movcond_i32
Implement movcond_i32 for ARM, as the sequence
  mov dst, v2   (implicitly done by the tcg common code)
  cmp c1, c2
  movCC dst, v1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:22:49 +02:00
Peter Maydell
7fc645bf7a tcg/arm: Factor out code to emit immediate or reg-reg op
The code to emit either an immediate cmp or a register cmp insn is
duplicated in several places; factor it out into its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:22:48 +02:00
Richard Henderson
203342d8dc tcg-sparc: Emit MOVR insns for setcond_i64 and movcond_64
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00
Richard Henderson
ab1339b9b4 tcg-sparc: Emit BPr insns for brcond_i64
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00
Richard Henderson
a115f3ea47 tcg-sparc: Drop use of Bicc in favor of BPcc
Now that we're always sparcv9, we can not bother using Bicc for
32-bit branches and BPcc for 64-bit branches and instead always
use BPcc.

New interfaces allow less direct use of tcg_out32 and raw numbers
inside the qemu_ld/st routines.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00
Richard Henderson
fd84ea2391 tcg-sparc: Optimize setcond2 equality compare with 0.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00
Richard Henderson
89269f6cea tcg-sparc: Use Z constraint for %g0
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00
Richard Henderson
4ec28e255f tcg-sparc: Fix add2/sub2
We must care not to clobber the high parts before we consume them.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-13 10:39:53 +00:00