Commit Graph

367 Commits

Author SHA1 Message Date
Richard Henderson
3250cff8e5 target-i386: Remove gen_op_mov*_A0_im
Propagate the definitions into all users.  In two cases, this allows
us to share code between the 32-bit and 64-bit immediate moves.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
Richard Henderson
0ae657b116 target-i386: Remove gen_op_movl_T1_im*
Propagate the definitions into all users.  The only time that
gen_op_movl_T1_imu was used, the input was type 'unsigned',
so the replacement works identically.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
Richard Henderson
1b90d56e8c target-i386: Remove gen_op_movl_T0_im*
Propagate the definition of gen_op_movl_T0_im to all users.
The function gen_op_movl_T0_imu was unused.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
Richard Henderson
97212c8844 target-i386: Remove gen_op_movl_T0_0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
Richard Henderson
a7fbcbe538 target-i386: Tidy extend + move
For the known MO_32/MO_64 cases, we don't need to extend a 32-bit temp
into a 64-bit temp before storing into the hardware register.

We do need the extension for the MO_8/MO_16 cases, in order for the
deposit_tl operation to work, so leave those alone.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
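
A minimal sketch of the register write-back shape described above, modeled from memory on translate.c of this era (names and the high-byte handling are simplified, so treat it as indicative only):

    /* Write a value back to a general register.  For MO_8/MO_16 the value
     * must be merged into the existing register contents, which is why the
     * deposit needs a properly extended source; MO_32 zero-extends by
     * definition and MO_64 is a plain move, so no separate extend is
     * required for those.  */
    static void gen_op_mov_reg_v(TCGMemOp ot, int reg, TCGv t0)
    {
        switch (ot) {
        case MO_8:   /* high-byte registers (AH..BH) omitted for brevity */
            tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 8);
            break;
        case MO_16:
            tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 16);
            break;
        case MO_32:
            tcg_gen_ext32u_tl(cpu_regs[reg], t0);
            break;
    #ifdef TARGET_X86_64
        case MO_64:
            tcg_gen_mov_tl(cpu_regs[reg], t0);
            break;
    #endif
        default:
            tcg_abort();
        }
    }
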
Richard Henderson
d5601ad023 target-i386: Tidy extend + store
We can now use tcg_gen_qemu_st_i32 directly to avoid the extension.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:32 -08:00
Richard Henderson
80b0201384 target-i386: Tidy load + truncate
We can now use tcg_gen_qemu_ld_i32 directly to avoid the truncation.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
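
The two "tidy" commits above are the same pattern in opposite directions; a hedged fragment, with the operand and MMU-index names as placeholders for whatever the caller has at hand:

    /* Before: widen the 32-bit value into a target_ulong temp, then store.
     * After:  store the TCGv_i32 directly.  */
    tcg_gen_qemu_st_i32(val32, addr, mem_index, MO_LEUL);

    /* Likewise a 32-bit load goes straight into a TCGv_i32, with no
     * load-as-target_ulong followed by tcg_gen_trunc_tl_i32.  */
    tcg_gen_qemu_ld_i32(val32, addr, mem_index, MO_LEUL);
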
Richard Henderson
24b9c00fc3 target-i386: Tidy gen_op_mov_TN_reg+tcg_gen_trunc_tl_i32
For the 16 and 32-bit cases, we don't need to truncate via
a temporary register.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
3655a19fdd target-i386: Use MO_BE for movbe
Fold the bswap into the memory operation.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
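
In TCG terms, "fold the bswap into the memory operation" replaces an explicit swap around the access with a big-endian memop flag; a sketch with placeholder operands:

    /* Before: little-endian load, then an explicit byte swap. */
    tcg_gen_qemu_ld_tl(t0, a0, mem_index, MO_LEUL);
    tcg_gen_bswap32_tl(t0, t0);

    /* After: let the memory operation itself perform the swap. */
    tcg_gen_qemu_ld_tl(t0, a0, mem_index, MO_BEUL);
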
Richard Henderson
4eeb3939b5 target-i386: Remove unused arguments to gen_lea_modrm
The reg_ptr and offset_ptr outputs are universally unused.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
4b1fe0671f target-i386: Tidy movsl
Always perform a sign-extending load.  In the extremely unlikely
case that we've used an 0x66 prefix, the extension to 64-bits is
unnecessary but not wrong; the store will still examine only 16 bits.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
c8fbc47967 target-i386: Tidy mov[sz][bw]
We can use the MO_SIGN bit to tidy the reg-reg switch statement
as well as pass it on to gen_op_ld_v, eliminating one call.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
ee3138da2f target-i386: Fix typo in gen_push_T1
By inspection, obviously we should be storing T[1] not T[0].
This could only happen for x86_64 in 64-bit mode with 0x66
prefix to call insn -- i.e. never.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
b5afc10494 target-i386: Remove gen_op_st_T1_A0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
fd8ca9f6f5 target-i386: Remove gen_op_st_T0_A0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:31 -08:00
Richard Henderson
d4faa3e08a target-i386: Introduce gen_op_st_rm_T0_A0
Too many places have the same test vs OR_TMP0 to indicate
a write back to memory.  Hoist that to a subroutine.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:30 -08:00
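
A sketch of the hoisted subroutine, reproduced from memory of translate.c (OR_TMP0 is the decoder's marker for "the operand lives in memory at A0"):

    /* Write T0 back either to memory at A0 or to general register 'd'. */
    static void gen_op_st_rm_T0_A0(DisasContext *s, int idx, int d)
    {
        if (d == OR_TMP0) {
            gen_op_st_v(s, idx, cpu_T[0], cpu_A0);   /* memory write-back   */
        } else {
            gen_op_mov_reg_v(idx, d, cpu_T[0]);      /* register write-back */
        }
    }
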
Richard Henderson
dc732b76fa target-i386: Remove gen_op_lds_T0_A0
Replace its users by gen_op_ld_v with the MO_SIGN bit set.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:30 -08:00
Richard Henderson
0f712e109b target-i386: Remove gen_op_ld_T1_A0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:30 -08:00
Richard Henderson
cc1a80dfb3 target-i386: Remove gen_op_ldu_T0_A0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:30 -08:00
Richard Henderson
909be18382 target-i386: Remove gen_op_ld_T0_A0
Propagate its definition into all users.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:30 -08:00
Richard Henderson
4ba9938c89 target-i386: Replace OT_* constants with MO_* constants
The MO_8/16/32/64 constants have the same encoding and meaning as the
OT_BYTE/WORD/LONG/QUAD constants.  Since we rely on them being the same
for the qemu_ld/st helpers, standardize on the common names.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:36:16 -08:00
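
The encoding the series standardizes on, shown as a simplified sketch (the authoritative definitions live in tcg/tcg.h; the values below are quoted from memory):

    /* Size lives in the low bits and matches the old OT_* values exactly;
     * signedness and byte order are orthogonal flags on top of it.  */
    enum {
        MO_8     = 0,   /* == OT_BYTE */
        MO_16    = 1,   /* == OT_WORD */
        MO_32    = 2,   /* == OT_LONG */
        MO_64    = 3,   /* == OT_QUAD */
        MO_SIGN  = 4,   /* sign-extend the loaded value */
        MO_BSWAP = 8,   /* byte-swap relative to host order */
    };

    /* e.g. a sign-extending 16-bit load is (MO_16 | MO_SIGN), and movbe's
     * big-endian access becomes (MO_32 | MO_BSWAP) on a little-endian host. */
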
Richard Henderson
3523e4bd9b target-i386: Use new tcg_gen_qemu_st_* helpers
In preference to the older helpers.  Stores only in this patch.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:05:53 -08:00
Richard Henderson
3c5f41169b target-i386: Use new tcg_gen_qemu_ld_* helpers
In preference to the older helpers.  Loads only in this patch.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:05:49 -08:00
Richard Henderson
5c42a7cd98 target-i386: Stop encoding DisasContext.mem_index
Now that we don't combine mem_index with operand size info,
we don't need to encode it.  Which tidies many places that
access it.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:05:45 -08:00
Richard Henderson
323d18769e target-i386: Push DisasContext into load/store helpers
Rather than add s->mem_index into a combined size+mem_index
argument, pass the context down.  This will allow cleaning
up s->mem_index later.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-01-07 11:05:39 -08:00
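
Roughly the shape the load/store helpers take by the end of this series (the later commits listed above), reproduced from memory and indicative only: the operand size arrives as a bare memop, and the MMU index comes from the context rather than being folded into a combined argument.

    static inline void gen_op_ld_v(DisasContext *s, int idx, TCGv t0, TCGv a0)
    {
        tcg_gen_qemu_ld_tl(t0, a0, s->mem_index, idx | MO_LE);
    }

    static inline void gen_op_st_v(DisasContext *s, int idx, TCGv t0, TCGv a0)
    {
        tcg_gen_qemu_st_tl(t0, a0, s->mem_index, idx | MO_LE);
    }
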
Richard Henderson
7865eec4f5 target-i386: Fix addr32 prefix in gen_lea_modrm
Fix the following run-test-x86_64 testsuite failures:

-lea (%%eax) = 0000000000000001
-lea (%%ebx) = 0000000000000002
-lea (%%ecx) = 0000000000000004
-lea (%%edx) = 0000000000000008
-lea (%%esi) = 0000000000000010
-lea (%%edi) = 0000000000000020
+lea (%%eax) = 0000abcc00000001
+lea (%%ebx) = 0000abcf00000002
+lea (%%ecx) = 0000abc900000004
+lea (%%edx) = 0000abc500000008
+lea (%%esi) = 0000abdd00000010
+lea (%%edi) = 0000abed00000020

In addition, reduce ifdeffery and minimize the number of TCG ops
produced during address computation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-id: 1384219016-5170-1-git-send-email-rth@twiddle.net
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2013-11-21 08:01:16 -08:00
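
The failing lines in the test output are 0x67-prefixed (32-bit address size) LEAs executed in 64-bit mode: the computed effective address kept stale high bits instead of being truncated to 32 bits. A minimal sketch of the required truncation, with the address-size test written as a placeholder condition:

    /* after summing base + index*scale + displacement into cpu_A0 ... */
    if (addr_size_is_32_bits) {                 /* placeholder condition */
        tcg_gen_ext32u_tl(cpu_A0, cpu_A0);      /* drop stale high bits  */
    }
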
Paolo Bonzini
81f3053b77 target-i386: yield to another VCPU on PAUSE
After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.

The problem shows up as soon as FreeBSD turns on its periodic (~1 ms)
tick, but the timers are only the trigger for a pre-existing problem.

Before the offending patch, setting a timer did a timer_settime system call.

After, setting the timer exits the event loop (which uses poll) and
reenters it with a new deadline.  This does not cause any slowdown by
itself; the difference is between one system call (timer_settime) plus a
signal delivery (SIGALRM) before the patch, and two system calls afterwards
(a write to a pipe or eventfd, plus calling poll again when re-entering
the event loop).

Unfortunately, the exit/enter causes the main loop to grab the iothread
lock, which in turn kicks the VCPU thread out of execution.  This
causes TCG to execute the next VCPU in its round-robin scheduling of
VCPUs.  When the second VCPU is mostly unused, FreeBSD runs a "pause"
instruction in its idle loop, which only burns cycles without making any
progress.  As soon as the timer tick expires, the first VCPU runs
the interrupt handler, but very soon it sets the timer again---and QEMU
then goes back to doing nothing in the second VCPU.

The fix is to make the pause instruction do "cpu_loop_exit".

Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Luigi Rizzo <rizzo@iet.unipi.it>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1384948442-24217-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2013-11-21 07:55:45 -08:00
Richard Henderson
5cd8f6210f tcg: Move helper registration into tcg_context_init
No longer needs to be done on a per-target basis.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:43:37 -07:00
Peter Maydell
bff93281a7 target-i386: Only provide CMOV and friends if feature bit set
The instructions CMOVcc, FCMOVcc and F[U]COMI[P] should only be
present if the CMOV feature bit is set. Add missing feature bit
checks so we correctly fault if emulating a 486 or 586.
This fixes bug LP:1201446.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-12 11:24:48 -07:00
Richard Henderson
8cfd04959a tcg: Change tcg_gen_exit_tb argument to uintptr_t
And update all users.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Andreas Färber
ed2803da58 cpu: Move singlestep_enabled field from CPU_COMMON to CPUState
Prepares for changing cpu_single_step() argument to CPUState.

Acked-by: Michael Walle <michael@walle.cc> (for lm32)
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-23 02:41:32 +02:00
Andreas Färber
467215c20f target-i386: Change gen_intermediate_code_internal() argument to X86CPU
Also use bool type while at it.

Prepares for moving singlestep_enabled field to CPUState.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-09 21:33:03 +02:00
Richard Henderson
dec3fc9657 target-i386: Fix aflag logic for CODE64 and the 0x67 prefix
The code reorganization in commit 4a6fd938 broke handling of PREFIX_ADR.
While fixing this, tidy and comment the code so that it's more obvious
what's going on in setting both aflag and dflag.

The TARGET_X86_64 ifdef can be eliminated because CODE64 expands to the
constant zero when TARGET_X86_64 is undefined.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1369855851-21400-1-git-send-email-rth@twiddle.net
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-31 12:51:07 -05:00
Aurelien Jarno
38ebb396c9 target-i386: ROR r8/r16 imm instruction fix
Fix EFLAGS corruption by ROR r8/r16 imm instruction located at the end
of the TB, similarly to commit 089305ac for the non-immediate case.

Reported-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-10 19:59:54 +02:00
Eduardo Habkost
0514ef2fbb target-i386: Replace cpuid_*features fields with a feature word array
This replaces the feature-bit fields on both X86CPU and x86_def_t
structs with an array.

With this, we will be able to simplify code that simply does the same
operation on all feature words (e.g. kvm_check_features_against_host(),
filter_features_for_kvm(), add_flagname_to_bitmaps(), CPU feature-bit
property lookup/registration, and the proposed "feature-words" property)

The following field replacements were made on X86CPU and x86_def_t:

  (cpuid_)features         -> features[FEAT_1_EDX]
  (cpuid_)ext_features     -> features[FEAT_1_ECX]
  (cpuid_)ext2_features    -> features[FEAT_8000_0001_EDX]
  (cpuid_)ext3_features    -> features[FEAT_8000_0001_ECX]
  (cpuid_)ext4_features    -> features[FEAT_C000_0001_EDX]
  (cpuid_)kvm_features     -> features[FEAT_KVM]
  (cpuid_)svm_features     -> features[FEAT_SVM]
  (cpuid_)7_0_ebx_features -> features[FEAT_7_0_EBX]

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-05-02 00:27:55 +02:00
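
A sketch of the resulting layout, using only the words listed in the mapping above (enumerator order and the exact declarations are assumptions):

    typedef enum FeatureWord {
        FEAT_1_EDX,          /* was cpuid_features          */
        FEAT_1_ECX,          /* was cpuid_ext_features      */
        FEAT_7_0_EBX,        /* was cpuid_7_0_ebx_features  */
        FEAT_8000_0001_EDX,  /* was cpuid_ext2_features     */
        FEAT_8000_0001_ECX,  /* was cpuid_ext3_features     */
        FEAT_C000_0001_EDX,  /* was cpuid_ext4_features     */
        FEAT_KVM,            /* was cpuid_kvm_features      */
        FEAT_SVM,            /* was cpuid_svm_features      */
        FEATURE_WORDS
    } FeatureWord;

    /* one array replaces the eight named fields, so code that operates on
     * every feature word can simply loop over features[0..FEATURE_WORDS) */
    uint32_t features[FEATURE_WORDS];
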
Pavel Dovgaluk
089305ac0a i386 ROR r8/r16 instruction fix
Fixed EFLAGS corruption by ROR r8/r16 instruction located at the end of the TB.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-20 21:27:52 +02:00
Aurelien Jarno
d640045a3e target-i386: add AES-NI instructions
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-13 13:51:57 +02:00
Aurelien Jarno
e71827bc0e target-i386: add pclmulqdq instruction
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-13 13:51:56 +02:00
Aurelien Jarno
34c6addd4b target-i386: SSE4.1: fix pinsrb instruction
gen_op_mov_TN_reg() loads the value in cpu_T[0], so this temporary should
be used instead of cpu_tmp0.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-01 18:49:15 +02:00
Richard Henderson
c53de1a289 target-i386: Fix flags computation for ADOX
When starting from CC_OP_DYNAMIC, and issuing adox before adcx,
a typo used the wrong value for the resulting CC_OP.

Cc: Blue Swirl <blauwirbel@gmail.com>
Reported-by: Torbjorn Granlund <tg@gmplib.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-23 14:26:52 +00:00
Peter Maydell
085d813407 Fix typos and misspellings
Fix various typos and misspellings. The bulk of these were found with
codespell.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22 13:25:07 +01:00
Peter Maydell
806f352d3d gen-icount.h: Rename gen_icount_start/end to gen_tb_start/end
The gen_icount_start/end functions are now somewhat misnamed since they
are useful for generic "start/end of TB" code, used for more than just
icount. Rename them to gen_tb_start/end.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 14:29:08 +00:00
Richard Henderson
a4bcea3d67 target-i386: Use mulu2 and muls2
These correspond very closely to the insns that we're emulating.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-27 19:06:28 +00:00
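
The correspondence mentioned above: the double-width multiply ops produce the low and high halves of the product in one operation, matching x86's one-operand MUL/IMUL writing EDX:EAX. A hedged fragment with placeholder temporaries:

    /* unsigned: EDX:EAX = EAX * src */
    tcg_gen_mulu2_tl(lo, hi, cpu_regs[R_EAX], src);

    /* signed variant for one-operand IMUL */
    tcg_gen_muls2_tl(lo, hi, cpu_regs[R_EAX], src);
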
Richard Henderson
76f1313323 target-i386: Use add2 to implement the ADX extension
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson
f437d0a3c2 target-i386: Use movcond to implement shiftd.
With this being all straight-line code, it can get deleted
when the cc variables die.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:19 -08:00
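
The straight-line pattern this and the following movcond commits rely on, in sketch form with placeholder values: a conditional select instead of a branch, so that if the selected value turns out to be dead, the whole sequence can be eliminated.

    /* A shift count of zero must leave the destination (and flags)
     * untouched; movcond picks old vs. new without branching.  */
    TCGv zero = tcg_const_tl(0);
    tcg_gen_movcond_tl(TCG_COND_EQ, dest, count, zero, old_value, new_value);
    tcg_temp_free(zero);
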
Richard Henderson
e2f515cf2f target-i386: Discard CC_OP computation in set_cc_op also
The shift and rotate insns use movcond to set CC_OP, and thus
achieve a conditional EFLAGS setting.  By discarding CC_OP in
a later flags setting insn, we can discard that movcond.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:19 -08:00
Richard Henderson
34d80a55ff target-i386: Use movcond to implement rotate flags.
With this being all straight-line code, it can get deleted
when the cc variables die.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:19 -08:00
Richard Henderson
a41f62f592 target-i386: Use movcond to implement shift flags.
With this being all straight-line code, it can get deleted
when the cc variables die.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:19 -08:00
Richard Henderson
436ff2d227 target-i386: Add CC_OP_CLR
Special case xor with self.  We need not even store the known
zero into cc_src.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:18 -08:00
Richard Henderson
321c535105 target-i386: Implement tzcnt and fix lzcnt
We weren't computing flags for lzcnt at all.  At the same time,
adjust the implementation of bsf/bsr to avoid the local branch,
using movcond instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:18 -08:00
Richard Henderson
cd7f97cafd target-i386: Implement ADX extension
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19 23:05:18 -08:00
Richard Henderson
e2c3c2c551 target-i386: Implement RORX
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson
4a554890e4 target-i386: Implement SHLX, SARX, SHRX
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson
0592f74a75 target-i386: Implement PDEP, PEXT
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson
5f1f4b1771 target-i386: Implement MULX
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson
02ea1e6b4f target-i386: Implement BZHI
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:32 -08:00
Richard Henderson
bc4b43dc2f target-i386: Implement BLSR, BLSMSK, BLSI
Do all of group 17 at one time for ease.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:52:05 -08:00
Richard Henderson
c7ab7565bc target-i386: Implement BEXTR
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:39 -08:00
Richard Henderson
7073fbada7 target-i386: Implement ANDN
As this is the first of the BMI insns to be implemented,
this carries quite a bit more baggage than normal.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:39 -08:00
Richard Henderson
111994ee05 target-i386: Implement MOVBE
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:39 -08:00
Richard Henderson
701ed211d6 target-i386: Decode the VEX prefixes
No actual required uses of these encodings yet.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:39 -08:00
Richard Henderson
4a6fd938f5 target-i386: Tidy prefix parsing
Avoid duplicating switch statement between 32 and 64-bit modes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:38 -08:00
Richard Henderson
988c3eb0d6 target-i386: Use CC_SRC2 for ADC and SBB
Add another slot in ENV and store two of the three inputs.  This lets us
do less work when carry-out is not needed, and avoids the unpredictable
CC_OP after translating these insns.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:39:09 -08:00
Richard Henderson
db9f259772 target-i386: Make helper_cc_compute_{all,c} const
Pass the data in explicitly, rather than indirectly via env.
This avoids all sorts of unnecessary register spillage.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:25:55 -08:00
Richard Henderson
a3251186fc target-i386: optimize flags checking after sub using CC_SRCT
After a comparison or subtraction, the original value of the LHS will
currently be reconstructed using an addition.  However, in most cases
it is already available: store it in a temp-local variable and save 1
or 2 TCG ops (2 if the result of the addition needs to be extended).

The temp-local can be declared dead as soon as the cc_op changes again,
or also before the translation block ends because gen_prepare_cc will
always make a copy before returning it.  All this magic, plus copy
propagation and dead-code elimination, ensures that the temp local will
(almost) never be spilled.

Example (cmp $0x21,%rax + jbe):

 Before                                     After
----------------------------------------------------------------------------
 movi_i64 tmp1,$0x21                        movi_i64 tmp1,$0x21
 movi_i64 cc_src,$0x21                      movi_i64 cc_src,$0x21
 sub_i64 cc_dst,rax,tmp1                    sub_i64 cc_dst,rax,tmp1
 add_i64 tmp7,cc_dst,cc_src
 movi_i32 cc_op,$0x11                       movi_i32 cc_op,$0x11
 brcond_i64 tmp7,cc_src,leu,$0x0            discard loc11
                                            brcond_i64 rax,cc_src,leu,$0x0

 Before                                     After
----------------------------------------------------------------------------
  mov    (%r14),%rbp                        mov    (%r14),%rbp
  mov    %rbp,%rbx                          mov    %rbp,%rbx
  sub    $0x21,%rbx                         sub    $0x21,%rbx
  lea    0x21(%rbx),%r12
  movl   $0x11,0xa0(%r14)                   movl   $0x11,0xa0(%r14)
  movq   $0x21,0x90(%r14)                   movq   $0x21,0x90(%r14)
  mov    %rbx,0x98(%r14)                    mov    %rbx,0x98(%r14)
  cmp    $0x21,%r12                     |   cmp    $0x21,%rbp
  jbe    ...                                jbe    ...

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:58 -08:00
Richard Henderson
891a5133f1 target-i386: Update cc_op before TCG branches
Placing the CC_OP_DYNAMIC at the join is less effective than
before the branch, as the branch will have forced global registers
to their home locations.  This way we have a chance to discard
CC_SRC2 before it gets stored.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:58 -08:00
Richard Henderson
dc259201f8 target-i386: introduce gen_jcc1_noeob
A jump that ends a basic block or otherwise falls back to CC_OP_DYNAMIC
will always have to call gen_op_set_cc_op.  However, not all jumps end
a basic block, so introduce a variant that does not do this.

This was partially undone earlier (i386: drop cc_op argument of gen_jcc1),
redo it now also to prepare for the introduction of src2.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:58 -08:00
Richard Henderson
63633fe6eb target-i386: use gen_op for cmps/scas
Replace low-level ops with a higher-level "cmp %al, (A0)" in the case
of scas, and "cmp T0, (A0)" in the case of cmps.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:58 -08:00
Paolo Bonzini
3b9d3cf160 target-i386: kill cpu_T3
It is almost unused, and it is simpler to pass a TCG value directly
to gen_shiftd_rm_T1_T3.  This value is then written to t2 without
going through a temporary register.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
57eb0cc854 target-i386: expand cmov via movcond
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Paolo Bonzini
f32d3781de target-i386: introduce gen_cmovcc1
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Paolo Bonzini
cc8b6f5b39 target-i386: cleanup temporary macros for CCPrepare
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
69d1aa31f7 target-i386: inline gen_prepare_cc_slow
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Paolo Bonzini
943131ca98 target-i386: use CCPrepare to generate conditional jumps
This simplifies all the jump generation code.  CCPrepare allows the
code to create an efficient brcond always, so there is no need to
duplicate the setcc and jcc code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
276e6b5f06 target-i386: introduce gen_prepare_cc
This makes the i386 front-end able to create CCPrepare structs for all
conditions, not just those that come from a single flag.  In particular,
JCC_L and JCC_LE can be optimized because gen_prepare_cc is not forced
to return a result in bit 0 (unlike gen_setcc_slow).

However, for now the slow jcc operations will still go through CC
computation in a single-bit temporary, followed by a brcond if the
temporary is nonzero.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
bec93d7283 target-i386: introduce CCPrepare
Introduce a struct that describes how to build a *cond operation
that checks for a given x86 condition code.  For now, just change
gen_compute_eflags_* to return the new struct, generate code for
the CCPrepare struct, and go on as before.

[rth: Use ctz with the proper width rather than ffs.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
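
A sketch of the struct being introduced, with the field set reproduced from memory of translate.c (indicative rather than authoritative):

    /* Describes how to materialize one x86 condition code as a TCG
     * setcond/brcond/movcond: test 'cond' on 'reg' (optionally masked)
     * against either 'reg2' or the immediate 'imm'.  */
    typedef struct CCPrepare {
        TCGCond cond;
        TCGv reg;
        TCGv reg2;
        target_ulong imm;
        target_ulong mask;
        bool use_reg2;      /* second operand is reg2, not imm             */
        bool no_setcond;    /* value is already 0/1; setcond can be skipped */
    } CCPrepare;
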
Paolo Bonzini
c365395e9b target-i386: optimize setcc instructions
Reconstruct the arguments for complex conditions involving CC_OP_SUBx (BE,
L, LE).  In the others do it via setcond and gen_setcc_slow (which is
not that slow in many cases).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
be10b289d6 target-i386: optimize setle
And allow gen_setcc_slow to operate on cpu_cc_src.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
2cb4764577 target-i386: optimize setbe
This is looking at EFLAGS, but it can do so more efficiently with
setcond.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Paolo Bonzini
1a5c635947 target-i386: change gen_setcc_slow_T0 to gen_setcc_slow
Do not hard code the destination register.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
06847f1f1a target-i386: convert gen_compute_eflags_c to TCG
Do the switch at translation time, converting the helper templates to
TCG opcodes.  In some cases CF can be computed with a single setcond,
though in others it may require a little more work.

In the CC_OP_DYNAMIC case, compute the whole EFLAGS, same as for ZF/SF/PF.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
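
One of the single-setcond cases mentioned above, sketched with a placeholder destination temporary: after an addition, carry-out happened exactly when the result is unsigned-below one of the addends.

    /* CC_OP_ADD*: cc_dst holds the result, cc_src holds one addend. */
    tcg_gen_setcond_tl(TCG_COND_LTU, cf, cpu_cc_dst, cpu_cc_src);
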
Richard Henderson
8115f11735 target-i386: use inverted setcond when computing NS or NZ
Make gen_compute_eflags_z and gen_compute_eflags_s able to compute the
inverted condition, and use this in gen_setcc_slow_T0.  We cannot do it
yet in gen_compute_eflags_c, but prepare the code for it anyway.  It is
not worthwhile for PF, as usual.

shr+and+xor could be replaced by and+setcond.  I'm not doing it yet.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
086c407784 target-i386: do not call helper to compute ZF/SF
ZF, SF and PF can always be computed from CC_DST except in the
CC_OP_EFLAGS case (and CC_OP_DYNAMIC, which just resolves to CC_OP_EFLAGS
in gen_compute_eflags).  Use setcond to compute ZF and SF.

We could also use a table lookup to compute PF.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:57 -08:00
Richard Henderson
b666265b20 target-i386: Move CC discards to set_cc_op
This gets us universal coverage, rather than scattering discards
around at various places.  As a bonus, we do not emit redundant
discards e.g. between sequential logic insns.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Richard Henderson
ccfcdd09bf target-i386: no need to flush out cc_op before gen_eob
This makes code more similar to the other callers of gen_eob, especially
loopz/loopnz/jcxz.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Richard Henderson
d229edce1c target-i386: do not compute eflags multiple times consecutively
After calling gen_compute_eflags, leave the computed value in cc_reg_src
and set cc_op to CC_OP_EFLAGS.  The next few patches will remove anyway
most calls to gen_compute_eflags.

As a result of this change it is more natural to remove the register
argument from gen_compute_eflags and change all the callers.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
1608ecca95 target-i386: add helper functions to get other flags
Introduce new functions to extract PF, SF, OF, ZF in addition to CF.
These provide single entry points for optimizing accesses to a single
flag.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Richard Henderson
773cdfccb8 target-i386: Use gen_update_cc_op everywhere
All of the conditional calls to gen_op_set_cc_op go away, and
gen_op_set_cc_op itself gets inlined into its only remaining caller.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Richard Henderson
e207582f66 target-i386: Don't clobber s->cc_op in gen_update_cc_op
Use a dirty flag to know whether env->cc_op is up to date,
rather than forcing s->cc_op to DYNAMIC and losing info.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
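
A sketch of the dirty-flag scheme, modeled from memory on how gen_update_cc_op reads after this change:

    /* s->cc_op stays accurate for the translator; the global cpu_cc_op
     * is only written back when it is actually stale.  */
    static void gen_update_cc_op(DisasContext *s)
    {
        if (s->cc_op_dirty) {
            tcg_gen_movi_i32(cpu_cc_op, s->cc_op);
            s->cc_op_dirty = false;
        }
    }
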
Richard Henderson
3ca51d07da target-i386: Introduce set_cc_op
This will provide a good hook into which we can consolidate
all of the cc variable discards.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Richard Henderson
fee71888a2 target-i386: Name the cc_op enumeration
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
c7b3c87397 target-i386: factor gen_op_set_cc_op/tcg_gen_discard_tl around computing flags
Before computing flags we need to store the cc_op to memory.  Move this
to gen_compute_eflags_c and gen_compute_eflags rather than doing it all
over the place.

Also, after computing the flags in cpu_cc_src we are in EFLAGS mode.
Set s->cc_op and discard cpu_cc_dst in gen_compute_eflags, rather than
doing it all over the place.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
5bdb91b0dd target-i386: use gen_jcc1 to compile loopz
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
6fa38ed219 target-i386: clean up sahf
Discard CC_DST and set s->cc_op immediately after computing EFLAGS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
f5847c912d target-i386: compute eflags outside rcl/rcr helper
Always compute EFLAGS first since it is needed whenever
the shift is non-zero, i.e. most of the time.  This makes it possible
to remove some writes of CC_OP_EFLAGS to cpu_cc_op and more importantly
removes cases where s->cc_op becomes CC_OP_DYNAMIC.  Also, we can
remove cc_tmp and just modify cc_src from within the helper.

Finally, always follow gen_compute_eflags(cpu_cc_src) by setting s->cc_op
and discarding cpu_cc_dst.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:56 -08:00
Paolo Bonzini
0ff6addd92 target-i386: move eflags computation closer to gen_op_set_cc_op
This ensures the invariant that cpu_cc_op matches s->cc_op when calling
the helpers.  The next patches need this because gen_compute_eflags and
gen_compute_eflags_c will take care of setting cpu_cc_op.

Always compute EFLAGS first since it is needed whenever the shift is
non-zero, i.e. most of the time.  This makes it possible to remove some
writes of CC_OP_EFLAGS to cpu_cc_op and more importantly removes cases
where s->cc_op becomes CC_OP_DYNAMIC.  These are slow and we want to
avoid them: CC_OP_EFLAGS is quite efficient once we paid the initial
cost of computing the flags.

Finally, always follow gen_compute_eflags(cpu_cc_src) by setting s->cc_op
and discarding cpu_cc_dst.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:55 -08:00
Paolo Bonzini
52320e15db target-i386: move carry computation for inc/dec closer to gen_op_set_cc_op
This ensures the invariant that cpu_cc_op matches s->cc_op when calling
the helpers.  The next patches need this because gen_compute_eflags and
gen_compute_eflags_c will take care of setting cpu_cc_op.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:55 -08:00
Paolo Bonzini
b27fc131fe target-i386: drop cc_op argument of gen_jcc1
As in the gen_repz_scas/gen_repz_cmps case, delay setting
CC_OP_DYNAMIC in gen_jcc until after code generation.  All of
gen_jcc1/is_fast_jcc/gen_setcc_slow_T0 now work on s->cc_op, which makes
things a bit easier to follow and to patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:55 -08:00
Paolo Bonzini
91642ff806 target-i386: factor setting of s->cc_op handling for string functions
Set it to the appropriate CC_OP_SUBx constant in gen_scas/gen_cmps.
In the repz case it can be overridden to CC_OP_DYNAMIC after generating
the code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:55 -08:00
Paolo Bonzini
d824df34e8 target-i386: introduce gen_ext_tl
Introduce a function that abstracts extracting an 8, 16, 32 or 64-bit value
with or without sign, generalizing gen_extu and gen_exts.

Reviewed-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18 15:03:55 -08:00
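
A sketch of the generalized helper, reproduced from memory and written with the MO_* names adopted later in this log (the original used the equivalent OT_* constants), so indicative only:

    /* Extract an 8/16/32/64-bit value from src into dst, signed or
     * unsigned, and return the TCGv that actually holds the result so
     * callers can skip a copy when no extension is needed.  */
    static TCGv gen_ext_tl(TCGv dst, TCGv src, TCGMemOp size, bool sign)
    {
        switch (size) {
        case MO_8:
            if (sign) {
                tcg_gen_ext8s_tl(dst, src);
            } else {
                tcg_gen_ext8u_tl(dst, src);
            }
            return dst;
        case MO_16:
            if (sign) {
                tcg_gen_ext16s_tl(dst, src);
            } else {
                tcg_gen_ext16u_tl(dst, src);
            }
            return dst;
    #ifdef TARGET_X86_64
        case MO_32:
            if (sign) {
                tcg_gen_ext32s_tl(dst, src);
            } else {
                tcg_gen_ext32u_tl(dst, src);
            }
            return dst;
    #endif
        default:
            return src;   /* already full width */
        }
    }
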