Place DISAS_* constants that update cpu_eip first, and
the "jump" ones last. Add comments explaining the differences
and usage.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mark cc_op as clean and do not spill it at the end of the translation block.
Technically this is a tiny bit less efficient, but:
* it results in translations that are a tiny bit smaller
* for most of these instructions, it is not unlikely that they are close to
the end of the basic block, in which case cc_op would not be overwritten
* anyway the cost is probably dwarfed by that of computing flags.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No need to set it again at the end of the translation block, cc_op_dirty
can be set to false.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is already handled in gen_eob(). Before adding another DISAS_*
case, remove the double calls.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_helper_rsm cannot generate an exception, and reloads the flags.
So there's no need to spill cc_op and update cpu_eip, but on the
other hand cc_op must be reset to CC_OP_EFLAGS before returning.
It all works by chance, because by spilling cc_op before the call
to the helper, it becomes non-dirty and gen_eob will not overwrite
the CC_OP_EFLAGS value that is placed there by the helper. But
let's clean it up.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel SDM 18.3.1.4 "If an occurrence of the MOV or POP instruction
loads the SS register executes with EFLAGS.TF = 1, no single-step debug
exception occurs following the MOV or POP instruction."
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If EFLAGS.RF is 1, special processing in gen_eob_worker() is needed and
therefore goto_tb cannot be used.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 32-bit AAM/AAD opcodes are using helpers that read and write flags and
env->regs[R_EAX]. Clean them up so that the table correctly includes AX
as a 16-bit input and output.
No real reason to do it to be honest, but they are nice one-output helpers
and it removes the masking of env->regs[R_EAX] that generic load/writeback
code already does.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123912.608497-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_rot_carry and gen_rot_overflow are meant to be called with count == NULL
if the count cannot be zero. However this is not done in gen_ROL and gen_ROR,
and writing everywhere "can_be_zero ? count : NULL" is burdensome and less
readable. Just pass can_be_zero as a separate argument.
gen_RCL and gen_RCR use a conditional branch to skip the computation
if count is zero, so they can pass false unconditionally to gen_rot_overflow.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123914.608516-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Almost all of the disas_log implementations are identical.
Unify them within translator_loop.
Drop extra Priv/Virt logging from target/riscv.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These are trivial to add, and moving them to the new decoder fixes some
corner cases: raising #UD instead of an instruction fetch page fault for
the undefined opcodes, and incorrectly rejecting 0F 18 prefetches with
register operands (which are treated as reserved NOPs).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reject 0x66/0xf3/0xf2 in front of them.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the manual, 32-bit vs 64-bit is governed by REX.W
and REX ignores the 0x66 prefix. This can be confirmed with this
program:
#include <stdio.h>
int main()
{
int x = 0x12340000;
int y;
asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y);
}
which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000
on QEMU.
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The PCOMMIT instruction was never included in any physical processor.
TCG implements it as a no-op instruction, but its utility is debatable
to say the least. Drop it from the decoder since it is only available
with "-cpu max", which does not guarantee migration compatibility
across versions, and deprecate the property just in case someone is
using it as "pcommit=off".
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that a bulk of opcodes go through the new decoder, it is sensible
to do some cleanup. Go immediately through disas_insn_new and only jump
back after parsing the prefixes.
disas_insn() now only contains the three sigsetjmp cases, and they
are more easily managed if they are inlined into i386_tr_translate_insn.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Split the bits that have some duplication with disas_insn_new, from
those that should be the main topic of the conversion. This is the
first step towards removing duplicate decoding of prefixes between
disas_insn and disas_insn_new.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are unlikely to be converted to the table-based decoding
soon (perhaps there could be generic ESC decoding in decode-new.c.inc
for the Mod/RM byte, but not operand decoding), so keep them separate
from the remaining legacy-decoded instructions.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Send all converted opcodes to disas_insn_new() directly from the big
decoding switch statement; once more, the debugging/bisecting logic
disappears.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A few two-byte opcodes are simple extensions of existing one-byte opcodes;
they are easy to decode and need no change to emit.c.inc. Port them to
the new decoder.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move long-displacement Jcc, SETcc and CMOVcc to the new decoder.
While filling in the tables makes the code seem longer, the new
emitters are all just one line of code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since new opcodes are not going to be added in translate.c, round the
case labels that call to disas_insn_new(), including whole sets of
eight opcodes when possible.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The shift instructions are rewritten instead of reusing code from the old
decoder. Rotates use CC_OP_ADCOX more extensively and generally rely
more on the optimizer, so that the code generators are shared between
the immediate-count and variable-count cases.
In particular, this makes gen_RCL and gen_RCR pretty efficient for the
count == 1 case, which becomes (apart from a few extra movs) something like:
(compute_cc_all if needed)
// save old value for OF calculation
mov cc_src2, T0
// the bulk of RCL is just this!
deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1
// compute carry
shr cc_dst, cc_src2, length - 1
and cc_dst, cc_dst, 1
// compute overflow
xor cc_src2, cc_src2, T0
extract cc_src2, cc_src2, length - 1, 1
32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the new decoder it is sometimes easier to put the segment
in T1 instead of T0, usually because another operand was loaded
by common code in T0. Genrealize gen_movl_seg_T0 to allow
using any source.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compared to the old decoder, the main differences in translation
are for the little-used ARPL instruction. IMUL is adjusted a bit
to share more code to produce flags, but is otherwise very similar.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While keeping decode->immediate for convenience and for 4-operand instructions,
store the immediate in X86DecodedOp as well. This enables instructions
with more than one immediate such as ENTER. It can also be used for far
calls and jumps.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extract the code into new functions, and swap T0/T1 so that T0 corresponds
to the first immediate in the instruction stream.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a new wrapper for syscall/sysret, and do not go through multiple
layers of wrappers.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of using s->T0 or s->T1, create a scratch register
when computing the C, NC, L or LE conditions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of using s->tmp0 or s->tmp4 as the result, just extend the cc_*
registers in place. It is harmless and, if multiple setcc instructions
are used, the optimizer will be able to remove the redundant ones.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_update_cc_op must be called before control flow splits. Doing it
in gen_jmp_rel{,_csize} may hide bugs, instead assert that cc_op is
clean---even if that means a few more calls to gen_update_cc_op().
With this new invariant, setting cc_op to CC_OP_DYNAMIC is unnecessary
since the caller should have done it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_update_cc_op must be called before control flow splits. Do it
where the jump on ECX!=0 is translated.
On the other hand, remove the call before gen_jcc1, which takes care of
it already, and explain why REPZ/REPNZ need not use CC_OP_DYNAMIC---the
translation block ends before any control-flow-dependent cc_op could
be observed.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Resetting cc_op to CC_OP_DYNAMIC should be done at control flow junctions,
which is not the case here. This translation block is ending and the
only effect of calling set_cc_op() would be a discard of s->cc_srcT.
This discard is useless (it's a temporary, not a global) and in fact
prevents gen_prepare_cc from returning s->cc_srcT.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With the introduction of TSTEQ and TSTNE the .mask field is always -1,
so remove all the now-unnecessary code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new conditions obviously come in handy when testing individual bits
of EFLAGS, and they make it possible to remove the .mask field of
CCPrepare.
Lowering to shift+and is done by the optimizer if necessary.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When testing the sign bit or equality to zero of a partial register, it
is useful to use a single TSTEQ or TSTNE operation. It can also be used
to test the parity flag, using bit 0 of the population count.
Do not do this for target_ulong-sized values however; the optimizer would
produce a comparison against zero anyway, and it avoids shifts by 64
which are undefined behavior.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Extract page-protection definitions to page-protection.h
- Rework in accel/tcg in preparation of extracting TCG fields from CPUState
- More uses of get_task_state() in user emulation
- Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
- MAINTAINERS updates (Aleksandar and Bin)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
=3Qkg
-----END PGP SIGNATURE-----
Merge tag 'accel-20240506' of https://github.com/philmd/qemu into staging
Accelerator patches
- Extract page-protection definitions to page-protection.h
- Rework in accel/tcg in preparation of extracting TCG fields from CPUState
- More uses of get_task_state() in user emulation
- Xen refactors in preparation for adding multiple map caches (Juergen & Edgar)
- MAINTAINERS updates (Aleksandar and Bin)
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t
# wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg
# IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY
# fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy
# Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR
# 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7
# U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU
# 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H
# 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v
# 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL
# QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5
# eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0=
# =3Qkg
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 May 2024 05:42:08 AM PDT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
* tag 'accel-20240506' of https://github.com/philmd/qemu: (28 commits)
MAINTAINERS: Update my email address
MAINTAINERS: Update Aleksandar Rikalo email
system: Pass RAM MemoryRegion and is_write in xen_map_cache()
xen: mapcache: Break out xen_map_cache_init_single()
xen: mapcache: Break out xen_invalidate_map_cache_single()
xen: mapcache: Refactor xen_invalidate_map_cache_entry_unlocked
xen: mapcache: Refactor xen_replace_cache_entry_unlocked
xen: mapcache: Break out xen_ram_addr_from_mapcache_single
xen: mapcache: Refactor xen_remap_bucket for multi-instance
xen: mapcache: Refactor xen_map_cache for multi-instance
xen: mapcache: Refactor lock functions for multi-instance
xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry
system: let qemu_map_ram_ptr() use qemu_ram_ptr_length()
user: Use get_task_state() helper
user: Declare get_task_state() once in 'accel/tcg/vcpu-state.h'
user: Forward declare TaskState type definition
accel/tcg: Move @plugin_mem_cbs from CPUState to CPUNegativeOffsetState
accel/tcg: Restrict cpu_plugin_mem_cbs_enabled() to TCG
accel/tcg: Restrict qemu_plugin_vcpu_exit_hook() to TCG plugins
accel/tcg: Update CPUNegativeOffsetState::can_do_io field documentation
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Extract page-protection definitions from "exec/cpu-all.h"
to "exec/page-protection.h".
The list of files requiring the new header was generated
using:
$ git grep -wE \
'PAGE_(READ|WRITE|EXEC|RWX|VALID|ANON|RESERVED|TARGET_.|PASSTHROUGH)'
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240427155714.53669-3-philmd@linaro.org>
When emulated with QEMU, interrupts will never come in the following
loop. However, if the NOP instruction is uncommented, interrupts will
fire as normal.
loop:
cli
call do_sti
jmp loop
do_sti:
sti
# nop
ret
This behavior is different from that of a real processor. For example,
if KVM is enabled, interrupts will always fire regardless of whether the
NOP instruction is commented or not. Also, the Intel Software Developer
Manual states that after the STI instruction is executed, the interrupt
inhibit should end as soon as the next instruction (e.g., the RET
instruction if the NOP instruction is commented) is executed.
This problem is caused because the previous code may choose not to end
the TB even if the HF_INHIBIT_IRQ_MASK has just been reset (e.g., in the
case where the STI instruction is immediately followed by the RET
instruction), so that IRQs may not have a change to trigger. This commit
fixes the problem by always terminating the current TB to give IRQs a
chance to trigger when HF_INHIBIT_IRQ_MASK is reset.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Message-ID: <20240415064518.4951-4-lrh2000@pku.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The XRSTOR instruction ends calling tlb_flush(), declared
in "exec/exec-all.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231211212003.21686-13-philmd@linaro.org>
The various Intel CPU manuals claim that SGDT and SIDT can write either 24-bits
or 32-bits depending upon the operand size, but this is incorrect. Not only do
the Intel CPU manuals give contradictory information between processor
revisions, but this information doesn't even match real-life behaviour.
In fact, tests on real hardware show that the CPU always writes 32-bits for SGDT
and SIDT, and this behaviour is required for at least OS/2 Warp and WFW 3.11 with
Win32s to function correctly. Remove the masking applied due to the operand size
for SGDT and SIDT so that the TCG behaviour matches the behaviour on real
hardware.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
--
MCA: Whilst I don't have a copy of OS/2 Warp handy, I've confirmed that this
patch fixes the issue in WFW 3.11 with Win32s. For more technical information I
highly recommend the excellent write-up at
https://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/.
Message-ID: <20240419195147.434894-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When aborting translation of the current insn, restore the
previous value of insn_start.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
CXL emulation of interleave requires read and write hooks due to
requirement for subpage granularity. The Linux kernel stack now enables
using this memory as conventional memory in a separate NUMA node. If a
process is deliberately forced to run from that node
$ numactl --membind=1 ls
the page table walk on i386 fails.
Useful part of backtrace:
(cpu=cpu@entry=0x555556fd9000, fmt=fmt@entry=0x555555fe3378 "cpu_io_recompile: could not find TB for pc=%p")
at ../../cpu-target.c:359
(retaddr=0, addr=19595792376, attrs=..., xlat=<optimized out>, cpu=0x555556fd9000, out_offset=<synthetic pointer>)
at ../../accel/tcg/cputlb.c:1339
(cpu=0x555556fd9000, full=0x7fffee0d96e0, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
(cpu=cpu@entry=0x555556fd9000, p=p@entry=0x7ffff56fddc0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
(cpu=cpu@entry=0x555556fd9000, addr=addr@entry=19595792376, oi=oi@entry=52, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
at ../../accel/tcg/ldst_common.c.inc:301
at ../../target/i386/tcg/sysemu/excp_helper.c:173
(err=0x7ffff56fdf80, out=0x7ffff56fdf70, mmu_idx=0, access_type=MMU_INST_FETCH, addr=18446744072116178925, env=0x555556fdb7c0)
at ../../target/i386/tcg/sysemu/excp_helper.c:578
(cs=0x555556fd9000, addr=18446744072116178925, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:604
Avoid this by plumbing the address all the way down from
x86_cpu_tlb_fill() where is available as retaddr to the actual accessors
which provide it to probe_access_full() which already handles MMIO accesses.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2180
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2220
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-ID: <20240307155304.31241-2-Jonathan.Cameron@huawei.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The A20 mask is only applied to the final memory access. Nested
page tables are always walked with the raw guest-physical address.
Unlike the previous patch, in this one the masking must be kept, but
it was done too early.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If ptw_translate() does a MMU_PHYS_IDX access, the A20 mask is already
applied in get_physical_address(), which is called via probe_access_full()
and x86_cpu_tlb_fill().
If ptw_translate() on the other hand does a MMU_NESTED_IDX access,
the A20 mask must not be applied to the address that is looked up in
the nested page tables; it must be applied only to the addresses that
hold the NPT entries (which is achieved via MMU_PHYS_IDX, per the
previous paragraph).
Therefore, we can remove A20 masking from the computation of the page
table entry's address, and let get_physical_address() or mmu_translate()
apply it when they know they are returning a host-physical address.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.
The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).
Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.
The truncation code however is not completely redundant. Even though the
maximum address size for any executed instruction is 32 bits, helpers for
operations such as BOUND, FSAVE or XSAVE may ask get_physical_address()
to translate an address outside of the 32-bit range, if invoked with an
argument that is close to the 4G boundary. Likewise for processor
accesses, for example TSS or IDT accesses, when EFER.LMA==0.
So, move the address truncation in get_physical_address() so that it
applies to 32-bit MMU indexes, but not to MMU_PHYS_IDX and MMU_NESTED_IDX.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Cc: qemu-stable@nongnu.org
Co-developed-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Accesses from a 32-bit environment (32-bit code segment for instruction
accesses, EFER.LMA==0 for processor accesses) have to mask away the
upper 32 bits of the address. While a bit wasteful, the easiest way
to do so is to use separate MMU indexes. These days, QEMU anyway is
compiled with a fixed value for NB_MMU_MODES. Split MMU_USER_IDX,
MMU_KSMAP_IDX and MMU_KNOSMAP_IDX in two.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove knowledge of specific MMU indexes (other than MMU_NESTED_IDX and
MMU_PHYS_IDX) from mmu_translate(). This will make it possible to split
32-bit and 64-bit MMU indexes.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MSR_VM_HSAVE_PA bits 0-11 are reserved, as are the bits above the
maximum physical address width of the processor. Setting them to
1 causes a #GP (see "15.30.4 VM_HSAVE_PA MSR" in the AMD manual).
The same is true of VMCB addresses passed to VMRUN/VMLOAD/VMSAVE,
even though the manual is not clear on that.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CR3 bits 63:32 are ignored in 32-bit mode (either legacy 2-level
paging or PAE paging). Do this in mmu_translate() to remove
the last where get_physical_address() meaningfully drops the high
bits of the address.
Cc: qemu-stable@nongnu.org
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
target/i386: As specified by Intel Manual Vol2 3-180, cmp instructions
are not allowed to have lock prefix and a `UD` should be raised. Without
this patch, s1->T0 will be uninitialized and used in the case OP_CMPL.
Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com>
Message-ID: <20240215095015.570748-2-ziqiaokong@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.
The set_base in APICCommonClass now returns an integer to indicate error in
execution. apic_set_base return -1 on invalid APIC state transition,
accelerator can use this to raise appropriate exception.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-4-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This commit creates apic_register_read/write which are used by both
apic_mem_read/write for MMIO access and apic_msr_read/write for MSR access.
The apic_msr_read/write returns -1 on error, accelerator can use this to
raise the appropriate exception.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-2-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move this x86-specific code out of the generic accel/tcg/.
Reported-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move this x86-specific code out of the generic accel/tcg/.
Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-8-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
QEMU coding style recommends using structure typedefs.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Makes gen_intermediate_code() signature target agnostic so the function
can be called from accel/tcg/translate-all.c without target specifics.
Signed-off-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20240119144024.14289-9-anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The tcg_cpu_FOO() names are x86 specific, so rename
them as x86_tcg_cpu_FOO() (as other names in this file)
to ease navigating the code.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20240111120221.35072-5-philmd@linaro.org>
For PC-relative translation blocks, env->eip changes during the
execution of a translation block, Therefore, QEMU must be able to
recover an instruction's PC just from the TranslationBlock struct and
the instruction data with. Because a TB will not span two pages, QEMU
stores all the low bits of EIP in the instruction data and replaces them
in x86_restore_state_to_opc. Bits 12 and higher (which may vary between
executions of a PCREL TB, since these only use the physical address in
the hash key) are kept unmodified from env->eip. The assumption is that
these bits of EIP, unlike bits 0-11, will not change as the translation
block executes.
Unfortunately, this is incorrect when the CS base is not aligned to a page.
Then the linear address of the instructions (i.e. the one with the
CS base addred) indeed will never span two pages, but bits 12+ of EIP
can actually change. For example, if CS base is 0x80262200 and EIP =
0x6FF4, the first instruction in the translation block will be at linear
address 0x802691F4. Even a very small TB will cross to EIP = 0x7xxx,
while the linear addresses will remain comfortably within a single page.
The fix is simply to use the low bits of the linear address for data[0],
since those don't change. Then x86_restore_state_to_opc uses tb->cs_base
to compute a temporary linear address (referring to some unknown
instruction in the TB, but with the correct values of bits 12 and higher);
the low bits are replaced with data[0], and EIP is obtained by subtracting
again the CS base.
Huge thanks to Mark Cave-Ayland for the image and initial debugging,
and to Gitlab user @kjliew for help with bisecting another occurrence
of (hopefully!) the same bug.
It should be relatively easy to write a testcase that performs MMIO on
an EIP with different bits 12+ than the first instruction of the translation
block; any help is welcome.
Fixes: e3a79e0e87 ("target/i386: Enable TARGET_TB_PCREL", 2022-10-11)
Cc: qemu-stable@nongnu.org
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Richard Henderson <richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1759
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1964
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2012
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The PCREL patches introduced a bug when updating EIP in the !CF_PCREL case.
Using s->pc in func gen_update_eip_next() solves the problem.
Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22f ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Signed-off-by: guoguangyao <guoguangyao18@mails.ucas.ac.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240115020804.30272-1-guoguangyao18@mails.ucas.ac.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With PCREL, we have a page-relative view of EIP, and an
approximation of PC = EIP+CSBASE that is good enough to
detect page crossings. If we try to recompute PC after
masking EIP, we will mess up that approximation and write
a corrupt value to EIP.
We already handled masking properly for PCREL, so the
fix in b5e0d5d2 was only needed for the !PCREL path.
Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22f ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240101230617.129349-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().
The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.
The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)
There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The main difficulty here is that a page fault when writing to the destination
must not overwrite the flags. Therefore, the flags computation must be
inlined instead of using gen_jcc1*.
For simplicity, I am using an unconditional cmpxchg operation, that becomes
a NOP if the comparison fails.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ALU instructions can write to both memory and flags. If the CC_SRC*
and CC_DST locations have been written already when a memory access
causes a fault, the value in CC_SRC* and CC_DST might be interpreted
with the wrong CC_OP (the one that is in effect before the instruction.
Besides just using the wrong result for the flags, something like
subtracting -1 can have disastrous effects if the current CC_OP is
CC_OP_EFLAGS: this is because QEMU does not expect bits outside the ALU
flags to be set in CC_SRC, and env->eflags can end up set to all-ones.
In the case of the attached testcase, this sets IOPL to 3 and would
cause an assertion failure if SUB is moved to the new decoder.
This mechanism is not really needed for BMI instructions, which can
only write to a register, but put it to use anyway for cleanliness.
In the case of BZHI, the code has to be modified slightly to ensure
that decode->cc_src is written, otherwise the new assertions trigger.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_jcc() has been changed to accept a relative offset since the
new decoder was written. Adjust the J operand, which is meant
to be used with jump instructions such as gen_jcc(), to not
include the program counter and to not truncate the result, as
both operations are now performed by common code.
The result is that J is now the same as the I operand.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to gen_setcc1, make gen_cmovcc1 receive TCGv. This is more friendly
to simultaneous implementation in the old and the new decoder.
A small wart is that s->T0 of CMOV is currently the *second* argument (which
would ordinarily be in T1). Therefore, the condition has to be inverted in
order to overwrite s->T0 with cpu_regs[reg] if the MOV is not performed.
This only applies to the old decoder, and this code will go away soon.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not use gen_op, and pull the load from the accumulator into
disas_insn.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Create a new temporary, to ease the register allocator's work.
Creation of the temporary is pushed into gen_ext_tl, which
also allows NULL as the first parameter now.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Just create a temporary for the occasion.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new x86 decoder wants the gen_* functions to compute EFLAGS before
writeback, which can be an issue for instructions with a memory
destination such as ARPL or shifts.
Extract code to compute the EFLAGS without clobbering CC_SRC, in case
the memory write causes a fault. The flags writeback mechanism will
take care of copying the result to CC_SRC.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new decoder would rather have the operand in T0 when expanding SCAS, rather
than use R_EAX directly as gen_scas currently does. This makes SCAS more similar
to CMP and SUB, in that CC_DST = T0 - T1.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new decoder likes to compute the address in A0 very early, so the
gen_lea_v_seg in gen_pop_T0 would clobber the address of the memory
operand. Instead use T0 since it is already available and will be
overwritten immediately after.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
decode->mem is only used if one operand has has_ea == true. String
operations will not use decode->mem and will load A0 on their own, because
they are the only case of two memory operands in a single instruction.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Usually the registers are just moved into s->T0 without much care for
their operand size. However, in some cases we can get more efficient
code if the operand fetching logic syncs with the emission function
on what is nicer.
All the current uses are mostly demonstrative and only reduce the code
in the emission functions, because the instructions do not support
memory operands. However the logic is generic and applies to several
more instructions such as MOVSXD (aka movslq), one-byte shift
instructions, multiplications, XLAT, and indirect calls/jumps.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
X86_SPECIAL_ZExtOp0 and X86_SPECIAL_ZExtOp2 are poorly named; they are a hack
that is needed by scalar insertion and extraction instructions, and not really
related to zero extension: for PEXTR the zero extension is done by the generation
functions, for PINSR the high bits are not used at all and in fact are *not*
filled with zeroes when loaded into s->T1.
Rename the values to match the effect described in the manual, and explain
better in the comments.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use _tl operations for 32-bit operands on 32-bit targets, and only go
through trunc and extu ops for 64-bit targets. While the trunc/ext
ops should be pretty much free after optimization, the optimizer also
does not like having the same temporary used in multiple EBBs.
Therefore it is nicer to not use tmpN* unless necessary.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The previous check erroneously allowed CMP to be modified with LOCK.
Instead, tag explicitly the instructions that do support LOCK.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
cpu_cc_compute_all() has an argument that is always equal to CC_OP for historical
reasons (dating back to commit a7812ae412, "TCG variable type checking.", 2008-11-17,
which added the argument to helper_cc_compute_all). It does not make sense for the
argument to have any other value, so remove it and clean up some lines that are not
too long anymore.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_lea_v_seg (called by gen_add_A0_ds_seg) already zeroes any
bits of s->A0 beyond s->aflag. It does so before summing the
segment base and, if not in 64-bit mode, also after summing it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
is_int is always 1, and error_code is always zero.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
OF is equal to the carry flag, so use the same CCPrepare.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Take advantage of the fact that there can be no 1 bits between SF and OF.
If they were adjacent, you could sum SF and get a carry only if SF was
already set. Then the value of OF in the sum is the XOR of OF itself,
the carry (which is SF) and 0 (the value of the OF bit in the addend):
this is OF^SF exactly.
Because OF and SF are not adjacent, just place more 1 bits to the
left so that the carry propagates, which means summing CC_O - CC_S.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In 32-bit mode, pc = eip + cs_base is also 32-bit, and must wrap.
Failure to do so results in incorrect memory exceptions to the guest.
Before 732d548732, this was implicitly done via truncation to
target_ulong but only in qemu-system-i386, not qemu-system-x86_64.
To fix this, we must add conditional zero-extensions.
Since we have to test for 32 vs 64-bit anyway, note that cs_base
is always zero in 64-bit mode.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2022
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231212172510.103305-1-richard.henderson@linaro.org>
Instructions in VEX exception class 6 generally look at the value of
VEX.W. Note that the manual places some instructions incorrectly in
class 4, for example VPERMQ which has no non-VEX encoding and no legacy
SSE analogue. AMD does a mess of its own, as documented in the comment
that this patch adds.
Most of them are checked for VEX.W=0, and are listed in the manual
(though with an omission) in table 2-16; VPERMQ and VPERMPD check for
VEX.W=1, which is only listed in the instruction description. Others,
such as VPSRLV, VPSLLV and the FMA3 instructions, use VEX.W to switch
between a 32-bit and 64-bit operation.
Fix more of the class 4/class 6 mismatches, and implement the check for
VEX.W in TCG.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In preparation for adding more similar checks, move the VEX.L=0 check
and several X86_SPECIAL_* checks to a new field, where each bit represent
a common check on unused bits, or a restriction on the processor mode.
Likewise, many SVM intercepts can be checked during the decoding phase,
the main exception being the selective CR0 write, MSR and IOIO intercepts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The implementation was validated with OpenSSL and with the test vectors in
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.
The instructions provide a ~25% improvement on hashing a 64 MiB file:
runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on
the host goes down from 5.8 billion to 4.8 billion with slightly better
IPC too. Good job Intel. ;)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit a908985971 ("target/i386/seg_helper: introduce tss_set_busy",
2023-09-26) failed to use the tss_selector argument of the new function,
which was therefore unused.
This shows up as a #GP fault when booting old versions of 32-bit
Linux.
Fixes: a908985971 ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20231011135350.438492-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Since we *might* have user emulation with softmmu,
replace the system emulation check by !user emulation one.
(target/ was cleaned from invalid CONFIG_SOFTMMU uses at
commit cab35c73be, but these files were merged few days
after, thus missed the cleanup.)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004082239.27251-1-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow the name 'cpu_env' to be used for something else.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>