Use the same flag generation code as SHL and SHR, but use
the existing gen_shiftd_rm_T1 function to compute the result
as well as CC_SRC.
Decoding-wise, SHLD/SHRD by immediate count as a 4-operand
instruction because s->T0 and s->T1 actually occupy three op
slots. The infrastructure used by opcodes in the 0F 3A table
works fine.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SHLD/SHRD can have 3 register operands - s->T0, s->T1 and either
1 or CL - and therefore decode->op[2] is taken by the low part
of the register being shifted. Pass X86_OP_* to gen_shift_count
from its current callers and hardcode cpu_regs[R_ECX] as the
shift count.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use gen_ld_modrm/gen_st_modrm, moving them and gen_shift_flags to the
caller. This way, gen_shiftd_rm_T1 becomes something that the new
decoder can call.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These have very simple generators and no need for complex group
decoding. Apart from LAR/LSL which are simplified to use
gen_op_deposit_reg_v and movcond, the code is generally lifted
from translate.c into the generators.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SYSENTER is allowed in VM86 mode, but not in real mode. Split the check
so that PE and !VM86 are covered by separate bits.
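
As a rough illustration of why two separate bits are needed, the sketch
below models the checks as independent flags. The names TOY_CHK_PE,
TOY_CHK_NOT_VM86 and toy_insn_allowed() are hypothetical and are not the
decoder's actual identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check bits, invented for this sketch. */
enum {
    TOY_CHK_PE       = 1u << 0, /* instruction requires protected mode */
    TOY_CHK_NOT_VM86 = 1u << 1, /* instruction requires VM86 to be off */
};

/* Return true if an instruction with the given check bits may execute. */
static bool toy_insn_allowed(uint32_t checks, bool pe, bool vm86)
{
    if ((checks & TOY_CHK_PE) && !pe) {
        return false;
    }
    if ((checks & TOY_CHK_NOT_VM86) && vm86) {
        return false;
    }
    return true;
}
```

With only TOY_CHK_PE set, the instruction is accepted in VM86 mode and
rejected in real mode, which is the SYSENTER behavior described above.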
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is already partly implemented due to VLDMXCSR and VSTMXCSR; finish
the job.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All other control registers are stored plainly in CPUX86State.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a bit more generic, as it can be applied to MPX as well.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Just like X86_ENTRYr, X86_ENTRYwr is easily changed to use only T0.
In this case, the motivation is to use it for the MOV instruction
family. The case when you need to preserve the input value is the
odd one, as it is used basically only for BLS* instructions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I am not sure why I made it use T1. It is a bit more symmetric with
respect to X86_ENTRYwr (which uses T0 for the "w"ritten operand
and T1 for the "r"ead operand), but it is also less flexible because it
does not let you apply zextT0/sextT0.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This makes for easier cpu_cc_* setup, and not using set_cc_op()
should come in handy if QEMU ever implements APX.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Avoid using set_cc_op() in preparation for implementing APX; treat
CC_OP_EFLAGS similar to the case where we have the "opposite" cc_op
(CC_OP_ADOX for ADCX and CC_OP_ADCX for ADOX), except the resulting
cc_op is not CC_OP_ADCOX. This is written easily as two "if"s, whose
conditions are both false for CC_OP_EFLAGS, both true for CC_OP_ADCOX,
and one each true for CC_OP_ADCX/ADOX.
The new logic also makes it easy to drop usage of tmp0.
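
One way to read the two conditions is sketched below, outside of TCG.
The enum and function names are made up for illustration; only the truth
table (both false for EFLAGS, both true for ADCOX, one each for
ADCX/ADOX) comes from the description above.

```c
#include <stdbool.h>

/* Toy stand-ins for the four cc_op values discussed above. */
typedef enum { TOY_EFLAGS, TOY_ADCX, TOY_ADOX, TOY_ADCOX } ToyCCOp;

static ToyCCOp toy_opposite(ToyCCOp insn)
{
    return insn == TOY_ADCX ? TOY_ADOX : TOY_ADCX;
}

/* First "if": the flag this instruction updates is already live. */
static bool toy_own_flag_live(ToyCCOp cur, ToyCCOp insn)
{
    return cur == insn || cur == TOY_ADCOX;
}

/* Second "if": the flag of the opposite instruction is live. */
static bool toy_other_flag_live(ToyCCOp cur, ToyCCOp insn)
{
    return cur == toy_opposite(insn) || cur == TOY_ADCOX;
}
```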
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The CPUX86State argument would only be used to fetch bytes, but that has to be
done before the generator function is called. So remove it, and all
temptation together with it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The calculation of FrameTemp is done using the size indicated by mo_pushpop()
before being written back to EBP, but the final writeback to EBP is done using
the size indicated by mo_stacksize().
In the case where mo_pushpop() is MO_32 and mo_stacksize() is MO_16 then the
final writeback to EBP is done using MO_16, which can leave junk in the top
16 bits of EBP after executing ENTER.
Change the writeback of EBP to use the same size indicated by mo_pushpop() to
ensure that the full value is written back.
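
The effect of the mismatched sizes can be reproduced with plain integer
arithmetic. This is only a sketch: toy_write_low() is a hypothetical
helper that mimics a partial register write, not QEMU code.

```c
#include <stdint.h>

/*
 * Replace the low `bits` bits of `reg` with those of `val`, leaving the
 * remaining bits untouched. `bits` is expected to be 16 or 32.
 */
static uint64_t toy_write_low(uint64_t reg, uint64_t val, unsigned bits)
{
    uint64_t mask = (1ULL << bits) - 1;
    return (reg & ~mask) | (val & mask);
}
```

If the old EBP is 0xAAAA1234 and the 32-bit FrameTemp is 0x0001FFF0, a
16-bit writeback yields 0xAAAAFFF0 (stale upper half), while a 32-bit
writeback yields the intended 0x0001FFF0.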
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
Message-ID: <20240606095319.229650-5-mark.cave-ayland@ilande.co.uk>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When OS/2 Warp configures its segment descriptors, many of them are configured with
the P flag clear to allow for a fault-on-demand implementation. In the case where
the stack value is POPped into the segment registers, the SP is incremented before
calling gen_helper_load_seg() to validate the segment descriptor:
IN:
0xffef2c0c: 66 07 popl %es
OP:
ld_i32 loc9,env,$0xfffffffffffffff8
sub_i32 loc9,loc9,$0x1
brcond_i32 loc9,$0x0,lt,$L0
st16_i32 loc9,env,$0xfffffffffffffff8
st8_i32 $0x1,env,$0xfffffffffffffffc
---- 0000000000000c0c 0000000000000000
ext16u_i64 loc0,rsp
add_i64 loc0,loc0,ss_base
ext32u_i64 loc0,loc0
qemu_ld_a64_i64 loc0,loc0,noat+un+leul,5
add_i64 loc3,rsp,$0x4
deposit_i64 rsp,rsp,loc3,$0x0,$0x10
extrl_i64_i32 loc5,loc0
call load_seg,$0x0,$0,env,$0x0,loc5
add_i64 rip,rip,$0x2
ext16u_i64 rip,rip
exit_tb $0x0
set_label $L0
exit_tb $0x7fff58000043
If helper_load_seg() generates a fault when validating the segment descriptor, then
because the SP has already been incremented, the topmost word of the stack is
overwritten by the arguments that the CPU pushes onto the stack before invoking the
fault handler. As a consequence, things rapidly go wrong upon return from the fault
handler due to the corrupted stack.
Update the logic for the existing writeback condition so that a POP into the segment
registers also calls helper_load_seg() first before incrementing the SP, so that if a
fault occurs the SP remains unaltered.
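
The intended ordering can be sketched with a toy model. Everything here
(struct toy_cpu, toy_load_seg(), toy_pop_es() and the `present` argument
that stands in for the descriptor's P flag) is hypothetical and only
illustrates "validate first, then adjust SP".

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_cpu {
    uint32_t sp;
    uint16_t es;
};

/* Stand-in for segment validation: fails if the descriptor is not present. */
static bool toy_load_seg(struct toy_cpu *cpu, uint16_t selector, bool present)
{
    if (!present) {
        return false;           /* would raise a fault */
    }
    cpu->es = selector;
    return true;
}

/* POP into ES: load the segment first, bump SP only on success. */
static bool toy_pop_es(struct toy_cpu *cpu, uint16_t selector, bool present,
                       unsigned size)
{
    if (!toy_load_seg(cpu, selector, present)) {
        return false;           /* SP is left unaltered for the handler */
    }
    cpu->sp += size;
    return true;
}
```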
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
Message-ID: <20240606095319.229650-4-mark.cave-ayland@ilande.co.uk>
Fixes: cc1d28bdbe ("target/i386: move 00-5F opcodes to new decoder", 2024-05-07)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of directly implementing the writeback using gen_op_st_v(), use the
existing gen_writeback() function.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-ID: <20240606095319.229650-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will make subsequent changes a little easier to read.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-ID: <20240606095319.229650-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DISAS_NORETURN suppresses the work normally done by gen_eob(), and therefore
must be used in special cases only. Document them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HLT uses DISAS_NORETURN because the corresponding helper calls
cpu_loop_exit(). However, while gen_eob() clears HF_RF_MASK and
synthesizes a #DB exception if single-step is active, none of this is
done by HLT. Note that the single-step trap is generated after the halt
is finished.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
PAUSE uses DISAS_NORETURN because the corresponding helper
calls cpu_loop_exit(). However, while HLT clears HF_INHIBIT_IRQ_MASK
to correctly handle "STI; HLT", the same is missing from PAUSE.
In addition, gen_eob() clears HF_RF_MASK and synthesizes a #DB exception
if single-step is active; none of this is done by HLT and PAUSE.
Start by fixing PAUSE; HLT will follow.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
From vm entry to exit, VMRUN is handled as a single instruction. It
uses DISAS_NORETURN in order to avoid processing TF or RF before
the first instruction executes in the guest. However, the corresponding
handling is missing in vmexit. Add it, and at the same time reorganize
the comments with quotes from the manual about the tasks performed
by a #VMEXIT.
Another gen_eob() task that is missing in VMRUN is preparing the
HF_INHIBIT_IRQ flag for the next instruction, in this case by loading
it from the VMCB control state.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the required DR7 (either from the VMCB or from the host save
area) disables a breakpoint that was enabled prior to vmentry
or vmexit, it is left enabled and will trigger EXCP_DEBUG.
This causes a spurious #DB on the next crossing of the breakpoint.
To disable it, vmentry/vmexit must use cpu_x86_update_dr7
to load DR7.
Because cpu_x86_update_dr7 takes a 32-bit argument, check
reserved bits prior to calling cpu_x86_update_dr7, and do the
same for DR6 as well for consistency.
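
The reason the check has to come first is that a 64-bit value loses its
upper half when passed through a 32-bit parameter. A minimal sketch,
assuming (this is an assumption of the example, not stated above) that
the reserved bits in question are the upper 32 bits of the 64-bit field;
toy_dr_value_ok() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a 64-bit debug register value can be narrowed to
 * 32 bits without discarding set bits. Assumes bits 63:32 are the
 * reserved ones, which is an assumption made for this sketch.
 */
static bool toy_dr_value_ok(uint64_t value)
{
    return (value >> 32) == 0;
}
```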
This scenario is tested by the "host_rflags" test in kvm-unit-tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DR7.GD triggers a #DB exception on any access to debug registers.
The GD bit is cleared so that the #DB handler itself can access
the debug registers.
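
In terms of bits, GD is bit 13 of DR7, so the clearing amounts to a
single mask. TOY_DR7_GD and toy_clear_gd() are illustrative names only.

```c
#include <stdint.h>

#define TOY_DR7_GD (1u << 13)   /* DR7.GD, general detect enable */

/* Value of DR7 as seen on entry to the #DB handler. */
static uint32_t toy_clear_gd(uint32_t dr7)
{
    return dr7 & ~TOY_DR7_GD;
}
```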
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use decode.c's support for intercepts, doing the check in TCG-generated
code rather than the helper. This is cleaner because it allows removing
the eip_addend argument to helper_pause(), even though it adds a bit of
bloat for opcode 0x90's new decoding function.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use decode.c's support for intercepts, doing the check in TCG-generated
code rather than the helper. This is cleaner because it allows removing
the eip_addend argument to helper_hlt().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ICEBP generates a trap-like exception, while gen_exception() produces
a fault. Resurrect gen_update_eip_next() to implement the desired
semantics.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When preparing an exception stack frame for a fault exception, the value
pushed for RF is 1. Take that into account. The same should be true
of interrupts for repeated string instructions, but the situation there
is complicated.
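
For the fault case, the adjustment can be pictured as below. RF is
bit 16 of EFLAGS; toy_frame_eflags() is a hypothetical helper, and the
non-fault path simply passes the value through for the sake of the
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RF_MASK (1u << 16)  /* EFLAGS.RF */

/* EFLAGS image to push on the exception stack frame. */
static uint32_t toy_frame_eflags(uint32_t eflags, bool is_fault)
{
    return is_fault ? (eflags | TOY_RF_MASK) : eflags;
}
```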
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
xsave.flat checks that "executing the XSETBV instruction causes a general-
protection fault (#GP) if ECX = 0 and EAX[2:1] has the value 10b". QEMU allows
that option, so the test fails. Add the condition.
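
The quoted condition, EAX[2:1] = 10b, means the AVX bit is set while the
SSE bit is clear. A sketch of the test on EAX alone (the ECX = 0 part is
left out, and toy_xcr0_avx_without_sse() is a made-up name):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 2:1 of the requested XCR0 low word are exactly 10b. */
static bool toy_xcr0_avx_without_sse(uint32_t eax)
{
    return (eax & 0x6u) == 0x4u;
}
```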
Cc: qemu-stable@nongnu.org
Fixes: 892544317f ("target/i386: implement XSAVE and XRSTOR of AVX registers", 2022-10-18)
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DisasContext.cpuid_ext_features indicates CPUID.01H.ECX.
Use the DisasContext.cpuid_7_0_ecx_features field instead to check the RDPID
feature bit (CPUID_7_0_ECX_RDPID).
Fixes: 6750485bf4 ("target/i386: implement RDPID in TCG")
Inspired-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240603080723.1256662-1-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit fixes an issue with MOV instructions (0x8C and 0x8E)
involving segment registers: the source of a MOV to a segment register
is 16-bit, while a MOV from a segment register has to explicitly set the
memory operand size to 16 bits. Introduce a new flag,
X86_SPECIAL_Op0_Mw, to handle this correctly.
Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Message-ID: <20240602100528.2135717-1-lixinyu20s@ict.ac.cn>
Fixes: 5e9e21bcc4 ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The feature checks for CPUID_SSE and CPUID_SSE2 should use cpuid_features
rather than cpuid_ext_features.
Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240602100904.2137939-1-lixinyu20s@ict.ac.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The TCGCPUOps::cpu_exec_halt method is called from cpu_handle_halt()
when the CPU is halted, so that a target CPU emulation can do
anything target-specific it needs to do. (At the moment we only use
this on i386.)
The current specification of the method doesn't allow the target
specific code to do something different if the CPU is about to come
out of the halt state, because cpu_handle_halt() only determines this
after the method has returned. (If the method called cpu_has_work()
itself this would introduce a potential race if an interrupt arrived
between the target's method implementation checking and
cpu_handle_halt() repeating the check.)
Change the definition of the method so that it returns a bool to
tell cpu_handle_halt() whether to stay in halt or not.
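
The shape of the new contract can be sketched with toy types. ToyCPU,
toy_exec_halt_fn, toy_hook_has_work() and toy_handle_halt() are all
hypothetical, and the convention that the hook returns true to leave
halt is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyCPU {
    bool halted;
    bool has_work;
} ToyCPU;

/* Hook returns true if the CPU should leave the halted state. */
typedef bool (*toy_exec_halt_fn)(ToyCPU *cpu);

/* Example hook: leave halt exactly when there is pending work. */
static bool toy_hook_has_work(ToyCPU *cpu)
{
    return cpu->has_work;
}

/* Returns true if the CPU stays halted after this call. */
static bool toy_handle_halt(ToyCPU *cpu, toy_exec_halt_fn hook)
{
    if (cpu->halted) {
        /* The decision is made once, by the hook when one is provided. */
        bool leave = hook ? hook(cpu) : cpu->has_work;
        if (!leave) {
            return true;
        }
        cpu->halted = false;
    }
    return false;
}
```

Because the hook's return value is the single source of the decision,
the target code and the caller cannot disagree about whether the CPU is
leaving halt.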
We will want this for the Arm target, where FEAT_WFxT wants to do
some work only for the case where the CPU is in halt but about to
leave it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240430140035.3889879-2-peter.maydell@linaro.org
linux-user/i386: Fix allocation and alignment of fp state in signal frame
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZT2GwdHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV87pQf9F/cmrKQG1mVWKmJd
MI7l63lbxejdgAADv1nmro+oapCsJSaQeUSrYp904ydqJjVfBJkaoXfknGsvxrNA
oW7nEuYt0sBKdaBUKhYpMOJ3ivfw7lVVMJmjNv9ngZRhW+WOoJrBHoleUkVLiM7D
rxkMLL+LQ7BR9i0Lv1unorOkqUPGNOnEd45qRn6k1g/Qnqi8SNMzxFwO8+232u8m
EG9un/oh4mKPyb5vSg3Y4JLg+yDKCRScBqBU1wcKFe1u+umBkv2BNcU+k62AJh1q
bv8i1n+X/dFAd1aj0NEupi04EOZIof5m3T4YIWg7M4I94NiFWNZ18vgskkmiO+Mo
0KPd/A==
=sYrE
-----END PGP SIGNATURE-----
Merge tag 'pull-lu-20240526' of https://gitlab.com/rth7680/qemu into staging
target/i386: Introduce X86Access and use for xsave and friends
linux-user/i386: Fix allocation and alignment of fp state in signal frame
# gpg: Signature made Sun 26 May 2024 05:48:44 PM PDT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* tag 'pull-lu-20240526' of https://gitlab.com/rth7680/qemu: (28 commits)
target/i386: Pass host pointer and size to cpu_x86_{xsave,xrstor}
target/i386: Pass host pointer and size to cpu_x86_{fxsave,fxrstor}
target/i386: Pass host pointer and size to cpu_x86_{fsave,frstor}
target/i386: Convert do_xrstor to X86Access
target/i386: Convert do_xsave to X86Access
linux-user/i386: Honor xfeatures in xrstor_sigcontext
linux-user/i386: Fix allocation and alignment of fp state
linux-user/i386: Return boolean success from xrstor_sigcontext
linux-user/i386: Return boolean success from restore_sigcontext
linux-user/i386: Fix -mregparm=3 for signal delivery
linux-user/i386: Split out struct target_fregs_state
linux-user/i386: Replace target_fpstate_fxsave with X86LegacyXSaveArea
linux-user/i386: Remove xfeatures from target_fpstate_fxsave
linux-user/i386: Drop xfeatures_size from sigcontext arithmetic
target/i386: Add {hw,sw}_reserved to X86LegacyXSaveArea
target/i386: Add rbfm argument to cpu_x86_{xsave,xrstor}
target/i386: Split out do_xsave_chk
target/i386: Convert do_xrstor_* to X86Access
target/i386: Convert do_xsave_* to X86Access
tagret/i386: Convert do_fxsave, do_fxrstor to X86Access
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have already validated the memory region in the course of
validating the signal frame. No need to do it again within
the helper function.
In addition, return failure when the header contains invalid
xstate_bv. The kernel handles this via exception handling
within XSTATE_OP within xrstor_from_user_sigframe.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have already validated the memory region in the course of
validating the signal frame. No need to do it again within
the helper function.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have already validated the memory region in the course of
validating the signal frame. No need to do it again within
the helper function.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For now, continue to pass all 1's from signal.c.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This path is not required by user-only, and can in fact
be shared between xsave and xrstor.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The body of do_xrstor is now fully converted.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The body of do_xsave is now fully converted.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move the alignment fault from do_* to helper_*, as it need
not apply to usage from within user-only signal handling.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>