This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86. The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.
A fair bit of the code relates to the handling of CPUID features. The
CPUID code would probably be greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.
[ v2: don't change the definition of the qemu64 CPU shorthand, since
that breaks loading old snapshots. Per Anthony Liguori this can be
fixed once the CPU feature set is itself snapshotted.
Change the coding style slightly to conform to checkpatch.pl. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For all targets that currently call tcg_gen_debug_insn_start,
add CPU_LOG_TB_OP_OPT to the condition that gates it.
This is useful for comparing optimization dumps, when the
pre-optimization dump is merely noise.
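A minimal sketch of the resulting gate (the variable holding the guest PC differs per target; dc->pc here is only illustrative):

if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
    tcg_gen_debug_insn_start(dc->pc);   /* per-target PC variable */
}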
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> This instruction is always treated as a register-to-register (MOD = 11)
> instruction, regardless of the encoding of the MOD field in the MODR/M
> byte.
Also, Microport UNIX System V/386 v 2.1 (ca. 1987) runs fine on
real Intel 386 and 486 CPUs (at least), but does not run in QEMU without
this patch.
Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
Signed-off-by: malc <av1474@comtv.ru>
Add an explicit CPUX86State parameter instead of relying on AREG0.
Remove temporary wrappers and switch to AREG0 free mode.
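The conversion pattern is roughly the following (helper_example is a made-up name, shown only to illustrate the AREG0 removal):

/* before: the helper relies on the global env register (AREG0) */
void helper_example(void)
{
    env->regs[R_EAX] = 0;
}

/* after: the CPU state is passed explicitly */
void helper_example(CPUX86State *env)
{
    env->regs[R_EAX] = 0;
}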
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add an explicit CPUX86State parameter instead of relying on AREG0.
Rename remains of op_helper.c to seg_helper.c.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Make FPU helpers take a parameter for CPUState instead
of relying on global env.
Introduce temporary wrappers for FPU load and store ops. Remove
wrappers for non-AREG0 code. Don't call unconverted helpers
directly.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
According to the Intel manual
"Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3", "3.4.4 Segment Loading Instructions in IA-32e Mode":
"When in compatibility mode, FS and GS overrides operate as defined by
32-bit mode behavior regardless of the value loaded into the upper 32
linear-address bits of the hidden descriptor register base field.
Compatibility mode ignores the upper 32 bits when calculating an effective address."
However, the code misses the 64-bit mode case, where an instruction with
both an address-size override and a segment override would be translated
incorrectly. For example,
inc dword ptr gs:260h[ebx*4] gets incorrectly translated to:
(uint32_t)(gs.base + ebx * 4 + 0x260)
instead of
gs.base + (uint32_t)(ebx * 4 + 0x260)
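As an illustration, a minimal C model of the intended computation for a 32-bit address-size override in 64-bit mode (not the actual translate.c code; the function and parameter names are made up):

#include <stdint.h>

/* Truncate the 32-bit effective address first, then add the (possibly
 * 64-bit) segment base at full width. */
static uint64_t linear_address(uint64_t seg_base, uint32_t ebx, uint32_t disp)
{
    uint32_t ea = ebx * 4 + disp;   /* wraps at 32 bits, as the override requires */
    return seg_base + ea;
}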
Signed-off-by: Vitaly Chipounov <vitaly.chipounov@epfl.ch>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Rephrase some of the expressions used to select an entry
in the SSE op table arrays so that it's clearer that they
don't overrun the op table array size.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The X86_64_DEF macro is a confusing way of making some terms
in a conditional only appear if TARGET_X86_64 is defined. We
only use it in two places, and in both cases this is for making
the same test, so abstract that check out into a function
where we can use a more conventional #ifdef.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Commit 11f8cdb removed all the uses of the X86_64_ONLY
macro. The BUGGY_64() macro has been unused for a long time:
it originally marked some ops which couldn't be enabled
because of issues with the pre-TCG code generation scheme.
Remove the now-unnecessary definitions of both macros.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Commit c4baa0503d improved SSE table
type safety, which now raises compiler errors when the latest QEMU is
configured with --enable-debug.
Fix this by splitting the SSE tables even further to separate
helper functions with different signatures.
Instead of crashing by calling address 0, the code now jumps to
label illegal_op.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
SSE function tables could easily be corrupted because of the use
of void pointers.
Introduce function pointer types and helper variables in order
to improve type safety.
Split sse_op_table3 according to types used.
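The shape of the change is roughly the following (the typedef names here are assumed for illustration, not necessarily the ones used in translate.c):

/* One concrete pointer type per table signature instead of void *, so a
 * mismatched table entry is rejected at compile time. */
typedef void (*SSEFunc_0_pp)(TCGv_ptr reg_a, TCGv_ptr reg_b);
typedef void (*SSEFunc_0_pi)(TCGv_ptr reg, TCGv_i32 val);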
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add an explicit CPUX86State parameter instead of relying on AREG0.
Merge raise_exception_env() to raise_exception(), likewise with
raise_exception_err_env() and raise_exception_err().
Introduce cpu_svm_check_intercept_param() and cpu_vmexit()
as wrappers.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Scripted conversion:
sed -i "s/CPUState/CPUX86State/g" target-i386/*.[hc]
sed -i "s/#define CPUX86State/#define CPUState/" target-i386/cpu.h
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 2355c16e74 introduced a new ldmxcsr
helper taking an i32 argument, but the helper is actually passed a long.
Fix that by truncating the long to i32.
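A sketch of the fix (the temporary names are shown for illustration only):

/* Truncate the target-long value to 32 bits before calling the i32 helper. */
tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T[0]);
gen_helper_ldmxcsr(cpu_tmp2_i32);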
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
SSE rounding and flush-to-zero control has never been implemented. However,
given that softfloat-native used a single state for FPU and SSE, and that
glibc sets both FPU and SSE state in fesetround(), this was working
correctly up to the switch to softfloat.
Fix that by adding an update_sse_status() function similar to
update_fpu_status(), and calling it on writes to MXCSR.
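A minimal sketch of what such an update_sse_status() can look like (constant and field names follow softfloat conventions; the exact code may differ):

static void update_sse_status(CPUX86State *env)
{
    int rnd_mode;

    /* MXCSR bits 13-14: rounding control */
    switch ((env->mxcsr >> 13) & 3) {
    case 0: rnd_mode = float_round_nearest_even; break;
    case 1: rnd_mode = float_round_down; break;
    case 2: rnd_mode = float_round_up; break;
    default: rnd_mode = float_round_to_zero; break;
    }
    set_float_rounding_mode(rnd_mode, &env->sse_status);

    /* MXCSR bit 15: flush-to-zero */
    set_flush_to_zero((env->mxcsr >> 15) & 1, &env->sse_status);
}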
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When the i386 cmpxchg instruction is executed with a memory operand
and the comparison result is "unequal", do the memory write before
changing the accumulator instead of the other way around, because
otherwise the new accumulator value will incorrectly be used in the
comparison when the instruction is restarted after a page fault.
This bug was originally reported on 2010-04-25 as
https://bugs.launchpad.net/qemu/+bug/569760
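A plain C model of the restart-safe ordering (not the actual TCG code):

#include <stdint.h>

/* On mismatch, the (possibly faulting) memory write happens before the
 * accumulator is touched, so a restart after a page fault still compares
 * against the original accumulator value. */
static void cmpxchg32(uint32_t *mem, uint32_t *accum, uint32_t src)
{
    uint32_t old = *mem;

    if (old == *accum) {
        *mem = src;     /* equal: store the new value */
    } else {
        *mem = old;     /* write memory first; this is the faulting access */
        *accum = old;   /* only then update the accumulator */
    }
}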
Signed-off-by: Andreas Gustafsson <gson@gson.org>
T0 was already masked to 16 bits when loading it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Also remove two assert statements, which were the last remaining users.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The (x << (cl - 1)) quantity is only used if CL != 0. Move the
computation of that quantity nearer its use.
This avoids the creation of undefined TCG operations when the
constant propagation optimization proves that CL == 0, and thus
CL-1 is outside the range [0-wordsize).
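Conceptually (a plain C model, not the generated TCG ops):

#include <stdint.h>

/* The result is computed unconditionally, but the carry-source term only
 * on the cl != 0 path, where cl - 1 is a valid shift count. */
static void shl_flags_model(uint32_t x, unsigned cl,
                            uint32_t *result, uint32_t *cc_src)
{
    *result = x << cl;              /* cl is already masked to [0, 32) */
    if (cl != 0) {
        *cc_src = x << (cl - 1);    /* only needed, and only defined, here */
    }
}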
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: malc <av1474@comtv.ru>
While trying to use qemu -cpu pentium3 to test for incorrect uses of certain
SSE2 instructions, I found that QEMU allowed the mfence and lfence
instructions to be executed even though Pentium 3 doesn't support them.
According to the processor specs (and experience on a real Pentium 3), these
instructions are only available with SSE2, but QEMU is checking for SSE. The
check for the related sfence instruction is correct (it works with SSE).
This trivial patch fixes the test.
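The corrected gate has roughly this shape (a sketch; the field and constant names are as used elsewhere in target-i386):

/* lfence and mfence require SSE2; only sfence is an SSE-level instruction. */
if (!(s->cpuid_features & CPUID_SSE2)) {
    goto illegal_op;
}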
Signed-off-by: Martin Simmons <martin@lispworks.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Function gen_pc_load was introduced in commit
d2856f1ad4.
The only reason for parameter searched_pc was
a debug statement in target-i386/translate.c.
Parameter puc was needed by target-sparc until
commit d7da2a1040.
Remove searched_pc from the debug statement and remove both
parameters from the parameter list of gen_pc_load.
As the function name gen_pc_load was also misleading,
it is now called restore_state_to_opc. This new name
was suggested by Peter Maydell, thanks.
v2: Remove last parameter, too, and rename the function.
v3: Fix [] typo in target-arm/translate.c.
Fix wrong SHA1 object name in commit message (copy+paste error).
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
tcg_gen_exit_tb takes a parameter of type tcg_target_long,
so the type casts of pointer to long should be replaced by
type casts of pointer to tcg_target_long (suggested by Blue Swirl).
These changes are needed for build environments where
sizeof(long) != sizeof(void *), especially for w64.
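An illustrative before/after of one such cast (tb_num is just a placeholder):

tcg_gen_exit_tb((long)tb + tb_num);             /* truncates the pointer when sizeof(long) < sizeof(void *) */
tcg_gen_exit_tb((tcg_target_long)tb + tb_num);  /* safe on w64-style ABIs */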
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Use this for assignment to the low byte or low word of a register.
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
This patch simplifies target-i386/translate.c a bit by replacing some
code with gen_update_cc_op().
Signed-off-by: Jun Koi <junkoi2004@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch replaces the constant value assigned to (DisasContext
*)->is_jmp with DISAS_TB_JUMP.
Signed-off-by: Jun Koi <junkoi2004@gmail.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
SSSE3 uses tables with only two entries per op, but they are indexed
with b1, which can hold values up to 3. This happens when SSSE3
or SSE4 instructions are used with REP* prefixes.
Add bounds checking for this case.
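The added check is essentially of this form (a sketch):

/* The two-entry SSSE3/SSE4 tables are only valid for b1 < 2; reject
 * REP-prefixed encodings that would index past the end. */
if (b1 >= 2) {
    goto illegal_op;
}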
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We were ignoring REX_B while special-casing NOP, i.e. xchg eax,eax.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Negative four-byte displacements need to be sign-extended after
commit c086b783eb. Do so.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The proper logging for -d cpu is done in generic code.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Commit c22549204a caused movntps and
movntdq to be translated incorrectly.
Signed-off-by: TeLeMan <geleman@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
A SIB byte with an index of 4 means "no scaled index", even if the scale
value is not 0. In 64-bit mode, if REX.X is used, an index of 4 selects
%r12. This is correctly handled by the computation of the index variable,
which includes the index bits, and also the REX.X prefix:
index = ((code >> 3) & 7) | REX_X(s);
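A plain C model of how the decoded index is then used (regs and ea are illustrative, not the TCG code):

/* After merging REX.X above, index == 4 can only mean "no scaled index";
 * a set REX.X would have produced 12 (%r12), which is scaled normally. */
if (index != 4) {
    ea += regs[index] << scale;
}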
Thanks to Avi Kivity, Jamie Lokier and Malc for the analysis of the
problem and the initial patch.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Similarly to what is done in commit 32938e127f
for "jmp im", truncate the immediate to 32 bits when not running in 64-bit
mode.
Reported-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
lzcnt is an instruction added with the AMD Phenom/Barcelona, returning the
number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. Some more changes are needed, though, as lzcnt always
returns a valid value (unlike bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
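For reference, a plain C model of the difference in zero-input behaviour (not the QEMU helper):

#include <stdint.h>

/* lzcnt always produces a count (the operand size for a zero input),
 * whereas bsr leaves its destination undefined and sets ZF when the
 * input is zero. */
static uint32_t lzcnt32(uint32_t x)
{
    return x ? (uint32_t)__builtin_clz(x) : 32;
}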
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The arpl implementation in target-i386/translate.c uses the cpu_A0
temporary across a brcond op. This patch fixes that issue.
Signed-off-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This patch corrects the following aspects of exception generation in
fxsave/fxrstor:
* Generate #GP if the operand is not aligned to a 16-byte boundary
* Generate #UD if the LOCK prefix is used
* Generate #NM, not #UD, if CR0.EM = 1
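A rough model of the checks (plain C pseudocode; the raise_* names are placeholders and the ordering shown is only illustrative):

if (lock_prefix) {
    raise_UD();                    /* LOCK prefix -> #UD */
} else if (cr0_em) {
    raise_NM();                    /* CR0.EM = 1 -> #NM, not #UD */
} else if (operand_addr & 0xf) {
    raise_GP();                    /* operand not 16-byte aligned -> #GP */
}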
Signed-off-by: Kevin Wolf <mail@kevin-wolf.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
RDTSCP reads the time stamp counter and atomically also the content
of a 32-bit MSR, which can be freely set by the OS. This allows CPU
local data to be queried by userspace.
Linux uses this to allow a fast implementation of the getcpu()
syscall, which uses the vsyscall page to avoid a context switch.
AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
instruction.
RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).
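Architecturally, the instruction behaves roughly like this C model (not the QEMU helper):

#include <stdint.h>

/* EDX:EAX receive the time stamp counter, ECX receives the 32-bit TSC_AUX
 * MSR value previously programmed by the OS, read together without an
 * intervening reschedule. */
static void rdtscp_model(uint64_t tsc, uint32_t tsc_aux,
                         uint32_t *eax, uint32_t *edx, uint32_t *ecx)
{
    *eax = (uint32_t)tsc;
    *edx = (uint32_t)(tsc >> 32);
    *ecx = tsc_aux;
}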
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This adds support for the AMD Phenom/Barcelona's SSE4a instructions.
Those include insertq and extrq, which do shift-and-mask operations on
XMM registers, in two versions (with immediate shift/length values, or
with those values stored in another XMM register).
Additionally, it implements movntss and movntsd, which are scalar
non-temporal stores (avoiding cache thrashing). These are implemented
as normal stores, though.
SSE4a is guarded by the SSE4A CPUID bit (Fn8000_0001:ECX[6]).
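For illustration, a plain C model of the immediate form of extrq (not the QEMU helper; the zero-length-means-64 convention follows AMD's documentation):

#include <stdint.h>

/* Extract `length` bits starting at bit `index` from the low 64 bits of
 * the source and zero-extend them into the low quadword. Both fields come
 * from 6-bit immediates; a length of 0 is interpreted as 64. */
static uint64_t extrq_imm(uint64_t src_lo, unsigned length, unsigned index)
{
    uint64_t mask = (length == 0) ? UINT64_MAX : ((1ULL << length) - 1);
    return (src_lo >> index) & mask;
}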
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>