target/i386: add core of new i386 decoder

The new decoder is based on three principles:

- use mostly table-driven decoding, using tables derived as much as possible
  from the Intel manual. Centralizing the decoding of operands makes it
  more homogeneous, for example all immediates are signed. All modrm
  handling is in one function, and can be shared between SSE and ALU
  instructions (including XMM<->GPR instructions). The SSE/AVX decoder
  will also not have duplicated code between the 0F, 0F38 and 0F3A tables.

- keep the code as "non-branchy" as possible. Generally, the code for
  the new decoder is more verbose, but the control flow is simpler.
  Conditionals are not nested and have small bodies. All instruction
  groups are resolved even before operands are decoded, and code
  generation is separated as much as possible within small functions
  that only handle one instruction each.

- keep address generation and (for ALU operands) memory loads and writeback
  as much in common code as possible. All ALU operations for example
  are implemented as T0=f(T0,T1). For non-ALU instructions,
  read-modify-write memory operations are rare, but registers do not
  have TCGv equivalents: therefore, the common logic sets up pointer
  temporaries with the operands, while load and writeback are handled
  by gvec or by helpers.

These principles make future code review and extensibility simpler, at
the cost of having a relatively large amount of code in the form of this
patch. Even EVEX should not be _too_ hard to implement (it's just a crazy
large amount of possibilities).

This patch introduces the main decoder flow, and integrates the old
decoder with the new one. The old decoder takes care of parsing
prefixes and then optionally drops to the new one. The changes to the
old decoder are minimal and allow it to be replaced incrementally with
the new one.

There is a debugging mechanism through a "LIMIT" environment variable.
In user-mode emulation, the variable is the number of instructions
decoded by the new decoder before permanently switching to the old one.
In system emulation, the variable is the highest opcode that is decoded
by the new decoder (this is less friendly, but it's the best that can
be done without requiring deterministic execution).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-23 12:20:55 +03:00
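The first and third principles in the commit message can be illustrated with a small, self-contained C sketch: a table maps an opcode to a one-instruction emitter, while common code loads the operands and writes the result back as T0 = f(T0, T1). All names here (`DemoState`, `DemoEntry`, `demo_table`, `demo_exec`) and the opcode numbers are hypothetical and are not the identifiers used by the patch; the real decoder emits TCG ops instead of computing values directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: two temporaries, mirroring T0 = f(T0, T1). */
typedef struct DemoState {
    uint32_t T0, T1;
} DemoState;

typedef void (*DemoGen)(DemoState *s);

/* Each emitter handles exactly one instruction and contains no branches. */
static void demo_gen_ADD(DemoState *s) { s->T0 = s->T0 + s->T1; }
static void demo_gen_SUB(DemoState *s) { s->T0 = s->T0 - s->T1; }
static void demo_gen_XOR(DemoState *s) { s->T0 = s->T0 ^ s->T1; }

typedef struct DemoEntry {
    DemoGen gen;   /* NULL means "no entry for this opcode" */
} DemoEntry;

/* Illustrative table indexed by a made-up opcode number. */
static const DemoEntry demo_table[4] = {
    [0] = { demo_gen_ADD },
    [1] = { demo_gen_SUB },
    [2] = { demo_gen_XOR },
};

/*
 * Common code: load operands, dispatch through the table, write back.
 * Returns 0 on success, -1 if the opcode has no table entry.
 */
static int demo_exec(unsigned opcode, uint32_t *dst, uint32_t src)
{
    DemoState s;

    assert(dst != NULL);
    if (opcode >= sizeof(demo_table) / sizeof(demo_table[0]) ||
        demo_table[opcode].gen == NULL) {
        return -1;
    }
    s.T0 = *dst;                 /* load operand 0 */
    s.T1 = src;                  /* load operand 1 */
    demo_table[opcode].gen(&s);  /* T0 = f(T0, T1) */
    *dst = s.T0;                 /* writeback */
    return 0;
}
```

Because loading and writeback live in `demo_exec`, adding an instruction to this sketch only requires one new emitter and one new table row.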
/*
 * Decode table flags, mostly based on Intel SDM.
 *
 * Copyright (c) 2022 Red Hat, Inc.
 *
 * Author: Paolo Bonzini <pbonzini@redhat.com>
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 */

typedef enum X86OpType {
    X86_TYPE_None,

    X86_TYPE_A, /* Implicit */
    X86_TYPE_B, /* VEX.vvvv selects a GPR */
    X86_TYPE_C, /* REG in the modrm byte selects a control register */
    X86_TYPE_D, /* REG in the modrm byte selects a debug register */
    X86_TYPE_E, /* ALU modrm operand */
    X86_TYPE_F, /* EFLAGS/RFLAGS */
    X86_TYPE_G, /* REG in the modrm byte selects a GPR */
    X86_TYPE_H, /* For AVX, VEX.vvvv selects an XMM/YMM register */
    X86_TYPE_I, /* Immediate */
    X86_TYPE_J, /* Relative offset for a jump */
    X86_TYPE_L, /* The upper 4 bits of the immediate select a 128-bit register */
    X86_TYPE_M, /* modrm byte selects a memory operand */
    X86_TYPE_N, /* R/M in the modrm byte selects an MMX register */
    X86_TYPE_O, /* Absolute address encoded in the instruction */
    X86_TYPE_P, /* reg in the modrm byte selects an MMX register */
    X86_TYPE_Q, /* MMX modrm operand */
    X86_TYPE_R, /* R/M in the modrm byte selects a register */
    X86_TYPE_S, /* reg selects a segment register */
    X86_TYPE_U, /* R/M in the modrm byte selects an XMM/YMM register */
    X86_TYPE_V, /* reg in the modrm byte selects an XMM/YMM register */
    X86_TYPE_W, /* XMM/YMM modrm operand */
    X86_TYPE_X, /* string source */
    X86_TYPE_Y, /* string destination */

    /* Custom */
target/i386: move C0-FF opcodes to new decoder (except for x87)

The shift instructions are rewritten instead of reusing code from the old
decoder. Rotates use CC_OP_ADCOX more extensively and generally rely
more on the optimizer, so that the code generators are shared between
the immediate-count and variable-count cases.

In particular, this makes gen_RCL and gen_RCR pretty efficient for the
count == 1 case, which becomes (apart from a few extra movs) something like:

    (compute_cc_all if needed)
    // save old value for OF calculation
    mov     cc_src2, T0
    // the bulk of RCL is just this!
    deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1
    // compute carry
    shr     cc_dst, cc_src2, length - 1
    and     cc_dst, cc_dst, 1
    // compute overflow
    xor     cc_src2, cc_src2, T0
    extract cc_src2, cc_src2, length - 1, 1

32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-21 18:36:34 +03:00
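The count == 1 sequence quoted in the commit message can be modelled in plain C to check the flag arithmetic. This is only a sketch under stated assumptions: it fixes the operand length at 32 bits, takes the incoming carry as an explicit argument instead of reading it from cc_src, and uses the hypothetical names `RclResult` and `rcl1_32`; it is not the code that gen_RCL emits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct RclResult {
    uint32_t value;  /* rotated operand */
    unsigned cf;     /* new carry flag */
    unsigned of;     /* new overflow flag */
} RclResult;

/* RCL by 1 on a 32-bit operand; carry_in is the old CF (0 or 1). */
static RclResult rcl1_32(uint32_t t0, unsigned carry_in)
{
    RclResult r;
    uint32_t old = t0;                   /* mov: save old value for OF */

    assert(carry_in <= 1);
    r.value = (t0 << 1) | carry_in;      /* deposit: old CF lands in bit 0 */
    r.cf = (old >> 31) & 1;              /* shr + and: old MSB becomes CF */
    r.of = ((old ^ r.value) >> 31) & 1;  /* xor + extract: did the MSB change? */
    return r;
}
```

For example, `rcl1_32(0x40000000, 1)` yields the value 0x80000001 with CF clear and OF set, because the sign bit changed while a zero bit was shifted out.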
    X86_TYPE_EM, /* modrm byte selects an ALU memory operand */
2022-09-01 15:27:55 +03:00
    X86_TYPE_WM, /* modrm byte selects an XMM/YMM memory operand */
2023-10-11 11:55:16 +03:00
    X86_TYPE_I_unsigned, /* Immediate, zero-extended */
2024-05-06 18:34:55 +03:00
    X86_TYPE_nop, /* modrm operand decoded but not loaded into s->T{0,1} */
2022-08-23 12:20:55 +03:00
    X86_TYPE_2op, /* 2-operand RMW instruction */
    X86_TYPE_LoBits, /* encoded in bits 0-2 of the operand + REX.B */
    X86_TYPE_0, /* Hard-coded GPRs (RAX..RDI) */
    X86_TYPE_1,
    X86_TYPE_2,
    X86_TYPE_3,
    X86_TYPE_4,
    X86_TYPE_5,
    X86_TYPE_6,
    X86_TYPE_7,
    X86_TYPE_ES, /* Hard-coded segment registers */
    X86_TYPE_CS,
    X86_TYPE_SS,
    X86_TYPE_DS,
    X86_TYPE_FS,
    X86_TYPE_GS,
} X86OpType;

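The custom operand types at the end of X86OpType name a register without a modrm byte. The sketch below shows one way such types could be turned into a GPR index, assuming REX.B supplies bit 3 for the LoBits case and that the hard-coded types are consecutive enum values. The `DemoOpType` enum and `demo_gpr_index` are hypothetical and only mirror the relative order of the real enumerators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the operand types, in the same relative order. */
typedef enum DemoOpType {
    DEMO_TYPE_LoBits,
    DEMO_TYPE_0,
    DEMO_TYPE_1,
    DEMO_TYPE_2,
    DEMO_TYPE_3,
    DEMO_TYPE_4,
    DEMO_TYPE_5,
    DEMO_TYPE_6,
    DEMO_TYPE_7,
} DemoOpType;

/* Returns the GPR index (0 = RAX ... 7 = RDI, 8..15 with REX.B). */
static int demo_gpr_index(DemoOpType type, uint8_t opcode, bool rex_b)
{
    if (type == DEMO_TYPE_LoBits) {
        /* Low three bits of the byte, extended by REX.B as bit 3. */
        return (opcode & 7) | (rex_b ? 8 : 0);
    }
    /* Hard-coded registers: the index is the distance from DEMO_TYPE_0. */
    assert(type >= DEMO_TYPE_0 && type <= DEMO_TYPE_7);
    return type - DEMO_TYPE_0;
}
```

Keeping the hard-coded types consecutive is what makes the subtraction valid; reordering the enumerators would silently break it.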
typedef enum X86OpSize {
    X86_SIZE_None,

    X86_SIZE_a, /* BOUND operand */
    X86_SIZE_b, /* byte */
    X86_SIZE_d, /* 32-bit */
    X86_SIZE_dq, /* SSE/AVX 128-bit */
    X86_SIZE_p, /* Far pointer */
    X86_SIZE_pd, /* SSE/AVX packed double precision */
    X86_SIZE_pi, /* MMX */
    X86_SIZE_ps, /* SSE/AVX packed single precision */
    X86_SIZE_q, /* 64-bit */
    X86_SIZE_qq, /* AVX 256-bit */
    X86_SIZE_s, /* Descriptor */
    X86_SIZE_sd, /* SSE/AVX scalar double precision */
    X86_SIZE_ss, /* SSE/AVX scalar single precision */
    X86_SIZE_si, /* 32-bit GPR */
    X86_SIZE_v, /* 16/32/64-bit, based on operand size */
    X86_SIZE_w, /* 16-bit */
    X86_SIZE_x, /* 128/256-bit, based on operand size */
    X86_SIZE_y, /* 32/64-bit, based on operand size */
    X86_SIZE_z, /* 16-bit for 16-bit operand size, else 32-bit */
2023-10-21 18:36:34 +03:00
    X86_SIZE_z_f64, /* 32-bit for 32-bit operand size or 64-bit mode, else 16-bit */
2022-08-23 12:20:55 +03:00

    /* Custom */
    X86_SIZE_d64,
    X86_SIZE_f64,
2023-08-29 19:25:46 +03:00
    X86_SIZE_xh, /* SSE/AVX packed half register */
2022-08-23 12:20:55 +03:00
} X86OpSize;

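Several X86OpSize codes do not name a fixed width but depend on the effective operand size. The following sketch resolves a few of them to a width in bits, following the comments in the enum (v is 16/32/64, y is 32/64, z is 16 or 32). `DemoOpSize` and `demo_size_bits` are hypothetical names covering only a subset of the codes, and the function is not the decoder's own size-resolution routine.

```c
#include <assert.h>

/* Hypothetical subset of the size codes. */
typedef enum DemoOpSize {
    DEMO_SIZE_b,  /* byte */
    DEMO_SIZE_w,  /* 16-bit */
    DEMO_SIZE_d,  /* 32-bit */
    DEMO_SIZE_q,  /* 64-bit */
    DEMO_SIZE_v,  /* 16/32/64-bit, based on operand size */
    DEMO_SIZE_y,  /* 32/64-bit, based on operand size */
    DEMO_SIZE_z,  /* 16-bit for 16-bit operand size, else 32-bit */
} DemoOpSize;

/*
 * Resolve a size code to a width in bits. operand_bits is the effective
 * operand size of the instruction: 16, 32 or 64.
 */
static int demo_size_bits(DemoOpSize size, int operand_bits)
{
    assert(operand_bits == 16 || operand_bits == 32 || operand_bits == 64);
    switch (size) {
    case DEMO_SIZE_b: return 8;
    case DEMO_SIZE_w: return 16;
    case DEMO_SIZE_d: return 32;
    case DEMO_SIZE_q: return 64;
    case DEMO_SIZE_v: return operand_bits;
    case DEMO_SIZE_y: return operand_bits == 64 ? 64 : 32;
    case DEMO_SIZE_z: return operand_bits == 16 ? 16 : 32;
    }
    return -1;  /* unknown size code */
}
```

With a 16-bit operand size, v and z both resolve to 16 bits while y still resolves to 32 bits, which is the distinction the three codes exist to express.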
2022-09-01 15:51:35 +03:00
typedef enum X86CPUIDFeature {
    X86_FEAT_None,
2022-09-06 00:27:53 +03:00
    X86_FEAT_3DNOW,
2022-09-01 15:51:35 +03:00
    X86_FEAT_ADX,
    X86_FEAT_AES,
    X86_FEAT_AVX,
    X86_FEAT_AVX2,
    X86_FEAT_BMI1,
    X86_FEAT_BMI2,
2023-10-11 12:51:58 +03:00
    X86_FEAT_CMOV,
2023-10-10 11:31:39 +03:00
    X86_FEAT_CMPCCXADD,
2022-10-19 14:22:06 +03:00
    X86_FEAT_F16C,
2022-10-19 14:22:06 +03:00
    X86_FEAT_FMA,
2022-09-01 15:51:35 +03:00
    X86_FEAT_MOVBE,
    X86_FEAT_PCLMULQDQ,
2023-10-10 11:31:17 +03:00
    X86_FEAT_SHA_NI,
2022-09-01 15:51:35 +03:00
    X86_FEAT_SSE,
    X86_FEAT_SSE2,
    X86_FEAT_SSE3,
    X86_FEAT_SSSE3,
    X86_FEAT_SSE41,
    X86_FEAT_SSE42,
    X86_FEAT_SSE4A,
} X86CPUIDFeature;

2022-08-23 12:20:55 +03:00
/* Execution flags */

typedef enum X86OpUnit {
    X86_OP_SKIP, /* not valid or managed by emission function */
    X86_OP_SEG, /* segment selector */
    X86_OP_CR, /* control register */
    X86_OP_DR, /* debug register */
    X86_OP_INT, /* loaded into/stored from s->T0/T1 */
    X86_OP_IMM, /* immediate */
    X86_OP_SSE, /* address in either s->ptrX or s->A0 depending on has_ea */
    X86_OP_MMX, /* address in either s->ptrX or s->A0 depending on has_ea */
} X86OpUnit;

2023-10-09 18:43:12 +03:00
typedef enum X86InsnCheck {
    /* Illegal or exclusive to 64-bit mode */
    X86_CHECK_i64 = 1,
    X86_CHECK_o64 = 2,

    /* Fault outside protected mode */
    X86_CHECK_prot = 4,

    /* Privileged instruction checks */
    X86_CHECK_cpl0 = 8,
    X86_CHECK_vm86_iopl = 16,
    X86_CHECK_cpl_iopl = 32,
    X86_CHECK_iopl = X86_CHECK_cpl_iopl | X86_CHECK_vm86_iopl,

    /* Fault if VEX.L=1 */
    X86_CHECK_VEX128 = 64,
2023-10-09 19:16:27 +03:00

    /* Fault if VEX.W=1 */
    X86_CHECK_W0 = 128,

    /* Fault if VEX.W=0 */
    X86_CHECK_W1 = 256,
2023-10-09 18:43:12 +03:00
} X86InsnCheck;

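Since the X86InsnCheck values are distinct bits, one table entry can request several checks at once and the decoder can test each bit independently. The sketch below evaluates four of them against a simplified CPU mode. It assumes that i64 means "invalid in 64-bit mode" and o64 means "valid only in 64-bit mode", and it collapses every failure into a single boolean, whereas a real decoder would raise specific exceptions. `DemoCpuMode`, the `DEMO_CHECK_*` constants and `demo_checks_pass` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit values as the corresponding X86_CHECK_* flags. */
enum {
    DEMO_CHECK_i64  = 1,
    DEMO_CHECK_o64  = 2,
    DEMO_CHECK_prot = 4,
    DEMO_CHECK_cpl0 = 8,
};

/* Hypothetical, simplified view of the CPU mode. */
typedef struct DemoCpuMode {
    bool code64;          /* executing 64-bit code */
    bool protected_mode;  /* protected mode enabled */
    int cpl;              /* current privilege level, 0..3 */
} DemoCpuMode;

/* Returns true if every requested check passes in the given mode. */
static bool demo_checks_pass(unsigned check, const DemoCpuMode *m)
{
    assert(m != NULL);
    if ((check & DEMO_CHECK_i64) && m->code64) {
        return false;   /* instruction does not exist in 64-bit mode */
    }
    if ((check & DEMO_CHECK_o64) && !m->code64) {
        return false;   /* instruction exists only in 64-bit mode */
    }
    if ((check & DEMO_CHECK_prot) && !m->protected_mode) {
        return false;   /* faults outside protected mode */
    }
    if ((check & DEMO_CHECK_cpl0) && m->cpl != 0) {
        return false;   /* privileged instruction */
    }
    return true;
}
```

Each condition is a flat, independent test, in keeping with the commit message's goal of conditionals that are not nested and have small bodies.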
2022-08-23 12:20:55 +03:00
typedef enum X86InsnSpecial {
    X86_SPECIAL_None,

2023-10-19 15:22:53 +03:00
    /* Accepts LOCK prefix; LOCKed operations do not load or writeback operand 0 */
    X86_SPECIAL_HasLock,

2022-08-23 12:20:55 +03:00
    /* Always locked if it has a memory operand (XCHG) */
    X86_SPECIAL_Locked,

2023-10-11 11:55:16 +03:00
    /* Do not apply segment base to effective address */
    X86_SPECIAL_NoSeg,
2022-08-23 12:20:55 +03:00
    /*
2023-10-20 00:14:34 +03:00
     * Rd/Mb or Rd/Mw in the manual: register operand 0 is treated as 32 bits
     * (and writeback zero-extends it to 64 bits if applicable). PREFIX_DATA
     * does not trigger 16-bit writeback and, as a side effect, high-byte
     * registers are never used.
2022-08-23 12:20:55 +03:00
     */
2023-10-20 00:14:34 +03:00
    X86_SPECIAL_Op0_Rd,

    /*
     * Ry/Mb in the manual (PINSRB). However, the high bits are never used by
     * the instruction in either the register or memory cases; the *real* effect
     * of this modifier is that high-byte registers are never used, even without
     * a REX prefix. Therefore, PINSRW does not need it despite having Ry/Mw.
     */
    X86_SPECIAL_Op2_Ry,
2022-08-23 12:20:55 +03:00
|
|
|
|
target/i386: reimplement 0x0f 0x38, add AVX
There are several special cases here:
1) extending moves have different widths for the helpers vs. for the
memory loads, and the width for memory loads depends on VEX.L too.
This is represented by X86_SPECIAL_AVXExtMov.
2) some instructions, such as variable-width shifts, select the vector element
size via REX.W.
3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group,
and they have (among other things) two output operands.
4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be
extended to support 2-operand blends. The 2-operand variant actually
came a few years earlier, but it is clearer to implement them in the
opposite order.
X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers
that accept a Reg* but have an M argument.
These three-byte opcodes also include AVX new instructions, for which
the helpers were originally implemented by Paul Brook <paul@nowt.org>.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-14 19:52:44 +03:00
    /*
     * Register operand 2 is extended to full width, while a memory operand
     * is doubled in size if VEX.L=1.
     */
    X86_SPECIAL_AVXExtMov,
2022-08-23 12:20:55 +03:00
    /*
     * MMX instruction exists with no prefix; if there is no prefix, V/H/W/U operands
     * become P/P/Q/N, and size "x" becomes "q".
     */
    X86_SPECIAL_MMX,
2023-10-19 16:40:54 +03:00
    /* When loaded into s->T0, register operand 1 is zero/sign extended. */
    X86_SPECIAL_SExtT0,
    X86_SPECIAL_ZExtT0,
2024-06-02 13:05:28 +03:00
    /* Memory operand size of MOV from segment register is MO_16 */
    X86_SPECIAL_Op0_Mw,
2022-08-23 12:20:55 +03:00
} X86InsnSpecial;
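The X86_SPECIAL_MMX rule in the enum above (V/H/W/U become P/P/Q/N and size "x" becomes "q" when there is no prefix) can be sketched as a pair of lookups. This is only an illustration under stated assumptions: the helper names `mmx_remap_type` and `mmx_remap_size` are hypothetical, and the sketch works on the manual's operand letters as characters, whereas the decoder itself uses X86OpType and X86OpSize values.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the decoder): remap an SSE operand-type
 * letter to its MMX counterpart when the instruction has no prefix.
 */
static char mmx_remap_type(char type, bool has_prefix)
{
    if (has_prefix) {
        return type;            /* prefixed form keeps the SSE operand type */
    }
    switch (type) {
    case 'V': return 'P';
    case 'H': return 'P';
    case 'W': return 'Q';
    case 'U': return 'N';
    default:  return type;      /* other operand types are left alone */
    }
}

/* Hypothetical helper: size "x" becomes "q" when there is no prefix. */
static char mmx_remap_size(char size, bool has_prefix)
{
    return (!has_prefix && size == 'x') ? 'q' : size;
}
```

For example, an entry written as V,H,W with size "x" would be read as P,P,Q with size "q" when no prefix is present, and left unchanged otherwise.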
2022-09-18 01:43:52 +03:00
/*
 * Special cases for instructions that operate on XMM/YMM registers. Intel
 * retconned all of them to have VEX exception classes other than 0 and 13, so
 * all these only matter for instructions that have a VEX exception class.
 * Based on tables in the "AVX and SSE Instruction Exception Specification"
 * section of the manual.
 */
typedef enum X86VEXSpecial {
    /* Legacy SSE instructions that allow unaligned operands */
    X86_VEX_SSEUnaligned,

    /*
     * Used for instructions that distinguish the XMM operand type with an
     * instruction prefix; legacy SSE encodings will allow unaligned operands
     * for scalar operands only (identified by a REP prefix). In this case,
     * the decoding table uses "x" for the vector operands instead of specifying
     * pd/ps/sd/ss individually.
     */
    X86_VEX_REPScalar,
    /*
     * VEX instructions that only support 256-bit operands with AVX2 (Table 2-17
     * column 3). Columns 2 and 4 (instructions limited to 256- and 128-bit
     * operands respectively) are implicit in the presence of dq and qq
     * operands, and thus handled by decode_op_size.
     */
    X86_VEX_AVX2_256,
} X86VEXSpecial;
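As a rough illustration of what X86_VEX_AVX2_256 implies, the check below rejects a 256-bit (VEX.L=1) encoding when AVX2 is not available. It is a minimal sketch, not the decoder's actual code: the enum `VexSpecial`, the function `vex_l_is_illegal`, and the `cpu_has_avx2` flag are hypothetical stand-ins, and the real check also depends on decoder state that is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant X86VEXSpecial values. */
typedef enum {
    VEX_SPECIAL_NONE,
    VEX_SPECIAL_AVX2_256,
} VexSpecial;

/*
 * Returns true if the encoding should be treated as illegal (#UD):
 * the instruction only supports 256-bit operands with AVX2, VEX.L
 * requests 256 bits, and the CPU model lacks AVX2.
 */
static bool vex_l_is_illegal(VexSpecial special, bool vex_l, bool cpu_has_avx2)
{
    return special == VEX_SPECIAL_AVX2_256 && vex_l && !cpu_has_avx2;
}
```

The 128-bit form (VEX.L=0) passes this check regardless of AVX2 support, which matches the comment's wording that only the 256-bit operands depend on AVX2.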
2022-08-23 12:20:55 +03:00
typedef struct X86OpEntry X86OpEntry;
typedef struct X86DecodedInsn X86DecodedInsn;

/* Decode function for multibyte opcodes. */
typedef void (*X86DecodeFunc)(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b);

/* Code generation function. */
typedef void (*X86GenFunc)(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode);

struct X86OpEntry {
    /* Based on the is_decode flags. */
    union {
        X86GenFunc gen;
        X86DecodeFunc decode;
    };
    /* op0 is always written, op1 and op2 are always read. */
    X86OpType op0:8;
    X86OpSize s0:8;
    X86OpType op1:8;
    X86OpSize s1:8;
    X86OpType op2:8;
    X86OpSize s2:8;
    /* Must be I and b respectively if present. */
    X86OpType op3:8;
    X86OpSize s3:8;

    X86InsnSpecial special:8;
2022-09-01 15:51:35 +03:00
    X86CPUIDFeature cpuid:8;
2022-09-18 01:43:52 +03:00
    unsigned vex_class:8;
    X86VEXSpecial vex_special:8;
2023-10-09 18:43:12 +03:00
    unsigned valid_prefix:16;
    unsigned check:16;
    unsigned intercept:8;
2022-08-23 12:20:55 +03:00
    bool is_decode:1;
};
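The union in X86OpEntry is discriminated by is_decode: an entry either names a code generator directly or names a decode function that reads further opcode bytes and overwrites the entry. The sketch below shows that pattern with simplified, hypothetical types (`OpEntry`, `GenFunc`, `DecodeFunc`, `resolve`); the real callbacks also take DisasContext and CPUX86State arguments, and the byte fetch here is a stand-in rather than the decoder's own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct OpEntry OpEntry;
typedef void (*GenFunc)(int *calls);                     /* simplified generator */
typedef void (*DecodeFunc)(OpEntry *entry, uint8_t *b);  /* simplified decoder */

struct OpEntry {
    union {                 /* which member is valid depends on is_decode */
        GenFunc gen;
        DecodeFunc decode;
    };
    bool is_decode:1;
};

/* Hypothetical generator: just counts how often it was invoked. */
static void gen_nop(int *calls)
{
    (*calls)++;
}

/* Hypothetical second-level decoder: "fetches" a byte, then fills the entry. */
static void decode_second_byte(OpEntry *entry, uint8_t *b)
{
    *b = 0x01;                  /* stand-in for reading the next opcode byte */
    entry->gen = gen_nop;       /* overwrite the union with the final handler */
    entry->is_decode = false;
}

/* Follow decode entries until a code generation function is reached. */
static GenFunc resolve(OpEntry entry, uint8_t *b)
{
    while (entry.is_decode) {
        entry.decode(&entry, b);
    }
    return entry.gen;
}
```

Keeping both pointers in one union means a table slot costs the same whether it is a leaf instruction or an escape to another table; the price is that callers must test is_decode before touching either member.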
typedef struct X86DecodedOp {
    int8_t n;
    MemOp ot; /* For b/c/d/p/s/q/v/w/y/z */
    X86OpUnit unit;
    bool has_ea;
2022-08-23 15:55:56 +03:00
    int offset; /* For MMX and SSE */
2023-10-23 09:41:39 +03:00
    union {
        target_ulong imm;
        /*
         * This field is used internally by macros OP0_PTR/OP1_PTR/OP2_PTR,
         * do not access directly!
         */
        TCGv_ptr v_ptr;
    };
2022-08-23 12:20:55 +03:00
} X86DecodedOp;

struct X86DecodedInsn {
    X86OpEntry e;
    X86DecodedOp op[3];
2023-10-23 09:41:39 +03:00
    /*
     * Rightmost immediate, for convenience since most instructions have
     * one (and also for 4-operand instructions).
     */
2022-08-23 12:20:55 +03:00
    target_ulong immediate;
    AddressParts mem;
2023-10-11 16:26:40 +03:00
    TCGv cc_dst, cc_src, cc_src2;
    TCGv_i32 cc_op_dynamic;
    int8_t cc_op;
2022-08-23 12:20:55 +03:00
    uint8_t b;
};