mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Qi Hu	1215317510	target/i386: Fix calculation of LOCK NEG eflags After: lock negl -0x14(%rbp) pushf pop %rax %rax will contain the wrong value because the "lock neg" calculates the wrong eflags. Simple test: #include <assert.h> int main() { __volatile__ unsigned test = 0x2363a; __volatile__ char cond = 0; asm( "lock negl %0 \n\t" "sets %1" : "=m"(test), "=r"(cond)); assert(cond & 1); return 0; } Reported-by: Jinyang Shen <shenjinyang@loongson.cn> Co-Developed-by: Xuehai Chen <chenxuehai@loongson.cn> Signed-off-by: Xuehai Chen <chenxuehai@loongson.cn> Signed-off-by: Qi Hu <huqi@loongson.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-31 09:46:34 +01:00
Stefan Hajnoczi	08a5d04606	Revert incorrect cflags initialization. Add direct jumps for tcg/loongarch64. Speed up breakpoint check. Improve assertions for atomic.h. Move restore_state_to_opc to TCGCPUOps. Cleanups to TranslationBlock maintenance. -----BEGIN PGP SIGNATURE----- iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmNYlo4dHHJpY2hhcmQu aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9y2wf9EKsCA6VtYI2Qtftf q/ujYFmUf8AKTb9eVcA0XX71CT1dEnFR7GQyT8B8X13x0pSbOX7tbEWHPreegTFV tESiejvymi6Q9devAB58GVwNoU/zPIQQGhCPxkVUKDmRztJz22MbGUzd7UKPPgU8 2nVMkIpLTMBsKeFLxE/D3ZntmdKsgyI/1Dtkl9TxvlDGsCbMjbNcr8lM+TLaG2oX GZhFyJHKEVy0cobukvhhb/9rU7AWdG/BnFmZM16JxvHV/YCwJBx3Udhcy9xPePUU yIjkGsUAq4aB6H9RFuTWh7GmaY5u6gMbTTi2J7hDos0mzauYJtpgEB/H42LpycGE sOhkLQ== =DUb8 -----END PGP SIGNATURE----- Merge tag 'pull-tcg-20221026' of https://gitlab.com/rth7680/qemu into staging Revert incorrect cflags initialization. Add direct jumps for tcg/loongarch64. Speed up breakpoint check. Improve assertions for atomic.h. Move restore_state_to_opc to TCGCPUOps. Cleanups to TranslationBlock maintenance. # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmNYlo4dHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9y2wf9EKsCA6VtYI2Qtftf # q/ujYFmUf8AKTb9eVcA0XX71CT1dEnFR7GQyT8B8X13x0pSbOX7tbEWHPreegTFV # tESiejvymi6Q9devAB58GVwNoU/zPIQQGhCPxkVUKDmRztJz22MbGUzd7UKPPgU8 # 2nVMkIpLTMBsKeFLxE/D3ZntmdKsgyI/1Dtkl9TxvlDGsCbMjbNcr8lM+TLaG2oX # GZhFyJHKEVy0cobukvhhb/9rU7AWdG/BnFmZM16JxvHV/YCwJBx3Udhcy9xPePUU # yIjkGsUAq4aB6H9RFuTWh7GmaY5u6gMbTTi2J7hDos0mzauYJtpgEB/H42LpycGE # sOhkLQ== # =DUb8 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 25 Oct 2022 22:08:14 EDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * tag 'pull-tcg-20221026' of https://gitlab.com/rth7680/qemu: (47 commits) accel/tcg: Remove restore_state_to_opc function target/xtensa: Convert to tcg_ops restore_state_to_opc target/tricore: Convert to tcg_ops restore_state_to_opc target/sparc: Convert to tcg_ops restore_state_to_opc target/sh4: Convert to tcg_ops restore_state_to_opc target/s390x: Convert to tcg_ops restore_state_to_opc target/rx: Convert to tcg_ops restore_state_to_opc target/riscv: Convert to tcg_ops restore_state_to_opc target/ppc: Convert to tcg_ops restore_state_to_opc target/openrisc: Convert to tcg_ops restore_state_to_opc target/nios2: Convert to tcg_ops restore_state_to_opc target/mips: Convert to tcg_ops restore_state_to_opc target/microblaze: Convert to tcg_ops restore_state_to_opc target/m68k: Convert to tcg_ops restore_state_to_opc target/loongarch: Convert to tcg_ops restore_state_to_opc target/i386: Convert to tcg_ops restore_state_to_opc target/hppa: Convert to tcg_ops restore_state_to_opc target/hexagon: Convert to tcg_ops restore_state_to_opc target/cris: Convert to tcg_ops restore_state_to_opc target/avr: Convert to tcg_ops restore_state_to_opc ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 10:53:41 -04:00
Richard Henderson	434382e640	target/i386: Convert to tcg_ops restore_state_to_opc Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2022-10-26 11:11:28 +10:00
Stefan Hajnoczi	79fc2fb685	Pull request -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmNXleQSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748TIsP/1gulTFpYAs3Kao6IZonsuCzrjQrJWqv 5SD7cVb7isOWdOSNK3glE4dG54Q38PaS9GHaCvzIndjHxlWddCCUuwiw6p1Wdo70 fjNfcCOEPoalQbkZvLejhs5n2rlfTvS5JUnLKVD9+ton7hjnTyKGDDYao5mYhtzv Kn9NpCD3m+K3orzG2Jj7jR1UAumg4cW4YQEpT8ItDT4Y5UAxjL6TZQ6CE220DQDq YwDrHEgDYr/UKlTbIC/JwlKOLr0sh+UB1VV8GZS6e6pU9u5WpDDHlQZpU8W2tLLg cG5m8tLG2avFxRMUFrPNZ8Lx2xKO8wL1PtgAO9w7qFK+r0soZvv+Zh4ev/t5zGLf ciliItqf97yPYNIc3su75jqdQHed7lmZc3m9LBHg8VXN6rAatt8vWUbG90sAZuTU tWBZHvQmG0s2MK4UYqeQ59tc21v9T2+VCiiv/1vjgEUr8tBhXS562jrDt/bNEqKa eRzT4h4ffbP6BJRnyakxkFkQ7nd2OdlLNKUAr9Tk6T2fYuarfEdbYx//0950agqD AAtdQ/AJm6Pq1Px0/RuMKK5WsL818BoAkfr6n7qXleunytJ1W5hjW9EmFIPZWPTR ce/lSFHA0+MCpg6C8zAa4iNBg/Pk0p3GRrTeWyHK1FjV+Gep1QtE/a1vk/qiPzTM qZVfPxa8cXXe =caiq -----END PGP SIGNATURE----- Merge tag 'trivial-branch-for-7.2-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging Pull request # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmNXleQSHGxhdXJlbnRA # dml2aWVyLmV1AAoJEPMMOL0/L748TIsP/1gulTFpYAs3Kao6IZonsuCzrjQrJWqv # 5SD7cVb7isOWdOSNK3glE4dG54Q38PaS9GHaCvzIndjHxlWddCCUuwiw6p1Wdo70 # fjNfcCOEPoalQbkZvLejhs5n2rlfTvS5JUnLKVD9+ton7hjnTyKGDDYao5mYhtzv # Kn9NpCD3m+K3orzG2Jj7jR1UAumg4cW4YQEpT8ItDT4Y5UAxjL6TZQ6CE220DQDq # YwDrHEgDYr/UKlTbIC/JwlKOLr0sh+UB1VV8GZS6e6pU9u5WpDDHlQZpU8W2tLLg # cG5m8tLG2avFxRMUFrPNZ8Lx2xKO8wL1PtgAO9w7qFK+r0soZvv+Zh4ev/t5zGLf # ciliItqf97yPYNIc3su75jqdQHed7lmZc3m9LBHg8VXN6rAatt8vWUbG90sAZuTU # tWBZHvQmG0s2MK4UYqeQ59tc21v9T2+VCiiv/1vjgEUr8tBhXS562jrDt/bNEqKa # eRzT4h4ffbP6BJRnyakxkFkQ7nd2OdlLNKUAr9Tk6T2fYuarfEdbYx//0950agqD # AAtdQ/AJm6Pq1Px0/RuMKK5WsL818BoAkfr6n7qXleunytJ1W5hjW9EmFIPZWPTR # ce/lSFHA0+MCpg6C8zAa4iNBg/Pk0p3GRrTeWyHK1FjV+Gep1QtE/a1vk/qiPzTM # qZVfPxa8cXXe # =caiq # -----END PGP SIGNATURE----- # gpg: Signature made Tue 25 Oct 2022 03:53:08 EDT # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * tag 'trivial-branch-for-7.2-pull-request' of https://gitlab.com/laurent_vivier/qemu: accel/tcg/tcg-accel-ops-rr: fix trivial typo ui: remove useless typecasts treewide: Remove the unnecessary space before semicolon include/hw/scsi/scsi.h: Remove unused scsi_legacy_handle_cmdline() prototype vmstate-static-checker:remove this redundant return tests/qtest: vhost-user-test: Fix [-Werror=format-overflow=] build warning tests/qtest: migration-test: Fix [-Werror=format-overflow=] build warning Drop useless casts from g_malloc() & friends to pointer elf2dmp: free memory in failure hw/core: Tidy up unnecessary casting away of const .gitignore: add multiple items to .gitignore Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-25 11:37:17 -04:00
Markus Armbruster	0a553c12c7	Drop useless casts from g_malloc() & friends to pointer These memory allocation functions return void *, and casting to another pointer type is useless clutter. Drop these casts. If you really want another pointer type, consider g_new(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220923120025.448759-3-armbru@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2022-10-22 23:15:40 +02:00
Paolo Bonzini	2872b0f390	target/i386: implement FMA instructions The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implement with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there is no separate XMM and YMM of the scalar helpers). First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c. Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-22 09:05:54 +02:00
Paolo Bonzini	cf5ec6641e	target/i386: implement F16C instructions F16C only consists of two instructions, which are a bit peculiar nevertheless. First, they access only the low half of an YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand. Second, VCVTPS2PH is somewhat weird because it stores the result of the instruction into memory rather than loading it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:16:18 +02:00
Paolo Bonzini	314d3eff66	target/i386: introduce function to set rounding mode from FPCW or MXCSR bits VROUND, FSTCW and STMXCSR all have to perform the same conversion from x86 rounding modes to softfloat constants. Since the ISA is consistent on the meaning of the two-bit rounding modes, extract the common code into a wrapper for set_float_rounding_mode. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:16:13 +02:00
Paolo Bonzini	0d4bcac3ca	target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1] If the destination is a memory register, op->n is -1. Going through tcg_gen_gvec_dup_imm path is both useless (the value has been stored by the gen_* function already) and wrong because of the out-of-bounds access. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:15:50 +02:00
Paolo Bonzini	653fad2497	target/i386: remove old SSE decoder With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	71a0891d61	target/i386: move 3DNow to the new decoder This adds another kind of weirdness when you thought you had seen it all: an opcode byte that comes _after_ the address, not before. It's not worth adding a new X86_SPECIAL_* constant for it, but it's actually not unlike VCMP; so, forgive me for exploiting the similarity and just deciding to dispatch to the right gen_helper_* call in a single code generation function. In fact, the old decoder had a bug where s->rip_offset should have been set to 1 for 3DNow! instructions, and it's fixed now. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paul Brook	2f8a21d8ff	target/i386: Enable AVX cpuid bits when using TCG Include AVX, AVX2 and VAES in the guest cpuid features supported by TCG. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-40-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	57f6bba023	target/i386: implement VLDMXCSR/VSTMXCSR These are exactly the same as the non-VEX version, but one has to be careful that only VEX.L=0 is allowed. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	892544317f	target/i386: implement XSAVE and XRSTOR of AVX registers Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	f8d19eec0d	target/i386: reimplement 0x0f 0x28-0x2f, add AVX Here the code is a bit uglier due to the truncation and extension of registers to and from 32-bit. There is also a mistake in the manual with respect to the size of the memory operand of CVTPS2PI and CVTTPS2PI, reported by Ricky Zhou. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7170a17ec3	target/i386: reimplement 0x0f 0x10-0x17, add AVX These are mostly moves, and yet are a total pain. The main issue is that: 1) some instructions are selected by mod==11 (register operand) vs. mod=00/01/10 (memory operand) 2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case 3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS. VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however. For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	aba2b8ecb9	target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX Nothing special going on here, for once. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	16fc5726a6	target/i386: reimplement 0x0f 0x38, add AVX There are several special cases here: 1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov. 2) some instructions, such as variable-width shifts, select the vector element size via REX.W. 3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands. 3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order. X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument. These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Richard Henderson	d4af67a27a	target/i386: Use tcg gvec ops for pmovmskb As pmovmskb is used by strlen et al, this is the third highest overhead sse operation at %0.8. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> [Reorganize to generate code for any vector size. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7906847768	target/i386: reimplement 0x0f 0x3a, add AVX The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rater than in the prefixes. These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	a64eee3ab4	target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes Three-byte opcodes from the 0F3Ah area all have an immediate byte which is usually unsigned. Clarify in the helper code that it is unsigned; the new decoder treats immediates as signed by default, and seeing an intN_t in the prototype might give the wrong impression that one can use decode->immediate directly. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	6bbeb98d10	target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial. For LDDQU, using gen_load_sse directly might corrupt the register if the second part of the load fails. Therefore, add a custom X86_TYPE_WM value; like X86_TYPE_W it does call gen_load(), but it also rejects a value of 11 in the ModRM field like X86_TYPE_M. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	ce4fcb9478	target/i386: reimplement 0x0f 0x70-0x77, add AVX This includes shifts by immediate, which use bits 3-5 of the ModRM byte as an opcode extension. With the exception of 128-bit shifts, they are implemented using gvec. This also covers VZEROALL and VZEROUPPER, which use the same opcode as EMMS. If we were wanting to optimize out gen_clear_ymmh then this would be one of the starting points. The implementation of the VZEROALL and VZEROUPPER helpers is by Paul Brook. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	d1c1a4222c	target/i386: reimplement 0x0f 0x78-0x7f, add AVX These are a mixed batch, including the first two horizontal (66 and F2 only) operations, more moves, and SSE4a extract/insert. Because SSE4a is pretty rare, I chose to leave the helper as they are, but it is possible to unify them by loading index and length from the source XMM register and generating deposit or extract TCG ops. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	03b4588070	target/i386: reimplement 0x0f 0x50-0x5f, add AVX These are mostly floating-point SSE operations. The odd ones out are MOVMSK and CVTxx2yy, the others are straightforward. Unary operations are a bit special in AVX because they have 2 operands for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS. They are handled using X86_OP_GROUP3 for compactness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	1d0efbdb35	target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX These are more simple integer instructions present in both MMX and SSE/AVX, with no holes that were later occupied by newer instructions. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	92ec056a6b	target/i386: reimplement 0x0f 0x60-0x6f, add AVX These are both MMX and SSE/AVX instructions, except for vmovdqu. In both cases the inputs and output is in s->ptr{0,1,2}, so the only difference between MMX, SSE, and AVX is which helper to call. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	b98f886c8f	target/i386: Introduce 256-bit vector helpers The new implementation of SSE will cover AVX from the get go, because all the work for the helper functions is already done. We just need to build them. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	6e0cac782a	target/i386: implement additional AVX comparison operators The new implementation of SSE will cover AVX from the get go, so include the 24 extra comparison operators that are only available with the VEX prefix. Based on a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	620f75566a	target/i386: provide 3-operand versions of unary scalar helpers Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	1de9e7e612	target/i386: support operand merging in binary scalar helpers Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the helpers to provide this functionality. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	f05f9789f5	target/i386: extend helpers to support VEX.V 3- and 4- operand encodings Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>. Message-Id: <20220424220204.2493824-26-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paul Brook	30f419219a	target/i386: Prepare ops_sse_header.h for 256 bit AVX Adjust all #ifdefs to match the ones in ops_sse.h. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-23-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	1d0b926150	target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder Because these are the only VEX instructions that QEMU supports, the new decoder is entered on the first byte of a valid VEX prefix, and VEX decoding only needs to be done in decode-new.c.inc. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	55a3328669	target/i386: validate SSE prefixes directly in the decoding table Many SSE and AVX instructions are only valid with specific prefixes (none, 66, F3, F2). Introduce a direct way to encode this in the decoding table to avoid using decode groups too much. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	20581aadec	target/i386: validate VEX prefixes via the instructions' exception classes Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paul Brook	608db8dbfb	target/i386: add AVX_EN hflag Add a new hflag bit to determine whether AVX instructions are allowed Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-4-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	caa01fadbe	target/i386: add CPUID feature checks to new decoder Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	268dc4648f	target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext TCG will shortly implement VAES instructions, so add the relevant feature word to the DisasContext. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	6ba13999be	target/i386: add ALU load/writeback core Add generic code generation that takes care of preparing operands around calls to decode.e.gen in a table-driven manner, so that ALU operations need not take care of that. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	b3e22b2318	target/i386: add core of new i386 decoder The new decoder is based on three principles: - use mostly table-driven decoding, using tables derived as much as possible from the Intel manual. Centralizing the decode the operands makes it more homogeneous, for example all immediates are signed. All modrm handling is in one function, and can be shared between SSE and ALU instructions (including XMM<->GPR instructions). The SSE/AVX decoder will also not have duplicated code between the 0F, 0F38 and 0F3A tables. - keep the code as "non-branchy" as possible. Generally, the code for the new decoder is more verbose, but the control flow is simpler. Conditionals are not nested and have small bodies. All instruction groups are resolved even before operands are decoded, and code generation is separated as much as possible within small functions that only handle one instruction each. - keep address generation and (for ALU operands) memory loads and writeback as much in common code as possible. All ALU operations for example are implemented as T0=f(T0,T1). For non-ALU instructions, read-modify-write memory operations are rare, but registers do not have TCGv equivalents: therefore, the common logic sets up pointer temporaries with the operands, while load and writeback are handled by gvec or by helpers. These principles make future code review and extensibility simpler, at the cost of having a relatively large amount of code in the form of this patch. Even EVEX should not be _too_ hard to implement (it's just a crazy large amount of possibilities). This patch introduces the main decoder flow, and integrates the old decoder with the new one. The old decoder takes care of parsing prefixes and then optionally drops to the new one. The changes to the old decoder are minimal and allow it to be replaced incrementally with the new one. There is a debugging mechanism through a "LIMIT" environment variable. In user-mode emulation, the variable is the number of instructions decoded by the new decoder before permanently switching to the old one. In system emulation, the variable is the highest opcode that is decoded by the new decoder (this is less friendly, but it's the best that can be done without requiring deterministic execution). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	a61ef762f9	target/i386: make rex_w available even in 32-bit mode REX.W can be used even in 32-bit mode by AVX instructions, where it is retroactively renamed to VEX.W. Make the field available even in 32-bit mode but keep the REX_W() macro as it was; this way, that the handling of dflag does not use it by mistake and the AVX code more clearly points at the special VEX behavior of the bit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	12a2c9c72c	target/i386: make ldo/sto operations consistent with ldq ldq takes a pointer to the first byte to load the 64-bit word in; ldo takes a pointer to the first byte of the ZMMReg. Make them consistent, which will be useful in the new SSE decoder's load/writeback routines. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	75f107a856	target/i386: Define XMMReg and access macros, align ZMM registers This will be used for emission and endian adjustments of gvec operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220822223722.1697758-2-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	8629e77be5	target/i386: Use probe_access_full for final stage2 translation Rather than recurse directly on mmu_translate, go through the same softmmu lookup that we did for the page table walk. This centralizes all knowledge of MMU_NESTED_IDX, with respect to setup of TranslationParams, to get_physical_address. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-10-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	4a1e9d4d11	target/i386: Use atomic operations for pte updates Use probe_access_full in order to resolve to a host address, which then lets us use a host cmpxchg to update the pte. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/279 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-9-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	11b4e971dc	target/i386: Combine 5 sets of variables in mmu_translate We don't need one variable set per translation level, which requires copying into pte/pte_addr for huge pages. Standardize on pte/pte_addr for all levels. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-8-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	726ea33531	target/i386: Use MMU_NESTED_IDX for vmload/vmsave Use MMU_NESTED_IDX for each memory access, rather than just a single translation to physical. Adjust svm_save_seg and svm_load_seg to pass in mmu_idx. This removes the last use of get_hphys so remove it. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-7-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	98281984a3	target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX These new mmu indexes will be helpful for improving paging and code throughout the target. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-6-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Richard Henderson	9bbcf37219	target/i386: Reorg GET_HPHYS Replace with PTE_HPHYS for the page table walk, and a direct call to mmu_translate for the final stage2 translation. Hoist the check for HF2_NPT_MASK out to get_physical_address, which avoids the recursive call when stage2 is disabled. We can now return all the way out to x86_cpu_tlb_fill before raising an exception, which means probe works. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-5-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00

1 2 3 4 5 ...

1457 Commits