mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Paolo Bonzini	2872b0f390	target/i386: implement FMA instructions The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implement with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there is no separate XMM and YMM of the scalar helpers). First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c. Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-22 09:05:54 +02:00
Paolo Bonzini	cf5ec6641e	target/i386: implement F16C instructions F16C only consists of two instructions, which are a bit peculiar nevertheless. First, they access only the low half of an YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand. Second, VCVTPS2PH is somewhat weird because it stores the result of the instruction into memory rather than loading it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-20 15:16:18 +02:00
Paolo Bonzini	653fad2497	target/i386: remove old SSE decoder With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7170a17ec3	target/i386: reimplement 0x0f 0x10-0x17, add AVX These are mostly moves, and yet are a total pain. The main issue is that: 1) some instructions are selected by mod==11 (register operand) vs. mod=00/01/10 (memory operand) 2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case 3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS. VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however. For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	16fc5726a6	target/i386: reimplement 0x0f 0x38, add AVX There are several special cases here: 1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov. 2) some instructions, such as variable-width shifts, select the vector element size via REX.W. 3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands. 3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order. X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument. These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	7906847768	target/i386: reimplement 0x0f 0x3a, add AVX The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rater than in the prefixes. These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	a64eee3ab4	target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes Three-byte opcodes from the 0F3Ah area all have an immediate byte which is usually unsigned. Clarify in the helper code that it is unsigned; the new decoder treats immediates as signed by default, and seeing an intN_t in the prototype might give the wrong impression that one can use decode->immediate directly. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:05 +02:00
Paolo Bonzini	b98f886c8f	target/i386: Introduce 256-bit vector helpers The new implementation of SSE will cover AVX from the get go, because all the work for the helper functions is already done. We just need to build them. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	6e0cac782a	target/i386: implement additional AVX comparison operators The new implementation of SSE will cover AVX from the get go, so include the 24 extra comparison operators that are only available with the VEX prefix. Based on a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	620f75566a	target/i386: provide 3-operand versions of unary scalar helpers Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	f05f9789f5	target/i386: extend helpers to support VEX.V 3- and 4- operand encodings Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>. Message-Id: <20220424220204.2493824-26-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paul Brook	30f419219a	target/i386: Prepare ops_sse_header.h for 256 bit AVX Adjust all #ifdefs to match the ones in ops_sse.h. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-23-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-18 13:58:04 +02:00
Paolo Bonzini	ca4b1b43bc	target/i386: fix INSERTQ implementation INSERTQ is defined to not modify any bits in the lower 64 bits of the destination, other than the ones being replaced with bits from the source operand. QEMU instead is using unshifted bits from the source for those bits. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-19 15:16:00 +02:00
Paul Brook	cbf4ad5498	target/i386: reimplement AVX comparison helpers AVX includes an additional set of comparison predicates, some of which our softfloat implementation does not expose as separate functions. Rewrite the helpers in terms of floatN_compare for future extensibility. Signed-off-by: Paul Brook <paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220424220204.2493824-24-paul@nowt.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Paolo Bonzini	ce4fa29f94	target/i386: Add size suffix to vector FP helpers For AVX we're going to need both 128 bit (xmm) and 256 bit (ymm) variants of floating point helpers. Add the register type suffix to the existing PS and PD helpers (SS and SD variants are only valid on 128 bit vectors) No functional changes. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-15-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 20:16:33 +02:00
Richard Henderson	8929906e21	tcg: Remove dh_alias indirection for dh_typecode The dh_alias redirect is intended to handle TCG types as distinguished from C types. TCG does not distinguish signed int from unsigned int, because they are the same size. However, we need to retain this distinction for dh_typecode, lest we fail to extend abi types properly for the host call parameters. This bug was detected when running the 'arm' emulator on an s390 system. The s390 uses TCG_TARGET_EXTEND_ARGS which triggers code in tcg_gen_callN to extend 32 bit values to 64 bits; the incorrect sign data in the typemask for each argument caused the values to be extended as unsigned values. This simple program exhibits the problem: static volatile int num = -9; static volatile int den = -5; int main(void) { int quo = num / den; printf("num %d den %d quo %d\n", num, den, quo); exit(0); } When run on the broken qemu, this results in: num -9 den -5 quo 0 The correct result is: num -9 den -5 quo 1 Fixes: `7319d83a73` ("tcg: Combine dh_is_64bit and dh_is_signed to dh_typecode") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/876 Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2022-02-28 08:04:06 -10:00
Richard Henderson	7319d83a73	tcg: Combine dh_is_64bit and dh_is_signed to dh_typecode We will shortly be interested in distinguishing pointers from integers in the helper's declaration, as well as a true void return. We currently have two parallel 1 bit fields; merge them and expand to a 3 bit field. Our current maximum is 7 helper arguments, plus the return makes 8 * 3 = 24 bits used within the uint32_t typemask. Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2021-06-19 08:51:11 -07:00
Chetan Pant	d9ff33ada7	x86 tcg cpus: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023122801.19514-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 16:41:42 +01:00
Richard Henderson	4885c3c495	target-i386: Use ctpop helper Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:49:59 -08:00
Thomas Huth	fcf5ef2ab5	Move target-* CPU file into a target/ folder We've currently got 18 architectures in QEMU, and thus 18 target-xxx folders in the root folder of the QEMU source tree. More architectures (e.g. RISC-V, AVR) are likely to be included soon, too, so the main folder of the QEMU sources slowly gets quite overcrowded with the target-xxx folders. To disburden the main folder a little bit, let's move the target-xxx folders into a dedicated target/ folder, so that target-xxx/ simply becomes target/xxx/ instead. Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part] Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part] Acked-by: Michael Walle <michael@walle.cc> [lm32 part] Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part] Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part] Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part] Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part] Acked-by: Richard Henderson <rth@twiddle.net> [alpha part] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part] Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part] Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part] Signed-off-by: Thomas Huth <thuth@redhat.com>	2016-12-20 21:52:12 +01:00

20 Commits