Commit Graph

20 Commits

Author SHA1 Message Date
Paolo Bonzini
2872b0f390 target/i386: implement FMA instructions
The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities).  However, they can be implement with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand.  Therefore, there is no separate XMM
and YMM of the scalar helpers).

First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); the reordering of which operands
go to the multiply and which go to the add is done in emit.c.

Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
in the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22 09:05:54 +02:00
Paolo Bonzini
cf5ec6641e target/i386: implement F16C instructions
F16C only consists of two instructions, which are a bit peculiar
nevertheless.

First, they access only the low half of an YMM or XMM register for the
packed-half operand; the exact size still depends on the VEX.L flag.
This is similar to the existing avx_movx flag, but not exactly because
avx_movx is hardcoded to affect operand 2.  To this end I added a "ph"
format name; it's possible to reuse this approach for the VPMOVSX and
VPMOVZX instructions, though that would also require adding two more
formats for the low-quarter and low-eighth of an operand.

Second, VCVTPS2PH is somewhat weird because it *stores* the result of
the instruction into memory rather than loading it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20 15:16:18 +02:00
Paolo Bonzini
653fad2497 target/i386: remove old SSE decoder
With all SSE (and AVX!) instructions now implemented in disas_insn_new,
it's possible to remove gen_sse, as well as the helpers for instructions
that now use gvec.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
7170a17ec3 target/i386: reimplement 0x0f 0x10-0x17, add AVX
These are mostly moves, and yet are a total pain.  The main issue
is that:

1) some instructions are selected by mod==11 (register operand)
vs. mod=00/01/10 (memory operand)

2) stores to memory are two-operand operations, while the 3-register
and load-from-memory versions operate on the entire contents of the
destination; this makes it easier to separate the gen_* function for
the store case

3) it's inefficient to load into xmm_T0 only to move the value out
again, so the gen_* function for the load case is separated too

The manual also has various mistakes in the operands here, for example
the store case of MOVHPS operates on a 128-bit source (albeit discarding
the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq.
Likewise for the destination and source of MOVHLPS.

VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ,
but encoded as prefixes rather than separate operands.  The helpers
can be reused however.

For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as
helpers.  I named the helper for MOVDDUP "movdldup" in preparation
for possible future introduction of MOVDHDUP and to clarify the
similarity with MOVSLDUP.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
16fc5726a6 target/i386: reimplement 0x0f 0x38, add AVX
There are several special cases here:

1) extending moves have different widths for the helpers vs. for the
memory loads, and the width for memory loads depends on VEX.L too.
This is represented by X86_SPECIAL_AVXExtMov.

2) some instructions, such as variable-width shifts, select the vector element
size via REX.W.

3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group,
and they have (among other things) two output operands.

3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be
extended to support 2-operand blends.  The 2-operand variant actually
came a few years earlier, but it is clearer to implement them in the
opposite order.

X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers
that accept a Reg* but have a M argument.

These three-byte opcodes also include AVX new instructions, for which
the helpers were originally implemented by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
7906847768 target/i386: reimplement 0x0f 0x3a, add AVX
The more complicated operations here are insertions and extractions.
Otherwise, there are just more entries than usual because the PS/PD/SS/SD
variations are encoded in the opcode rater than in the prefixes.

These three-byte opcodes also include AVX new instructions, whose
implementation in the helpers was originally done by Paul Brook
<paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
a64eee3ab4 target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes
Three-byte opcodes from the 0F3Ah area all have an immediate byte which
is usually unsigned.  Clarify in the helper code that it is unsigned;
the new decoder treats immediates as signed by default, and seeing
an intN_t in the prototype might give the wrong impression that one
can use decode->immediate directly.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
b98f886c8f target/i386: Introduce 256-bit vector helpers
The new implementation of SSE will cover AVX from the get go, because
all the work for the helper functions is already done.  We just need to
build them.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
6e0cac782a target/i386: implement additional AVX comparison operators
The new implementation of SSE will cover AVX from the get go, so include
the 24 extra comparison operators that are only available with the VEX
prefix.

Based on a patch by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
620f75566a target/i386: provide 3-operand versions of unary scalar helpers
Compared to Paul's implementation, the new decoder will use a different approach
to implement AVX's merging of dst with src1 on scalar operations.  Adjust the
old SSE decoder to be compatible with new-style helpers.

The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
f05f9789f5 target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
Add to the helpers all the operands that are needed to implement AVX.

Extracted from a patch by Paul Brook <paul@nowt.org>.

Message-Id: <20220424220204.2493824-26-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paul Brook
30f419219a target/i386: Prepare ops_sse_header.h for 256 bit AVX
Adjust all #ifdefs to match the ones in ops_sse.h.

Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-23-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
ca4b1b43bc target/i386: fix INSERTQ implementation
INSERTQ is defined to not modify any bits in the lower 64 bits of the
destination, other than the ones being replaced with bits from the
source operand.  QEMU instead is using unshifted bits from the source
for those bits.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-19 15:16:00 +02:00
Paul Brook
cbf4ad5498 target/i386: reimplement AVX comparison helpers
AVX includes an additional set of comparison predicates, some of which
our softfloat implementation does not expose as separate functions.
Rewrite the helpers in terms of floatN_compare for future extensibility.

Signed-off-by: Paul Brook <paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220424220204.2493824-24-paul@nowt.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01 20:16:33 +02:00
Paolo Bonzini
ce4fa29f94 target/i386: Add size suffix to vector FP helpers
For AVX we're going to need both 128 bit (xmm) and 256 bit (ymm) variants of
floating point helpers. Add the register type suffix to the existing
*PS and *PD helpers (SS and SD variants are only valid on 128 bit vectors)

No functional changes.

Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-15-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01 20:16:33 +02:00
Richard Henderson
8929906e21 tcg: Remove dh_alias indirection for dh_typecode
The dh_alias redirect is intended to handle TCG types as distinguished
from C types.  TCG does not distinguish signed int from unsigned int,
because they are the same size.  However, we need to retain this
distinction for dh_typecode, lest we fail to extend abi types properly
for the host call parameters.

This bug was detected when running the 'arm' emulator on an s390
system. The s390 uses TCG_TARGET_EXTEND_ARGS which triggers code
in tcg_gen_callN to extend 32 bit values to 64 bits; the incorrect
sign data in the typemask for each argument caused the values to be
extended as unsigned values.

This simple program exhibits the problem:

	static volatile int num = -9;
	static volatile int den = -5;
	int main(void)
	{
		int quo = num / den;
		printf("num %d den %d quo %d\n", num, den, quo);
		exit(0);
	}

When run on the broken qemu, this results in:

	num -9 den -5 quo 0

The correct result is:

	num -9 den -5 quo 1

Fixes: 7319d83a73 ("tcg: Combine dh_is_64bit and dh_is_signed to dh_typecode")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/876
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Tested-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-02-28 08:04:06 -10:00
Richard Henderson
7319d83a73 tcg: Combine dh_is_64bit and dh_is_signed to dh_typecode
We will shortly be interested in distinguishing pointers
from integers in the helper's declaration, as well as a
true void return.  We currently have two parallel 1 bit
fields; merge them and expand to a 3 bit field.

Our current maximum is 7 helper arguments, plus the return
makes 8 * 3 = 24 bits used within the uint32_t typemask.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-19 08:51:11 -07:00
Chetan Pant
d9ff33ada7 x86 tcg cpus: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023122801.19514-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-11-15 16:41:42 +01:00
Richard Henderson
4885c3c495 target-i386: Use ctpop helper
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10 08:49:59 -08:00
Thomas Huth
fcf5ef2ab5 Move target-* CPU file into a target/ folder
We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.

Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2016-12-20 21:52:12 +01:00