Commit Graph

82 Commits

Author SHA1 Message Date
Michael Tokarev
bad5cfcd60 i386: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-20 07:54:34 +03:00
Richard Henderson
ca1e9c3ba1 tests/multiarch: Add test-aes
Use a shared driver and backends for i386, aarch64, ppc64, riscv64.

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-08 07:30:17 +01:00
Richard Henderson
ea185a557b tests/plugin: Remove duplicate insn log from libinsn.so
This is a perfectly natural occurrence for x86 "rep movb",
where the "rep" prefix forms a counted loop of the one insn.

During the tests/tcg/multiarch/memory test, this logging is
triggered over 350000 times.  Within the context of cross-i386-tci
build, which is already slow by nature, the logging is sufficient
to push the test into timeout.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-26 17:33:00 +02:00
Paolo Bonzini
9e65829699 tests/tcg/i386: correct mask for VPERM2F128/VPERM2I128
The instructions also use bits 3 and 7 of their 8-byte immediate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Alex Bennée
d044b7c33a tests/tcg: limit the scope of the plugin tests
Running every plugin with every test is getting excessive as well as
not really improving coverage that much. Restrict the plugin tests to
just the MULTIARCH_TESTS which are shared between most architecture
for both system and user-mode. For those that aren't we need to squash
MULTIARCH_TESTS so we don't add them when they are not part of the
TESTS global.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230424092249.58552-14-alex.bennee@linaro.org>
2023-04-27 14:58:23 +01:00
Richard Henderson
9ad2ba6e8e target/i386: Fix BZHI instruction
We did not correctly handle N >= operand size.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1374
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114233206.3118472-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-27 09:18:55 +01:00
Paolo Bonzini
60c7dd22e1 target/i386: fix ADOX followed by ADCX
When ADCX is followed by ADOX or vice versa, the second instruction's
carry comes from EFLAGS and the condition codes use the CC_OP_ADCOX
operation.  Retrieving the carry from EFLAGS is handled by this bit
of gen_ADCOX:

        tcg_gen_extract_tl(carry_in, cpu_cc_src,
            ctz32(cc_op == CC_OP_ADCX ? CC_C : CC_O), 1);

Unfortunately, in this case cc_op has been overwritten by the previous
"if" statement to CC_OP_ADCOX.  This works by chance when the first
instruction is ADCX; however, if the first instruction is ADOX,
ADCX will incorrectly take its carry from OF instead of CF.

Fix by moving the computation of the new cc_op at the end of the function.
The included exhaustive test case fails without this patch and passes
afterwards.

Because ADCX/ADOX need not be invoked through the VEX prefix, this
regression bisects to commit 16fc5726a6 ("target/i386: reimplement
0x0f 0x38, add AVX", 2022-10-18).  However, the mistake happened a
little earlier, when BMI instructions were rewritten using the new
decoder framework.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1471
Reported-by: Paul Jolly <https://gitlab.com/myitcv>
Fixes: 1d0b926150 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:25 +01:00
Richard Henderson
b14c009897 target/i386: Fix BEXTR instruction
There were two problems here: not limiting the input to operand bits,
and not correctly handling large extraction length.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1372
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114230542.3116013-3-richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Fixes: 1d0b926150 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:25 +01:00
Richard Henderson
5d62d6649c tests/tcg/i386: Introduce and use reg_t consistently
Define reg_t based on the actual register width.
Define the inlines using that type.  This will allow
input registers to 32-bit insns to be set to 64-bit
values on x86-64, which allows testing various edge cases.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230114230542.3116013-2-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:24 +01:00
Paolo Bonzini
2872b0f390 target/i386: implement FMA instructions
The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities).  However, they can be implement with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand.  Therefore, there is no separate XMM
and YMM of the scalar helpers).

First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); the reordering of which operands
go to the multiply and which go to the add is done in emit.c.

Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
in the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22 09:05:54 +02:00
Paolo Bonzini
cf5ec6641e target/i386: implement F16C instructions
F16C only consists of two instructions, which are a bit peculiar
nevertheless.

First, they access only the low half of an YMM or XMM register for the
packed-half operand; the exact size still depends on the VEX.L flag.
This is similar to the existing avx_movx flag, but not exactly because
avx_movx is hardcoded to affect operand 2.  To this end I added a "ph"
format name; it's possible to reuse this approach for the VPMOVSX and
VPMOVZX instructions, though that would also require adding two more
formats for the low-quarter and low-eighth of an operand.

Second, VCVTPS2PH is somewhat weird because it *stores* the result of
the instruction into memory rather than loading it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20 15:16:18 +02:00
Paolo Bonzini
0339ddfa75 tests/tcg: extend SSE tests to AVX
Extracted from a patch by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
15b273f8e6 tests/tcg: move compiler tests to Makefiles
Further decoupling of tests/tcg from the main QEMU Makefile, and making
the build more similar between the cross compiler case and the vetted
container images.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220929114231.583801-25-alex.bennee@linaro.org>
2022-10-06 11:53:40 +01:00
Paolo Bonzini
c6cf8a2052 tests/tcg: clean up calls to run-test
Almost all invocations of run-test have either "$* on $(TARGET_NAME)"
or "$< on $(TARGET_NAME)" as the last argument.  So provide a default
test name, while allowing an escape hatch for custom names.

As an additional simplification, remove the need to do shell quoting.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220929114231.583801-24-alex.bennee@linaro.org>
2022-10-06 11:53:40 +01:00
Paolo Bonzini
e121d7606b tests/tcg: remove old SSE tests
The new testsuite is much more comprehensive, so remove the old one;
it is also buggy (the pinsrw test uses incorrect constraints, with =
instead of +, and the golden output for the fxsave tests differs depending
on how the C library uses SSE and AVX instructions).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-19 15:15:59 +02:00
Paolo Bonzini
e02907cc12 tests/tcg: refine MMX support in SSE tests
Extend the support to memory operands, and skip MMX instructions that
were introduced in SSE times, because they are now covered in test-mmx.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-19 15:15:59 +02:00
Paolo Bonzini
fa7ce0b028 tests/tcg: i386: add MMX and 3DNow! tests
Adjust the test-avx.py generator to produce tests specifically for
MMX and 3DNow.  Using a separate generator introduces some code
duplication, but is a simpler approach because of test-avx's extra
complexity to support 3- and 4-operand AVX instructions.

If needed, a common library can be introduced later.

While at it, for consistency move all the -cpu max rules to the
same place.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-19 15:14:40 +02:00
Paolo Bonzini
4ce4a1a714 tests/tcg: i386: fix typos in 3DNow! instructions
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-18 09:17:41 +02:00
Richard Henderson
d64655c2c3 tests/tcg/i386: Move smc_code2 to an executable section
We're about to start validating PAGE_EXEC, which means
that we've got to put this code into a section that is
both writable and executable.

Note that this test did not run on hardware beforehand either.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-09-06 08:04:25 +01:00
Paul Brook
91117bc546 tests/tcg: i386: add SSE tests
Tests for correct operation of most x86-64 SSE instructions.
It should cover all combinations of overlapping register and memory
operands on a set of random-ish data.

Results are bit-identical to an Intel i5-8500, with the exception of
the RCPSS and RSQRT approximations where the real CPU gives less accurate
results (the Intel spec allows relative errors up to 1.5 * 2^-12)

Signed-off-by: Paul Brook <paul@nowt.org>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220424220204.2493824-42-paul@nowt.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01 20:16:33 +02:00
Paolo Bonzini
7b764d4173 tests/tcg: i386: extend BMI test
Cover all BMI1 and BMI2 instructions, both 32- and 64-bit.

Due to the use of inlines, the test now has to be compiled with -O2.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01 08:37:04 +02:00
Paolo Bonzini
9e8504c057 tests/tcg: x86_64: improve consistency with i386
Include test-i386-bmi2, and specify manually the tests (only one for now)
that need -cpu max.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01 08:37:04 +02:00
Richard Henderson
6012d96379 tests/tcg/i386: Use explicit suffix on fist insns
Fixes a number of assembler warnings of the form:

test-i386.c: Assembler messages:
test-i386.c:869: Warning: no instruction mnemonic suffix given
  and no register operands; using default for `fist'

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220527171143.168276-1-richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220613171258.1905715-3-alex.bennee@linaro.org>
2022-06-14 00:15:04 +01:00
Alex Bennée
f9caa8feea tests/tcg: add missing reference files for float_convs
We might as well include a reference file for i386/x86_64. I was going
to include s390x as well but it's broken hence I raised:

  https://gitlab.com/qemu-project/qemu/-/issues/979.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220419091020.3008144-24-alex.bennee@linaro.org>
2022-04-20 16:04:20 +01:00
Alex Bennée
2931014c3d tests/tcg: add float_convd test
This is a simple transliteration of the float_convs test but this time
working with doubles. I'm used it to test the handling of vector
registers in gdbstub but wasn't able to find a non-ugly way to
automate it.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220419091020.3008144-23-alex.bennee@linaro.org>
2022-04-20 16:04:20 +01:00
Paolo Bonzini
f084839aba tests/tcg: add compiler test variables when using containers
Even for container-based cross compilation use $(CROSS_CC_HAS_*) variables.
This makes the TCG test makefiles oblivious of whether the compiler is
invoked through a container or not.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220401141326.1244422-10-pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220419091020.3008144-13-alex.bennee@linaro.org>
2022-04-20 16:04:20 +01:00
Alex Bennée
f8a4c6d728 tests/tcg: add vectorised sha512 versions
This builds vectorised versions of sha512 to exercise the vector code:

  - aarch64 (AdvSimd)
  - i386 (SSE)
  - s390x (MVX)
  - ppc64/ppc64le (power10 vectors)

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220225172021.3493923-14-alex.bennee@linaro.org>
2022-02-28 16:42:35 +00:00
Richard Henderson
efee71c8ca tests/tcg/multiarch: Re-enable signals test for most guests
With signal trampolines safely off the stack for all
guests besides hppa, we can re-enable this test.

It does show up a problem with sh4 (unrelated?),
so leave that test disabled for now.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210929130553.121567-27-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-10-01 12:03:48 +02:00
Mahmoud Mandour
0163ce3179 tests/plugins/insn: made arg inline not positional and parse it as bool
Made argument "inline" not positional, this has two benefits. First is
that we adhere to how QEMU passes args generally, by taking the last
value of an argument and drop the others. And the second is that this
sets up a framework for potentially adding new args easily.

Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210730135817.17816-11-ma.mandourr@gmail.com>
[AJB: fix check-tcg tests calling arg=inline]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2021-09-02 11:29:34 +01:00
Alex Bennée
0f1ea9c7a6 tests/tcg: also disable the signals test for plugins
This will be more important when plugins is enabled by default.

Fixes: eba61056e4 ("tests/tcg: generalise the disabling of the signals test")
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210709143005.1554-6-alex.bennee@linaro.org>
2021-07-14 14:31:48 +01:00
Alex Bennée
631f112f42 tests/tcg/i386: force -fno-pie for test-i386
The containerised compiler defaults to no-pie anyway but if we are
relying on the users installed cross compiler we need to check it
works for building 16 bit code first.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210401102530.12030-7-alex.bennee@linaro.org>
2021-04-06 15:04:42 +01:00
Alex Bennée
4011a686cc tests/tcg/i386: expand .data sections for system tests
Newer compilers might end up putting some data in .data.rel.local
which was getting skipped resulting in hilarious confusion on some
tests. Fix that.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210401102530.12030-6-alex.bennee@linaro.org>
2021-04-06 15:04:42 +01:00
Alex Bennée
e025d799af tests/plugin: expand insn test to detect duplicate instructions
A duplicate insn is one that is appears to be executed twice in a row.
This is currently possible due to -icount and cpu_io_recompile()
causing a re-translation of a block. On it's own this won't trigger
any tests though.

The heuristics that the plugin use can't deal with the x86 rep
instruction which (validly) will look like executing the same
instruction several times. To avoid problems later we tweak the rules
for x86 to run the "inline" version of the plugin. This also has the
advantage of increasing coverage of the plugin code (see bugfix in
previous commit).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210213130325.14781-15-alex.bennee@linaro.org>
2021-02-18 08:19:23 +00:00
Alex Bennée
c00506aa26 gdbstub: implement a softmmu based test
This adds a new tests that allows us to test softmmu only features
including watchpoints. To do achieve this we need to:

  - add _exit: labels to the boot codes
  - write a memory.py test case
  - plumb the test case into the build system
  - tweak the run_test script to:
    - re-direct output when asked
    - use socket based connection for all tests
    - add a small pause before connection

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210108224256.2321-6-alex.bennee@linaro.org>
2021-01-18 10:04:31 +00:00
Paolo Bonzini
75b208c283 target/i386: fix operand order for PDEP and PEXT
For PDEP and PEXT, the mask is provided in the memory (mod+r/m)
operand, and therefore is loaded in s->T0 by gen_ldst_modrm.
The source is provided in the second source operand (VEX.vvvv)
and therefore is loaded in s->T1.  Fix the order in which
they are passed to the helpers.

Reported-by: Lenard Szolnoki <blog@lenardszolnoki.com>
Analyzed-by: Lenard Szolnoki <blog@lenardszolnoki.com>
Fixes: https://bugs.launchpad.net/qemu/+bug/1605123
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:14:49 -05:00
Joseph Myers
418b0f93d1 target/i386: fix IEEE SSE floating-point exception raising
The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions.  Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations.  The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2006252358000.3832@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 18:02:17 -04:00
Joseph Myers
ff57bb7b63 target/i386: reimplement fpatan using floatx80 operations
The x87 fpatan emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float operations, as
for other such instructions.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>

Message-Id: <alpine.DEB.2.21.2006230000340.24721@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 09:39:39 -04:00
Joseph Myers
1f18a1e6ab target/i386: reimplement fyl2x using floatx80 operations
The x87 fyl2x emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation.  Reimplement using the soft-float operations,
building on top of the reimplementation of fyl2xp1 and factoring out
code to be shared between the two instructions.

The included test assumes that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematically exact result (including that it should be exact, in the
exact cases which cover more cases than for fyl2xp1).

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2006172321530.20587@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 09:39:39 -04:00
Joseph Myers
5eebc49d2d target/i386: reimplement fyl2xp1 using floatx80 operations
The x87 fyl2xp1 emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (adding 1 then using log rather than
attempting a better emulation using log1p).

Reimplement using the soft-float operations, as was done for f2xm1; as
in that case, m68k has related operations but not exactly this one and
it seemed safest to implement directly rather than reusing the m68k
code to avoid accumulation of errors.

A test is included with many randomly generated inputs.  The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of y * log2(x + 1); the implementation aims to do
somewhat better than that (about 70 correct bits before rounding).  I
haven't investigated how accurate hardware is.

Intel manuals describe a narrower range of valid arguments to this
instruction than AMD manuals.  The implementation accepts the wider
range (it's needed anyway for the core code to be reusable in a
subsequent patch reimplementing fyl2x), but the test only has inputs
in the narrower range so that it's valid on hardware that may reject
or produce poor results for inputs outside that range.

Code in the previous implementation that sets C2 for some out-of-range
arguments is not carried forward to the new implementation; C2 is
undefined for this instruction and I suspect that code was just
cut-and-pasted from the trigonometric instructions (fcos, fptan, fsin,
fsincos) where C2 *is* defined to be set for out-of-range arguments.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>

Message-Id: <alpine.DEB.2.21.2006172320190.20587@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 09:39:38 -04:00
Joseph Myers
eca30647fc target/i386: reimplement f2xm1 using floatx80 operations
The x87 f2xm1 emulation is currently based around conversion to
double.  This is inherently unsuitable for a good emulation of any
floatx80 operation, even before considering that it is a particularly
naive implementation using double (computing with pow and then
subtracting 1 rather than attempting a better emulation using expm1).

Reimplement using the soft-float operations, including additions and
multiplications with higher precision where appropriate to limit
accumulation of errors.  I considered reusing some of the m68k code
for transcendental operations, but the instructions don't generally
correspond exactly to x87 operations (for example, m68k has 2^x and
e^x - 1, but not 2^x - 1); to avoid possible accumulation of errors
from applying multiple such operations each rounding to floatx80
precision, I wrote a direct implementation of 2^x - 1 instead.  It
would be possible in principle to make the implementation more
efficient by doing the intermediate operations directly with
significands, signs and exponents and not packing / unpacking floatx80
format for each operation, but that would make it significantly more
complicated and it's not clear that's worthwhile; the m68k emulation
doesn't try to do that.

A test is included with many randomly generated inputs.  The
assumption of the test is that the result in round-to-nearest mode
should always be one of the two closest floating-point numbers to the
mathematical value of 2^x - 1; the implementation aims to do somewhat
better than that (about 70 correct bits before rounding).  I haven't
investigated how accurate hardware is.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>

Message-Id: <alpine.DEB.2.21.2006112341010.18393@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 09:39:37 -04:00
Alex Bennée
d16242e524 tests/tcg: ensure -cpu max also used for plugin run
The check-tcg plugins build was failing because some special case
tests that needed -cpu max failed because the plugin variant hadn't
carried across the QEMU_OPTS tweak.

Guests which globally set QEMU_OPTS=-cpu FOO where unaffected.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200615141922.18829-3-alex.bennee@linaro.org>
2020-06-16 14:49:05 +01:00
Joseph Myers
bc921b2711 target/i386: correct fix for pcmpxstrx substring search
This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
/ pcmpestrm / pcmpistri / pcmpistrm substring search, commit
ae35eea7e4.

That commit fixed a bug that showed up in four GCC tests with one libc
implementation.  The tests in question generate random inputs to the
intrinsics and compare results to a C implementation, but they only
test 1024 possible random inputs, and when the tests use the cases of
those instructions that work with word rather than byte inputs, it's
easy to have problematic cases that show up much less frequently than
that.  Thus, testing with a different libc implementation, and so a
different random number generator, showed up a problem with the
previous patch.

When investigating the previous test failures, I found the description
of these instructions in the Intel manuals (starting from computing a
16x16 or 8x8 set of comparison results) confusing and hard to match up
with the more optimized implementation in QEMU, and referred to AMD
manuals which described the instructions in a different way.  Those
AMD descriptions are very explicit that the whole of the string being
searched for must be found in the other operand, not running off the
end of that operand; they say "If the prototype and the SUT are equal
in length, the two strings must be identical for the comparison to be
TRUE.".  However, that statement is incorrect.

In my previous commit message, I noted:

  The operation in this case is a search for a string (argument d to
  the helper) in another string (argument s to the helper); if a copy
  of d at a particular position would run off the end of s, the
  resulting output bit should be 0 whether or not the strings match in
  the region where they overlap, but the QEMU implementation was
  wrongly comparing only up to the point where s ends and counting it
  as a match if an initial segment of d matched a terminal segment of
  s.  Here, "run off the end of s" means that some byte of d would
  overlap some byte outside of s; thus, if d has zero length, it is
  considered to match everywhere, including after the end of s.

The description "some byte of d would overlap some byte outside of s"
is accurate only when understood to refer to overlapping some byte
*within the 16-byte operand* but at or after the zero terminator; it
is valid to run over the end of s if the end of s is the end of the
16-byte operand.  So the fix in the previous patch for the case of d
being empty was correct, but the other part of that patch was not
correct (as it never allowed partial matches even at the end of the
16-byte operand).  Nor was the code before the previous patch correct
for the case of d nonempty, as it would always have allowed partial
matches at the end of s.

Fix with a partial revert of my previous change, combined with
inserting a check for the special case of s having maximum length to
determine where it is necessary to check for matches.

In the added test, test 1 is for the case of empty strings, which
failed before my 2017 patch, test 2 is for the bug introduced by my
2017 patch and test 3 deals with the case where a match of an initial
segment at the end of the string is not valid when the string ends
before the end of the 16-byte operand (that is, the case that would be
broken by a simple revert of the non-empty-string part of my 2017
patch).

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-12 11:10:39 -04:00
Joseph Myers
975af797f1 target/i386: fix IEEE x87 floating-point exception raising
Most x87 instruction implementations fail to raise the expected IEEE
floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in the x87 status word.  There is special-case handling of division to
raise the divide-by-zero exception, but that handling is itself buggy:
it raises the exception in inappropriate cases (inf / 0 and nan / 0,
which should not raise any exceptions, and 0 / 0, which should raise
"invalid" instead).

Fix this by converting the floating-point exceptions raised during an
operation by the softfloat machinery into exceptions in the x87 status
word (passing through the existing fpu_set_exception function for
handling related to trapping exceptions).  There are special cases
where some functions convert to integer internally but exceptions from
that conversion are not always correct exceptions for the instruction
to raise.

There might be scope for some simplification if the softfloat
exception state either could always be assumed to be in sync with the
state in the status word, or could always be ignored at the start of
each instruction and just set to 0 then; I haven't looked into that in
detail, and it might run into interactions with the various ways the
emulation does not yet handle trapping exceptions properly.  I think
the approach taken here, of saving the softfloat state, setting
exceptions there to 0 and then merging the old exceptions back in
after carrying out the operation, is conservatively safe.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005152120280.3469@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:51 -04:00
Joseph Myers
c8af85b10c target/i386: fix fisttpl, fisttpll handling of out-of-range values
The fist / fistt family of instructions should all store the most
negative integer in the destination format when the rounded /
truncated integer result is out of range or the input is an invalid
encoding, infinity or NaN.  The fisttpl and fisttpll implementations
(32-bit and 64-bit results, truncate towards zero) failed to do this,
producing the most positive integer in some cases instead.  Fix this
by copying the code used to handle this issue for fistpl and fistpll,
adjusted to use the _round_to_zero functions for the actual
conversion (but without any other changes to that code).

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005152119160.3469@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:26 -04:00
Joseph Myers
374ff4d0a3 target/i386: fix fbstp handling of out-of-range values
The fbstp implementation fails to check for out-of-range and invalid
values, instead just taking the result of conversion to int64_t and
storing its sign and low 18 decimal digits.  Fix this by checking for
an out-of-range result (invalid conversions always result in INT64_MAX
or INT64_MIN from the softfloat code, which are large enough to be
considered as out-of-range by this code) and storing the packed BCD
indefinite encoding in that case.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005132351110.11687@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:25 -04:00
Joseph Myers
18c53e1e73 target/i386: fix fbstp handling of negative zero
The fbstp implementation stores +0 when the rounded result should be
-0 because it compares an integer value with 0 to determine the sign.
Fix this by checking the sign bit of the operand instead.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005132350230.11687@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:25 -04:00
Joseph Myers
34b9cc076f target/i386: fix fxam handling of invalid encodings
The fxam implementation does not check for invalid encodings, instead
treating them like NaN or normal numbers depending on the exponent.
Fix it to check that the high bit of the significand is set before
treating an encoding as NaN or normal, thus resulting in correct
handling (all of C0, C2 and C3 cleared) for invalid encodings.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005132349311.11687@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:24 -04:00
Joseph Myers
80b4008c80 target/i386: fix floating-point load-constant rounding
The implementations of the fldl2t, fldl2e, fldpi, fldlg2 and fldln2
instructions load fixed constants independent of the rounding mode.
Fix them to load a value correctly rounded for the current rounding
mode (but always rounded to 64-bit precision independent of the
precision control, and without setting "inexact") as specified.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <alpine.DEB.2.21.2005132348310.11687@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:24 -04:00
Joseph Myers
c535d68755 target/i386: fix fscale handling of rounding precision
The fscale implementation uses floatx80_scalbn for the final scaling
operation.  floatx80_scalbn ends up rounding the result using the
dynamic rounding precision configured for the FPU.  But only a limited
set of x87 floating-point instructions are supposed to respect the
dynamic rounding precision, and fscale is not in that set.  Fix the
implementation to save and restore the rounding precision around the
call to floatx80_scalbn.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005070045430.18350@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:21 -04:00
Joseph Myers
c1c5fb8f90 target/i386: fix fscale handling of infinite exponents
The fscale implementation passes infinite exponents through to generic
code that rounds the exponent to a 32-bit integer before using
floatx80_scalbn.  In round-to-nearest mode, and ignoring exceptions,
this works in many cases.  But it fails to handle the special cases of
scaling 0 by a +Inf exponent or an infinity by a -Inf exponent, which
should produce a NaN, and because it produces an inexact result for
finite nonzero numbers being scaled, the result is sometimes incorrect
in other rounding modes.  Add appropriate handling of infinite
exponents to produce a NaN or an appropriately signed exact zero or
infinity as a result.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Message-Id: <alpine.DEB.2.21.2005070045010.18350@digraph.polyomino.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10 12:10:18 -04:00