Commit Graph

2370 Commits

Author SHA1 Message Date
Peter Maydell
0f31e37c7f target/arm: Implement MVE VCTP
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
as to predicate any element at index Rn or greater is predicated.  As
with VPNOT, this insn itself is predicable and subject to beatwise
execution.

The calculation of the mask is the same as is used to determine
ltpmask in mve_element_mask(), but we precalculate masklen in
generated code to avoid having to have 4 helpers specialized by size.

We put the decode line in with the low-overhead-loop insns in
t32.decode because it's logically part of that collection of insn
patterns, even though it is an MVE only insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
fea3958fa1 target/arm: Implement MVE VPNOT
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject to both predication and to beatwise execution).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
1241f148d5 target/arm: Implement MVE VMOV to/from 2 general-purpose registers
Implement the MVE VMOV forms that move data between 2 general-purpose
registers and 2 32-bit lanes in a vector register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
d5c571ea6d target/arm: Implement MVE VMAXA, VMINA
Implement the MVE VMAXA and VMINA insns, which take the absolute
value of the signed elements in the input vector and then accumulate
the unsigned max or min into the destination vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
398e7cd3cd target/arm: Implement MVE VQABS, VQNEG
Implement the MVE 1-operand saturating operations VQABS and VQNEG.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
8be9a25058 target/arm: Implement MVE saturating doubling multiply accumulates
Implement the MVE saturating doubling multiply accumulate insns
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
double, add the accumulator shifted by the element size, possibly
round, saturate to twice the element size, then take the high half of
the result.  The *MLAH insns do vector * scalar + vector, and the
*MLASH insns do vector * vector + scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
c69e34c6de target/arm: Implement MVE VMLA
Implement the MVE VMLA insn, which multiplies a vector by a scalar
and accumulates into another vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
f0ffff5163 target/arm: Implement MVE VMLADAV and VMLSLDAV
Implement the MVE VMLADAV and VMLSLDAV insns.  Like the VMLALDAV and
VMLSLDAV insns already implemented, these accumulate multiplied
vector elements; but they accumulate a 32-bit result rather than a
64-bit one.

Note that these encodings overlap with what would be RdaHi=0b111 for
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
640cdf20a2 target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
The MVEGenDualAccOpFn is a bit misnamed, since it is used for
the "long dual accumulate" operations that use a 64-bit
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
use the former name for the 32-bit accumulator insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
54dc78a901 target/arm: Implement MVE narrowing moves
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
These take a double-width input, narrow it (possibly saturating) and
store the result to either the top or bottom half of the output
element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
7f061c0ab9 target/arm: Implement MVE VABAV
Implement the MVE VABAV insn, which computes absolute differences
between elements of two vectors and accumulates the result into
a general purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
688ba4cf33 target/arm: Implement MVE integer min/max across vector
Implement the MVE integer min/max across vector insns
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
from the vector elements and a general purpose register,
and store the maximum back into the general purpose
register.

These insns overlap with VRMLALDAVH (they use what would
be RdaHi=0b110).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
345910f8c1 target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
All the users of the vmlaldav formats have an 'x bit in bit 12 and an
'a' bit in bit 5; move these to the format rather than specifying them
in each insn pattern.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
1b15a97d4c target/arm: Implement MVE shift-by-scalar
Implement the MVE instructions which perform shifts by a scalar.
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
shift amount in a general purpose register and shift every element in
the vector by that amount.

Mostly we can reuse the helper functions for shift-by-immediate; we
do need two new helpers for VQRSHL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
6b895bf8fb target/arm: Implement MVE VMLAS
Implement the MVE VMLAS insn, which multiplies a vector by a vector
and adds a scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
c386443b16 target/arm: Implement MVE VPSEL
Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
cce81873bc target/arm: Implement MVE integer vector-vs-scalar comparisons
Implement the MVE integer vector comparison instructions that compare
each element against a scalar from a general purpose register.  These
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
encodings T4, T5 and T6.

We have to move the decodetree pattern for VPST, because it
overlaps with VCMP T4 with size = 0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
eff5d9a9bd target/arm: Implement MVE integer vector comparisons
Implement the MVE integer vector comparison instructions.  These are
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
T1, T2 and T3.

These insns compare corresponding elements in each vector, and update
the VPR.P0 predicate bits with the results of the comparison.  VPT
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
"VCMP then VPST".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
552517861c target/arm: Factor out gen_vpst()
Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
395b92d50e target/arm: Implement MVE incrementing/decrementing dup insns
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP.  These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register.  The final value of the offset is written
back to this register.  The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
c1bd78cb06 target/arm: Implement MVE VMULL (polynomial)
Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: 8x8->16 and a 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32 indicating the size of the result elements. This then
carries through to the helper function names where it then
matches up with the existing pmull_h() which does an 8x8->16
operation and a new pmull_w() which does the 16x16->32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
41704cc262 target/arm: Fix VLDRB/H/W for predicated elements
For vector loads, predicated elements are zeroed, instead of
retaining their previous values (as happens for most data
processing operations). This means we need to distinguish
"beat not executed due to ECI" (don't touch destination
element) from "beat executed but predicated out" (zero
destination element).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
e3152d02da target/arm: Fix VPT advance when ECI is non-zero
We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
e0d40070e1 target/arm: Factor out mve_eci_mask()
In some situations we need a mask telling us which parts of the
vector correspond to beats that are not being executed because of
ECI, separately from the combined "which bytes are predicated away"
mask.  Factor this mask calculation out of mve_element_mask() into
its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
3f4f1880c2 target/arm: Fix calculation of LTP mask when LR is 0
In mve_element_mask(), we calculate a mask for tail predication which
should have a number of 1 bits based on the value of LR.  However,
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
zero length.  Special case this to give the all-zeroes mask we
require.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
fdcf2269c4 target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
We got an edge case wrong in the 48-bit SQRSHRL implementation: if
the shift is to the right, although it always makes the result
smaller than the input value it might not be within the 48-bit range
the result is supposed to be if the input had some bits in [63..48]
set and the shift didn't bring all of those within the [47..0] range.

Handle this similarly to the way we already do for this case in
do_uqrshl48_d(): extend the calculated result from 48 bits,
and return that if not saturating or if it doesn't change the
result; otherwise fall through to return a saturated value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
95351aa76c target/arm: Fix 48-bit saturating shifts
In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
cases wrong and failed to saturate correctly:

(1) In do_sqrshl48_d() we used the same code that do_shrshl_bhs()
does to obtain the saturated most-negative and most-positive 48-bit
signed values for the large-shift-left case.  This gives (1 << 47)
for saturate-to-most-negative, but we weren't sign-extending this
value to the 64-bit output as the pseudocode requires.

(2) For left shifts by less than 48, we copied the "8/16 bit" code
from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
thing because it assumes the C type we're working with is at least
twice the number of bits we're saturating to (so that a shift left by
bits-1 can't shift anything off the top of the value).  This isn't
true for bits == 48, so we would incorrectly return 0 rather than the
most-positive value for situations like "shift (1 << 44) right by
20".  Instead check for saturation by doing the shift and signextend
and then testing whether shifting back left again gives the original
value.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
a5e59e8dcb target/arm: Fix mask handling for MVE narrowing operations
In the MVE helpers for the narrowing operations (DO_VSHRN and
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
the 'top' versions of the insn.  This is because the loop works over
the double-sized input elements and shifts the predicate mask by that
many bits each time, but when we write out the half-sized output we
must look at the mask bits for whichever half of the element we are
writing to.

Correct this by shifting the whole mask right by ESIZE bits for the
'top' insns.  This allows us also to simplify the saturation bit
checking (where we had noticed that we needed to look at a different
mask bit for the 'top' insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
ed5a59d61f target/arm: Fix signed VADDV
A cut-and-paste error meant we handled signed VADDV like
unsigned VADDV; fix the type used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
c88ff88498 target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
which is what we've implemented. However VSLI by 0 is "set
destination to the input", so we don't want to use the same
special-casing that we do for VSRI by <dt>.

Since the generic logic gives the right answer for a shift
by 0, just use that.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
aa29190826 target/arm: Print MVE VPR in CPU dumps
Include the MVE VPR register value in the CPU dumps produced by
arm_cpu_dump_state() if we are printing FPU information. This
makes it easier to interpret debug logs when predication is
active.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Peter Maydell
9dacf0764b target/arm: Note that we handle VMOVL as a special case of VSHLL
Although the architecture doesn't define it as an alias, VMOVL
(vector move long) is encoded as a VSHLL with a zero shift.
Add a comment in the decode file noting that we handle VMOVL
as part of VSHLL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:48 +01:00
Richard Henderson
b3d52804c5 target/arm: Add sve-default-vector-length cpu property
Mirror the behavour of /proc/sys/abi/sve_default_vector_length
under the real linux kernel.  We have no way of passing along
a real default across exec like the kernel can, but this is a
decent way of adjusting the startup vector length of a process.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/482
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-4-richard.henderson@linaro.org
[PMM: tweaked docs formatting, document -1 special-case,
 added fixup patch from RTH mentioning QEMU's maximum veclen.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00
Richard Henderson
ce440581c1 target/arm: Export aarch64_sve_zcr_get_valid_len
Rename from sve_zcr_get_valid_len and make accessible
from outside of helper.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00
Richard Henderson
dc0bc8e785 target/arm: Correctly bound length in sve_zcr_get_valid_len
Currently, our only caller is sve_zcr_len_for_el, which has
already masked the length extracted from ZCR_ELx, so the
masking done here is a nop.  But we will shortly have uses
from other locations, where the length will be unmasked.

Saturate the length to ARM_MAX_VQ instead of truncating to
the low 4 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210723203344.968563-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-27 10:57:40 +01:00
Peter Maydell
d4f6883912 target/arm: Report M-profile alignment faults correctly to the guest
For M-profile, we weren't reporting alignment faults triggered by the
generic TCG code correctly to the guest.  These get passed into
arm_v7m_cpu_do_interrupt() as an EXCP_DATA_ABORT with an A-profile
style exception.fsr value of 1.  We didn't check for this, and so
they fell through into the default of "assume this is an MPU fault"
and were reported to the guest as a data access violation MPU fault.

Report these alignment faults as UsageFaults which set the UNALIGNED
bit in the UFSR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210723162146.5167-4-peter.maydell@linaro.org
2021-07-27 10:57:39 +01:00
Peter Maydell
0c317eb3dd target/arm: Add missing 'return's after calling v7m_exception_taken()
In do_v7m_exception_exit(), we perform various checks as part of
performing the exception return.  If one of these checks fails, the
architecture requires that we take an appropriate exception on the
existing stackframe.  We implement this by calling
v7m_exception_taken() to set up to take the new exception, and then
immediately returning from do_v7m_exception_exit() without proceeding
any further with the unstack-and-exception-return process.

In a couple of checks that are new in v8.1M, we forgot the "return"
statement, with the effect that if bad code in the guest tripped over
these checks we would set up to take a UsageFault exception but then
blunder on trying to also unstack and return from the original
exception, with the probable result that the guest would crash.

Add the missing return statements.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210723162146.5167-3-peter.maydell@linaro.org
2021-07-27 10:57:39 +01:00
Peter Maydell
888f470f12 target/arm: Enforce that M-profile SP low 2 bits are always zero
For M-profile, unlike A-profile, the low 2 bits of SP are defined to be
RES0H, which is to say that they must be hardwired to zero so that
guest attempts to write non-zero values to them are ignored.

Implement this behaviour by masking out the low bits:
 * for writes to r13 by the gdbstub
 * for writes to any of the various flavours of SP via MSR
 * for writes to r13 via store_reg() in generated code

Note that all the direct uses of cpu_R[] in translate.c are in places
where the register is definitely not r13 (usually because that has
been checked for as an UNDEFINED or UNPREDICTABLE case and handled as
UNDEF).

All the other writes to regs[13] in C code are either:
 * A-profile only code
 * writes of values we can guarantee to be aligned, such as
   - writes of previous-SP-value plus or minus a 4-aligned constant
   - writes of the value in an SP limit register (which we already
     enforce to be aligned)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210723162146.5167-2-peter.maydell@linaro.org
2021-07-27 10:57:39 +01:00
Richard Henderson
b5cf742841 accel/tcg: Remove TranslatorOps.breakpoint_check
The hook is now unused, with breakpoints checked outside translation.

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-21 07:47:05 -10:00
Richard Henderson
b00d86bc8b target/arm: Implement debug_check_breakpoint
Reuse the code at the bottom of helper_check_breakpoints,
which is what we currently call from *_tr_breakpoint_check.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-21 07:47:04 -10:00
Richard Henderson
be9568b4e0 tcg: Rename helper_atomic_*_mmu and provide for user-only
Always provide the atomic interface using TCGMemOpIdx oi
and uintptr_t retaddr.  Rename from helper_* to cpu_* so
as to (mostly) match the exec/cpu_ldst.h functions, and
to emphasize that they are not callable from TCG directly.

Tested-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-21 07:45:38 -10:00
Peter Maydell
8fe612a183 target/arm: Remove duplicate 'plus1' function from Neon and SVE decode
The Neon and SVE decoders use private 'plus1' functions to implement
"add one" for the !function decoder syntax.  We have a generic
"plus_1" function in translate.h, so use that instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210715095341.701-1-peter.maydell@linaro.org
2021-07-18 10:59:47 +01:00
Richard Henderson
d102058e79 target/arm: Fix offsets for TTBCR
The functions vmsa_ttbcr_write and vmsa_ttbcr_raw_write expect
the offset to be for the complete TCR structure, not the offset
to the low 32-bits of a uint64_t.  Using offsetoflow32 in this
case breaks big-endian hosts.

For TTBCR2, we do want the high 32-bits of a uint64_t.
Use cp15.tcr_el[*].raw_tcr as the offsetofhigh32 argument to
clarify this.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/187
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210709230621.938821-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-18 10:59:46 +01:00
Peter Maydell
bd38ae26ce Add translator_use_goto_tb.
Cleanups in prep of breakpoint fixes.
 Misc fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmDpvModHHJpY2hhcmQu
 aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/1jgf+J1JMsPfxlSCwbbdc
 WEuWEcuKdcDFqhsePa6LaPYHTKuEEwavTG0kPbLIVZW2f6BTBeSYxAC6EWhq7pWo
 MGMhIOZM3fF0Yj+azuoybu9qxQ/K/aLM3GYt/OU00mvzturBezz+ka8MvWCrUwta
 XlhxhwnKsSP7lDWPBBjcdIIGiFJyxIRoU43giWaXrsvsc8ORJbmy7rgZfTKAit+w
 AvtQlc7TBi5nImz6f/KmEoy8mHEOhMf7czzo+v0u97lTiNK717/AHEwMfX9J585O
 GjlA9XmUUsNAciuLy48F1rHkgJxYAwo0G2shklpqPaOP5FctKm1reCSb8VEfAGaX
 Xq3UVA==
 =E9i/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210710' into staging

Add translator_use_goto_tb.
Cleanups in prep of breakpoint fixes.
Misc fixes.

# gpg: Signature made Sat 10 Jul 2021 16:29:14 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth-gitlab/tags/pull-tcg-20210710: (41 commits)
  cpu: Add breakpoint tracepoints
  tcg: Remove TCG_TARGET_HAS_goto_ptr
  accel/tcg: Log tb->cflags with -d exec
  accel/tcg: Split out log_cpu_exec
  accel/tcg: Move tb_lookup to cpu-exec.c
  accel/tcg: Move helper_lookup_tb_ptr to cpu-exec.c
  target/i386: Use cpu_breakpoint_test in breakpoint_handler
  tcg: Fix prologue disassembly
  target/xtensa: Use translator_use_goto_tb
  target/tricore: Use tcg_gen_lookup_and_goto_ptr
  target/tricore: Use translator_use_goto_tb
  target/sparc: Use translator_use_goto_tb
  target/sh4: Use translator_use_goto_tb
  target/s390x: Remove use_exit_tb
  target/s390x: Use translator_use_goto_tb
  target/rx: Use translator_use_goto_tb
  target/riscv: Use translator_use_goto_tb
  target/ppc: Use translator_use_goto_tb
  target/openrisc: Use translator_use_goto_tb
  target/nios2: Use translator_use_goto_tb
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-12 11:02:39 +01:00
Peter Maydell
d1987c8114 * More SVM fixes (Lara)
* Module annotation database (Gerd)
 * Memory leak fixes (myself)
 * Build fixes (myself)
 * --with-devices-* support (Alex)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDoeBgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtFAgAippmxRt3lt+tcdSrCOZlKmxW6veK
 nUidtzfH5uE8vQsh5Q98WCEq871C/C+St1gK+q2H/MLrJeAqZD39DV+SKTuZ6Tcp
 3jL0iYC+oO0OjkHppDQTUDweF9KrsAW1WEeNz2th1OUDSjBXuXbZ+N497taouX18
 p2UN0gKNsOO2/QFrKL5KO7vSC56eBGoZz6gKtw/7dDtJBtizf1xKBRHW43b+CnQJ
 mHLs7Tj6oMC+vnMHkUKLH/6za3WJF1XHs5fp2isRgqoOSP8m0r6CMg8JnFIvmQf/
 tbLospKSWqcgD5C5PlFm2wSOjdU7zuPKM7wchhKrrEIvdDPhXaKrlpwi5Q==
 =GFX1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* More SVM fixes (Lara)
* Module annotation database (Gerd)
* Memory leak fixes (myself)
* Build fixes (myself)
* --with-devices-* support (Alex)

# gpg: Signature made Fri 09 Jul 2021 17:23:52 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (48 commits)
  meson: Use input/output for entitlements target
  configure: allow the selection of alternate config in the build
  configs: rename default-configs to configs and reorganise
  hw/arm: move CONFIG_V7M out of default-devices
  hw/arm: add dependency on OR_IRQ for XLNX_VERSAL
  meson: Introduce target-specific Kconfig
  meson: switch function tests from compilation to linking
  vl: fix leak of qdict_crumple return value
  target/i386: fix exceptions for MOV to DR
  target/i386: Added DR6 and DR7 consistency checks
  target/i386: Added MSRPM and IOPM size check
  monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically
  usb: build usb-host as module
  monitor/usb: register 'info usbhost' dynamically
  usb: drop usb_host_dev_is_scsi_storage hook
  monitor: allow register hmp commands
  accel: build tcg modular
  accel: add tcg module annotations
  accel: build qtest modular
  accel: add qtest module annotations
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-11 22:20:51 +01:00
Richard Henderson
97f11c8169 target/arm: Use translator_use_goto_tb for aarch32
Just use translator_use_goto_tb directly at the one call site,
rather than maintaining a local wrapper.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-09 09:42:28 -07:00
Richard Henderson
0285162bdf target/arm: Use translator_use_goto_tb for aarch64
We have not needed to end a TB for I/O since ba3e792669
("icount: clean up cpu_can_io at the entry to the block"),
and gdbstub singlestep is handled by the generic function.

Drop the unused 'n' argument to use_goto_tb.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-09 09:42:28 -07:00
Richard Henderson
73fce314db target/arm: Use DISAS_TOO_MANY for ISB and SB
Using gen_goto_tb directly misses the single-step check.
Let the branch or debug exception be emitted by arm_tr_tb_stop.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-09 09:41:53 -07:00
Philippe Mathieu-Daudé
1797b08d24 tcg: Avoid including 'trace-tcg.h' in target translate.c
The root trace-events only declares a single TCG event:

  $ git grep -w tcg trace-events
  trace-events:115:# tcg/tcg-op.c
  trace-events:137:vcpu tcg guest_mem_before(TCGv vaddr, uint16_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"

and only a tcg/tcg-op.c uses it:

  $ git grep -l trace_guest_mem_before_tcg
  tcg/tcg-op.c

therefore it is pointless to include "trace-tcg.h" in each target
(because it is not used). Remove it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210629050935.2570721-1-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-09 09:38:33 -07:00
Philippe Mathieu-Daudé
f4063f9c31 meson: Introduce target-specific Kconfig
Add a target-specific Kconfig. We need the definitions in Kconfig so
the minikconf tool can verify they exits. However CONFIG_FOO is only
enabled for target foo via the meson.build rules.

Two architecture have a particularity, ARM and MIPS. As their
translators have been split you can potentially build a plain 32 bit
build along with a 64-bit version including the 32-bit subset.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210131111316.232778-6-f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210707131744.26027-2-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09 18:21:34 +02:00
hnick@vmware.com
49a6f3bffb target/arm: Correct the encoding of MDCCSR_EL0 and DBGDSCRint
Signed-off-by: Nick Hudson <hnick@vmware.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-09 16:09:12 +01:00
Peter Maydell
04ea4d3cfd target/arm: Implement MVE shifts by register
Implement the MVE shifts by register, which perform
shifts on a single general-purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
2021-07-02 11:48:38 +01:00
Peter Maydell
46321d47a9 target/arm: Implement MVE shifts by immediate
Implement the MVE shifts by immediate, which perform shifts
on a single general-purpose register.

These patterns overlap with the long-shift-by-immediates,
so we have to rearrange the grouping a little here.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
0aa4b4c358 target/arm: Implement MVE long shifts by register
Implement the MVE long shifts by register, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
the shift count in another general-purpose register, which might be
either positive or negative.

Like the long-shifts-by-immediate, these encodings sit in the space
that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
we have to move the CSEL pattern into the same decodetree group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
f4ae6c8cbd target/arm: Implement MVE long shifts by immediate
The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers.  They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.

Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.

Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)

We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value which the helper casts to int8_t because we'll need
these helpers also for the shift-by-register insns, where the shift
count might be < 0 or > 32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
d43ebd9dc8 target/arm: Implement MVE VADDLV
Implement the MVE VADDLV insn; this is similar to VADDV, except
that it accumulates 32-bit elements into a 64-bit accumulator
stored in a pair of general-purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
2e6a4ce0f6 target/arm: Implement MVE VSHLC
Implement the MVE VSHLC insn, which performs a shift left of the
entire vector with carry in bits provided from a general purpose
register and carry out bits written back to that register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
d6f9e011e8 target/arm: Implement MVE saturating narrowing shifts
Implement the MVE saturating shift-right-and-narrow insns
VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.

do_srshr() is borrowed from sve_helper.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
162e265500 target/arm: Implement MVE VSHRN, VRSHRN
Implement the MVE shift-right-and-narrow insn VSHRN and VRSHRN.

do_urshr() is borrowed from sve_helper.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
a78b25fa71 target/arm: Implement MVE VSRI, VSLI
Implement the MVE VSRI and VSLI insns, which perform a
shift-and-insert operation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
c226270703 target/arm: Implement MVE VSHLL
Implement the MVE VHLL (vector shift left long) insn.  This has two
encodings: the T1 encoding is the usual shift-by-immediate format,
and the T2 encoding is a special case where the shift count is always
equal to the element size.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
3394116f47 target/arm: Implement MVE vector shift right by immediate insns
Implement the MVE vector shift right by immediate insns VSHRI and
VRSHRI.  As with Neon, we implement these by using helper functions
which perform left shifts but allow negative shift counts to indicate
right shifts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
f9ed61741e target/arm: Implement MVE vector shift left by immediate insns
Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
and VQSHLU.

The size-and-immediate encoding here is the same as Neon, and we
handle it the same way neon-dp.decode does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
2021-07-02 11:48:37 +01:00
Peter Maydell
eab8413985 target/arm: Implement MVE logical immediate insns
Implement the MVE logical-immediate insns (VMOV, VMVN,
VORR and VBIC). These have essentially the same encoding
as their Neon equivalents, and we implement the decode
in the same way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Peter Maydell
e4667a5b5e target/arm: Use dup_const() instead of bitfield_replicate()
Use dup_const() instead of bitfield_replicate() in
disas_simd_mod_imm().

(We can't replace the other use of bitfield_replicate() in this file,
in logic_imm_decode_wmask(), because that location needs to handle 2
and 4 bit elements, which dup_const() cannot.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Peter Maydell
2c0286dba4 target/arm: Use asimd_imm_const for A64 decode
The A64 AdvSIMD modified-immediate grouping uses almost the same
constant encoding that A32 Neon does; reuse asimd_imm_const() (to
which we add the AArch64-specific case for cmode 15 op 1) instead of
reimplementing it all.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Peter Maydell
dfd66bc0f3 target/arm: Make asimd_imm_const() public
The function asimd_imm_const() in translate-neon.c is an
implementation of the pseudocode AdvSIMDExpandImm(), which we will
also want for MVE.  Move the implementation to translate.c, with a
prototype in translate.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Peter Maydell
303db86fc7 target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Peter Maydell
d59ccc30f6 target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
In do_ldst(), the calculation of the offset needs to be based on the
size of the memory access, not the size of the elements in the
vector.  This meant we were getting it wrong for the widening and
narrowing variants of the various VLDR and VSTR insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
2021-07-02 11:48:36 +01:00
Joe Komlodi
103e7579dd target/arm: Check NaN mode before silencing NaN
If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
assert due to fpst->default_nan_mode being set.

To avoid this, we check to see what NaN mode we're running in before we call
floatxx_silence_nan().

Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-02 11:48:36 +01:00
Richard Henderson
ebdd503d45 target/arm: Improve REVSH
The new bswap flags can implement the semantics exactly.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29 10:04:57 -07:00
Richard Henderson
50a7470e3e target/arm: Improve vector REV
We can eliminate the requirement for a zero-extended output,
because the following store will ignore any garbage high bits.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29 10:04:57 -07:00
Richard Henderson
2b0a39e51e target/arm: Improve REV32
For the sf version, we are performing two 32-bit bswaps
in either half of the register.  This is equivalent to
performing one 64-bit bswap followed by a rotate.

For the non-sf version, we can remove TCG_BSWAP_IZ
and the preceding zero-extension.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29 10:04:57 -07:00
Richard Henderson
2b836c2ac1 tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
Implement the new semantics in the fallback expansion.
Change all callers to supply the flags that keep the
semantics unchanged locally.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29 10:04:57 -07:00
Peter Collingbourne
86f0d4c729 target/arm: Implement MTE3
MTE3 introduces an asymmetric tag checking mode, in which loads are
checked synchronously and stores are checked asynchronously. Add
support for it.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210616195614.11785-1-pcc@google.com
[PMM: Add line to emulation.rst]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-24 14:58:48 +01:00
Peter Maydell
4f57ef959c target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
In a CPU with MVE, the VMOV (vector lane to general-purpose register)
and VMOV (general-purpose register to vector lane) insns are not
predicated, but they are subject to beatwise execution if they
are not in an IT block.

Since our implementation always executes all 4 beats in one tick,
this means only that we need to handle PSR.ECI:
 * we must do the usual check for bad ECI state
 * we must advance ECI state if the insn succeeds
 * if ECI says we should not be executing the beat corresponding
   to the lane of the vector register being accessed then we
   should skip performing the move

Note that if PSR.ECI is non-zero then we cannot be in an IT block.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-45-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
6f060a636b target/arm: Implement MVE VADDV
Implement the MVE VADDV insn, which performs an addition
across vector lanes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-44-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
8625693ac4 target/arm: Implement MVE VHCADD
Implement the MVE VHCADD insn, which is similar to VCADD
but performs a halving step. This one overlaps with VADC.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-43-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
67ec113b11 target/arm: Implement MVE VCADD
Implement the MVE VCADD insn, which performs a complex add with
rotate.  Note that the size=0b11 encoding is VSBC.

The architecture grants some leeway for the "destination and Vm
source overlap" case for the size MO_32 case, but we choose not to
make use of it, instead always calculating all 16 bytes worth of
results before setting the destination register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-42-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
89bc4c4f78 target/arm: Implement MVE VADC, VSBC
Implement the MVE VADC and VSBC insns.  These perform an
add-with-carry or subtract-with-carry of the 32-bit elements in each
lane of the input vectors, where the carry-out of each add is the
carry-in of the next.  The initial carry input is either 1 or is from
FPSCR.C; the carry out at the end is written back to FPSCR.C.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-41-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
1eb987a89d target/arm: Implement MVE VRHADD
Implement the MVE VRHADD insn, which performs a rounded halving
addition.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-40-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
43364321f3 target/arm: Implement MVE VQDMULL (vector)
Implement the vector form of the MVE VQDMULL insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-39-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
92f117326a target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
like VQDMLADH and VQRDMLADH except that products are subtracted
rather than added.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-38-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
fd677f8055 target/arm: Implement MVE VQDMLADH and VQRDMLADH
Implement the MVE VQDMLADH and VQRDMLADH insns.  These multiply
elements, and then add pairs of products, double, possibly round,
saturate and return the high half of the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-37-peter.maydell@linaro.org
2021-06-24 14:58:48 +01:00
Peter Maydell
bb002345eb target/arm: Implement MVE VRSHL
Implement the MVE VRSHL insn (vector form).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-36-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
0372cad813 target/arm: Implement MVE VSHL insn
Implement the MVE VSHL insn (vector form).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-35-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
9dc868c41d target/arm: Implement MVE VQRSHL
Implement the MV VQRSHL (vector) insn.  Again, the code to perform
the actual shifts is borrowed from neon_helper.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-34-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
483da66139 target/arm: Implement MVE VQSHL (vector)
Implement the MVE VQSHL insn (encoding T4, which is the
vector-shift-by-vector version).

The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-33-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
f741707bb3 target/arm: Implement MVE VQADD, VQSUB (vector)
Implement the vector forms of the MVE VQADD and VQSUB insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-32-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
380caf6c07 target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-31-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
a88903537d target/arm: Implement MVE VQDMULL scalar
Implement the MVE VQDMULL scalar insn. This multiplies the top or
bottom half of each element by the scalar, doubles and saturates
to a double-width result.

Note that this encoding overlaps with VQADD and VQSUB; it uses
what in VQADD and VQSUB would be the 'size=0b11' encoding.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-30-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
66c0576754 target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
elements by the scalar, double, possibly round, take the high half
and saturate.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-29-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
39f2ec8592 target/arm: Implement MVE VQADD and VQSUB
Implement the MVE VQADD and VQSUB insns, which perform saturating
addition of a scalar to each element.  Note that individual bytes of
each result element are used or discarded according to the predicate
mask, but FPSCR.QC is only set if the predicate mask for the lowest
byte of the element is set.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-28-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
387debdb93 target/arm: Implement MVE VPST
Implement the MVE VPST insn, which sets the predicate mask
fields in the VPR to the immediate value encoded in the insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-27-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
b050543b68 target/arm: Implement MVE VBRSR
Implement the MVE VBRSR insn, which reverses a specified
number of bits in each element, setting the rest to zero.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-26-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
644f717c35 target/arm: Implement MVE VHADD, VHSUB (scalar)
Implement the scalar variants of the MVE VHADD and VHSUB insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-25-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
91a358fdfb target/arm: Implement MVE VSUB, VMUL (scalar)
Implement the scalar forms of the MVE VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-24-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
e51896b386 target/arm: Implement MVE VADD (scalar)
Implement the scalar form of the MVE VADD insn. This takes the
scalar operand from a general purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-23-peter.maydell@linaro.org
2021-06-24 14:58:47 +01:00
Peter Maydell
3854874733 target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
the results of a rounded multiply of pairs of elements into a 72-bit
accumulator, returning the top 64 bits in a pair of general purpose
registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-22-peter.maydell@linaro.org
2021-06-21 17:12:51 +01:00
Peter Maydell
181cd97143 target/arm: Implement MVE VMLSLDAV
Implement the MVE insn VMLSLDAV, which multiplies source elements,
alternately adding and subtracting them, and accumulates into a
64-bit result in a pair of general purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-21-peter.maydell@linaro.org
2021-06-21 17:12:51 +01:00
Peter Maydell
1d2386f70a target/arm: Implement MVE VMLALDAV
Implement the MVE VMLALDAV insn, which multiplies pairs of integer
elements, accumulating them into a 64-bit result in a pair of
general-purpose registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-20-peter.maydell@linaro.org
2021-06-21 17:12:51 +01:00
Peter Maydell
ac6ad1dca8 target/arm: Implement MVE VMULL
Implement the MVE VMULL insn, which multiplies two single
width integer elements to produce a double width result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-19-peter.maydell@linaro.org
2021-06-21 17:12:51 +01:00
Peter Maydell
abc48e310c target/arm: Implement MVE VHADD, VHSUB
Implement MVE VHADD and VHSUB insns, which perform an addition
or subtraction and then halve the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-18-peter.maydell@linaro.org
2021-06-21 17:12:51 +01:00
Peter Maydell
bc67aa8d56 target/arm: Implement MVE VABD
Implement the MVE VABD insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-17-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
cd367ff391 target/arm: Implement MVE VMAX, VMIN
Implement the MVE VMAX and VMIN insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-16-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
fca87b78f3 target/arm: Implement MVE VRMULH
Implement the MVE VRMULH insn, which performs a rounding multiply
and then returns the high half.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-15-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
ba62cc56e8 target/arm: Implement MVE VMULH
Implement the MVE VMULH insn, which performs a vector
multiply and returns the high half of the result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-14-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
9333fe4dd3 target/arm: Implement MVE VADD, VSUB, VMUL
Implement the MVE VADD, VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-13-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
68245e442c target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
Implement the MVE vector logical operations operating
on two registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-12-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
ab59362fca target/arm: Implement MVE VDUP
Implement the MVE VDUP insn, which duplicates a value from
a general-purpose register into every lane of a vector
register (subject to predication).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-11-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
399a8c766c target/arm: Implement MVE VNEG
Implement the MVE VNEG insn (both integer and floating point forms).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-9-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
59c9177338 target/arm: Implement MVE VABS
Implement the MVE VABS functions (both integer and floating point).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-8-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
8abd3c80b1 target/arm: Implement MVE VMVN (register)
Implement the MVE VMVN(register) operation.  Note that for
predication this operation is byte-by-byte.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-7-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
249b5309c4 target/arm: Implement MVE VREV16, VREV32, VREV64
Implement the MVE instructions VREV16, VREV32 and VREV64.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-6-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
6437f1f77c target/arm: Implement MVE VCLS
Implement the MVE VCLS insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-5-peter.maydell@linaro.org
2021-06-21 17:12:50 +01:00
Peter Maydell
0f0f2bd548 target/arm: Implement MVE VCLZ
Implement the MVE VCLZ insn (and the necessary machinery
for MVE 1-input vector ops).

Note that for non-load instructions predication is always performed
at a byte level granularity regardless of element size (R_ZLSJ),
and so the masking logic here differs from that used in the VLDR
and VSTR helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-4-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
2fc6b7510c target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
Implement the variants of MVE VLDR (encodings T1, T2) which perform
"widening" loads where bytes or halfwords are loaded from memory and
zero or sign-extended into halfword or word length vector elements,
and the narrowing MVE VSTR (encodings T1, T2) where bytes or
halfwords are stored from halfword or word elements.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-3-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
507b6a500c target/arm: Implement MVE VLDR/VSTR (non-widening forms)
Implement the forms of the MVE VLDR and VSTR insns which perform
non-widening loads of bytes, halfwords or words from memory into
vector elements of the same width (encodings T5, T6, T7).

(At the moment we know for MVE and M-profile in general that
vfp_access_check() can never return false, but we include the
conventional return-true-on-failure check for consistency
with non-M-profile translation code.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210617121628.20116-2-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
88137f787f target/arm: Handle FPU check for FPCXT_NS insns via vfp_access_check_m()
Instead of open-coding the "take NOCP exception if FPU disabled,
otherwise call gen_preserve_fp_state()" code in the accessors for
FPCXT_NS, add an argument to vfp_access_check_m() which tells it to
skip the gen_update_fp_context() call, so we can use it for the
FPCXT_NS case.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-8-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
e8cedaf779 target/arm: Split vfp_access_check() into A and M versions
vfp_access_check and its helper routine full_vfp_access_check() has
gradually grown and is now an awkward mix of A-profile only and
M-profile only pieces.  Refactor it into an A-profile only and an
M-profile only version, taking advantage of the fact that now the
only direct call to full_vfp_access_check() is in A-profile-only
code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-7-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
95aceeeac9 target/arm: Factor FP context update code out into helper function
Factor the code in full_vfp_access_check() which updates the
ownership of the FP context and creates a new FP context
out into its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-6-peter.maydell@linaro.org
2021-06-21 16:49:38 +01:00
Peter Maydell
e494cd0a1a target/arm: Handle writeback in VLDR/VSTR sysreg with no memory access
A few subcases of VLDR/VSTR sysreg succeed but do not perform a
memory access:
 * VSTR of VPR when unprivileged
 * VLDR to VPR when unprivileged
 * VLDR to FPCXT_NS when fpInactive

In these cases, even though we don't do the memory access we should
still update the base register and perform the stack limit check if
the insn's addressing mode specifies writeback.  Our implementation
failed to do this, because we handle these side-effects inside the
memory_to_fp_sysreg() and fp_sysreg_to_memory() callback functions,
which are only called if there's something to load or store.

Fix this by adding an extra argument to the callbacks which is set to
true to actually perform the access and false to only do side effects
like writeback, and calling the callback with do_access = false
for the three cases listed above.

This produces slightly suboptimal code for the case of a write
to FPCXT_NS when the FPU is inactive and the insn didn't have
side effects (ie no writeback, or via VMSR), in which case we'll
generate a conditional branch over an unconditional branch.
But this doesn't seem to be important enough to merit requiring
the callback to report back whether it generated any code or not.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-5-peter.maydell@linaro.org
2021-06-21 16:49:37 +01:00
Peter Maydell
fa856736b6 target/arm: Don't NOCP fault for FPCXT_NS accesses
The M-profile architecture requires that accesses to FPCXT_NS when
there is no active FP state must not take a NOCP fault even if the
FPU is disabled. We were not implementing this correctly, because
in our decode we catch the NOCP faults early in m-nocp.decode.

Fix this bug by moving all the handling of M-profile FP system
register accesses from vfp.decode into m-nocp.decode and putting
it above the NOCP blocks. This provides the correct behaviour:
 * for accesses other than FPCXT_NS the trans functions call
   vfp_access_check(), which will check for FPU disabled and
   raise a NOCP exception if necessary
 * for FPCXT_NS we have the special case code that doesn't
   call vfp_access_check()
 * when these trans functions want to raise an UNDEF they return
   false, so the decoder will fall through into the NOCP blocks.
   This means that NOCP correctly takes precedence over UNDEF
   for these insns. (This is a difference from the other insns
   handled by m-nocp.decode, where UNDEF takes precedence and
   which we implement by having those trans functions call
   unallocated_encoding() in the appropriate places.)

[Note for backport to stable: this commit has a semantic dependency
on commit 9a486856e9, which was not marked as cc-stable because
we didn't know we'd need it for a for-stable bugfix.]

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-4-peter.maydell@linaro.org
2021-06-21 16:49:37 +01:00
Peter Maydell
9931d9d84b target/arm: Handle FPU being disabled in FPCXT_NS accesses
If the guest makes an FPCXT_NS access when the FPU is disabled,
one of two things happens:
 * if there is no active FP context, then the insn behaves the
   same way as if the FPU was enabled: writes ignored, reads
   same value as FPDSCR_NS
 * if there is an active FP context, then we take a NOCP
   exception

Add code to the sysreg read/write functions which emits
code to take the NOCP exception in the latter case.

At the moment this will never be used, because the NOCP checks in
m-nocp.decode happen first, and so the trans functions are never
called when the FPU is disabled.  The code will be needed when we
move the sysreg access insns to before the NOCP patterns in the
following commit.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-3-peter.maydell@linaro.org
2021-06-21 16:49:37 +01:00
Peter Maydell
41b3ffc599 target/arm/translate-vfp.c: Whitespace fixes
In the code for handling VFP system register accesses there is some
stray whitespace after a unary '-' operator, and also some incorrect
indent in a couple of function prototypes.  We're about to move this
code to another file, so fix the code style issues first so
checkpatch doesn't complain about the code-movement patch.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210618141019.10671-2-peter.maydell@linaro.org
2021-06-21 16:49:37 +01:00
Peter Maydell
15613357ba target/arm: Use acpi_ghes_present() to see if we report ACPI memory errors
The virt_is_acpi_enabled() function is specific to the virt board, as
is the check for its 'ras' property.  Use the new acpi_ghes_present()
function to check whether we should report memory errors via
acpi_ghes_record_errors().

This avoids a link error if QEMU was built without support for the
virt board, and provides a mechanism that can be used by any future
board models that want to add ACPI memory error reporting support
(they only need to call acpi_ghes_add_fw_cfg()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Dongjiu Geng <gengdongjiu1@gmail.com>
Message-id: 20210603171259.27962-4-peter.maydell@linaro.org
2021-06-21 16:49:37 +01:00
Peter Maydell
dbcf6f9367 bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
Currently the ARM SVE helper code defines locally some utility
functions for swapping 16-bit halfwords within 32-bit or 64-bit
values and for swapping 32-bit words within 64-bit values,
parallel to the byte-swapping bswap16/32/64 functions.

We want these also for the ARM MVE code, and they're potentially
generally useful for other targets, so move them to bitops.h.
(We don't put them in bswap.h with the bswap* functions because
they are implemented in terms of the rotate operations also
defined in bitops.h, and including bitops.h from bswap.h seems
better avoided.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210614151007.4545-17-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
77f96148f3 target/arm: Move expand_pred_b() data to vec_helper.c
For MVE, we want to re-use the large data table from expand_pred_b().
Move the data table to vec_helper.c so it is no longer in an SVE
specific source file.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-14-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
6390eed45c target/arm: Add framework for MVE decode
Add the framework for decoding MVE insns, with the necessary new
files and the meson.build rules, but no actual content yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-11-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
a454ea1e6d target/arm: Implement MVE LETP insn
Implement the MVE LETP insn.  This is like the existing LE loop-end
insn, but it must perform an FPU-enabled check, and on loop-exit it
resets LTPSIZE to 4.

To accommodate the requirement to do something on loop-exit, we drop
the use of condlabel and instead manage both the TB exits manually,
in the same way we already do in trans_WLS().

The other MVE-specific change to the LE insn is that we must raise an
INVSTATE UsageFault insn if LTPSIZE is not 4.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-10-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
40a36f003c target/arm: Implement MVE DLSTP
Implement the MVE DLSTP insn; this is like the existing DLS
insn, except that it must do an FPU access check and it
sets LTPSIZE to the value specified in the insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-9-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
6822abfdf8 target/arm: Implement MVE WLSTP insn
Implement the MVE WLSTP insn; this is like the existing WLS insn,
except that it specifies a size value which is used to set
FPSCR.LTPSIZE.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-8-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
76c32d721d target/arm: Implement MVE LCTP
Implement the MVE LCTP instruction.

We put its decode and implementation with the other
low-overhead-branch insns because although it is only present if MVE
is implemented it is logically in the same group as the other LOB
insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-7-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
9a486856e9 target/arm: Let vfp_access_check() handle late NOCP checks
In commit a3494d4671 we reworked the M-profile handling of its
checks for when the NOCP exception should be raised because the FPU
is disabled, so that (in line with the architecture) the NOCP check
is done early over a large range of the encoding space, and takes
precedence over UNDEF exceptions.  As part of this, we removed the
code from full_vfp_access_check() which raised an exception there for
M-profile with the FPU disabled, because it was no longer reachable.

For MVE, some instructions which are outside the "coprocessor space"
region of the encoding space must nonetheless do "is the FPU enabled"
checks and possibly raise a NOCP exception.  (In particular this
covers the MVE-specific low-overhead branch insns LCTP, DLSTP and
WLSTP.) To support these insns, reinstate the code in
full_vfp_access_check(), so that their trans functions can call
vfp_access_check() and get the correct behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-6-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
5138bd0143 target/arm: Add handling for PSR.ECI/ICI
On A-profile, PSR bits [15:10][26:25] are always the IT state bits.
On M-profile, some of the reserved encodings of the IT state are used
to instead indicate partial progress through instructions that were
interrupted partway through by an exception and can be resumed.

These resumable instructions fall into two categories:

(1) load/store multiple instructions, where these bits are called
"ICI" and specify the register in the ldm/stm list where execution
should resume.  (Specifically: LDM, STM, VLDM, VSTM, VLLDM, VLSTM,
CLRM, VSCCLRM.)

(2) MVE instructions subject to beatwise execution, where these bits
are called "ECI" and specify which beats in this and possibly also
the following MVE insn have been executed.

There are also a few insns (LE, LETP, and BKPT) which do not use the
ICI/ECI bits but must leave them alone.

Otherwise, we should raise an INVSTATE UsageFault for any attempt to
execute an insn with non-zero ICI/ECI bits.

So far we have been able to ignore ECI/ICI, because the architecture
allows the IMPDEF choice of "always restart load/store multiple from
the beginning regardless of ICI state", so the only thing we have
been missing is that we don't raise the INVSTATE fault for bad guest
code.  However, MVE requires that we honour ECI bits and do not
rexecute beats of an insn that have already been executed.

Add the support in the decoder for handling ECI/ICI:
 * identify the ECI/ICI case in the CONDEXEC TB flags
 * when a load/store multiple insn succeeds, it updates the ECI/ICI
   state (both in DisasContext and in the CPU state), and sets a flag
   to say that the ECI/ICI state was handled
 * if we find that the insn we just decoded did not handle the
   ECI/ICI state, we delete all the code that we just generated for
   it and instead emit the code to raise the INVFAULT.  This allows
   us to avoid having to update every non-MVE non-LDM/STM insn to
   make it check for "is ECI/ICI set?".

We continue with our existing IMPDEF choice of not caring about the
ICI state for the load/store multiples and simply restarting them
from the beginning.  Because we don't allow interrupts in the middle
of an insn, the only way we would see this state is if the guest set
ICI manually on return from an exception handler, so it's a corner
case which doesn't merit optimisation.

ICI update for LDM/STM is simple -- it always zeroes the state.  ECI
update for MVE beatwise insns will be a little more complex, since
the ECI state may include information for the following insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-5-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
375256a846 target/arm: Handle VPR semantics in existing code
When MVE is supported, the VPR register has a place on the exception
stack frame in a previously reserved slot just above the FPSCR.
It must also be zeroed in various situations when we invalidate
FPU context.

Update the code which handles the stack frames (exception entry and
exit code, VLLDM, and VLSTM) to save/restore VPR.

Update code which invalidates FP registers (mostly also exception
entry and exit code, but also VSCCLRM and the code in
full_vfp_access_check() that corresponds to the ExecuteFPCheck()
pseudocode) to zero VPR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-4-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
c485ce2c49 target/arm: Enable FPSCR.QC bit for MVE
MVE has an FPSCR.QC bit similar to the A-profile Neon one; when MVE
is implemented make the bit writeable, both in the generic "load and
store FPSCR" helper functions and in the code for handling the NZCVQC
sysreg which we had previously left as "TODO when we implement MVE".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-3-peter.maydell@linaro.org
2021-06-16 14:33:52 +01:00
Peter Maydell
6e802db3c4 target/arm: Provide and use H8 and H1_8 macros
Currently we provide Hn and H1_n macros for accessing the correct
data within arrays of vector elements of size 1, 2 and 4, accounting
for host endianness.  We don't provide any macros for elements of
size 8 because there the host endianness doesn't matter.  However,
this does result in awkwardness where we need to pass empty arguments
to macros, because checkpatch complains about them.  The empty
argument is a little confusing for humans to read as well.

Add H8() and H1_8() macros and use them where we were previously
passing empty arguments to macros.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-2-peter.maydell@linaro.org
Message-id: 20210610132505.5827-1-peter.maydell@linaro.org
2021-06-16 14:33:51 +01:00
Richard Henderson
d3327a38cd target/arm: Fix mte page crossing test
The test was off-by-one, because tag_last points to the
last byte of the tag to check, thus tag_last - prev_page
will equal TARGET_PAGE_SIZE when we use the first byte
of the next page.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/403
Reported-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210612195707.840217-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-16 14:33:51 +01:00
Richard Henderson
475d696af7 target/arm: Diagnose UNALLOCATED in disas_simd_three_reg_same_fp16
This fprintf+assert has been in place since the beginning.
It is after to the fp_access_check, so we need to move the
check up.  Fold that in to the pairwise filter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210604183506.916654-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-15 16:18:48 +01:00
Richard Henderson
0af4d13b31 target/arm: Remove fprintf from disas_simd_mod_imm
The default of this switch is truly unreachable.
The switch selector is 3 bits, and all 8 cases are present.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210604183506.916654-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-15 16:18:48 +01:00
Richard Henderson
cd39e773e0 target/arm: Diagnose UNALLOCATED in disas_simd_two_reg_misc_fp16
This fprintf+assert has been in place since the beginning.
It is prior to the fp_access_check, so we're still good to
raise sigill here.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/381
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210604183506.916654-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-15 16:18:48 +01:00
Richard Henderson
3c93dfa42c target/arm: Enable BFloat16 extensions
Disable BF16 again for !have_neon and !have_vfp during realize.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
458d0ab683 target/arm: Implement bfloat widening fma (indexed)
This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
5693887f2e target/arm: Implement bfloat widening fma (vector)
This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
81266a1f58 target/arm: Implement bfloat16 matrix multiply accumulate
This is BFMMLA for both AArch64 AdvSIMD and SVE,
and VMMLA.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
839144784b target/arm: Implement bfloat16 dot product (indexed)
This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
cb8657f7f9 target/arm: Implement bfloat16 dot product (vector)
This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
d29b17ca3e target/arm: Implement vector float32 to bfloat16 conversion
This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
and VCVT.BF16.F32 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00
Richard Henderson
3a98ac40fa target/arm: Implement scalar float32 to bfloat16 conversion
This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-03 16:43:26 +01:00