qemu/target
Peter Maydell 39ef8286e6 target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong.  Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes.  The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10), etc., apply the wrong indexed element to some of
the operations, and also index off the end of the vector.

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e. if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.
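
The segment walk described above can be sketched in isolation. This is
only an illustration of the loop shape under stated assumptions, not the
QEMU helper: walk_segments(), its parameters and the ends[] array are
hypothetical names, elem_size stands in for sizeof(TYPED), and the step
of 4 used in the usage note is an assumption chosen to reproduce the
(1, 2), (3, 4, 5, 6), (7, 8, 9, 10) pattern quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Illustrative only: walk_segments() is a hypothetical stand-in for the
 * loop structure described above, not the QEMU helper itself.
 *
 * elem_size: result element size in bytes (stands in for sizeof(TYPED))
 * opr_sz_n:  number of elements in the vector
 * step:      how far segend advances after each inner loop completes
 *
 * Stores the exclusive end index (0-based) of each segment in ends[]
 * (at most max_ends entries) and returns the number of segments walked.
 */
static int walk_segments(intptr_t elem_size, intptr_t opr_sz_n,
                         intptr_t step, intptr_t *ends, int max_ends)
{
    assert(elem_size > 0 && 16 % elem_size == 0);
    assert(opr_sz_n > 0 && step > 0);

    intptr_t i = 0;
    /* First segment: clamp so that a short vector is not overrun. */
    intptr_t segend = MIN(16 / elem_size, opr_sz_n);
    int nseg = 0;

    do {
        do {
            /* The real helper would compute element i here. */
        } while (++i < segend);

        if (nseg < max_ends) {
            ends[nseg] = i;
        }
        nseg++;

        /* Move the terminator forward for the next 128-bit segment. */
        segend = i + step;
    } while (i < opr_sz_n);

    return nseg;
}
```

With 64-bit elements (elem_size 8) and a 512-bit vector (opr_sz_n 8), a
fixed step of 4 yields segment ends 2, 6, 10, so the last segment runs
past the vector, while a step of 16 / elem_size yields 2, 4, 6, 8. With
32-bit elements (elem_size 4) the two steps are equal, which is
consistent with the 32-bit-result instructions behaving correctly.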

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
(cherry picked from commit e6b2fa1b81)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-11-08 13:02:58 +03:00
alpha target/alpha: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 11:46:16 +01:00
arm target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed) 2024-11-08 13:02:58 +03:00
avr target/avr: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 11:46:17 +01:00
cris target/cris: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 11:46:17 +01:00
hexagon target/hexagon: don't look for static glib 2024-08-28 08:37:29 +03:00
hppa target/hppa: Fix PSW V-bit packaging in cpu_hppa_get for hppa64 2024-09-05 22:59:09 +03:00
i386 target/i386: Use probe_access_full_mmu in ptw_translate 2024-11-08 13:02:57 +03:00
loongarch target/loongarch: Fix helper_lddir() a CID INTEGER_OVERFLOW issue 2024-07-26 13:12:12 +03:00
m68k target/m68k: Always return a temporary from gen_lea_mode 2024-10-10 21:03:54 +03:00
microblaze target/microblaze: Use insn_start from DisasContextBase 2024-04-09 07:45:09 -10:00
mips target/mips: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 12:04:24 +01:00
nios2 target/nios2: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 12:04:24 +01:00
openrisc target/openrisc: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 12:04:24 +01:00
ppc target/ppc: Fix mtDPDES targeting SMT siblings 2024-11-08 13:02:58 +03:00
riscv target/riscv: Fix vcompress with rvv_ta_all_1s 2024-11-08 13:02:58 +03:00
rx target/rx: Use target_ulong for address in LI 2024-08-28 08:37:28 +03:00
s390x target/s390x: Use insn_start from DisasContextBase 2024-04-09 07:45:09 -10:00
sh4 target/sh4: Update DisasContextBase.insn_start 2024-05-09 16:48:26 +03:00
sparc target/sparc: Restrict STQF to sparcv9 2024-08-28 08:37:29 +03:00
tricore target/tricore/helper: Use correct string format in cpu_tlb_fill() 2024-03-26 14:24:06 +01:00
xtensa target/xtensa: Prefer fast cpu_env() over slower CPU QOM cast macro 2024-03-12 12:04:25 +01:00
Kconfig
meson.build target: Make qemu_target_page_mask() available for *-user 2024-01-29 21:04:10 +10:00
target-common.c target: Make qemu_target_page_mask() available for *-user 2024-01-29 21:04:10 +10:00