Using get_address() with register identifiers coming from an "r" field
is wrong: if the "r" field designates "r0", we don't read the register's
contents and instead assume 0 - which should only be done when the
register was specified via "b" or "x".
PoP 5-11 "Operand-Address Generation":
"A zero in any of the B1, B2, X2, B3, or B4 fields indicates the absence
of the corresponding address component. For the absent component, a zero
is used in forming the intermediate sum, regardless of the contents of
general register 0. A displacement of zero has no special significance."
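To make the distinction concrete, here is a minimal sketch (illustrative
only, not the actual QEMU code) of how a B/X-style address computation
differs from reading a register designated by an "r" field:

    #include <stdint.h>

    /* For B/X fields, register number 0 means "component absent" and
     * contributes 0, regardless of what r0 contains. */
    static uint64_t addr_from_bxd(const uint64_t *regs, int x2, int b2,
                                  uint64_t d2)
    {
        uint64_t addr = d2;

        if (x2) {
            addr += regs[x2];
        }
        if (b2) {
            addr += regs[b2];
        }
        return addr; /* wrapping for 24/31-bit modes omitted */
    }

    /* For an "r" field, r0 is an ordinary register whose contents must
     * be read. */
    static uint64_t addr_from_r_field(const uint64_t *regs, int r)
    {
        return regs[r];
    }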
This bug became visible with CSPG as generated by LLVM-12 in the upstream
Linux kernel (v5.11-rc2), used while creating the linear mapping in
vmem_map_init(): trying to store to address 0 results in a Low Address
Protection exception.
Debugging this was more complicated than it could have been: the kernel's
program interrupt handler tries to crash the kernel and, in doing so,
enables DAT. As the linear mapping has not been created yet (asce=0), we
run into an addressing exception while trying to walk non-existent DAT
tables, resulting in a program exception loop.
This allows for booting upstream Linux kernels compiled by clang-12. Most
of these cases seem to have been broken forever.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210111163845.18148-4-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Looks like something went wrong while touching that line: instead of "r1"
we need a new temporary, and we have to pass MO_TEQ to indicate that
we are working with 64-bit values. Let's revert these changes.
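A minimal sketch of the intended shape (variable names assumed, not the
literal hunk being restored):

    /* Load the 64-bit memory operand into a fresh temporary instead of
     * clobbering r1, and pass MO_TEQ so a full 64-bit value is read. */
    TCGv_i64 tmp = tcg_temp_new_i64();

    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_TEQ);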
Fixes: ff26d287bd ("target/s390x: Improve cc computation for ADD LOGICAL")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210111163845.18148-2-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Now that SUB LOGICAL outputs borrow, we can use that as input directly.
It also means we can re-use CC_OP_SUBU and produce an output borrow
directly from SUB LOGICAL WITH BORROW.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201214221356.68039-5-richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The resulting cc depends only on the result and the carry-out.
Carry-out and borrow-out are inverses, so they are trivially converted.
With tcg ops, it is easier to compute borrow-out than carry-out, so
save the result and borrow-out rather than the inputs.
Borrow-out for 64-bit inputs is obtained via tcg_gen_sub2_i64 directly
into cc_src. Borrow-out for 32-bit inputs is obtained via extraction
from a normal 64-bit sub (with zero-extended inputs).
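Roughly, with assumed operand names (a sketch, not the exact hunk):

    TCGv_i64 zero = tcg_const_i64(0);

    /* 64-bit inputs: the high half of the double-word difference is 0
     * (no borrow) or -1 (borrow) and goes straight into cc_src. */
    tcg_gen_sub2_i64(res, cc_src, in1, zero, in2, zero);

    /* 32-bit inputs (already zero-extended): bit 32 of the 64-bit
     * difference is set exactly when a borrow occurred. */
    tcg_gen_sub_i64(res, in1, in2);
    tcg_gen_extract_i64(cc_src, res, 32, 1);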
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201214221356.68039-4-richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Now that ADD LOGICAL outputs carry, we can use that as input directly.
It also means we can re-use CC_OP_ADDU and produce an output carry
directly from ADD LOGICAL WITH CARRY.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201214221356.68039-3-richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The resulting cc depends only on the result and the carry-out,
so save those rather than the inputs.
Carry-out for 64-bit inputs is obtained via tcg_gen_add2_i64 directly
into cc_src. Carry-out for 32-bit inputs is obtained via extraction
from a normal 64-bit add (with zero-extended inputs).
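Analogously to the borrow case sketched above (assumed operand names):

    TCGv_i64 zero = tcg_const_i64(0);

    /* 64-bit inputs: the high half of the double-word sum is the
     * carry-out and goes straight into cc_src. */
    tcg_gen_add2_i64(res, cc_src, in1, zero, in2, zero);

    /* 32-bit inputs (already zero-extended): the carry-out is bit 32
     * of the 64-bit sum. */
    tcg_gen_add_i64(res, in1, in2);
    tcg_gen_extract_i64(cc_src, res, 32, 1);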
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201214221356.68039-2-richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This patch adds some gen_io_start() calls to allow execution
of s390x targets in icount mode with -smp 1.
It enables deterministic timers and record/replay features.
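For illustration, the usual pattern around such an instruction looks
roughly like this (a sketch, not a specific hunk from this patch):

    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
        gen_io_start();
    }
    /* ... emit the helper call that may access the clock or do I/O ... */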
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <160455551747.32240.17074484658979970129.stgit@pasha-ThinkPad-X280>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
As with the other crypto functions, we only implement subcode 0 (query)
and no actual encryption/decryption. We now implement S390_FEAT_MSA_EXT_8.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-10-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
We need new CC handling, determining the CC based on the intermediate
result (64-bit for MSC and MSRKC, 128-bit for MSGC and MSGRKC).
We want to store out2 ("low") after muls128 to r1, so add
"wout_out2_r1".
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-8-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Just like BRANCH ON CONDITION, except that the branch address is read
from memory (always 8 bytes), so we have to wrap the address manually.
The address is read using the current CPU DAT/address-space controls,
just like ordinary data.
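Sketched under the assumption that the existing wrap_address() helper is
used (illustrative only):

    TCGv_i64 target = tcg_temp_new_i64();

    /* read the 8-byte target like ordinary data (current DAT and
     * address-space controls apply) ... */
    tcg_gen_qemu_ld_i64(target, o->addr1, get_mem_index(s), MO_TEQ);
    /* ... and wrap it manually for the current addressing mode; it is
     * then used as the branch target just as for BRANCH ON CONDITION */
    target = wrap_address(s, target);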
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-7-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Just like MULTIPLY HALFWORD IMMEDIATE (MGHI), except that the second
operand (signed 16-bit) comes from memory.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-6-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Multiply two signed 64-bit values and store the 128-bit result in r1
(bits 0-63) and r1 + 1 (bits 64-127).
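A hedged sketch of the core of it (operand plumbing assumed):

    /* signed 64 x 64 -> 128 multiply; high half to r1, low to r1 + 1 */
    tcg_gen_muls2_i64(lo, hi, a, b);
    store_reg(r1, hi);
    store_reg(r1 + 1, lo);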
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-5-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Easy, just like ADD HALFWORD IMMEDIATE (AGHI).
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200928122717.30586-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Recent upstream Linux uses the MONITOR CALL instruction for things like
BUG_ON() and WARN_ON(). We currently inject an operation exception when
we hit a MONITOR CALL instruction - which is wrong, as the instruction
is not glued to specific CPU features.
Doing a simple WARN_ON_ONCE() currently results in a panic:
[ 18.162801] illegal operation: 0001 ilc:2 [#1] SMP
[ 18.162889] Modules linked in:
[...]
[ 18.165476] Kernel panic - not syncing: Fatal exception: panic_on_oops
With a proper implementation, we now get:
[ 18.242754] ------------[ cut here ]------------
[ 18.242855] WARNING: CPU: 7 PID: 1 at init/main.c:1534 [...]
[ 18.242919] Modules linked in:
[...]
[ 18.246262] ---[ end trace a420477d71dc97b4 ]---
[ 18.259014] Freeing unused kernel memory: 4220K
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200918085122.26132-1-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The output is 128-bit, and thus requires a pair of 64-bit temps.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Buglink: https://bugs.launchpad.net/bugs/1883984
Message-Id: <20200620042140.42070-1-richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Merge VERLL and VERLLV into op_vesv and op_ves, alongside
all of the other vector shift operations.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These are trivially done by performing a memory operation
with the correct mmu_idx. The only tricky part is using
get_address directly in order to get the address wrapped;
we cannot use la2 because of the format.
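Illustrative sketch; the mmu_idx name (MMU_REAL_IDX here) and the operand
plumbing are assumptions:

    /* compute and wrap the address via get_address(), then perform an
     * ordinary load, only with an explicit mmu_idx */
    TCGv_i64 addr = get_address(s, 0, get_field(s, b2), get_field(s, d2));

    tcg_gen_qemu_ld_i64(o->out, addr, MMU_REAL_IDX, MO_TEQ);
    tcg_temp_free_i64(addr);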
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20191211203614.15611-3-richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
24-bit and 31-bit address space handling is wrong when it comes to
storing the addresses back to the register.
While at it, read gpr 0 implicitly.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Simulate XxC=0 and ERM=0 (current mode), so we can use the existing
helper function.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
The only FP instruction we can implement without a helper.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
We can reuse some of the infrastructure introduced for
VECTOR FP CONVERT FROM FIXED 64-BIT and friends.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Take care of reading/indicating the 32-bit elements.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
We can reuse most of the infrastructure introduced for
VECTOR FP CONVERT FROM FIXED 64-BIT and friends.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
We can reuse most of the infrastructure added for VECTOR FP ADD.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
1. We'll reuse op_vcdg() for similar instructions later; prepare for
that.
2. We'll reuse vop64_2() later for other instructions.
We have to mangle the erm (effective rounding mode) and the m4 into
simd_data(), and properly unmangle them again.
Make sure to restore the erm before triggering an exception.
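For illustration, the packing could look like this (a sketch; the helper
names and the field widths/positions are assumptions, not the actual
encoding):

    /* translator side: fold erm and m4 into the simd_data() immediate */
    static uint32_t pack_vfp_data(int erm, int m4)
    {
        return deposit32(deposit32(0, 0, 4, erm), 4, 4, m4);
    }

    /* helper side: unpack them again */
    static void unpack_vfp_data(uint32_t desc, int *erm, int *m4)
    {
        *erm = extract32(simd_data(desc), 0, 4);
        *m4 = extract32(simd_data(desc), 4, 4);
    }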
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
For all three instructions, provide all four combinations of the cc bit
and the s bit.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
As far as I can see, there is only a tiny difference.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
1. We'll reuse op_vfa() for similar instructions later; prepare for
that.
2. We'll reuse vop64_3() for other instructions later.
3. Take care of modifying the vector register only if no trap happened.
- on traps, flags are not updated and no elements are modified
- traps don't modify the fpc flags
- without traps, all exceptions of all elements are merged
4. We'll reuse check_ieee_exc() later when we need the XxC flag.
We have to check for exceptions after processing each element.
Provide separate handlers for single/all element processing. We'll do
the same for all applicable FP instructions.
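A rough sketch of the flow described above for the two-element (64-bit)
case; check_ieee_exc() is named in the commit text, but its signature,
the operand names, and the handle_ieee_exc() step are assumptions:

    S390Vector tmp = {};
    uint8_t vxc = 0, vec_exc = 0;
    int i;

    for (i = 0; i < 2; i++) {
        const uint64_t a = s390_vec_read_element64(v2, i);
        const uint64_t b = s390_vec_read_element64(v3, i);

        /* compute into a temporary, never into the target register */
        s390_vec_write_element64(&tmp, i,
                                 float64_add(a, b, &env->fpu_status));
        /* check for IEEE exceptions after each element */
        vxc = check_ieee_exc(env, i, false, &vec_exc);
        if (s || vxc) {
            break;      /* single-element mode or trap: stop here */
        }
    }
    /* deliver a trap (if any) before touching the destination;
     * otherwise merge the collected flags and commit the result */
    handle_ieee_exc(env, vxc, vec_exc, retaddr);
    *v1 = tmp;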
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Unfortunately, there is no easy way to avoid looping over all elements
in v2. Provide specialized variants for !cc,!rt/!cc,rt/cc,!rt/cc,rt and
all element types. Especially for different values of rt, the compiler
might be able to optimize the code a lot.
Add s390_vec_write_element().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Similar to VECTOR FIND ELEMENT EQUAL. Core logic courtesy of Richard H.
Add s390_vec_read_element() that can deal with element sizes.
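Presumably a thin size dispatcher along these lines (a sketch; the
fixed-size accessors are assumed to exist already):

    static inline uint64_t s390_vec_read_element(const S390Vector *v,
                                                 uint8_t enr, uint8_t es)
    {
        switch (es) {
        case MO_8:
            return s390_vec_read_element8(v, enr);
        case MO_16:
            return s390_vec_read_element16(v, enr);
        case MO_32:
            return s390_vec_read_element32(v, enr);
        case MO_64:
            return s390_vec_read_element64(v, enr);
        default:
            g_assert_not_reached();
        }
    }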
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Complicated stuff. Provide two different helpers for CC and !CC handling.
We might want to add more helpers later.
zero_search() and match_index() are courtesy of Richard H.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the cc value directly via cpu_env. Unfortunately there
isn't a simple way to calculate the value lazily - one would have to
calculate and store e.g. the population count of the mask and the
result so it can be evaluated in a cc helper.
But as VTM only sets the cc, we can assume the value will be needed soon
either way.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Similar to VECTOR SUM ACROSS DOUBLEWORD.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Similar to VECTOR SUM ACROSS DOUBLEWORD, however without a loop and
using 128-bit calculations.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Perform the calculations without a helper. Only 16-bit or 32-bit values
have to be added.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Fairly easy as only 128-bit handling is required. Simply perform the
subtraction and then subtract the borrow.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's keep it simple for now and handle 8/16-bit elements via helpers.
Especially for 8/16, we could come up with some bit tricks.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
We can use tcg_gen_sub2_i64() to do 128-bit subtraction and otherwise
existing gvec helpers.
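The 128-bit element case then reduces to a single double-word op
(operand names assumed):

    /* (dh:dl) = (ah:al) - (bh:bl), borrow propagated between halves */
    tcg_gen_sub2_i64(dl, dh, al, ah, bl, bh);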
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Similar to VECTOR SHIFT RIGHT ARITHMETICAL.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>