Here's my first pull request for qemu-5.2, which has quite a few
accumulated things. Highlights are:
* Preliminary support for POWER10 (Power ISA 3.1) instruction emulation
* Add documentation on the (very confusing) pseries NUMA configuration
* Fix some bugs handling edge cases with XICS, XIVE and kernel_irqchip
* Fix icount for a number of POWER registers
* Many cleanups to error handling in XIVE code
* Validate size of -prom-env data
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl87VpwACgkQbDjKyiDZ
s5LjIxAAs8YAQe3uDRz1Wb9GftoMmEHdq7JQoO0FbXDQIVXzpTAXmFLSBtCWKl6p
O1MEIy/o48b5ORXJqSDSA5LgxbHxYfHdIPEY5Tbn/TGvTvKyCukx9n11milUG8In
JxRrOTQBnQAAHkLoyuZyrWKOauC0N1scNrnX9Geuid13GcmqHg1d2alXAUu8jEeC
HSiVmtMqqyyqTx2xA4vfhaGuuwTthnKNfbGdg9ksVqBsCW+etn6ZKGImt8hBe3qO
5iqbQZvFbkpzgbjkhDzUDM6tmUAFN55y/Y+y7I8Tz4/IX7d3WbdqpplwrXXVWkpq
2gcBBjQ/9a1hPTBRVN9jn4CvHfhILBfeHIElUiLpSTQZQQALymTnnI2pLCgKoEFX
LcchXbjiX+pZ2OJnAijpwBcknjgT2U/ZNyiqHJfSQ6jzlYx1YtUf4xGUsgloSiK8
9QDK8o2k0Cm8Be+lPMBMmTctoi8bq+8SN5UUF710WQL235J58o9+z1vuGO2HVk3x
flBtv/+B890wcCDpGU80DPs/LSzR0xTTbA5JsWft2fvO569mda0MoWkJH5w6jvSc
ZLYqljCzFCVW+tKiGHzaBalJaMwn0+QMDTsxzP3yTt5LmmEeRXpBELgvrW64IobD
xBeryH3nG4SwxFSJq+4ATfvUzjy/Eo58lTTl6c53Ji8/D3aFwsA=
=L9Wi
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20200818' into staging
ppc patch queue 2020-08-18
Here's my first pull request for qemu-5.2, which has quite a few
accumulated things. Highlights are:
* Preliminary support for POWER10 (Power ISA 3.1) instruction emulation
* Add documentation on the (very confusing) pseries NUMA configuration
* Fix some bugs handling edge cases with XICS, XIVE and kernel_irqchip
* Fix icount for a number of POWER registers
* Many cleanups to error handling in XIVE code
* Validate size of -prom-env data
# gpg: Signature made Tue 18 Aug 2020 05:18:36 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-5.2-20200818: (40 commits)
spapr/xive: Use xive_source_esb_len()
nvram: Exit QEMU if NVRAM cannot contain all -prom-env data
spapr/xive: Simplify error handling of kvmppc_xive_cpu_synchronize_state()
ppc/xive: Simplify error handling in xive_tctx_realize()
spapr/xive: Simplify error handling in kvmppc_xive_connect()
ppc/xive: Fix error handling in vmstate_xive_tctx_*() callbacks
spapr/xive: Fix error handling in kvmppc_xive_post_load()
spapr/kvm: Fix error handling in kvmppc_xive_pre_save()
spapr/xive: Rework error handling of kvmppc_xive_set_source_config()
spapr/xive: Rework error handling in kvmppc_xive_get_queues()
spapr/xive: Rework error handling of kvmppc_xive_[gs]et_queue_config()
spapr/xive: Rework error handling of kvmppc_xive_cpu_[gs]et_state()
spapr/xive: Rework error handling of kvmppc_xive_mmap()
spapr/xive: Rework error handling of kvmppc_xive_source_reset()
spapr/xive: Rework error handling of kvmppc_xive_cpu_connect()
spapr: Simplify error handling in spapr_phb_realize()
spapr/xive: Convert KVM device fd checks to assert()
ppc/xive: Introduce dedicated kvm_irqchip_in_kernel() wrappers
ppc/xive: Rework setup of XiveSource::esb_mmio
target/ppc: Integrate icount to purr, vtb, and tbu40
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With Makefiles that have automatically generated dependencies, you
generated includes are set as dependencies of the Makefile, so that they
are built before everything else and they are available when first
building the .c files.
Alternatively you can use a fine-grained dependency, e.g.
target/arm/translate.o: target/arm/decode-neon-shared.inc.c
With Meson you have only one choice and it is a third option, namely
"build at the beginning of the corresponding target"; the way you
express it is to list the includes in the sources of that target.
The problem is that Meson decides if something is a source vs. a
generated include by looking at the extension: '.c', '.cc', '.m', '.C'
are sources, while everything else is considered an include---including
'.inc.c'.
Use '.c.inc' to avoid this, as it is consistent with our other convention
of using '.rst.inc' for included reStructuredText files. The editorconfig
file is adjusted.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When emulating certain floating point instructions or vector instructions on
PowerPC machines, QEMU did not properly generate the SPE/Embedded Floating-
Point Unavailable interrupt. See the buglink further below for references to
the relevant NXP documentation.
This patch fixes the behavior of some evfs* instructions that were
incorrectly emitting the interrupt.
More importantly, this patch fixes the behavior of several efd* and ev*
instructions that were not generating the interrupt. Triggering the
interrupt for these instructions fixes lazy FPU/vector context switching on
some operating systems like Linux.
Without this patch, the result of some double-precision arithmetic could be
corrupted due to the lack of proper saving and restoring of the upper
32-bit part of the general-purpose registers.
Buglink: https://bugs.launchpad.net/qemu/+bug/1888918
Buglink: https://bugs.launchpad.net/qemu/+bug/1611394
Signed-off-by: Matthieu Bucchianeri <matthieu.bucchianeri@leostella.com>
Message-Id: <20200727175553.32276-1-matthieu.bucchianeri@leostella.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
vmulhsd: Vector Multiply High Signed Doubleword
vmulhud: Vector Multiply High Unsigned Doubleword
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Message-Id: <20200724045845.89976-5-ljp@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
vmulhsw: Vector Multiply High Signed Word
vmulhuw: Vector Multiply High Unsigned Word
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Message-Id: <20200724045845.89976-4-ljp@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Convert the original implementation of vmuluwm to the more generic
tcg_gen_gvec_mul.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Message-Id: <20200701234344.91843-5-ljp@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix double-call to tcg_temp_new_i64(), where a temp is allocated both at
declaration time and further down the implementation of gen_evmwsmiaa().
Note that gen_evmwsmia() and gen_evmwsmiaa() are still not implemented
correctly, as they invoke gen_evmwsmi() which may return early, but the
return is not propagated. This will be fixed in my patch for bug #1888918.
Signed-off-by: Matthieu Bucchianeri <matthieu.bucchianeri@leostella.com>
Message-Id: <20200727172114.31415-1-matthieu.bucchianeri@leostella.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We can now unify the implementation of the 3 VSPLTI instructions.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
"Deferred" was misspelled as "differed" in some comments, correct this
typo,
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20200214155748.0896B745953@zero.eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In previous implementation, invocation of TCG shift function could request
shift of TCG variable by 64 bits when variable 'sh' is 0, which is not
supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes
this by using two separate invocation of TCG shift functions, with maximum
shift amount of 32.
Name of variable 'shifted' is changed to 'carry' so variable naming
is similar to old helper implementation.
Variables 'avrA' and 'avrB' are replaced with variable 'avr'.
Fixes: 4e6d0920e7
Reported-by: "Paul A. Clark" <pc@us.ibm.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Suggested-by: Aleksandar Markovic <aleksandar.markovic@rt-rk.com>
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Message-Id: <1570196639-7025-2-git-send-email-stefan.brankovic@rt-rk.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffsce' instruction.
'mffsce' is identical to 'mffs', except that it also clears the exception
enable bits in the FPSCR.
On CPUs without support for 'mffsce' (below ISA 3.0), the
instruction will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1568817082-1384-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffscrn' and 'mffscrni' instructions.
'mffscrn' and 'mffscrni' are similar to 'mffsl', except they do not return
the status bits (FI, FR, FPRF) and they also set the rounding mode in the
FPSCR.
On CPUs without support for 'mffscrn'/'mffscrni' (below ISA 3.0), the
instructions will execute identically to 'mffs'.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817081-1345-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since I found this two instructions implemented with tcg, I refactored
them so they are consistent with other similar implementations that
I introduced in this patch.
Also, a new dual macro GEN_VXFORM_TRANS_DUAL is added. This macro is
used if one instruction is realized with direct translation, and second
one with a helper.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Message-Id: <1566898663-25858-4-git-send-email-stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A class of instructions of the form:
op Target,A,B
which operate like:
Target = Target * A + B
have a bit set which distinguishes them from instructions that operate as:
Target = Target * B + A
This bit is not being checked properly (using PPC_BIT macro), so all
instructions in this class are operating incorrectly as the second form
above. The bit was being checked as if it were part of a 64-bit
instruction opcode, rather than a proper 32-bit opcode. Fix by using the
macro (PPC_BIT32) which treats the opcode as a 32-bit quantity.
Fixes: c9f4e4d8b6 ("target/ppc: improve VSX_FMADD with new GEN_VSX_HELPER_VSX_MADD macro")
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1566401321-22419-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffsl'.
'mffsl' is identical to 'mffs', except it only returns mode, status, and enable
bits from the FPSCR.
On CPUs without support for 'mffsl' (below ISA 3.0), the 'mffsl' instruction
will execute identically to 'mffs'.
Note: I renamed FPSCR_RN to FPSCR_RN0 so I could create an FPSCR_RN mask which
is both bits of the FPSCR rounding mode, as defined in the ISA.
I also fixed a typo in the definition of FPSCR_FR.
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
v4:
- nit: added some braces to resolve a checkpatch complaint.
v3:
- Changed tcg_gen_and_i64 to tcg_gen_andi_i64, eliminating the need for a
temporary, per review from Richard Henderson.
v2:
- I found that I copied too much of the 'mffs' implementation.
The 'Rc' condition code bits are not needed for 'mffsl'. Removed.
- I now free the (renamed) 'tmask' temporary.
- I now bail early for older ISA to the original 'mffs' implementation.
Message-Id: <1565982203-11048-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Optimize Altivec instruction vclzw (Vector Count Leading Zeros Word).
This instruction counts the number of leading zeros of each word element
in source register and places result in the appropriate word element of
destination register.
Counting is to be performed in four iterations of for loop(one for each
word elemnt of source register vB). Every iteration consists of loading
appropriate word element from source register, counting leading zeros
with tcg_gen_clzi_i32, and saving the result in appropriate word element
of destination register.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-7-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Optimize Altivec instruction vclzd (Vector Count Leading Zeros Doubleword).
This instruction counts the number of leading zeros of each doubleword element
in source register and places result in the appropriate doubleword element of
destination register.
Using tcg-s count leading zeros instruction two times(once for each
doubleword element of source register vB) and placing result in
appropriate doubleword element of destination register vD.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-6-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Optimize altivec instruction vgbbd (Vector Gather Bits by Bytes by Doubleword)
All ith bits (i in range 1 to 8) of each byte of doubleword element in
source register are concatenated and placed into ith byte of appropriate
doubleword element in destination register.
Following solution is done for both doubleword elements of source register
in parallel, in order to reduce the number of instructions needed(that's why
arrays are used):
First, both doubleword elements of source register vB are placed in
appropriate element of array avr. Bits are gathered in 2x8 iterations(2 for
loops). In first iteration bit 1 of byte 1, bit 2 of byte 2,... bit 8 of
byte 8 are in their final spots so avr[i], i={0,1} can be and-ed with
tcg_mask. For every following iteration, both avr[i] and tcg_mask variables
have to be shifted right for 7 and 8 places, respectively, in order to get
bit 1 of byte 2, bit 2 of byte 3.. bit 7 of byte 8 in their final spots so
shifted avr values(saved in tmp) can be and-ed with new value of tcg_mask...
After first 8 iteration(first loop), all the first bits are in their final
places, all second bits but second bit from eight byte are in their places...
only 1 eight bit from eight byte is in it's place). In second loop we do all
operations symmetrically, in order to get other half of bits in their final
spots. Results for first and second doubleword elements are saved in
result[0] and result[1] respectively. In the end those results are saved in
appropriate doubleword element of destination register vD.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-5-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Optimization of altivec instructions vsl and vsr(Vector Shift Left/Rigt).
Perform shift operation (left and right respectively) on 128 bit value of
register vA by value specified in bits 125-127 of register vB. Lowest 3
bits in each byte element of register vB must be identical or result is
undefined.
For vsl instruction, the first step is bits 125-127 of register vB have
to be saved in variable sh. Then, the highest sh bits of the lower
doubleword element of register vA are saved in variable shifted,
in order not to lose those bits when shift operation is performed on
the lower doubleword element of register vA, which is the next
step. After shifting the lower doubleword element shift operation
is performed on higher doubleword element of vA, with replacement of
the lowest sh bits(that are now 0) with bits saved in shifted.
For vsr instruction, firstly, the bits 125-127 of register vB have
to be saved in variable sh. Then, the lowest sh bits of the higher
doubleword element of register vA are saved in variable shifted,
in odred not to lose those bits when the shift operation is
performed on the higher doubleword element of register vA, which is
the next step. After shifting higher doubleword element, shift operation
is performed on lower doubleword element of vA, with replacement of
highest sh bits(that are now 0) with bits saved in shifted.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-3-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Adding simple macro that is calling tcg implementation of appropriate
instruction if altivec support is active.
Optimization of altivec instruction lvsl (Load Vector for Shift Left).
Place bytes sh:sh+15 of value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F
in destination register. Sh is calculated by adding 2 source registers and
getting bits 60-63 of result.
First, the bits [28-31] are placed from EA to variable sh. After that,
the bytes are created in the following way:
sh:(sh+7) of X(from description) by multiplying sh with 0x0101010101010101
followed by addition of the result with 0x0001020304050607. Value obtained
is placed in higher doubleword element of vD.
(sh+8):(sh+15) by adding the result of previous multiplication with
0x08090a0b0c0d0e0f. Value obtained is placed in lower doubleword element
of vD.
Optimization of altivec instruction lvsr (Load Vector for Shift Right).
Place bytes 16-sh:31-sh of value 0x00 || 0x01 || 0x02 || ... || 0x1E ||
0x1F in destination register. Sh is calculated by adding 2 source
registers and getting bits 60-63 of result.
First, the bits [28-31] are placed from EA to variable sh. After that,
the bytes are created in the following way:
sh:(sh+7) of X(from description) by multiplying sh with 0x0101010101010101
followed by substraction of the result from 0x1011121314151617. Value
obtained is placed in higher doubleword element of vD.
(sh+8):(sh+15) by substracting the result of previous multiplication from
0x18191a1b1c1d1e1f. Value obtained is placed in lower doubleword element
of vD.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-2-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce a new GEN_VSX_HELPER_VSX_MADD macro for the generator function which
enables the source and destination registers to be decoded at translation time.
This enables the determination of a or m form to be made at translation time so
that a single helper function can now be used for both variants.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-16-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-15-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-14-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R2_AB macro which performs the decode based
upon rA and rB at translation time.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R2 macro which performs the decode based
upon rD and rB at translation time.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_R3 macro which performs the decode based
upon rD, rA and rB at translation time.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X1 macro which performs the decode based
upon xB at translation time.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X2_AB macro which performs the decode based
upon xA and xB at translation time.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X2 macro which performs the decode based
upon xT and xB at translation time.
With the previous change to the xscvqpdp generator and helper functions the
opcode parameter is no longer required in the common case and can be
removed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new generator and helper function which perform the decode based
upon xT and xB at translation time.
The xscvqpdp helper is the only 2 parameter xT/xB implementation that requires
the opcode to be passed as an additional parameter, so handling this separately
allows us to optimise the conversion in the next commit.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new GEN_VSX_HELPER_X3 macro which performs the decode based
upon xT, xA and xB at translation time.
With the previous changes to the VSX_CMP generator and helper macros the
opcode parameter is no longer required in the common case and can be
removed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than perform the VSR register decoding within the helper itself,
introduce a new VSX_CMP macro which performs the decode based upon xT, xA
and xB at translation time.
Subsequent commits will make the same changes for other instructions however
the xvcmp* instructions are different in that they return a set of flags to be
optionally written back to the crf[6] register. Move this logic from the
helper function to the generator function, along with the float_status update.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Replace the target-specific implementation of XXSEL.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190603164927.8336-1-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During the conversion these instructions were incorrectly treated as
stores. We need to use set_cpu_vsr* and not get_cpu_vsr*.
Fixes: 8b3b2d75c7 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190524065345.25591-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The gvec expanders take care of masking the shift amount
against the element width.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190518191430.21686-2-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We were using set_cpu_vsr*() when we should have used get_cpu_vsr*().
Fixes: 8b3b2d75c7 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509104912.6b754dff@kryten>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A few small optimisations:
In VSX_LOAD_SCALAR_DS() we can don't need to read the VSR via
get_cpu_vsrh().
Split VSX_VECTOR_LOAD_STORE() into two functions. Loads only need to
write the VSRs (set_cpu_vsr*()) and stores only need to read the VSRs
(get_cpu_vsr*())
Thanks to Mark Cave-Ayland for the suggestions.
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509103545.4a7fa71a@kryten>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xxspltib raises a VMX or a VSX exception depending on the register
set it is operating on. We had a check, but it was backwards.
Fixes: f113283525 ("target-ppc: add xxspltib instruction")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509061713.69490488@kryten>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix a typo in xxbrq and xxbrw where we put both results into the lower
doubleword.
Fixes: 8b3b2d75c7 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190507004811.29968-3-anton@ozlabs.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix a typo in xvxsigdp where we put both results into the lower
doubleword.
Fixes: dd977e4f45 ("target/ppc: Optimize x[sv]xsigdp using deposit_i64()")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190507004811.29968-1-anton@ozlabs.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190423102145.14812-2-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Replace the single opcode in .opc with a null-terminated
array in .opt_opc. We still require that all opcodes be
used with the same .vece.
Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion. Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.
Convert all existing vector aware front ends.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
I've been hitting several QEMU crashes while running a fedora29 ppc64le
guest under TCG. Each time, this would occur several minutes after the
guest reached login:
Fedora 29 (Twenty Nine)
Kernel 4.20.6-200.fc29.ppc64le on an ppc64le (hvc0)
Web console: https://localhost:9090/
localhost login:
tcg/tcg.c:3211: tcg fatal error
This happens because a bug crept up in the gen_stxsdx() helper when it
was converted to use VSR register accessors by commit 8b3b2d75c7
"target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers
for VSR register access".
The code creates a temporary, passes it directly to gen_qemu_st64_i64()
and then to set_cpu_vrsh()... which looks like this was mistakenly
coded as a load instead of a store.
Reverse the logic: read the VSR to the temporary first and then store
it to memory.
Fixes: 8b3b2d75c7
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155371035249.2038502.12364252604337688538.stgit@bahia.lan>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190309214255.9952-3-f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>