Commit Graph

236 Commits

Author SHA1 Message Date
Chinmay Rath
e77d736d2a target/ppc: Move VSX fp compare insns to decodetree.
Moving the following instructions to decodetree specification:

	xvcmp{eq, gt, ge, ne}{s, d}p	: XX3-form

The changes were verified by validating that the tcg-ops generated for those
instructions remain the same which were captured using the '-d in_asm,op' flag.

Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:51:33 +10:00
Chinmay Rath
29df8d950e target/ppc: Move VSX vector with length storage access insns to decodetree.
Moving the following instructions to decodetree specification :

        {l, st}xvl(l)           : X-form

The changes were verified by validating that the tcg-ops generated by those
instructions remain the same, which were captured using the '-d in_asm,op' flag.

Also added a new function do_ea_calc_ra to calculate the effective address :
EA <- (RA == 0) ? 0 : GPR[RA], which is now used by the above-said insns,
and shall be used later by (p){lx, stx}vp insns.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
[np: Fix 32-bit build]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:51:33 +10:00
Chinmay Rath
638f6d553a target/ppc: Move VSX arithmetic and max/min insns to decodetree.
Moving the following instructions to decodetree specification:

	x{s, v}{add, sub, mul, div}{s, d}p	: XX3-form
	xs{max, min}dp, xv{max, min}{s, d}p	: XX3-form

The changes were verfied by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:51:33 +10:00
Chinmay Rath
a7e10fab78 target/ppc: Move VMX integer add/sub saturate insns to decodetree.
Moving the following instructions to decodetree specification :

	v{add,sub}{u,s}{b,h,w}s		: VX-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26 09:51:33 +10:00
Richard Henderson
c6d84fd7cf target/ppc: Merge helper_{dcbz,dcbzep}
Merge the two and pass the mmu_idx directly from translation.
Swap the argument order in dcbz_common to avoid extra swaps.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:16 +10:00
Richard Henderson
62fe57c6d2 target/ppc: Split out helper_dbczl for 970
We can determine at translation time whether the insn is or
is not dbczl.  We must retain a runtime check against the
HID5 register, but we can move that to a separate function
that never affects other ppc models.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23 10:56:16 +10:00
BALATON Zoltan
2b92822acc target/ppc: Remove unused helper_rac()
The helper_rac function is defined but not used, remove it.

Fixes: 005b69fdcc (target/ppc: Remove PowerPC 601 CPUs)
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 09:34:42 +10:00
Nicholas Piggin
0dfe59fe77 target/ppc: add SMT support to msgsnd broadcast
msgsnd has a broadcast mode that sends hypervisor doorbells to all
threads belonging to the same core as the target. A "subcore" mode
sends to all or one thread depending on 1LPAR mode.

Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 09:34:41 +10:00
Nicholas Piggin
2736432ffc target/ppc: Implement SPRC/SPRD SPRs
This implements the POWER SPRC/SPRD SPRs, and SCRATCH0-7 registers that
can be accessed via these indirect SPRs.

SCRATCH registers only provide storage, but they are used by firmware
for low level crash and progress data, so this implementation logs
writes to the registers to help with analysis.

Reviewed-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 09:34:40 +10:00
Nicholas Piggin
45693f94dd target/ppc: Implement attn instruction on BookS 64-bit processors
attn is an implementation-specific instruction that on POWER (and G5/
970) can be enabled with a HID bit (disabled = illegal), and executing
it causes the host processor to stop and the service processor to be
notified. Generally used for debugging.

Implement attn and make it checkstop the system, which should be good
enough for QEMU debugging.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 09:34:38 +10:00
Glenn Miles
6bfcf1dc23 target/ppc: Add clrbhrb and mfbhrbe instructions
Add support for the clrbhrb and mfbhrbe instructions.

Since neither instruction is believed to be critical to
performance, both instructions were implemented using helper
functions.

Access to both instructions is controlled by bits in the
HFSCR (for privileged state) and MMCR0 (for problem state).
A new function, helper_mmcr0_facility_check, was added for
checking MMCR0[BHRBA] and raising a facility_unavailable exception
if required.

NOTE: For P8 and P9, due to a performance issue, branch history will
not be kept, but the instructions will be allowed to execute
as normal with the exception that the mfbhrbe instruction will
always return a zero value.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 09:33:32 +10:00
Glenn Miles
a7138e28a2 target/ppc: Add new hflags to support BHRB
This commit is preparatory to the addition of Branch History
Rolling Buffer (BHRB) functionality, which is being provided
today starting with the P8 processor.

BHRB uses several SPR register fields to control whether or not
a branch instruction's address (and sometimes target address)
should be recorded.  Checking each of these fields with each
branch instruction using jitted code would lead to a significant
decrease in performance.

Therefore, it was decided that BHRB configuration bits that are
not expected to change frequently should have their state summarized
in an hflag so that the amount of checking done by jitted code can
be reduced.

This commit contains the changes for summarizing the state of the
following register fields in the HFLAGS_BHRB_ENABLE hflag:

	MMCR0[FCP] - Determines if BHRB recording is frozen in the
                     problem state

	MMCR0[FCPC] - A modifier for MMCR0[FCP]

	MMCRA[BHRBRD] - Disables all BHRB recording for a thread

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
21b5f5464f target/ppc: Move VMX storage access instructions to decodetree
Moving the following instructions to decodetree specification :

	{l,st}ve{b,h,w}x,
	{l,st}v{x,xl},
	lvs{l,r}		: X-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured using the '-d in_asm,op' flag.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
948e257c48 target/ppc: Move logical fixed-point instructions to decodetree.
Moving the below instructions to decodetree specification :

	andi[s]., {ori, xori}[s]			: D-form

	{and, andc, nand, or, orc, nor, xor, eqv}[.],
	exts{b, h, w}[.],  cnt{l, t}z{w, d}[.],
	popcnt{b, w, d},  prty{w, d}, cmp, bpermd	: X-form

With this patch, all the fixed-point logical instructions have been
moved to decodetree.
The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
[np: 32-bit compile fix]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
ae556c6a49 target/ppc: Move cmp{rb, eqb}, tw[i], td[i], isel instructions to decodetree.
Moving the following instructions to decodetree specification :

	cmp{rb, eqb}, t{w, d}	: X-form
	t{w, d}i		: D-form
	isel			: A-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured using the '-d in_asm,op' flag.
Also for CMPRB, following review comments :
Replaced repetition of arithmetic right shifting (tcg_gen_shri_i32) followed
by extraction of last 8 bits (tcg_gen_ext8u_i32) with extraction of the required
bits using offsets (tcg_gen_extract_i32).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
[np: 32-bit compile fix]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
f424bc10eb target/ppc: Move div/mod fixed-point insns (64 bits operands) to decodetree.
Moving the below instructions to decodetree specification :

	divd[u, e, eu][o][.]	: XO-form
	mod{sd, ud}		: X-form

With this patch, all the fixed-point arithmetic instructions have been
moved to decodetree.
The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured using the '-d in_asm,op' flag.
Also, remaned do_divwe method in fixedpoint-impl.c.inc to do_dive because it is
now used to divide doubleword operands as well, and not just words.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
[np: 32-bit compile fix]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
a81b5c1867 target/ppc: Move neg, darn, mod{sw, uw} to decodetree.
Moving the below instructions to decodetree specification :

	neg[o][.]       	: XO-form
	mod{sw, uw}, darn	: X-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
[np: 32-bit compile fix]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
2871921d85 target/ppc: Move divw[u, e, eu] instructions to decodetree.
Moving the following instructions to decodetree specification :
	 divw[u, e, eu][o][.] 	: XO-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Chinmay Rath
177fcc06dc target/ppc: Move floating-point arithmetic instructions to decodetree.
This patch moves the below instructions to decodetree specification :

    f{add, sub, mul, div, re, rsqrte, madd, msub, nmadd, nmsub}[s][.] : A-form
    ft{div, sqrt}                                                     : X-form

With this patch, all the floating-point arithmetic instructions have been
moved to decodetree.
The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Chinmay Rath <rathc@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-05-24 08:57:50 +10:00
Nicholas Piggin
d5ee641cfc target/ppc: Implement watchpoint debug facility for v2.07S
ISA v2.07S introduced the watchpoint facility based on the DAWR0
and DAWRX0 SPRs. Implement this in TCG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:32 +02:00
Nicholas Piggin
14192307ef target/ppc: Implement breakpoint debug facility for v2.07S
ISA v2.07S introduced the breakpoint facility based on the CIABR SPR.
Implement this in TCG.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:32 +02:00
Nicholas Piggin
148953849d target/ppc: Improve book3s branch trace interrupt for v2.07S
Improve the emulation accuracy of the single step and branch trace
interrupts for v2.07S. Set SRR1[33]=1, and set SIAR to completed
instruction address.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:32 +02:00
Nicholas Piggin
9cdfd1b9f7 target/ppc: SMT support for the HID SPR
HID is a per-core shared register, skiboot sets this (e.g., setting
HILE) on one thread and that must affect all threads of the core.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230705120631.27670-3-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-07-07 04:47:49 -03:00
Nicholas Piggin
b25f2ffa19 target/ppc: Add TFMR SPR implementation with read and write helpers
TFMR is the Time Facility Management Register which is specific to
POWER CPUs, and used for the purpose of timebase management (generally
by firmware, not the OS).

Add helpers for the TFMR register, which will form part of the core
timebase facility model in future but for now behaviour is unchanged.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20230625120317.13877-3-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-07-07 04:18:26 -03:00
Nicholas Piggin
c5d98a7b3d target/ppc: Add support for SMT CTRL register
A relatively simple case to begin with, CTRL is a SMT shared register
where reads and writes need to synchronise against state changes by
other threads in the core.

Atomic serialisation operations are used to achieve this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Richard Henderson
57b38ffd0c target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ
No need to roll our own, as this is now provided by tcg.
This was the last use of retxl, so remove that too.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-23 16:51:19 -07:00
Richard Henderson
894448ae7d target/ppc: Use tcg_gen_atomic_cmpxchg_i128 for STQCX
Note that the previous direct reference to reserve_val,

-   tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
-                                ? offsetof(CPUPPCState, reserve_val2)
-                                : offsetof(CPUPPCState, reserve_val)));

was incorrect because all references should have gone through
cpu_reserve_val.  Create a cpu_reserve_val2 tcg temp to fix this.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221112061122.2720163-2-richard.henderson@linaro.org>
2023-02-04 06:19:42 -10:00
Leandro Lupori
eeaaefe9fa target/ppc: Increment PMC5 with inline insns
Profiling QEMU during Fedora 35 for PPC64 boot revealed that
6.39% of total time was being spent in helper_insns_inc(), on a
POWER9 machine. To avoid calling this helper every time PMCs had
to be incremented, an inline implementation of PMC5 increment and
check for overflow was developed. This led to a reduction of
about 12% in Fedora's boot time.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221025202424.195984-4-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:23 -03:00
Matheus Ferst
2fdedcbc69 target/ppc: introduce ppc_maybe_interrupt
This new method will check if any pending interrupt was unmasked and
then call cpu_interrupt/cpu_reset_interrupt accordingly. Code that
raises/lowers or masks/unmasks interrupts should call this method to
keep CPU_INTERRUPT_HARD coherent with env->pending_interrupts.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221021142156.4134411-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
da3c53bac3 target/ppc: Moved XSTSTDC[QDS]P to decodetree
Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of
its decoding away from the helper as previously the DCMX, XB and BF were
calculated in the helper with the help of cpu_env, now that part was
moved to the decodetree with the rest.

xvtstdcsp:
rept    loop    master             patch
8       12500   1,85393600         1,94683600 (+5.0%)
25      4000    1,78779800         1,92479000 (+7.7%)
100     1000    2,12775000         2,28895500 (+7.6%)
500     200     2,99655300         3,23102900 (+7.8%)
2500    40      6,89082200         7,44827500 (+8.1%)
8000    12     17,50585500        18,95152100 (+8.3%)

xvtstdcdp:
rept    loop    master             patch
8       12500   1,39043100         1,33539800 (-4.0%)
25      4000    1,35731800         1,37347800 (+1.2%)
100     1000    1,51514800         1,56053000 (+3.0%)
500     200     2,21014400         2,47906000 (+12.2%)
2500    40      5,39488200         6,68766700 (+24.0%)
8000    12     13,98623900        18,17661900 (+30.0%)

xvtstdcdp:
rept    loop    master             patch
8       12500   1,35123800         1,34455800 (-0.5%)
25      4000    1,36441200         1,36759600 (+0.2%)
100     1000    1,49763500         1,54138400 (+2.9%)
500     200     2,19020200         2,46196400 (+12.4%)
2500    40      5,39265700         6,68147900 (+23.9%)
8000    12     14,04163600        18,19669600 (+29.6%)

As some values are now decoded outside the helper and passed to it as an
argument the number of arguments of the helper increased, the number
of TCGop needed to load the arguments increased. I suspect that's why
the slow-down in the tests with a high REPT but low LOOP.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
a70a524710 target/ppc: Moved XVTSTDC[DS]P to decodetree
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper
to be simpler and do all decoding in the decodetree (so XB, XT and DCMX
are all calculated outside the helper).

Obs: The tests in this one are slightly different, these are the sum of
these instructions with all possible immediate and those instructions
are repeated 10 times.

xvtstdcsp:
rept    loop    master             patch
8       12500   2,76402100         2,70699100 (-2.1%)
25      4000    2,64867100         2,67884100 (+1.1%)
100     1000    2,73806300         2,78701000 (+1.8%)
500     200     3,44666500         3,61027600 (+4.7%)
2500    40      5,85790200         6,47475500 (+10.5%)
8000    12     15,22102100        17,46062900 (+14.7%)

xvtstdcdp:
rept    loop    master             patch
8       12500   2,11818000         1,61065300 (-24.0%)
25      4000    2,04573400         1,60132200 (-21.7%)
100     1000    2,13834100         1,69988100 (-20.5%)
500     200     2,73977000         2,48631700 (-9.3%)
2500    40      5,05067000         5,25914100 (+4.1%)
8000    12     14,60507800        15,93704900 (+9.1%)

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
26c964f851 target/ppc: Move VABSDU[BHW] to decodetree and use gvec
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to
translate them.

vabsdub:
rept    loop    master             patch
8       12500   0,03601600         0,00688500 (-80.9%)
25      4000    0,03651000         0,00532100 (-85.4%)
100     1000    0,03666900         0,00595300 (-83.8%)
500     200     0,04305800         0,01244600 (-71.1%)
2500    40      0,06893300         0,04273700 (-38.0%)
8000    12      0,14633200         0,12660300 (-13.5%)

vabsduh:
rept    loop    master             patch
8       12500   0,02172400         0,00687500 (-68.4%)
25      4000    0,02154100         0,00531500 (-75.3%)
100     1000    0,02235400         0,00596300 (-73.3%)
500     200     0,02827500         0,01245100 (-56.0%)
2500    40      0,05638400         0,04285500 (-24.0%)
8000    12      0,13166000         0,12641400 (-4.0%)

vabsduw:
rept    loop    master             patch
8       12500   0,01646400         0,00688300 (-58.2%)
25      4000    0,01454500         0,00475500 (-67.3%)
100     1000    0,01545800         0,00511800 (-66.9%)
500     200     0,02168200         0,01114300 (-48.6%)
2500    40      0,04571300         0,04138800 (-9.5%)
8000    12      0,12209500         0,12178500 (-0.3%)

Same as VADDCUW and VSUBCUW, overall performance gain but it uses more
TCGop (4 before the patch, 6 after).

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
c85929b2dd target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW,
to decodetree and use gvec with them. For these one the right shift
had to be made before the sum as to avoid an overflow, so add 1 at the
end if any of the entries had 1 in its LSB as to replicate the "+ 1"
before the shift described by the ISA.

vavgub:
rept    loop    master             patch
8       12500   0,02616600         0,00754200 (-71.2%)
25      4000    0,02530000         0,00637700 (-74.8%)
100     1000    0,02604600         0,00790100 (-69.7%)
500     200     0,03189300         0,01838400 (-42.4%)
2500    40      0,06006900         0,06851000 (+14.1%)
8000    12      0,13941000         0,20548500 (+47.4%)

vavguh:
rept    loop    master             patch
8       12500   0,01818200         0,00780600 (-57.1%)
25      4000    0,01789300         0,00641600 (-64.1%)
100     1000    0,01899100         0,00787200 (-58.5%)
500     200     0,02527200         0,01828400 (-27.7%)
2500    40      0,05361800         0,06773000 (+26.3%)
8000    12      0,12886600         0,20291400 (+57.5%)

vavguw:
rept    loop    master             patch
8       12500   0,01423100         0,00776600 (-45.4%)
25      4000    0,01780800         0,00638600 (-64.1%)
100     1000    0,02085500         0,00787000 (-62.3%)
500     200     0,02737100         0,01828800 (-33.2%)
2500    40      0,05572600         0,06774200 (+21.6%)
8000    12      0,13101700         0,20311600 (+55.0%)

vavgsb:
rept    loop    master             patch
8       12500   0,03006000         0,00788600 (-73.8%)
25      4000    0,02882200         0,00637800 (-77.9%)
100     1000    0,02958000         0,00791400 (-73.2%)
500     200     0,03548800         0,01860400 (-47.6%)
2500    40      0,06360000         0,06850800 (+7.7%)
8000    12      0,13816500         0,20550300 (+48.7%)

vavgsh:
rept    loop    master             patch
8       12500   0,01965900         0,00776600 (-60.5%)
25      4000    0,01875400         0,00638700 (-65.9%)
100     1000    0,01952200         0,00786900 (-59.7%)
500     200     0,02562000         0,01760300 (-31.3%)
2500    40      0,05384300         0,06742800 (+25.2%)
8000    12      0,13240800         0,20330000 (+53.5%)

vavgsw:
rept    loop    master             patch
8       12500   0,01407700         0,00775600 (-44.9%)
25      4000    0,01762300         0,00640000 (-63.7%)
100     1000    0,02046500         0,00788500 (-61.5%)
500     200     0,02745600         0,01843000 (-32.9%)
2500    40      0,05375500         0,06820500 (+26.9%)
8000    12      0,13068300         0,20304900 (+55.4%)

These results to me seems to indicate that with gvec the results have a
slower translation but faster execution.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
d57fbd8fd9 target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to
decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8,
respectively.

vprtybw:
rept    loop    master             patch
8       12500   0,01198900         0,00703100 (-41.4%)
25      4000    0,01070100         0,00571400 (-46.6%)
100     1000    0,01123300         0,00678200 (-39.6%)
500     200     0,01601500         0,01535600 (-4.1%)
2500    40      0,03872900         0,05562100 (43.6%)
8000    12      0,10047000         0,16643000 (65.7%)

vprtybd:
rept    loop    master             patch
8       12500   0,00757700         0,00788100 (4.0%)
25      4000    0,00652500         0,00669600 (2.6%)
100     1000    0,00714400         0,00825400 (15.5%)
500     200     0,01211000         0,01903700 (57.2%)
2500    40      0,03483800         0,07021200 (101.5%)
8000    12      0,09591800         0,21036200 (119.3%)

vprtybq:
rept    loop    master             patch
8       12500   0,00675600         0,00667200 (-1.2%)
25      4000    0,00619400         0,00643200 (3.8%)
100     1000    0,00707100         0,00751100 (6.2%)
500     200     0,01199300         0,01342000 (11.9%)
2500    40      0,03490900         0,04092900 (17.2%)
8000    12      0,09588200         0,11465100 (19.6%)

I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ,
I'm not sure if it's worth to move those instructions. Comparing the
assembly of the helper with the TCGop they are pretty similar, so
I'm not sure why vprtybd took so much more time.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
90b5aadb09 target/ppc: Move VNEG[WD] to decodtree and use gvec
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to
decode it.

vnegw:
rept    loop    master             patch
8       12500   0,01053200         0,00548400 (-47.9%)
25      4000    0,01030500         0,00390000 (-62.2%)
100     1000    0,01096300         0,00395400 (-63.9%)
500     200     0,01472000         0,00712300 (-51.6%)
2500    40      0,03809000         0,02147700 (-43.6%)
8000    12      0,09957100         0,06202100 (-37.7%)

vnegd:
rept    loop    master             patch
8       12500   0,00594600         0,00543800 (-8.5%)
25      4000    0,00575200         0,00396400 (-31.1%)
100     1000    0,00676100         0,00394800 (-41.6%)
500     200     0,01149300         0,00709400 (-38.3%)
2500    40      0,03441500         0,02169600 (-37.0%)
8000    12      0,09516900         0,06337000 (-33.4%)

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
611bc69bf6 target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an
implementation based on the helper, with the main difference being
changing the -1 (aka all bits set to 1) result returned by cmp when
true to +1. It also implemented a .fni4 version of those instructions
and dropped the helper.

vaddcuw:
rept    loop    master             patch
8       12500   0,01008200         0,00612400 (-39.3%)
25      4000    0,01091500         0,00471600 (-56.8%)
100     1000    0,01332500         0,00593700 (-55.4%)
500     200     0,01998500         0,01275700 (-36.2%)
2500    40      0,04704300         0,04364300 (-7.2%)
8000    12      0,10748200         0,11241000 (+4.6%)

vsubcuw:
rept    loop    master             patch
8       12500   0,01226200         0,00571600 (-53.4%)
25      4000    0,01493500         0,00462100 (-69.1%)
100     1000    0,01522700         0,00455100 (-70.1%)
500     200     0,02384600         0,01133500 (-52.5%)
2500    40      0,04935200         0,03178100 (-35.6%)
8000    12      0,09039900         0,09440600 (+4.4%)

Overall there was a gain in performance, but the TCGop code was still
slightly bigger in the new version (it went from 4 to 5).

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
306e475335 target/ppc: Move VMH[R]ADDSHS instruction to decodetree
This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find
a satisfactory implementation with TCG inline.

vmhaddshs:
rept    loop    master             patch
8       12500   0,02983400         0,02648500 (-11.2%)
25      4000    0,02946000         0,02518000 (-14.5%)
100     1000    0,03104300         0,02638000 (-15.0%)
500     200     0,04002000         0,03502500 (-12.5%)
2500    40      0,08090100         0,07562200 (-6.5%)
8000    12      0,19242600         0,18626800 (-3.2%)

vmhraddshs:
rept    loop    master             patch
8       12500   0,03078600         0,02851000 (-7.4%)
25      4000    0,02793200         0,02746900 (-1.7%)
100     1000    0,02886000         0,02839900 (-1.6%)
500     200     0,03714700         0,03799200 (+2.3%)
2500    40      0,07948000         0,07852200 (-1.2%)
8000    12      0,19049800         0,18813900 (-1.2%)

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-3-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Lucas Mateus Castro (alqotel)
dc46167a22 target/ppc: Moved VMLADDUHM to decodetree and use gvec
This patch moves VMLADDUHM to decodetree a creates a gvec implementation
using mul_vec and add_vec.

rept    loop    master             patch
8       12500   0,01810500         0,00903100 (-50.1%)
25      4000    0,01739400         0,00747700 (-57.0%)
100     1000    0,01843600         0,00901400 (-51.1%)
500     200     0,02574600         0,01971000 (-23.4%)
2500    40      0,05921600         0,07121800 (+20.3%)
8000    12      0,15326700         0,21725200 (+41.7%)

The significant difference in performance when REPT is low and LOOP is
high I think is due to the fact that the new implementation has a higher
translation time, as when using a helper only 5 TCGop are used but with
the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec
equivalent so this instruction is implemented with the help of 5 others,
vmuleu, vmulou, vmrgh, vmrgl and vpkum).

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28 13:15:22 -03:00
Víctor Colombo
74177ec661 target/ppc: Merge fsqrt and fsqrts helpers
These two helpers are almost identical, differing only by the softfloat
operation it calls. Merge them into one using a macro.
Also, take this opportunity to capitalize the helper name as we moved
the instruction to decodetree in a previous patch.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220905123746.54659-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20 10:54:06 -03:00
Víctor Colombo
53ae2aeb94 target/ppc: Implement hashstp and hashchkp
Implementation for instructions hashstp and hashchkp, the privileged
versions of hashst and hashchk, which were added in Power ISA 3.1B.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
Message-Id: <20220715205439.161110-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20 10:54:06 -03:00
Víctor Colombo
670f1da374 target/ppc: Implement hashst and hashchk
Implementation for instructions hashst and hashchk, which were added
in Power ISA 3.1B.

It was decided to implement the hash algorithm from ground up in this
patch exactly as described in Power ISA.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
Message-Id: <20220715205439.161110-3-victor.colombo@eldorado.org.br>
[danielhb: fix block comment in excp_helper.c]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20 10:54:06 -03:00
Lucas Coutinho
491a25535c target/ppc: Implement slbiag
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-12-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
26d02c9d42 target/ppc: Move slbfee to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-10-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
41b60e46b8 target/ppc: Move slbmfee to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-9-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
74a153844e target/ppc: Move slbmfev to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-8-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
0b0ba40fd2 target/ppc: Move slbmte to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-7-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
2bfcb7a316 target/ppc: Move slbia to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-6-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
a1b05c0625 target/ppc: Move slbieg to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-5-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Lucas Coutinho
43507e47e1 target/ppc: Move slbie to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-4-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Leandro Lupori
e7beaea55b target/ppc: Implement ISA 3.00 tlbie[l]
This initial version supports the invalidation of one or all
TLB entries. Flush by PID/LPID, or based in process/partition
scope is not supported, because it would make using the
generic QEMU TLB implementation hard. In these cases, all
entries are flushed.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-3-leandro.lupori@eldorado.org.br>
[danielhb: moved 'set' declaration to TLBIE_RIC_PWC block]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00