Commit 47e83d9107 ended up unintentionally changing the control flow
of ppc_radix64_process_scoped_xlate(). When guest_visible is false,
it must not raise an exception, even if the radix configuration is
not valid.
This regression prevented Linux boot in a nested environment with
L1 using TCG and emulating KVM (cap-nested-hv=on) and L2 using
KVM. L2 would hang on Linux's futex_init(), when it tested how a
futex_atomic_cmpxchg_inatomic() handled a fault, because L1 would
start a loop of trying to perform partition scoped translations
and raising exceptions.
Fixes: 47e83d9107 ("target/ppc: Improve Radix xlate level validation")
Reported-by: Victor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Tested-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221028183617.121786-1-leandro.lupori@eldorado.org.br>
[danielhb: use %"PRIu64" to print 'nls']
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Profiling QEMU during Fedora 35 for PPC64 boot revealed that
6.39% of total time was being spent in helper_insns_inc(), on a
POWER9 machine. To avoid calling this helper every time PMCs had
to be incremented, an inline implementation of PMC5 increment and
check for overflow was developed. This led to a reduction of
about 12% in Fedora's boot time.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221025202424.195984-4-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Add 2 new PMC related HFLAGS:
- HFLAGS_PMCJCE - value of MMCR0 PMCjCE bit
- HFLAGS_PMC_OTHER - set if a PMC other than PMC5-6 is enabled
These flags allow further optimization of PMC5 update code, by
allowing frequently tested conditions to be performed at
translation time.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221025202424.195984-3-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the methods to excp_helper.c and make them static.
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20221021142156.4134411-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Now that cs->interrupt_request indicates if there is any unmasked
interrupt, checking if the CPU has work to do can be simplified to a
single check that works for all CPU models.
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20221021142156.4134411-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This new method will check if any pending interrupt was unmasked and
then call cpu_interrupt/cpu_reset_interrupt accordingly. Code that
raises/lowers or masks/unmasks interrupts should call this method to
keep CPU_INTERRUPT_HARD coherent with env->pending_interrupts.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221021142156.4134411-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Export p7_interrupt_powersave and use it in p7_next_unmasked_interrupt.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-26-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the interrupt masking logic out of cpu_has_work_POWER7 in a new
method, p7_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-25-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER7 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Hypervisor Doorbell and Event-Based Branch: introduced in
Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Doorbell and Critical Doorbell Interrupt: processor does not implement
the Embedded.Processor Control category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p;
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-23-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_deliver_interrupt, processor-specific
code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-22-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER7 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Hypervisor Doorbell and Event-Based Branch: introduced in
Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Doorbell and Critical Doorbell Interrupt: processor does not implement
the Embedded.Processor Control category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p;
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-21-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_next_unmasked_interrupt_generic,
processor-specific code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-20-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Export p8_interrupt_powersave and use it in p8_next_unmasked_interrupt.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-19-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the interrupt masking logic out of cpu_has_work_POWER8 in a new
method, p8_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-18-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER8 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Critical Doorbell: processor does not implement the
"Embedded.Processor Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p;
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-16-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_deliver_interrupt, processor-specific
code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-15-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER8 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970, and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Hypervisor Virtualization: introduced in Power ISA v3.0;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Critical Doorbell: processor does not implement the "Embedded.Processor
Control" category;
- Programmable Interval Timer: 40x-only;
- PPC_INTERRUPT_THERM: only raised for 970 and POWER5p;
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-14-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_next_unmasked_interrupt_generic,
processor-specific code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-13-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Export p9_interrupt_powersave and use it in p9_next_unmasked_interrupt.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-12-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Move the interrupt masking logic out of cpu_has_work_POWER9 in a new
method, p9_interrupt_powersave, that only returns an interrupt if it can
wake the processor from power-saving mode.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-11-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER9 interrupt
processing method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Critical Doorbell Interrupt: removed in Power ISA v3.0;
- Programmable Interval Timer: 40x-only.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-9-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_deliver_interrupt, processor-specific
code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-8-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Remove the following unused interrupts from the POWER9 interrupt masking
method:
- PPC_INTERRUPT_RESET: only raised for 6xx, 7xx, 970 and POWER5p;
- Debug Interrupt: removed in Power ISA v2.07;
- Critical Input, Watchdog Timer, and Fixed Interval Timer: only defined
for embedded CPUs;
- Critical Doorbell Interrupt: removed in Power ISA v3.0;
- Programmable Interval Timer: 40x-only.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The new method is identical to ppc_next_unmasked_interrupt_generic,
processor-specific code will be added/removed in the following patches.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221011204829.1641124-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Use ppc_set_irq to raise/clear interrupts to ensure CPU_INTERRUPT_HARD
will be set/reset accordingly.
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20221011204829.1641124-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This enum defines the bit positions in env->pending_interrupts for each
interrupt. However, except for the comparison in kvmppc_set_interrupt,
the values are always used as (1 << PPC_INTERRUPT_*). Define them
directly like that to save some clutter. No functional change intended.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20221011204829.1641124-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Used gvec to translate XVTSTDCSP and XVTSTDCDP.
xvtstdcsp:
rept loop imm master version prev version current version
25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%)
25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%)
25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%)
25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%)
25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%)
8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%)
8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%)
8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%)
8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%)
8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%)
xvtstdcdp:
rept loop imm master version prev version current version
25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%)
25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%)
25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%)
25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%)
25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%)
8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%)
8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%)
8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%)
8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%)
8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%)
Overall, these instructions are the hardest ones to measure performance
as the gvec implementation is affected by the immediate. Above there are
5 different scenarios when it comes to immediate and 2 when it comes to
rept/loop combination. The immediates scenarios are: all bits are 0
therefore the target register should just be changed to 0, with 1 bit
set, with 2 bits set in a combination the new implementation can deal
with using gvec, 4 bits set and the new implementation can't deal with
it using gvec and all bits set. The rept/loop scenarios are high loop
and low rept (so it should spend more time executing it than translating
it) and high rept low loop (so it should spend more time translating it
than executing this code).
These comparisons are between the upstream version, a previous similar
implementation and a one with a cleaner code(this one).
For a comparison with o previous different implementation:
<20221010191356.83659-13-lucas.araujo@eldorado.org.br>
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of
its decoding away from the helper as previously the DCMX, XB and BF were
calculated in the helper with the help of cpu_env, now that part was
moved to the decodetree with the rest.
xvtstdcsp:
rept loop master patch
8 12500 1,85393600 1,94683600 (+5.0%)
25 4000 1,78779800 1,92479000 (+7.7%)
100 1000 2,12775000 2,28895500 (+7.6%)
500 200 2,99655300 3,23102900 (+7.8%)
2500 40 6,89082200 7,44827500 (+8.1%)
8000 12 17,50585500 18,95152100 (+8.3%)
xvtstdcdp:
rept loop master patch
8 12500 1,39043100 1,33539800 (-4.0%)
25 4000 1,35731800 1,37347800 (+1.2%)
100 1000 1,51514800 1,56053000 (+3.0%)
500 200 2,21014400 2,47906000 (+12.2%)
2500 40 5,39488200 6,68766700 (+24.0%)
8000 12 13,98623900 18,17661900 (+30.0%)
xvtstdcdp:
rept loop master patch
8 12500 1,35123800 1,34455800 (-0.5%)
25 4000 1,36441200 1,36759600 (+0.2%)
100 1000 1,49763500 1,54138400 (+2.9%)
500 200 2,19020200 2,46196400 (+12.4%)
2500 40 5,39265700 6,68147900 (+23.9%)
8000 12 14,04163600 18,19669600 (+29.6%)
As some values are now decoded outside the helper and passed to it as an
argument the number of arguments of the helper increased, the number
of TCGop needed to load the arguments increased. I suspect that's why
the slow-down in the tests with a high REPT but low LOOP.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper
to be simpler and do all decoding in the decodetree (so XB, XT and DCMX
are all calculated outside the helper).
Obs: The tests in this one are slightly different, these are the sum of
these instructions with all possible immediate and those instructions
are repeated 10 times.
xvtstdcsp:
rept loop master patch
8 12500 2,76402100 2,70699100 (-2.1%)
25 4000 2,64867100 2,67884100 (+1.1%)
100 1000 2,73806300 2,78701000 (+1.8%)
500 200 3,44666500 3,61027600 (+4.7%)
2500 40 5,85790200 6,47475500 (+10.5%)
8000 12 15,22102100 17,46062900 (+14.7%)
xvtstdcdp:
rept loop master patch
8 12500 2,11818000 1,61065300 (-24.0%)
25 4000 2,04573400 1,60132200 (-21.7%)
100 1000 2,13834100 1,69988100 (-20.5%)
500 200 2,73977000 2,48631700 (-9.3%)
2500 40 5,05067000 5,25914100 (+4.1%)
8000 12 14,60507800 15,93704900 (+9.1%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to
decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8,
respectively.
vprtybw:
rept loop master patch
8 12500 0,01198900 0,00703100 (-41.4%)
25 4000 0,01070100 0,00571400 (-46.6%)
100 1000 0,01123300 0,00678200 (-39.6%)
500 200 0,01601500 0,01535600 (-4.1%)
2500 40 0,03872900 0,05562100 (43.6%)
8000 12 0,10047000 0,16643000 (65.7%)
vprtybd:
rept loop master patch
8 12500 0,00757700 0,00788100 (4.0%)
25 4000 0,00652500 0,00669600 (2.6%)
100 1000 0,00714400 0,00825400 (15.5%)
500 200 0,01211000 0,01903700 (57.2%)
2500 40 0,03483800 0,07021200 (101.5%)
8000 12 0,09591800 0,21036200 (119.3%)
vprtybq:
rept loop master patch
8 12500 0,00675600 0,00667200 (-1.2%)
25 4000 0,00619400 0,00643200 (3.8%)
100 1000 0,00707100 0,00751100 (6.2%)
500 200 0,01199300 0,01342000 (11.9%)
2500 40 0,03490900 0,04092900 (17.2%)
8000 12 0,09588200 0,11465100 (19.6%)
I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ,
I'm not sure if it's worth to move those instructions. Comparing the
assembly of the helper with the TCGop they are pretty similar, so
I'm not sure why vprtybd took so much more time.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an
implementation based on the helper, with the main difference being
changing the -1 (aka all bits set to 1) result returned by cmp when
true to +1. It also implemented a .fni4 version of those instructions
and dropped the helper.
vaddcuw:
rept loop master patch
8 12500 0,01008200 0,00612400 (-39.3%)
25 4000 0,01091500 0,00471600 (-56.8%)
100 1000 0,01332500 0,00593700 (-55.4%)
500 200 0,01998500 0,01275700 (-36.2%)
2500 40 0,04704300 0,04364300 (-7.2%)
8000 12 0,10748200 0,11241000 (+4.6%)
vsubcuw:
rept loop master patch
8 12500 0,01226200 0,00571600 (-53.4%)
25 4000 0,01493500 0,00462100 (-69.1%)
100 1000 0,01522700 0,00455100 (-70.1%)
500 200 0,02384600 0,01133500 (-52.5%)
2500 40 0,04935200 0,03178100 (-35.6%)
8000 12 0,09039900 0,09440600 (+4.4%)
Overall there was a gain in performance, but the TCGop code was still
slightly bigger in the new version (it went from 4 to 5).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch moves VMLADDUHM to decodetree a creates a gvec implementation
using mul_vec and add_vec.
rept loop master patch
8 12500 0,01810500 0,00903100 (-50.1%)
25 4000 0,01739400 0,00747700 (-57.0%)
100 1000 0,01843600 0,00901400 (-51.1%)
500 200 0,02574600 0,01971000 (-23.4%)
2500 40 0,05921600 0,07121800 (+20.3%)
8000 12 0,15326700 0,21725200 (+41.7%)
The significant difference in performance when REPT is low and LOOP is
high I think is due to the fact that the new implementation has a higher
translation time, as when using a helper only 5 TCGop are used but with
the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec
equivalent so this instruction is implemented with the help of 5 others,
vmuleu, vmulou, vmrgh, vmrgl and vpkum).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The macro is missing a '{' after the if condition. Any use of REQUIRE_HV
would cause a compilation error.
Fixes: fc34e81acd ("target/ppc: add macros to check privilege level")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221006200654.725390-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This instruction was added by Power ISA 3.0, using PPC2_PRCNTL makes it
available for older processors, like de e5500 and e6500.
Fixes: 7af1e7b022 ("target/ppc: add support for hypervisor doorbells on book3s CPUs")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221006200654.725390-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
On Power ISA v2.07, the category for these instructions became
"Embedded.Processor Control" or "Book S".
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20221006200654.725390-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>