qemu/target/arm
Richard Henderson e979972a6a target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
This is the payoff.

From perf record -g data of an Ubuntu 18 boot and shutdown:

BEFORE:

-   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 20.22% helper_lookup_tb_ptr
      + 10.05% tb_htable_lookup
      - 9.13% cpu_get_tb_cpu_state
           3.20% aa64_va_parameters_both
           0.55% fp_exception_el

-   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
   - 6.96% cpu_get_tb_cpu_state
        3.63% aa64_va_parameters_both
        0.60% fp_exception_el
        0.53% sve_exception_el

AFTER:

-   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 13.03% helper_lookup_tb_ptr
      + 11.19% tb_htable_lookup
        0.55% cpu_get_tb_cpu_state

     0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state

     0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64

Before the change, helper_lookup_tb_ptr is the second-hottest function in
the application, consuming almost a quarter of the runtime (23.02%).
Over the entire execution, cpu_get_tb_cpu_state consumes about 12%.

After the change, helper_lookup_tb_ptr drops to the fourth-hottest
function, and its share falls to about a sixth of the runtime (16.40%).
Over the entire execution, cpu_get_tb_cpu_state drops below 1%, and
rebuild_hflags_a64, the supporting function that rebuilds hflags, also
consumes about 1%.

Assertions are retained for --enable-debug-tcg.
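
The pattern behind these numbers can be sketched in a few lines of C. This is
an illustration only, not QEMU's code: ToyCPU, the toy_* functions, and the
TOY_DEBUG macro are all hypothetical names standing in for the real CPU state,
the hflags rebuild helpers, and the debug build switch. The idea is that the
derived flags are recomputed only when the state they depend on changes, so
the hot lookup path just reads the cached value, and a debug build can still
check the cache against a full recomputation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state; not QEMU's CPUARMState. */
typedef struct ToyCPU {
    uint32_t el;         /* current exception level, 0..3 */
    bool fp_enabled;     /* whether FP access is enabled */
    uint32_t hflags;     /* cached flags derived from the fields above */
} ToyCPU;

/* Slow path: derive the flags from scratch out of the underlying state. */
static uint32_t toy_compute_hflags(const ToyCPU *cpu)
{
    uint32_t flags = cpu->el & 3u;      /* bits [1:0]: exception level */
    if (cpu->fp_enabled) {
        flags |= 1u << 2;               /* bit 2: FP enabled */
    }
    return flags;
}

/* Refresh the cache; called whenever an input to the flags changes. */
static void toy_rebuild_hflags(ToyCPU *cpu)
{
    cpu->hflags = toy_compute_hflags(cpu);
}

/* State-changing operations rebuild the cache as part of the write. */
static void toy_set_el(ToyCPU *cpu, uint32_t el)
{
    cpu->el = el;
    toy_rebuild_hflags(cpu);
}

static void toy_set_fp_enabled(ToyCPU *cpu, bool on)
{
    cpu->fp_enabled = on;
    toy_rebuild_hflags(cpu);
}

/* Hot path: trust the cache; only a debug build re-derives and compares. */
static uint32_t toy_get_tb_flags(const ToyCPU *cpu)
{
#ifdef TOY_DEBUG
    assert(cpu->hflags == toy_compute_hflags(cpu));
#endif
    return cpu->hflags;
}
```

The trade-off in this sketch is that every writer of el or fp_enabled has to
go through a setter that rebuilds the cache; a missed rebuild is what the
debug-only assertion is there to catch.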

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-25-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-24 17:16:28 +01:00
a32-uncond.decode target/arm: Convert Unallocated memory hint 2019-09-05 13:23:03 +01:00
a32.decode target/arm: Convert SVC 2019-09-05 13:23:03 +01:00
arch_dump.c
arm_ldst.h
arm-powerctl.c
arm-powerctl.h
arm-semi.c target/arm/arm-semi: Implement SH_EXT_STDOUT_STDERR extension 2019-10-15 18:09:04 +01:00
cpu64.c
cpu-param.h
cpu-qom.h hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ 2019-08-21 13:24:01 +02:00
cpu.c target/arm: Rebuild hflags at EL changes 2019-10-24 17:16:28 +01:00
cpu.h target/arm: Add arm_rebuild_hflags 2019-10-24 17:16:28 +01:00
crypto_helper.c
debug_helper.c
gdbstub64.c
gdbstub.c
helper-a64.c target/arm: Rebuild hflags at EL changes 2019-10-24 17:16:28 +01:00
helper-a64.h
helper-sve.h
helper.c target/arm: Rely on hflags correct in cpu_get_tb_cpu_state 2019-10-24 17:16:28 +01:00
helper.h target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32}) 2019-10-24 17:16:28 +01:00
idau.h
internals.h target/arm: Split out arm_mmu_idx_el 2019-10-24 17:16:28 +01:00
iwmmxt_helper.c
kvm32.c
kvm64.c
kvm_arm.h intc/arm_gic: Support IRQ injection for more than 256 vpus 2019-10-15 18:09:02 +01:00
kvm-consts.h
kvm-stub.c
kvm.c ARM: KVM: Check KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 for smp_cpus > 256 2019-10-15 18:09:02 +01:00
m_helper.c target/arm: Rebuild hflags for M-profile 2019-10-24 17:16:28 +01:00
machine.c target/arm: Rebuild hflags at EL changes 2019-10-24 17:16:28 +01:00
Makefile.objs target/arm: Add skeleton for T16 decodetree 2019-09-05 13:23:03 +01:00
monitor.c
neon_helper.c
op_addsub.h
op_helper.c target/arm: Rebuild hflags at CPSR writes 2019-10-24 17:16:28 +01:00
pauth_helper.c
psci.c
sve_helper.c
sve.decode
t16.decode target/arm: Convert T16, long branches 2019-09-05 13:23:04 +01:00
t32.decode target/arm: Convert TT 2019-09-05 13:23:03 +01:00
tlb_helper.c
trace-events
translate-a64.c target/arm: Rebuild hflags at MSR writes 2019-10-24 17:16:28 +01:00
translate-a64.h Allow page table bit to swap endianness. 2019-09-04 16:29:18 +01:00
translate-sve.c tcg: TCGMemOp is now accelerator independent MemOp 2019-09-03 08:30:38 -07:00
translate-vfp.inc.c target/arm: Free TCG temps in trans_VMOV_64_sp() 2019-09-03 16:20:35 +01:00
translate.c target/arm: Rebuild hflags for M-profile 2019-10-24 17:16:28 +01:00
translate.h Allow page table bit to swap endianness. 2019-09-04 16:29:18 +01:00
vec_helper.c
vfp_helper.c
vfp-uncond.decode
vfp.decode