qemu/include/exec
Emilio G. Cota f6bb84d531 tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPUs' tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.
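
The check described above can be sketched with a small, single-threaded model. This is only an illustration under stated assumptions, not the code from this commit: the names ModelTB, ModelCPU, model_htable, model_lookup and the modulo hashes are all hypothetical, and the real lookup also has to deal with atomic accesses and with more state than just the PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: every name below is illustrative only. */
#define JMP_CACHE_SIZE 16
#define HTABLE_SIZE    64

typedef struct ModelTB {
    uint64_t pc;      /* guest PC this block was translated for */
    bool invalid;     /* set when the block is invalidated */
} ModelTB;

typedef struct ModelCPU {
    ModelTB *jmp_cache[JMP_CACHE_SIZE];   /* per-CPU fast-path cache */
} ModelCPU;

/* Stand-in for the global hash table (one slot per bucket, no chaining). */
static ModelTB *model_htable[HTABLE_SIZE];

static unsigned jmp_hash(uint64_t pc)    { return pc % JMP_CACHE_SIZE; }
static unsigned htable_hash(uint64_t pc) { return pc % HTABLE_SIZE; }

static void model_htable_insert(ModelTB *tb)
{
    assert(tb != NULL);
    model_htable[htable_hash(tb->pc)] = tb;
}

/* What thread (B) does in the scenario: mark invalid, then unhash. */
static void model_invalidate(ModelTB *tb)
{
    tb->invalid = true;
    if (model_htable[htable_hash(tb->pc)] == tb) {
        model_htable[htable_hash(tb->pc)] = NULL;
    }
}

/*
 * Lookup with the extra check: a hit in the per-CPU cache only counts
 * if the block is not marked invalid. Otherwise fall back to the
 * global table and refill the cache from there.
 */
static ModelTB *model_lookup(ModelCPU *cpu, uint64_t pc)
{
    ModelTB *tb = cpu->jmp_cache[jmp_hash(pc)];

    if (tb != NULL && tb->pc == pc && !tb->invalid) {
        return tb;                        /* fast path */
    }
    tb = model_htable[htable_hash(pc)];
    if (tb == NULL || tb->pc != pc) {
        return NULL;                      /* caller would retranslate */
    }
    cpu->jmp_cache[jmp_hash(pc)] = tb;
    return tb;
}
```

To replay the scenario in this model, store a TB into a CPU's jmp_cache after model_invalidate() has run on it: the next model_lookup() for that PC skips the stale entry and returns NULL instead of handing back the invalid TB.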

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=jessie.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10 07:37:10 -07:00
user linux-user: Use correct alignment for long long on i386 guests 2016-08-04 16:34:59 +03:00
address-spaces.h Clean up header guards that don't match their file name 2016-07-12 16:19:16 +02:00
cpu_ldst_template.h trace: switch to modular code generation for sub-directories 2017-01-31 17:11:18 +00:00
cpu_ldst_useronly_template.h trace: switch to modular code generation for sub-directories 2017-01-31 17:11:18 +00:00
cpu_ldst.h cpu_ldst.h: use correct guest address parameter 2016-11-22 23:26:51 +01:00
cpu-all.h exec: introduce MemoryRegionCache 2016-12-22 16:00:23 +01:00
cpu-common.h cpu: Introduce a wrapper for tlb_flush() that can be used in common code 2017-07-04 14:30:03 +02:00
cpu-defs.h cputlb: bring back tlb_flush_count under !TLB_DEBUG 2017-10-10 07:37:10 -07:00
cputlb.h cputlb: bring back tlb_flush_count under !TLB_DEBUG 2017-10-10 07:37:10 -07:00
exec-all.h exec-all: fix typos in TranslationBlock's documentation 2017-10-10 07:37:10 -07:00
gdbstub.h gdbstub: rename cpu_index -> cpu_gdb_index 2017-07-14 12:04:41 +02:00
gen-icount.h gen-icount: use tcg_ctx.tcg_env instead of cpu_env 2017-06-30 11:40:59 -07:00
helper-gen.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
helper-head.h Clean up header guards that don't match their file name 2016-07-12 16:19:16 +02:00
helper-proto.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
helper-tcg.h tcg: Expand glue macros before stringifying helper names 2017-07-19 14:45:15 -07:00
hwaddr.h hw: Clean up includes 2016-06-07 18:19:23 +03:00
ioport.h hw: clean up hw/hw.h includes 2016-05-19 16:42:30 +02:00
log.h log: do not unnecessarily include qom/cpu.h 2016-02-03 09:19:10 +00:00
memattrs.h memory.h: Move MemTxResult type to memattrs.h 2017-09-04 15:21:54 +01:00
memory-internal.h memory: Rework "info mtree" to print flat views and dispatch trees 2017-09-21 23:19:38 +02:00
memory.h memory: trace FlatView creation and destruction 2017-09-22 01:06:51 +02:00
poison.h include/exec/poison: Mark CONFIG_SOFTMMU as poisoned 2017-07-04 14:39:11 +02:00
ram_addr.h cpu_physical_memory_sync_dirty_bitmap: Fix alignment check 2017-08-01 17:27:33 +02:00
ramlist.h ramblock: add new hmp command "info ramblock" 2017-05-17 17:31:16 +01:00
semihost.h semihosting: add --semihosting-config arg sub-argument 2015-06-19 14:17:45 +01:00
softmmu-semi.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
target_page.h migration: Make savevm.c target independent 2017-05-18 19:21:00 +02:00
tb-context.h tcg: allocate TB structs before the corresponding translated code 2017-06-19 11:10:59 -07:00
tb-hash-xx.h exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state 2017-07-17 13:11:05 +01:00
tb-hash.h exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state 2017-07-17 13:11:05 +01:00
tb-lookup.h tcg: consolidate TB lookups in tb_lookup__cpu_state 2017-10-10 07:37:10 -07:00
translator.h tcg: Add generic translation framework 2017-09-06 08:06:47 -07:00