qemu/linux-user
Emilio G. Cota 0ac20318ce tcg: remove tb_lock
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
    the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

                 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--+----+------+------------+-----------+------------*--+-+
      |    +    +      +            +           +           *B    |
      |         before ***B***                            ** *    |
      |tb lock removal ###D###                         ***        |
  600 +-+                                           ***         +-+
      |                                           **         #    |
      |                                        *B*          #D    |
      |                                     *** *         ##      |
  500 +-+                                ***           ###      +-+
      |                             * ***           ###           |
      |                            *B*          # ##              |
      |                          ** *          #D#                |
  400 +-+                      **            ##                 +-+
      |                      **           ###                     |
      |                    **           ##                        |
      |                  **         # ##                          |
  300 +-+  *           B*          #D#                          +-+
      |    B         ***        ###                               |
      |    *       **       ####                                  |
      |     *   ***      ###                                      |
  200 +-+   B  *B     #D#                                       +-+
      |     #B* *   ## #                                          |
      |     #*    ##                                              |
      |    + D##D#     +            +           +            +    |
  100 +-+--+----+------+------------+-----------+------------+--+-+
           1    8      16      Guest CPUs       48           64
  png: https://imgur.com/HwmBHXe

              debian jessie aarch64 bootup+shutdown time

  90 +-+--+-----+-----+------------+------------+------------+--+-+
     |    +     +     +            +            +            +    |
     |         before ***B***                                B    |
  80 +tb lock removal ###D###                              **D  +-+
     |                                                   **###    |
     |                                                 **##       |
  70 +-+                                             ** #       +-+
     |                                             ** ##          |
     |                                           **  #            |
  60 +-+                                       *B  ##           +-+
     |                                       **  ##               |
     |                                    ***  #D                 |
  50 +-+                               ***   ##                 +-+
     |                             * **   ###                     |
     |                           **B*  ###                        |
  40 +-+                     ****  # ##                         +-+
     |                   ****     #D#                             |
     |             ***B**      ###                                |
  30 +-+    B***B**        ####                                 +-+
     |    B *   *     # ###                                       |
     |     B       ###D#                                          |
  20 +-+   D  ##D##                                             +-+
     |      D#                                                    |
     |    +     +     +            +            +            +    |
  10 +-+--+-----+-----+------------+------------+------------+--+-+
          1     8     16      Guest CPUs        48           64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00
..
aarch64 linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
alpha linux-user/alpha: Fix epoll syscalls 2018-06-11 14:44:22 +02:00
arm linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
cris linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
generic linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
host linux-user: Fix register used for 6th and 7th syscall argument on aarch64 2018-02-18 18:52:32 +01:00
hppa linux-user/hppa: Fix typo in mknodat syscall 2018-06-11 14:45:44 +02:00
i386 linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
m68k linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
microblaze linux-user/microblaze: Fix typo in accept4 syscall 2018-06-11 14:47:08 +02:00
mips linux-user: move mips signal definitions to mips/target_signal.h 2018-06-04 01:30:44 +02:00
mips64 linux-user: move mips signal definitions to mips/target_signal.h 2018-06-04 01:30:44 +02:00
nios2 linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
openrisc linux-user: move openrisc signal definitions to openrisc/target_signal.h 2018-06-04 01:30:44 +02:00
ppc linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
riscv linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
s390x linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
sh4 linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
sparc linux-user: move sparc signal definitions to sparc/target_signal.h 2018-06-04 01:30:44 +02:00
sparc64 linux-user/sparc64: Add inotify_rm_watch and tee syscalls 2018-06-11 14:47:45 +02:00
tilegx linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
x86_64 linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
xtensa linux-user: move generic signal definitions to generic/signal.h 2018-06-04 01:30:44 +02:00
cpu_loop-common.h linux-user: create a dummy per arch cpu_loop.c 2018-04-30 09:47:55 +02:00
elfload.c target/arm: Introduce ARM_FEATURE_V8_ATOMICS and initial decode 2018-05-10 18:10:57 +01:00
errno_defs.h linux-user: Handle ERFKILL and EHWPOISON 2017-01-22 18:14:10 -08:00
flat.h Support for 32 bit ABI on 64 bit targets (only enabled Sparc64) 2007-10-14 16:27:31 +00:00
flatload.c linux-user: Use is_error() to avoid warnings and make the code clearer 2018-06-11 14:40:11 +02:00
ioctls.h linux-user: Implement ioctl cmd TIOCGPTPEER 2018-02-18 18:52:32 +01:00
linux_loop.h linux-user: Add loop control ioctls 2016-07-19 15:22:33 +03:00
linuxload.c linux-user: Clean up includes 2016-01-29 15:07:22 +00:00
m68k-sim.c linux-user: Clean up includes 2016-01-29 15:07:22 +00:00
main.c tcg: remove tb_lock 2018-06-15 08:18:48 -10:00
Makefile.objs linux-user: create a dummy per arch cpu_loop.c 2018-04-30 09:47:55 +02:00
mmap.c linux-user: drop unused target_msync function 2018-03-13 11:30:22 -07:00
qemu.h linux-user: Export use is_error(), use it to avoid warnings 2018-06-11 14:40:11 +02:00
safe-syscall.S linux-user: Provide safe_syscall for fixing races between signals and syscalls 2016-05-27 14:49:51 +03:00
signal-common.h linux-user: introduce target_sigsp() and target_save_altstack() 2018-05-03 18:29:15 +02:00
signal.c linux-user: move get_sp_from_cpustate() to target_cpu.h 2018-06-04 01:30:44 +02:00
socket.h linux-user: update ARCH_HAS_SOCKET_TYPES use 2018-05-25 10:10:55 +02:00
strace.c linux-user: fix O_TMPFILE handling 2017-10-16 16:00:56 +03:00
strace.list linux-user/alpha: Fix epoll syscalls 2018-06-11 14:44:22 +02:00
syscall_defs.h linux-user: remove useless #if 2018-06-04 01:30:44 +02:00
syscall_types.h linux-user: Add FICLONE and FICLONERANGE ioctls 2017-02-16 15:29:30 +01:00
syscall.c linux-user: Export use is_error(), use it to avoid warnings 2018-06-11 14:40:11 +02:00
target_flat.h linux-user/FLAT: allow targets to override FLAT processing 2011-02-09 10:33:54 +02:00
trace-events trace-events: fix code style: print 0x before hex numbers 2017-08-01 12:13:07 +01:00
uaccess.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
uname.c linux-user: Clean up includes 2016-01-29 15:07:22 +00:00
uname.h Clean up decorations and whitespace around header guards 2016-07-12 16:20:46 +02:00
vm86.c linux-user: Clean up includes 2016-01-29 15:07:22 +00:00