Commit Graph

1089 Commits

Author SHA1 Message Date
Richard Henderson
74841f044e exec: Move user-only watchpoint stubs inline
Let the user-only watchpoint stubs resolve to empty inline functions.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:39 -07:00
Tony Nguyen
9bf825bf3d memory: Single byte swap along the I/O path
Now that MemOp has been pushed down into the memory API, and
callers are encoding endianness, we can collapse byte swaps
along the I/O path into the accelerator and target independent
adjust_endianness.

Collapsing byte swaps along the I/O path enables additional endian
inversion logic, e.g. SPARC64 Invert Endian TTE bit, with redundant
byte swaps cancelling out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Message-Id: <911ff31af11922a9afba9b7ce128af8b8b80f316.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:39 -07:00
Tony Nguyen
d5d680cacc memory: Access MemoryRegion with endianness
Preparation for collapsing the two byte swaps adjust_endianness and
handle_bswap into the former.

Call memory_region_dispatch_{read|write} with endianness encoded into
the "MemOp op" operand.

This patch does not change any behaviour as
memory_region_dispatch_{read|write} is yet to handle the endianness.

Once it does handle endianness, callers with byte swaps can collapse
them into adjust_endianness.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:39 -07:00
Tony Nguyen
3d9e7c3e7b exec: Access MemoryRegion with MemOp
The memory_region_dispatch_{read|write} operand "unsigned size" is
being converted into a "MemOp op".

Convert interfaces by using no-op size_memop.

After all interfaces are converted, size_memop will be implemented
and the memory_region_dispatch_{read|write} operand "unsigned size"
will be converted into a "MemOp op".

As size_memop is a no-op, this patch does not change any behaviour.

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <3b042deef0a60dd49ae2320ece92120ba6027f2b.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03 08:30:38 -07:00
Peter Maydell
f3b8f18ebf Monitor patches for 2019-08-21
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1dZKsSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTJ4QP/10izA+dSofQ9404GRq3TNzwRCKugU44
 nES9CqDh6x5emx+ADQWYkugblgfH9GOvUaAUNtY+uFaEr55yC/F+VWeVXvyjt5U6
 ZpPZqIRDOHo2+PZrddr/KcKmiomS6plz03m9bzb3pYN1yIl2ZzgClAhAqWQLk0WB
 wwiY+YsJ83YR4sdiRMZkuF+UL7N8fSqYvIIj0yzM8+8ONDor9n16PoPeFg3JSsyG
 aMxXIUnSBZAVtClaNkUPtS0Wf9XEuqoG1rvMRV4Vv+eeb7fwA414DqanRJdLlGMA
 yNRtFcVyztCfjgVEXnY9JJlFe6pDkoe8ycoimQ4YA60C9c1DIMHqyjFWXRHfDwk8
 bYMSX6CTpfoEvbTfmwqYR6KSkb/KuXiFDmcYlTYFvIt3grhhdHQbru9vy+E5sm/b
 j3CPV2DTCkeGY+oZFfKIaQT9yoWZOhmMY5doMTYyinXygPTGQROUrHtzUeRXKmJZ
 arqDRmh+mlEiGETNeYQCI45eYCSDYxO+UNrhszxhmv6B1+ixhIrV2oXhi61vVBeY
 yngY4EILbuA2Z/E4BevJk91ESWJTr3UP13c6p7yf21iN4BD1KkHy5HoXCgYfQDeV
 4kar49g6WQ/VQEiwhi65Xd0OwstynkcV69F+kMagVMgaLeRsdU5ikGJQzxTeWJRl
 SPpc7oDwuAS+
 =2F3E
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2019-08-21' into staging

Monitor patches for 2019-08-21

# gpg: Signature made Wed 21 Aug 2019 16:35:07 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-monitor-2019-08-21:
  monitor/qmp: Update comment for commit 4eaca8de26
  qdev: Collect HMP handlers command handlers in qdev-monitor.c
  qapi: Move query-target from misc.json to machine.json
  hw/core: Move cpu.c, cpu.h from qom/ to hw/core/

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-22 10:31:21 +01:00
Markus Armbruster
2e5b09fd0e hw/core: Move cpu.c, cpu.h from qom/ to hw/core/
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190709152053.16670-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h
in comments replaced]
2019-08-21 13:24:01 +02:00
Paolo Bonzini
9458a9a1df memory: fix race between TCG and accesses to dirty bitmap
There is a race between TCG and accesses to the dirty log:

      vCPU thread                  reader thread
      -----------------------      -----------------------
      TLB check -> slow path
        notdirty_mem_write
          write to RAM
          set dirty flag
                                   clear dirty flag
      TLB check -> fast path
                                   read memory
        write to RAM

Fortunately, in order to fix it, no change is required to the
vCPU thread.  However, the reader thread must delay the read after
the vCPU thread has finished the write.  This can be approximated
conservatively by run_on_cpu, which waits for the end of the current
translation block.

A similar technique is used by KVM, which has to do a synchronous TLB
flush after doing a test-and-clear of the dirty-page flags.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20 17:26:20 +02:00
Markus Armbruster
b58c5c2dd2 numa: Move remaining NUMA declarations from sysemu.h to numa.h
Commit e35704ba9c "numa: Move NUMA declarations from sysemu.h to
numa.h" left a few NUMA-related macros behind.  Move them now.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-26-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster
650d103d3e Include hw/hw.h exactly where needed
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in
hw/hw.h.  This permits dropping most of its inclusions.  Touching it
now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Peter Xu
077874e01f memory: Introduce memory listener hook log_clear()
Introduce a new memory region listener hook log_clear() to allow the
listeners to hook onto the points where the dirty bitmap is cleared by
the bitmap users.

Previously log_sync() contains two operations:

  - dirty bitmap collection, and,
  - dirty bitmap clear on remote site.

Let's take KVM as example - log_sync() for KVM will first copy the
kernel dirty bitmap to userspace, and at the same time we'll clear the
dirty bitmap there along with re-protecting all the guest pages again.

We add this new log_clear() interface only to split the old log_sync()
into two separated procedures:

  - use log_sync() to collect the collection only, and,
  - use log_clear() to clear the remote dirty bitmap.

With the new interface, the memory listener users will still be able
to decide how to implement the log synchronization procedure, e.g.,
they can still only provide log_sync() method only and put all the two
procedures within log_sync() (that's how the old KVM works before
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is introduced).  However with this
new interface the memory listener users will start to have a chance to
postpone the log clear operation explicitly if the module supports.
That can really benefit users like KVM at least for host kernels that
support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2.

There are three places that can clear dirty bits in any one of the
dirty bitmap in the ram_list.dirty_memory[3] array:

        cpu_physical_memory_snapshot_and_clear_dirty
        cpu_physical_memory_test_and_clear_dirty
        cpu_physical_memory_sync_dirty_bitmap

Currently we hook directly into each of the functions to notify about
the log_clear().

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-7-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15 15:39:02 +02:00
Peter Xu
5dea4079ad memory: Pass mr into snapshot_and_clear_dirty
Also we change the 2nd parameter of it to be the relative offset
within the memory region. This is to be used in follow up patches.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190603065056.25211-6-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15 15:39:02 +02:00
Like Xu
5cc8767d05 general: Replace global smp variables with smp machine properties
Basically, the context could get the MachineState reference via call
chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode.

A local variable of the same name would be introduced in the declaration
phase out of less effort OR replace it on the spot if it's only used
once in the context. No semantic changes.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05 17:07:36 -03:00
Markus Armbruster
a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00
Markus Armbruster
14a48c1d0d qemu-common: Move tcg_enabled() etc. to sysemu/tcg.h
Other accelerators have their own headers: sysemu/hax.h, sysemu/hvf.h,
sysemu/kvm.h, sysemu/whpx.h.  Only tcg_enabled() & friends sit in
qemu-common.h.  This necessitates inclusion of qemu-common.h into
headers, which is against the rules spelled out in qemu-common.h's
file comment.

Move tcg_enabled() & friends into their own header sysemu/tcg.h, and
adjust #include directives.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Rebased with conflicts resolved automatically, except for
accel/tcg/tcg-all.c]
2019-06-11 20:22:09 +02:00
Peter Maydell
06e6433955 Machine queue, 2019-04-25
* 4.1 machine-types (Cornelia Huck)
 * Support MAP_SYNC on pmem memory backends (Zhang Yi)
 * -cpu parsing fixes and cleanups (Eduardo Habkost)
 * machine initialization cleanups (Wei Yang, Markus Armbruster)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJcwfRxAAoJECgHk2+YTcWmBegP/1alp8qiO/JdSkI/+jw9iUBC
 SviMwFrQVdKWT5ou/aYTM3apqrwC9XLUQ2vuNzLQDURG+SbcCf5BLvSrcvg9iR6z
 ASUot7ta1QtkR361dL0akhvqH8pNXpGolq5VleQqBOWAGUVjgrbWuwPlFVz9TZ8R
 LaVwDITv0fpQwtq+hB4b9hiDkebZFE4/xkNyxpaoJGzaePe1sCqACzNe1/PQ15ni
 gmd+VQ1qX3frUTSZcaWTrJIdQvZlkaD+pmEiwo969EE4U9ZGwwPRpShmeHnjuKDQ
 ufTGo05+/ikqp8refxA/XqyveHeJ69JSFNLCz2QwAgdwN/OXRG306Ln69vFNuX0D
 rfMJBvKZotc7enN08aQN1m1Sm0Y+2xo9RQgFUynZnzauQXKiEndLPHyjbbQ+pAPQ
 TmHrUQnmYSvoELewrCaq4XloXrd3X57U3K19ksqF+3meApQ7fuY9dQF2A2bE+aB7
 OhiMqdw9HVAjSzplKa5jPniSc5vgRCdr9AtX5B2RJdsQEv72JfwsOYB0DnrF4hyo
 NJz7HyS28xkbKrfbhztr8WoV8nPYvdS+xjSfim8YS6lFaNDnWZl2ybp/Trr1HItv
 TbDtPSx/IePHhIXd63aXkDt7FSoUib6+fCi8Wssuuo+MJMZfHacpWHkx2bVwSuf6
 doOaY/KY8mAq5DiM09zz
 =MNVq
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine queue, 2019-04-25

* 4.1 machine-types (Cornelia Huck)
* Support MAP_SYNC on pmem memory backends (Zhang Yi)
* -cpu parsing fixes and cleanups (Eduardo Habkost)
* machine initialization cleanups (Wei Yang, Markus Armbruster)

# gpg: Signature made Thu 25 Apr 2019 18:54:57 BST
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  linux-headers: add linux/mman.h.
  scripts/update-linux-headers: add linux/mman.h
  util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
  cpu: Fix crash with empty -cpu option
  cpu: Rename parse_cpu_model() to parse_cpu_option()
  vl: Simplify machine_parse()
  vl: Clean up after previous commit
  vl.c: allocate TYPE_MACHINE list once during bootup
  vl.c: make find_default_machine() local
  hw: add compat machines for 4.1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-04-26 14:30:18 +01:00
Zhang Yi
2ac0f1621c util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
besides the existing 'shared' flags, we are going to add
'is_pmem' to qemu_ram_mmap(), which indicated the memory backend
file is a persist memory.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25 14:17:36 -03:00
Eduardo Habkost
5b863f3e2f cpu: Fix crash with empty -cpu option
Fix the following crash:

  $ qemu-system-x86_64 -cpu ''
  qemu-system-x86_64: qom/cpu.c:291: cpu_class_by_name: \
      Assertion `cpu_model && cc->class_by_name' failed.

Regression test script included.

Fixes: 99193d8f2e ("cpu: drop unnecessary NULL check and cpu_common_class_by_name()")
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190418034501.5038-1-ehabkost@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25 14:17:35 -03:00
Eduardo Habkost
c1c8cfe5f9 cpu: Rename parse_cpu_model() to parse_cpu_option()
The "model[,option...]" string parsed by the function is not just
a CPU model.  Rename the function and its argument to indicate it
expects the full "-cpu" option to be provided.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190417025944.16154-2-ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25 14:17:35 -03:00
David Hildenbrand
905b7ee4d6 exec: Introduce qemu_maxrampagesize() and rename qemu_getrampagesize()
Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it,
properly rename find_max_supported_pagesize() to
find_min_backend_pagesize().

s390x is actually interested into the maximum ram pagesize, so
introduce and use qemu_maxrampagesize().

Add a TODO, indicating that looking at any mapped memory backends is not
100% correct in some cases.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190417113143.5551-3-david@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-25 13:47:27 +02:00
Markus Armbruster
90c84c5600 qom/cpu: Simplify how CPUClass:cpu_dump_state() prints
CPUClass method dump_statistics() takes an fprintf()-like callback and
a FILE * to pass to it.  Most callers pass fprintf() and stderr.
log_cpu_state() passes fprintf() and qemu_log_file.
hmp_info_registers() passes monitor_fprintf() and the current monitor
cast to FILE *.  monitor_fprintf() casts it right back, and is
otherwise identical to monitor_printf().

The callback gets passed around a lot, which is tiresome.  The
type-punning around monitor_fprintf() is ugly.

Drop the callback, and call qemu_fprintf() instead.  Also gets rid of
the type-punning, since qemu_fprintf() takes NULL instead of the
current monitor cast to FILE *.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-15-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster
b6b71cb5c6 memory: Clean up how mtree_info() prints
mtree_info() takes an fprintf()-like callback and a FILE * to pass to
it, and so do its helper functions.  Passing around callback and
argument is rather tiresome.

Its only caller hmp_info_mtree() passes monitor_printf() cast to
fprintf_function and the current monitor cast to FILE *.

The type-punning is technically undefined behaviour, but works in
practice.  Clean up: drop the callback, and call qemu_printf()
instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-9-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
David Gibson
7d5489e6d1 exec: Only count mapped memory backends for qemu_getrampagesize()
qemu_getrampagesize() works out the minimum host page size backing any of
guest RAM.  This is required in a few places, such as for POWER8 PAPR KVM
guests, because limitations of the hardware virtualization mean the guest
can't use pagesizes larger than the host pages backing its memory.

However, it currently checks against *every* memory backend, whether or not
it is actually mapped into guest memory at the moment.  This is incorrect.

This can cause a problem attempting to add memory to a POWER8 pseries KVM
guest which is configured to allow hugepages in the guest (e.g.
-machine cap-hpt-max-page-size=16m).  If you attempt to add non-hugepage,
you can (correctly) create a memory backend, however it (correctly) will
throw an error when you attempt to map that memory into the guest by
'device_add'ing a pc-dimm.

What's not correct is that if you then reset the guest a startup check
against qemu_getrampagesize() will cause a fatal error because of the new
memory object, even though it's not mapped into the guest.

This patch corrects the problem by adjusting find_max_supported_pagesize()
(called from qemu_getrampagesize() via object_child_foreach) to exclude
non-mapped memory backends.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
2019-03-29 14:24:08 +11:00
Wei Yang
494d199727 exec.c: refactor function flatview_add_to_dispatch()
flatview_add_to_dispatch() registers page based on the condition of
*section*, which may looks like this:

    |s|PPPPPPP|s|

where s stands for subpage and P for page.

The procedure of this function could be described as:

    - register first subpage
    - register page
    - register last subpage

This means the procedure could be simplified into these three steps
instead of a loop iteration.

This patch refactors the function into three corresponding steps and
adds some comment to clarify it.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190311054252.6094-1-richardw.yang@linux.intel.com>
[Paolo: move exit before adjustment of remain.offset_within_*,
 otherwise int128_get64 fails when a region is 2^64 bytes long]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11 16:51:42 +01:00
Yury Kotov
fbd162e629 migration: Add an ability to ignore shared RAM blocks
If ignore-shared capability is set then skip shared RAMBlocks during the
RAM migration.
Also, move qemu_ram_foreach_migratable_block (and rename) to the
migration code, because it requires access to the migration capabilities.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Yury Kotov
754cb9c0eb exec: Change RAMBlockIterFunc definition
Currently, qemu_ram_foreach_* calls RAMBlockIterFunc with many
block-specific arguments. But often iter func needs RAMBlock*.
This refactoring is needed for fast access to RAMBlock flags from
qemu_ram_foreach_block's callback. The only way to achieve this now
is to call qemu_ram_block_from_host (which also enumerates blocks).

So, this patch reduces complexity of
qemu_ram_foreach_block() -> cb() -> qemu_ram_block_from_host()
from O(n^2) to O(n).

Fix RAMBlockIterFunc definition and add some functions to read
RAMBlock* fields witch were passed.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
Message-Id: <20190215174548.2630-2-yury-kotov@yandex-team.ru>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 10:49:17 +00:00
Li Zhijian
0c249ff71c unify len and addr type for memory/address APIs
Some address/memory APIs have different type between
'hwaddr/target_ulong addr' and 'int len'. It is very unsafe, especially
some APIs will be passed a non-int len by caller which might cause
overflow quietly.
Below is an potential overflow case:
    dma_memory_read(uint32_t len)
      -> dma_memory_rw(uint32_t len)
        -> dma_memory_rw_relaxed(uint32_t len)
          -> address_space_rw(int len) # len overflow

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Peter Crosthwaite <crosthwaite.peter@gmail.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 16:50:18 +01:00
Murilo Opsfelder Araujo
53adb9d43e mmap-alloc: fix hugetlbfs misaligned length in ppc64
The commit 7197fb4058 ("util/mmap-alloc:
fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.

However, we still need to consider the underlying huge page size
during munmap() because it requires that both address and length be a
multiple of the underlying huge page size for Huge TLB mappings.
Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES
section of the munmap(2) manual:

  "For munmap(), addr and length must both be a multiple of the
  underlying huge page size."

On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB
mappings because the mapped segment can be aligned with the underlying
huge page size, not aligned with the native system page size, as
returned by getpagesize().

This has the side effect of not releasing huge pages back to the pool
after a hugetlbfs file-backed memory device is hot-unplugged.

This patch fixes the situation in qemu_ram_mmap() and
qemu_ram_munmap() by considering the underlying page size on ppc64.

After this patch, memory hot-unplug releases huge pages back to the
pool.

Fixes: 7197fb4058
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04 18:44:20 +11:00
Peter Maydell
5601be3b01 exec.c: Don't reallocate IOMMUNotifiers that are in use
The tcg_register_iommu_notifier() code has a GArray of
TCGIOMMUNotifier structs which it has registered by passing
memory_region_register_iommu_notifier() a pointer to the embedded
IOMMUNotifier field. Unfortunately, if we need to enlarge the
array via g_array_set_size() this can cause a realloc(), which
invalidates the pointer that memory_region_register_iommu_notifier()
put into the MemoryRegion's iommu_notify list. This can result
in segfaults.

Switch the GArray to holding pointers to the TCGIOMMUNotifier
structs, so that we can individually allocate and free them.

Cc: qemu-stable@nongnu.org
Fixes: 1f871c5e6b ("exec.c: Handle IOMMUs in address_space_translate_for_iotlb()")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190128174241.5860-1-peter.maydell@linaro.org
2019-02-01 14:55:45 +00:00
Stefan Hajnoczi
047be4ed24 memory: add memory_region_flush_rom_device()
ROM devices go via MemoryRegionOps->write() callbacks for write
operations and do not dirty/invalidate that memory.  Device emulation
must be able to mark memory ranges that have been modified internally
(e.g. using memory_region_get_ram_ptr()).

Introduce the memory_region_flush_rom_device() API for this purpose.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190123212234.32068-2-stefanha@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: fix block comment style]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-01-29 11:46:04 +00:00
Peter Maydell
ea7a5330b7 exec.c: Use correct attrs in cpu_memory_rw_debug()
In the softmmu version of cpu_memory_rw_debug(), we ask the
CPU for the attributes to use for the virtual memory access,
and we correctly use those to identify the address space
index. However, we were not passing them in to the
address_space_write_rom() and address_space_rw() functions.

The effect of this was that a memory access from the gdbstub
to a device which had behaviour that was sensitive to the
memory attributes (such as some ARMv8M NVIC registers) was
incorrectly always performed as if non-secure, rather than
using the right security state for the CPU's current state.

Fixes: https://bugs.launchpad.net/qemu/+bug/1812091

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190117133834.7480-1-peter.maydell@linaro.org
2019-01-29 11:46:04 +00:00
Paolo Bonzini
f481ee2d5e qemu/queue.h: typedef QTAILQ heads
This will be needed when we change the QTAILQ head and elem structs
to unions.  However, it is also consistent with the usage elsewhere
in QEMU for other list head structs (see for example FsMountList).

Note that most QTAILQs only need their name in order to do backwards
walks.  Those do not break with the struct->union change, and anyway
the change will also remove the need to name heads when doing backwards
walks, so those are not touched here.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Paolo Bonzini
b58deb344d qemu/queue.h: leave head structs anonymous unless necessary
Most list head structs need not be given a name.  In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed.  In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct.  So clean up everything, not giving a
name except in the rare case where it is necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 15:46:55 +01:00
Peter Maydell
3c8133f973 Rename cpu_physical_memory_write_rom() to address_space_write_rom()
The API of cpu_physical_memory_write_rom() is odd, because it
takes an AddressSpace, unlike all the other cpu_physical_memory_*
access functions. Rename it to address_space_write_rom(), and
bring its API into line with address_space_write().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20181122133507.30950-3-peter.maydell@linaro.org
2018-12-14 13:30:48 +00:00
Peter Maydell
75693e1411 exec.c: Rename cpu_physical_memory_write_rom_internal()
Rename cpu_physical_memory_write_rom_internal() to
address_space_write_rom_internal(), and make it take
MemTxAttrs and return a MemTxResult. This brings its
API into line with address_space_write().

This is an internal function to exec.c; fixing its API
will allow us to change the global function
cpu_physical_memory_write_rom().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20181122133507.30950-2-peter.maydell@linaro.org
2018-12-14 13:30:48 +00:00
Emilio G. Cota
5005e2537d exec: introduce tlb_init
Paves the way for the addition of a per-TLB lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Thomas Huth
c95ac10340 cpu: Provide a proper prototype for target_words_bigendian() in a header
We've got three places already that provide a prototype for this
function in a .c file - that's ugly. Let's provide a proper prototype
in a header instead, with a proper description why this function should
not be used in most cases.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-10-17 08:41:43 +02:00
Hikaru Nishida
d5dbde4645 hostmem-file: make available memory-backend-file on POSIX-based hosts
Before this change, memory-backend-file object is valid for Linux hosts
only because hostmem-file.c is compiled only on Linux hosts.
However, other POSIX-based hosts (such as macOS) can support
memory-backend-file object in the same way as on Linux hosts.
This patch makes hostmem-file.c and related functions to be compiled on
all POSIX-based hosts to make available memory-backend-file on them.

Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com>
Message-Id: <20180924123205.29651-1-hikarupsp@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02 19:09:13 +02:00
Peter Maydell
55f4e79d79 pc: fixes
This includes nvdimm persistence fixes queued before the release.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbepoTAAoJECgfDbjSjVRpLioH/3BPps8FLh4x2gZSq3B+u72O
 RYUA3I3TilEGyc9yf8o7e1Hf+pQAJBEmulcnKxXFVWZIJ1GVLPt4NZCMQGiPDnJL
 +RCT/Q64PUy09hRjddAasikrvXa4YOsRgBgJJToO7v9PSQSaU3fC7O3hNea7KcF/
 C4SSqkUgxyDhCCYHHblpKxFz/wtwy4ZaCGSdozIdmKNPJ6/ye8wOQ1Mq9e1Mwp18
 S6ilJub5IwB6aM2KVMmX4AFomF4u2cn153ts8fI+Dyo4/NE6P4+viDlz3BOBKdzm
 kmd49h6/n4Lenoo4oI1yNHSuIJJTVfvnoLu6rG7mPbQKgxNd1uN4KuUIygU5PCY=
 =Xcaj
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc: fixes

This includes nvdimm persistence fixes queued before the release.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Mon 20 Aug 2018 11:38:11 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  migration/ram: ensure write persistence on loading all data to PMEM.
  migration/ram: Add check and info message to nvdimm post copy.
  mem/nvdimm: ensure write persistence to PMEM in label emulation
  hostmem-file: add the 'pmem' option
  configure: add libpmem support
  memory, exec: switch file ram allocation functions to 'flags' parameters
  memory, exec: Expose all memory block related flags.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-21 10:23:53 +01:00
Peter Maydell
55a7cb144d accel/tcg: Check whether TLB entry is RAM consistently with how we set it up
We set up TLB entries in tlb_set_page_with_attrs(), where we have
some logic for determining whether the TLB entry is considered
to be RAM-backed, and thus has a valid addend field. When we
look at the TLB entry in get_page_addr_code(), we use different
logic for determining whether to treat the page as RAM-backed
and use the addend field. This is confusing, and in fact buggy,
because the code in tlb_set_page_with_attrs() correctly decides
that rom_device memory regions not in romd mode are not RAM-backed,
but the code in get_page_addr_code() thinks they are RAM-backed.
This typically results in "Bad ram pointer" assertion if the
guest tries to execute from such a memory region.

Fix this by making get_page_addr_code() just look at the
TLB_MMIO bit in the code_address field of the TLB, which
tlb_set_page_with_attrs() sets if and only if the addend
field is not valid for code execution.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180713150945.12348-1-peter.maydell@linaro.org
2018-08-14 17:17:19 +01:00
Junyan He
a4de8552b2 hostmem-file: add the 'pmem' option
When QEMU emulates vNVDIMM labels and migrates vNVDIMM devices, it
needs to know whether the backend storage is a real persistent memory,
in order to decide whether special operations should be performed to
ensure the data persistence.

This boolean option 'pmem' allows users to specify whether the backend
storage of memory-backend-file is a real persistent memory. If
'pmem=on', QEMU will set the flag RAM_PMEM in the RAM block of the
corresponding memory region. If 'pmem' is set while lack of libpmem
support, a error is generated.

Signed-off-by: Junyan He <junyan.he@intel.com>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-10 13:29:39 +03:00
Junyan He
cbfc017103 memory, exec: switch file ram allocation functions to 'flags' parameters
As more flag parameters besides the existing 'share' are going to be
added to following functions
memory_region_init_ram_from_file
qemu_ram_alloc_from_fd
qemu_ram_alloc_from_file
let's switch them to use the 'flags' parameters so as to ease future
flag additions.

The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags,
and other flag bits are ignored by above functions right now.

Signed-off-by: Junyan He <junyan.he@intel.com>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2018-08-10 13:29:39 +03:00
Junyan He
b0e5de9381 memory, exec: Expose all memory block related flags.
We need to use these flags in other files rather than just in exec.c,
For example, RAM_SHARED should be used when create a ram block from file.
We expose them the exec/memory.h

Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-10 13:29:39 +03:00
Paolo Bonzini
c40d479207 tcg: simplify !CONFIG_TCG handling of tb_invalidate_*
There is no need for a stub, since tb_invalidate_phys_addr can be excised
altogether when TCG is disabled.  This is a bit cleaner since it avoids
using code that is clearly specific to user-mode emulation (it calls
mmap_lock/unlock) for the !CONFIG_TCG case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-02 15:41:18 +02:00
Philippe Mathieu-Daudé
646f34fa54 tcg: Fix --disable-tcg build breakage
Fix the --disable-tcg breakage introduced by 8bca9a03ec:

    $ configure --disable-tcg
    [...]
    $ make -C i386-softmmu exec.o
    make: Entering directory 'i386-softmmu'
      CC      exec.o
    In file included from source/qemu/exec.c:62:0:
    source/qemu/include/exec/ram_addr.h:96:6: error: conflicting types for ‘tb_invalidate_phys_range’
     void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end);
          ^~~~~~~~~~~~~~~~~~~~~~~~
    In file included from source/qemu/exec.c:24:0:
    source/qemu/include/exec/exec-all.h:309:6: note: previous declaration of ‘tb_invalidate_phys_range’ was here
     void tb_invalidate_phys_range(target_ulong start, target_ulong end);
          ^~~~~~~~~~~~~~~~~~~~~~~~
    source/qemu/exec.c:1043:6: error: conflicting types for ‘tb_invalidate_phys_addr’
     void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
          ^~~~~~~~~~~~~~~~~~~~~~~
    In file included from source/qemu/exec.c:24:0:
    source/qemu/include/exec/exec-all.h:308:6: note: previous declaration of ‘tb_invalidate_phys_addr’ was here
     void tb_invalidate_phys_addr(target_ulong addr);
          ^~~~~~~~~~~~~~~~~~~~~~~
    make: *** [source/qemu/rules.mak:69: exec.o] Error 1
    make: Leaving directory 'i386-softmmu'

Tested to build x86_64-softmmu and i386-softmmu targets.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180629200710.27626-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-02 13:42:05 +01:00
David Hildenbrand
61362b71c1 exec: check that alignment is a power of two
Right now we can crash QEMU using e.g.

qemu-system-x86_64 -m 256M,maxmem=20G,slots=2 \
 -object memory-backend-file,id=mem0,size=12288,mem-path=/dev/zero,align=12288 \
 -device pc-dimm,id=dimm1,memdev=mem0

qemu-system-x86_64: util/mmap-alloc.c:115:
 qemu_ram_mmap: Assertion `is_power_of_2(align)' failed

Fix this by adding a proper check.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180607154705.6316-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-28 19:05:31 +02:00
Paolo Bonzini
8bca9a03ec move public invalidate APIs out of translate-all.{c,h}, clean up
Place them in exec.c, exec-all.h and ram_addr.h.  This removes
knowledge of translate-all.h (which is an internal header) from
several files outside accel/tcg and removes knowledge of
AddressSpace from translate-all.c (as it only operates on ram_addr_t).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-28 19:05:30 +02:00
Eric Auger
a99761d3c8 exec: Fix MAP_RAM for cached access
When an IOMMUMemoryRegion is in front of a virtio device,
address_space_cache_init does not set cache->ptr as the memory
region is not RAM. However when the device performs an access,
we end up in glue() which performs the translation and then uses
MAP_RAM. This latter uses the unset ptr and returns a wrong value
which leads to a SIGSEV in address_space_lduw_internal_cached_slow,
for instance.

In slow path cache->ptr is NULL and MAP_RAM must redirect to
qemu_map_ram_ptr((mr)->ram_block, ofs).

As MAP_RAM, IS_DIRECT and INVALIDATE are the same in _cached_slow
and non cached mode, let's remove those macros.

This fixes the use cases featuring vIOMMU (Intel and ARM SMMU)
which lead to a SIGSEV.

Fixes: 48564041a7 (exec: reintroduce MemoryRegion caching)
Signed-off-by: Eric Auger <eric.auger@redhat.com>

Message-Id: <1528895946-28677-1-git-send-email-eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-28 19:05:30 +02:00
David Hildenbrand
c136180c90 postcopy: drop ram_pages parameter from postcopy_ram_incoming_init()
Not needed. Don't expose last_ram_page().

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180620202736.21399-1-david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-27 13:28:31 +02:00
Emilio G. Cota
f28d0dfdce tcg: fix --disable-tcg build breakage
Fix the --disable-tcg breakage introduced by tb_lock's removal by
relying on the fact that tcg_enabled() is set to 0 at
compile-time under --disable-tcg.

While at it, add further asserts to fix builds that enable both
--disable-tcg and --enable-debug, which were broken even before
tb_lock's removal.

Tested to build x86_64-softmmu and i386-softmmu targets.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22 18:55:24 +01:00
Emilio G. Cota
0ac20318ce tcg: remove tb_lock
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
    the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

                 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--+----+------+------------+-----------+------------*--+-+
      |    +    +      +            +           +           *B    |
      |         before ***B***                            ** *    |
      |tb lock removal ###D###                         ***        |
  600 +-+                                           ***         +-+
      |                                           **         #    |
      |                                        *B*          #D    |
      |                                     *** *         ##      |
  500 +-+                                ***           ###      +-+
      |                             * ***           ###           |
      |                            *B*          # ##              |
      |                          ** *          #D#                |
  400 +-+                      **            ##                 +-+
      |                      **           ###                     |
      |                    **           ##                        |
      |                  **         # ##                          |
  300 +-+  *           B*          #D#                          +-+
      |    B         ***        ###                               |
      |    *       **       ####                                  |
      |     *   ***      ###                                      |
  200 +-+   B  *B     #D#                                       +-+
      |     #B* *   ## #                                          |
      |     #*    ##                                              |
      |    + D##D#     +            +           +            +    |
  100 +-+--+----+------+------------+-----------+------------+--+-+
           1    8      16      Guest CPUs       48           64
  png: https://imgur.com/HwmBHXe

              debian jessie aarch64 bootup+shutdown time

  90 +-+--+-----+-----+------------+------------+------------+--+-+
     |    +     +     +            +            +            +    |
     |         before ***B***                                B    |
  80 +tb lock removal ###D###                              **D  +-+
     |                                                   **###    |
     |                                                 **##       |
  70 +-+                                             ** #       +-+
     |                                             ** ##          |
     |                                           **  #            |
  60 +-+                                       *B  ##           +-+
     |                                       **  ##               |
     |                                    ***  #D                 |
  50 +-+                               ***   ##                 +-+
     |                             * **   ###                     |
     |                           **B*  ###                        |
  40 +-+                     ****  # ##                         +-+
     |                   ****     #D#                             |
     |             ***B**      ###                                |
  30 +-+    B***B**        ####                                 +-+
     |    B *   *     # ###                                       |
     |     B       ###D#                                          |
  20 +-+   D  ##D##                                             +-+
     |      D#                                                    |
     |    +     +     +            +            +            +    |
  10 +-+--+-----+-----+------------+------------+------------+--+-+
          1     8     16      Guest CPUs        48           64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15 08:18:48 -10:00