As more flag parameters besides the existing 'share' are going to be
added to following functions
memory_region_init_ram_from_file
qemu_ram_alloc_from_fd
qemu_ram_alloc_from_file
let's switch them to use the 'flags' parameters so as to ease future
flag additions.
The existing 'share' flag is converted to the RAM_SHARED bit in ram_flags,
and other flag bits are ignored by above functions right now.
Signed-off-by: Junyan He <junyan.he@intel.com>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
We need to use these flags in other files rather than just in exec.c,
For example, RAM_SHARED should be used when create a ram block from file.
We expose them the exec/memory.h
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The condition to check whether an address has hit against a particular
TLB entry is not completely trivial. We do this in various places, and
in fact in one place (get_page_addr_code()) we have got the condition
wrong. Abstract it out into new tlb_hit() and tlb_hit_page() inline
functions (one for a known-page-aligned address and one for an
arbitrary address), and use them in all the places where we had the
condition correct.
This is a no-behaviour-change patch; we leave fixing the buggy
code in get_page_addr_code() to a subsequent patch.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180629162122.19376-2-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There is no need for a stub, since tb_invalidate_phys_addr can be excised
altogether when TCG is disabled. This is a bit cleaner since it avoids
using code that is clearly specific to user-mode emulation (it calls
mmap_lock/unlock) for the !CONFIG_TCG case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the --disable-tcg breakage introduced by 8bca9a03ec:
$ configure --disable-tcg
[...]
$ make -C i386-softmmu exec.o
make: Entering directory 'i386-softmmu'
CC exec.o
In file included from source/qemu/exec.c:62:0:
source/qemu/include/exec/ram_addr.h:96:6: error: conflicting types for ‘tb_invalidate_phys_range’
void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end);
^~~~~~~~~~~~~~~~~~~~~~~~
In file included from source/qemu/exec.c:24:0:
source/qemu/include/exec/exec-all.h:309:6: note: previous declaration of ‘tb_invalidate_phys_range’ was here
void tb_invalidate_phys_range(target_ulong start, target_ulong end);
^~~~~~~~~~~~~~~~~~~~~~~~
source/qemu/exec.c:1043:6: error: conflicting types for ‘tb_invalidate_phys_addr’
void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
^~~~~~~~~~~~~~~~~~~~~~~
In file included from source/qemu/exec.c:24:0:
source/qemu/include/exec/exec-all.h:308:6: note: previous declaration of ‘tb_invalidate_phys_addr’ was here
void tb_invalidate_phys_addr(target_ulong addr);
^~~~~~~~~~~~~~~~~~~~~~~
make: *** [source/qemu/rules.mak:69: exec.o] Error 1
make: Leaving directory 'i386-softmmu'
Tested to build x86_64-softmmu and i386-softmmu targets.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180629200710.27626-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This adds owners/parents (which are the same, just occasionally
owner==NULL) printing for memory regions; a new '-o' flag
enabled new output.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20180604032511.6980-1-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Place them in exec.c, exec-all.h and ram_addr.h. This removes
knowledge of translate-all.h (which is an internal header) from
several files outside accel/tcg and removes knowledge of
AddressSpace from translate-all.c (as it only operates on ram_addr_t).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Not needed. Don't expose last_ram_page().
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180620202736.21399-1-david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
trace_mem_build_info expects a size_shift for its first argument. Fix it.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-id: 1527028012-21888-2-git-send-email-cota@braap.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add support for MMU protection regions that are smaller than
TARGET_PAGE_SIZE. We do this by marking the TLB entry for those
pages with a flag TLB_RECHECK. This flag causes us to always
take the slow-path for accesses. In the slow path we can then
special case them to always call tlb_fill() again, so we have
the correct information for the exact address being accessed.
This change allows us to handle reading and writing from small
regions; we cannot deal with execution from the small region.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180620130619.11362-2-peter.maydell@linaro.org
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.
Per-TB locks are used in both modes to protect TB jumps.
Some notes:
- tb_lock is removed from notdirty_mem_write by passing a
locked page_collection to tb_invalidate_phys_page_fast.
- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
so there is no need to further serialize access to them.
- do_tb_flush is run in a safe async context, meaning no other
vCPU threads are running. Therefore acquiring mmap_lock there
is just to please tools such as thread sanitizer.
- Not visible in the diff, but tb_invalidate_phys_page already
has an assert_memory_lock.
- cpu_io_recompile is !user-only, so no mmap_lock there.
- Added mmap_unlock()'s before all siglongjmp's that could
be called in user-mode while mmap_lock is held.
+ Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
Performance numbers before/after:
Host: AMD Opteron(tm) Processor 6376
ubuntu 17.04 ppc64 bootup+shutdown time
700 +-+--+----+------+------------+-----------+------------*--+-+
| + + + + + *B |
| before ***B*** ** * |
|tb lock removal ###D### *** |
600 +-+ *** +-+
| ** # |
| *B* #D |
| *** * ## |
500 +-+ *** ### +-+
| * *** ### |
| *B* # ## |
| ** * #D# |
400 +-+ ** ## +-+
| ** ### |
| ** ## |
| ** # ## |
300 +-+ * B* #D# +-+
| B *** ### |
| * ** #### |
| * *** ### |
200 +-+ B *B #D# +-+
| #B* * ## # |
| #* ## |
| + D##D# + + + + |
100 +-+--+----+------+------------+-----------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/HwmBHXe
debian jessie aarch64 bootup+shutdown time
90 +-+--+-----+-----+------------+------------+------------+--+-+
| + + + + + + |
| before ***B*** B |
80 +tb lock removal ###D### **D +-+
| **### |
| **## |
70 +-+ ** # +-+
| ** ## |
| ** # |
60 +-+ *B ## +-+
| ** ## |
| *** #D |
50 +-+ *** ## +-+
| * ** ### |
| **B* ### |
40 +-+ **** # ## +-+
| **** #D# |
| ***B** ### |
30 +-+ B***B** #### +-+
| B * * # ### |
| B ###D# |
20 +-+ D ##D## +-+
| D# |
| + + + + + + |
10 +-+--+-----+-----+------------+------------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/iGpGFtv
The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This applies to both user-mode and !user-mode emulation.
Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().
In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].
I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
Time (s) Rel. err (%) Abs. err (s) Rel. slowdown (%)
------------------------------------------------------------------------------
before 20.88 0.74 0.154512 0.
after 20.81 0.38 0.079078 -0.33524904
GTree 21.02 0.28 0.058856 0.67049808
GHashTable + xxhash 21.63 1.08 0.233604 3.5919540
Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.
n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁|[34.0,35.0]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The appended adds assertions to make sure we do not longjmp with page
locks held. Note that user-mode has nothing to check, since page_locks
are !user-mode only.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting parallel TCG generation.
Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.
Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).
This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.
While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:
- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
jmp_list_{next|first} fields of TB"), the rationale is the same: these
are tagged pointers, not pointers. So use a more appropriate type.
- Only check the least significant bit of the tagged pointers. Masking
with 3/~3 is unnecessary and confusing.
- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
PAGE_FOR_EACH_TB, which improves readability. Note that
TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.
- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
is a bug and we attempt to remove a TB that is not in the list, instead
of segfaulting (since the list is NULL-terminated) we will reach
g_assert_not_reached().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This paves the way for enabling scalable parallel generation of TCG code.
Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.
The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.
Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.
Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().
As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Currently we don't support board configurations that put an IOMMU
in the path of the CPU's memory transactions, and instead just
assert() if the memory region fonud in address_space_translate_for_iotlb()
is an IOMMUMemoryRegion.
Remove this limitation by having the function handle IOMMUs.
This is mostly straightforward, but we must make sure we have
a notifier registered for every IOMMU that a transaction has
passed through, so that we can flush the TLB appropriately
when any of the IOMMUs change their mappings.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-5-peter.maydell@linaro.org
Add an IOMMU index argument to the translate method of
IOMMUs. Since all of our current IOMMU implementations
support only a single IOMMU index, this has no effect
on the behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-4-peter.maydell@linaro.org
Add support for multiple IOMMU indexes to the IOMMU notifier APIs.
When initializing a notifier with iommu_notifier_init(), the caller
must pass the IOMMU index that it is interested in. When a change
happens, the IOMMU implementation must pass
memory_region_notify_iommu() the IOMMU index that has changed and
that notifiers must be called for.
IOMMUs which support only a single index don't need to change.
Callers which only really support working with IOMMUs with a single
index can use the result of passing MEMTXATTRS_UNSPECIFIED to
memory_region_iommu_attrs_to_index().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
If an IOMMU supports mappings that care about the memory
transaction attributes, then it no longer has a unique
address -> output mapping, but more than one. We can
represent these using an IOMMU index, analogous to TCG's
mmu indexes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-2-peter.maydell@linaro.org
There's a common pattern in QEMU where a function needs to perform
a data load or store of an N byte integer in a particular endianness.
At the moment this is handled by doing a switch() on the size and
calling the appropriate ld*_p or st*_p function for each size.
Provide a new family of functions ldn_*_p() and stn_*_p() which
take the size as an argument and do the switch() themselves.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611171007.4165-2-peter.maydell@linaro.org
The API for cpu_transaction_failed() says that it takes the physical
address for the failed transaction. However we were actually passing
it the offset within the target MemoryRegion. We don't currently
have any target CPU implementations of this hook that require the
physical address; fix this bug so we don't get confused if we ever
do add one.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-3-peter.maydell@linaro.org
The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious
use; add a comment documenting it (reverse-engineered from what
the code that sets it is doing).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-2-peter.maydell@linaro.org
The migration code should be using the
RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable
not the all-block versions; poison them so that we can't accidentally
use them.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180605162545.80778-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
On the POWER9 processor, the XIVE interrupt controller can control
interrupt sources using MMIO to trigger events, to EOI or to turn off
the sources. Priority management and interrupt acknowledgment is also
controlled by MMIO in the presenter sub-engine.
These MMIO regions are exposed to guests in QEMU with a set of 'ram
device' memory mappings, similarly to VFIO, and the VMAs are populated
dynamically with the appropriate pages using a fault handler.
But, these regions are an issue for migration. We need to discard the
associated RAMBlocks from the RAM state on the source VM and let the
destination VM rebuild the memory mappings on the new host in the
post_load() operation just before resuming the system.
To achieve this goal, the following introduces a new RAMBlock flag
RAM_MIGRATABLE which is updated in the vmstate_register_ram() and
vmstate_unregister_ram() routines. This flag is then used by the
migration to identify RAMBlocks to discard on the source. Some checks
are also performed on the destination to make sure nothing invalid was
sent.
This change impacts the boston, malta and jazz mips boards for which
migration compatibility is broken.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Do the cast to uintptr_t within the helper, so that the compiler
can type check the pointer argument. We can also do some more
sanity checking of the index argument.
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The function has been deprecated for 2.5 years, and there are just a handful
of users. Convert them to memory_region_init_io with NULL callbacks,
and while at it pass the right device as the owner.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180515134835.3409-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_get_iotlb_entry().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-12-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to flatview_translate(); all its
callers now have attrs available.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-11-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to the MemoryRegion valid.accepts
callback. We'll need this for subpage_accepts().
We could take the approach we used with the read and write
callbacks and add new a new _with_attrs version, but since there
are so few implementations of the accepts hook we just change
them all.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-9-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to memory_region_access_valid().
Its callers either have an attrs value to hand, or don't care
and can use MEMTXATTRS_UNSPECIFIED.
The callsite in flatview_access_valid() is part of a recursive
loop flatview_access_valid() -> memory_region_access_valid() ->
subpage_accepts() -> flatview_access_valid(); we make it pass
MEMTXATTRS_UNSPECIFIED for now, until the next several commits
have plumbed an attrs parameter through the rest of the loop
and we can add an attrs parameter to flatview_access_valid().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-8-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_access_valid().
Its callers either have an attrs value to hand, or don't care
and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-6-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_map().
Its callers either have an attrs value to hand, or don't care
and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-5-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_translate()
and address_space_translate_cached(). Callers either have an
attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-4-peter.maydell@linaro.org
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to tb_invalidate_phys_addr().
Its callers either have an attrs value to hand, or don't care
and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180521140402.23318-3-peter.maydell@linaro.org
Add more detail to the documentation for memory_region_init_iommu()
and other IOMMU-related functions and data structures.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180521140402.23318-2-peter.maydell@linaro.org
Depending on the host abi, float16, aka uint16_t, values are
passed and returned either zero-extended in the host register
or with garbage at the top of the host register.
The tcg code generator has so far been assuming garbage, as that
matches the x86 abi, but this is incorrect for other host abis.
Further, target/arm has so far been assuming zero-extended results,
so that it may store the 16-bit value into a 32-bit slot with the
high 16-bits already clear.
Rectify both problems by mapping "f16" in the helper definition
to uint32_t instead of (a typedef for) uint16_t. This forces
the host compiler to assume garbage in the upper 16 bits on input
and to zero-extend the result on output.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20180522175629.24932-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
gdb_handlesig()'s behaviour is not entirely obvious at first
glance. Add a doc comment for it, and also add a comment
explaining why it's ok for gdb_do_syscallv() to ignore
gdb_handlesig()'s return value. (Coverity complains about
this: CID 1390850.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180515181958.25837-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
In thunk_type_align() and thunk_type_size() we currently return
-1 if the value at the type_ptr isn't one of the TYPE_* values
we understand. However, this should never happen, and if it does
then the calling code will go confusingly wrong because none
of the callsites try to handle an error return. Switch to an
assertion instead, so that if this does somehow happen we'll have
a nice clear backtrace of what happened rather than a weird crash
or misbehaviour.
This also silences various Coverity complaints about not handling
the negative return value (CID 1005735, 1005736, 1005738, 1390582).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180514174616.19601-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* dtc configure fixes
* MemoryRegionCache second try
* Deprecated option removal
* add support for Hyper-V reenlightenment MSRs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJa9Y2qAAoJEL/70l94x66Df8EIAI4pi+zf1mTlH0Koi+oqOg+d
geBC6N9IA+n1p90XERnPbuiT19NjON2R1Z907SbzDkijxdNRoYUoQf7Z+ZBTENjn
dYsVvgLYzajGLWWtJetPPaNFAqeF2z8B3lbVQnGVLzH5pQQ2NS1NJsvXQA2LslLs
2ll1CJ2EEBhayoBSbHK+0cY85f+DUgK/T1imIV2T/rwcef9Rw218nvPfGhPBSoL6
tI2xIOxz8bBOvZNg2wdxpaoPuDipBFu6koVVbaGSgXORg8k5CEcKNxInztufdELW
KZK5ORa3T0uqu5T/GGPAfm/NbYVQ4aTB5mddshsXtKbBhnbSfRYvpVsR4kQB/Hc=
=oC1r
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Don't silently truncate extremely long words in the command line
* dtc configure fixes
* MemoryRegionCache second try
* Deprecated option removal
* add support for Hyper-V reenlightenment MSRs
# gpg: Signature made Fri 11 May 2018 13:33:46 BST
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (29 commits)
rename included C files to foo.inc.c, remove osdep.h
pc-dimm: fix error messages if no slots were defined
build: Silence dtc directory creation
shippable: Remove Debian 8 libfdt kludge
configure: Display if libfdt is from system or git
configure: Really use local libfdt if the system one is too old
i386/kvm: add support for Hyper-V reenlightenment MSRs
qemu-doc: provide details of supported build platforms
qemu-options: Remove deprecated -no-kvm-irqchip
qemu-options: Remove deprecated -no-kvm-pit-reinjection
qemu-options: Bail out on unsupported options instead of silently ignoring them
qemu-options: Remove remainders of the -tdf option
qemu-options: Mark -virtioconsole as deprecated
target/i386: sev: fix memory leaks
opts: don't silently truncate long option values
opts: don't silently truncate long parameter keys
accel: use g_strsplit for parsing accelerator names
update-linux-headers: drop hyperv.h
qemu-thread: always keep the posix wrapper layer
exec: reintroduce MemoryRegion caching
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
While at it, use int for both num_insns and max_insns to make
sure we have same-type comparisons.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael Clark <mjc@sifive.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
MemoryRegionCache was reverted to "normal" address_space_* operations
for 2.9, due to lack of support for IOMMUs. Reinstate the
optimizations, caching only the IOMMU translation at address_cache_init
but not the IOMMU lookup and target AddressSpace translation are not
cached; now that MemoryRegionCache supports IOMMUs, it becomes more widely
applicable too.
The inlined fast path is defined in memory_ldst_cached.inc.h, while the
slow path uses memory_ldst.inc.c as before. The smaller fast path causes
a little code size reduction in MemoryRegionCache users:
hw/virtio/virtio.o text size before: 32373
hw/virtio/virtio.o text size after: 31941
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now, this reduces the text size very slightly due to the newly-added
inlining:
text size before: 9301965
text size after: 9300645
Later, however, the declarations in include/exec/memory_ldst.inc.h will be
reused for the MemoryRegionCache slow path functions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 8efb2ed5ec ("linux-user: Correct signedness of
target_flock l_start and l_len fields"), flock64 structure uses
abi_llong for l_start and l_len in place of "unsigned long long"
this should force them to be aligned accordingly to the target
rules. So we can remove the padding field and the QEMU_PACKED
attribute.
I have compared the result of the following program before and
after the change:
cat -> flock64_dump <<EOF
p/d sizeof(struct target_flock64)
p/d &((struct target_flock64 *)0)->l_type
p/d &((struct target_flock64 *)0)->l_whence
p/d &((struct target_flock64 *)0)->l_start
p/d &((struct target_flock64 *)0)->l_len
p/d &((struct target_flock64 *)0)->l_pid
quit
EOF
for file in build/all/*-linux-user/qemu-* ; do
echo $file
gdb -batch -nx -x flock64_dump $file 2> /dev/null
done
The sizeof() changes because we remove the QEMU_PACKED.
The new size is 32 (except for i386 and m68k) and this is
the real size of "struct flock64" on the target architecture.
The following architectures differ:
aarch64_be, aarch64, alpha, armeb, arm, cris, hppa, nios2, or1k,
riscv32, riscv64, s390x.
For a subset of these architectures, I have checked with the following
program the new structure is the correct one:
#include <stdio.h>
#define __USE_LARGEFILE64
#include <fcntl.h>
int main(void)
{
printf("struct flock64 %d\n", sizeof(struct flock64));
printf("l_type %d\n", &((struct flock64 *)0)->l_type);
printf("l_whence %d\n", &((struct flock64 *)0)->l_whence);
printf("l_start %d\n", &((struct flock64 *)0)->l_start);
printf("l_len %d\n", &((struct flock64 *)0)->l_len);
printf("l_pid %d\n", &((struct flock64 *)0)->l_pid);
}
[I have checked aarch64, alpha, hppa, s390x]
For ARM, the target_flock64 becomes the EABI definition, so we need to
define the OABI one in place of the EABI one and use it when it is
needed.
I have also fixed the alignment value for sh4 (to align llong on 4 bytes)
(see c2e3dee6e0 "linux-user: Define target alignment size")
[We should check alignment properties for cris, nios2 and or1k]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180502215730.28162-1-laurent@vivier.eu>
In icount mode, instructions that access io memory spaces in the middle
of the translation block invoke TB recompilation. After recompilation,
such instructions become last in the TB and are allowed to access io
memory spaces.
When the code includes instruction like i386 'xchg eax, 0xffffd080'
which accesses APIC, QEMU goes into an infinite loop of the recompilation.
This instruction includes two memory accesses - one read and one write.
After the first access, APIC calls cpu_report_tpr_access, which restores
the CPU state to get the current eip. But cpu_restore_state_from_tb
resets the cpu->can_do_io flag which makes the second memory access invalid.
Therefore the second memory access causes a recompilation of the block.
Then these operations repeat again and again.
This patch moves resetting cpu->can_do_io flag from
cpu_restore_state_from_tb to cpu_loop_exit* functions.
It also adds a parameter for cpu_restore_state which controls restoring
icount. There is no need to restore icount when we only query CPU state
without breaking the TB. Restoring it in such cases leads to the
incorrect flow of the virtual time.
In most cases new parameter is true (icount should be recalculated).
But there are two cases in i386 and openrisc when the CPU state is only
queried without the need to break the TB. This patch fixes both of
these cases.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180409091320.12504.35329.stgit@pasha-VirtualBox>
[rth: Make can_do_io setting unconditional; move from cpu_exec;
make cpu_loop_exit_{noexc,restore} call cpu_loop_exit.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since the commit:
commit 4486e89c21
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date: Wed Mar 7 14:42:05 2018 +0000
vl: introduce vm_shutdown()
GDB crashes when qemu exits (at least on sparc-softmmu):
Remote communication error. Target disconnected.: Connection reset by peer.
Quitting: putpkt: write failed: Broken pipe.
So send a packet to exit GDB before we exit QEMU:
[Inferior 1 (Thread 0) exited normally]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-id: 1521538773-30802-1-git-send-email-frederic.konrad@adacore.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use a flag on the RAMBlock to state whether it has the
UFFDIO_ZEROPAGE capability, use it when it's available.
This allows the use of postcopy on tmpfs as well as hugepage
backed files.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Utility to give the offset of a host pointer within a RAMBlock
(assuming we already know it's in that RAMBlock)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In linux-user QEMU that runs for a target with TARGET_ABI_BITS bigger
than L1_MAP_ADDR_SPACE_BITS an assertion in page_set_flags fires when
mmap, munmap, mprotect, mremap or shmat is called for an address outside
the guest address space. mmap and mprotect should return ENOMEM in such
case.
Change definition of GUEST_ADDR_MAX to always be the last valid guest
address. Account for this change in open_self_maps.
Add macro guest_addr_valid that verifies if the guest address is valid.
Add function guest_range_valid that verifies if address range is within
guest address space and does not wrap around. Use that macro in
mmap/munmap/mprotect/mremap/shmat for error checking.
Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180307215010.30706-1-jcmvbkbc@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
address_space_read is calling address_space_to_flatview but it can
be called outside the RCU lock. To fix it, push the rcu_read_lock/unlock
pair up from flatview_read_full to address_space_read's constant size
fast path and address_space_read_full.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These accessors are called from inlined functions, and the call sequence
is much more expensive than just inlining the access. Move the
struct declaration to memory-internal.h so that exec.c and memory.c
can both use an inline function.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows us to explicitly pass float16 to helpers rather than
assuming uint32_t and dealing with the result. Of course they will be
passed in i32 sized registers by default.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180227143852.11175-2-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently only file backed memory backend can
be created with a "share" flag in order to allow
sharing guest RAM with other processes in the host.
Add the "share" flag also to RAM Memory Backend
in order to allow remapping parts of the guest RAM
to different host virtual addresses. This is needed
by the RDMA devices in order to remap non-contiguous
QEMU virtual addresses to a contiguous virtual address range.
Moved the "share" flag to the Host Memory base class,
modified phys_mem_alloc to include the new parameter
and a new interface memory_region_init_ram_shared_nomigrate.
There are no functional changes if the new flag is not used.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Simplify the users of memory_region_snapshot_and_clear_dirty, so
that they do not have to call memory_region_sync_dirty_bitmap
explicitly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes, with the change
to target/s390x/gen-features.c manually reverted, and blank lines
around deletions collapsed.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-3-armbru@redhat.com>
This adds get_attr() to IOMMUMemoryRegionClass, like
iommu_ops::domain_get_attr in the Linux kernel.
This defines the first attribute - IOMMU_ATTR_SPAPR_TCE_FD - which
will be used between the pSeries machine and VFIO-PCI.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The memory-internal.h header claims that it is for "obsolete
exec.c functions" which "will be removed soon". This statement
was added in 2011, six years ago, but the header is still here.
(Admittedly none of the prototypes added in commit 67d95c153b
are still in the header.)
It's convenient to have a place to put prototypes for functions
which are used internally to the various .c files of the memory
system or by the accel/tcg code, which is inevitably fairly
closely coupled. So keep the header but update the comments to
reflect what we're actually using it for.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1511276888-17834-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Message-Id: <1515043788-38300-1-git-send-email-jianjay.zhou@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MC68040 MMU provides the size of the access that
triggers the page fault.
This size is set in the Special Status Word which
is written in the stack frame of the access fault
exception.
So we need the size in m68k_cpu_unassigned_access() and
m68k_cpu_handle_mmu_fault().
To be able to do that, this patch modifies the prototype of
handle_mmu_fault handler, tlb_fill() and probe_write().
do_unassigned_access() already includes a size parameter.
This patch also updates handle_mmu_fault handlers and
tlb_fill() of all targets (only parameter, no code change).
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180118193846.24953-2-laurent@vivier.eu>
When mmap(2) the backend files, QEMU uses the host page size
(getpagesize(2)) by default as the alignment of mapping address.
However, some backends may require alignments different than the page
size. For example, mmap a device DAX (e.g., /dev/dax0.0) on Linux
kernel 4.13 to an address, which is 4K-aligned but not 2M-aligned,
fails with a kernel message like
[617494.969768] dax dax0.0: qemu-system-x86: dax_mmap: fail, unaligned vma (0x7fa37c579000 - 0x7fa43c579000, 0x1fffff)
Because there is no common approach to get such alignment requirement,
we add the 'align' option to 'memory-backend-file', so that users or
management utils, which have enough knowledge about the backend, can
specify a proper alignment via this option.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171211072806.2812-2-haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[ehabkost: fixed typo, fixed error_setg() format string]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This code has an optimised, word aligned version, and a boring
unaligned version. My commit f70d345 fixed one alignment issue, but
there's another.
The optimised version operates on 'longs' dealing with (typically) 64
pages at a time, replacing the whole long by a 0 and counting the bits.
If the Ramblock is less than 64bits in length that long can contain bits
representing two different RAMBlocks, but the code will update the
bmap belinging to the 1st RAMBlock only while having updated the total
dirty page count for both.
This probably didn't matter prior to 6b6712ef which split the dirty
bitmap by RAMBlock, but now they're separate RAMBlocks we end up
with a count that doesn't match the state in the bitmaps.
Symptom:
Migration showing a few dirty pages left to be sent constantly
Seen on aarch64 and x86 with x86+ovmf
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Wei Huang <wei@redhat.com>
Fixes: 6b6712efcc
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We already handle this in the backends, and the lifetime datum
for the TCGOp is already large enough.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With no fixed array allocation, we can't overflow a buffer.
This will be important as optimizations related to host vectors
may expand the number of ops used.
Use QTAILQ to link the ops together.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Normally we create an address space for that CPU and pass that address
space into the function. Let's just do it inside to unify address space
creations. It'll simplify my next patch to rename those address spaces.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20171123092333.16085-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was never used since its introduction in commit
196ea13104 ("memory: Add global-locking property to memory
regions").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The function notdirty_mem_write() has a sequence of actions
it has to do before and after the actual business of writing
data to host RAM to ensure that dirty flags are correctly
updated and we flush any TCG translations for the region.
We need to do this also in other places that write directly
to host RAM, most notably the TCG atomic helper functions.
Pull out the before and after pieces into their own functions.
We use an API where the prepare function stashes the various
bits of information about the write into a struct for the
complete function to use, because in the calls for the atomic
helpers the place where the complete function will be called
doesn't have the information to hand.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1511201308-23580-2-git-send-email-peter.maydell@linaro.org
When we handle a signal from a fault within a user-only memory helper,
we cannot cpu_restore_state with the PC found within the signal frame.
Use a TLS variable, helper_retaddr, to record the unwind start point
to find the faulting guest insn.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We are still seeing signals during translation time when we walk over
a page protection boundary. This expands the check to ensure the host
PC is inside the code generation buffer. The original suggestion was
to check versus tcg_ctx.code_gen_ptr but as we now segment the
translation buffer we have to settle for just a general check for
being inside.
I've also fixed up the declaration to make it clear it can deal with
invalid addresses. A later patch will fix up the call sites.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20171108153245.20740-2-alex.bennee@linaro.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that every target is using the disas_set_info hook,
the flags argument is unused. Remove it.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is identical for each target. So, move the initialization to
common code. Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.
This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
The core of this patch is this change to tcg/tcg.h:
> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;
Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Groundwork for supporting multiple TCG contexts.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.
For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.
The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.
Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.
Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:
Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):
- Before:
8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% )
16,974 context-switches # 0.002 M/sec ( +- 0.12% )
0 cpu-migrations # 0.000 K/sec
10,125 page-faults # 0.001 M/sec ( +- 1.23% )
35,144,901,879 cycles # 4.367 GHz ( +- 0.14% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% )
10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% )
192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% )
8.640869419 seconds time elapsed ( +- 0.57% )
- After:
8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% )
17,016 context-switches # 0.002 M/sec ( +- 0.40% )
0 cpu-migrations # 0.000 K/sec
18,769 page-faults # 0.002 M/sec ( +- 0.45% )
35,660,956,120 cycles # 4.378 GHz ( +- 1.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% )
10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% )
8.828660235 seconds time elapsed ( +- 0.38% )
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These flags are used by target/*/translate.c,
and affect code generation.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.
Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.
Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:
FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)
perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES
Then manually fixed the few errors that checkpatch reported.
Compile-tested for all targets.
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.
Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.
The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This patch adds ability to track down already received
pages, it's necessary for calculation vCPU block time in
postcopy migration feature, and for recovery after
postcopy migration failure.
Also it's necessary to solve shared memory issue in
postcopy livemigration. Information about received pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already received pages. fallocate syscall is required for
remmaped shared memory, due to remmaping itself blocks
ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT
error (struct page is exists after remmap).
Bitmap is placed into RAMBlock as another postcopy/precopy
related bitmaps.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Background: s390x implements Low-Address Protection (LAP). If LAP is
enabled, writing to effective addresses (before any translation)
0-511 and 4096-4607 triggers a protection exception.
So we have subpage protection on the first two pages of every address
space (where the lowcore - the CPU private data resides).
By immediately invalidating the write entry but allowing the caller to
continue, we force every write access onto these first two pages into
the slow path. we will get a tlb fault with the specific accessed
addresses and can then evaluate if protection applies or not.
We have to make sure to ignore the invalid bit if tlb_fill() succeeds.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171016202358.3633-2-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
These only depend on the host and therefore belong in the common
osdep, not in a target-dependent object.
While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:
CC mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
printf("protecting code page: 0x" TARGET_FMT_lx "\n",
^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This gets rid of a hole in struct TranslationBlock.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.
Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.
This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.
Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds a new "-d" switch to "info mtree" to print dispatch tree
internals.
This changes the way "-f" is handled - it prints now flat views and
associated address spaces.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-15-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This renames some helpers to reflect better what they do.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-9-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As we are going to share FlatView's between AddressSpace's,
and AddressSpaceDispatch is a structure to perform quick lookup
in FlatView, this moves ASD to FlatView.
After previosly open coded ASD rendering, we can also remove
as->next_dispatch as the new FlatView pointer is stored
on a stack and set to an AS atomically.
flatview_destroy() is executed under RCU instead of
address_space_dispatch_free() now.
This makes mem_begin/mem_commit to work with ASD and mem_add with FV
as later on mem_add will be taking FV as an argument anyway.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-5-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are going to share FlatView's between AddressSpace's and per-AS
memory listeners won't suit the purpose anymore so open code
the dispatch tree rendering.
Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there is
no registered listeners; this should improve starting time.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-3-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.
While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.
Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
This opens the possibility for TCG_TARGET_HAS_direct_jump to be
a runtime decision -- based on host cpu capabilities, the size of
code_gen_buffer, or a future debugging switch.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-Id: <150002073981.22386.9870422422367410100.stgit@frigg.lan>
[rth: Moved max_insns adjustment from tb_start to init_disas_context.
Removed pc_next return from translate_insn.
Removed tcg_check_temp_count from generic loop.
Moved gen_io_end to exactly match gen_io_start.
Use qemu_log instead of error_report for temporary leaks.
Moved TB size/icount assignments before disas_log.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
Used later. An enum makes expected values explicit and
bounds the value space of switches.
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <150002049746.22386.2316077281615710615.stgit@frigg.lan>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This will allow some amount of cleanup to happen before
switching the backends over to enum DisasJumpType.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Move the MemTxResult type to memattrs.h. We're going to want to
use it in cpu/qom.h, which doesn't want to include all of
memory.h. In practice MemTxResult and MemTxAttrs are pretty
closely linked since both are used for the new-style
read_with_attrs and write_with_attrs callbacks, so memattrs.h
is a reasonable home for this rather than creating a whole
new header file for it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Needed to implement a target-agnostic gen_intermediate_code()
in the future.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Benneé <alex.benee@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-Id: <150002025498.22386.18051908483085660588.stgit@frigg.lan>
Signed-off-by: Richard Henderson <rth@twiddle.net>
* new model of the ARM MPS2/MPS2+ FPGA based development board
* clean up DISAS_* exit conditions and fix various regressions
since commits e75449a3468a6b28c7b5 (in particular including
ones which broke OP-TEE guests)
* make Cortex-M3 and M4 correctly default to 8 PMSA regions
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJZbLEBAAoJEDwlJe0UNgzeTqsP/06M2a/rswUKjIGAsXv+TeTl
5N31g9E6Jr57HXK94Q0XtNkLlPwvIn97Dcv6VKg5+E8OgJx7ozldwZVFghWvMbOA
mbaikzgTRRUf6ydNTA4DtWYZPkaLNT86Vmb2T1GKS0nmw2ymd+hMLNk5vZd1jhDv
krHxwECI5e+u1INpw7erlQ2mqVP1NjvOuMNtjdAgtJ5tnjFRfQaVedePmr5qOuIK
xkYMKMNtled/KS0gP4TaSu5S012iYhzrpKISN/g4WHT/8kllr+iEowNAOJSJ6l38
oaBJJJCsLwnnV1nRClp4NNQv0Q/RXyIex5mPkeWERk4QU9adSDHnYJR7xn7JEyzV
l9o+av28bXA7l3C8BOi3ahSGh5cDu+hif0Biml/Kke7e4+1Lp3/QWSQ+p/E5PDDq
rhk65cg07PxSHeogN8hgu+RYN0gF3WBKASwUIDAkVdBsLlH8LVmoT5DtllL+6PyY
cwCp3nWeu0q2YDxGOfCZrUC4YJMl8hqHoWbdVah8vLKV/w/JVUtVEIol0za50dzG
ii6wOLqzV8GH0vkVa5x0InlH+t+/LtDRVkgHUT3/64eEEG+SsK/GmZeEtvcmp7GP
7Qx+Dd7hPgh+uis0XZPz37vqyCYhaFNw1+M9EESlQKUKfdY8B5B5bpXVDOBF+0Zl
daOoMw8xBd21DXNk9tCk
=gVxi
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170717' into staging
target-arm queue:
* new model of the ARM MPS2/MPS2+ FPGA based development board
* clean up DISAS_* exit conditions and fix various regressions
since commits e75449a3468a6b28c7b5 (in particular including
ones which broke OP-TEE guests)
* make Cortex-M3 and M4 correctly default to 8 PMSA regions
# gpg: Signature made Mon 17 Jul 2017 13:43:45 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20170717:
MAINTAINERS: Add entries for MPS2 board
hw/arm/mps2: Add ethernet
hw/arm/mps2: Add SCC
hw/misc/mps2_scc: Implement MPS2 Serial Communication Controller
hw/arm/mps2: Add timers
hw/char/cmsdk-apb-timer: Implement CMSDK APB timer device
hw/arm/mps2: Add UARTs
hw/char/cmsdk-apb-uart.c: Implement CMSDK APB UART
hw/arm/mps2: Implement skeleton mps2-an385 and mps2-an511 board models
target/arm: use DISAS_EXIT for eret handling
target/arm: use gen_goto_tb for ISB handling
target/arm/translate: ensure gen_goto_tb sets exit flags
target/arm/translate.h: expand comment on DISAS_EXIT
target/arm/translate: make DISAS_UPDATE match declared semantics
include/exec/exec-all: document common exit conditions
target/arm: Make Cortex-M3 and M4 default to 8 PMSA regions
qdev: support properties which don't set a default value
qdev-properties.h: Explicitly set the default value for arraylen properties
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
As a precursor to later patches attempt to come up with a more
concrete wording for what each of the common exit cases would be.
CC: Emilio G. Cota <cota@braap.org>
CC: Richard Henderson <rth@twiddle.net>
CC: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 20170713141928.25419-2-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.
This feature is later used by tracetool to optimize tracing of guest
code events.
The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).
For this to work, a change on the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add new utility functions which both initialize a RAM
MemoryRegion and arrange for its contents to be migrated;
we give thes the memory_region_init_ram(), memory_region_init_rom()
and memory_region_init_rom_device() names that we just freed up
by renaming the old implementations to _nomigrate().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-6-git-send-email-peter.maydell@linaro.org
Rename memory_region_init_rom() to memory_region_init_rom_nomigrate()
and memory_region_init_rom_device() to
memory_region_init_rom_device_nomigrate().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-5-git-send-email-peter.maydell@linaro.org
Rename memory_region_init_ram() to memory_region_init_ram_nomigrate().
This leaves the way clear for us to provide a memory_region_init_ram()
which does handle migration.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org
The various functions for initializing RAM MemoryRegions do not do
anything to cause the data in the MemoryRegion to be migrated.
Note in their documentation comments that this is the responsibility
of the caller.
(We will shortly add a new function that *does* do this for you.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-3-git-send-email-peter.maydell@linaro.org
This finishes QOM'fication of IOMMUMemoryRegion by introducing
a IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().
This makes IOMMUMemoryRegion an abstract class.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.
This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.
This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is to make it clear the index is purely a gdbstub function and
should not be confused with the value of cpu->cpu_index. At the same
time we move the function from the header to gdbstub itself which will
help with later changes.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170712105216.747-3-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add CONFIG_TCG around TLB-related functions and structure declarations.
Some of these functions are defined in ./accel/tcg/cputlb.c, which will
not be linked in if TCG is disabled, and have no stubs; therefore, their
callers will also be compiled out for --disable-tcg.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CONFIG_SOFTMMU should never be used in common code, so mark
it as poisoned, too.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-6-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 1f5c00cfdb ("qom/cpu: move tlb_flush to cpu_common_reset")
moved the call to tlb_flush() from the target-specific reset handlers
into the common code qom/cpu.c file, and protected the call with
"#ifdef CONFIG_SOFTMMU" to avoid that it is called for linux-user
only targets. But since qom/cpu.c is common code, CONFIG_SOFTMMU is
*never* defined here, so the tlb_flush() was simply never executed
anymore. Fix it by introducing a wrapper for tlb_flush() in a file
that is re-compiled for each target, i.e. in translate-all.c.
Fixes: 1f5c00cfdb
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-5-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CONFIG_KVM is only defined for target-specific code, so nobody should
use it by accident in common code. To avoid such subtle bugs,
CONFIG_KVM is now marked as poisoned in common code. The header
include/sysemu/kvm.h is somewhat special since it is included
all over the place from common code, too, so we need some extra
logic via "#ifdef NEED_CPU_H" here to make sure that we can
compile all files without problems.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-4-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The defines of some *-linux-user targets were still missing.
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1498454578-18709-2-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are relying on cpu_env being defined as a global, yet most
targets (i.e. all but arm/a64) have it defined as a local variable.
Luckily all of them use the same "cpu_env" name, but really
compilation shouldn't break if the name of that local variable
changed.
Fix it by using tcg_ctx.tcg_env, which all targets set in their
translate_init function. This change also helps paving the way
for the upcoming "translation loop common to all targets" work.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1497639397-19453-3-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1497639397-19453-2-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
In cpu_physical_memory_sync_dirty_bitmap(rb, start, ...), the 2nd
argument 'start' is relative to the start of the ramblock 'rb'. When
it's used to access the dirty memory bitmap of ram_list (i.e.
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]->blocks[]), an offset to
the start of all RAM (i.e. rb->offset) should be added to it, which has
however been missed since c/s 6b6712efcc. For a ramblock of host memory
backend whose offset is not zero, cpu_physical_memory_sync_dirty_bitmap()
synchronizes the incorrect part of the dirty memory bitmap of ram_list
to the per ramblock dirty bitmap. As a result, a guest with host
memory backend may crash after migration.
Fix it by adding the offset of ramblock when accessing the dirty memory
bitmap of ram_list in cpu_physical_memory_sync_dirty_bitmap().
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20170628083704.24997-1-haozhong.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Juan Quintela <quintela@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This introduces a special callback which allows to run code from some MMIO
devices.
SysBusDevice with a MemoryRegion which implements the request_ptr callback will
be notified when the guest try to execute code from their offset. Then it will
be able to eg: pre-load some code from an SPI device or ask a pointer from an
external simulator, etc..
When the pointer or the data in it are no longer valid the device has to
invalidate it.
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.
An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).
Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:
https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
Message-ID: <1e67644b-4b30-887e-d329-1848e94c9484@twiddle.net>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1496790745-314-3-git-send-email-cota@braap.org>
[rth: Simplify the arithmetic in tcg_tb_alloc]
Signed-off-by: Richard Henderson <rth@twiddle.net>
These are defined in config-target.h and thus should never be
used in common code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1497468113-2874-3-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since we've got some new CPU targets in QEMU during the last months
and years, we've got some new TARGET_xxx defines now which should
be marked as poisoned for common code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1497468113-2874-2-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new function to initialize a RAM memory region with a file
descriptor to be mmap-ed.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170602141229.15326-5-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add qemu_ram_alloc_from_fd(), which can be use to allocate ramblock from
fd only.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170602141229.15326-4-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org>
[rth: Squashed 4 related commits.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
All the file is surounded already by #ifndef CONFIG_USER_ONLY.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
We were always passing in that one as "false" to assume that's an read
operation, and we also assume that IOMMU translation would always have
that read permission. A better permission would be IOMMU_NONE since the
replay is after all not a real read operation, but just a page table
rebuilding process.
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
This patch converts the old "is_write" bool into IOMMUAccessFlags. The
difference is that "is_write" can only express either read/write, but
sometimes what we really want is "none" here (neither read nor write).
Replay is an good example - during replay, we should not check any RW
permission bits since thats not an actual IO at all.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
It only needed TARGET_PAGE_SIZE/BITS/BITS_MIN values, so just export
them from exec.h
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
That is the only function that we need from exec.c, and having to
include the whole sysemu.h for this.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
/me leans to be less sloppy with copyright notices
thanks Dave
To dump information about ramblocks. It looks like:
(qemu) info ramblock
Block Name PSize Offset Used Total
/objects/mem 2 MiB 0x0000000000000000 0x0000000080000000 0x0000000080000000
vga.vram 4 KiB 0x0000000080060000 0x0000000001000000 0x0000000001000000
/rom@etc/acpi/tables 4 KiB 0x00000000810b0000 0x0000000000020000 0x0000000000200000
pc.bios 4 KiB 0x0000000080000000 0x0000000000040000 0x0000000000040000
0000:00:03.0/e1000.rom 4 KiB 0x0000000081070000 0x0000000000040000 0x0000000000040000
pc.rom 4 KiB 0x0000000080040000 0x0000000000020000 0x0000000000020000
0000:00:02.0/vga.rom 4 KiB 0x0000000081060000 0x0000000000010000 0x0000000000010000
/rom@etc/table-loader 4 KiB 0x00000000812b0000 0x0000000000001000 0x0000000000001000
/rom@etc/acpi/rsdp 4 KiB 0x00000000812b1000 0x0000000000001000 0x0000000000001000
Ramblock is something hidden internally in QEMU implementation, and this
command should only be used by mostly QEMU developers on RAM stuff. It
is not a command suitable for QMP interface. So only HMP interface is
provided for it.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-4-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
So that it can simplifies the iterators.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1494562661-9063-2-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Both the ram bitmap and the unsent bitmap are split by RAMBlock.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY/5EdAAoJEIlPj0hw4a6QHj8P/1D3iZq8vKyLvnkioYC00ao8
dhuCFV1Dk0dl/QXvDXxNAHFqmHp2PzHeHF2V19bTbUh9QnY3WH9aSsMQ7YVPDzLb
6ZezbxXd1l8+IOrWh+ch+x6jiUpRAeHGjhSpFxQEmH5THBmPtLHBFhAeeGpVq6CT
MPFlN0mr1snyMMbK2aKCVTHgh5Wip83/t/kijIq4RmPwXdJbwxx55PL8jxC/NZHq
/NbAZzAT149fmItfBNnsRTfNoDs7fSmgM9VelYsnJNHs6qkYYfewxys4vudePG8n
ZcjCXMv9vdCSTfXfYY9nxhfGCgkkRSvBWrzARRIUPxTXGAmEU52LJrccpG9AEFRZ
Kgz1JYr0y+u/g+RsCJvAglvmciawXgZH8GIR4sFl2iv4u6cx8PxNRgjTdt7YX4Je
V7+scmOLB0U5wkI7PUcPV1v2fwKJHFfMdyik257otfMCJiTCnWGJfCZGR4JAdy3C
NHuG4eIj2XLjZ7IQIo3mg+EfdCWdfVMKtXj0MAFsP6Kcr6sY/ef5sx/wB61k3NU1
Y4ctI/LAOONsjlC2yzJ4aZR4LgyFcVcwx8FKOK7niCcCgVLYGWpyiHU3kFKYlmKM
FKIlGPPOK8WuRV9NUGDI9A8XENyFXGyiR2xGCotfgHj+FSSqRzhKH4ecfgNimJVY
wjTzZmBa68pLHdXaJ+SV
=OfeB
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170421-v2-tag' into staging
Xen 2017/04/21 + fix
# gpg: Signature made Tue 25 Apr 2017 19:10:37 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20170421-v2-tag: (21 commits)
move xen-mapcache.c to hw/i386/xen/
move xen-hvm.c to hw/i386/xen/
move xen-common.c to hw/xen/
add xen-9p-backend to MAINTAINERS under Xen
xen/9pfs: build and register Xen 9pfs backend
xen/9pfs: send responses back to the frontend
xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal
xen/9pfs: receive requests from the frontend
xen/9pfs: connect to the frontend
xen/9pfs: introduce Xen 9pfs backend
9p: introduce a type for the 9p header
xen: import ring.h from xen
configure: use pkg-config for obtaining xen version
xen: additionally restrict xenforeignmemory operations
xen: use libxendevice model to restrict operations
xen: use 5 digit xen versions
xen: use libxendevicemodel when available
configure: detect presence of libxendevicemodel
xen: create wrappers for all other uses of xc_hvm_XXX() functions
xen: rename xen_modified_memory() to xen_hvm_modified_memory()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds support for getting and using a local copy of the dirty
bitmap.
memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy. The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.
memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-3-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJY+d69AAoJEPSH7xhYctcj/4oQAIFFEyWaqrL9ve5ySiJgdtcY
zYtiIhZQ+nPuy2i1oDSX+vbMcmkJDDyfO5qLovxyHGkZHniR8HtxNHP+MkZQa07p
DiSIvd51HvcixIouhbGcoUCU63AYxqNL3o5/TyNpUI72nvsgwl3yfOot7PtutE/F
r384j8DrOJ9VwC5GGPg27mJvRPvyfDQWfxDCyMYVw153HTuwVYtgiu/layWojJDV
D2L1KV45ezBuGckZTHt9y6K4J5qz8qHb/dJc+whBBjj4j9T9XOILU9NPDAEuvjFZ
gHbrUyxj7EiApjHcDZoQm9Raez422ALU30yc9Kn7ik7vSqTxk2Ejq6Gz7y9MJrDn
KdMj75OETJNjBL+0T9MmbtWts28+aalpTUXtBpmi3eWQV5Hcox2NF1RP42jtD9Pa
lkrM6jv0nsdNfBPlQ+ZmBTJxysWECcMqy487nrzmPNC8vZfokjXL5be12puho9fh
ziU4gx9C6/k82S+/H6WD/AUtRiXJM7j4oTU2mnjrsSXQC1JNWqODBOFUo9zsDufl
vtcrxfPhSD1DwOInFSIBHf/RylcgTkPCL0rPoJ8npNDly6rHFYJ+oIbsn84Z4uYY
RWvH8xB9wgRlK9L1WdRgOd2q7PaeHQoSSdPOiS9YVEVMVvSW8Es5CRlhcAsw/M/T
1Tl65cNrjETAuZKL3dLH
=EsZ5
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170421' into staging
migration/next for 20170421
# gpg: Signature made Fri 21 Apr 2017 11:28:13 BST
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg: aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/juanquintela/tags/migration/20170421: (65 commits)
hmp: info migrate_parameters format tunes
hmp: info migrate_capability format tunes
migration: rename max_size to threshold_size
migration: set current_active_state once
virtio-rng: stop virtqueue while the CPU is stopped
migration: don't close a file descriptor while it can be in use
ram: Remove migration_bitmap_extend()
migration: Disable hotplug/unplug during migration
qdev: Move qdev_unplug() to qdev-monitor.c
qdev: Export qdev_hot_removed
qdev: qdev_hotplug is really a bool
migration: Remove MigrationState parameter from migration_is_idle()
ram: Use RAMBitmap type for coherence
ram: rename last_ram_offset() last_ram_pages()
ram: Use ramblock and page offset instead of absolute offset
ram: Change offset field in PageSearchStatus to page
ram: Remember last_page instead of last_offset
ram: Use page number instead of an address for the bitmap operations
ram: reorganize last_sent_block
ram: ram_discard_range() don't use the mis parameter
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We have disabled memory hotplug, so we don't need to handle
migration_bitamp there.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
We change the meaning of start to be the offset from the beggining of
the block.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The default replay() don't work for VT-d since vt-d will have a huge
default memory region which covers address range 0-(2^64-1). This will
normally consumes a lot of time (which looks like a dead loop).
The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we found that the page directories are empty.
It'll greatly reduce the time to walk the whole region.
To achieve this, we provided a page walk helper to do that, invoking
corresponding hook function when we found an page we are interested in.
vtd_page_walk_level() is the core logic for the page walking. It's
interface is designed to suite further use case, e.g., to invalidate a
range of addresses.
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Originally we have one memory_region_iommu_replay() function, which is
the default behavior to replay the translations of the whole IOMMU
region. However, on some platform like x86, we may want our own replay
logic for IOMMU regions. This patch adds one more hook for IOMMUOps for
the callback, and it'll override the default if set.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-6-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Generalizing the notify logic in memory_region_notify_iommu() into a
single function. This can be further used in customized replay()
functions for IOMMUs.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-5-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This is an "global" version of existing memory_region_iommu_replay() -
we announce the translations to all the registered notifiers, instead of
a specific one.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-4-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
A new macro is provided to iterate all the IOMMU notifiers hooked
under specific IOMMU memory region.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-3-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
In this patch, IOMMUNotifier.{start|end} are introduced to store section
information for a specific notifier. When notification occurs, we not
only check the notification type (MAP|UNMAP), but also check whether the
notified iova range overlaps with the range of specific IOMMU notifier,
and skip those notifiers if not in the listened range.
When removing an region, we need to make sure we removed the correct
VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well.
This patch is solving the problem that vfio-pci devices receive
duplicated UNMAP notification on x86 platform when vIOMMU is there. The
issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is
splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK
this (splitted IOMMU region) is only happening on x86.
This patch also helps vhost to leverage the new interface as well, so
that vhost won't get duplicated cache flushes. In that sense, it's an
slight performance improvement.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com>
[ehabkost: included extra vhost_iommu_region_del() change from Peter Xu]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
MemoryRegionCache did not know about virtio support for IOMMUs (because the
two features were developed at the same time). Revert MemoryRegionCache
to "normal" address_space_* operations for 2.9, as it is simpler than
undoing the virtio patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is a purely cosmetic change that avoids a name collision in
a subsequent patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Provide a helper to say whether a RAMBlock was created as a
shared mapping.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In function cpu_physical_memory_sync_dirty_bitmap, file
include/exec/ram_addr.h:
if (src[idx][offset]) {
unsigned long bits = atomic_xchg(&src[idx][offset], 0);
unsigned long new_dirty;
new_dirty = ~dest[k];
dest[k] |= bits;
new_dirty &= bits;
num_dirty += ctpopl(new_dirty);
}
After these codes executed, only the pages not dirtied in bitmap(dest),
but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated.
For example:
When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111,
and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011,
the new_dirty will be 0b00001100, and this function will return 2 but not
4 which is expected.
the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new,
so these should be calculated also.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The 'name' parameter to memory_region_init_* had been marked as debug
only, however vmstate_region_ram uses it as a parameter to
qemu_ram_set_idstr to set RAMBlock names and these form part of the
migration stream.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170309152708.30635-1-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYuOEEAAoJEGw4ysog2bOSHD0P/jBg/qr/4KnsB1KhnlVrB2sP
vy2d3bGGlUWr9Z+CK/PMCRB8ekFgQLjidLIXji6mviUocv6m3WsVrnbLF/oOL/IT
NPMVAffw7q804YVu1Ns9R82d6CIqHTy//bpg69tFMcJmhL9fqPan3wTZZ9JeiyAm
SikqkAHBSW4SxKqg8ApaSqx5L2QTqyfkClR0sLmgM0JtmfJrbobpQ6bMtdPjUZ9L
n2gnpO2vaWCa1SEQrRrdELqvcD8PHkSJapWOBXOkpGWxoeov/PYxOgkpdDUW4qYY
lVLtp1Vd3OB/h3Unqfw32DNiHA5p89hWPX5UybKMgRVL9Cv2/lyY47pcY8XTeNzn
bv84YRbFJeI+GgoEnghmtq+IM8XiW/cr9rWm9wATKfKGcmmFauumALrsffUpHVCM
4hSNgBv5t2V9ptZ+MDlM/Ku+zk9GoqwQ+hemdpVtiyhOtGUPGFBn5YLE4c2DHFxV
+L9JtBnFn8obnssNoz0wL+QvZchT1qUHMhH5CWAanjw9CTDp/YwQ2P01zK+00s9d
4cB7fUG3WNto5eXXEGMaXeDsUEz8z//hTe3j5sVbnHsXi0R3dhv7iryifmx4bUKU
H9EwAc+uNUHbvBy7u6IWg0I8P2n00CCO6JqXijQ92zELJ5j0XhzHUI2dOXn+zyEo
3FZu56LFnSSUBEXuTjq4
=PcNw
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging
ppc patch queuye for 2017-03-03
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
# gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170303:
target/ppc: rewrite f[n]m[add,sub] using float64_muladd
spapr: Small cleanup of PPC MMU enums
spapr_pci: Advertise access to PCIe extended config space
target/ppc: Rework hash mmu page fault code and add defines for clarity
target/ppc: Move no-execute and guarded page checking into new function
target/ppc: Add execute permission checking to access authority check
target/ppc: Add Instruction Authority Mask Register Check
hw/ppc/spapr: Add POWER9 to pseries cpu models
target/ppc/POWER9: Add cpu_has_work function for POWER9
target/ppc/POWER9: Add POWER9 pa-features definition
target/ppc/POWER9: Add POWER9 mmu fault handler
target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
target/ppc: Add patb_entry to sPAPRMachineState
target/ppc/POWER9: Add POWERPC_MMU_V3 bit
powernv: Don't test POWER9 CPU yet
exec, kvm, target-ppc: Move getrampagesize() to common code
target/ppc: Add POWER9/ISAv3.00 to compat_table
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
At the moment ram device's memory regions are DEVICE_NATIVE_ENDIAN. It's
incorrect. This memory region is backed by a MMIO area in host, so the
uint64_t data that MemoryRegionOps read from/write to this area should be
host-endian rather than target-endian. Hence, current code does not work
when target and host endianness are different which is the most common case
on PPC64. To fix it, this introduces DEVICE_HOST_ENDIAN for the ram device.
This has been tested on PPC64 BE/LE host/guest in all possible combinations
including TCG.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Message-Id: <1488171164-28319-1-git-send-email-xyjxie@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge the original development branch due to breakage caused by the
MTTCG merge.
Conflicts:
cpu-exec.c
translate-common.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
getrampagesize() returns the largest supported page size and mainly
used to know if huge pages are enabled.
However is implemented in target-ppc/kvm.c and not available
in TCG or other architectures.
This renames and moves gethugepagesize() to mmap-alloc.c where
fd-based analog of it is already implemented. This renames and moves
getrampagesize() to exec.c as it seems to be the common place for
helpers like this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Record the largest page size in use; we'll need it soon for allocating
temporary buffers.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170224182844.32452-7-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Create ram_block_discard_range in exec.c to replace
postcopy_ram_discard_range and most of ram_discard_range.
Those two routines are a bit of a weird combination, and
ram_discard_range is about to get more complex for hugepages.
It's OS dependent code (so shouldn't be in migration/ram.c) but
it needs quite a bit of the innards of RAMBlock so doesn't belong in
the os*.c.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170224182844.32452-5-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This introduces support to the cputlb API for flushing all CPUs TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves.
An additional variant of the API (_synced) will cause the source vCPUs
work to be scheduled as "safe work". The result will be all the flush
operations will be complete by the time the originating vCPU executes
its safe work. The calling implementation can either end the TB
straight away (which will then pick up the cpu->exit_request on
entering the next block) or defer the exit until the architectural
sync point (usually a barrier instruction).
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.
To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.
On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.
The dirty helper function is made static as it isn't used outside of
cputlb.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
While the vargs approach was flexible the original MTTCG ended up
having munge the bits to a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.
For ARM some the resulting flushes end up being quite long so to aid
readability I've tended to move the index shifting to a new line so
all the bits being or-ed together line up nicely, for example:
tlb_flush_page_by_mmuidx(other_cs, pageaddr,
(1 << ARMMMUIdx_S1SE1) |
(1 << ARMMMUIdx_S1SE0));
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[AT: SPARC parts only]
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
[PM: ARM parts only]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Some architectures allow to flush the tlb of other VCPUs. This is not a problem
when we have only one thread for all VCPUs but it definitely needs to be an
asynchronous work when we are in true multithreaded work.
We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TB's at the same time. The alternative would
be to use proper atomic primitives to clear the tlb entries en-mass.
This patch doesn't do anything to protect other cputlb function being
called in MTTCG mode making cross vCPU changes.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
There are now only two uses of the global exit_request left.
The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.
The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.
We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along. As an cpu->exit_request may legitimately get squashed
while processing the EXCP_INTERRUPT exception we also check
cpu->queued_work_first to ensure queued work is expedited as soon as
possible.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.
qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.
For the time being the setting of the global exit_request remains.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
The icount interrupt flag and tcg_exit_req serve almost the same
purpose, let's make them completely the same.
The former TB_EXIT_REQUESTED and TB_EXIT_ICOUNT_EXPIRED cases are
unified, since we can distinguish them from the value of the
interrupt flag.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For now, the cache is created on every virtqueue_pop. Later on,
direct descriptors will be able to reuse it.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When icount is active, tb_add_jump is surprisingly called with an
out of bounds basic block index. I have no idea how that can work,
but it does not seem like a good idea. Clear *last_tb for all
TB_EXIT_ICOUNT_EXPIRED cases, even when all you have to do is
refill icount_extra.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce rules in the top level Makefile that are able to generate
trace.[ch] files in every subdirectory which has a trace-events file.
The top level directory is handled specially, so instead of creating
trace.h, it creates trace-root.h. This allows sub-directories to
include the top level trace-root.h file, without ambiguity wrt to
the trace.g file in the current sub-dir.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170125161417.31949-7-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Adding one more option "-f" for "info mtree" to dump the flat views of
all the address spaces.
This will be useful to debug the memory rendering logic, also it'll be
much easier with it to know what memory region is handling what address
range.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1484556005-29701-3-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have never has the concept of global TLB entries which would avoid
the flush so we never actually use this flag. Drop it and make clear
that tlb_flush is the sledge-hammer it has always been.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
[DG: ppc portions]
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
This patch introduces a helper to query the iotlb entry for a
possible iova. This will be used by later device IOTLB API to enable
the capability for a dataplane (e.g vhost) to query the IOTLB.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Device models often have to perform multiple access to a single
memory region that is known in advance, but would to use "DMA-style"
functions instead of address_space_map/unmap. This can happen
for example when the data has to undergo endianness conversion.
Introduce a new data structure to cache the result of
address_space_translate without forcing usage of a host address
like address_space_map does.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Templatize the address_space_* and *_phys functions, so that we can add
similar functions in the next patch that work with a lightweight,
cache-like version of address_space_map/unmap.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the user emulation code path, tlb_vaddr_to_host erronesously passed
vaddr as the guest address to be translated, instead of addr, the parameter
which actually contained the guest address.
This resulted in incorrect addresses being used when emulating block copy
(mvc/mvpg) and block clear (xc) instructions for the s390x target.
Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
Message-Id: <20161113050523.23909-1-koorogi@koorogi.info>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With a vfio assigned device we lay down a base MemoryRegion registered
as an IO region, giving us read & write accessors. If the region
supports mmap, we lay down a higher priority sub-region MemoryRegion
on top of the base layer initialized as a RAM device pointer to the
mmap. Finally, if we have any quirks for the device (ie. address
ranges that need additional virtualization support), we put another IO
sub-region on top of the mmap MemoryRegion. When this is flattened,
we now potentially have sub-page mmap MemoryRegions exposed which
cannot be directly mapped through KVM.
This is as expected, but a subtle detail of this is that we end up
with two different access mechanisms through QEMU. If we disable the
mmap MemoryRegion, we make use of the IO MemoryRegion and service
accesses using pread and pwrite to the vfio device file descriptor.
If the mmap MemoryRegion is enabled and results in one of these
sub-page gaps, QEMU handles the access as RAM, using memcpy to the
mmap. Using either pread/pwrite or the mmap directly should be
correct, but using memcpy causes us problems. I expect that not only
does memcpy not necessarily honor the original width and alignment in
performing a copy, but it potentially also uses processor instructions
not intended for MMIO spaces. It turns out that this has been a
problem for Realtek NIC assignment, which has such a quirk that
creates a sub-page mmap MemoryRegion access.
To resolve this, we disable memory_access_is_direct() for ram_device
regions since QEMU assumes that it can use memcpy for those regions.
Instead we access through MemoryRegionOps, which replaces the memcpy
with simple de-references of standard sizes to the host memory.
With this patch we attempt to provide unrestricted access to the RAM
device, allowing byte through qword access as well as unaligned
access. The assumption here is that accesses initiated by the VM are
driven by a device specific driver, which knows the device
capabilities. If unaligned accesses are not supported by the device,
we don't want them to work in a VM by performing multiple aligned
accesses to compose the unaligned access. A down-side of this
philosophy is that the xp command from the monitor attempts to use
the largest available access weidth, unaware of the underlying
device. Using memcpy had this same restriction, but at least now an
operator can dump individual registers, even if blocks of device
memory may result in access widths beyond the capabilities of a
given device (RTL NICs only support up to dword).
Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Setting skip_dump on a MemoryRegion allows us to modify one specific
code path, but the restriction we're trying to address encompasses
more than that. If we have a RAM MemoryRegion backed by a physical
device, it not only restricts our ability to dump that region, but
also affects how we should manipulate it. Here we recognize that
MemoryRegions do not change to sometimes allow dumps and other times
not, so we replace setting the skip_dump flag with a new initializer
so that we know exactly the type of region to which we're applying
this behavior.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks. Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.
This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.
Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock. The next one will rectify this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-7-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds asserts to check the locking on the various translation
engines structures. There are two sets of structures that are protected
by locks.
The first the l1map and PageDesc structures used to track which
translation blocks are associated with which physical addresses. In
user-mode this is covered by the mmap_lock.
The second case are TB context related structures which are protected by
tb_lock which is also user-mode only.
Currently the asserts do nothing in SoftMMU mode but this will change
for MTTCG.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-4-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
x2APIC support to APIC code, cpu_exec_init() refactor on all
architectures, and other x86 changes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJYDmYyAAoJECgHk2+YTcWmoSUP/2ga+b9YmPuyL7XC+12pff0I
Z8gdjUzbMUNcCI0JMZCTGUJbs3BapLcnsA7ypmt88s9kG02WeDMhNx1BfYiAFgLU
kPLQlXAM7awEdGagd3sTCiFojSUZ7GxYHjd5fuhPoOAXvXM8im6zJl18ZcsnStjO
/J8JGoGDHq1XJlz+RIjnGamojJWCiO/+iiD+rFmVSic8zjHPDYq14sIk/QJX+DaF
azLiOI6DAlX3kyrN5ZshhIRQ3COzzUMUSDF/ZaYHjudUco5MBnwj/oLQniTq+ZUd
hCu7dr5TpLxI7q1yltyd0UIl/+aZGbE8tEvoXAtc735iK4m2CTckT7ql6x3xI+Ir
PmpPgIswHqfCiCXm8imLj6ZI47kRA1x4x4AudLaNVKP7jO82485sS9HWpOadYsaU
jvek2SqfqvH+vce4FzwlLEcXGDb73MT/XkIUvd7SfPIbs9umgdZc03U4SHfAWr0i
lAIRs4Ym0AAS2WSE4E09wvdUUr9oxaQBMhw3JAiNmg7hLfyINTP+D/IhtlAVXXEA
F9D7fky5lDwfKvIwPxPJbDD5bCBV9AmxhiahIhv3epu4Kg4orf1inkrx0IZWSbB0
7+JZ7j8asuizfibkeZAN9rxVwmz32makJNsnjzZHlnaPxTvIDzvRkNceBnhC5vKq
3yfxgl4agXmMjveraAtt
=T2kg
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
x86 and CPU queue, 2016-10-24
x2APIC support to APIC code, cpu_exec_init() refactor on all
architectures, and other x86 changes.
# gpg: Signature made Mon 24 Oct 2016 20:51:14 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-pull-request:
exec: call cpu_exec_exit() from a CPU unrealize common function
exec: move cpu_exec_init() calls to realize functions
exec: split cpu_exec_init()
pc: q35: Bump max_cpus to 288
pc: Require IRQ remapping and EIM if there could be x2APIC CPUs
pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
Increase MAX_CPUMASK_BITS from 255 to 288
pc: Clarify FW_CFG_MAX_CPUS usage comment
pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode
pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
pc: apic_common: Restore APIC ID to initial ID on reset
pc: apic_common: Extend APIC ID property to 32bit
pc: Leave max apic_id_limit only in legacy cpu hotplug code
acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
pc: acpi: x2APIC support for SRAT table
pc: acpi: x2APIC support for MADT table and _MAT method
Conflicts:
target-arm/cpu.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Modify all CPUs to call it from XXX_cpu_realizefn() function.
Remove all the cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27)
for arm:
Setting of cpu->mp_affinity is moved from arm_cpu_initfn()
to arm_cpu_realizefn() as setting of cpu_index is now done
in cpu_exec_realizefn(). To avoid to overwrite an user defined
value, we set it to an invalid value by default, and update
it in realize function only if the value is still invalid.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Support target CPUs having a page size which isn't knownn
at compile time. To use this, the CPU implementation should:
* define TARGET_PAGE_BITS_VARY
* not define TARGET_PAGE_BITS
* define TARGET_PAGE_BITS_MIN to the smallest value it
might possibly want for TARGET_PAGE_BITS
* call set_preferred_target_page_bits() in its realize
function to indicate the actual preferred target page
size for the CPU (and report any error from it)
In CONFIG_USER_ONLY, the CPU implementation should continue
to define TARGET_PAGE_BITS appropriately for the guest
OS page size.
Machines which want to take advantage of having the page
size something larger than TARGET_PAGE_BITS_MIN must
set the MachineClass minimum_page_bits field to a value
which they guarantee will be no greater than the preferred
page size for any CPU they create.
Note that changing the target page size by setting
minimum_page_bits is a migration compatibility break
for that machine.
For debugging purposes, attempts to use TARGET_PAGE_SIZE
before it has been finally confirmed will assert.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
This speeds up MEMORY_LISTENER_CALL noticeably. Right now,
with many PCI devices you have N regions added to M AddressSpaces
(M = # PCI devices with bus-master enabled) and each call looks
up the whole listener list, with at least M listeners in it.
Because most of the regions in N are BARs, which are also roughly
proportional to M, the whole thing is O(M^3). This changes it
to O(M^2), which is the best we can do without rewriting the
whole thing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Store the page size in each RAMBlock, we need it later.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Use async_safe_run_on_cpu() to make tb_flush() thread safe. This is
possible now that code generation does not happen in the middle of
execution.
It can happen that multiple threads schedule a safe work to flush the
translation buffer. To keep statistics and debugging output sane, always
check if the translation buffer has already been flushed.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-13-git-send-email-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a mutex for the CPU list to system emulation, as it will be used to
manage safe work. Abstract manipulation of the CPU list in new functions
cpu_list_add and cpu_list_remove.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Migrating a VM during reboot sometimes results in differences
between the source and destination in the SMRAM area.
This is because migration_bitmap_sync() only fetches from KVM
the dirty log of address_space_memory. SMRAM memory slots
are ignored and the modifications to SMRAM are not sent to the
destination.
Reported-by: He Rongguang <herongguang.he@huawei.com>
Reviewed-by: He Rongguang <herongguang.he@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new interface can be used to replace the old notify_started() and
notify_stopped(). Meanwhile it provides explicit flags so that IOMMUs
can know what kind of notifications it is requested for.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-3-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
IOMMU Notifier list is used for notifying IO address mapping changes.
Currently VFIO is the only user.
However it is possible that future consumer like vhost would like to
only listen to part of its notifications (e.g., cache invalidations).
This patch introduced IOMMUNotifier and IOMMUNotfierFlag bits for a
finer grained control of it.
IOMMUNotifier contains a bitfield for the notify consumer describing
what kind of notification it is interested in. Currently two kinds of
notifications are defined:
- IOMMU_NOTIFIER_MAP: for newly mapped entries (additions)
- IOMMU_NOTIFIER_UNMAP: for entries to be removed (cache invalidates)
When registering the IOMMU notifier, we need to specify one or multiple
types of messages to listen to.
When notifications are triggered, its type will be checked against the
notifier's type bits, and only notifiers with registered bits will be
notified.
(For any IOMMU implementation, an in-place mapping change should be
notified with an UNMAP followed by a MAP.)
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The return address argument to the softmmu template helpers was
confused. In the legacy case, we wanted to indicate that there
is no return address, and so passed in NULL. However, we then
immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero
value, indicating the presence of an (invalid) return address.
Push the GETPC_ADJ subtraction down to the only point it's required:
immediately before use within cpu_restore_state_from_tb, after all
NULL pointer checks have been completed.
This makes GETPC and GETRA identical. Remove GETRA as the lesser
used macro, replacing all uses with GETPC.
Signed-off-by: Richard Henderson <rth@twiddle.net>
When invalidating a translation block, set an invalid flag into the
TranslationBlock structure first. It is also necessary to check whether
the target TB is still valid after acquiring 'tb_lock' but before calling
tb_add_jump() since TB lookup is to be performed out of 'tb_lock' in
future. Note that we don't have to check 'last_tb'; an already invalidated
TB will not be executed anyway and it is thus safe to patch it.
Suggested-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For i386, the ABI specifies that 'long long' (8 byte values)
need only be 4 aligned, but we were requiring them to be
8-aligned. This meant we were laying out the target_epoll_event
structure wrongly. Add a suitable ifdef to abitypes.h to
specify the i386-specific alignment requirement.
Reported-by: Icenowy Zheng <icenowy@aosc.xyz>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Header guard symbols should match their file name to make guard
collisions less likely. Offenders found with
scripts/clean-header-guards.pl -vn.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tracked down with an ugly, brittle and probably buggy Perl script.
Also move includes converted to <...> up so they get included before
ours where that's obviously okay.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
There are functions tlb_fill(), cpu_unaligned_access() and
do_unaligned_access() that are called with access type and mmu index
arguments. But these arguments are named 'is_write' and 'is_user' in their
declarations. The patches fix the arguments to avoid a confusion.
Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1465907177-1399402-1-git-send-email-afarallax@yandex.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Some architectures (e.g. ARMv8) need the address which is aligned
to a size more than the size of the memory access.
To support such check it's enough the current costless alignment
check implementation in QEMU, but we need to support
an alignment size specifying.
Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Assert in tcg_canonicalize_memop. Leave get_alignment_bits
available for, though unused by, user-mode. Retain logging difference
based on ALIGNED_ONLY.]
It doesn't make sense to pass a NULL ops argument to
memory_region_init_rom_device(), because the effect will
be that if the guest tries to write to the memory region
then QEMU will segfault. Catch the bug earlier by sanity
checking the arguments to this function, and remove the
misleading documentation that suggests that passing NULL
might be sensible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467122287-24974-4-git-send-email-peter.maydell@linaro.org
Provide a new helper function memory_region_init_rom() for memory
regions which are read-only (and unlike those created by
memory_region_init_rom_device() don't have special behaviour
for writes). This has the same behaviour as calling
memory_region_init_ram() and then memory_region_set_readonly()
(which is what we do today in boards with pure ROMs) but is a
more easily discoverable API for the purpose.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467122287-24974-2-git-send-email-peter.maydell@linaro.org
The IOMMU driver may change behavior depending on whether a notifier
client is present. In the case of POWER, this represents a change in
the visibility of the IOTLB, for other drivers such as intel-iommu and
future AMD-Vi emulation, notifier support is not yet enabled and this
provides the opportunity to flag that incompatibility.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This function needs to be converted to QOM hook and virtualised for
multi-arch. This rename interferes, as cpu-qom will not have access
to the renaming causing name divergence. This rename doesn't really do
anything anyway so just delete it.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <69bd25a8678b8b31b91cd9760c777bed1aafb44e.1437212383.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaitepeter@gmail.com>
Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
uses when translating, however this information is not available outside
the translate context for various checks.
This adds a get_min_page_size callback to MemoryRegionIOMMUOps and
a wrapper for it so IOMMU users (such as VFIO) can know the minimum
actual page size supported by an IOMMU.
As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE
as fallback.
This removes vfio_container_granularity() and uses new helper in
memory_region_iommu_replay() when replaying IOMMU mappings on added
IOMMU memory region.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: Removed an unnecessary calculation]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The event is described in "trace-events". Note that the "MO_AMASK" flag
is not traced, since it does not seem to affect the visible semantics of
instructions.
[s/inline inline/inline/ to fix clang build.
--Stefan]
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 146549350711.18437.726780393247474362.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.
Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.
The appended fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:
- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
MRU is implemented here by adding an intermediate struct
that contains the u32 hash and a pointer to the TB; this
allows us, on an MRU promotion, to copy said struct (that is not
at the head), and put this new copy at the head. After a grace
period, the original non-head struct can be eliminated, and
after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
no MRU for lookups; MRU for inserts.
The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
http://imgur.com/a/Awgnq
Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.
Host: Intel Xeon E5-2690
28 ++------------+-------------+-------------+-------------+------------++
A***** + + + master **A*** +
27 ++ * xxhash ##B###++
| A******A****** xxhash-rcu $$C$$$ |
26 C$$ A******A****** qht-fixed-nomru*%%D%%%++
D%%$$ A******A******A*qht-dyn-mru A*E****A
25 ++ %%$$ qht-dyn-nomru &&F&&&++
B#####% |
24 ++ #C$$$$$ ++
| B### $ |
| ## C$$$$$$ |
23 ++ # C$$$$$$ ++
| B###### C$$$$$$ %%%D
22 ++ %B###### C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
| D%%%%%%B###### @E@@@@@@ %%%D%%%@@@E@@@@@@E
21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
+ E@@@ F&&& + E@ + F&&& + +
20 ++------------+-------------+-------------+-------------+------------++
14 16 18 20 22 24
log2 number of buckets
Host: Intel i7-4790K
14.5 ++------------+------------+-------------+------------+------------++
A** + + + master **A*** +
14 ++ ** xxhash ##B###++
13.5 ++ ** xxhash-rcu $$C$$$++
| qht-fixed-nomru %%D%%% |
13 ++ A****** qht-dyn-mru @@E@@@++
| A*****A******A****** qht-dyn-nomru &&F&&& |
12.5 C$$ A******A******A*****A****** ***A
12 ++ $$ A*** ++
D%%% $$ |
11.5 ++ %% ++
B### %C$$$$$$ |
11 ++ ## D%%%%% C$$$$$ ++
| # % C$$$$$$ |
10.5 F&&&&&&B######D%%%%% C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$ $$$C
10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
+ F&& D%%%%%%B######B######B#####B###@@@D%%% +
9.5 ++------------+------------+-------------+------------+------------++
14 16 18 20 22 24
log2 number of buckets
Note that the original point before this patch series is X=15 for "master";
the little sensitivity to the increased number of buckets is due to the
poor hashing function in master.
xxhash-rcu has significant overhead due to the constant churn of allocating
and deallocating intermediate structs for implementing MRU. An alternative
would be do consider failed lookups as "maybe not there", and then
acquire the external lock (tb_lock in this case) to really confirm that
there was indeed a failed lookup. This, however, would not be enough
to implement dynamic resizing--this is more complex: see
"Resizable, Scalable, Concurrent Hash Tables via Relativistic
Programming" by Triplett, McKenney and Walpole. This solution was
discarded due to the very coarse RCU read critical sections that we have
in MTTCG; resizing requires waiting for readers after every pointer update,
and resizes require many pointer updates, so this would quickly become
prohibitive.
qht-fixed-nomru shows that MRU promotion is advisable for undersized
hash tables.
However, qht-dyn-mru shows that MRU promotion is not important if the
hash table is properly sized: there is virtually no difference in
performance between qht-dyn-nomru and qht-dyn-mru.
Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
can achieve with optimum sizing of the hash table, while keeping the hash
table scalable for readers.
The improvement we get before and after this patch for booting debian jessie
with arm-softmmu is:
- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time
We could get this same improvement _for this particular workload_ by
statically increasing the size of the hash table. But this would hurt
workloads that do not need a large hash table. The dynamic (upward)
resizing allows us to start small and enlarge the hash table as needed.
A quick note on downsizing: the table is resized back to 2**15 buckets
on every tb_flush; this makes sense because it is not guaranteed that the
table will reach the same number of TBs later on (e.g. most bootup code is
thrown away after boot); it makes sense to grow the hash table as
more code blocks are translated. This also avoids the complication of
having to build downsizing hysteresis logic into qht.
Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
For some workloads such as arm bootup, tb_phys_hash is performance-critical.
The is due to the high frequency of accesses to the hash table, originated
by (frequent) TLB flushes that wipe out the cpu-private tb_jmp_cache's.
More info:
https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg05098.html
To dig further into this I modified an arm image booting debian jessie to
immediately shut down after boot. Analysis revealed that quite a bit of time
is unnecessarily spent in tb_phys_hash: the cause is poor hashing that
results in very uneven loading of chains in the hash table's buckets;
the longest observed chain had ~550 elements.
The appended addresses this with two changes:
1) Use xxhash as the hash table's hash function. xxhash is a fast,
high-quality hashing function.
2) Feed the hashing function with not just tb_phys, but also pc and flags.
This improves performance over using just tb_phys for hashing, since that
resulted in some hash buckets having many TB's, while others getting very few;
with these changes, the longest observed chain on a single hash bucket is
brought down from ~550 to ~40.
Tests show that the other element checked for in tb_find_physical,
cs_base, is always a match when tb_phys+pc+flags are a match,
so hashing cs_base is wasteful. It could be that this is an ARM-only
thing, though. UPDATE:
On Tue, Apr 05, 2016 at 08:41:43 -0700, Richard Henderson wrote:
> The cs_base field is only used by i386 (in 16-bit modes), and sparc (for a TB
> consisting of only a delay slot).
> It may well still turn out to be reasonable to ignore cs_base for hashing.
BTW, after this change the hash table should not be called "tb_hash_phys"
anymore; this is addressed later in this series.
This change gives consistent bootup time improvements. I tested two
host machines:
- Intel Xeon E5-2690: 11.6% less time
- Intel i7-4790K: 19.2% less time
Increasing the number of hash buckets yields further improvements. However,
using a larger, fixed number of buckets can degrade performance for other
workloads that do not translate as many blocks (600K+ for debian-jessie arm
bootup). This is dealt with later in this series.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-8-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This will be used by upcoming changes for hashing the tb hash.
Add this into a separate file to include the copyright notice from
xxhash.
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1465412133-3029-7-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
The function cpu_resume_from_signal() is now always called with a
NULL puc argument, and is rather misnamed since it is never called
from a signal handler. It is essentially forcing an exit to the
top level cpu loop but without raising any exception, so rename
it to cpu_loop_exit_noexc() and drop the useless unused argument.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Message-id: 1463494687-25947-4-git-send-email-peter.maydell@linaro.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAV1gdMrRIkN7ePJvAAQhLcg/+Kby99taEuewItrA1yDs75jxOlLqaJopd
cVzo4LFRFPhIn4UEKqRQS0CGoIeU/DYOmObvuUzJxs2LyUoHoqmQOwEm5obC2a85
JrHo/NOppYBbyvvIEAAXzZDCZo0KZKVclrlT+AX5obpOSNSvAnKvEuLWq1aQ9WGN
n4AzHuFEl885cd4nFd8VK/xth89bqz6U/z8CjgIuw3mczp1XNrK5IJJwAy5epHay
GCBr9XHooW3SU971WS20RTRS0D33tKPHgCU3ZeZ3rKh4g3JNj6/ixdVgzi9NqFsQ
5DzAj/iBGhN1LtCOednRS6tUt32Bhy8G/g4O3GiXdejagAmNe2wz31cveNJ8S3W5
DK8SZAnJlz06zN5uIpOVQgDOqfXZkCp7ndq779QJoHOAnuOjJBcUbhw1myz2R3eR
6208tStWl3R0+ATEK8CZ7ejg1cUHvdzyqGJA+1nC2HaFUrBWipxN8jf2fz9vO/wG
G7zNbahvVgyJWO7bPNK4mxkb6qkWCETnCnLJsq2ZbmtPEMcINjD8vNWLNvFGVG8b
2HbinDrzh0Z9Zik5gLZfiVyP5HFaWSrJn9QRVIgaUjuIH9n3/25sl9OvW/JLjxJ+
h2P17CLnAK6dhUYc4R3wQTx2X/N2FvO4DD8iMYOcgDY6fhZ2b6EEyE9yBgQrIDbF
gU1AlC/CX+M=
=AXqa
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160608' into staging
linux-user pull request for June 2016
# gpg: Signature made Wed 08 Jun 2016 14:27:14 BST
# gpg: using RSA key 0xB44890DEDE3C9BC0
# gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
# gpg: aka "Riku Voipio <riku.voipio@linaro.org>"
* remotes/riku/tags/pull-linux-user-20160608: (44 commits)
linux-user: In fork_end(), remove correct CPUs from CPU list
linux-user: Special-case ERESTARTSYS in target_strerror()
linux-user: Make target_strerror() return 'const char *'
linux-user: Correct signedness of target_flock l_start and l_len fields
linux-user: Use safe_syscall wrapper for ioctl
linux-user: Use safe_syscall wrapper for accept and accept4 syscalls
linux-user: Use safe_syscall wrapper for semop
linux-user: Use safe_syscall wrapper for epoll_wait syscalls
linux-user: Use safe_syscall wrapper for poll and ppoll syscalls
linux-user: Use safe_syscall wrapper for sleep syscalls
linux-user: Use safe_syscall wrapper for rt_sigtimedwait syscall
linux-user: Use safe_syscall wrapper for flock
linux-user: Use safe_syscall wrapper for mq_timedsend and mq_timedreceive
linux-user: Use safe_syscall wrapper for msgsnd and msgrcv
linux-user: Use safe_syscall wrapper for send* and recv* syscalls
linux-user: Use safe_syscall wrapper for connect syscall
linux-user: Use safe_syscall wrapper for readv and writev syscalls
linux-user: Fix error conversion in 64-bit fadvise syscall
linux-user: Fix NR_fadvise64 and NR_fadvise64_64 for 32-bit guests
linux-user: Fix handling of arm_fadvise64_64 syscall
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
configure
scripts/qemu-binfmt-conf.sh
The target_to_host_bitmask() and host_to_target_bitmask() functions
and the associated struct bitmask_transtbl are completely generic,
but for historical reasons the target related fields and parameters
are named 'x86' and the host related fields are named 'alpha'.
Rename them to 'target' and 'host'.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The thunk_type_size_array() and thunk_type_align_array() functions
are only provided if NO_THUNK_TYPE_SIZE is not defined. However
nothing in the codebase defines that, and so in fact these functions
are always present. Drop the unnecessary #ifdefs.
(Over a decade ago thunk.h used to be included by some softmmu
files, which defined NO_THUNK_TYPE_SIZE, but these includes are
long gone; see for instance commit f193c7979c2f7.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The WORDS_ALIGNED #define is not used anywhere, and hasn't been since
2013 when commit 612d590ebc rewrote the various ld<type>_<endian>_p
functions to not use it. Remove the #define and the comment describing it.
Also remove the line in the comment about TARGET_WORDS_ALIGNED, since
it has never actually existed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>