Commit Graph

1077 Commits

Author SHA1 Message Date
Paolo Bonzini
1f4e496e1f exec: introduce MemoryRegionCache
Device models often have to perform multiple access to a single
memory region that is known in advance, but would to use "DMA-style"
functions instead of address_space_map/unmap.  This can happen
for example when the data has to undergo endianness conversion.
Introduce a new data structure to cache the result of
address_space_translate without forcing usage of a host address
like address_space_map does.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:23 +01:00
Paolo Bonzini
715c31ec8e exec: introduce address_space_extend_translation
This extracts the common part of address_space_map and
address_space_cache_init into a new function.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:23 +01:00
Paolo Bonzini
0ce265ffef exec: introduce memory_ldst.inc.c
Templatize the address_space_* and *_phys functions, so that we can add
similar functions in the next patch that work with a lightweight,
cache-like version of address_space_map/unmap.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:23 +01:00
Paolo Bonzini
2651efe7f5 exec: optimize remaining address_space_* cases
Do them right before the next patch generalizes them into a multi-included
file.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-22 16:00:22 +01:00
Peter Maydell
a9353fe897 exec.c: Fix breakpoint invalidation race
A bug (1647683) was reported showing a crash when removing
breakpoints.  The reproducer was bisected to 3359baad when tb_flush
was finally made thread safe.  While in MTTCG the locking in
breakpoint_invalidate would have prevented any problems, but
currently tb_lock() is a NOP for system emulation.

The race is between a tb_flush from the gdbstub and the
tb_invalidate_phys_addr() in breakpoint_invalidate().

Ideally we'd have actual locking here; for the moment the
simple fix is to do a full tb_flush() for a bp invalidate,
since that is thread-safe even if no lock is taken.

Reported-by: Julian Brown <julian@codesourcery.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1481047629-7763-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-12-06 20:21:46 +00:00
Stefan Hajnoczi
199a5bde46 * NBD bugfix (Changlong)
* NBD write zeroes support (Eric)
 * Memory backend fixes (Haozhong)
 * Atomics fix (Alex)
 * New AVX512 features (Luwei)
 * "make check" logging fix (Paolo)
 * Chardev refactoring fallout (Paolo)
 * Small checkpatch improvements (Paolo, Jeff)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQExBAABCAAbBQJYGaRPFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D
 XKgH/RgNtosBTqJsmphkS7wACFAFOf7Uq46ajoKfB66Pt1J/++pFQg4TApPYkb7j
 KlKeKmXa7hb6+Jg8325H4zGkGno4kn2dE+OnznaB1xPKwiZVAMQVzQsagsEVqpno
 k/5PBVRptIiuHQKyU29Go0CxbWJBTH0O14S7rDK4YDF0YMnuT280HQOI3jdu1igV
 G/Q+CMgfk+yXf6GWHE8Z9sNq7n0ha8qgruA/X3NC7+pAvEsUcAP065zwLp9weYuK
 W1MU68L7Ub4tRo0SVf1HFkDUNdMv4T4hg+wpGe1GwthJWexHu9x0YAQBy60ykJb6
 NtHwjLwCUWtm7AiZD/btsOJPmjk=
 =+Dt/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* NBD bugfix (Changlong)
* NBD write zeroes support (Eric)
* Memory backend fixes (Haozhong)
* Atomics fix (Alex)
* New AVX512 features (Luwei)
* "make check" logging fix (Paolo)
* Chardev refactoring fallout (Paolo)
* Small checkpatch improvements (Paolo, Jeff)

# gpg: Signature made Wed 02 Nov 2016 08:31:11 AM GMT
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (30 commits)
  main-loop: Suppress I/O thread warning under qtest
  docs/rcu.txt: Fix minor typo
  vl: exit qemu on guest panic if -no-shutdown is not set
  checkpatch: allow spaces before parenthesis for 'coroutine_fn'
  x86: add AVX512_4VNNIW and AVX512_4FMAPS features
  slirp: fix CharDriver breakage
  qemu-char: do not forward events through the mux until QEMU has started
  nbd: Implement NBD_CMD_WRITE_ZEROES on client
  nbd: Implement NBD_CMD_WRITE_ZEROES on server
  nbd: Improve server handling of shutdown requests
  nbd: Refactor conversion to errno to silence checkpatch
  nbd: Support shorter handshake
  nbd: Less allocation during NBD_OPT_LIST
  nbd: Let client skip portions of server reply
  nbd: Let server know when client gives up negotiation
  nbd: Share common option-sending code in client
  nbd: Send message along with server NBD_REP_ERR errors
  nbd: Share common reply-sending code in server
  nbd: Rename struct nbd_request and nbd_reply
  nbd: Rename NbdClientSession to NBDClientSession
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-03 16:32:30 +00:00
Haozhong Zhang
1775f111ea exec.c: check memory backend file size with 'size' option
If the memory backend file is not large enough to hold the required 'size',
Qemu will report error and exit.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20161027042300.5929-3-haozhong.zhang@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20161102010551.2723-1-haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 09:28:51 +01:00
Richard Henderson
1ee73216f4 log: Add locking to large logging blocks
Reuse the existing locking provided by stdio to keep in_asm, cpu,
op, op_opt, op_ind, and out_asm as contiguous blocks.

While it isn't possible to interleave e.g. in_asm or op_opt logs
because of the TB lock protecting all code generation, it is
possible to interleave cpu logs, or to interleave a cpu dump with
an out_asm dump.

For mingw32, we appear to have no viable solution for this.  The locking
functions are not properly exported from the system runtime library.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-11-01 10:29:03 -06:00
Haozhong Zhang
d6af99c9f8 exec.c: do not truncate non-empty memory backend file
For '-object memory-backend-file,mem-path=foo,size=xyz', if the size of
file 'foo' does not match the given size 'xyz', the current QEMU will
truncate the file to the given size, which may corrupt the existing data
in that file. To avoid such data corruption, this patch disables
truncating non-empty backend files.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20161027042300.5929-2-haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-01 16:06:57 +01:00
Alex Bennée
f35e44e764 exec.c: ensure all AddressSpaceDispatch updates under RCU
The memory_dispatch field is meant to be protected by RCU so we should
use the correct primitives when accessing it. This race was flagged up
by the ThreadSanitizer.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161021153418.21571-1-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-01 16:06:57 +01:00
Alex Bennée
ba051fb5e5 tcg: move locking for tb_invalidate_phys_page_range up
In the linux-user case all things that involve ''l1_map' and PageDesc
tweaks are protected by the memory lock (mmpa_lock). For SoftMMU mode
we previously relied on single threaded behaviour, with MTTCG we now use
the tb_lock().

As a result we need to do a little re-factoring  and push the taking of
this lock up the call tree. This requires a slightly different entry for
the SoftMMU and user-mode cases from tb_invalidate_phys_range.

This also means user-mode breakpoint insertion needs to take two locks
but it hadn't taken any previously so this is an improvement.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161027151030.20863-20-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 15:00:25 +01:00
KONRAD Frederic
a5e998262f tcg: protect translation related stuff with tb_lock.
This protects all translation related work with tb_lock() too ensure
thread safety. This effectively serialises all code generation. In
addition to the code generation we also take the lock for TB
invalidation. This has a knock on effect of meaning tb_lock() is held
for modification of the SoftMMU TLB by non-self threads which will be
used in later patches.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <20161027151030.20863-10-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31 10:51:16 +01:00
Richard Henderson
258dfaaad0 exec: Avoid direct references to Int128 parts
Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26 08:29:00 -07:00
Peter Maydell
c43e853afe x86 and CPU queue, 2016-10-24
x2APIC support to APIC code, cpu_exec_init() refactor on all
 architectures, and other x86 changes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYDmYyAAoJECgHk2+YTcWmoSUP/2ga+b9YmPuyL7XC+12pff0I
 Z8gdjUzbMUNcCI0JMZCTGUJbs3BapLcnsA7ypmt88s9kG02WeDMhNx1BfYiAFgLU
 kPLQlXAM7awEdGagd3sTCiFojSUZ7GxYHjd5fuhPoOAXvXM8im6zJl18ZcsnStjO
 /J8JGoGDHq1XJlz+RIjnGamojJWCiO/+iiD+rFmVSic8zjHPDYq14sIk/QJX+DaF
 azLiOI6DAlX3kyrN5ZshhIRQ3COzzUMUSDF/ZaYHjudUco5MBnwj/oLQniTq+ZUd
 hCu7dr5TpLxI7q1yltyd0UIl/+aZGbE8tEvoXAtc735iK4m2CTckT7ql6x3xI+Ir
 PmpPgIswHqfCiCXm8imLj6ZI47kRA1x4x4AudLaNVKP7jO82485sS9HWpOadYsaU
 jvek2SqfqvH+vce4FzwlLEcXGDb73MT/XkIUvd7SfPIbs9umgdZc03U4SHfAWr0i
 lAIRs4Ym0AAS2WSE4E09wvdUUr9oxaQBMhw3JAiNmg7hLfyINTP+D/IhtlAVXXEA
 F9D7fky5lDwfKvIwPxPJbDD5bCBV9AmxhiahIhv3epu4Kg4orf1inkrx0IZWSbB0
 7+JZ7j8asuizfibkeZAN9rxVwmz32makJNsnjzZHlnaPxTvIDzvRkNceBnhC5vKq
 3yfxgl4agXmMjveraAtt
 =T2kg
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

x86 and CPU queue, 2016-10-24

x2APIC support to APIC code, cpu_exec_init() refactor on all
architectures, and other x86 changes.

# gpg: Signature made Mon 24 Oct 2016 20:51:14 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  exec: call cpu_exec_exit() from a CPU unrealize common function
  exec: move cpu_exec_init() calls to realize functions
  exec: split cpu_exec_init()
  pc: q35: Bump max_cpus to 288
  pc: Require IRQ remapping and EIM if there could be x2APIC CPUs
  pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs
  Increase MAX_CPUMASK_BITS from 255 to 288
  pc: Clarify FW_CFG_MAX_CPUS usage comment
  pc: kvm_apic: Pass APIC ID depending on xAPIC/x2APIC mode
  pc: apic_common: Reset APIC ID to initial ID when switching into x2APIC mode
  pc: apic_common: Restore APIC ID to initial ID on reset
  pc: apic_common: Extend APIC ID property to 32bit
  pc: Leave max apic_id_limit only in legacy cpu hotplug code
  acpi: cphp: Force switch to modern cpu hotplug if APIC ID > 254
  pc: acpi: x2APIC support for SRAT table
  pc: acpi: x2APIC support for MADT table and _MAT method

Conflicts:
	target-arm/cpu.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-25 10:25:27 +01:00
Laurent Vivier
7bbc124e7e exec: call cpu_exec_exit() from a CPU unrealize common function
As cpu_exec_exit() mirrors the cpu_exec_realizefn(),
rename it as cpu_exec_unrealizefn().

Create and register a cpu_common_unrealizefn() function for
the CPU device class and call cpu_exec_unrealizefn() from
this function.

Remove cpu_exec_exit() from cpu_common_finalize()
(which mirrors init, not realize), and as x86_cpu_unrealizefn()
and ppc_cpu_unrealizefn() overwrite the device class unrealize function,
add a call to a parent_unrealize pointer.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:16 -02:00
Laurent Vivier
ce5b1bbf62 exec: move cpu_exec_init() calls to realize functions
Modify all CPUs to call it from XXX_cpu_realizefn() function.

Remove all the cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27)

for arm:

Setting of cpu->mp_affinity is moved from arm_cpu_initfn()
to arm_cpu_realizefn() as setting of cpu_index is now done
in cpu_exec_realizefn(). To avoid to overwrite an user defined
value, we set it to an invalid value by default, and update
it in realize function only if the value is still invalid.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:16 -02:00
Laurent Vivier
39e329e341 exec: split cpu_exec_init()
Put in cpu_exec_initfn() what initializes the CPU,
and leave in cpu_exec_init() what adds it to the environment.

As cpu_exec_initfn() is called by all XX_cpu_initfn(), call it
directly in cpu_common_initfn().
cpu_exec_init() is now a realize function, it will be renamed
to cpu_exec_realizefn() and moved to the XX_cpu_realizefn()
function in a following patch.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-24 17:29:16 -02:00
Peter Maydell
20bccb82ff cpu: Support a target CPU having a variable page size
Support target CPUs having a page size which isn't knownn
at compile time. To use this, the CPU implementation should:
 * define TARGET_PAGE_BITS_VARY
 * not define TARGET_PAGE_BITS
 * define TARGET_PAGE_BITS_MIN to the smallest value it
   might possibly want for TARGET_PAGE_BITS
 * call set_preferred_target_page_bits() in its realize
   function to indicate the actual preferred target page
   size for the CPU (and report any error from it)

In CONFIG_USER_ONLY, the CPU implementation should continue
to define TARGET_PAGE_BITS appropriately for the guest
OS page size.

Machines which want to take advantage of having the page
size something larger than TARGET_PAGE_BITS_MIN must
set the MachineClass minimum_page_bits field to a value
which they guarantee will be no greater than the preferred
page size for any CPU they create.

Note that changing the target page size by setting
minimum_page_bits is a migration compatibility break
for that machine.

For debugging purposes, attempts to use TARGET_PAGE_SIZE
before it has been finally confirmed will assert.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-10-24 16:26:49 +01:00
Vijaya Kumar K
2615fabd42 exec.c: Remove static allocation of sub_section of sub_page
Allocate sub_section dynamically. Remove dependency
on TARGET_PAGE_SIZE to make run-time page size detection
for arm platforms.

Signed-off-by: Vijaya Kumar K <vijayak@cavium.com>
Message-id: 1465808915-4887-3-git-send-email-vijayak@caviumnetworks.com
[PMM: use flexible array member rather than separate malloc
 so we don't need an extra pointer deref when using it]
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-24 16:26:49 +01:00
Haozhong Zhang
8360668e69 exec.c: workaround regression caused by alignment change in d2f39ad
Commit d2f39ad "exec.c: Ensure right alignment also for file backed ram"
added an additional alignment requirement on the size of backend file
besides the previous page size. On x86, the alignment is changed from
4KB in QEMU 2.6 to 2MB in QEMU 2.7.

This change breaks certain usages in QEMU 2.7 on x86, e.g.
    -object memory-backend-file,id=mem1,mem-path=/tmp/,size=$SZ
    -device pc-dimm,id=dimm1,memdev=mem1
where $SZ is multiple of 4KB but not 2MB (e.g. 1023M). QEMU 2.7
reports the following error message and aborts:
qemu-system-x86_64: -device pc-dimm,memdev=mem1,id=nv1: backend memory size must be multiple of 0x200000

The same regression may also happen in other platforms as indicated by
Igor Mammedov. This change is however necessary for s390 according to
the commit message of d2f39ad, so we workaround the regression by taking
the change only on s390.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Xu, Anthony" <anthony.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24 15:46:11 +02:00
Dr. David Alan Gilbert
863e9621c5 RAMBlocks: Store page size
Store the page size in each RAMBlock, we need it later.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2016-10-13 17:23:53 +02:00
Marc-André Lureau
efee678d6d exec: remove unused compacted argument
Since commit b35ba30f8f when it was introduced, phys_page_compact()
takes an unused compacted argument.

ubsan complains about it when launching qemu-x86_64 without arguments:
qemu/exec.c:310:5: runtime error: variable length array bound evaluates to non-positive value 0

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-08 11:25:29 +03:00
Paolo Bonzini
267f685b8b cpus-common: move CPU list management to common code
Add a mutex for the CPU list to system emulation, as it will be used to
manage safe work.  Abstract manipulation of the CPU list in new functions
cpu_list_add and cpu_list_remove.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-27 11:57:29 +02:00
Cao jin
c2cd627ddb kvm-all: drop kvm_setup_guest_memory
kvm_setup_guest_memory only does "madvise to QEMU_MADV_DONTFORK" and
is only called by ram_block_add, which actually is duplicate code.
Bonus: add simple comment for kvm_has_sync_mmu to make life easier.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1473662096-32598-1-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:09:43 +02:00
Igor Mammedov
3b8c1761f0 qtail: clean up direct access to tqe_prev field
instead of accessing tqe_prev field dircetly outside
of queue.h use macros to check if element is in list
and make sure that afer element is removed from list
tqe_prev field could be used to do the same check.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469450832-84343-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13 19:08:41 +02:00
Igor Mammedov
630eb0faf4 exec: Ensure the only one cpu_index allocation method is used
Make sure that cpu_index auto allocation isn't used in
combination with manual cpu_index assignment. And
dissallow out of order cpu removal if auto allocation
is in use.

Target that wishes to support out of order unplug should
switch to manual cpu_index assignment. Following patch
could be used as an example:
 (pc: init CPUState->cpu_index with index in possible_cpus[]))

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-05 16:01:55 -03:00
Igor Mammedov
056b68af77 fix qemu exit on memory hotplug when allocation fails at prealloc time
When adding hostmem backend at runtime, QEMU might exit with error:
  "os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM"

It happens due to os_mem_prealloc() not handling errors gracefully.

Fix it by passing errp argument so that os_mem_prealloc() could
report error to callers and undo performed allocation when
os_mem_prealloc() fails.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469008443-72059-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-02 12:03:58 +02:00
Igor Mammedov
a07f953ef4 exec: Set cpu_index only if it's not been explictly set
It keeps the legacy behavior for all users that doesn't care
about stable cpu_index value, but would allow boards that
would support device_add/device_del to set stable cpu_index
that won't depend on order in which cpus are created/destroyed.

While at that simplify cpu_get_free_index() as cpu_index
generated by USER_ONLY and softmmu variants is the same
since none of the users support cpu-remove so far, except
of not yet released spapr/x86 device_add/delr, which
will be altered by follow up patches to set stable
cpu_index manually.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-26 15:32:01 -03:00
Igor Mammedov
8b1b835035 exec: Don't use cpu_index to detect if cpu_exec_init()'s been called
Instead use QTAIL's tqe_prev field to detect if cpu's been
placed in list by cpu_exec_init() which is always set if
QTAIL element is in list.

Fixes SIGSEGV on failure path in case cpu_index is assigned
by board and cpu.relalize() fails before cpu_exec_init() is called.

In follow up patches, cpu_index will be assigned by boards that
support cpu hot(un)plug and need stable cpu_index that doesn't
depend on order cpus are created/removed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-26 15:32:00 -03:00
Igor Mammedov
1bc7e522d9 exec: Reduce CONFIG_USER_ONLY ifdeffenery
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-07-26 15:31:58 -03:00
Peter Lieven
101420b886 exec: avoid realloc in phys_map_node_reserve
this is the first step in reducing the brk heap fragmentation
created by the map->nodes memory allocation. Since the introduction
of RCU the freeing of the PhysPageMaps is delayed so that sometimes
several hundred are allocated at the same time.

Even worse the memory for map->nodes is allocated and shortly
afterwards reallocated. Since the number of nodes it grows
to in the end is the same for all PhysPageMaps remember this value
and at least avoid the reallocation.

The large number of simultaneous allocations (about 450 x 70kB in
my configuration) has to be addressed later.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468577030-21097-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-17 09:59:21 +02:00
Markus Armbruster
a9c94277f0 Use #include "..." for our own headers, <...> for others
Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-12 16:19:16 +02:00
Paolo Bonzini
02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Anthony PERARD
d6b6aec409 exec: Fix qemu_ram_block_from_host for Xen
Since f615f39 (exec: remove ram_addr argument from
qemu_ram_block_from_host), migration under Xen is likely to fail, with a
SEGV of QEMU. But the commit only reveal a bug with the calculation of
the offset value in qemu_ram_block_from_host().

This patch calculates the offset from the ram_addr as
qemu_ram_addr_from_host() will later calculate the ram_addr from the
offset.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2016-06-13 11:50:20 +01:00
Peter Maydell
6886b98036 cpu-exec: Rename cpu_resume_from_signal() to cpu_loop_exit_noexc()
The function cpu_resume_from_signal() is now always called with a
NULL puc argument, and is rather misnamed since it is never called
from a signal handler. It is essentially forcing an exit to the
top level cpu loop but without raising any exception, so rename
it to cpu_loop_exit_noexc() and drop the useless unused argument.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Riku Voipio <riku.voipio@linaro.org>
Message-id: 1463494687-25947-4-git-send-email-peter.maydell@linaro.org
2016-06-09 15:55:02 +01:00
Peter Maydell
500acc9c41 ppc patch queue for 2016-05-31
Here's another ppc patch queue.  This batch is all preliminaries
 towards two significant features:
 
 1) Full hypervisor-mode support for POWER8
     Patches 1-8 start fixing various bugs with TCG's handling of
     hypervisor mode
 
 2) CPU hotplug support
     Patches 9-12 make some preliminary fixes towards implementing CPU
     hotplug on ppc64 (and other non-x86 platforms).  These patches are
     actually to generic code, not ppc, but are included here with
     Paolo's ACK.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXTN1QAAoJEGw4ysog2bOSM4kP/2TKm/wkGo3nsGm7vG0CArs+
 JVIlTWI9Le7Cq5ijCkTwV9gjeG2CYz+Us2PCh2ZAoHpXgZtP7px2HRcDv07SbCnt
 SaCwCS+EGf3ZO9baQrzG0zfe8XrlJF+XXTejD2zWtOZw7sZ/4OPWF9KdcZbjWqFp
 PzJuXrpYOAaIyXyEPJSZFpHY+AC9NIblqHlUrKntPLLOYbqQBYP4IMxsUmOgu2IX
 rFK/5A8t20BJN0lbmx8JNKh0voorFpHY/hhaH/1T7rKxsRkKMh3VbYSxD6EYs3Uc
 nZ4ufQQW6C4CEFta3YHNwoClcsQUbnZQh3Ra+gKo9bXvqDzasVpq/mBgl3BDjGeG
 LQPSA6sfmEA8lqtRikVdgSgdXDnwy5YXJLVmIXeAIG1KHa6eRuUxC3o+ScOkcH3A
 ynLglCEBl9slsG9/yYkDcFW2u0t/txTBUvaxMfOQomAejrGjOLGZqnSWMd2UC4Gt
 KQRP+b7igkXC7+bfrpBPWKyYGvOOCukESw3OV90hLBIzOthI1dI5hO0Gj61C/nlI
 NXMRbx0qTztgj3tfSTTs6e9Ke8PEKnyXqgol0+t9Yxntlz28f2alTubUoyv/vZjx
 8J1IOlNms3PnO26TxUVBu7/KaCGCM25eTQgllbTx8rhaqin3wAH3dRc1RTlWWhwJ
 SgADl+MWf8sa7DkcxnZa
 =vAIf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160531' into staging

ppc patch queue for 2016-05-31

Here's another ppc patch queue.  This batch is all preliminaries
towards two significant features:

1) Full hypervisor-mode support for POWER8
    Patches 1-8 start fixing various bugs with TCG's handling of
    hypervisor mode

2) CPU hotplug support
    Patches 9-12 make some preliminary fixes towards implementing CPU
    hotplug on ppc64 (and other non-x86 platforms).  These patches are
    actually to generic code, not ppc, but are included here with
    Paolo's ACK.

# gpg: Signature made Tue 31 May 2016 01:39:44 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160531:
  cpu: Add a sync version of cpu_remove()
  cpu: Reclaim vCPU objects
  exec: Do vmstate unregistration from cpu_exec_exit()
  exec: Remove cpu from cpus list during cpu_exec_exit()
  ppc: Add PPC_64H instruction flag to POWER7 and POWER8
  ppc: Get out of emulation on SMT "OR" ops
  ppc: Fix sign extension issue in mtmsr(d) emulation
  ppc: Change 'invalid' bit mask of tlbiel and tlbie
  ppc: tlbie, tlbia and tlbisync are HV only
  ppc: Do some batching of TCG tlb flushes
  ppc: Use split I/D mmu modes to avoid flushes on interrupts
  ppc: Remove MMU_MODEn_SUFFIX definitions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-31 10:37:22 +01:00
Bharata B Rao
9dfeca7c6b exec: Do vmstate unregistration from cpu_exec_exit()
cpu_exec_init() does vmstate_register for the CPU device. This needs to be
undone from cpu_exec_exit(). This change is needed to support CPU hot
removal.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[dwg: added missing include to fix compile on some archs]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 14:03:29 +10:00
Bharata B Rao
1c59eb39cf exec: Remove cpu from cpus list during cpu_exec_exit()
CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
should be removed from cpu_exec_exit().

cpu_exec_exit() is called from generic CPU::instance_finalize and some
archs like PowerPC call it from CPU unrealizefn. So ensure that we
dequeue the cpu only once.

Now -1 value for cpu->cpu_index indicates that we have already dequeued
the cpu for CONFIG_USER_ONLY case also.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:22:20 +10:00
Paolo Bonzini
0878d0e11b exec: hide mr->ram_addr from qemu_get_ram_ptr users
Let users of qemu_get_ram_ptr and qemu_ram_ptr_length pass in an
address that is relative to the MemoryRegion.  This basically means
what address_space_translate returns.

Because the semantics of the second parameter change, rename the
function to qemu_map_ram_ptr.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
07bdaa4196 memory: split memory_region_from_host from qemu_ram_addr_from_host
Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region.  For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
f615f39616 exec: remove ram_addr argument from qemu_ram_block_from_host
Of the two callers, one does not use it, and the other can compute
it itself based on the other output argument (offset) and the RAMBlock.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
4ff87573df memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd
now that a MemoryRegion knows its RAMBlock directly.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
e4e697940d memory: remove unnecessary masking of MemoryRegion ram_addr
mr->ram_block->offset is already aligned to both host and target size
(see qemu_ram_alloc_internal).  Remove further masking as it is
unnecessary.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Gonglei
ab0a995608 exec: adjust rcu_read_lock requirement
qemu_ram_unset_idstr() doesn't need rcu lock anymore,
meanwhile make the range of rcu lock in
qemu_ram_set_idstr() as small as possible.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1462845901-89716-3-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Gonglei
fa53a0e53e memory: drop find_ram_block()
On the one hand, we have already qemu_get_ram_block() whose function
is similar. On the other hand, we can directly use mr->ram_block but
searching RAMblock by ram_addr which is a kind of waste.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1462845901-89716-2-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Dominik Dingel
d2f39add72 exec.c: Ensure right alignment also for file backed ram
While in the anonymous ram case we already take care of the right alignment
such an alignment gurantee does not exist for file backed ram allocation.

Instead, pagesize is used for alignment. On s390 this is not enough for gmap,
as we need to satisfy an alignment up to segments.

Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>

Message-Id: <1461585338-45863-1-git-send-email-dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:42 +02:00
Paolo Bonzini
df43d49cb8 hw: clean up hw/hw.h includes
Include qom/object.h and exec/memory.h instead of exec/ioport.h;
exec/ioport.h was almost everywhere required only for those two
includes, not for the content of the header itself.

Remove block/aio.h, everybody is already including it through
another path.

With this change, include/hw/hw.h is freed from qemu-common.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:30 +02:00
Paolo Bonzini
63c915526d cpu: move exec-all.h inclusion out of cpu.h
exec-all.h contains TCG-specific definitions.  It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
741da0d38b hw: cannot include hw/hw.h from user emulation
All qdev definitions are available from other headers, user-mode
emulation does not need hw/hw.h.

By considering system emulation only, it is simpler to disentangle
hw/hw.h from NEED_CPU_H.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Emilio G. Cota
89fee74a0f tb: consistently use uint32_t for tb->flags
We are inconsistent with the type of tb->flags: usage varies loosely
between int and uint64_t. Settle to uint32_t everywhere, which is
superior to both: at least one target (aarch64) uses the most significant
bit in the u32, and uint64_t is wasteful.

Compile-tested for all targets.

Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1460049562-23517-1-git-send-email-cota@braap.org>
2016-05-12 14:06:40 -10:00
Marc-André Lureau
85bc2a1512 memory: fix segv on qemu_ram_free(block=0x0)
Since f1060c55bf, the pointer is directly passed to
qemu_ram_free(). However, on initialization failure, it may be called
with a NULL pointer. Return immediately in this case.

This fixes a SEGV when memory initialization failed, for example
permission denied on open backing store /dev/hugepages, with -object
memory-backend-file,mem-path=/dev/hugepages.

Program received signal SIGSEGV, Segmentation fault.
0x00005555556e67e7 in qemu_ram_free (block=0x0) at /home/elmarco/src/qemu/exec.c:1775

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1459250451-29984-1-git-send-email-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Paolo Bonzini
5c3ece79cd exec: fix error handling in file_ram_alloc
One instance of double closing, and invalid close(-1) in some cases
of "goto error".

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Veronia Bahaa
f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Paolo Bonzini
39c350ee12 exec: fix early return from ram_block_add
After reporting an error, ram_block_add was going on with the registration
of the RAMBlock.  The visible effect is that it unlocked the ramlist
mutex twice.

Fixes: 528f46af6e
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Markus Armbruster
e1fb647199 exec: Fix memory allocation when memory path isn't on hugetlbfs
gethugepagesize() works reliably only when its argument is on
hugetlbfs.  When it's not, it returns the filesystem's "optimal
transfer block size", which may or may not be the actual page size
you'll get when you mmap().

If the value is too small or not a power of two, we fail
qemu_ram_mmap()'s assertions.  These were added in commit 794e8f3
(v2.5.0).  The bug's impact before that is currently unknown.  Seems
fairly unlikely at least when the normal page size is 4KiB.

Else, if the value is too large, we align more strictly than
necessary.

gethugepagesize() goes back to commit c902760 (v0.13).  That commit
clearly intended gethugepagesize() to be used on hugetlbfs only.  Not
only was it named accordingly, it also printed a warning when used on
anything else.  However, the commit neglected to spell out the
restriction in user documentation of -mem-path.

Commit bfc2a1a (v2.5.0) dropped the warning as bogus "because QEMU
functions perfectly well with the path on a regular tmpfs filesystem".
It sure does when you're sufficiently lucky.  In my testing, I was
lucky, too.

Fix by switching to qemu_fd_getpagesize().  Rename the variable
holding its result from hpagesize to page_size.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1457378754-21649-3-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Markus Armbruster
fd97fd4408 exec: Fix memory allocation when memory path names new file
Commit 8d31d6b extended file_ram_alloc() to accept file names in
addition to directory names.  Even though it passes O_CREAT to open(),
it actually works only for existing files.  Reproducer adapted from
the commit's qemu-doc.texi update:

    $ qemu-system-x86_64 -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1
    qemu-system-x86_64: -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1: failed to get page size of file /dev/hugepages/my-shmem-file: No such file or directory

This is because we first get the page size for @path, then open the
actual file.  Unwise even before the flawed commit, because the
directory could change in between, invalidating the page size.
Unlikely to bite in practice.

Rearrange the code to create the file (if necessary) before getting
its page size.  Carefully avoid TOCTTOU conditions with a method
suggested by Paolo Bonzini.

While there, replace "hugepages" by "guest RAM" in error messages,
because host memory backends can be used for purposes other than huge
pages, e.g. /dev/shm/ shared memory.  Help text of -mem-path agrees.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1457378754-21649-2-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Fam Zheng
729633c2bc exec: Introduce AddressSpaceDispatch.mru_section
Under heavy workloads the lookup will likely end up with the same
MemoryRegionSection from last time. Using a pointer to cache the result,
like ram_list.mru_block, significantly reduces cost of
address_space_translate.

During address space topology update, as->dispatch will be reallocated
so the pointer is invalidated automatically.

Perf reports a visible drop on the cpu usage, because phys_page_find is
not called.  Before:

   2.35%  qemu-system-x86_64       [.] phys_page_find
   0.97%  qemu-system-x86_64       [.] address_space_translate_internal
   0.95%  qemu-system-x86_64       [.] address_space_translate
   0.55%  qemu-system-x86_64       [.] address_space_lookup_region

After:

   0.97%  qemu-system-x86_64       [.] address_space_translate_internal
   0.97%  qemu-system-x86_64       [.] address_space_lookup_region
   0.84%  qemu-system-x86_64       [.] address_space_translate

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-8-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
29cb533d8c exec: Factor out section_covers_addr
This will be shared by the next patch.

Also add a comment explaining the unobvious condition on "size.hi".

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-7-git-send-email-famz@redhat.com>
[Small change to the comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
f1060c55bf exec: Pass RAMBlock pointer to qemu_ram_free
The only caller now knows exactly which RAMBlock to free, so it's not
necessary to do the lookup.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-6-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
8e41fb63c5 memory: Drop MemoryRegion.ram_addr
All references to mr->ram_addr are replaced by
memory_region_get_ram_addr(mr) (except for a few assertions that are
replaced with mr->ram_block).

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-5-git-send-email-famz@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:29 +01:00
Fam Zheng
0a75601853 memory: Move assignment to ram_block to memory_region_init_*
We don't force "const" qualifiers with pointers in QEMU, but it's still
good to keep a clean function interface. Assigning to mr->ram_block is
in this sense ugly - one initializer mutating its owning object's state.

Move it to memory_region_init_*, where mr->ram_addr is assigned.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Fam Zheng
528f46af6e exec: Return RAMBlock pointer from allocating functions
Previously we return RAMBlock.offset; now return the pointer to the
whole structure.

ram_block_add returns void now, error is completely passed with errp.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Gonglei
3655cb9c73 memory: optimize qemu_get_ram_ptr and qemu_ram_ptr_length
these two functions consume too much cpu overhead to
find the RAMBlock by ram address.

After this patch, we can pass the RAMBlock pointer
to them so that they don't need to find the RAMBlock
anymore most of the time. We can get better performance
in address translation processing.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1455935721-8804-3-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-25 16:11:29 +01:00
Gonglei
58eaa2174e exec: store RAMBlock pointer into memory region
Each RAM memory region has a unique corresponding RAMBlock.
In the current realization, the memory region only stored
the ram_addr which means the offset of RAM address space,
We need to qurey the global ram.list to find the ram block
by ram_addr if we want to get the ram block, which is very
expensive.

Now, we store the RAMBlock pointer into memory region
structure. So, if we know the mr, we can easily get the
RAMBlock.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1456130097-4208-2-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-25 16:11:26 +01:00
Peter Maydell
fc1ec1acff trivial patches for 2016-02-11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWvHuEAAoJEL7lnXSkw9fb2lAIAIMkxyMS6xtUFlASAA8yhPJe
 iWoWX8okweraRZKdjinG15dNsRMzOywL6zActb+roNHJ1+RICOpHYhpUtp1P5Kpd
 gGYq3ry9HoVZZLxw1dGlJoLZVwDGdkAZmMFl1nOo6vq3Rb0GR7Et6NS//4nzkDtp
 XcNrsOGbfq0ABo/sYvQMHoWgSfnGU/2mkiMDNQs5FRWtCnUXgRRlvoEuolUPzkDS
 SHovLVYVIFB9okLgYpUhj2NmvDCnAlGcOL0rBb7hGvnKgZu+vT2kFIej6InrLZGX
 hVcptzX4Msv2E40wVB2y2QyU0GyWg1CjN6J39UAxA/a0q5lh/fm6EZbtAeiSZ+8=
 =dGSI
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2016-02-11' into staging

trivial patches for 2016-02-11

# gpg: Signature made Thu 11 Feb 2016 12:16:04 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2016-02-11:
  w32: include winsock2.h before windows.h
  Adds keycode 86 to the hid_usage_keys translation table.
  s390x: remove s390-zipl.rom
  Passthru CCID card: QOMify
  Emulated CCID card: QOMify
  ES1370: QOMify
  char: fix parameter name / type in BSD codepath
  qmp-spec: fix index in doc
  rdma: remove check on time_spent when calculating mbs
  qemu-sockets: simplify error handling
  cpu: cpu_save/cpu_load is no more
  qom: Correct object_property_get_int() description
  man: virtfs-proxy-helper: Rework awkward sentence
  remove libtool support

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-02-11 15:09:33 +00:00
Paolo Bonzini
945123a554 cpu: cpu_save/cpu_load is no more
Everything has been converted to vmstate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-02-11 15:15:46 +03:00
Sergey Fedorov
568496c0c0 cpu: Add callback to check architectural watchpoint match
When QEMU watchpoint matches, that is not definitely an architectural
watchpoint match yet. If it is a stop-before-access watchpoint then that
is hardly possible to ignore it after throwing a TCG exception.

A special callback is introduced to check for architectural watchpoint
match before raising a TCG exception.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1454256948-10485-2-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-02-11 11:17:32 +00:00
Stefan Hajnoczi
5b82b703b6 memory: RCU ram_list.dirty_memory[] for safe RAM hotplug
Although accesses to ram_list.dirty_memory[] use atomics so multiple
threads can safely dirty the bitmap, the data structure is not fully
thread-safe yet.

This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
grown.  ram_list.dirty_memory[] is change from a regular bitmap to an
RCU array of pointers to fixed-size bitmap blocks.  Threads can continue
accessing bitmap blocks while the array is being extended.  See the
comments in the code for an in-depth explanation of struct
DirtyMemoryBlocks.

I have tested that live migration with virtio-blk dataplane works.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1453728801-5398-2-git-send-email-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09 15:45:26 +01:00
Paolo Bonzini
508127e243 log: do not unnecessarily include qom/cpu.h
Split the bits that require it to exec/log.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1452174932-28657-8-git-send-email-den@openvz.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-03 09:19:10 +00:00
Peter Maydell
7b31bbc2e6 exec: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:22 +00:00
Peter Maydell
0b0571dd24 Xen 2016/01/21
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJWoQ5KAAoJEIlPj0hw4a6QP6sP/01U66Fv7ZzqxnV6U/6hJkOG
 X11S6KUHVdNoLyMB4RCyOV/zsF16ODZ9A1PI+qeq/1Po4zNLASWYAdWR7OinCiis
 ad5QGmHY2JmzLm2x8ivWZR1ZqQ+PTRWFH7eEFEROaI/IEyG1wL4bTkMLB0L6Ih74
 SMnMJg3Rkl8XhxdvVuE5JZ4f4ZTyPIk+0daMXIH9Q58XspblVNRjKAbotjte/zrj
 XmCIxVfu29NOIKD3F1n0Cw29OqCuyofbxWHk+SwT68fM8M8KcdnX1WGmfOXylXod
 JP0j2NRN07LgMfJv1K+QXPSNlFOZAMlzzXpOAnbb2AJceTTMMwTdkQb6aahFfMEL
 eyWabU+ZI8gemFePgWWdOipkrqtWlGvdyFLKLv42CR9jhVGNck8SBt01njLcOEsf
 TZjsuzPVxMmQvSYr7xcZgIFKwWkt3yUpOAKl6KS5PlerIezpJ1MtmB1ZmFF+Caui
 kGpC1tfIgdu3VHdlqASlc50BsAeqTdGzXI+KxTE/6raOnn+aUVIXrUzcdgV+Tgby
 52Fd9y83X65RXIgasNIvNpUEX+jc7FYdrBaO2graSBzpCWAzituyypOk4WEpHxIn
 da64hN9Z3i4BzLDZtaC05B8A0iWpckLOwbVWK1zblsdiJJAaOFVAU9cNl2Plxm8j
 cy8WC0FdEqLZxXpU0deB
 =+hRh
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160121' into staging

Xen 2016/01/21

# gpg: Signature made Thu 21 Jan 2016 16:58:50 GMT using RSA key ID 70E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"

* remotes/sstabellini/tags/xen-20160121:
  Xen PCI passthru: convert to realize()
  Add Error **errp for xen_pt_config_init()
  Add Error **errp for xen_pt_setup_vga()
  Add Error **errp for xen_host_pci_device_get()
  Xen: use qemu_strtoul instead of strtol
  Change xen_host_pci_sysfs_path() to return void
  xen-pvdevice: convert to realize()
  xen-hvm: Clean up xen_ram_alloc() error handling
  xen-hvm: Clean up xen_hvm_init() error handling
  xenfb.c: avoid expensive loops when prod <= out_cons
  MAINTAINERS: update Xen files

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-01-21 17:21:08 +00:00
Peter Crosthwaite
6731d864f8 qom/cpu: Add MemoryRegion property
Add a MemoryRegion property, which if set is used to construct
the CPU's initial (default) AddressSpace.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
[PMM: code is moved from qom/cpu.c to exec.c to avoid having to
 make qom/cpu.o be a non-common object file; code to use the
 MemoryRegion and to default it to system_memory added.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:06 +00:00
Peter Maydell
79ed041647 exec.c: Use correct AddressSpace in watch_mem_read and watch_mem_write
In the watchpoint access routines watch_mem_read and watch_mem_write,
find the correct AddressSpace to use from current_cpu and the memory
transaction attributes, rather than always assuming address_space_memory.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:06 +00:00
Peter Maydell
5232e4c798 exec.c: Use cpu_get_phys_page_attrs_debug
Use cpu_get_phys_page_attrs_debug() when doing virtual-to-physical
conversions in debug related code, so that we can obtain the right
address space index and thus select the correct AddressSpace,
rather than always using cpu->as.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:06 +00:00
Peter Maydell
651a5bc037 exec.c: Add cpu_get_address_space()
Add a function to return the AddressSpace for a CPU based on
its numerical index. (Callers outside exec.c don't have access
to the CPUAddressSpace struct so can't just fish it out of the
CPUState struct directly.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:05 +00:00
Peter Maydell
a54c87b68a exec.c: Pass MemTxAttrs to iotlb_to_region so it uses the right AS
Pass the MemTxAttrs for the memory access to iotlb_to_region(); this
allows it to determine the correct AddressSpace to use for the lookup.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:05 +00:00
Peter Maydell
d7898cda81 cputlb.c: Use correct address space when looking up MemoryRegionSection
When looking up the MemoryRegionSection for the new TLB entry in
tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine
the correct address space index for the lookup, and pass it into
address_space_translate_for_iotlb().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:05 +00:00
Peter Maydell
12ebc9a76d exec.c: Allow target CPUs to define multiple AddressSpaces
Allow multiple calls to cpu_address_space_init(); each
call adds an entry to the cpu->ases array at the specified
index. It is up to the target-specific CPU code to actually use
these extra address spaces.

Since this multiple AddressSpace support won't work with
KVM, add an assertion to avoid confusing failures.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:04 +00:00
Peter Maydell
56943e8cc1 exec.c: Don't set cpu->as until cpu_address_space_init
Rather than setting cpu->as unconditionally in cpu_exec_init
(and then having target-i386 override this later), don't set
it until the first call to cpu_address_space_init.

This requires us to initialise the address space for
both TCG and KVM (KVM doesn't need the AS listener but
it does require cpu->as to be set).

For target CPUs which don't set up any address spaces (currently
everything except i386), add the default address_space_memory
in qemu_init_vcpu().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21 14:15:04 +00:00
Markus Armbruster
37aa7a0e2f xen-hvm: Clean up xen_ram_alloc() error handling
xen_ram_alloc() dies with hw_error() on error, even though its caller
ram_block_add() handles errors just fine.  Add an Error **errp
parameter and use it.

Leave case RUN_STATE_INMIGRATE alone, because that looks like some
kind of warning.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2016-01-14 16:49:50 +00:00
Tetsuya Mukawa
56a571d9c8 ivshmem: Store file descriptor for vhost-user negotiation
If virtio-net driver allocates memory in ivshmem shared memory,
vhost-net will work correctly, but vhost-user will not work because
a fd of shared memory will not be sent to vhost-user backend.
This patch fixes ivshmem to store file descriptor of shared memory.
It will be used when vhost-user negotiates vhost-user backend.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-09 23:20:20 +02:00
Paolo Bonzini
3cc8f88499 memory: try to inline constant-length reads
memcpy can take a large amount of time for small reads and writes.
Handle the common case of reading s/g descriptors from memory (there
is no corresponding "write" case that is as common, because writes
often use address_space_st* functions) by inlining the relevant
parts of address_space_read into the caller.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:49 +01:00
Paolo Bonzini
a203ac702e memory: extract first iteration of address_space_read and address_space_write
We want to inline the case where there is only one iteration, because
then the compiler can also inline the memcpy.  As a start, extract
everything after the first address_space_translate call.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:49 +01:00
Paolo Bonzini
eb7eeb8862 memory: split address_space_read and address_space_write
Rather than dispatching on is_write for every iteration, make
address_space_rw call one of the two functions.  The amount of
duplicate logic is pretty small, and memory_access_is_direct can
be tweaked so that it inlines better in the callers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:48 +01:00
Paolo Bonzini
e81bcda529 exec: make qemu_ram_ptr_length more similar to qemu_get_ram_ptr
Notably, use qemu_get_ram_block to enjoy the MRU optimization.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:48 +01:00
Paolo Bonzini
49b24afcb1 exec: always call qemu_get_ram_ptr within rcu_read_lock
Simplify the code and document the assumption.  The only caller
that is not within rcu_read_lock is memory_region_get_ram_ptr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:48 +01:00
Paolo Bonzini
013a29424c qemu-log: introduce qemu_log_separate
In some cases, the same message is printed both on stderr and in the log.
Avoid duplicate output in the default case where stderr _is_ the log,
and standardize this to stderr+log where it used to use stdio+log.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 17:33:47 +01:00
Eduardo Habkost
2f3a2bb15e exec: Remove unnecessary RAM_FILE flag
The only code that sets RAMBlock.fd is file_ram_alloc(), and the only
code that calls file_ram_alloc() sets the RAM_FILE flag. That means the
flag is always set when RAMBlock.fd >= 0, and the munmap() call at
reclaim_ramblock() is dead code that never runs.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446847881-9385-1-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 15:24:34 +01:00
Eduardo Habkost
a29ac16632 exec: Eliminate qemu_ram_free_from_ptr()
Replace qemu_ram_free_from_ptr() with qemu_ram_free().

The only difference between qemu_ram_free_from_ptr() and
qemu_ram_free() is that g_free_rcu() is used instead of
call_rcu(reclaim_ramblock). We can safely replace it because:

* RAM blocks allocated by qemu_ram_alloc_from_ptr() always have
  RAM_PREALLOC set;
* reclaim_ramblock(block) will do nothing except g_free(block)
  if RAM_PREALLOC is set at block->flags.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446844805-14492-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-17 15:24:33 +01:00
Don Slutz
55b4e80b04 exec: Stop using memory after free
memory_region_unref(mr) can free memory.

For example I got:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f43280d4700 (LWP 4462)]
0x00007f43323283c0 in phys_section_destroy (mr=0x7f43259468b0)
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1023
1023        if (mr->subpage) {
(gdb) bt
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1023
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1034
    at /home/don/xen/tools/qemu-xen-dir/exec.c:2205
(gdb) p mr
$1 = (MemoryRegion *) 0x7f43259468b0

And this change prevents this.

Signed-off-by: Don Slutz <Don.Slutz@Gmail.com>
Message-Id: <1448921464-21845-1-git-send-email-Don.Slutz@Gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-02 12:01:43 +01:00
Daniel P. Berrange
bfc2a1a1f4 exec: remove warning about mempath and hugetlbfs
The gethugepagesize() method in exec.c printed a warning if
the file path for "-mem-path" or "-object memory-backend-file"
was not on a hugetlbfs filesystem. This warning is bogus, because
QEMU functions perfectly well with the path on a regular tmpfs
filesystem. Use of hugetlbfs vs tmpfs is a choice for the management
application or end user to make as best fits their needs. As such it
is inappropriate for QEMU to have an opinion on whether the user's
choice is right or wrong in this case.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1448448749-1332-3-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Daniel P. Berrange
2c189a4e12 Revert "exec: silence hugetlbfs warning under qtest"
This reverts commit 1c7ba94a18.

That commit changed QEMU initialization order from

 - object-initial, chardev, qtest, object-late

to

 - chardev, qtest, object-initial, object-late

This breaks chardev setups which need to rely on objects
having been created. For example, when chardevs use TLS
encryption in the future, they need to have tls credential
objects created first.

This revert, restores the ordering introduced in

  commit f08f9271bf
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Wed May 13 17:14:04 2015 +0100

    vl: Create (most) objects before creating chardev backends

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1448448749-1332-2-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Marc-André Lureau
1c7ba94a18 exec: silence hugetlbfs warning under qtest
vhost-user-test prints a warning. A test should not need to run on
hugetlbfs, let's silence the warning under qtest. The
condition can't check on qtest_enabled() since vhost-user-test actually
doesn't use qtest accel. However, qtest_driver() can be used, if
qtest_init() is called early enough. For that reason, move chardev and
qtest initialization early.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-19 15:26:05 +02:00
Dr. David Alan Gilbert
4ed023ce2a Round up RAMBlock sizes to host page sizes
RAMBlocks that are not a multiple of host pages in length
cause problems for postcopy (I've seen an ACPI table on aarch64
be 5k in length - i.e. 5x target-page), so round RAMBlock sizes
up to a host-page.

This potentially breaks migration compatibility due to changes
in RAMBlock sizes; however:
   1) x86 and s390 I think always have host=target page size
   2) When I've tried on Power the block sizes already seem aligned.
   3) I don't think there's anything else that maintains per-version
      machine-types for compatibility.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
e3dd74934f qemu_ram_block_by_name
Add a function to find a RAMBlock by name; use it in two
of the places that already open code that loop; we've
got another use later in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
422148d3e5 qemu_ram_block_from_host
Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
in the RAMBlock and the global ram_addr_t value.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since its the actual name text sent on the
wire.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
038629a699 Provide runtime Target page information
The migration code generally is built target-independent, however
there are a few places where knowing the target page size would
avoid artificially moving stuff into migration/ram.c.

Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Paolo Bonzini
68851b98e5 exec: avoid unnecessary cacheline bounce on ram_list.mru_block
Whenever the MRU cache hits for the list of RAM blocks, qemu_get_ram_block
does an unnecessary write that causes a processor cache line to bounce
from one core to another.  This causes a performance hit.

Reported-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Peter Maydell
9319738080 So here it is, let's see what happens.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJWPHM6AAoJEL/70l94x66DK5YIAJTNthYWL8eNhQ1iek6CLlV+
 etVXm3JDmkV0zOfYVHLBb44VLZ6I1ocas+57F/kmz7SKpMLiI6bMXRxhTSkiO4D+
 3N36cWQf3fq+P0DmxuikMlYGz8V6QQ5PQE2xJKV0ZIWAkiqInxilkN3qt81sNR+A
 A9Ohom3sc0eGHyYJcVDK4krbnNSAZjIB2yMWperw61x+GYAhxjA02HPUgB32KK6q
 KrdnKmnRu9Cw6y4wTCbbDITJztPexZYsX2DOJh30wC0eNcE+MZ7J2im8Frpxe+Ml
 C8MUuvSqLOyeu9tUfrXGzd6kMtEKrmU+fh2nNbxJbtfowDjkW2jcIEgC0UjkGE4=
 =BF1q
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-replay' into staging

So here it is, let's see what happens.

# gpg: Signature made Fri 06 Nov 2015 09:30:34 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream-replay:
  replay: recording of the user input
  replay: command line options
  replay: replay blockers for devices
  replay: initialization and deinitialization
  replay: ptimer
  bottom halves: introduce bh call function
  replay: checkpoints
  icount: improve counting for record/replay
  replay: shutdown event
  replay: recording and replaying clock ticks
  replay: asynchronous events infrastructure
  replay: interrupts and exceptions
  cpu: replay instructions sequence
  cpu-exec: allow temporary disabling icount
  replay: introduce icount event
  replay: introduce mutex to protect the replay log
  replay: internal functions for replay log
  replay: global variables and function stubs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06 11:31:40 +00:00
Pavel Dovgalyuk
7615936ebf replay: initialization and deinitialization
This patch introduces the functions for enabling the record/replay and for
freeing the resources when simulator closes.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162507.8676.90232.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Fedin
8d31d6b65a backends/hostmem-file: Allow to specify full pathname for backing file
This allows to explicitly specify file name to use with the backend. This
is important when using it together with ivshmem in order to make it backed
by hugetlbfs. By default filename is autogenerated using mkstemp(), and the
file is unlink()ed after creation, effectively making it anonymous. This is
not very useful with ivshmem because it ends up in a memory which cannot be
accessed by something else.

Distinction between directory and file name is done by stat() check. If an
existing directory is given, the code keeps old behavior. Otherwise it
creates or opens a file with the given pathname.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Tested-by: Igor Skalkin <i.skalkin@samsung.com>
Message-Id: <004301d11166$9672fe30$c358fa90$@samsung.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:05 +01:00
Paolo Bonzini
680a4783dc memory: call begin, log_start and commit when registering a new listener
This ensures that cpu_reload_memory_map() is called as soon as
tcg_cpu_address_space_init() is called, and before cpu->memory_dispatch
is used.  qemu-system-s390x never changes the address spaces after
tcg_cpu_address_space_init() is called, and thus tcg_commit() is never
called.  This causes a SIGSEGV.

Because memory_map_init() will now call mem_commit(), we have to
initialize io_mem_* before address_space_memory and friends.

Reported-by: Philipp Kern <pkern@debian.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 0a1c71cec6
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:01 +01:00
Igor Mammedov
cc57501dee file_ram_alloc: propagate error to caller instead of terminating QEMU
QEMU shouldn't exits from file_ram_alloc() if -mem-prealloc option is specified
and "object_add memory-backend-file,..." fails allocation during memory hotplug.

Propagate error to a caller and let it decide what to do with allocation failure.
That leaves QEMU alive if it can't create backend during hotplug time and
kills QEMU at startup time if backends or initial memory were misconfigured/
too large.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1445274671-17704-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-02 14:50:27 +01:00
Peter Maydell
ca3e40e233 vhost, pc, virtio features, fixes, cleanups
New features:
     VT-d support for devices behind a bridge
     vhost-user migration support
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWKMrnAAoJECgfDbjSjVRpVL0H/iRc31o00QE4nWBRpxUpf8WJ
 V5RWE8qKkDgBha5bS5Nt4vs8K4jkkHGXCbmygMidWph96hUPK8/yHy1A/wmpBibB
 5hVSPDK8onavNGJwpaWDrkhd9OhKAaKOuu49T6+VWJGZY/uX5ayqmcN934y0NPUa
 4EhH5tyxPpYOYeW9i/VOMQ374gCJcpzYBMug4NJZRyFpfz/b2mzAQtoqw3EsPtB0
 vpVJ+fKiCyG39HFKQJW7cL12yBeXOoyhjfDxpumLqwLWMfmde+vJwTFx6wbechgV
 aU3jIdvUX8wHCNYaB937NsMaDALoGNqUjbpKnf+xD1w7xr9pwTzdyrGH3rpGLEE=
 =+G1+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

vhost, pc, virtio features, fixes, cleanups

New features:
    VT-d support for devices behind a bridge
    vhost-user migration support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 22 Oct 2015 12:39:19 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream: (37 commits)
  hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT
  i386: keep cpu_model field in MachineState uptodate
  vhost: set the correct queue index in case of migration with multiqueue
  piix: fix resource leak reported by Coverity
  seccomp: add memfd_create to whitelist
  vhost-user-test: check ownership during migration
  vhost-user-test: add live-migration test
  vhost-user-test: learn to tweak various qemu arguments
  vhost-user-test: wrap server in TestServer struct
  vhost-user-test: remove useless static check
  vhost-user-test: move wait_for_fds() out
  vhost: add migration block if memfd failed
  vhost-user: use an enum helper for features mask
  vhost user: add rarp sending after live migration for legacy guest
  vhost user: add support of live migration
  net: add trace_vhost_user_event
  vhost-user: document migration log
  vhost: use a function for each call
  vhost-user: add a migration blocker
  vhost-user: send log shm fd along with log_base
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-22 12:41:44 +01:00
Michael S. Tsirkin
794e8f301a exec: factor out duplicate mmap code
Anonymous and file-backed RAM allocation are now almost exactly the same.

Reduce code duplication by moving RAM mmap code out of oslib-posix.c and
exec.c.

Reported-by: Marc-André Lureau <mlureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-21 09:24:44 +03:00
Peter Maydell
32857f4d5e exec.c: Collect AddressSpace related fields into a CPUAddressSpace struct
Gather up all the fields currently in CPUState which deal with the CPU's
AddressSpace into a separate CPUAddressSpace struct. This paves the way
for allowing the CPU to know about more than one AddressSpace.

The rearrangement also allows us to make the MemoryListener a directly
embedded object in the CPUAddressSpace (it could not be embedded in
CPUState because 'struct MemoryListener' isn't defined for the user-only
builds). This allows us to resolve the FIXME in tcg_commit() by going
directly from the MemoryListener to the CPUAddressSpace.

This patch extracts the actual update of the cached dispatch pointer
from cpu_reload_memory_map() (which is renamed accordingly to
cpu_reloading_memory_map() as it is only responsible for breaking
cpu-exec.c's RCU critical section now). This lets us keep the definition
of the CPUAddressSpace struct private to exec.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1443709790-25180-4-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-12 18:29:26 +02:00
Peter Maydell
0a1c71cec6 exec.c: Don't call cpu_reload_memory_map() from cpu_exec_init()
Currently we call cpu_reload_memory_map() from cpu_exec_init(),
but this is not necessary:
 * KVM doesn't use the data structures maintained by
   cpu_reload_memory_map() (the TLB and cpu->memory_dispatch)
 * for TCG, we will call this function via tcg_commit() either
   as soon as tcg_cpu_address_space_init() registers the listener,
   or when the first MemoryRegion is added to the AddressSpace
   if the AS is empty when we register the listener

The unnecessary call is awkward for adding support for multiple
address spaces per CPU, so drop it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Message-Id: <1443709790-25180-2-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-12 18:29:25 +02:00
Michael S. Tsirkin
8561c9244d exec: allocate PROT_NONE pages on top of RAM
This inserts a read and write protected page between RAM and QEMU
memory, for file-backend RAM.
This makes it harder to exploit QEMU bugs resulting from buffer
overflows in devices using variants of cpu_physical_memory_map,
dma_memory_map etc.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 16:16:52 +03:00
Peter Crosthwaite
dfccc76023 include/exec: Move cputlb exec.c defs out
Move the architecture agnostic function prototypes for exec.c out of
cputlb.h to exec-all.h. This allows hiding of the arch specific
cputlb.h from exec.c which should be getting close to having no
architecture specifics. Prepares support for multi-arch, which will have
a minimal cpu.h that services exec.c but not cputlb.h.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <b4fe754c58c860315e35d44430c26b1c967ce2c9.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 17:33:33 +02:00
Peter Crosthwaite
bcae01e468 cputlb: Change tlb_set_dirty() arg to cpu
Change tlb_set_dirty() to accept a CPU instead of an env pointer. This
allows for removal of another CPUArchState usage from prototypes that
need to be QOMified.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <d2b1dcbe7945112989861d8ba7369449c11cc273.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 17:33:33 +02:00
Peter Crosthwaite
9a13565d52 cputlb: move CPU_LOOP() for tlb_reset() to exec.c
To prepare for multi-arch, cputlb.c should only have awareness of one
single architecture. This means it should not have access to the full
CPU lists which may be heterogeneous. Instead, push the CPU_LOOP() up
to the one and only caller in exec.c.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-Id: <db06dc6c49f8970caaf116d0385f00ee10a56f2f.1441614289.git.crosthwaite.peter@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 17:33:33 +02:00
Andrey Smetanin
bac05aa9a7 cpu: Add crash_occurred flag into CPUState
CPUState::crash_occurred field inside CPUState marks
that guest crash occurred. This value is added into
cpu common migration subsection.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Andreas Färber <afaerber@suse.de>
Message-Id: <1435924905-8926-12-git-send-email-den@openvz.org>
[Document the new field. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 17:33:32 +02:00
Peter Maydell
a2aa09e181 * Support for jemalloc
* qemu_mutex_lock_iothread "No such process" fix
 * cutils: qemu_strto* wrappers
 * iohandler.c simplification
 * Many other fixes and misc patches.
 
 And some MTTCG work (with Emilio's fixes squashed):
 * Signal-free TCG kick
 * Removing spinlock in favor of QemuMutex
 * User-mode emulation multi-threading fixes/docs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJV8Tk7AAoJEL/70l94x66Ds3QH/3bi0RRR2NtKIXAQrGo5tfuD
 NPMu1K5Hy+/26AC6mEVNRh4kh7dPH5E4NnDGbxet1+osvmpjxAjc2JrxEybhHD0j
 fkpzqynuBN6cA2Gu5GUNoKzxxTmi2RrEYigWDZqCftRXBeO2Hsr1etxJh9UoZw5H
 dgpU3j/n0Q8s08jUJ1o789knZI/ckwL4oXK4u2KhSC7ZTCWhJT7Qr7c0JmiKReaF
 JEYAsKkQhICVKRVmC8NxML8U58O8maBjQ62UN6nQpVaQd0Yo/6cstFTZsRrHMHL3
 7A2Tyg862cMvp+1DOX3Bk02yXA+nxnzLF8kUe0rYo6llqDBDStzqyn1j9R0qeqA=
 =nB06
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Support for jemalloc
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.

And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs

# gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (44 commits)
  cutils: work around platform differences in strto{l,ul,ll,ull}
  cpu-exec: fix lock hierarchy for user-mode emulation
  exec: make mmap_lock/mmap_unlock globally available
  tcg: comment on which functions have to be called with mmap_lock held
  tcg: add memory barriers in page_find_alloc accesses
  remove unused spinlock.
  replace spinlock by QemuMutex.
  cpus: remove tcg_halt_cond and tcg_cpu_thread globals
  cpus: protect work list with work_mutex
  scripts/dump-guest-memory.py: fix after RAMBlock change
  configure: Add support for jemalloc
  add macro file for coccinelle
  configure: factor out adding disas configure
  vhost-scsi: fix wrong vhost-scsi firmware path
  checkpatch: remove tests that are not relevant outside the kernel
  checkpatch: adapt some tests to QEMU
  CODING_STYLE: update mixed declaration rules
  qmp: Add example usage of strto*l() qemu wrapper
  cutils: Add qemu_strtoull() wrapper
  cutils: Add qemu_strtoll() wrapper
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-09-14 16:13:16 +01:00
Paolo Bonzini
f240eb6fdc remove qemu/tls.h
TLS is now required on all platforms, so DECLARE_TLS/DEFINE_TLS is not
needed anymore.  Removing it does not break Windows because of the
previous patch.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-09 15:34:53 +02:00
Peter Maydell
6554f5c037 exec.c: Use pow2floor() rather than hand-calculation
Use pow2floor() to round down to the nearest power of 2,
rather than an inline calculation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1437741192-20955-5-git-send-email-peter.maydell@linaro.org
2015-09-07 14:19:00 +01:00
Chen Hanxiao
9284f31994 exec: use macro ROUND_UP for alignment
Use ROUND_UP instead.

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Message-Id: <1437707523-4910-1-git-send-email-chenhanxiao@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-14 23:40:32 +02:00
Peter Maydell
0b8e2c1002 exec.c: Use atomic_rcu_read() to access dispatch in memory_region_section_get_iotlb()
When accessing the dispatch pointer in an AddressSpace within an RCU
critical section we should always use atomic_rcu_read(). Fix an
access within memory_region_section_get_iotlb() which was incorrectly
doing a direct pointer access.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1437391637-31576-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-23 07:37:38 +02:00
Peter Crosthwaite
4bad9e392e cpu: Change cpu_exec_init() arg to cpu, not env
The callers (most of them in target-foo/cpu.c) to this function all
have the cpu pointer handy. Just pass it to avoid an ENV_GET_CPU() from
core code (in exec.c).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael Walle <michael@walle.cc>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Anthony Green <green@moxielogic.com>
Cc: Jia Liu <proljc@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:40 +02:00
Peter Crosthwaite
bbd77c180d translate-all: Change tb_flush() env argument to cpu
All of the core-code usages of this API have the cpu pointer handy so
pass it in. There are only 3 architecture specific usages (2 of which
are commented out) which can just use ENV_GET_CPU() locally to get the
cpu pointer. The reduces core code usage of the CPU env, which brings
us closer to common-obj'ing these core files.

Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:40 +02:00
Bharata B Rao
b7bca73334 cpu: Convert cpu_index into a bitmap
Currently CPUState::cpu_index is monotonically increasing and a newly
created CPU always gets the next higher index. The next available
index is calculated by counting the existing number of CPUs. This is
fine as long as we only add CPUs, but there are architectures which
are starting to support CPU removal, too. For an architecture like PowerPC
which derives its CPU identifier (device tree ID) from cpu_index, the
existing logic of generating cpu_index values causes problems.

With the currently proposed method of handling vCPU removal by parking
the vCPU fd in QEMU
(Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
generating cpu_index this way will not work for PowerPC.

This patch changes the way cpu_index is handed out by maintaining
a bit map of the CPUs that tracks both addition and removal of CPUs.

The CPU bitmap allocation logic is part of cpu_exec_init(), which is
called by instance_init routines of various CPU targets. Newly added
cpu_exec_exit() API handles the deallocation part and this routine is
called from generic CPU instance_finalize.

Note: This new CPU enumeration is for !CONFIG_USER_ONLY only.
CONFIG_USER_ONLY continues to have the old enumeration logic.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
[AF: max_cpus -> MAX_CPUMASK_BITS]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:40 +02:00
Bharata B Rao
5a790cc4b9 cpu: Add Error argument to cpu_exec_init()
Add an Error argument to cpu_exec_init() to let users collect the
error. This is in preparation to change the CPU enumeration logic
in cpu_exec_init(). With the new enumeration logic, cpu_exec_init()
can fail if cpu_index values corresponding to max_cpus have already
been handed out.

Since all current callers of cpu_exec_init() are from instance_init,
use error_abort Error argument to abort in case of an error.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:40 +02:00
Eduardo Habkost
291135b5da cpu: Reorder cpu->as, cpu->thread_id, cpu->memory_dispatch init
Instead of initializing cpu->as, cpu->thread_id, and reloading memory
map while holding cpu_list_lock(), do it earlier, before locking the CPU
list and initializing cpu_index.

This allows the code handling cpu_index and global CPU list to be
isolated from the rest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:39 +02:00
Eduardo Habkost
7c39163e38 cpu: Initialize breakpoint/watchpoint lists in cpu_common_initfn()
One small step in the simplification of cpu_exec_init().

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:39 +02:00
Eduardo Habkost
199fc85acd cpu: No need to zero-initialize CPUState::numa_node
QOM objects are already zero-filled when instantiated, there's no need
to explicitly set numa_node to 0.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-07-09 15:20:39 +02:00
Li Zhijian
dd63169766 migration: extend migration_bitmap
Prevously, if we hotplug a device(e.g. device_add e1000) during
migration is processing in source side, qemu will add a new ram
block but migration_bitmap is not extended.
In this case, migration_bitmap will overflow and lead qemu abort
unexpectedly.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-07-07 14:54:56 +02:00
Paolo Bonzini
b242e0e0e2 exec: skip MMIO regions correctly in cpu_physical_memory_write_rom_internal
Loading the BIOS in the mac99 machine is interesting, because there is a
PROM in the middle of the BIOS region (from 16K to 32K).  Before memory
region accesses were clamped, when QEMU was asked to load a BIOS from
0xfff00000 to 0xffffffff it would put even those 16K from the BIOS file
into the region.  This is weird because those 16K were not actually
visible between 0xfff04000 and 0xfff07fff.  However, it worked.

After clamping was added, this also worked.  In this case, the
cpu_physical_memory_write_rom_internal function split the write in
three parts: the first 16K were copied, the PROM area (second 16K) were
ignored, then the rest was copied.

Problems then started with commit 965eb2f (exec: do not clamp accesses
to MMIO regions, 2015-06-17).  Clamping accesses is not done for MMIO
regions because they can overlap wildly, and MMIO registers can be
expected to perform full-width accesses based only on their address
(with no respect for adjacent registers that could decode to completely
different MemoryRegions).  However, this lack of clamping also applied
to the PROM area!  cpu_physical_memory_write_rom_internal thus failed
to copy the third range above, i.e. only copied the first 16K of the BIOS.

In effect, address_space_translate is expecting _something else_ to do
the clamping for MMIO regions if the incoming length is large.  This
"something else" is memory_access_size in the case of address_space_rw,
so use the same logic in cpu_physical_memory_write_rom_internal.

Reported-by: Alexander Graf <agraf@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Fixes: 965eb2f
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-06 14:59:11 +02:00
Jan Kiszka
4840f10eff memory: let address_space_rw/ld*/st* run outside the BQL
The MMIO case is further broken up in two cases: if the caller does not
hold the BQL on invocation, the unlocked one takes or avoids BQL depending
on the locking strategy of the target memory region and its coalesced
MMIO handling.  In this case, the caller should not hold _any_ lock
(a friendly suggestion which is disregarded by virtio-scsi-dataplane).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Frederic Konrad <fred.konrad@greensocs.com>
Message-Id: <1434646046-27150-6-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-01 15:45:51 +02:00
Paolo Bonzini
125b380666 exec: pull qemu_flush_coalesced_mmio_buffer() into address_space_rw/ld*/st*
As memory_region_read/write_accessor will now be run also without BQL held,
we need to move coalesced MMIO flushing earlier in the dispatch process.

Cc: Frederic Konrad <fred.konrad@greensocs.com>
Message-Id: <1434646046-27150-5-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-07-01 15:45:50 +02:00
Paolo Bonzini
e4a511f8cc exec: clamp accesses against the MemoryRegionSection
Because the clamping was done against the MemoryRegion,
address_space_rw was effectively broken if a write spanned
multiple sections that are not linear in underlying memory
(with the memory not being under an IOMMU).

This is visible with the MIPS rc4030 IOMMU, which is implemented
as a series of alias memory regions that point to the actual RAM.

Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 12:27:14 +02:00
Paolo Bonzini
965eb2fcdf exec: do not clamp accesses to MMIO regions
It is common for MMIO registers to overlap, for example a 4 byte register
at 0xcf8 (totally random choice... :)) and a 1 byte register at 0xcf9.
If these registers are implemented via separate MemoryRegions, it is
wrong to clamp the accesses as the value written would be truncated.

Hence for these regions the effects of commit 23820db (exec: Respect
as_translate_internal length clamp, 2015-03-16, previously applied as
commit c3c1bb99) must be skipped.

Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-19 12:27:14 +02:00
Dr. David Alan Gilbert
e3807054e2 qemu_ram_foreach_block: pass up error value, and down the ramblock name
check the return value of the function it calls and error if it's non-0
Fixup qemu_rdma_init_one_block that is the only current caller,
  and rdma_add_block the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Michael R. Hines <mrhines@us.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-06-12 06:54:01 +02:00
Juan Quintela
5cd8cadae8 migration: Use normal VMStateDescriptions for Subsections
We create optional sections with this patch.  But we already have
optional subsections.  Instead of having two mechanism that do the
same, we can just generalize it.

For subsections we just change:

- Add a needed function to VMStateDescription
- Remove VMStateSubsection (after removal of the needed function
  it is just a VMStateDescription)
- Adjust the whole tree, moving the needed function to the corresponding
  VMStateDescription

Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-06-12 06:53:57 +02:00
Stefan Hajnoczi
03eebc9e32 memory: replace cpu_physical_memory_reset_dirty() with test-and-clear
The cpu_physical_memory_reset_dirty() function is sometimes used
together with cpu_physical_memory_get_dirty().  This is not atomic since
two separate accesses to the dirty memory bitmap are made.

Turn cpu_physical_memory_reset_dirty() and
cpu_physical_memory_clear_dirty_range_type() into the atomic
cpu_physical_memory_test_and_clear_dirty().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1417519399-3166-6-git-send-email-stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:10:00 +02:00
Paolo Bonzini
e87f7778b6 exec: only check relevant bitmaps for cleanliness
Most of the time, not all bitmaps have to be marked as dirty;
do not do anything if the interesting ones are already dirty.
Previously, any clean bitmap would have cause all the bitmaps to be
marked dirty.

In fact, unless running TCG most of the time bitmap operations need
not be done at all, because memory_region_is_logging returns zero.
In this case, skip the call to cpu_physical_memory_range_includes_clean
altogether as well.

With this patch, cpu_physical_memory_set_dirty_range is called
unconditionally, so there need not be anymore a separate call to
xen_modified_memory.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:10:00 +02:00
Paolo Bonzini
58d2707e87 exec: pass client mask to cpu_physical_memory_set_dirty_range
This cuts in half the cost of bitmap operations (which will become more
expensive when made atomic) during migration on non-VRAM regions.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
358653391b translate-all: remove unnecessary argument to tb_invalidate_phys_range
The is_cpu_write_access argument is always 0, remove it.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
845b6214a3 exec: use memory_region_get_dirty_log_mask to optimize dirty tracking
The memory API can now return the exact set of bitmaps that have to
be tracked.  Use it instead of the in_migration variable.

In the next patches, we will also use it to set only DIRTY_MEMORY_VGA
or DIRTY_MEMORY_MIGRATION if necessary.  This can make a difference
for dataplane, especially after the dirty bitmap is changed to use
more expensive atomic operations.

Of some interest is the change to stl_phys_notdirty.  When migration
was introduced, stl_phys_notdirty was changed to effectively behave
as stl_phys during migration.  In fact, if one looks at the function as it
was in the beginning (commit 8df1cd0, physical memory access functions,
2005-01-28), at the time the dirty bitmap was the equivalent of
DIRTY_MEMORY_CODE nowadays; hence, the function simply should not touch
the dirty code bits.  This patch changes it to do the intended thing.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
49dfcec403 ram_addr: tweaks to xen_modified_memory
Invoke xen_modified_memory from cpu_physical_memory_set_dirty_range_nocode;
it is akin to DIRTY_MEMORY_MIGRATION, so set it together with that bitmap.
The remaining call from invalidate_and_set_dirty's "else" branch will go
away soon.

Second, fix the second argument to the function in the
cpu_physical_memory_set_dirty_lebitmap call site.  That function is only used
by KVM, but it is better to be clean anyway.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:59 +02:00
Paolo Bonzini
db94604b20 exec: optimize phys_page_set_level
phys_page_set_level is writing zeroes to a struct that has just been
filled in by phys_map_node_alloc.  Instead, tell phys_map_node_alloc
whether to fill in the page "as a leaf" or "as a non-leaf".

memcpy is faster than struct assignment, which copies each bitfield
individually.  A compiler bug (https://gcc.gnu.org/PR66391), and
small memcpys like this one are special-cased anyway, and optimized
to a register move, so just use the memcpy.

This cuts the cost of phys_page_set_level from 25% to 5% when
booting qboot.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05 17:09:58 +02:00
Paolo Bonzini
41063e1e7a exec: move rcu_read_lock/unlock to address_space_translate callers
Once address_space_translate will be called outside the BQL, the returned
MemoryRegion might disappear as soon as the RCU read-side critical section
ends.  Avoid this by moving the critical section to the callers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1426684909-95030-3-git-send-email-pbonzini@redhat.com>
2015-04-30 16:55:32 +02:00
Peter Maydell
06feaacfb4 - miscellaneous cleanups for TCG (Emilio) and NBD (Bogdan)
- next part in the thread-safe address_space_* saga: atomic access
   to the bounce buffer and the map_clients list, from Fam
 - optional support for linking with tcmalloc, also from Fam
 - reapplying Peter Crosthwaite's "Respect as_translate_internal
   length clamp" after fixing the SPARC fallout.
 - build system fix from Wei Liu
 - small acpi-build and ioport cleanup by myself
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVQJd4AAoJEL/70l94x66DYFYH/3ifhqWZsd4dfJri0CGAHI4i
 SpPmNeouc8W+F/3lwf6Inrh5NnTgd5QzoUBMQaWVkQKwUiWls8g2mXkT3jo0iDqT
 /B40YXnZjNm20MixNaZmk9AsOF6OqPM8EMufau874k5zTlx3tCGAW1QD+I1N7WK7
 DfsFsIUD1svo2prn55fSoitMG1TIVPnpcklb4YGJRbAacQYUDhr5KAIhT1quDR2R
 93BvToyQmPqRQ4YKqnJLp8HAkL4FaJumfFZVvyh2cZvyaYGN/RVdi2Dw985dJDPX
 /z4enE4GCAs4RDw3lZ1RDbiZDqpT2ibFgASg/arX3SxzqHirOGvMdkOjO99r9j4=
 =aLjh
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

- miscellaneous cleanups for TCG (Emilio) and NBD (Bogdan)
- next part in the thread-safe address_space_* saga: atomic access
  to the bounce buffer and the map_clients list, from Fam
- optional support for linking with tcmalloc, also from Fam
- reapplying Peter Crosthwaite's "Respect as_translate_internal
  length clamp" after fixing the SPARC fallout.
- build system fix from Wei Liu
- small acpi-build and ioport cleanup by myself

# gpg: Signature made Wed Apr 29 09:34:00 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (22 commits)
  nbd/trivial: fix type cast for ioctl
  translate-all: use bitmap helpers for PageDesc's bitmap
  target-i386: disable LINT0 after reset
  Makefile.target: prepend $libs_softmmu to $LIBS
  milkymist: do not modify libs-softmmu
  configure: Add support for tcmalloc
  exec: Respect as_translate_internal length clamp
  ioport: reserve the whole range of an I/O port in the AddressSpace
  ioport: loosen assertions on emulation of 16-bit ports
  ioport: remove wrong comment
  ide: there is only one data port
  gus: clean up MemoryRegionPortio
  sb16: remove useless mixer_write_indexw
  sun4m: fix slavio sysctrl and led register sizes
  acpi-build: remove dependency from ram_addr.h
  memory: add memory_region_ram_resize
  dma-helpers: Fix race condition of continue_after_map_failure and dma_aio_cancel
  exec: Notify cpu_register_map_client caller if the bounce buffer is available
  exec: Protect map_client_list with mutex
  linux-user, bsd-user: Remove two calls to cpu_exec_init_all
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-04-30 12:04:11 +01:00
Peter Crosthwaite
23820dbfc7 exec: Respect as_translate_internal length clamp
address_space_translate_internal will clamp the *plen length argument
based on the size of the memory region being queried. The iommu walker
logic in addresss_space_translate was ignoring this by discarding the
post fn call value of *plen. Fix by just always using *plen as the
length argument throughout the fn, removing the len local variable.

This fixes a bootloader bug when a single elf section spans multiple
QEMU memory regions.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-Id: <1426570554-15940-1-git-send-email-peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 18:24:19 +02:00
Fam Zheng
e95205e1f9 dma-helpers: Fix race condition of continue_after_map_failure and dma_aio_cancel
If DMA's owning thread cancels the IO while the bounce buffer's owning thread
is notifying the "cpu client list", a use-after-free happens:

     continue_after_map_failure               dma_aio_cancel
     ------------------------------------------------------------------
     aio_bh_new
                                              qemu_bh_delete
     qemu_bh_schedule (use after free)

Also, the old code doesn't run the bh in the right AioContext.

Fix both problems by passing a QEMUBH to cpu_register_map_client.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1426496617-10702-6-git-send-email-famz@redhat.com>
[Remove unnecessary forward declaration. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 18:24:18 +02:00
Fam Zheng
33b6c2edf6 exec: Notify cpu_register_map_client caller if the bounce buffer is available
The caller's workflow is like

    if (!address_space_map()) {
        ...
        cpu_register_map_client();
    }

If bounce buffer became available after address_space_map() but before
cpu_register_map_client(), the caller could miss it and has to wait for the
next bounce buffer notify, which may never happen in the worse case.

Just notify the list in cpu_register_map_client().

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1426496617-10702-5-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 18:24:18 +02:00
Fam Zheng
38e047b50d exec: Protect map_client_list with mutex
So that accesses from multiple threads are safe.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1426496617-10702-4-git-send-email-famz@redhat.com>
[Remove #if from cpu_exec_init_all. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 18:24:17 +02:00
Fam Zheng
c2cba0ffe4 exec: Atomic access to bounce buffer
There could be a race condition when two processes call
address_space_map concurrently and both want to use the bounce buffer.

Add an in_use flag in BounceBuffer to sync it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1426496617-10702-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 18:24:17 +02:00
Peter Maydell
66b9b43c42 exec.c: Capture the memory attributes for a watchpoint hit
Capture the memory attributes for the transaction which triggered
a watchpoint; this allows CPU specific code to implement features
like ARM's "user-mode only WPs also hit for LDRT/STRT accesses
made from privileged code". This change also correctly passes
through the memory attributes to the underlying device when
a watchpoint access doesn't hit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26 16:49:24 +01:00
Peter Maydell
500131154d exec.c: Add new address_space_ld*/st* functions
Add new address_space_ld*/st* functions which allow transaction
attributes and error reporting for basic load and stores. These
are named to be in line with the address_space_read/write/rw
buffer operations.

The existing ld/st*_phys functions are now wrappers around
the new functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26 16:49:24 +01:00
Peter Maydell
5c9eb0286c exec.c: Make address_space_rw take transaction attributes
Make address_space_rw take transaction attributes, rather
than always using the 'unspecified' attributes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26 16:49:24 +01:00
Peter Maydell
f25a49e005 exec.c: Convert subpage memory ops to _with_attrs
Convert the subpage memory ops to _with_attrs; this will allow
us to pass the attributes through to the underlying access
functions. (Nothing uses the attributes yet.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2015-04-26 16:49:24 +01:00
Peter Maydell
3b64349539 memory: Replace io_mem_read/write with memory_region_dispatch_read/write
Rather than retaining io_mem_read/write as simple wrappers around
the memory_region_dispatch_read/write functions, make the latter
public and change all the callers to use them, since we need to
touch all the callsites anyway to add MemTxAttrs and MemTxResult
support. Delete io_mem_read and io_mem_write entirely.

(All the callers currently pass MEMTXATTRS_UNSPECIFIED
and convert the return value back to bool or ignore it.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26 16:49:23 +01:00
Paolo Bonzini
4025446f0a Revert "exec: Respect as_tranlsate_internal length clamp"
This reverts commit c3c1bb99d1.
It causes problems with boards that declare memory regions shorter
than the registers they contain.

Reported-by: Zoltan Balaton <balaton@eik.bme.hu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-01 10:06:38 +02:00
Paolo Bonzini
f18c69cfc5 exec: avoid possible overwriting of mmaped area in qemu_ram_remap
It is not necessary to munmap an area before remapping it with MAP_FIXED;
if the memory region specified by addr and len overlaps pages of any
existing mapping, then the overlapped part of the existing mapping will
be discarded.

On the other hand, if QEMU does munmap the pages, there is a small
probability that another mmap sneaks in and catches the just-freed
portion of the address space.  In effect, munmap followed by
mmap(MAP_FIXED) is a use-after-free error, and Coverity flags it
as such.  Fix it.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-26 10:43:54 +01:00
Peter Crosthwaite
c3c1bb99d1 exec: Respect as_tranlsate_internal length clamp
address_space_translate_internal will clamp the *plen length argument
based on the size of the memory region being queried. The iommu walker
logic in addresss_space_translate was ignoring this by discarding the
post fn call value of *plen. Fix by just always using *plen as the
length argument throughout the fn, removing the len local variable.

This fixes a bootloader bug when a single elf section spans multiple
QEMU memory regions.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-Id: <1426570554-15940-1-git-send-email-peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-18 12:09:42 +01:00
Peter Maydell
a195fdd028 misc fixes and cleanups
A bunch of fixes all over the place, some of the
 bugs fixed are actually regressions.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVAH/uAAoJECgfDbjSjVRprq0H/iyqLSHQIv6gNOPYQbLXOCv0
 pkCeLx6kTMO9lSwxZcsZvMsYPeiEL3CHRKJcEjq0+Ap0uen0pa2Yl3WzyJcnBcib
 xwkHk/UftFYAiZAzVtd4moXujvVLYNL1ukvr/wPOdIkTEn8U6K3NaT3pLooc369f
 oTyQhlL3E9HJ5S6X0HXJIFwtsOIhPfS3NCLoDFbFjtb9mIsqTx7N5s2C5hctF+ir
 JtyuwPx5oT73WYxoYmjSP6n/Nf5cuJdqtm6o2KijjhWWYMJ6epYVBo/DD6dIFbmJ
 V/23dxpon+lvhae2c2LAVrkiJ1Boon/eMbJK/mNwpFX7vW35ataLPy6pYpaiEJs=
 =RUld
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

misc fixes and cleanups

A bunch of fixes all over the place, some of the
bugs fixed are actually regressions.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed Mar 11 17:48:30 2015 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (25 commits)
  virtio-scsi: remove empty wrapper for cmd
  virtio-scsi: clean out duplicate cdb field
  virtio-scsi: fix cdb/sense size
  uapi/virtio_scsi: allow overriding CDB/SENSE size
  virtio-scsi: drop duplicate CDB/SENSE SIZE
  exec: don't include hw/boards for linux-user
  acpi: specify format for build_append_namestring
  MAINTAINERS: drop aliguori@amazon.com
  tpm: Move memory subregion function into realize function
  virtio-pci: Convert to realize()
  pci: Convert pci_nic_init() to Error to avoid qdev_init()
  machine: query mem-merge machine property
  machine: query dump-guest-core machine property
  hw/boards: make it safe to include for linux-user
  machine: query phandle-start machine property
  machine: query kvm-shadow-mem machine property
  kvm: add machine state to kvm_arch_init
  machine: query kernel-irqchip property
  machine: allowed/required kernel-irqchip support
  machine: replace qemu opts with iommu property
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-03-12 09:13:07 +00:00
Michael S. Tsirkin
4485bd269c exec: don't include hw/boards for linux-user
As noted by Andreas, hw/boards.h shouldn't be used outside softmmu code.
Include it conditionally, and drop the (now unnecessary) ifdef guards in
hw/boards.h

Reported-by: Andreas Färber <afaerber@suse.de>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
2015-03-11 18:24:29 +01:00
Marcel Apfelbaum
75cc7f0183 machine: query mem-merge machine property
Running
    qemu-bin ... -machine pc,mem-merge=on
leads to crash:
    x86_64-softmmu/qemu-system-x86_64 -machine pc,dump-guest-core=on
    qemu-system-x86_64: qemu/util/qemu-option.c:387: qemu_opt_get_bool_helper:
    Assertion `opt->desc && opt->desc->type == QEMU_OPT_BOOL' failed.  Aborted
    (core dumped)

This happens because the commit e79d5a6 ("machine: remove qemu_machine_opts
global list") removed the global option descriptions and moved them to
MachineState's QOM properties.

Fix this by querying machine properties through designated wrappers.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-11 18:19:22 +01:00
Marcel Apfelbaum
47c8ca533e machine: query dump-guest-core machine property
Running
    qemu-bin ... -machine pc,dump-guest-core=on
leads to crash:
    x86_64-softmmu/qemu-system-x86_64 -machine pc,dump-guest-core=on
    qemu-system-x86_64: qemu/util/qemu-option.c:387: qemu_opt_get_bool_helper:
    Assertion `opt->desc && opt->desc->type == QEMU_OPT_BOOL' failed.  Aborted
    (core dumped)

This happens because the commit e79d5a6 ("machine: remove qemu_machine_opts
global list") removed the global option descriptions and moved them to
MachineState's QOM properties.

Fix this by querying machine properties through designated wrappers.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-11 18:17:54 +01:00
Peter Maydell
23a7a28796 - scsi: improvements to error reporting and conversion to realize,
Coverity/sparse fix for iscsi driver
 - RCU fallout: fix -daemonize and s390x system emulation
 - KVM: kvm_stat improvements and new man page
 - x86: SYSRET fix for VxWorks
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJU/sUFAAoJEL/70l94x66D1JwIAJ28Lan2DQwi+xHvNxF8zW6n
 v7eMc04/fepuon0TYmUZC3qbqc00sccEQZQ+yAAauT9epZ/kdSDudDOzG+3F4MuQ
 /X3crXw2/jrhtWedGq49vFCONX4MKoaoudqK8kOFMe1ImQgkOYeAzOoqeFXyHsFh
 jINlKTJZB6oKzrZ+SYryY14cO7pvGaIhyqaCC+6GcVihTjm9Yq13lP1lFj7LsVRV
 aGfd6xH9RSV/mwzvZwD4i3cUWSUaV/wY0NDhAEzDPCUcxX0/nAj3XF1YeJUF30Qd
 ETaCLo/Nxq2R6POK3c/Zm/FRLvjzZ2caD+q1LcwB/bCYdc2lJ1JDxE/hr48ANv0=
 =OWXY
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

- scsi: improvements to error reporting and conversion to realize,
  Coverity/sparse fix for iscsi driver
- RCU fallout: fix -daemonize and s390x system emulation
- KVM: kvm_stat improvements and new man page
- x86: SYSRET fix for VxWorks

# gpg: Signature made Tue Mar 10 10:18:45 2015 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  x86: fix SS selector in SYSRET
  scsi: Convert remaining PCI HBAs to realize()
  scsi: Improve error reporting for invalid drive property
  hw: Propagate errors through qdev_prop_set_drive()
  scsi: Clean up duplicated error in legacy if=scsi code
  cpus: initialize cpu->memory_dispatch
  rcu: handle forks safely
  qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK
  kvm_stat: add kvm_stat.1 man page
  kvm_stat: add column headers to text UI
  iscsi: Fix check for username

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-03-10 18:03:02 +00:00
Paolo Bonzini
cba7054928 cpus: initialize cpu->memory_dispatch
This fixes a NULL pointer dereference in s390x-softmmu.

On pretty much all other architectures, creating an MMIO region calls
cpu_reload_memory_map.  On s390, however, there are no MMIO regions
and everything is done via hypercalls.

Fixes: 9d82b5a792
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-10 10:49:25 +01:00
Gonglei
81b07353c5 Remove superfluous '\n' around error_report()
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-03-10 08:15:33 +03:00
Michael S. Tsirkin
129ddaf31b exec: round up size on MR resize
Block size must fundamentally be a multiple of target page size.
Aligning automatically removes need to worry about the alignment
from callers.

Note: the only caller of qemu_ram_resize (acpi) already happens to have
size padded to a power of 2, but we would like to drop the padding in
ACPI core, and don't want to expose target page size knowledge to ACPI.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <ponzini@redhat.com>
2015-02-26 12:42:20 +01:00
Mike Day
0dc3f44aca Convert ram_list to RCU
Allow "unlocked" reads of the ram_list by using an RCU-enabled QLIST.

The ramlist mutex is kept.  call_rcu callbacks are run with the iothread
lock taken, but that may change in the future.  Writers still take the
ramlist mutex, but they no longer need to assume that the iothread lock
is taken.

Readers of the list, instead, no longer require either the iothread
or ramlist mutex, but they need to use rcu_read_lock() and
rcu_read_unlock().

One place in arch_init.c was downgrading from write side to read side
like this:

    qemu_mutex_lock_iothread()
    qemu_mutex_lock_ramlist()
    ...
    qemu_mutex_unlock_iothread()
    ...
    qemu_mutex_unlock_ramlist()

and the equivalent idiom is:

    qemu_mutex_lock_ramlist()
    rcu_read_lock()
    ...
    qemu_mutex_unlock_ramlist()
    ...
    rcu_read_unlock()

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:31:55 +01:00
Mike Day
0d53d9fe8a exec: convert ram_list to QLIST
QLIST has RCU-friendly primitives, so switch to it.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:20 +01:00
Mike Day
ae3a7047d0 cosmetic changes preparing for the following patches
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:20 +01:00
Paolo Bonzini
43771539d4 exec: protect mru_block with RCU
Hence, freeing a RAMBlock has to be switched to call_rcu.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:19 +01:00
Paolo Bonzini
79e2b9aecc exec: RCUify AddressSpaceDispatch
Note that even after this patch, most callers of address_space_*
functions must still be under the big QEMU lock, otherwise the memory
region returned by address_space_translate can disappear as soon as
address_space_translate returns.  This will be fixed in the next part
of this series.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:19 +01:00
Paolo Bonzini
9d82b5a792 exec: make iotlb RCU-friendly
After the previous patch, TLBs will be flushed on every change to
the memory mapping.  This patch augments that with synchronization
of the MemoryRegionSections referred to in the iotlb array.

With this change, it is guaranteed that iotlb_to_region will access
the correct memory map, even once the TLB will be accessed outside
the BQL.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:19 +01:00
Paolo Bonzini
76e5c76f2e exec: introduce cpu_reload_memory_map
This for now is a simple TLB flush.  This can change later for two
reasons:

1) an AddressSpaceDispatch will be cached in the CPUState object

2) it will not be possible to do tlb_flush once the TCG-generated code
runs outside the BQL.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16 17:30:19 +01:00
Paolo Bonzini
6e48e8f9e0 memory: unregister AddressSpace MemoryListener within BQL
address_space_destroy_dispatch is called from an RCU callback and hence
outside the iothread mutex (BQL).  However, after address_space_destroy
no new accesses can hit the destroyed AddressSpace so it is not necessary
to observe changes to the memory map.  Move the memory_listener_unregister
call earlier, to make it thread-safe again.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Fixes: 374f2981d1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 10:25:44 -07:00
Paolo Bonzini
a904c91196 exec: fix madvise of NULL pointer
Coverity flags this as "dereference after null check".  Not quite a
dereference, since it will just EFAULT, but still nice to fix.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:27:05 +01:00
Peter Maydell
ec53b45bcd exec.c: Drop TARGET_HAS_ICE define and checks
The TARGET_HAS_ICE #define is intended to indicate whether a target-*
guest CPU implementation supports the breakpoint handling. However,
all our guest CPUs have that support (the only two which do not
define TARGET_HAS_ICE are unicore32 and openrisc, and in both those
cases the bp support is present and the lack of the #define is just
a bug). So remove the #define entirely: all new guest CPU support
should include breakpoint handling as part of the basic implementation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1420484960-32365-1-git-send-email-peter.maydell@linaro.org
2015-01-20 15:19:32 +00:00
Peter Maydell
aaf0301917 pc: resizeable ROM blocks
This makes ROM blocks resizeable.  This infrastructure is required for other
 functionality we have queued.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUrme8AAoJECgfDbjSjVRpqmEH/1APnrphAi/CM6rxf2hPyvWj
 f5yQDNXfeGxrHaW5vux6DvgHUkTng6KGBxz6XMSiwul6MeyRFNDqwbfMhSHjiIum
 QkT//jqb5xux60kyTLXuIBTPok1SsKDtaTxbvZb0VmZrnkdYeI2CLa1Mq3cQUY0a
 8DKnchQEM5lic9bxj+OuLiDFx8QYaMpQlUP9iIvNq6GjX+0zNsWvfPtkMTm00t93
 lHKPvD2eVmrgfS5g+lkAwLDahLSjqwDc0YuLABOgDUFsZFz9GAUCHSpt0y8HEBwR
 1NhGCfbnyyRl/1OSULtARGQ4Ddwm5dn1i5I4usoP5rLFS7FV5F7xhBu0IZlwgVA=
 =pFmm
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc: resizeable ROM blocks

This makes ROM blocks resizeable.  This infrastructure is required for other
functionality we have queued.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 08 Jan 2015 11:19:24 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  acpi-build: make ROMs RAM blocks resizeable
  memory: API to allocate resizeable RAM MR
  arch_init: support resizing on incoming migration
  exec: qemu_ram_alloc_resizeable, qemu_ram_resize
  exec: split length -> used_length/max_length
  exec: cpu_physical_memory_set/clear_dirty_range
  memory: add memory_region_set_size

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-01-10 21:02:23 +00:00
Michael S. Tsirkin
62be4e3a50 exec: qemu_ram_alloc_resizeable, qemu_ram_resize
Add API to allocate "resizeable" RAM.
This looks just like regular RAM generally, but
has a special property that only a portion of it
(used_length) is actually used, and migrated.

This used_length size can change across reboots.

Follow up patches will change used_length for such blocks at migration,
making it easier to extend devices using such RAM (notably ACPI,
but in the future thinkably other ROMs) without breaking migration
compatibility or wasting ROM (guest) memory.

Device is notified on resize, so it can adjust if necessary.

qemu_ram_alloc_resizeable allocates this memory, qemu_ram_resize resizes
it.

Note: nothing prevents making all RAM resizeable in this way.
However, reviewers felt that only enabling this selectively will
make some class of errors easier to detect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 13:17:54 +02:00
Michael S. Tsirkin
9b8424d573 exec: split length -> used_length/max_length
This patch allows us to distinguish between two
length values for each block:
    max_length - length of memory block that was allocated
    used_length - length of block used by QEMU/guest

Currently, we set used_length - max_length, unconditionally.
Follow-up patches allow used_length <= max_length.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 13:17:54 +02:00
Michael S. Tsirkin
c8d6f66ae7 exec: cpu_physical_memory_set/clear_dirty_range
Make cpu_physical_memory_set/clear_dirty_range
behave symmetrically.

To clear range for a given client type only, add
cpu_physical_memory_clear_dirty_range_type.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 13:17:54 +02:00
Paolo Bonzini
ff6cff7554 exec: allows 8-byte accesses in subpage_ops
Otherwise fw_cfg accesses are split into 4-byte ones before they reach the
fw_cfg ops / handlers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1419250305-31062-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-12-22 23:39:17 +00:00
Paolo Bonzini
adee64249e exec: change default exception_index value for migration to -1
In QEMU 2.2 the exception_index value was added to the migration stream
through a subsection.  The default was set to 0, which is wrong and
should have been -1.

However, 2.2 does not have commit e511b4d (cpu-exec: reset exception_index
correctly, 2014-11-26), hence in 2.2 the exception_index is never used
and is set to -1 on the next call to cpu_exec.  So we can change the
migration stream to make the default -1.  The effects are:

- 2.2.1 -> 2.2.0: cpu->exception_index set incorrectly to 0 if it
were -1 on the source; then reset to -1 in cpu_exec.  This is TCG
only; KVM does not use exception_index.

- 2.2.0 -> 2.2.1: cpu->exception_index set incorrectly to -1 if it
were 0 on the source; but it would be reset to -1 in cpu_exec anyway.
This is TCG only; KVM does not use exception_index.

- 2.2.1 -> 2.1: two bugs fixed: 1) can migrate backwards if
cpu->exception_index is set to -1; 2) should not migrate backwards
(but 2.2.0 allows it) if cpu->exception_index is set to 0

- 2.2.0 -> 2.3.0: 2.2.0 will send the subsection unnecessarily if
exception_index is -1, but that is not a problem.  2.3.0 will set
cpu->exception_index to -1 if it is 0 on the source, but this would
be anyway a problem for 2.2.0 -> 2.2.x migration (due to lack of
commit e511b4d in 2.2.x) so we can ignore it

- 2.2.1 -> 2.3.0: everything works.

In addition, play it safe and never send the subsection unless TCG
is in use.  KVM does not use exception_index (PPC KVM stores values
in it for use in the subsequent call to ppc_cpu_do_interrupt, but
does not need it as soon as kvm_handle_debug returns).  Xen and
qtest do not run any code for the CPU at all.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1418989994-17244-3-git-send-email-pbonzini@redhat.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-12-20 20:38:07 +00:00
Michael S. Tsirkin
1240be2435 exec: add wrapper for host pointer access
host pointer accesses force pointer math, let's
add a wrapper to make them safer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2014-12-16 17:47:35 +05:30
Igor Mammedov
a2b257d621 memory: expose alignment used for allocating RAM as MemoryRegion API
introduce memory_region_get_alignment() that returns
underlying memory block alignment or 0 if it's not
relevant/implemented for backend.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-11-23 12:11:30 +02:00
Peter Maydell
f874bf905f exec: Handle multipage ranges in invalidate_and_set_dirty()
The code in invalidate_and_set_dirty() needs to handle addr/length
combinations which cross guest physical page boundaries. This can happen,
for example, when disk I/O reads large blocks into guest RAM which previously
held code that we have cached translations for. Unfortunately we were only
checking the clean/dirty status of the first page in the range, and then
were calling a tb_invalidate function which only handles ranges that don't
cross page boundaries. Fix the function to deal with multipage ranges.

The symptoms of this bug were that guest code would misbehave (eg segfault),
in particular after a guest reboot but potentially any time the guest
reused a page of its physical RAM for new code.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416167061-13203-1-git-send-email-peter.maydell@linaro.org
2014-11-18 10:19:12 +00:00
Max Filippov
07e2863d02 exec.c: fix setting 1-byte-long watchpoints
With commit 05068c0dfb 'exec.c: Relax restrictions on watchpoint length
and alignment' it's no longer possible to set 1-byte-long watchpoint
because of incorrect address range check.
Fix that by changing condition that checks for address wraparound.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1411016616-29879-1-git-send-email-jcmvbkbc@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-19 17:42:16 +01:00
Peter Maydell
cc35a44cf7 Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
* remotes/qmp-unstable/queue/qmp:
  exec: file_ram_alloc(): print error when prealloc fails
  monitor: fix debug print compiling error

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-15 19:44:34 +01:00
Peter Maydell
2b31cd4e08 - Memory: improve error reporting and avoid crashes on hotplug
- Build: fixing block/iscsi.so and ranlib warnings on Mac OS X
 - Migration fixes for x86
 - The odd KVM patch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJUEXeWAAoJEBvWZb6bTYby4AwP/0Hh55A7QzkkzZ66y65zM+G5
 dsgRcLjufHSRQHoNQqm6LOcicV3Ygc/X644EY6jnZCZxFh/fsWuTPqUDGxLAnxEc
 2V0PkLRIScAMOPezzxvRy6/9hkG+UYM3ZOL5D9yxA9pGuBtttw7tkts19Vqf9WZc
 NYG5TBDuEGM1c596Zpo7t10m+Oiw+Jyi5luLXsb4lh5ikdFPDrtJaf0AnFvR+ym0
 HXlj2K/0vHNowUeLoo+oWnZsW8mLE6OyJhgfo1tJtsH1BR+lQJnBnQ4moq4Sl/Wz
 +iht/4gtz34XwLILokFR6yiNrPe+MIryyv+FYxOD5loIdGVDtKMx30UkIE2/D933
 6/n5i3GBLi9JapeT9gkKTxk/UVRPzJ1PK07RWevgNZNQyTGKAUGp+p48nSzMYX7V
 7GFSy3Q8uqOR8g9n+t+RURxkoMNbhhw7v53Z3PPXPCALCMDzg9RARlW/nkfiExcZ
 oThUjE/8xfMTQlN1SO5HTyQXEkYjtknZhfC7/KFvkWYMbCG0KBTf212Md0zlTNkj
 +C6r8Gq4ZWVIc07QyKkoCMxB+a9Uhvy4T1PKuSlm6iu94zUgZRhdf/PlOXimhFqH
 9GL67Tv15kpj05xCS6jDXjeMZ416/UKw91OcsiT1UUHcq7/rc+GBycd0ngV1UgnQ
 di5V12IVt8JwdzFxMeCT
 =GIKW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

- Memory: improve error reporting and avoid crashes on hotplug
- Build: fixing block/iscsi.so and ranlib warnings on Mac OS X
- Migration fixes for x86
- The odd KVM patch.

# gpg: Signature made Thu 11 Sep 2014 11:21:10 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream: (21 commits)
  gdbstub: init mon_chr through qemu_chr_alloc
  pckbd: adding new fields to vmstate
  mc146818rtc: add missed field to vmstate
  piix: do not set irq while loading vmstate
  serial: fixing vmstate for save/restore
  parallel: adding vmstate for save/restore
  fdc: adding vmstate for save/restore
  cpu: init vmstate for ticks and clock offset
  apic_common: vapic_paddr synchronization fix
  vl: use QLIST_FOREACH_SAFE to visit change state handlers
  exec: add parameter errp to gethugepagesize
  exec: report error when memory < hpagesize
  hostmem-ram: don't exit qemu if size of memory-backend-ram is way too big
  memory: add parameter errp to memory_region_init_rom_device
  memory: add parameter errp to memory_region_init_ram
  exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr
  rules.mak: Fix DSO build by pulling in archive symbols
  util: Don't link host-utils.o if it's empty
  util: Move general qemu_getauxval to util/getauxval.c
  trace: Only link generated-tracers.o with "simple" backend
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-12 16:55:49 +01:00
Luiz Capitulino
e4d9df4fb1 exec: file_ram_alloc(): print error when prealloc fails
If memory allocation fails when using the -mem-prealloc command-line
option, QEMU exits without printing any error information to
the user:

 # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
 # echo $?
 1

This commit adds an error message, so that we print instead:

 # qemu [...] -m 1G -mem-prealloc -mem-path /dev/hugepages
 qemu: unable to map backing store for hugepages: Cannot allocate memory

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-09-12 11:22:21 -04:00
Peter Maydell
08225676b2 exec.c: Record watchpoint fault address and direction
When we check whether we've hit a watchpoint we know the address
that we were attempting to access and whether it was a read or a
write. Record this information in the CPUWatchpoint struct so that
target-specific code can report it to the guest.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-09-12 14:06:48 +01:00
Peter Maydell
3ee887e8ff exec.c: Provide full set of dummy wp remove functions in user-mode
We already provide dummy versions of the cpu_watchpoint_insert
and cpu_watchpoint_remove_all functions when CONFIG_USER_ONLY
is defined. Complete the set by providing cpu_watchpoint_remove
and cpu_watchpoint_remove_by_ref as well.

This allows target-* code using these functions to avoid
some ifdeffery.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-09-12 14:06:48 +01:00
Peter Maydell
05068c0dfb exec.c: Relax restrictions on watchpoint length and alignment
The current implementation of watchpoints requires that they
have a power of 2 length which is not greater than TARGET_PAGE_SIZE
and that their address is a multiple of their length. Watchpoints
on ARM don't fit these restrictions, so change the implementation
so they can be relaxed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-09-12 14:06:48 +01:00
Hu Tao
fc7a5800ad exec: add parameter errp to gethugepagesize
Add parameter errp to gethugepagesize thus callers can handle errors.

If user adds a memory-backend-file object using object_add command,
specifying a non-existing directory for property mem-path, qemu will
core dump with message:

  /nonexistingdir: No such file or directory
  Bad ram offset fffffffffffff000
  Aborted (core dumped)

This patch fixes the problem. With this patch, qemu reports an error
message like:

  qemu-system-x86_64: -object memory-backend-file,mem-path=/nonexistingdir,id=mem-file0,size=128M:
  failed to get page size of file /nonexistingdir: No such file or directory

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-09 13:41:44 +02:00
Hu Tao
557529dd60 exec: report error when memory < hpagesize
Report an error when memory < hpagesize in file_ram_alloc() so callers
can handle the error.

If user adds a memory-backend-file object using object_add command,
specifying a size that is less than huge page size, qemu will core dump
with message:

  Bad ram offset fffffffffffff000
  Aborted (core dumped)

This patch fixes the problem. With this patch, qemu reports error
message like:

  qemu-system-x86_64: -object memory-backend-file,mem-path=/hugepages,id=mem-file0,size=1M: memory
  size 0x100000 must be equal to or larger than huge page size 0x200000

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-09 13:41:44 +02:00
Hu Tao
ef701d7b6f exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr
Add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr so that
we can handle errors.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
[Assert ptr != NULL in memory_region_init_ram_ptr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-09 13:41:25 +02:00
Pavel Dovgaluk
6c3bff0ed8 exec: Save CPUState::exception_index field
This patch adds a subsection with exception_index field to the VMState for
correct saving the CPU state.
Without this patch, simulator could miss the pending exception in the saved
virtual machine state.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-09-05 16:32:48 +02:00
Le Tan
8d7b8cb9c2 iommu: add is_write as a parameter to the translate function of MemoryRegionIOMMUOps
Add a bool variable is_write as a parameter to the translate function of
MemoryRegionIOMMUOps to indicate the operation of the access. It can be
used for correct fault reporting from within the callback.
Change the interface of related functions.

Signed-off-by: Le Tan <tamlokveer@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-08-28 23:10:22 +02:00
Peter Maydell
0e4a773705 SCSI changes that enable sending vendor-specific commands via virtio-scsi.
Memory changes for QOMification and automatic tracking of MR lifetime.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT8et9AAoJEBvWZb6bTYbyIJAQAI3AlLSe27xWoUGfQUgWH30z
 Rt/pShHz3BJMfQpD79JfTH8u6uBpkQmKtflerNT7FhXN9ULDzNq+b/jRtke8nkuy
 ctCt05FhhK00rfWpUoRue4XiCuvbizBU7MK0DI3yCyNdXQyYnFvgnvsJtlqox8Zh
 J5HZcBJEmdCiWBxq7UPk0qBitp4PqNoy7jlD/Ex3m7fJN5WK2cyspQIT9zmhehVn
 B8Nwp+RitDDbXbwm0r18col5rFr/6Nj6+dW1gr+7sVJDLNsmJEqC2l3Kgk0wbPkG
 Uqwbih29me9PC9/L1VLGHY0ApKDQ8JGE0GrYgEg162hbhoxEHkjjoHMhDUfV6Pj8
 NkqcjjWl11UUhgkNqrGafayXbBVnOiEglxy8uXCeq14y9Xd/gjK9Fz6MQvRSOjms
 PFmaKknhdmpxh0DuZmTix7WBmKim8zOiCE0/vrAPvwx5L+d1bn5xh6yQvtVjBMpU
 Sru3Mhdm9bL9dUDBgOM/G6WCxSTVLBlExOblcYkQh03MfabD7bfplcrKYPXt5ull
 Y8YLjqkoIfoy5t0ErvtlpdBJjeEz99JXU+wLQ6NYHnzwzTV+oUtSaEph14mAFOcY
 XkFKdoPDI9PnyEfvy4193du8z/dSbhu7sWgHWbTCQyrcaNnSaVhlH43NUC+p23YN
 8vfEsVLd1X7MFkDBUmWp
 =M+/m
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

SCSI changes that enable sending vendor-specific commands via virtio-scsi.

Memory changes for QOMification and automatic tracking of MR lifetime.

# gpg: Signature made Mon 18 Aug 2014 13:03:09 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream:
  mtree: remove write-only field
  memory: Use canonical path component as the name
  memory: Use memory_region_name for name access
  memory: constify memory_region_name
  exec: Abstract away ref to memory region names
  loader: Abstract away ref to memory region names
  tpm_tis: remove instance_finalize callback
  memory: remove memory_region_destroy
  memory: convert memory_region_destroy to object_unparent
  ioport: split deletion and destruction
  nic: do not destroy memory regions in cleanup functions
  vga: do not dynamically allocate chain4_alias
  sysbus: remove unused function sysbus_del_io
  qom: object: move unparenting to the child property's release callback
  qom: object: delete properties before calling instance_finalize
  virtio-scsi: implement parse_cdb
  scsi-block, scsi-generic: implement parse_cdb
  scsi-block: extract scsi_block_is_passthrough
  scsi-bus: introduce parse_cdb in SCSIDeviceClass and SCSIBusInfo
  scsi-bus: prepare scsi_req_new for introduction of parse_cdb

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-08-19 13:00:57 +01:00
Peter Crosthwaite
83234bf2fa exec: Abstract away ref to memory region names
Use the function provided rather than spying on the struct.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-18 12:06:21 +02:00
Paolo Bonzini
6886867e98 exec: fix migration with devices that use address_space_rw
Devices that use address_space_rw to write large areas to memory
(as opposed to address_space_map/unmap) were broken with respect
to migration since fe680d0 (exec: Limit translation limiting in
address_space_translate to xen, 2014-05-07).  Such devices include
IDE CD-ROMs.

The reason is that invalidate_and_set_dirty (called by address_space_rw
but not address_space_map/unmap) was only setting the dirty bit for
the first page in the translation.

To fix this, introduce cpu_physical_memory_set_dirty_range_nocode that
is the same as cpu_physical_memory_set_dirty_range except it does not
muck with the DIRTY_MEMORY_CODE bitmap.  This function can be used if
the caller invalidates translations with tb_invalidate_phys_page_range.

There is another difference between cpu_physical_memory_set_dirty_range
and cpu_physical_memory_set_dirty_flag; the former includes a call
to xen_modified_memory.  This is handled separately in
invalidate_and_set_dirty, and is not needed in other callers of
cpu_physical_memory_set_dirty_range_nocode, so leave it alone.

Just one nit: now that invalidate_and_set_dirty takes care of handling
multiple pages, there is no need for address_space_unmap to wrap it
in a loop.  In fact that loop would now be O(n^2).

Reported-by: Dave Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-22 10:38:50 +02:00
Paolo Bonzini
1f6245e5ab memory: do not give a name to the internal exec.c regions
There is no need to have them visible under /machine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-01 10:20:41 +02:00
Peter Crosthwaite
b4fefef9d5 memory: MemoryRegion: QOMify
QOMify memory regions as an Object. The former init() and destroy()
routines become instance_init() and instance_finalize() resp.

memory_region_init() is re-implemented to be:
object_initialize() + set fields

memory_region_destroy() is re-implemented to call unparent().

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
[Add newly-created MR as child, unparent on destruction. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-01 10:20:41 +02:00