When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.
Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Message-id: 20170303113255.28262-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
icount has become much slower after tcg_cpu_exec has stopped
using the BQL. There is also a latent bug that is masked by
the slowness.
The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
timer now has to wake up the I/O thread and wait for it. The rendez-vous
is mediated by the BQL QemuMutex:
- handle_icount_deadline wakes up the I/O thread with BQL taken
- the I/O thread wakes up and waits on the BQL
- the VCPU thread releases the BQL a little later
- the I/O thread raises an interrupt, which calls qemu_cpu_kick
- the VCPU thread notices the interrupt, takes the BQL to
process it and waits on it
All this back and forth is extremely expensive, causing a 6 to 8-fold
slowdown when icount is turned on.
One may think that the issue is that the VCPU thread is too dependent
on the BQL, but then the latent bug comes in. I first tried removing
the BQL completely from the x86 cpu_exec, only to see everything break.
The only way to fix it (and make everything slow again) was to add a dummy
BQL lock/unlock pair.
This is because in -icount mode you really have to process the events
before the CPU restarts executing the next instruction. Therefore, this
series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in
the vCPU thread when running in icount mode.
The required changes include:
- make the timer notification callback wake up TCG's single vCPU thread
when run from another thread. By using async_run_on_cpu, the callback
can override all_cpu_threads_idle() when the CPU is halted.
- move handle_icount_deadline after qemu_tcg_wait_io_event, so that
the timer notification callback is invoked after the dummy work item
wakes up the vCPU thread
- make handle_icount_deadline run the timers instead of just waking the
I/O thread.
- stop processing the timers in the main loop
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no change for now, because the callback just invokes
qemu_notify_event.
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This dependency is the wrong way, and we will need util/qemu-timer.h from
sysemu/cpus.h in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the first timer is exactly at the current value of the clock, the
deadline is met and the timer should fire. This fixes itself on the next
iteration of the loop without icount; with icount, however, execution
of instructions will stop exactly at the deadline and won't proceed.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suramya Shah <shah.suramya@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170310163948.7567-1-shah.suramya@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using "-mem-prealloc" option for a large guest leads to higher guest
start-up and migration time. This is because with "-mem-prealloc" option
qemu tries to map every guest page (create address translations), and
make sure the pages are available during runtime. virsh/libvirt by
default, seems to use "-mem-prealloc" option in case the guest is
configured to use huge pages. The patch tries to map all guest pages
simultaneously by spawning multiple threads. Currently limiting the
change to QEMU library functions on POSIX compliant host only, as we are
not sure if the problem exists on win32. Below are some stats with
"-mem-prealloc" option for guest configured to use huge pages.
------------------------------------------------------------------------
Idle Guest | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB | 54m11.796s | 75m43.843s
64 Core - 1TB | 8m56.576s | 14m29.049s
64 Core - 256GB | 2m11.245s | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB | 5m1.027s | 34m10.565s
64 Core - 1TB | 1m10.366s | 8m28.188s
64 Core - 256GB | 0m19.040s | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB | 1m58.970s | 31m43.400s
64 Core - 1TB | 0m39.885s | 7m55.289s
64 Core - 256GB | 0m11.960s | 2m0.135s
-----------------------------------------------------------------------
Changed in v2:
- modify number of memset threads spawned to min(smp_cpus, 16).
- removed 64GB memory restriction for spawning memset threads.
Changed in v3:
- limit number of threads spawned based on
min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
- implement memset thread specific siglongjmp in SIGBUS signal_handler.
Changed in v4
- remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
as main thread no longer touches any pages.
- simplify code my returning memset_thread_failed status from
touch_all_pages.
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Additionally permit non-negative integers as key components. A
dictionary's keys must either be all integers or none. If all keys
are integers, convert the dictionary to a list. The set of keys must
be [0,N].
Examples:
* list.1=goner,list.0=null,list.1=eins,list.2=zwei
is equivalent to JSON [ "null", "eins", "zwei" ]
* a.b.c=1,a.b.0=2
is inconsistent: a.b.c clashes with a.b.0
* list.0=null,list.2=eins,list.2=zwei
has a hole: list.1 is missing
Similar design flaw as for objects: there is no way to denote an empty
list. While interpreting "key absent" as empty list seems natural
(removing a list member from the input string works when there are
multiple ones, so why not when there's just one), it doesn't work:
"key absent" already means "optional list absent", which isn't the
same as "empty list present".
Update the keyval object visitor to use this a.0 syntax in error
messages rather than the usual a[0].
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-25-git-send-email-armbru@redhat.com>
[Off-by-one fix squashed in, as per Kevin's review]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Until now, key components are separated by '.'. This leaves little
room for evolving the syntax, and is incompatible with the __RFQDN_
prefix convention for downstream extensions.
Since key components will be commonly used as QAPI member names by the
QObject input visitor, we can just as well borrow the QAPI naming
rules here: letters, digits, hyphen and period starting with a letter,
with an optional __RFQDN_ prefix for downstream extensions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1488317230-26248-20-git-send-email-armbru@redhat.com>
keyval_parse() parses KEY=VALUE,... into a QDict. Works like
qemu_opts_parse(), except:
* Returns a QDict instead of a QemuOpts (d'oh).
* Supports nesting, unlike QemuOpts: a KEY is split into key
fragments at '.' (dotted key convention; the block layer does
something similar on top of QemuOpts). The key fragments are QDict
keys, and the last one's value is updated to VALUE.
* Each key fragment may be up to 127 bytes long. qemu_opts_parse()
limits the entire key to 127 bytes.
* Overlong key fragments are rejected. qemu_opts_parse() silently
truncates them.
* Empty key fragments are rejected. qemu_opts_parse() happily
accepts empty keys.
* It does not store the returned value. qemu_opts_parse() stores it
in the QemuOptsList.
* It does not treat parameter "id" specially. qemu_opts_parse()
ignores all but the first "id", and fails when its value isn't
id_wellformed(), or duplicate (a QemuOpts with the same ID is
already stored). It also screws up when a value contains ",id=".
* Implied value is not supported. qemu_opts_parse() desugars "foo" to
"foo=on", and "nofoo" to "foo=off".
* An implied key's value can't be empty, and can't contain ','.
I intend to grow this into a saner replacement for QemuOpts. It'll
take time, though.
Note: keyval_parse() provides no way to do lists, and its key syntax
is incompatible with the __RFQDN_ prefix convention for downstream
extensions, because it blindly splits at '.', even in __RFQDN_. Both
issues will be addressed later in the series.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYuOEEAAoJEGw4ysog2bOSHD0P/jBg/qr/4KnsB1KhnlVrB2sP
vy2d3bGGlUWr9Z+CK/PMCRB8ekFgQLjidLIXji6mviUocv6m3WsVrnbLF/oOL/IT
NPMVAffw7q804YVu1Ns9R82d6CIqHTy//bpg69tFMcJmhL9fqPan3wTZZ9JeiyAm
SikqkAHBSW4SxKqg8ApaSqx5L2QTqyfkClR0sLmgM0JtmfJrbobpQ6bMtdPjUZ9L
n2gnpO2vaWCa1SEQrRrdELqvcD8PHkSJapWOBXOkpGWxoeov/PYxOgkpdDUW4qYY
lVLtp1Vd3OB/h3Unqfw32DNiHA5p89hWPX5UybKMgRVL9Cv2/lyY47pcY8XTeNzn
bv84YRbFJeI+GgoEnghmtq+IM8XiW/cr9rWm9wATKfKGcmmFauumALrsffUpHVCM
4hSNgBv5t2V9ptZ+MDlM/Ku+zk9GoqwQ+hemdpVtiyhOtGUPGFBn5YLE4c2DHFxV
+L9JtBnFn8obnssNoz0wL+QvZchT1qUHMhH5CWAanjw9CTDp/YwQ2P01zK+00s9d
4cB7fUG3WNto5eXXEGMaXeDsUEz8z//hTe3j5sVbnHsXi0R3dhv7iryifmx4bUKU
H9EwAc+uNUHbvBy7u6IWg0I8P2n00CCO6JqXijQ92zELJ5j0XhzHUI2dOXn+zyEo
3FZu56LFnSSUBEXuTjq4
=PcNw
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' into staging
ppc patch queuye for 2017-03-03
This will probably be my last pull request before the hard freeze. It
has some new work, but that has all been posted in draft before the
soft freeze, so I think it's reasonable to include in qemu-2.9.
This batch has:
* A substantial amount of POWER9 work
* Implements the legacy (hash) MMU for POWER9
* Some more preliminaries for implementing the POWER9 radix
MMU
* POWER9 has_work
* Basic POWER9 compatibility mode handling
* Removal of some premature tests
* Some cleanups and fixes to the existing MMU code to make the
POWER9 work simpler
* A bugfix for TCG multiply adds on power
* Allow pseries guests to access PCIe extended config space
This also includes a code-motion not strictly in ppc code - moving
getrampagesize() from ppc code to exec.c. This will make some future
VFIO improvements easier, Paolo said it was ok to merge via my tree.
# gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170303:
target/ppc: rewrite f[n]m[add,sub] using float64_muladd
spapr: Small cleanup of PPC MMU enums
spapr_pci: Advertise access to PCIe extended config space
target/ppc: Rework hash mmu page fault code and add defines for clarity
target/ppc: Move no-execute and guarded page checking into new function
target/ppc: Add execute permission checking to access authority check
target/ppc: Add Instruction Authority Mask Register Check
hw/ppc/spapr: Add POWER9 to pseries cpu models
target/ppc/POWER9: Add cpu_has_work function for POWER9
target/ppc/POWER9: Add POWER9 pa-features definition
target/ppc/POWER9: Add POWER9 mmu fault handler
target/ppc: Don't gen an SDR1 on POWER9 and rework register creation
target/ppc: Add patb_entry to sPAPRMachineState
target/ppc/POWER9: Add POWERPC_MMU_V3 bit
powernv: Don't test POWER9 CPU yet
exec, kvm, target-ppc: Move getrampagesize() to common code
target/ppc: Add POWER9/ISAv3.00 to compat_table
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The cast is there because sigbus_handler is invoked via sigfd_handler.
But it feels just wrong to use struct qemu_signalfd_siginfo in the
prototype of a function that is passed to sigaction.
Instead, do a simple-minded conversion of qemu_signalfd_siginfo to
siginfo_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
getrampagesize() returns the largest supported page size and mainly
used to know if huge pages are enabled.
However is implemented in target-ppc/kvm.c and not available
in TCG or other architectures.
This renames and moves gethugepagesize() to mmap-alloc.c where
fd-based analog of it is already implemented. This renames and moves
getrampagesize() to exec.c as it seems to be the common place for
helpers like this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Similarly to allocation, do it from an inline function. This allows
tests to only use the headers for allocation/free of timer.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and
made the error message for negative size worse. Fix that.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
parse_option_size()'s checking for overflow and trailing crap is
wrong. Has always been that way. qemu_strtosz() gets it right, so
use that.
This adds support for size suffixes 'P', 'E', and ignores case for all
suffixes, not just 'k'.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-25-git-send-email-armbru@redhat.com>
This will permit its use in parse_option_size().
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-24-git-send-email-armbru@redhat.com>
This makes qemu_strtosz(), qemu_strtosz_mebi() and
qemu_strtosz_metric() similar to qemu_strtoi64(), except negative
values are rejected.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-23-git-send-email-armbru@redhat.com>
Change the qemu_strtosz() & friends to return -EINVAL when @endptr is
null and the conversion doesn't consume the string completely.
Matches how qemu_strtol() & friends work.
Only test_qemu_strtosz_simple() passes a null @endptr. No functional
change there, because its conversion consumes the string.
Simplify callers that use @endptr only to fail when it doesn't point
to '\0' to pass a null @endptr instead.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:Block layer core)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1487708048-2131-22-git-send-email-armbru@redhat.com>
Writing QEMU_STRTOSZ_DEFSUFFIX_* instead of '*' gains nothing. Get
rid of these eyesores.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-18-git-send-email-armbru@redhat.com>
Most callers of qemu_strtosz_suffix() pass QEMU_STRTOSZ_DEFSUFFIX_B.
Capture the pattern in new qemu_strtosz().
Inline qemu_strtosz_suffix() into its only remaining caller.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-17-git-send-email-armbru@redhat.com>
With qemu_strtosz(), no suffix means mebibytes. It's used rarely.
I'm going to add a similar function where no suffix means bytes.
Rename qemu_strtosz() to qemu_strtosz_MiB() to make the name
qemu_strtosz() available for the new function.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-16-git-send-email-armbru@redhat.com>
To parse numbers with metric suffixes, we use
qemu_strtosz_suffix_unit(nptr, &eptr, QEMU_STRTOSZ_DEFSUFFIX_B, 1000)
Capture this in a new function for legibility:
qemu_strtosz_metric(nptr, &eptr)
Replace test_qemu_strtosz_suffix_unit() by test_qemu_strtosz_metric().
Rename qemu_strtosz_suffix_unit() to do_strtosz() and give it internal
linkage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-15-git-send-email-armbru@redhat.com>
parse_option_number() fails to check for these errors after
strtoull(). Has always been broken. Fix that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-10-git-send-email-armbru@redhat.com>
Reorder check_strtox_error() to make it obvious that we always store
through a non-null @endptr.
Transform
if (some error) {
error case ...
err = value for error case;
} else {
normal case ...
err = value for normal case;
}
return err;
to
if (some error) {
error case ...
return value for error case;
}
normal case ...
return value for normal case;
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-9-git-send-email-armbru@redhat.com>
Name same things the same, different things differently.
* qemu_strtol()'s parameter @nptr is called @p in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s parameter @endptr is called @next in
check_strtox_error(). Rename the latter.
* qemu_strtol()'s variable @p is called @endptr in
check_strtox_error(). Rename both to @ep.
* qemu_strtol()'s variable @err is *negative* errno,
check_strtox_error()'s parameter @err is *positive*. Rename the
latter to @libc_errno.
Same for qemu_strtoul(), qemu_strtoi64(), qemu_strtou64(), of course.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-8-git-send-email-armbru@redhat.com>
The name qemu_strtoll() suggests conversion to long long, but it
actually converts to int64_t. Rename to qemu_strtoi64().
The name qemu_strtoull() suggests conversion to unsigned long long,
but it actually converts to uint64_t. Rename to qemu_strtou64().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-7-git-send-email-armbru@redhat.com>
Fixes the following documentation bugs:
* Fails to document that null @nptr is safe.
* Fails to document that we return -EINVAL when no conversion could be
performed (commit 47d4be1).
* Confuses long long with int64_t, and unsigned long long with
uint64_t.
* Claims the unsigned conversions can underflow. They can't.
While there, mark problematic assumptions that int64_t is long long,
and uint64_t is unsigned long long with FIXME comments.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1487708048-2131-6-git-send-email-armbru@redhat.com>
Plenty of code relies on QemuOpt member @str not being null, including
qemu_opts_print(), qemu_opts_to_qdict(), and callbacks passed to
qemu_opt_foreach().
Begs the question whether it can be null. Only opt_set() creates
QemuOpt. It sets member @str to its argument @value. Passing null
for @value would plant a time bomb. Callers:
* opts_do_parse() can't pass null.
* qemu_opt_set() passes its argument @value. Callers:
- qemu_opts_from_qdict_1() can't pass null
- qemu_opts_set() passes its argument @value, but none of its
callers pass null.
- Many more outside qemu-option.c, but they shouldn't pass null,
either.
Assert member @str isn't null, so that misuse is caught right away.
Simplify parse_option_bool(), parse_option_number() and
parse_option_size() accordingly. Best viewed with whitespace changes
ignored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1487708048-2131-3-git-send-email-armbru@redhat.com>
This adds a CoMutex around the existing CoQueue. Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep. What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails. With CoMutex there is no such latency but you still want to
avoid wait and wakeup. So introduce it artificially.
This only works with one waiters; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pull the increment/decrement pair out of aio_bh_poll and into the
callers.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-18-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs.
Extract the dispatching loop for file descriptor handlers into a new
function aio_dispatch_handlers, and then inline aio_dispatch into
aio_poll.
aio_dispatch can now become void.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-17-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-16-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-15-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This covers both file descriptor callbacks and polling callbacks,
since they execute related code.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-14-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-13-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The AioContext data structures are now protected by list_lock and/or
they are walked with FOREACH_RCU primitives. There is no need anymore
to acquire the AioContext for the entire duration of aio_dispatch.
Instead, just acquire it before and after invoking the callbacks.
The next step is then to push it further down.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-12-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As a small step towards the introduction of multiqueue, we want
coroutines to remain on the same AioContext that started them,
unless they are moved explicitly with e.g. aio_co_schedule. This patch
avoids that coroutines switch AioContext when they use a CoMutex.
For now it does not make much of a difference, because the CoMutex
is not thread-safe and the AioContext itself is used to protect the
CoMutex from concurrent access. However, this is going to change.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-9-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aio_co_wake provides the infrastructure to start a coroutine on a "home"
AioContext. It will be used by CoMutex and CoQueue, so that coroutines
don't jump from one context to another when they go to sleep on a
mutex or waitqueue. However, it can also be used as a more efficient
alternative to one-shot bottom halves, and saves the effort of tracking
which AioContext a coroutine is running on.
aio_co_schedule is the part of aio_co_wake that starts a coroutine
on a remove AioContext, but it is also useful to implement e.g.
bdrv_set_aio_context callbacks.
The implementation of aio_co_schedule is based on a lock-free
multiple-producer, single-consumer queue. The multiple producers use
cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom
half) grabs all items added so far, inverts the list to make it FIFO,
and goes through it one item at a time until it's empty. The data
structure was inspired by OSv, which uses it in the very code we'll
"port" to QEMU for the thread-safe CoMutex.
Most of the new code is really tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
AioContext is fairly self contained, the only dependency is QEMUTimer but
that in turn doesn't need anything else. So move them out of block-obj-y
to avoid introducing a dependency from io/ to block-obj-y.
main-loop and its dependency iohandler also need to be moved, because
later in this series io/ will call iohandler_get_aio_context.
[Changed copyright "the QEMU team" to "other QEMU contributors" as
suggested by Daniel Berrange and agreed by Paolo.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213135235.12274-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The member VMStateField.start is used for two things, partial data
migration for VBUFFER data (basically provide migration for a
sub-buffer) and for locating next in QTAILQ.
The implementation of the VBUFFER feature is broken when VMSTATE_ALLOC
is used. This however goes unnoticed because actually partial migration
for VBUFFER is not used at all.
Let's consolidate the usage of VMStateField.start by removing support
for partial migration for VBUFFER.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170203175217.45562-1-pasic@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
To iterate over all QemuOpts currently requires using a callback
function which is inconvenient for control flow. Add support for
using iterator functions more directly
QemuOptsIter iter;
QemuOpt *opt;
qemu_opts_iter_init(&iter, opts, "repeated-key");
while ((opt = qemu_opts_iter_next(&iter)) != NULL) {
....do something...
}
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170203120649.15637-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Implements 128-bit left shift and right shift as well as their
testcases. By design, shift silently mods by 128, so the caller is
responsible to assert the shift range if necessary.
Left shift sets the overflow flag if any non-zero digit is shifted out.
Examples:
ulshift(&low, &high, 250, &overflow);
equivalent: n << 122
urshift(&low, &high, -2);
equivalent: n << 126
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[dwg: Added test-shift128 to .gitignore]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is not possible to implement functions in host-utils.c for
architectures with quadwords because the guard is implemented in the
Makefile. This patch move the guard out of the Makefile to the
implementation file.
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Bitmaps with a granularity of 58 or above can be neither serialized nor
deserialized (see the comment in the function added in this series for
an explanation). This patch adds a function so that we can check whether
a bitmap actually can be (de-)serialized at all, thus avoiding failing
the necessary assertion in hbitmap_serialization_granularity().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20161115225746.3590-2-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
A fix has been committed in upstream glib commit
210a9796f78eb90f76f1bd6a304e9fea05e97617.
(See also related bug https://bugzilla.gnome.org/show_bug.cgi?id=764415)
It is desirable to use the glib version instead of qemu copy, since it
provides more debugging facilities (G_MAIN_POLL_DEBUG etc), and
hopefully has a better maintainance. Hopefully, we can drop the qemu
copy in a few years.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>