Adds EXTERNAL attribute definition to qemu timers subsystem and assigns
it to virtual clock timers, used in slirp (ICMP IPv6) and ui (key queue).
Virtual clock processing in rr mode can use this attribute instead of a
separate clock type.
Fixes: 87f4fe7653
Fixes: 775a412bf8
Fixes: 9888091404
Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <e771f96ab94e86b54b9a783c974f2af3009fe5d1.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Attributes are simple flags, associated with individual timers for their
whole lifetime. They intended to be used to mark individual timers for
special handling when they fire.
New/init functions family in timer interface updated and refactored (new
'attribute' argument added, timer_list replaced with timer_list_group+type
combinations, comments improved to avoid info duplication). Also existing
aio interface extended with attribute-enabled variants of functions,
which create/initialize timers.
Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <f47b81dbce734e9806f9516eba8ca588e6321c2f.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
That patch series introduced new virtual clock type for use in external
subsystems. It breaks desired behavior in non-record/replay usage
scenarios due to a small change to existing behavior. Processing of
virtual timers belonging to new clock type is kicked off to the main
loop, which makes these timers asynchronous with vCPU thread and,
in icount mode, with whole guest execution. This breaks expected
determinism in non-record/replay icount mode of emulation where these
"external subsystems" are isolated from the host (i.e. they are
external only to guest core, not to the entire emulation environment).
Example for slirp ("user" backend for network device):
User runs qemu in icount mode with rtc clock=vm without any external
communication interfaces but with "-netdev user,restrict=on". It expects
deterministic execution, because network services are emulated inside
qemu and isolated from host. There are no reasons to get reply from DHCP
server with different delay or something like that.
The next patches revert reimplements the same changes in a better way.
This reverts commit 87f4fe7653.
This reverts commit 775a412bf8.
This reverts commit 9888091404.
Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com>
Message-Id: <18b1e7c8f155fe26976f91be06bde98eef6f8751.1539764043.git.artem.k.pisarenko@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modify qemu_opts_print_help():
- to print expected argument type
- skip description if not available
- sort lines
- prefix with the list name (like qdev, to avoid confusion)
- drop 16-chars alignment, use a '-' as seperator for option name and
description
For ex, "-spice help" output is changed from:
port No description available
tls-port No description available
addr No description available
[...]
gl No description available
rendernode No description available
to:
spice.addr=str
spice.agent-mouse=bool (on/off)
spice.disable-agent-file-xfer=bool (on/off)
[...]
spice.x509-key-password=str
spice.zlib-glz-wan-compression=str
"qemu-img create -f qcow2 -o help", changed from:
size Virtual disk size
compat Compatibility level (0.10 or 1.1)
backing_file File name of a base image
[...]
lazy_refcounts Postpone refcount updates
refcount_bits Width of a reference count entry in bits
to:
backing_file=str - File name of a base image
backing_fmt=str - Image format of the base image
cluster_size=size - qcow2 cluster size
[...]
refcount_bits=num - Width of a reference count entry in bits
size=size - Virtual disk size
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
QDev options accept 'help' (or '?', but that's problematic with shell
globbing) in the list of parameters, which is handy to list the
available options.
Unfortunately, this isn't built in QemuOpts. qemu_opts_parse_noisily()
seems to be the common path for command line options, so place a
fallback to print help, listing the available options.
This is quite handy, for example with qemu "-spice help".
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Slirp and VNC modules use virtual clock for processing some events that
are related to the guest execution speed.
But virtual clock-related events are consideres to be deterministic and
are recorded/replayed by icount mechanism. But slirp and VNC lie outside
the recorded guest core (which includes CPU and peripherals).
Therefore slirp and VNC are external for the guest, but should work at
guest speed.
This patch introduces new virtual clock which can be used for external
subsystems for running timers that are synchronized with the guest.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912082002.3228.82417.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Run some memfd-related checks before registering hostmem-memfd &
various properties. This will help libvirt to figure out what the host
is supposed to be capable of.
qemu_memfd_check() is changed to a less optimized version, since it is
used with various flags, it no longer caches the result.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180906161415.8543-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With the seqlock, we either have to use atomics to remain
within defined behaviour (and note that 64-bit atomics aren't
always guaranteed to compile, irrespective of __nocheck), or
drop the atomics and be in undefined behaviour territory.
Fix it by dropping the seqlock and using atomic64 accessors.
This will limit scalability when !CONFIG_ATOMIC64, but those
machines (1) don't have many users and (2) are unlikely to
have many cores.
- With CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 13.00 Mops/s
- Forcing !CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 10.89 Mops/s
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-5-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This introduces read/set accessors for int64_t and uint64_t.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-3-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These functions do not modify their @ht or @bucket arguments.
Constify those arguments.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
seqlock_read_begin takes a const param since c04649eeea
("seqlock: constify seqlock_read_begin", 2018-08-23), so
we can constify the entire lookup.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Accessing the HT from an iterator results almost always
in a deadlock. Given that only one qht-internal function
uses this argument, drop it from the interface.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This currently has no users, but the use case is so common that I
think we must support it.
Note that without the appended we cannot safely remove a set of
elements; a 2-step approach (i.e. qht_iter first, keep track of
the to-be-deleted elements, and then a bunch of qht_remove calls)
would be racy, since between the iteration and the removals other
threads might insert additional elements.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when
2018-08-15), by not reporting progress, causes aio_poll to execute the
system call when polling succeeds because of ctx->notifier. This introduces
latency before the call to aio_bh_poll() and negates the advantages of
polling, unfortunately.
The fix builds on the previous patch, separating the effect of polling on
the timeout from the progress reported to aio_poll(). ctx->notifier
does zero the timeout, causing the caller to skip the system call,
but it does not report progress, so that the bug fix of commit 70232b5253
still stands.
Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
This is a preparation for the next patch, and also a very small
optimization. Compute the timeout only once, before invoking
try_poll_mode, and adjust it in run_poll_handlers. The adjustment
is the polling time when polling fails, or zero (non-blocking) if
polling succeeds.
Fixes: 70232b5253
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-3-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
It is valid for an aio_set_fd_handler to happen concurrently with
aio_poll. In that case, poll_disable_cnt can change under the heels
of aio_poll, and the assertion on poll_disable_cnt can fail in
run_poll_handlers.
Therefore, this patch simply checks the counter on every polling
iteration. There are no particular needs for ordering, since the
polling loop is terminated anyway by aio_notify at the end of
aio_set_fd_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-2-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJbqlsyAAoJEC7Z13T+cC21RbAP/3IvGfBxuRm6rBWoghjQgbl8
KU8nPnlZUtqjxmfUTILO/h+pJ3na5MQ8hh7v8JHi+xlQ2DPkECW21DtnfdxntVjw
+b+N5Ap6J22GHyEq4HJXPWAk2rDInqkU966DvL40RiMvOTfXdg9EO0TDX0VsVgZv
BR1r7/t3T0P7hiQ0XWb9U2JchRIC+Zgk34gXZPSTpoIv89fUhzNoK5LvAA6yV1FQ
TvE8VTKJm4wkqThH1ShtbJCBKjHjW/W8LYZr3YMothcs8vGjEdEcDL4BoJZDn3bF
h4VTkU+k8lp7W9LmlnPnu1WH/5ezhzdwJTeFaPJt4U10WKJptAS4vbK03DXlds9O
9d2BOXKrima2kSr1ejSe1f0kcE8fis1XFmSuhF61Nbw6ngT5+pP2JSc1XwFazd2K
zQwV4GXBLzAGnd4F2Ec+5TKzbGFVfczxeBDiBkkVmG+XdX/UXJpkpPYGAaw7DDiK
JwKVVYIPk1ll6MAbR6qEGsvE/adHNEm8lUdjXqwgbQlIeUZ2H0hCu9lJ0X81mtoQ
WZP+nMa/87COnlPX6VPVgxM2TXQOH/UbGz/WmYzZ6/gPKTX+gfwrHQGdp7Tjl33U
KxFKWioFnoqGuyWasvTtKEK67/IlrY+w1nXuuqKJg8J2/qx1SVtx45FHkRkxkIDx
4boRpx0XUqpDVdf8VhRB
=dXgp
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging
- Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way
# gpg: Signature made Tue 25 Sep 2018 16:58:42 BST
# gpg: using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg: aka "Thomas Huth <thuth@redhat.com>"
# gpg: aka "Thomas Huth <huth@tuxfamily.org>"
# gpg: aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* remotes/huth-gitlab/tags/pull-request-2018-09-25:
Revert "check: Move VMXNET3 test to common"
Revert "check: Move endianess test to common"
Revert "check: Move wdt_ib700 test to common"
tests/migration: Speed up the test on ppc64
hw/qdev-core: Fix description of instance_init
qdev: fix a typo in comment
docs: Fix some typos (most found by codespell)
trivial: Make bios files and source files non-executable
memfd: fix possible usage of the uninitialized file descriptor
hw/core/machine: Officially deprecate the enforce-config-section parameter
net/slirp: Deprecate the [hub_id name] parameter tuple
net: Deprecate the "name" parameter of -net
Makefile: Add missing dependency for qemu-deprecated.texi
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The qemu_memfd_alloc_check() routine allocates the fd variable on stack.
This variable is initialized inside the qemu_memfd_alloc() function.
There are several cases when *fd will be left unintialized which can
lead to the unexpected close() in the qemu_memfd_free() call.
Set file descriptor to -1 before calling the qemu_memfd_alloc routine.
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.
Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().
This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.
Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
AIO Coroutines shouldn't by managed by an AioContext different than the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.
Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.
This means that, if an IO Thread's AioConext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.
This is the root cause behind some crashes, mainly triggered by the
drain code at block/io.c. The most common are these abort and failed
assertion:
util/async.c:aio_co_schedule
456 if (scheduled) {
457 fprintf(stderr,
458 "%s: Co-routine was already scheduled in '%s'\n",
459 __func__, scheduled);
460 abort();
461 }
util/qemu-coroutine-lock.c:
286 assert(mutex->holder == self);
But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.
By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add two functions to print an error/warning report once depending
on a passed-in condition variable and flip it if printed. This is
useful if you want to print a message not once-globally, but e.g.
once-per-device.
Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced
with warn_report_once_cond().
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830145902.27376-2-cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Function comments reworded]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
vhost-user-gpu will share the same code to open a DRM node.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180713130916.4153-20-marcandre.lureau@redhat.com>
[ kraxel: buildfix: util/drm.o must be CONFIG_OPENGL not CONFIG_LINUX ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF in the lexer. That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:
* Malformed sequences
- Unexpected continuation bytes
- Missing continuation bytes after start bytes other than
\xC0..\xC1, \xF5..\xFD.
* Overlong sequences with start bytes other than \xC0..\xC1,
\xF5..\xFD.
* Invalid code points
Fixing this in the lexer would be bothersome. Fixing it in the parser
is straightforward, so do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
Let's set the alignment just like for the posix variant. This will
implicitly set the alignment of the underlying memory region and
therefore make memory_region_get_alignment(mr) return something > 0 for
all memory backends applicable to PCDIMM/NVDIMM.
The allocation granularity is ususally 64k, while the page size is 4k.
The documentation of VirtualAlloc is not really comprehensible in case
only MEM_COMMIT is specified without an address. We'll detect the actual
values and then go for the bigger one. The expection is, that it will
always be 64k aligned. (The assumption is that MEM_COMMIT does an
implicit MEM_RESERVE, so the address will always be aligned to the
allocation granularity. And the allocation granularity is always bigger
than the page size).
This will allow us to drop special handling in pc.c for
memory_region_get_alignment(mr) == 0, as we can then assume that it is
always set (and AFAICS >= getpagesize()).
For pc in pc_memory_plug(), under Windows TARGET_PAGE_SIZE == getpagesize(),
therefore alignment of DIMMs will not change, and therefore also not the
guest physical memory layout.
For spapr in spapr_memory_plug(), an alignment of 0 would have been used
until now. As QEMU_ALIGN_UP will crash with the alignment being 0, this
never worked, so we don't have to care about compatibility handling.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-3-david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current paths for modules are CONFIG_QEMU_MODDIR and paths relative
to the executable. Qemu and its modules can be installed and executed in
paths that are different from these search paths. This change allows
a search path to be specified by environment variable.
An example usage for this is postmarketOS[1]. This is a build environment
for Alpine Linux. It sets up Alpine Linux in a chroot environment.
Alpine's Qemu packages are installed in the chroot. The Alpine Linux Qemu
package is used to test compiled Alpine Linux system images. This way there
isn't a reliance on the which ever version of Qemu the host system / distro
provides.
postmarketOS executes Qemu on host system outside of the chroot
The Qemu module search path needs to point to the location of the
chroot relative to the host system.
e.g.
The root of the Alpine Linux chroot is:
~/.local/var/pmbootstrap/chroot_native/
Alpine's Qemu is installed at
~/.local/var/pmbootstrap/chroot_native/usr/bin/
The Qemu module search path needs to be:
QEMU_MODULE_DIR=~/.local/var/pmbootstrap/chroot_native/usr/lib/qemu/
[1] https://postmarketos.org/
Signed-off-by: ryang <decatf@gmail.com>
Message-Id: <20180704181010.GA918@computer>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The BQL is acquired via qemu_mutex_lock_iothread(), which makes
the profiler assign the associated wait time (i.e. most of
BQL wait time) entirely to that function. This loses the original
call site information, which does not help diagnose BQL contention.
Fix it by tracking the callers explicitly.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I first implemented this by deleting all entries in the global
hash table. But doing that safely slows down profiling, since
we'd need to introduce rcu_read_lock/unlock in the fast path.
What's implemented here avoids messing with the thread-local
data in the global hash table. It achieves this by taking a snapshot
of the current state, so that subsequent reports present the delta
wrt to the snapshot.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The goal of this module is to profile synchronization primitives (i.e.
mutexes, recursive mutexes and condition variables) so that scalability
issues can be quickly diagnosed.
Sync primitives are profiled by QSP based on the vaddr of the object accessed
as well as the call site (file:line_nr). That means the same object called
from two different call sites will be tracked in separate entries, which
might be reported together or separately (see subsequent commit on
call site coalescing).
Some perf numbers:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Command: taskset -c 0 tests/atomic_add-bench -d 5 -m
- Before: 54.80 Mops/s
- After: 54.75 Mops/s
That is, a negligible slowdown due to the now indirect call to
qemu_mutex_lock. Note that using a branch instead of an indirect
call introduces a more severe slowdown (53.65 Mops/s, i.e. 2% slowdown).
Enabling the profiler (with -p, added in this series) is more interesting:
- No profiling: 54.75 Mops/s
- W/ profiling: 12.53 Mops/s
That is, a 4.36X slowdown.
We can break down this slowdown by removing the get_clock calls or
the entry lookup:
- No profiling: 54.75 Mops/s
- W/o get_clock: 25.37 Mops/s
- W/o entry lookup: 19.30 Mops/s
- W/ profiling: 12.53 Mops/s
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The counter is for qemu_lockcnt_inc/dec sections (read side),
qemu_lockcnt_lock/unlock is for the write side.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180803063917.30292-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.
There is one rare case that the main thread or vCPU thread may "steal"
the aio_notify() event just raised by itself, in bdrv_set_aio_context()
[1]. The sequence is like this:
main thread IO Thread
===============================================================
bdrv_drained_begin()
aio_disable_external(ctx)
aio_poll(ctx, true)
ctx->notify_me += 2
...
bdrv_drained_end()
...
aio_notify()
...
bdrv_set_aio_context()
aio_poll(ctx, false)
[1] aio_notify_accept(ctx)
ppoll() /* Hang! */
[1] is problematic. It will clear the ctx->notifier event so that
the blocked ppoll() will not return.
(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV. One or two of the VMs will hit this race
condition, making the VIRTIO device unresponsive to I/O commands. When
it hangs, Seabios is busy waiting for a read request to complete (read
MBR), right after initializing the virtio-blk-pci device, using 100%
guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750
for the original bug analysis.)
aio_notify() only injects an event when ctx->notify_me is set,
correspondingly aio_notify_accept() is only useful when ctx->notify_me
_was_ set. Move the call to it into the "blocking" branch. This will
effectively skip [1] and fix the hang.
Furthermore, blocking aio_poll is only allowed on home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll()'s can steal each other's ctx->notifier event and cause
hanging just like described above.
Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180809132259.18402-3-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The same logic exists in fd polling. This change is especially important
to avoid busy loop once we limit aio_notify_accept() to blocking
aio_poll().
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180809132259.18402-2-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Ciro Santilli reported that commit a5ed352596
breaks the execution replay. It happens due to the probing the clock
for the new instances of iothread.
However, this probing was made in replay mode for the timer lists that
are empty.
This patch removes clock probing in replay mode.
It is an artifact of the old version with another thread model.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180725121526.12867.17866.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No callers of get_opt_value() pass in a NULL for the "value" parameter,
so the check is redundant.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180514171913.17664-4-berrange@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The logic for parsing the multiboot initrd modules was messed up in
commit 950c4e6c94
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Mon Apr 16 12:17:43 2018 +0100
opts: don't silently truncate long option values
Causing the length to be undercounter, and the number of modules over
counted. It also passes NULL to get_opt_value() which was not robust
at accepting a NULL value.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180514171913.17664-2-berrange@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
soft-freeze, but I'd like these preparatory patches to be merged anyway.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEtIKLr5QxQM7yo0kQcdTV5YIvc9YFAls2DEgACgkQcdTV5YIv
c9ZShw/6AuDMeWpzFHAdf4lBuUdDFyAC7QFln8xdb+MDus5whAvtt7vDeY0OKbgo
w5VDTkstO7h4jQJDHkjzHK91ZdUbgu0Tj7C09x4oQpUueNWsTZGcPBvNGadjjOBt
70LdwyV2nSER3+QNjTNznrh0faxay4xuSTIY/mW6iudeWGobwXmseEeOE8gGM+w0
s1GwxMVKIfllKUmW2vx0mGfn02pKtTnan+Si+sp/AnY9xSquFfHWpZhXZlkZrfYd
mgtJOTY9IpSekr9jBBKgUlZ/QVYiliDzuh3ePDYKtsuHZZ7z2ype3DkXqYOnblOs
C+2gWUE/TC5BStjRX3RmPv21dpfkEdlxOZpgXbpP1VgKqbtnbnvcgTL89IPv9afl
Aj+q5uYR494kOL5rSDynVRdWhnUmMnkqHCZpKG+IRMHv6GlrXpxOQWenwCS/vYWK
swKqRwGj0CFugdt7qVZ+4XjXbbWEI21dHHG7nAXinfakKVOfJYIeGIQC7WfpIrxy
ApV0mHSceK0AMBJvlf1Zf0Qm0lJ7Ay7MRT/5XWDFV9Bogf+wxtGvf9Ukc2qQhwd8
mR9iN7rlWz3VSu5vS3bEdsiBXKibxIRfv7HhF5fa+mwkZA9gMbj33vVds1zA4ta1
Qw4doRq4xWui3uNO9jvtcXtW5Bq7N4p6wVFK76dLVHk5axLCIec=
=vVJr
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
The Darwin host support still needs some more work. It won't make it for
soft-freeze, but I'd like these preparatory patches to be merged anyway.
# gpg: Signature made Fri 29 Jun 2018 11:39:04 BST
# gpg: using RSA key 71D4D5E5822F73D6
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Gregory Kurz <gregory.kurz@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# Primary key fingerprint: B482 8BAF 9431 40CE F2A3 4910 71D4 D5E5 822F 73D6
* remotes/gkurz/tags/for-upstream:
9p: darwin: Explicitly cast comparisons of mode_t with -1
cutils: Provide strchrnul
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This updates the minimum required glib version to 2.40
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJbNhcKAAoJEL6G67QVEE/f1gMP+wV4J+V9WJOxcXbNjP9bb1K4
gsvZrv2pKCVbcJW2o3Mc3POXExX1E2GsrAl639zpDHbNVlVFvQC51GaRO1Fc6lwh
7NRFHen0Ee6I7TvWlVXMy1YPXPZ/8EJ/1KuZerYUUyKO/R8ojEt/TbELwfl3P1LF
VZGWgs+GpwUYwiG4ZmYPQ0VsT/munZPlaK1mRQSDbGqX0KG1Vy8Q+mgbGAm+S6xh
39HtM7ecdfXbVEAnoMsp9+kRi3zXpD5zSAyVBZN2RDktMt/EHxF0pPJCqX7oq0Rn
DehXDkzaqF2ghzrSAIT3rbKBhzYIY/ny/vxnyQZ/Px0GrBdKwuUJ+pYcF2akS2hV
s98VWRw/tqt8wQKYLF/wmeL7eE/tKAHSUqFc3Ta0Rvgfq0z9Mp3b73vuswBpfjWL
+GiASRvg96n/1yqmywtCd7KtF5riOo1iyvfMFCdQ+nBHQok/6K/oWfZPyoZsCAIa
2GJBfmHzkkc98QeNO0Dp/+ckSyKIBuyTV1bDyq/8Yz7IwEd1PfRWooEgVFSHO7MU
6ddCECvKVvP0S03r5dlw4Imio39gPpeZBMmSE4ZOh/hRa89T4UqIUvSLwzat7OhA
DH1NhJsj7G1jaIxENfzBhpggg++rHCVcfc2cv28Tl7i7twyRsb7CveYA6TwD+UIg
7JT+KswUo7W9MrkKBHC+
=8tKK
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/berrange/tags/min-glib-pull-request' into staging
glib: update the min required version
This updates the minimum required glib version to 2.40
# gpg: Signature made Fri 29 Jun 2018 12:24:58 BST
# gpg: using RSA key BE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/min-glib-pull-request:
glib: enforce the minimum required version and warn about old APIs
glib: bump min required glib library version to 2.40
util: remove redundant include of glib.h and add osdep.h
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>