Regularize our handling of <sys/signal.h>: currently we include it in
osdep.h, but only for OpenBSD, and we include it without an ifdef
guard in a couple of C files. This causes problems for Haiku, which
doesn't have that header.
Instead, check in configure whether sys/signal.h exists, and if it
does then always include it from osdep.h.
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200703145614.16684-5-peter.maydell@linaro.org
[PMM: Expanded commit message; rename to HAVE_SYS_SIGNAL_H]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Instead of using an OS-specific ifdef test to select the "openpty()
is in pty.h" codepath, make configure check for the existence of
the header and use the new CONFIG_PTY instead.
This is necessary to build on Haiku, which also provides openpty()
via pty.h.
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200703145614.16684-3-peter.maydell@linaro.org
[PMM: Expanded commit message; rename to HAVE_PTY_H]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
GCC supports "#pragma GCC diagnostic" since version 4.6, and
Clang seems to support it, too, since its early versions 3.x.
That means that our minimum required compiler versions all support
this pragma already and we can remove the test from configure and
all the related #ifdefs in the code.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200710045515.25986-1-thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
- tests/vm support for aarch64 VMs
- tests/tcg better cross-compiler detection
- update docker tooling to support registries
- update docker support for xtensa
- gitlab build docker images and store in registry
- gitlab use docker images for builds
- a number of skipIf updates to support move
- linux-user MAP_FIXED_NOREPLACE fix
- qht-bench compiler tweaks
- configure fix for secret keyring
- tsan fiber annotation clean-up
- doc updates for mttcg/icount/gdbstub
- fix cirrus to use brew bash for iotests
- revert virtio-gpu breakage
- fix LC_ALL to avoid sorting changes in iotests
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAl8J0yoACgkQ+9DbCVqe
KkSzTAf/Vn+9TU8Qt7nZvl7W4tz7Sy5K8EJGwj2RXx6CWWWLiFbsXurIM8Krw5Vc
RmvUxwa359b+J0lQpfeNDHYm1nM8RZLFlkG0a5bl0I8sW0EcPjBRtwNaGKXh2p0u
u2RS2QAi6A9AvYT4ZREYlBM+o9WzbxCEQm4s8fr6WEJCQfxBnb5/bGiEjWR64e8C
j9Kvou+zAKfVizbQMtu+mwqjsoPtcS1b3vVcO7anhNuUsuaEKkS0dFWzWvw3lwJR
STIYnb8Y/eJ1yKr0hPH2qtWv3n6yhlYvYmpUCH6AwshGMUoeFEzR2VoWS6yZPGG6
na6XA3UW5R9AxIDfkCJ5ueeo8t9xMQ==
=HRWa
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-and-misc-110720-2' into staging
Testing and misc build updates:
- tests/vm support for aarch64 VMs
- tests/tcg better cross-compiler detection
- update docker tooling to support registries
- update docker support for xtensa
- gitlab build docker images and store in registry
- gitlab use docker images for builds
- a number of skipIf updates to support move
- linux-user MAP_FIXED_NOREPLACE fix
- qht-bench compiler tweaks
- configure fix for secret keyring
- tsan fiber annotation clean-up
- doc updates for mttcg/icount/gdbstub
- fix cirrus to use brew bash for iotests
- revert virtio-gpu breakage
- fix LC_ALL to avoid sorting changes in iotests
# gpg: Signature made Sat 11 Jul 2020 15:56:42 BST
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-testing-and-misc-110720-2: (50 commits)
iotests: Set LC_ALL=C for sort
Revert "vga: build virtio-gpu as module"
tests: fix "make check-qtest" for modular builds
.cirrus.yml: add bash to the brew packages
tests/docker: update toolchain set in debian-xtensa-cross
tests/docker: fall back more gracefully when pull fails
docs: Add to gdbstub documentation the PhyMemMode
docs/devel: add some notes on tcg-icount for developers
docs/devel: convert and update MTTCG design document
tests/qht-bench: Adjust threshold computation
tests/qht-bench: Adjust testing rate by -1
travis.yml: Test also the other targets on s390x
shippable: pull images from registry instead of building
testing: add check-build target
containers.yml: build with docker.py tooling
gitlab: limit re-builds of the containers
tests: improve performance of device-introspect-test
gitlab: add avocado asset caching
gitlab: enable check-tcg for linux-user tests
linux-user/elfload: use MAP_FIXED_NOREPLACE in pgb_reserved_va
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This reverts commit 8d5a24c83d.
Compiling all virtio-gpu objects into a single module isn't a good plan
because the individual objects have different CONFIG_* dependencies.
Leads to module load failures on s390x due to vga support being
disabled, which in turn breaks '-device virtio-gpu-device' (flagged by
travis ci).
So back to the drawing board for modular virtio-gpu ...
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200710203652.9708-3-kraxel@redhat.com>
This is a cleanup patch to follow-up the patch which introduced TSAN.
This patch makes separate start_switch_fiber_ functions for TSAN and ASAN.
This does two things:
1. Unrelated ASAN and TSAN code is separate and each function only
has arguments that are actually needed.
2. The co->tsan_caller_fiber and co->tsan_co_fiber fields are only
access from within #ifdef CONFIG_TSAN.
Signed-off-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200626170001.27017-1-robert.foley@linaro.org>
Message-Id: <20200701135652.1366-5-alex.bennee@linaro.org>
This is followup patch to the one submitted back in Oct, 19
https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg02102.html
My mistake here, I took my eyes of the mailing list after I got the
initial thumbs up. This patch follows up on Markus comments in the
above link.
Purpose of this patch:
We want to print guest name for errors, warnings and info messages. This
was the first of two patches the second being MCE errors targeting a VM
with guest name prepended. But in a large fleet we see many other
errors that disable a VM or crash it. In a large fleet and centralized
logging having the guest name enables identify of owner and customer.
Signed-off-by: Mario Smarduch <msmarduch@digitalocean.com>
Message-Id: <20200626201900.8876-1-msmarduch@digitalocean.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. Convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
return ...
}
to
if (!foo(..., errp)) {
...
...
return ...
}
where nothing else needs @err. Coccinelle script:
@rule1 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
binary operator op;
constant c1, c2;
symbol false;
@@
if (
(
- fun(args, &err, args2)
+ fun(args, errp, args2)
|
- !fun(args, &err, args2)
+ !fun(args, errp, args2)
|
- fun(args, &err, args2) op c1
+ fun(args, errp, args2) op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
)
}
@rule2 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
expression var;
binary operator op;
constant c1, c2;
symbol false;
@@
- var = fun(args, &err, args2);
+ var = fun(args, errp, args2);
... when != err
if (
(
var
|
!var
|
var op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
|
return var;
)
}
@depends on rule1 || rule2@
identifier err;
@@
- Error *err = NULL;
... when != err
Not exactly elegant, I'm afraid.
The "when != lbl:" is necessary to avoid transforming
if (fun(args, &err)) {
goto out
}
...
out:
error_propagate(errp, err);
even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().
Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly. I don't know what exactly "when strict" does, only that
it helps here.
The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err". For
an example where it's too narrow, see vfio_intx_enable().
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there. Converted manually.
Line breaks tidied up manually. One nested declaration of @local_err
deleted manually. Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
See recent commit "error: Document Error API usage rules" for
rationale.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-14-armbru@redhat.com>
opt_set() frees its argument @value on failure. Slightly unclean;
functions ideally do nothing on failure.
To tidy this up, move opt_create() from opt_set() into its callers,
along with the cleanup. Rename opt_set() to opt_validate(), noting
its similarity to qemu_opts_validate(). Drop redundant parameter
@opts; use opt->opts instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-13-armbru@redhat.com>
There is just one use so far. The next commit will add more.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-12-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-11-armbru@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200707160613.848843-10-armbru@redhat.com>
This is to make the next commit easier to review.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200707160613.848843-9-armbru@redhat.com>
Convert uses like
opts = qemu_opts_create(..., &err);
if (err) {
...
}
to
opts = qemu_opts_create(..., errp);
if (!opts) {
...
}
Eliminate error_propagate() that are now unnecessary. Delete @err
that are now unused.
Note that we can't drop parallels_open()'s error_propagate() here. We
continue to execute it even in the converted case. It's a no-op then:
local_err is null.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200707160613.848843-8-armbru@redhat.com>
Add support for qom types provided by modules. For starters use a
manually maintained list which maps qom type to module and prefix.
Two load functions are added: One to load the module for a specific
type, and one to load all modules (needed for object/device lists as
printed by -- for example -- qemu -device help).
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20200624131045.14512-2-kraxel@redhat.com
Recent versions of Solaris (v11.4) now feature an openpty() function,
too, causing a build failure since we ship our own implementation of
openpty() for Solaris in util/qemu-openpty.c so far. Since there are
now both variants available in the wild, with and without this function
(and illumos is said to not have this function yet), let's introduce a
proper HAVE_OPENPTY define for this to fix the build failure.
Message-Id: <20200702143955.678-1-thuth@redhat.com>
Tested-by: Michele Denber <denber@mindspring.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Sometimes virtual timer callbacks depend on order
of virtual timer processing and warping of virtual clock.
Therefore every callback should be logged to make replay deterministic.
This patch creates a checkpoint before every virtual timer callback.
With these checkpoints virtual timers processing and clock warping
events order is completely deterministic.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
--
v2:
- remove mutex lock/unlock for virtual clock checkpoint since it is
not process any asynchronous events (commit ca9759c2a9)
- bump record/replay log file version
Message-Id: <159012932716.27256.8854065545365559921.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
From d7f9d40777d1ed7c9450b0be4f957da2993dfc72 Mon Sep 17 00:00:00 2001
From: David Carlier <devnexen@gmail.com>
Date: Fri, 12 Jun 2020 09:39:17 +0100
Subject: [PATCH] util/getauxval: Porting to FreeBSD getauxval feature
FreeBSD has a similar API for auxiliary vector.
Signed-off-by: David Carlier <devnexen@gmail.com>
Message-Id: <CA+XhMqxTU6PUSQBpbA9VrS1QZfqgrCAKUCtUF-x2aF=fCMTDOw@mail.gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current implementation of LLVM's SafeStack is not compatible with
code that uses an alternate stack created with sigaltstack().
Since coroutine-sigaltstack relies on sigaltstack(), it is not
compatible with SafeStack. The resulting binary is incorrect, with
different coroutines sharing the same unsafe stack and producing
undefined behavior at runtime.
In the future LLVM may provide a SafeStack implementation compatible with
sigaltstack(). In the meantime, if SafeStack is desired, the coroutine
implementation from coroutine-ucontext should be used.
As a safety check, add a control in coroutine-sigaltstack to throw a
preprocessor #error if SafeStack is enabled and we are trying to
use coroutine-sigaltstack to implement coroutines.
Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
Message-id: 20200529205122.714-3-dbuono@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
LLVM's SafeStack instrumentation does not yet support programs that make
use of the APIs in ucontext.h
With the current implementation of coroutine-ucontext, the resulting
binary is incorrect, with different coroutines sharing the same unsafe
stack and producing undefined behavior at runtime.
This fix allocates an additional unsafe stack area for each coroutine,
and sets the new unsafe stack pointer before calling swapcontext() in
qemu_coroutine_new.
This is the only place where the pointer needs to be manually updated,
since sigsetjmp/siglongjmp are already instrumented by LLVM to properly
support SafeStack.
The additional stack is then freed in qemu_coroutine_delete.
Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
Message-id: 20200529205122.714-2-dbuono@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
From 3025a0ce3fdf7d3559fc35a52c659f635f5c750c Mon Sep 17 00:00:00 2001
From: David Carlier <devnexen@gmail.com>
Date: Tue, 26 May 2020 21:35:27 +0100
Subject: [PATCH] util/oslib-posix : qemu_init_exec_dir implementation for Mac
Using dyld API to get the full path of the current process.
Signed-off-by: David Carlier <devnexen@gmail.com>
Message-id: CA+XhMqxwC10XHVs4Z-JfE0-WLAU3ztDuU9QKVi31mjr59HWCxg@mail.gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This allows us to see the name of the thread in tsan
warning reports such as this:
Thread T7 'CPU 1/TCG' (tid=24317, running) created by main thread at:
Signed-off-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200609200738.445-12-robert.foley@linaro.org>
Message-Id: <20200612190237.30436-15-alex.bennee@linaro.org>
We tried running QEMU under tsan in 2016, but tsan's lack of support for
longjmp-based fibers was a blocker:
https://groups.google.com/forum/#!topic/thread-sanitizer/se0YuzfWazw
Fortunately, thread sanitizer gained fiber support in early 2019:
https://reviews.llvm.org/D54889
This patch brings tsan support upstream by importing the patch that annotated
QEMU's coroutines as tsan fibers in Android's QEMU fork:
https://android-review.googlesource.com/c/platform/external/qemu/+/844675
Tested with '--enable-tsan --cc=clang-9 --cxx=clang++-9 --disable-werror'
configure flags.
Signed-off-by: Lingfeng Yang <lfy@google.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
[cota: minor modifications + configure changes]
Signed-off-by: Robert Foley <robert.foley@linaro.org>
[RF: configure changes, coroutine fix + minor modifications]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200609200738.445-2-robert.foley@linaro.org>
Message-Id: <20200612190237.30436-5-alex.bennee@linaro.org>
getpid is good enough in a mono thread context, however thr_self/_lwp_self
reflects the real current thread identifier from a given process.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
These objects are not required when configured with --disable-system.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Tested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200522172510.25784-6-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
unix_listen/connect_saddr now support abstract address types
two aditional BOOL switches are introduced:
tight: whether to set @addrlen to the minimal string length,
or the maximum sun_path length. default is TRUE
abstract: whether we use abstract address. default is FALSE
cli example:
-monitor unix:/tmp/unix.socket,abstract,tight=off
OR
-chardev socket,path=/tmp/unix.socket,id=unix1,abstract,tight=on
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The glib event loop does not call fdmon_io_uring_wait() so fd handlers
waiting to be submitted build up in the list. There is no benefit is
using io_uring when the glib GSource is being used, so disable it
instead of implementing a more complex fix.
This fixes a memory leak where AioHandlers would build up and increasing
amounts of CPU time were spent iterating them in aio_pending(). The
symptom is that guests become slow when QEMU is built with io_uring
support.
Buglink: https://bugs.launchpad.net/qemu/+bug/1877716
Fixes: 73fd282e7b ("aio-posix: add io_uring fd monitoring implementation")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Message-id: 20200511183630.279750-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The io_uring file descriptor monitoring implementation has an internal
list of fd handlers that are pending submission to io_uring.
fdmon_io_uring_destroy() deletes all fd handlers on the list.
Don't delete fd handlers directly in fdmon_io_uring_destroy() for two
reasons:
1. This duplicates the aio-posix.c AioHandler deletion code and could
become outdated if the struct changes.
2. Only handlers with the FDMON_IO_URING_REMOVE flag set are safe to
remove. If the flag is not set then something still has a pointer to
the fd handler. Let aio-posix.c and its user worry about that. In
practice this isn't an issue because fdmon_io_uring_destroy() is only
called when shutting down so all users have removed their fd
handlers, but the next patch will need this!
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Message-id: 20200511183630.279750-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- reduce client-side fragmentation of NBD trim and status requests
- fix iotest 41 when run in deep tree
- fix socket activation in qemu-nbd
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl6whTUACgkQp6FrSiUn
Q2r4nAf7BtGSFMkUu6nWYeq+Ggg+Xwmz2FLAzWTK/rccGDC44c9ETzOIbWEddo6X
FHpU07VXdLW1h2M7ox8lQVo0DZEFxTRBYTPtUtjB7izfkAs4CkYeElJsZAPAZKgU
GsKqa3RM6uXubsQaXXXjMFCGlYgqi1dVkmkgtPebt7evSe0ATlTfYfd0y9gb5f9C
cbHD3CVcGKQe4ZtNcSBpTzOvXJSrBZznyCyhBO2qmVXTynt/5Ygog+Ulq3DHZsPX
UkRkTPohKA0BhXuS7wD49danlzCLiTlvswr62fAncM1+AJTbmIa+apy3SwiOkwMh
Aawq5vDtaFV+HEBKbMC0QRhgtoEe1w==
=ExlI
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-05-04' into staging
nbd patches for 2020-05-04
- reduce client-side fragmentation of NBD trim and status requests
- fix iotest 41 when run in deep tree
- fix socket activation in qemu-nbd
# gpg: Signature made Mon 04 May 2020 22:12:21 BST
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2020-05-04:
block/nbd-client: drop max_block restriction from discard
block/nbd-client: drop max_block restriction from block_status
iotests/041: Fix NBD socket path
tools: Fix use of fcntl(F_SETFD) during socket activation
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Silent static analyzer warning
Remove dead assignments
Support -chardev serial on macOS
Update MAINTAINERS
Some cosmetic changes
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA
dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm
DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj
xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ
RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0
Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6
B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4
0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8
sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs
6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN
dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb
Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN
ZGgfUdWI88M4
=ibgH
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging
trivial patches (20200504)
Silent static analyzer warning
Remove dead assignments
Support -chardev serial on macOS
Update MAINTAINERS
Some cosmetic changes
# gpg: Signature made Mon 04 May 2020 16:45:18 BST
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/trivial-branch-for-5.1-pull-request:
hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning
hw/timer/stm32f2xx_timer: Remove dead assignment
hw/gpio/aspeed_gpio: Remove dead assignment
hw/isa/i82378: Remove dead assignment
hw/ide/sii3112: Remove dead assignment
hw/input/adb-kbd: Remove dead assignment
hw/i2c/pm_smbus: Remove dead assignment
blockdev: Remove dead assignment
block: Avoid dead assignment
Compress lines for immediate return
chardev: Add macOS to list of OSes that support -chardev serial
MAINTAINERS: Update Keith Busch's email address
elf_ops: Don't try to g_mapped_file_unref(NULL)
hw/mem/pc-dimm: Fix line over 80 characters warning
hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug()
MAINTAINERS: Mark the LatticeMico32 target as orphan
timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write()
display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32()
scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Blindly setting FD_CLOEXEC without a read-modify-write will
inadvertently clear any other intentionally-set bits, such as a
proposed new bit for designating a fd that must behave in 32-bit mode.
However, we cannot use our wrapper qemu_set_cloexec(), because that
wrapper intentionally abort()s on failure, whereas the probe here
intentionally tolerates failure to deal with incorrect socket
activation gracefully. Instead, fix the code to do the proper
read-modify-write.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200420175309.75894-3-eblake@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Compress two lines into a single line if immediate return statement is found.
It also remove variables progress, val, data, ret and sock
as they are no longer needed.
Remove space between function "mixer_load" and '(' to fix the
checkpatch.pl error:-
ERROR: space prohibited between function name and open parenthesis '('
Done using following coccinelle script:
@@
local idexpression ret;
expression e;
@@
-ret =
+return
e;
-return ret;
Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200401165314.GA3213@simran-Inspiron-5558>
[lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415083048.14339-6-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qdict_iter() has just three uses and no test coverage. Replace by
qdict_first(), qdict_next() for more concise code and less type
punning.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415083048.14339-5-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
is_valid_option_list()'s purpose is ensuring qemu-img.c's can safely
join multiple parameter strings separated by ',' like this:
g_strdup_printf("%s,%s", params1, params2);
How it does that is anything but obvious. A close reading of the code
reveals that it fails exactly when its argument starts with ',' or
ends with an odd number of ','. Makes sense, actually, because when
the argument starts with ',', a separating ',' preceding it would get
escaped, and when it ends with an odd number of ',', a separating ','
following it would get escaped.
Move it to qemu-img.c and rewrite it the obvious way.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200415074927.19897-9-armbru@redhat.com>
When opts_parse() sets @invalidp to true, qemu_opts_parse_noisily()
uses has_help_option() to decide whether to print help. This parses
the input string a second time.
Easy to avoid: replace @invalidp by @help_wanted.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200415074927.19897-7-armbru@redhat.com>
has_help_option() uses its own parser. It's inconsistent with
qemu_opts_parse(), as demonstrated by test-qemu-opts case
/qemu-opts/has_help_option. Fix by reusing the common parser.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200415074927.19897-5-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200415074927.19897-4-armbru@redhat.com>
The next commits will put it to use.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200415074927.19897-3-armbru@redhat.com>
With the module upgrades code change, the statically sized dirs array
can now overflow. Increase it's size by one, according to the new
maximum possible usage.
Fixes: bd83c861c0 ("modules: load modules from versioned /var/run dir")
Signed-off-by: Bruce Rogers <brogers@suse.com>
Message-Id: <20200411010746.472295-1-brogers@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast,
qemu_cond_broadcast may be called before all touch page threads enter
qemu_cond_wait. In this case, the touch page threads wait forever for the
main thread to wake them up, causing a deadlock.
Signed-off-by: Bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using C11 atomics, non-seqcst reads and writes do not participate
in the total order of seqcst operations. In util/async.c and util/aio-posix.c,
in particular, the pattern that we use
write ctx->notify_me write bh->scheduled
read bh->scheduled read ctx->notify_me
if !bh->scheduled, sleep if ctx->notify_me, notify
needs to use seqcst operations for both the write and the read. In
general this is something that we do not want, because there can be
many sources that are polled in addition to bottom halves. The
alternative is to place a seqcst memory barrier between the write
and the read. This also comes with a disadvantage, in that the
memory barrier is implicit on strongly-ordered architectures and
it wastes a few dozen clock cycles.
Fortunately, ctx->notify_me is never written concurrently by two
threads, so we can assert that and relax the writes to ctx->notify_me.
The resulting solution works and performs well on both aarch64 and x86.
Note that the atomic_set/atomic_read combination is not an atomic
read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED;
on x86, ATOMIC_RELAXED compiles to a locked operation.
Analyzed-by: Ying Fang <fangying1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Ying Fang <fangying1@huawei.com>
Message-Id: <20200407140746.8041-6-pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The io_uring_enter(2) syscall returns with errno=EINTR when interrupted
by a signal. Retry the syscall in this case.
It's essential to do this in the io_uring_submit_and_wait() case. My
interpretation of the Linux v5.5 io_uring_enter(2) code is that it
shouldn't affect the io_uring_submit() case, but there is no guarantee
this will always be the case. Let's check for -EINTR around both APIs.
Note that the liburing APIs have -errno return values.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200408091139.273851-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Unfortunately reading /proc/self/maps is still considered the gold
standard for a process finding out about it's own memory layout. As we
will want this data in other contexts soon factor out the code to read
and parse the data. Rather than just blindly copying the existing
sscanf based code we use a more modern glib version of the parsing
code to make a more general purpose map structure.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200403191150.863-9-alex.bennee@linaro.org>
When a file descriptor becomes ready we must re-arm POLL_ADD. This is
done by adding an sqe to the io_uring sq ring. The ->need_wait()
function wasn't taking pending sqes into account and therefore
io_uring_submit_and_wait() was not being called. Polling for cqes
failed to detect fd readiness since we hadn't submitted the sqe to
io_uring.
This patch fixes the following tests/test-aio -p /aio/event/wait
failure:
ok 11 /aio/event/wait
**
ERROR:tests/test-aio.c:374:test_flush_event_notifier: assertion failed: (aio_poll(ctx, false))
Reported-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200402145434.99349-1-stefanha@redhat.com
Fixes: 73fd282e7b
("aio-posix: add io_uring fd monitoring implementation")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a
branch.
The authorship of this patch actually belongs to Richard Henderson
<richard.henderson@linaro.org>, I just fixed a boundary case on his
original patch.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-2-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because in unit test, init_accel() will be called several times, each with
different accelerator type.
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Message-Id: <1585119021-46593-1-git-send-email-robert.hu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When external event sources are disabled fdmon-io_uring falls back to
fdmon-poll. The ->need_wait() callback needs to watch for this so it
can return true when external event sources are disabled.
It is also necessary to call ->wait() when AioHandlers have changed
because io_uring is asynchronous and we must submit new sqes.
Both of these changes to ->need_wait() together fix tests/test-aio -p
/aio/external-client, which failed with:
test-aio: tests/test-aio.c:404: test_aio_external_client: Assertion `aio_poll(ctx, false)' failed.
Reported-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20200319163559.117903-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Firstly, _next_dirty_area is for scenarios when we may contiguously
search for next dirty area inside some limited region, so it is more
comfortable to specify "end" which should not be recalculated on each
iteration.
Secondly, let's add a possibility to limit resulting area size, not
limiting searching area. This will be used in NBD code in further
commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We have bdrv_dirty_bitmap_next_zero, let's add corresponding
bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than
bitmap iterators in some cases.
For test modify test_hbitmap_next_zero_check_range to check both
next_zero and next_dirty and add some new checks.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We are going to introduce bdrv_dirty_bitmap_next_dirty so that same
variable may be used to store its return value and to be its parameter,
so it would int64_t.
Similarly, we are going to refactor hbitmap_next_dirty_area to use
hbitmap_next_dirty together with hbitmap_next_zero, therefore we want
hbitmap_next_zero parameter type to be int64_t too.
So, for convenience update all parameters of *_next_zero and
*_next_dirty_area to be int64_t.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
Function is internal and even commented as internal. Drop its
definition from .h file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
The function is definitely internal (it's not used by third party and
it has complicated interface). Move it to .c file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
We have APIs which returns signed int64_t, to be able to return error.
Therefore we can't handle bitmaps with absolute size larger than
(INT64_MAX+1). Still, keep maximum to be INT64_MAX which is a bit
safer.
Note, that bitmaps are used to represent disk images, which can't
exceed INT64_MAX anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20200205112041.6003-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
This patch introduces two lock guard macros that automatically unlock a
lock object (QemuMutex and others):
void f(void) {
QEMU_LOCK_GUARD(&mutex);
if (!may_fail()) {
return; /* automatically unlocks mutex */
}
...
}
and:
WITH_QEMU_LOCK_GUARD(&mutex) {
if (!may_fail()) {
return; /* automatically unlocks mutex */
}
}
/* automatically unlocks mutex here */
...
Convert qemu-timer.c functions that benefit from these macros as an
example. Manual qemu_mutex_lock/unlock() callers are left unmodified in
cases where clarity would not improve by switching to the macros.
Many other QemuMutex users remain in the codebase that might benefit
from lock guards. Over time they can be converted, if that is
desirable.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
[Use QEMU_MAKE_LOCKABLE_NONNULL. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On upgrades the old .so files usually are replaced. But on the other
hand since a qemu process represents a guest instance it is usually kept
around.
That makes late addition of dynamic features e.g. 'hot-attach of a ceph
disk' fail by trying to load a new version of e.f. block-rbd.so into an
old still running qemu binary.
This adds a fallback to also load modules from a versioned directory in the
temporary /var/run path. That way qemu is providing a way for packaging
to store modules of an upgraded qemu package as needed until the next reboot.
An example how that can then be used in packaging can be seen in:
https://git.launchpad.net/~paelzer/ubuntu/+source/qemu/log/?h=bug-1847361-miss-old-so-on-upgrade-UBUNTU
Fixes: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1847361
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200310145806.18335-2-christian.ehrhardt@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mutex and condition variable were never initialized, causing
-mem-prealloc to abort with an assertion failure.
Fixes: 037fb5eb39
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Cc: bauerchen <bauerchen@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
And intialize buffer_is_zero() with it, when Intel AVX512F is
available on host.
This function utilizes Intel AVX512 fundamental instructions which
is faster than its implementation with AVX2 (in my unit test, with
4K buffer, on CascadeLake SP, ~36% faster, buffer_zero_avx512() V.S.
buffer_zero_avx2()).
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When there are many poll handlers it's likely that some of them are idle
most of the time. Remove handlers that haven't had activity recently so
that the polling loop scales better for guests with a large number of
devices.
This feature only takes effect for the Linux io_uring fd monitoring
implementation because it is capable of combining fd monitoring with
userspace polling. The other implementations can't do that and risk
starving fds in favor of poll handlers, so don't try this optimization
when they are in use.
IOPS improves from 10k to 105k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe.
[Clarified aio_poll_handlers locking discipline explanation in comment
after discussion with Paolo Bonzini <pbonzini@redhat.com>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com
Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled
from userspace. Previously userspace polling was only allowed when all
AioHandler's had an ->io_poll() callback. This prevented starvation of
fds by userspace pollable handlers.
Add the FDMonOps->need_wait() callback that enables userspace polling
even when some AioHandlers lack ->io_poll().
For example, it's now possible to do userspace polling when a TCP/IP
socket is monitored thanks to Linux io_uring.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com
Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
The recent Linux io_uring API has several advantages over ppoll(2) and
epoll(2). Details are given in the source code.
Add an io_uring implementation and make it the default on Linux.
Performance is the same as with epoll(7) but later patches add
optimizations that take advantage of io_uring.
It is necessary to change how aio_set_fd_handler() deals with deleting
AioHandlers since removing monitored file descriptors is asynchronous in
io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and
aio_set_fd_handler() will let it handle deletion in that case.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com
Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
The AioHandler *node, bool is_new arguments are more complicated to
think about than simply being given AioHandler *old_node, AioHandler
*new_node.
Furthermore, the new Linux io_uring file descriptor monitoring mechanism
added by the new patch requires access to both the old and the new
nodes. Make this change now in preparation.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com
Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
The ppoll(2) and epoll(7) file descriptor monitoring implementations are
mixed with the core util/aio-posix.c code. Before adding another
implementation for Linux io_uring, extract out the existing
ones so there is a clear interface and the core code is simpler.
The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps
struct. See the patch for details.
Semantic changes:
1. ppoll(2) now reflects events from pollfds[] back into AioHandlers
while we're still on the clock for adaptive polling. This was
already happening for epoll(7), so if it's really an issue then we'll
need to fix both in the future.
2. epoll(7)'s fallback to ppoll(2) while external events are disabled
was broken when the number of fds exceeded the epoll(7) upgrade
threshold. I guess this code path simply wasn't tested and no one
noticed the bug. I didn't go out of my way to fix it but the correct
code is simpler than preserving the bug.
I also took some liberties in removing the unnecessary
AioContext->epoll_available (just check AioContext->epollfd != -1
instead) and AioContext->epoll_enabled (it's implicit if our
AioContext->fdmon_ops callbacks are being invoked) fields.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com
Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
Now that run_poll_handlers_once() is only called by run_poll_handlers()
we can improve the CPU time profile by moving the expensive
RCU_READ_LOCK() out of the polling loop.
This reduces the run_poll_handlers() from 40% CPU to 10% CPU in perf's
sampling profiler output.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-3-stefanha@redhat.com
Message-Id: <20200305170806.1313245-3-stefanha@redhat.com>
One iteration of polling is always performed even when polling is
disabled. This is done because:
1. Userspace polling is cheaper than making a syscall. We might get
lucky.
2. We must poll once more after polling has stopped in case an event
occurred while stopping polling.
However, there are downsides:
1. Polling becomes a bottleneck when the number of event sources is very
high. It's more efficient to monitor fds in that case.
2. A high-frequency polling event source can starve non-polling event
sources because ppoll(2)/epoll(7) is never invoked.
This patch removes the forced polling iteration so that poll_ns=0 really
means no polling.
IOPS increases from 10k to 60k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device because the large number of event sources being polled slows down
the event loop.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-2-stefanha@redhat.com
Message-Id: <20200305170806.1313245-2-stefanha@redhat.com>
QLIST_SAFE_REMOVE() is confusing here because the node must be on the
list. We actually just wanted to clear the linked list pointers when
removing it from the list. QLIST_REMOVE() now does this, so switch to
it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200224103406.1894923-3-stefanha@redhat.com
Message-Id: <20200224103406.1894923-3-stefanha@redhat.com>
Use error_setg_win32() which adds a hint similar to strerror(errno)).
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200228100726.8414-3-philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[desc]:
Large memory VM starts slowly when using -mem-prealloc, and
there are some areas to optimize in current method;
1、mmap will be used to alloc threads stack during create page
clearing threads, and it will attempt mm->mmap_sem for write
lock, but clearing threads have hold read lock, this competition
will cause threads createion very slow;
2、methods of calcuating pages for per threads is not well;if we use
64 threads to split 160 hugepage,63 threads clear 2page,1 thread
clear 34 page,so the entire speed is very slow;
to solve the first problem,we add a mutex in thread function,and
start all threads when all threads finished createion;
and the second problem, we spread remainder to other threads,in
situation that 160 hugepage and 64 threads, there are 32 threads
clear 3 pages,and 32 threads clear 2 pages.
[test]:
320G 84c VM start time can be reduced to 10s
680G 84c VM start time can be reduced to 18s
Signed-off-by: bauerchen <bauerchen@tencent.com>
Reviewed-by: Pan Rui <ruippan@tencent.com>
Reviewed-by: Ivan Ren <ivanren@tencent.com>
[Simplify computation of the number of pages per thread. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The virtual-device fuzzer must initialize QOM, prior to running
vl:qemu_init, so that it can use the qos_graph to identify the arguments
required to initialize a guest for libqos-assisted fuzzing. This change
prevents errors when vl:qemu_init tries to (re)initialize the previously
initialized QOM module.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200220041118.23264-4-alxndr@bu.edu
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
File descriptor monitoring is O(1) with epoll(7), but
aio_dispatch_handlers() still scans all AioHandlers instead of
dispatching just those that are ready. This makes aio_poll() O(n) with
respect to the total number of registered handlers.
Add a local ready_list to aio_poll() so that each nested aio_poll()
builds a list of handlers ready to be dispatched. Since file descriptor
polling is level-triggered, nested aio_poll() calls also see fds that
were ready in the parent but not yet dispatched. This guarantees that
nested aio_poll() invocations will dispatch all fds, even those that
became ready before the nested invocation.
Since only handlers ready to be dispatched are placed onto the
ready_list, the new aio_dispatch_ready_handlers() function provides O(1)
dispatch.
Note that AioContext polling is still O(n) and currently cannot be fully
disabled. This still needs to be fixed before aio_poll() is fully O(1).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200214171712.541358-6-stefanha@redhat.com
[Fix compilation error on macOS where there is no epoll(87). The
aio_epoll() prototype was out of date and aio_add_ready_list() needed to
be moved outside the ifdef.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is not necessary to scan all AioHandlers for deletion. Keep a list
of deleted handlers instead of scanning the full list of all handlers.
The AioHandler->deleted field can be dropped. Let's check if the
handler has been inserted into the deleted list instead. Add a new
QLIST_IS_INSERTED() API for this check.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200214171712.541358-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Don't pass the nanosecond timeout into epoll_wait(), which expects
milliseconds.
The epoll_wait() timeout value does not matter if qemu_poll_ns()
determined that the poll fd is ready, but passing a value in the wrong
units is still ugly. Pass a 0 timeout to epoll_wait() instead.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200214171712.541358-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
epoll_handler is a stack variable and must not be accessed after it goes
out of scope:
if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
AioHandler epoll_handler;
...
add_pollfd(&epoll_handler);
ret = aio_epoll(ctx, pollfds, npfd, timeout);
} ...
...
/* if we have any readable fds, dispatch event */
if (ret > 0) {
for (i = 0; i < npfd; i++) {
nodes[i]->pfd.revents = pollfds[i].revents;
}
}
nodes[0] is &epoll_handler, which has already gone out of scope.
There is no need to use pollfds[] for epoll. We don't need an
AioHandler for the epoll fd.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20200214171712.541358-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The ctx->first_bh list contains all created BHs, including those that
are not scheduled. The list is iterated by the event loop and therefore
has O(n) time complexity with respected to the number of created BHs.
Rewrite BHs so that only scheduled or deleted BHs are enqueued.
Only BHs that actually require action will be iterated.
One semantic change is required: qemu_bh_delete() enqueues the BH and
therefore invokes aio_notify(). The
tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that
g_main_context_iteration(NULL, false) returns false after
qemu_bh_delete() but it now returns true for one iteration. Fix up the
test case.
This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU
profile reported by perf-top(1). Previously they combined to 9% CPU
utilization when AioContext polling is commented out and the guest has 2
virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20200221093951.1414693-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The first rcu_read_lock/unlock() is expensive. Nested calls are cheap.
This optimization increases IOPS from 73k to 162k with a Linux guest
that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
devices.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20200218182708.914552-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Here's the next patch of ppc target patches. Highlights are:
* Some fixes for CAS / unplug interactions
* Remove some leaks of device trees
* Some fixes for the PHB3 and PHB4 devices
* Support for NVDIMMs on the pseries machine type
* Assorted other fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl5PUAwACgkQbDjKyiDZ
s5KBiBAAzUs/K1ygQuvcNTChOeu+3rBMe5XWRSRPIhqIZWnebssptwiMfqjVQXHB
1unLmmkUU8ov9cXWiRiAF/mrmBX6Kw2UbnNzjatvB7L1BX8b3Z848vYKb0mTZG6g
Q7E8nyqQqmWUtzeF6B0sNf597oPalHO5j6r5e/Pl5hjZGkjDXihB5tKePBRucnOO
wCsWSsoafaBJwq2w8KBahfgLYdQdCW2G2GHcfGouobzravEuOaxpknoLJiUw65Wa
Gf6PFrJE5XxDbn+zxkvs6RvPnNMkfFxn0fYbmPghjAM58Unz2LGyf9RRd5b6beuu
fYsoZPxX+gOFVjBBwjTmj+VxSL1Fw+b2xOrd1uf7zVNyJNO8AsxRWQMD7bLFTGSZ
95KkP5wSjcw2sm8eTtKkUAOP9YcTBJP1mjfjX/9rSGASCoOv6DJ71YOtyI4pzh5S
DYLYm2EL3A0BxqxLodSx4xcTrh3IHAUXcteYr+ndGOy7MHzZGR88gtUmuAoFcgKc
N0xV8dRoC5gNgLLxQ7zMh9q6qQHr0n13rRFKQ075piU4ce7JUzSR+EXCfQh2HLv+
GYM+FpZo2zFsWZxXDQE1fsLaCfz533HsplevAM2J/bGWAokYMEZKV6jV7DKXgxW0
6SrrSdxJrEThb198j1jSKCs0vwLDPcvV1GV6/sWStHBH1totRhY=
=hrAW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200221' into staging
ppc patch queue 2020-02-21
Here's the next patch of ppc target patches. Highlights are:
* Some fixes for CAS / unplug interactions
* Remove some leaks of device trees
* Some fixes for the PHB3 and PHB4 devices
* Support for NVDIMMs on the pseries machine type
* Assorted other fixes and cleanups
# gpg: Signature made Fri 21 Feb 2020 03:35:40 GMT
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-5.0-20200221:
hw/ppc/virtex_ml507:fix leak of fdevice tree blob
spapr: Fix handling of unplugged devices during CAS and migration
spapr: Don't use spapr_drc_needed() in CAS code
ppc: free 'fdt' after reset the machine
target/ppc/cpu.h: Clean up comments in the struct CPUPPCState definition
target/ppc/cpu.h: Move fpu related members closer in cpu env
target/ppc: Fix typo in comments
spapr: Allow changing offset for -kernel image
pnv/phb3: Add missing break statement
pnv/phb4: Fix error path in pnv_pec_realize()
pnv/phb3: Convert 1u to 1ull
target/ppc/cpu.h: Remove duplicate includes
spapr: Add Hcalls to support PAPR NVDIMM device
spapr: Add NVDIMM device support
nvdimm: add uuid property to nvdimm
mem: move nvdimm_device_list to utilities
ppc: function to setup latest class options
ppc/pnv: Fix PCI_EXPRESS dependency
qtest: Fix rtas dependencies
spapr/rtas: Print message from "ibm,os-term"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
nvdimm_device_list is required for parsing the list for devices
in subsequent patches. Move it to common utility area.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <158131055857.2897.15658377276504711773.stgit@lep8c.aus.stglabs.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Disable by default build of fdt, slirp and tools with linux-user
Improve strace and use qemu_log to send trace to a file
Add partial ALSA ioctl supports
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl5OT1QSHGxhdXJlbnRA
dml2aWVyLmV1AAoJEPMMOL0/L748i9kQAJYbShtYQNoNhSf/joKtQbxJ6vtQPryU
8J85wD2RShYpX2qi86WZ4IGFhSE+wFQ+Lfi1Z0SPs/VUgTSaiOYYEZBfROiKzY/M
hbpXlfzWGOLxfzgP17QR5eoSxbVA81N7ruyTiUxFGUTeVbowIXR+/v5Ek8GOJCuu
pAA/07NHxuP/YP2MkrlfAolFte5riQgfQT+QrJwpVwygdRwMx0Ed+gDWJosl8zL/
qEOzyZlXI/Mm+5bKCUQknSqpL3yQibmYfqM0XLhxjfvZ4vJDYk3Vet7ww39Je9Rz
7eAmXU7ERzzuFBTA7HEJjAjy+BrPkPMJINczfYcahUrt3TYS36DxinTre0TJbkMV
1np1Snn4LhSbvcAY/gFpO6X+lS5siYKZC9Ki2chBfl7AygjfswSl1wqE4cNwEyma
ycjdYnlzo8JvuPPI7qploFAPr0hDgloAVfIzTlz+LC7dhsAhHtpf0vq0xC717z98
qOTO1MVoZa0WserrnyAFDaNkBwwYGwhm9WKcd23/9XO6gjc5yZWF98oc31Z1KpT4
ya+qqpcy++FNM3K+74BC403pdoda75N7sl0HYsxJWnhRjUOAPjEwHRo69iGzO/ku
8AOcjws/wf2BnoR7KO/KLQrkpc+EAGY/ulndTqUEyC1MdsMoF6VOB5aTFiBgJBMO
GlsvTkt+05i4
=WAH+
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging
Implement membarrier, SO_RCVTIMEO and SO_SNDTIMEO
Disable by default build of fdt, slirp and tools with linux-user
Improve strace and use qemu_log to send trace to a file
Add partial ALSA ioctl supports
# gpg: Signature made Thu 20 Feb 2020 09:20:20 GMT
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/linux-user-for-5.0-pull-request:
linux-user: Add support for selected alsa timer instructions using ioctls
linux-user: Add support for getting/setting selected alsa timer parameters using ioctls
linux-user: Add support for selecting alsa timer using ioctl
linux-user: Add support for getting/setting specified alsa timer parameters using ioctls
linux-user: Add support for getting alsa timer version and id
linux-user: remove gemu_log from the linux-user tree
linux-user: Use `qemu_log' for strace
linux-user: Use `qemu_log' for non-strace logging
configure: Avoid compiling system tools on user build by default
linux-user/strace: Improve output of various syscalls
configure: linux-user doesn't need neither fdt nor slirp
linux-user: implement getsockopt SO_RCVTIMEO and SO_SNDTIMEO
linux-user: Implement membarrier syscall
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This change switches linux-user strace logging to use the newer `qemu_log`
logging subsystem rather than the older `gemu_log` (notice the "g")
logger. `qemu_log` has several advantages, namely that it allows logging
to a file, and provides a more unified interface for configuration
of logging (via the QEMU_LOG environment variable or options).
This change introduces a new log mask: `LOG_STRACE` which is used for
logging of user-mode strace messages.
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Josh Kunz <jkz@google.com>
Message-Id: <20200204025416.111409-3-jkz@google.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <4949c3ecf1a32189b8a4b5eb4b0fd04c1122501d.1581674006.git.mprivozn@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Some older parts of QEMU's codebase assume that CLOCK_MONOTONIC
might not be defined by the host OS, and have workarounds to
deal with this. However, more recently (notably in commit
50290c002c for qemu-img in mid-2019, but also much
earlier in 2011 in commit 22795174a3 for ui/spice-display.c)
we've written code that assumes CLOCK_MONOTONIC is always
defined. The only host OS anybody's ever noticed this on
is OSX 10.11 and earlier, which we don't support.
So we can assume that all our host OSes have the #define,
and we can remove some now-unnecessary ifdefs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200201172252.6605-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NULL is a valid log filename used to indicate we want to use stderr
but qemu_set_log_filename (which is called by bsd-user/main.c) was not
handling it correctly.
That also made redundant a couple of NULL checks in calling code which
have been removed.
Signed-off-by: Salvador Fandino <salvador@qindel.com>
Message-Id: <20200123193626.19956-1-salvador@qindel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
uClibc defines _SC_LEVEL1_ICACHE_LINESIZE and _SC_LEVEL1_DCACHE_LINESIZE
but the corresponding sysconf calls returns -1, which is a valid result,
meaning that the limit is indeterminate.
Handle this situation using the fallback values instead of crashing due
to an assertion failure.
Signed-off-by: Carlos Santos <casantos@redhat.com>
Message-Id: <20191017123713.30192-1-casantos@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>