Every laio_io_plug() call has a matching laio_io_unplug() call. There is
a plugged counter that tracks the number of levels of plugging and
allows for nesting.
The plugged counter must reflect the balance between laio_io_plug() and
laio_io_unplug() calls accurately. Otherwise I/O stalls occur since
io_submit(2) calls are skipped while plugged.
Reported-by: Nikolay Tenev <nt@storpool.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20220609164712.1539045-2-stefanha@redhat.com
Cc: Stefano Garzarella <sgarzare@redhat.com>
Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()")
[Stefano Garzarella suggested adding a Fixes tag.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Linux recently added a new io_uring(7) optimization API that QEMU
doesn't take advantage of yet. The liburing library that QEMU uses
has added a corresponding new API calling io_uring_register_ring_fd().
When this API is called after creating the ring, the io_uring_submit()
library function passes a flag to the io_uring_enter(2) syscall
allowing it to skip the ring file descriptor fdget()/fdput()
operations. This saves some CPU cycles.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20220531105011.111082-1-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu_co_queue_restart_all is basically the same as qemu_co_enter_all
but without a QemuLockable argument. That's perfectly fine, but only as
long as the function is marked coroutine_fn. If used outside coroutine
context, qemu_co_queue_wait will attempt to take the lock and that
is just broken: if you are calling qemu_co_queue_restart_all outside
coroutine context, the lock is going to be a QemuMutex which cannot be
taken twice by the same thread.
The patch adds the marker to qemu_co_queue_restart_all and to its sole
non-coroutine_fn caller; it then reimplements the function in terms of
qemu_co_enter_all_impl, to remove duplicated code and to clarify that the
latter also works in coroutine context.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220427130830.150180-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Leading underscores are ill-advised because such identifiers are
reserved. Trailing underscores are merely ugly. Strip both.
Our header guards commonly end in _H. Normalize the exceptions.
Macros should be ALL_CAPS. Normalize the exception.
Done with scripts/clean-header-guards.pl.
include/hw/xen/interface/ and tools/virtiofsd/ left alone, because
these were imported from Xen and libfuse respectively.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-3-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Header guard symbols should match their file name to make guard
collisions less likely.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]
VMDK disk data is stored in extents, which may or may not be separate
from bs->file. VmdkExtent.file points to where they are stored. Each
that is stored in bs->file will simply reuse the exact pointer value of
bs->file.
(That is why vmdk_free_extents() will unref VmdkExtent.file (e->file)
only if e->file != bs->file.)
Reopen operations can change bs->file (they will replace the whole
BdrvChild object, not just the BDS stored in that BdrvChild), and then
we will need to change all .file pointers of all such VmdkExtents to
point to the new BdrvChild.
In vmdk_reopen_prepare(), we have to check which VmdkExtents are
affected, and in vmdk_reopen_commit(), we can modify them. We have to
split this because:
- The new BdrvChild is created only after prepare, so we can change
VmdkExtent.file only in commit
- In commit, there no longer is any (valid) reference to the old
BdrvChild object, so there would be nothing to compare VmdkExtent.file
against to see whether it was equal to bs->file before reopening
(There is BDRVReopenState.old_file_bs, but the old bs->file
BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is
created, and so we cannot compare VmdkExtent.file->bs against
BDRVReopenState.old_file_bs)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220314162719.65384-2-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qcow2_co_invalidate_cache() closes and opens the qcow2 file, by calling
qcow2_close() and qcow2_do_open(). These two functions must thus be
usable from both a global-state and an I/O context.
As they are, they are not safe to call in an I/O context, because they
use bdrv_unref_child() and bdrv_open_child() to close/open the data_file
child, respectively, both of which are global-state functions. When
used from qcow2_co_invalidate_cache(), we do not need to close/open the
data_file child, though (we do not do this for bs->file or bs->backing
either), and so we should skip it in the qcow2_co_invalidate_cache()
path.
To do so, add a parameter to qcow2_do_open() and qcow2_close() to make
them skip handling s->data_file, and have qcow2_co_invalidate_cache()
exempt it from the memset() on the BDRVQcow2State.
(Note that the QED driver similarly closes/opens the QED image by
invoking bdrv_qed_close()+bdrv_qed_do_open(), but both functions seem
safe to use in an I/O context.)
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/945
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220427114057.36651-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is only used by block/file-posix.c, move it there.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
requests[].receiving is set by nbd_receive_replies() under the receive_mutex;
Read it under the same mutex as well. Waking up receivers on errors happens
after each reply finishes processing, in nbd_co_receive_one_chunk().
If there is no currently-active reply, there are two cases:
* either there is no active request at all, in which case no
element of request[] can have .receiving = true
* or nbd_receive_replies() must be running and owns receive_mutex;
in that case it will get back to nbd_co_receive_one_chunk() because
the socket has been shutdown, and all waiting coroutines will wake up
in turn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-9-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Remove the confusing, and most likely wrong, atomics. The only function
that used to be somewhat in a hot path was nbd_client_connected(),
but it is not anymore after the previous patches.
The same logic is used both to check if a request had to be reissued
and also in nbd_reconnecting_attempt(). The former cases are outside
requests_lock, while nbd_reconnecting_attempt() does have the lock,
therefore the two have been separated in the previous commit.
nbd_client_will_reconnect() can simply take s->requests_lock, while
nbd_reconnecting_attempt() can inline the access now that no
complicated atomics are involved.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-8-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Prepare for the next patch, so that the diff is less confusing.
nbd_client_connecting is moved closer to the definition point.
nbd_client_connecting_wait() is kept only for the reconnection
logic; when it is used to check if a request has to be reissued,
use the renamed function nbd_client_will_reconnect(). In the
next patch, the two cases will have different locking requirements.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-7-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
The condition for waiting on the s->free_sema queue depends on
both s->in_flight and s->state. The latter is currently using
atomics, but this is quite dubious and probably wrong.
Because s->state is written in the main thread too, for example by
the yank callback, it cannot be protected by a CoMutex. Introduce a
separate lock that can be used by nbd_co_send_request(); later on this
lock will also be used for s->state. There will not be any contention
on the lock unless there is a yank or reconnect, so this is not
performance sensitive.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-6-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Elevate s->in_flight early so that other incoming requests will wait
on the CoQueue in nbd_co_send_request; restart them after getting back
from nbd_reconnect_attempt. This could be after the reconnect timer or
nbd_cancel_in_flight have cancelled the attempt, so there is no
need anymore to cancel the requests there.
nbd_co_send_request now handles both stopping and restarting pending
requests after a successful connection, and there is no need to
hold send_mutex in nbd_co_do_establish_connection. The current setup
is confusing because nbd_co_do_establish_connection is called both with
send_mutex taken and without it. Before the patch it uses free_sema which
(at least in theory...) is protected by send_mutex, after the patch it
does not anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-5-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: wrap long line]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
It is unnecessary to check nbd_client_connected() because every time
s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down
and all coroutines are resumed.
The only case where it was actually needed is when the NBD
server disconnects and there is no reconnect-delay. In that
case, nbd_receive_replies() does not set s->reply.handle and
nbd_co_do_receive_one_chunk() cannot continue. For that one case,
check the return value of nbd_receive_replies().
As to the others:
* nbd_receive_replies() can put the current coroutine to sleep if another
reply is ongoing; then it will be woken by nbd_channel_error(), called
by the ongoing reply. Or it can try itself to read a reply header and
fail, thus calling nbd_channel_error() itself.
* nbd_co_send_request() will write the body of the request and fail
* nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk()
and then nbd_co_do_receive_one_chunk(), which will handle the failure as
above; or it will just detect a previous call to nbd_iter_channel_error()
via iter->ret < 0.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-4-pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Several coroutine functions in block/nbd.c are not marked as such. This
patch adds a few more markers; it is not exhaustive, but it focuses
especially on:
- places that wake other coroutines, because aio_co_wake() has very
different semantics inside a coroutine (queuing after yield vs. entering
immediately);
- functions with _co_ in their names, to avoid confusion
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-3-pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
The .reply_possible field of s->requests is never set to false. This is
not a problem as it is only a safeguard to detect protocol errors,
but it's sloppy. In fact, the field is actually not necessary at all,
because .coroutine is set to NULL in NBD_FOREACH_REPLY_CHUNK after
receiving the last chunk. Thus, replace .reply_possible with .coroutine
and move the check before deciding the fate of this request.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414175756.671165-2-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Rename the type to be reused. Old name is "what is it for". To be
natively reused for other needs, let's name it exactly "what is it".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
'blockdev-change-medium' is a convinient wrapper for the following
sequence of commands:
* blockdev-open-tray
* blockdev-remove-medium
* blockdev-insert-medium
* blockdev-close-tray
and should be used f.e. to change ISO image inside the CD-ROM tray.
Though the guest could lock the tray and some linux guests like
CentOS 8.5 actually does that. In this case the execution if this
command results in the error like the following:
Device 'scsi0-0-1-0' is locked and force was not specified,
wait for tray to open and try again.
This situation is could be resolved 'blockdev-open-tray' by passing
flag 'force' inside. Thus is seems reasonable to add the same
capability for 'blockdev-change-medium' too.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Acked-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220412221846.280723-1-den@openvz.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Instead of fprint()-ing error messages in rebuild_refcount_structure()
and its rebuild_refcounts_write_refblocks() helper, pass them through an
Error object to qcow2_check_refcounts() (which will then print it).
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-4-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When rebuilding the refcount structures (when qemu-img check -r found
errors with refcount = 0, but reference count > 0), the new refcount
table defaults to being put at the image file end[1]. There is no good
reason for that except that it means we will not have to rewrite any
refblocks we already wrote to disk.
Changing the code to rewrite those refblocks is not too difficult,
though, so let us do that. That is beneficial for images on block
devices, where we cannot really write beyond the end of the image file.
Use this opportunity to add extensive comments to the code, and refactor
it a bit, getting rid of the backwards-jumping goto.
[1] Unless there is something allocated in the area pointed to by the
last refblock, so we have to write that refblock. In that case, we
try to put the reftable in there.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
Closes: https://gitlab.com/qemu-project/qemu/-/issues/941
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220405134652.19278-2-hreitz@redhat.com>
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.
This avoids the need for a constructor initialization too.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the stream block job cuts out the nodes between top and base in
stream_prepare(), it does not drain the subtree manually; it fetches the
base node, and tries to insert it as the top node's backing node with
bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so
the actual base node might change (because the base node is actually not
part of the stream job) before the old base node passed to
bdrv_set_backing_hd() is installed.
This has two implications:
First, the stream job does not keep a strong reference to the base node.
Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g.
because some other block job is drained to finish), we will get a
use-after-free. We should keep a strong reference to that node.
Second, even with such a strong reference, the problem remains that the
base node might change before bdrv_set_backing_hd() actually runs and as
a result the wrong base node is installed.
Both effects can be seen in 030's TestParallelOps.test_overlapping_5()
case, which has five nodes, and simultaneously streams from the middle
node to the top node, and commits the middle node down to the base node.
As it is, this will sometimes crash, namely when we encounter the
above-described use-after-free.
Taking a strong reference to the base node, we no longer get a crash,
but the resuling block graph is less than ideal: The expected result is
obviously that all middle nodes are cut out and the base node is the
immediate backing child of the top node. However, if stream_prepare()
takes a strong reference to its base node (the middle node), and then
the commit job finishes in bdrv_set_backing_hd(), supposedly dropping
that middle node, the stream job will just reinstall it again.
Therefore, we need to keep the whole subtree drained in
stream_prepare(), so that the graph modification it performs is
effectively atomic, i.e. that the base node it fetches is still the base
node when bdrv_set_backing_hd() sets it as the top node's backing node.
Verify this by asserting in said 030's test case that the base node is
always the top node's immediate backing child when both jobs are done.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220324140907.17192-1-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
"0x%u" format is very misleading, replace by "0x%x".
Found running:
$ git grep -E '0x%[0-9]*([lL]*|" ?PRI)[dDuU]' block/
Inspired-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 20220323114718.58714-2-philippe.mathieu.daude@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Hi,
A collection of fixes & cleanup patches that should be safe for 7.0 inclusion.
-----BEGIN PGP SIGNATURE-----
iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmI5vPIcHG1hcmNhbmRy
ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5a7ED/9+DCc6b+yAeMsFR7SI
kqxSvPW9RbgQrJo0LrJxX7H+xYs40JFpkNZFhuAGgWPrk6GlebMzg+aMgSlZi4XN
B7y5/dAKUUPCC+kNQ7azP4Gp+xb+Pxg2ZZxQ9SnxsGgPWC1prliiB8Zbvs8f5lHl
ACbh7wvfVOcSJoMaCAf5km4AFzWYQQkwn2w3CRl4CfWnuWUhjnnYL9DfjHrfaYPK
JCbRCx534dy/amrMPgbAOcDRl0K9/9Tw+xATxOkQPLZ4Za4tclsAGZ9Hb2WoDuWS
LYQ1ZJVouv37EnaPVMCyPyC2n4oLJ86L2RCSBqKgIgv7rmwTUcqlfYPVg7TZGxuw
T234lIc8AXcm2UNQ4iTXLH/Od9RGHKseZSF8QYTVGNDtfvp3bDFVT6k5e2X/SpXY
gVloTdFzmwYWM8dtREPepZlEhXNKz7XdltlrcwyDdKWW0OffLRyKkNIsuUja7EoL
q4n8l4tq084iLTHpEUSWaFwZvu89b8n81hML0box6XXrOldk1qdf57Ka5gqxNrnk
pJES7ocRoTANjZgASrJW8vPu3/GkdlmE/Khf5bnOzq/lWMwVxPqYEQY+PRoAU2zR
MS1UJ9IITe3toJlx7+DqR8Lo6fUyralwKv/MUnBW65S45S7VkbCO4anELNnVvzAE
CFfsa30VblNDEbppBMXwRFyX0Q==
=fKgO
-----END PGP SIGNATURE-----
Merge tag 'fixes-pull-request' of gitlab.com:marcandre.lureau/qemu into staging
Fixes and cleanups for 7.0
Hi,
A collection of fixes & cleanup patches that should be safe for 7.0 inclusion.
# gpg: Signature made Tue 22 Mar 2022 12:11:30 GMT
# gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5
# gpg: issuer "marcandre.lureau@redhat.com"
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full]
# gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full]
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5
* tag 'fixes-pull-request' of gitlab.com:marcandre.lureau/qemu: (21 commits)
qapi: remove needless include
Remove trailing ; after G_DEFINE_AUTO macro
tests: remove needless include
error: use GLib to remember the program name
qga: remove bswap.h include
qapi: remove needless include
meson: fix CONFIG_ATOMIC128 check
meson: move int128 checks from configure
qapi: remove needless include
util: remove the net/net.h dependency
util: remove needless includes
scripts/modinfo-collect: remove unused/dead code
Move HOST_LONG_BITS to compiler.h
Simplify HOST_LONG_BITS
compiler.h: replace QEMU_SENTINEL with G_GNUC_NULL_TERMINATED
compiler.h: replace QEMU_WARN_UNUSED_RESULT with G_GNUC_WARN_UNUSED_RESULT
Replace GCC_FMT_ATTR with G_GNUC_PRINTF
Drop qemu_foo() socket API wrapper
m68k/nios2-semi: fix gettimeofday() result check
vl: typo fix in a comment
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- iotest fixes:
- Fix some iotests for riscv targets
- Use GNU sed in more places where required
- Meson-related fixes (i.e. to print errors when they occur)
- Have qemu-img calls (from Python tests) generally raise nicely
formattable exceptions on errors
- Fix iotest 207
- Allow RBD images to be growable by writing zeroes past the end of
file, fixing qcow2 on rbd
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmI5uC4SHGhyZWl0ekBy
ZWRoYXQuY29tAAoJEKH6QNCYAZzfsPIP/1iRUWmWd9h9m7gX5hSU+TWYXsH+Ua/G
fKdDHqNlVjQHq6SDN+A3jDCjAV9I91vlHRVIMWXc4QlKbdNdY7yUP1zYKuRTbfig
UXce6g1NQoZUIlBSHZDqzgEmrvjunwP1U2te2LWliQjmvlTJgdIHYUJn2VgmY3wg
2vo8exE3YO1FArS9nFsiX/1Ju35Dm4+3w9NkI6KxnKvaFpY++jovVVy2Bj7Jmfsh
bRnfiQtDVX+0H4FXYS02hDEX4h7PTzsd1DeapVbiiZW9KJrbb0rchTSGc+VKMdkC
z2XDfU+Hw4jYlNomuWdSZy6qJMNzUaKEqJMji1OiLym4429OAyseL8EO5c6Utjcb
RRqGKWBOp1ceFZcy8vmQ2Rxc7b3Nc/Jv41Ty7PbyHmrtd2nbgD+AVnlFH9qWDtZo
clvFSaIHcHaC4k+MppsbGTKvbW7qRYUkdk1B+tFlZytwQpFvM4oK2CF8jNzoLYfY
qJIvrBqgaBKnYzAGCV4qgH9I0gWY7WHwvwfevHy47rk7XDsErWKMfBFy294TYDyq
+yU6K1VijWDEn/DdQZMSZQJeE7ByA4cfSSYGRmwtTLOZxmUi4mEBEEo6Lw3x4eI2
eXj/fhRenbjfE+5p3xmWvEyaVvvyor9oXUPH45HN/LvGgUJmEsjs0XGwnNNwxJBT
s2lp8U5Lo4Lb
=0cG5
-----END PGP SIGNATURE-----
Merge tag 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu into staging
Block patches for 7.0-rc1:
- iotest fixes:
- Fix some iotests for riscv targets
- Use GNU sed in more places where required
- Meson-related fixes (i.e. to print errors when they occur)
- Have qemu-img calls (from Python tests) generally raise nicely
formattable exceptions on errors
- Fix iotest 207
- Allow RBD images to be growable by writing zeroes past the end of
file, fixing qcow2 on rbd
# gpg: Signature made Tue 22 Mar 2022 11:51:10 GMT
# gpg: using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg: issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00 4D34 A1FA 40D0 9801 9CDF
* tag 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu: (25 commits)
iotests/207: Filter host fingerprint
iotests.py: Filters for VM.run_job()
iotests: make qemu_img_log and img_info_log raise on error
iotests: remove qemu_img_pipe_and_status()
iotests: replace qemu_img_log('create', ...) calls
iotests: use qemu_img() in has_working_luks()
iotests: remove remaining calls to qemu_img_pipe()
iotests/149: Remove qemu_img_pipe() call
iotests: replace unchecked calls to qemu_img_pipe()
iotests: change supports_quorum to use qemu_img
iotests: add qemu_img_map() function
iotests/remove-bitmap-from-backing: use qemu_img_info()
iotests: add qemu_img_info()
iotests: use qemu_img_json() when applicable
iotests: add qemu_img_json()
iotests: fortify compare_images() against crashes
iotests: make qemu_img raise on non-zero rc by default
iotests: Remove explicit checks for qemu_img() == 0
python/utils: add VerboseProcessError
python/utils: add add_visual_margin() text decoration utility
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
One less qemu-specific macro. It also helps to make some headers/units
only depend on glib, and thus moved in standalone projects eventually.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.
We recently added support for write zeroes and without the
workaround we can have problems with qcow2.
So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220317162638.41192-1-sgarzare@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
During the IO stress test, the IO request coroutine has a probability that is
can't be awakened when the NBD server is killed.
The GDB stack is as follows:
(gdb) bt
0 0x00007f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
1 0x000055575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=599999603140) at ../util/qemu-timer.c:348
2 0x000055575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=599999603140) at ../util/fdmon-poll.c:80
3 0x000055575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655
4 0x000055575c16eabd in bdrv_do_drained_begin(bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true)at ../block/io.c:474
5 0x000055575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480
6 0x000055575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130
7 0x000055575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705
8 0x000055575c12da28 in qmp_x_blockdev_change(parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, no de=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676
9 0x000055575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c :1675
10 0x000055575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129
11 0x000055575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141
12 0x000055575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169
13 0x000055575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415
14 0x000055575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311
15 0x00007f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
16 0x000055575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232
17 0x000055575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
18 0x000055575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
19 0x000055575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726
20 0x000055575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50
(gdb) qemu coroutine 0x55575e16aac0
0 0x000055575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
1 0x000055575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195
2 0x000055575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56
3 0x000055575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478
4 0x000055575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182
5 0x000055575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284
6 0x000055575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
at ../block/io.c:1264
7 0x000055575c1733b4 in bdrv_aligned_pwritev
(child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126
8 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
at ../block/io.c:2314
9 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
10 0x000055575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0)
at ../block/replication.c:270
11 0x000055575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
at ../block/io.c:1297
12 0x000055575c1733b4 in bdrv_aligned_pwritev
(child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
at ../block/io.c:2126
13 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0)
at ../block/io.c:2314
14 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
15 0x000055575c1aeffa in write_quorum_entry (opaque=0x7f2fddaf8c50) at ../block/quorum.c:699
16 0x000055575c2ee4db in coroutine_trampoline (i0=1578543808, i1=21847) at ../util/coroutine-ucontext.c:173
17 0x00007f2ff9855660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
When we do failover in COLO mode, QEMU will hang while it is waiting for
the in-flight IO. From the call trace, we can see the IO request coroutine
has yielded in nbd_co_send_request(). When we kill the NBD server, it will never
be wake up. Actually, when we do IO stress test, it will have a lot of
requests in free_sema queue. When the NBD server is killed, current
MAX_NBD_REQUESTS finishes with errors but they wake up at most
MAX_NBD_REQEUSTS from the queue. So, let's move qemu_co_queue_next out
to fix this issue.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Message-Id: <20220309074844.275450-1-lei.rao@intel.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When building on macOS 12 we get:
block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations]
kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
^~~~~~~~~~~~
IOMainPort
Replace by IOMainPort, redefining it to IOMasterPort if not available.
Suggested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed by: Cameron Esfahani <dirty@apple.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
- Dan Berrange: Allow qemu-nbd to support TLS over Unix sockets
- Eric Blake: Minor cleanups related to 64-bit block operations
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmImtE8ACgkQp6FrSiUn
Q2ovmgf/aksDqf2eNcahs++fez+8Qi9ll5OY/qGyjnzBgsatYKjrK+xF7OnjoJox
eRX026lh81Q4EQK7oZBUnr2UCY4bncDBTI7MTLh603EV/tId5ZLwx007ERhzvtC1
mIsQHXNuO9X25LQG2eWnfunY9YztQpiT5r/g3khD2yPBqJWIvBfblzPLx6FkF7px
/WM8xEKCihmGr1Wr3b+zGYL083YkaBWCvHoR8mJt3tEFUj+Qie8XcdV0OVyI0XUj
5goIFRcpVwBE8P2nLtfUKNzEXz22cmdonOJUX7E5IvGO21k5F/HrWlHdo8JnuSUZ
t0w5L9yCxBrRpY1burz30b77J0WMCw==
=C8Dd
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2022-03-07' into staging
nbd patches for 2022-03-07
- Dan Berrange: Allow qemu-nbd to support TLS over Unix sockets
- Eric Blake: Minor cleanups related to 64-bit block operations
# gpg: Signature made Tue 08 Mar 2022 01:41:35 GMT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2022-03-07:
qemu-io: Allow larger write zeroes under no fallback
qemu-io: Utilize 64-bit status during map
nbd/server: Minor cleanups
tests/qemu-iotests: validate NBD TLS with UNIX sockets and PSK
tests/qemu-iotests: validate NBD TLS with UNIX sockets
tests/qemu-iotests: validate NBD TLS with hostname mismatch
tests/qemu-iotests: convert NBD TLS test to use standard filters
tests/qemu-iotests: introduce filter for qemu-nbd export list
tests/qemu-iotests: expand _filter_nbd rules
tests/qemu-iotests: add QEMU_IOTESTS_REGEN=1 to update reference file
block/nbd: don't restrict TLS usage to IP sockets
qemu-nbd: add --tls-hostname option for TLS certificate validation
block/nbd: support override of hostname for TLS certificate validation
block: pass desired TLS hostname through from block driver client
crypto: mandate a hostname when checking x509 creds on a client
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* cleanups of qemu_oom_check() and qemu_memalign()
* target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero
* target/arm/translate-neon: Simplify align field check for VLD3
* GICv3 ITS: add more trace events
* GICv3 ITS: implement 8-byte accesses properly
* GICv3: fix minor issues with some trace/log messages
* ui/cocoa: Use the standard about panel
* target/arm: Provide cpu property for controling FEAT_LPA2
* hw/arm/virt: Disable LPA2 for -machine virt-6.2
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmImNs4ZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3q87D/0cMQeF00uVRNqftrQg2SDI
txJIG2QYUOPMCDfGWlGTfXv2TUc5y3XwA77C9vTcJcIWJlZ30DUa95DNYqA0BbOH
TEOzRuZME64wA/JndHadz7oh+xb3HYn+6aSr63LeQCI3/h1eXVHknnEcyF1danOb
YNB1T308THTEwJHQuKHYksIasgVwcjOf8FvMRYFozVkAKEx1SlabpFXST+aVNyx4
ASsC2PTiJYAqwnYrTX8lWOYKMiKfkNrQcTd6x7rkoDw1pV7ZDMw2/69tpkhdJ5Fa
lwxhwZ3+40x49eFGAhfuZWZmGLd4c+76u64pmWW429uk1JhaoXgErJM3xfHbI1er
d7XSQYkMhDrY5SFuoE5XYwOuxanPtn3f7luM236Uzgf4ZR6qTrf6x+R1xLPZVYa9
fWbjvR3g5sltTOzyc+9UsBq1OPCbRUbmhJtJDvojj5sWmNvgOwZnSkTu5kMAqvFP
T2cQIi6phRBo3oMN/fhEZi3g828JjYEA9QlpWZ74JOyiXjYUq9VVNpoe/dtAv4Yy
wZ+XhVNIK82/4Mxjr9SEeYeNzYrsEEvFAUqe9Bil2CpuIMV5ONEzs+UfQ/gyk4eq
QnGPiojCrpf6PPAfci0Y6b4RzO+loMFpLjCpurngB4g4cBdmThKip0sVZdTZAI9Y
lnusB8MR1sESoqYdPZsAfQ==
=ix0J
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20220307' into staging
target-arm queue:
* cleanups of qemu_oom_check() and qemu_memalign()
* target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero
* target/arm/translate-neon: Simplify align field check for VLD3
* GICv3 ITS: add more trace events
* GICv3 ITS: implement 8-byte accesses properly
* GICv3: fix minor issues with some trace/log messages
* ui/cocoa: Use the standard about panel
* target/arm: Provide cpu property for controling FEAT_LPA2
* hw/arm/virt: Disable LPA2 for -machine virt-6.2
# gpg: Signature made Mon 07 Mar 2022 16:46:06 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20220307:
hw/arm/virt: Disable LPA2 for -machine virt-6.2
target/arm: Provide cpu property for controling FEAT_LPA2
ui/cocoa: Use the standard about panel
hw/intc/arm_gicv3_cpuif: Fix register names in ICV_HPPIR read trace event
hw/intc/arm_gicv3: Fix missing spaces in error log messages
hw/intc/arm_gicv3: Specify valid and impl in MemoryRegionOps
hw/intc/arm_gicv3_its: Add trace events for table reads and writes
hw/intc/arm_gicv3_its: Add trace events for commands
target/arm/translate-neon: Simplify align field check for VLD3
target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero
osdep: Move memalign-related functions to their own header
util: Put qemu_vfree() in memalign.c
util: Use meson checks for valloc() and memalign() presence
util: Share qemu_try_memalign() implementation between POSIX and Windows
meson.build: Don't misdetect posix_memalign() on Windows
util: Return valid allocation for qemu_try_memalign() with zero size
util: Unify implementations of qemu_memalign()
util: Make qemu_oom_check() a static function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The TLS usage for NBD was restricted to IP sockets because validating
x509 certificates requires knowledge of the hostname that the client
is connecting to.
TLS does not have to use x509 certificates though, as PSK (pre-shared
keys) provide an alternative credential option. These have no
requirement for a hostname and can thus be trivially used for UNIX
sockets.
Furthermore, with the ability to overide the default hostname for
TLS validation in the previous patch, it is now also valid to want
to use x509 certificates with FD passing and UNIX sockets.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-6-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When connecting to an NBD server with TLS and x509 credentials,
the client must validate the hostname it uses for the connection,
against that published in the server's certificate. If the client
is tunnelling its connection over some other channel, however, the
hostname it uses may not match the info reported in the server's
certificate. In such a case, the user needs to explicitly set an
override for the hostname to use for certificate validation.
This is achieved by adding a 'tls-hostname' property to the NBD
block driver.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-4-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
In
commit a71d597b98
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date: Thu Jun 10 13:08:00 2021 +0300
block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
the use of the 'hostname' field from the BDRVNBDState struct was
lost, and 'nbd_connect' just hardcoded it to match the IP socket
address. This was a harmless bug at the time since we block use
with anything other than IP sockets.
Shortly though, we want to allow the caller to override the hostname
used in the TLS certificate checks. This is to allow for TLS
when doing port forwarding or tunneling. Thus we need to reinstate
the passing along of the 'hostname'.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-3-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
Current scheme of image fleecing looks like this:
[guest] [NBD export]
| |
|root | root
v v
[copy-before-write] -----> [temp.qcow2]
| target |
|file |backing
v |
[active disk] <-------------+
- On guest writes copy-before-write filter copies old data from active
disk to temp.qcow2. So fleecing client (NBD export) when reads
changed regions from temp.qcow2 image and unchanged from active disk
through backing link.
This patch makes possible new image fleecing scheme:
[guest] [NBD export]
| |
| root | root
v file v
[copy-before-write]<------[snapshot-access]
| |
| file | target
v v
[active-disk] [temp.img]
- copy-before-write does CBW operations and also provides
snapshot-access API. The API may be accessed through
snapshot-access driver.
Benefits of new scheme:
1. Access control: if remote client try to read data that not covered
by original dirty bitmap used on copy-before-write open, client gets
-EACCES.
2. Discard support: if remote client do DISCARD, this additionally to
discarding data in temp.img informs block-copy process to not copy
these clusters. Next read from discarded area will return -EACCES.
This is significant thing: when fleecing user reads data that was
not yet copied to temp.img, we can avoid copying it on further guest
write.
3. Synchronisation between client reads and block-copy write is more
efficient. In old scheme we just rely on BDRV_REQ_SERIALISING flag
used for writes to temp.qcow2. New scheme is less blocking:
- fleecing reads are never blocked: if data region is untouched or
in-flight, we just read from active-disk, otherwise we read from
temp.img
- writes to temp.img are not blocked by fleecing reads
- still, guest writes of-course are blocked by in-flight fleecing
reads, that currently read from active-disk - it's the minimum
necessary blocking
4. Temporary image may be of any format, as we don't rely on backing
feature.
5. Permission relation are simplified. With old scheme we have to share
write permission on target child of copy-before-write, otherwise
backing link conflicts with copy-before-write file child write
permissions. With new scheme we don't have backing link, and
copy-before-write node may have unshared access to temporary node.
(Not realized in this commit, will be in future).
6. Having control on fleecing reads we'll be able to implement
alternative behavior on failed copy-before-write operations.
Currently we just break guest request (that's a historical behavior
of backup). But in some scenarios it's a bad behavior: better
is to drop the backup as failed but don't break guest request.
With new scheme we can simply unset some bits in a bitmap on CBW
failure and further fleecing reads will -EACCES, or something like
this. (Not implemented in this commit, will be in future)
Additional application for this is implementing timeout for CBW
operations.
Iotest 257 output is updated, as two more bitmaps now live in
copy-before-write filter.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-13-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
The new block driver simply utilizes snapshot-access API of underlying
block node.
In further patches we want to use it like this:
[guest] [NBD export]
| |
| root | root
v file v
[copy-before-write]<------[snapshot-access]
| |
| file | target
v v
[active-disk] [temp.img]
This way, NBD client will be able to read snapshotted state of active
disk, when active disk is continued to be written by guest. This is
known as "fleecing", and currently uses another scheme based on qcow2
temporary image which backing file is active-disk. New scheme comes
with benefits - see next commit.
The other possible application is exporting internal snapshots of
qcow2, like this:
[guest] [NBD export]
| |
| root | root
v file v
[qcow2]<---------[snapshot-access]
For this, we'll need to implement snapshot-access API handlers in
qcow2 driver, and improve snapshot-access block driver (and API) to
make it possible to select snapshot by name. Another thing to improve
is size of snapshot. Now for simplicity we just use size of bs->file,
which is OK for backup, but for qcow2 snapshots export we'll need to
imporve snapshot-access API to get size of snapshot.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-12-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add new block driver handlers and corresponding generic wrappers.
It will be used to allow copy-before-write filter to provide
reach fleecing interface in further commit.
In future this approach may be used to allow reading qcow2 internal
snapshots, for example to export them through NBD.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add function to wait for all intersecting requests.
To be used in the further commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-10-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add a convenient function similar with bdrv_block_status() to get
status of dirty bitmap.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Split intersecting-requests functionality out of block-copy to be
reused in copy-before-write filter.
Note: while being here, fix tiny typo in MAINTAINERS.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-7-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-6-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
This brings "incremental" mode to copy-before-write filter: user can
specify bitmap so that filter will copy only "dirty" areas.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-5-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
This will be used in the following commit to bring "incremental" mode
to copy-before-write filter.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-4-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
That simplifies handling failure in existing code and in further new
usage of bdrv_merge_dirty_bitmap().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
We are going to complicate bitmap initialization in the further
commit. And in future, backup job will be able to work without filter
(when source is immutable), so we'll need same bitmap initialization in
copy-before-write filter and in backup job. So, it's reasonable to do
it in block-copy.
Note that for now cbw_open() is the only caller of
block_copy_state_new().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in
wait_serialising_requests() if request is unaligned. And this is
possible for the only user of this flag (preallocate filter) if
underlying file is unaligned to its request_alignment on start.
So, we have to fix preallocate filter to do only aligned preallocate
requests.
Next, we should fix generic block/io.c somehow. Keeping in mind that
preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to
fix its behavior now, it seems more safe to just assert that we never
use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding
comment. Let's do so.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20220215121609.38570-1-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Coverity points out that we aren't checking the return value
from curl_easy_setopt() for any of the calls to it we make
in block/curl.c.
Some of these options are documented as always succeeding (e.g.
CURLOPT_VERBOSE) but others have documented failure cases (e.g.
CURLOPT_URL). For consistency we check every call, even the ones
that theoretically cannot fail.
Fixes: Coverity CID 1459336, 1459482, 1460331
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220222152341.850419-3-peter.maydell@linaro.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
In curl_open(), the 'out' label assumes that the state->errmsg string
has been set (either by curl_easy_perform() or by manually copying a
string into it); however if curl_init_state() fails we will jump to
that label without setting the string. Add the missing error string
setup.
(We can't be specific about the cause of failure: the documentation
of curl_easy_init() just says "If this function returns NULL,
something went wrong".)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220222152341.850419-2-peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Otherwise, the BDS might be freed while the job is running, which would
cause a use-after-free.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220304153729.711387-5-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
.bdrv_amend_clean() says block drivers can use it to clean up what was
done in .bdrv_amend_pre_run(). Therefore, it should always be called
after .bdrv_amend_pre_run(), which means we need it to call it in the
JobDriver.free() callback, not in JobDriver.clean().
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220304153729.711387-3-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_refresh_limits() recurses down to the node's children. That does
not seem necessary: When we refresh limits on some node, and then
recurse down and were to change one of its children's BlockLimits, then
that would mean we noticed the changed limits by pure chance. The fact
that we refresh the parent's limits has nothing to do with it, so the
reason for the change probably happened before this point in time, and
we should have refreshed the limits then.
Consequently, we should actually propagate block limits changes upwards,
not downwards. That is a separate and pre-existing issue, though, and
so will not be addressed in this patch.
The problem with recursing is that bdrv_refresh_limits() is not atomic.
It begins with zeroing BDS.bl, and only then sets proper, valid limits.
If we do not drain all nodes whose limits are refreshed, then concurrent
I/O requests can encounter invalid request_alignment values and crash
qemu. Therefore, a recursing bdrv_refresh_limits() requires the whole
subtree to be drained, which is currently not ensured by most callers.
A non-recursive bdrv_refresh_limits() only requires the node in question
to not receive I/O requests, and this is done by most callers in some
way or another:
- bdrv_open_driver() deals with a new node with no parents yet
- bdrv_set_file_or_backing_noperm() acts on a drained node
- bdrv_reopen_commit() acts only on drained nodes
- bdrv_append() should in theory require the node to be drained; in
practice most callers just lock the AioContext, which should at least
be enough to prevent concurrent I/O requests from accessing invalid
limits
So we can resolve the bug by making bdrv_refresh_limits() non-recursive.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220216105355.30729-2-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-27-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block coroutines functions run in different aiocontext, and are
not protected by the BQL. Therefore are I/O.
On the other side, generated_co_wrapper functions use BDRV_POLL_WHILE,
meaning the caller can either be the main loop or a specific iothread.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-25-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
copy-before-write functions always run under BQL.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-24-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Snapshots run also under the BQL, so they all are
in the global state API. The aiocontext lock that they hold
is currently an overkill and in future could be removed.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-23-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-22-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Following the assertion derived from the API split,
propagate the assertion also in the static functions.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-18-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-14-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-13-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-10-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-9-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Similarly to the previous patches, split block-backend.h
in block-backend-io.h and block-backend-global-state.h
In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"
block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.
Assertions are added in the next patch.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-8-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow writable exports to get BLK_PERM_RESIZE permission
from creation, in fuse_export_create().
In this way, there is no need to give the permission in
fuse_do_truncate(), which might be run in an iothread.
Permissions should be set only in the main thread, so
in any case if an iothread tries to set RESIZE, it will
be blocked.
Also assert in fuse_do_truncate that if we give the
RESIZE permission we can then restore the original ones.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303151616.325444-7-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-6-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-5-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
Some others can only be called by either the main loop or the
iothread running the AioContext (and not other iothreads),
and using them in another thread would cause deadlocks, and therefore
it is not ideal to define them as I/O.
It is not easy to understand which function is part of which
group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it.
The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O function are instead thread safe, and can run in
any AioContext.
"I/O or GS" functions run instead in the main loop or in
a single iothread, and use BDRV_POLL_WHILE().
By splitting the header in two files, block-io.h
and block-global-state.h we have a clearer view on what
needs what kind of protection. block-common.h
contains common structures shared by both headers.
block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.
Assertions are added in the next patch.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Following the bdrv_activate renaming, change also the name
of the respective callers.
bdrv_invalidate_cache_all -> bdrv_activate_all
blk_invalidate_cache -> blk_activate
test_sync_op_invalidate_cache -> test_sync_op_activate
No functional change intended.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220209105452.1694545-5-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function is currently just a wrapper for bdrv_invalidate_cache(),
but in future will contain the code of bdrv_co_invalidate_cache() that
has to always be protected by BQL, and leave the rest in the I/O
coroutine.
Replace all bdrv_invalidate_cache() invokations with bdrv_activate().
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220209105452.1694545-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block_crypto_amend_options_generic_luks uses the block layer
permission API, therefore it should be called with the BQL held.
However, the same function is being called by two BlockDriver
callbacks: bdrv_amend_options (under BQL) and bdrv_co_amend (I/O).
The latter is I/O because it is invoked by block/amend.c's
blockdev_amend_run(), a .run callback of the amend JobDriver.
Therefore we want to change this function to still perform
the permission check, but making sure it is done under BQL regardless
of the caller context.
Remove the permission check in block_crypto_amend_options_generic_luks()
and:
- in block_crypto_amend_options_luks() (BQL case, called by
.bdrv_amend_options()), reuse helper functions
block_crypto_amend_{prepare/cleanup} that take care of checking
permissions.
- for block_crypto_co_amend_luks() (I/O case, called by
.bdrv_co_amend()), don't check for permissions but delegate
.bdrv_amend_pre_run() and .bdrv_amend_clean() to do it,
performing these checks before and after the job runs in its aiocontext.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220209105452.1694545-3-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the permission API calls into driver-specific callbacks
that always run under BQL. In this case, bdrv_crypto_luks
needs to perform permission checks before and after
qcrypto_block_amend_options(). The problem is that the caller,
block_crypto_amend_options_generic_luks(), can also run in I/O
from .bdrv_co_amend(). This does not comply with Global State-I/O API split,
as permissions API must always run under BQL.
Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean()
callbacks. These two callbacks are guaranteed to be invoked under
BQL, respectively before and after .bdrv_co_amend().
They take care of performing the permission checks
in the same way as they are currently done before and after
qcrypto_block_amend_options().
These callbacks are in preparation for next patch, where we
delete the original permission check. Right now they just add redundant
control.
Then, call .bdrv_amend_pre_run() before job_start in
qmp_x_blockdev_amend(), so that it will be run before the job coroutine
is created and stay in the main loop.
As a cleanup, use JobDriver's .clean() callback to call
.bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL.
After this patch, permission failures occur early in the blockdev-amend
job to update a LUKS volume's keys. iotest 296 must now expect them in
x-blockdev-amend's QMP reply instead of waiting for the actual job to
fail later.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220209105452.1694545-2-eesposit@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220304153729.711387-6-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- Improves documentation of SSH fingerprint checking
- Fixes SHA256 fingerprints with non-blockdev usage
- Blocks the clone3, setns, unshare & execveat syscalls
with seccomp
- Blocks process spawning via clone syscall, but allows
threads, with seccomp
- Takes over seccomp maintainer role
- Expands firmware descriptor spec to allow flash
without NVRAM
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2vOm/bJrYpEtDo4/vobrtBUQT98FAmIOOBkACgkQvobrtBUQ
T9/ruhAAr8jkAH8FN5ftx2/L7q8SHpjPupue1CJ0Nl/ykmYhTGc+SqC3R2nZWOk2
Ws8hHVcDVT1lhrGxPtU7o+JPC1TebJTsloimJoKQY3qfdvZadJeR/4KsOUzi2ruu
VZ6HiYvZc1c9T+NPf3QRhBo7yyascKWKWHDseUNIt/2DiefCox4QFUDDMG86HiQF
KK30xWTvwJdcPxRlbfZbWRoqA0v4OoSDK6Ftp94FQSNBkExO85kstDq3xVaApf8H
DE1QD7gf+dvz11wVuFhrf4d1EH032nU0p0kMxhABc4/kZXo5iWXohhzML3/MUEVT
pe5/9pzUdWpfXQd/2r7x2PyPgySAG7lGbkgltowY52qnRPaNw9ukwkFCFAj8wiD8
FT2ghvkYD3zLfnZ3nuuzJVjf3pXgCc5VcfXaoffT72a7gpI1LTuEqPFwo04imV4l
21fYFx26mYTGCLH1CwVw8MQ2z/dg6uorT/NHdmRA/KrYJ1Elay2K7DV3Z5jOM5MI
0Ll5HkfsUut+1rioUjNgmlQ+96k/G0P0hVUoTUIcgl3U/GDx2+ypcrNTfmEcaCLV
bOhsjtrcg/KAXsCSbvnfDe3bWf0txnscyqoilEzDahLvciWG3d6qlhczLy29LGb4
/w7iqnUcSygXc+a9/ckVo1h5fo0i9qb3W8Pw9klapvz6SGJ83g4=
=PeCY
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/berrange-gitlab/tags/misc-next-pull-request' into staging
This misc series of changes:
- Improves documentation of SSH fingerprint checking
- Fixes SHA256 fingerprints with non-blockdev usage
- Blocks the clone3, setns, unshare & execveat syscalls
with seccomp
- Blocks process spawning via clone syscall, but allows
threads, with seccomp
- Takes over seccomp maintainer role
- Expands firmware descriptor spec to allow flash
without NVRAM
# gpg: Signature made Thu 17 Feb 2022 11:57:13 GMT
# gpg: using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full]
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>" [full]
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange-gitlab/tags/misc-next-pull-request:
docs: expand firmware descriptor to allow flash without NVRAM
MAINTAINERS: take over seccomp from Eduardo Otubo
seccomp: block setns, unshare and execveat syscalls
seccomp: block use of clone3 syscall
seccomp: fix blocking of process spawning
seccomp: add unit test for seccomp filtering
seccomp: allow action to be customized per syscall
block: print the server key type and fingerprint on failure
block: support sha256 fingerprint with pre-blockdev options
block: better document SSH host key fingerprint checking
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When validating the server key fingerprint fails, it is difficult for
the user to know what they got wrong. The fingerprint accepted by QEMU
is received in a different format than OpenSSH displays. There can also
be keys for multiple different ciphers in known_hosts. It may not be
obvious which cipher QEMU will use and whether it will be the same
as OpenSSH. Address this by printing the server key type and its
corresponding fingerprint in the format QEMU accepts.
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
When support for sha256 fingerprint checking was aded in
commit bf783261f0
Author: Daniel P. Berrangé <berrange@redhat.com>
Date: Tue Jun 22 12:51:56 2021 +0100
block/ssh: add support for sha256 host key fingerprints
it was only made to work with -blockdev. Getting it working with
-drive requires some extra custom parsing.
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
v2: add my s-o-b marks to each commit
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmIGYU8ACgkQVh8kwfGf
efslpQ/+OAEgA8z2frIOzoa5PKmyWDaE4xOAsr595EzhSnDeLi1QVIqSSXFap/m6
1rzqglD+5udM9LT9FZBHeMDbWr1UqvODZTAah6/T8E6vVzXm9Jn+CivEj4kT4Dd2
2Mrg3ydjkZsVivs1el4eG7QZ88TvSZkoNAIlT9lVYuT/TaRBKohrms1B4h25wqn9
u78tnbdAdEGtlHUFy9qyOXPt0eGRt6a1CTo+u0wwJLxKjLpq8cht0DPA4/CQhZay
Jm7cudJAJyTFxT7EUxpfROj4KuSMsUDmF/pNE8qwQuSnZC9G9u67ImhWpMFmwEZl
yCAJbOTudWYefgZ3+BtKMH5SBVLA6AsQ60luWkXkqK1dUB4cWxt5lG/UAm1OMJsn
FL6w3WJFkRTIC1OtHqqJnBtDFHWhTxNX9jjcAdb6lpjz4/wKywUVs+l5aSbxswog
DMrE1XT+JIab1IqAnSGw0QTY0H9COiQyOlsJ8EjtFFsqwLACSX4qBowesGfoA7IF
N0ZorAs6QuYYzEaSbRG1vB1lAusSCF5s/UnTg1mDLFWSqJOsqZN7jxXyQbbiFhsN
f+OTlyIwv4t32eGLRrKHF0vaWUoKQ3skB8m9UFMdQbGBO/LakZL9L42ITxarzUMI
CVeGLAzs3KvbZUlUyJWl97F2KkdrornmM/9CB57sjw13K94oNYo=
=+lgn
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/vsementsov/tags/pull-nbd-2022-02-09-v2' into staging
nbd: handle AioContext change correctly
v2: add my s-o-b marks to each commit
# gpg: Signature made Fri 11 Feb 2022 13:14:55 GMT
# gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB
# gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB
* remotes/vsementsov/tags/pull-nbd-2022-02-09-v2:
iotests/281: Let NBD connection yield in iothread
block/nbd: Move s->ioc on AioContext change
iotests/281: Test lingering timers
iotests.py: Add QemuStorageDaemon class
block/nbd: Assert there are no timers when closed
block/nbd: Delete open timer when done
block/nbd: Delete reconnect delay timer when done
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
s->ioc must always be attached to the NBD node's AioContext. If that
context changes, s->ioc must be attached to the new context.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2033626
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Our two timers must not remain armed beyond nbd_clear_bdrvstate(), or
they will access freed data when they fire.
This patch is separate from the patches that actually fix the issue
(HEAD^^ and HEAD^) so that you can run the associated regression iotest
(281) on a configuration that reproducibly exposes the bug.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
We start the open timer to cancel the connection attempt after a while.
Once nbd_do_establish_connection() has returned, the attempt is over,
and we no longer need the timer.
Delete it before returning from nbd_open(), so that it does not persist
for longer. It has no use after nbd_open(), and just like the reconnect
delay timer, it might well be dangerous if it were to fire afterwards.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
We start the reconnect delay timer to cancel the reconnection attempt
after a while. Once nbd_co_do_establish_connection() has returned, this
attempt is over, and we no longer need the timer.
Delete it before returning from nbd_reconnect_attempt(), so that it does
not persist beyond the I/O request that was paused for reconnecting; we
do not want it to fire in a drained section, because all sort of things
can happen in such a section (e.g. the AioContext might be changed, and
we do not want the timer to fire in the wrong context; or the BDS might
even be deleted, and so the timer CB would access already-freed data).
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
For a long time, we assumed that libxml2 is necessary for parallels
block format support (block/parallels*). However, this format actually
does not use libxml [*]. Since this is the only user of libxml2 in
whole QEMU tree, we can drop all libxml2 checks and dependencies too.
It is even more: --enable-parallels configure option was the only
option which was silently ignored when it's (fake) dependency
(libxml2) isn't installed.
Drop all mentions of libxml2.
[*] Actually the basis for libxml use were introduced in commit
ed279a06c5 ("configure: add dependency") but the implementation
was never merged:
https://lore.kernel.org/qemu-devel/70227bbd-a517-70e9-714f-e6e0ec431be9@openvz.org/
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220119090423.149315-1-mjt@msgid.tls.msk.ru>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Updated description and adapted to use lcitool]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220121154134.315047-5-f4bug@amsat.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220204204335.1689602-9-alex.bennee@linaro.org>
librbd had a bug until early 2022 that affected all versions of ceph that
supported fast-diff. This bug results in reporting of incorrect offsets
if the offset parameter to rbd_diff_iterate2 is not object aligned.
This patch works around this bug for pre Quincy versions of librbd.
Fixes: 0347a8fd4c
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-3-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
the assumption that we can't hit a hole if we do not diff against a snapshot was wrong.
We can see a hole in an image if we diff against base if there exists an older snapshot
of the image and we have discarded blocks in the image where the snapshot has data.
Fix this by simply handling a hole like an unallocated area. There are no callbacks
for unallocated areas so just bail out if we hit a hole.
Fixes: 0347a8fd4c
Suggested-by: Ilya Dryomov <idryomov@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-2-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When building on FreeBSD we get:
[816/6851] Compiling C object libblockdev.fa.p/block_export_fuse.c.o
../block/export/fuse.c:628:16: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
if (mode & FALLOC_FL_KEEP_SIZE) {
^
../block/export/fuse.c:651:16: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE'
if (mode & FALLOC_FL_PUNCH_HOLE) {
^
../block/export/fuse.c:652:22: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
if (!(mode & FALLOC_FL_KEEP_SIZE)) {
^
3 errors generated.
FAILED: libblockdev.fa.p/block_export_fuse.c.o
Meson indeed reported FALLOC_FL_PUNCH_HOLE is not available:
C compiler for the host machine: cc (clang 10.0.1 "FreeBSD clang version 10.0.1")
Checking for function "fallocate" : NO
Checking for function "posix_fallocate" : YES
Header <linux/falloc.h> has symbol "FALLOC_FL_PUNCH_HOLE" : NO
Header <linux/falloc.h> has symbol "FALLOC_FL_ZERO_RANGE" : NO
...
Similarly to commit 304332039 ("block/export/fuse.c: fix musl build"),
guard the code requiring FALLOC_FL_KEEP_SIZE / FALLOC_FL_PUNCH_HOLE
definitions under CONFIG_FALLOCATE_PUNCH_HOLE #ifdef'ry.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220201112655.344373-3-f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In order to safely maintain a mixture of #ifdef'ry with if-else-if
ladder, rearrange the last statement (!mode) first. Since it is
mutually exclusive with the other conditions, checking it first
doesn't make any logical difference, but allows to add #ifdef'ry
around in a more cleanly way.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220201112655.344373-2-f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The vhost-user-blk export runs requests asynchronously in their own
coroutine. When the vhost connection goes away and we want to stop the
vhost-user server, we need to wait for these coroutines to stop before
we can unmap the shared memory. Otherwise, they would still access the
unmapped memory and crash.
This introduces a refcount to VuServer which is increased when spawning
a new request coroutine and decreased before the coroutine exits. The
memory is only unmapped when the refcount reaches zero.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220125151435.48792-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After migration, the permissions the guest device wants to impose on its
BlockBackend are stored in blk->perm and blk->shared_perm. In
blk_root_activate(), we take our permissions, but keep all shared
permissions open by calling `blk_set_perm(blk->perm, BLK_PERM_ALL)`.
Only afterwards (immediately or later, depending on the runstate) do we
restrict the shared permissions by calling
`blk_set_perm(blk->perm, blk->shared_perm)`. Unfortunately, our first
call with shared_perm=BLK_PERM_ALL has overwritten blk->shared_perm to
be BLK_PERM_ALL, so this is a no-op and the set of shared permissions is
not restricted.
Fix this bug by saving the set of shared permissions before invoking
blk_set_perm() with BLK_PERM_ALL and restoring it afterwards.
Fixes: 5f7772c4d0
("block-backend: Defer shared_perm tightening migration
completion")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211125135317.186576-2-hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Peng Liang <liangpeng10@huawei.com>
If image doesn't have any compressed cluster we can easily switch to
zlib compression, which may allow to downgrade the image.
That's mostly needed to support IMGOPTS='compression_type=zstd' in some
iotests which do qcow2 downgrade.
While being here also fix checkpatch complain against '#' in printf
formatting.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20211223160144.1097696-13-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
We update the block-status cache whenever we get new information from a
bdrv_co_block_status() call to the block driver. However, if we have
passed want_zero=false to that call, it may flag areas containing zeroes
as data, and so we would update the block-status cache with wrong
information.
Therefore, we should not update the cache with want_zero=false.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: 0bc329fbb0 ("block: block-status cache for data regions")
Reviewed-by: Nir Soffer <nsoffer@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220118170000.49423-2-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.
Second, it's a strange thing: it presents a permission of parent node
to change its child. But generally, children are replaced by different
mechanisms, like jobs or qmp commands, not by nodes.
Graph-mod permission is hard to understand. All other permissions
describe operations which done by parent node on its child: read,
write, resize. Graph modification operations are something completely
different.
The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared
perm) is mirror_start_job, for s->target. Still modern code should use
bdrv_freeze_backing_chain() to protect from graph modification, if we
don't do it somewhere it may be considered as a bug. So, it's a bit
risky to drop GRAPH_MOD, and analyzing of possible loss of protection
is hard. But one day we should do it, let's do it now.
One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The calculation in sector2cluster() is done relative to the offset of
the root directory. Any writes to blocks before the start of the root
directory (in particular, writes to the FAT) result in negative values,
which are not handled correctly in vvfat_write().
This changes sector2cluster() to return a signed value, and makes sure
that vvfat_write() doesn't try to find mappings for negative cluster
number. It clarifies the code in vvfat_write() to make it more obvious
that the cluster numbers can be negative.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211209152231.23756-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size of the qcow size was calculated so that only the FAT partition
would fit on it, but not the whole disk. However, offsets relative to
the whole disk are used to access it, so increase its size to be large
enough for that.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211209151815.23495-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>