Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.
Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.
Always pass the flag through to driver read/write functions. There is
little harm in passing the flag to a driver that does not use it.
Passing the flag to drivers avoids changes across many block drivers.
Filter drivers would need to explicitly support the flag and pass
through to their children when the children support it. That's a lot of
code changes and it's hard to remember to do that everywhere, leading to
silent reduced performance when the flag is accidentally dropped.
The only problematic scenario with the approach in this patch is when a
driver passes the flag through to internal I/O requests that don't use
the same I/O buffer. In that case the hint may be set when it should
actually be clear. This is a rare case though so the risk is low.
Some drivers have assert(!flags), which no longer works when
BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very
useful anyway since the functions are called almost exclusively by
bdrv_driver_preadv/pwritev() so if we get flags handling right there
then the assertion is not needed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20221013185908.1297568-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that the API offers also _locked() functions, take advantage
of it and give also the caller control to take the lock and call
_locked functions.
This makes sense especially when we have for loops, because it
makes no sense to have:
for(job = job_next(); ...)
where each job_next() takes the lock internally.
Instead we want
JOB_LOCK_GUARD();
for(job = job_next_locked(); ...)
In addition, protect also direct field accesses, by either creating a
new critical section or widening the existing ones.
Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20220926093214.506243-12-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Callers of coroutine_fn must be coroutine_fn themselves, or the call
must be within "if (qemu_in_coroutine())". Apply coroutine_fn to
functions where this holds.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220922084924.201610-3-pbonzini@redhat.com>
[kwolf: Fixed up coding style]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Swap 'buf' and 'bytes' around for consistency with
blk_co_{pread,pwrite}(), and in preparation to implement these functions
using generated_co_wrapper.
Callers were updated using this Coccinelle script:
@@ expression blk, offset, buf, bytes, flags; @@
- blk_pread(blk, offset, buf, bytes, flags)
+ blk_pread(blk, offset, bytes, buf, flags)
@@ expression blk, offset, buf, bytes, flags; @@
- blk_pwrite(blk, offset, buf, bytes, flags)
+ blk_pwrite(blk, offset, bytes, buf, flags)
It had no effect on hw/block/nand.c, presumably due to the #if, so that
file was updated manually.
Overly-long lines were then fixed by hand.
Signed-off-by: Alberto Faria <afaria@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220705161527.1054072-4-afaria@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
For consistency with other I/O functions, and in preparation to
implement it using generated_co_wrapper.
Callers were updated using this Coccinelle script:
@@ expression blk, offset, buf, bytes; @@
- blk_pread(blk, offset, buf, bytes)
+ blk_pread(blk, offset, buf, bytes, 0)
It had no effect on hw/block/nand.c, presumably due to the #if, so that
file was updated manually.
Overly-long lines were then fixed by hand.
Signed-off-by: Alberto Faria <afaria@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220705161527.1054072-3-afaria@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
They currently return the value of their 'bytes' parameter on success.
Make them return 0 instead, for consistency with other I/O functions and
in preparation to implement them using generated_co_wrapper. This also
makes it clear that short reads/writes are not possible.
Signed-off-by: Alberto Faria <afaria@redhat.com>
Message-Id: <20220705161527.1054072-2-afaria@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
This function is safe to call in an I/O context, and qcow2_do_open()
does so (invoked in an I/O context by qcow2_co_invalidate_cache()).
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220427114057.36651-2-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Replace the global variables with inlined helper functions. getpagesize() is very
likely annotated with a "const" function attribute (at least with glibc), and thus
optimization should apply even better.
This avoids the need for a constructor initialization too.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-29-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-27-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block coroutines functions run in different aiocontext, and are
not protected by the BQL. Therefore are I/O.
On the other side, generated_co_wrapper functions use BDRV_POLL_WHILE,
meaning the caller can either be the main loop or a specific iothread.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-25-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Following the assertion derived from the API split,
propagate the assertion also in the static functions.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-18-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.
BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.
TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-15-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-14-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-13-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we "covered" the three main cases where the
permission API was being used under BQL (fuse,
amend and invalidate_cache), we can safely assert for
the permission functions implemented in block.c
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-11-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with
IO_OR_GS_CODE.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-6-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-5-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
Some others can only be called by either the main loop or the
iothread running the AioContext (and not other iothreads),
and using them in another thread would cause deadlocks, and therefore
it is not ideal to define them as I/O.
It is not easy to understand which function is part of which
group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it.
The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O function are instead thread safe, and can run in
any AioContext.
"I/O or GS" functions run instead in the main loop or in
a single iothread, and use BDRV_POLL_WHILE().
By splitting the header in two files, block-io.h
and block-global-state.h we have a clearer view on what
needs what kind of protection. block-common.h
contains common structures shared by both headers.
block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.
Assertions are added in the next patch.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split bdrv_co_invalidate cache in two: the Global State (under BQL)
code that takes care of permissions and running GS callbacks,
and leave only the I/O code (->bdrv_co_invalidate_cache) running in
the I/O coroutine.
The only side effect is that bdrv_co_invalidate_cache is not
recursive anymore, and so is every direct call to
bdrv_invalidate_cache().
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220209105452.1694545-6-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Following the bdrv_activate renaming, change also the name
of the respective callers.
bdrv_invalidate_cache_all -> bdrv_activate_all
blk_invalidate_cache -> blk_activate
test_sync_op_invalidate_cache -> test_sync_op_activate
No functional change intended.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220209105452.1694545-5-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function is currently just a wrapper for bdrv_invalidate_cache(),
but in future will contain the code of bdrv_co_invalidate_cache() that
has to always be protected by BQL, and leave the rest in the I/O
coroutine.
Replace all bdrv_invalidate_cache() invokations with bdrv_activate().
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220209105452.1694545-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Graph modifications should be done in drained section. stream_prepare()
handler of block stream job call bdrv_set_backing_hd() without using
drained section and it's theoretically possible that some IO request
will interleave with graph modification and will use outdated pointers
to removed block nodes.
Some other callers use bdrv_set_backing_hd() not caring about drained
sections too. So it seems good to make a drained section exactly in
bdrv_set_backing_hd().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220124173741.2984056-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.
Second, it's a strange thing: it presents a permission of parent node
to change its child. But generally, children are replaced by different
mechanisms, like jobs or qmp commands, not by nodes.
Graph-mod permission is hard to understand. All other permissions
describe operations which done by parent node on its child: read,
write, resize. Graph modification operations are something completely
different.
The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared
perm) is mirror_start_job, for s->target. Still modern code should use
bdrv_freeze_backing_chain() to protect from graph modification, if we
don't do it somewhere it may be considered as a bug. So, it's a bit
risky to drop GRAPH_MOD, and analyzing of possible loss of protection
is hard. But one day we should do it, let's do it now.
One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_backing_overridden is only used in block.c, so there is
no need to leave it in block_int.h
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211215121140.456939-2-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL. When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.
Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.
In that context, it also makes sense for this function to free the
child. Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.
In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially. Make a note of that and assert it.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-10-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-10-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
As of a future commit, bdrv_replace_child_noperm() will clear the
indirect BdrvChild pointer passed to it if the new child BDS is NULL.
bdrv_replace_child_tran() will want to let it do that, but revert this
change in its abort handler. For that, we need to have it receive a
BdrvChild ** pointer, too, and keep it stored in the
BdrvReplaceChildState object that we attach to the transaction.
Note that we do not need to store it in the BdrvReplaceChildState when
new_bs is not NULL, because then there is nothing to revert. This is
important so that bdrv_replace_node_noperm() can pass a pointer to a
loop-local variable to bdrv_replace_child_tran() without worrying that
this pointer will outlive one loop iteration.
(Of course, for that to work, bdrv_replace_node_noperm() and in turn
bdrv_replace_node() and its relatives may not be called with a NULL @to
node. Luckily, they already are not, but now we should assert this.)
bdrv_remove_file_or_backing_child() on the other hand needs to ensure
that the indirect pointer it passes will stay valid for the duration of
the transaction. Ensure this by keeping a strong reference to the BDS
whose &bs->backing or &bs->file it passes to bdrv_replace_child_tran(),
and giving up that reference only in the transaction .clean() handler.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-9-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-9-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
As of a future patch, bdrv_replace_child_tran() will take a BdrvChild **
pointer. Prepare for that by getting such a pointer and using it where
applicable, and (dereferenced) as a parameter for
bdrv_replace_child_tran().
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-7-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-7-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL. That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL. We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.
This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.
(We will still need to undertake some intermediate steps, though.)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-6-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-6-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
bdrv_attach_child_common_abort() restores the parent's AioContext. To
do so, the child (which was supposed to be attached, but is now detached
again by this abort handler) is added to the ignore list for the
AioContext changing functions.
However, since we modify a BDS's children list in the BdrvChildClass's
.attach and .detach handlers, the child is already effectively detached
from the parent by this point. We do not need to put it into the ignore
list.
Use this opportunity to clean up the empty line structure: Keep setting
the ignore list, invoking the AioContext function, and freeing the
ignore list in blocks separated by empty lines.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211111120829.81329-5-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-5-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Now that bdrv_remove_empty_child() no longer removes the child from the
parent's children list but only checks that it is not in such a list, it
is only a wrapper around bdrv_child_free() that checks that the child is
empty and unused. That should apply to all children that we free, so
put those checks into bdrv_child_free() and drop
bdrv_remove_empty_child().
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211111120829.81329-4-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-4-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
The children list is specific to BDS parents. We should not modify it
in the general children modification code, but let BDS parents deal with
it in their .attach() and .detach() methods.
This also has the advantage that a BdrvChild is removed from the
children list before its .bs pointer can become NULL. BDS parents
generally assume that their children's .bs pointer is never NULL, so
this is actually a bug fix.
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211111120829.81329-3-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-3-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Using blockdev-snapshot to append a node as an overlay to itself, or to
any of its parents, causes crashes. Catch the condition and return an
error for these cases instead.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1824363
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211018134714.48438-1-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use bdrv_new_open_driver_opts() instead of complicated bdrv_open().
Among other extra things bdrv_open() also check for white-listed
formats, which we don't want for internal node creation: currently
backup doesn't work when copy-before-write filter is not white-listed.
As well block-stream doesn't work when copy-on-read is not
white-listed.
Fixes: 751cec7a26
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2004812
Reported-by: Yanan Fu
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210920115538.264372-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- options & flags is common pair for open-like functions, let's use it
- add a comment that specifies use of @options
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210920115538.264372-4-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- use ERRP_GUARD(): function calls error_prepend(), so it must use
ERRP_GUARD(), otherwise error_prepend() would not be called when
passed errp is error_fatal
- drop error propagation, handle return code instead
- for symmetry, do error_prepend() for the second failure
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210920115538.264372-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add version of bdrv_new_open_driver() that supports QDict options.
We'll use it in further commit.
Simply add one more argument to bdrv_new_open_driver() is worse, as
there are too many invocations of bdrv_new_open_driver() to update
then.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210920115538.264372-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We must not inactivate child when parent has write permissions on
it.
Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
handler and it is used to flush caches, not for permission
manipulations.
So, let's simply check cumulative parent permissions before
inactivating the node.
This commit fixes a crash when we do migration during backup: prior to
the commit nothing prevents all nodes inactivation at migration finish
and following backup write to the target crashes on assertion
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().
After the commit, we rely on the fact that copy-before-write filter
keeps write permission on target node to be able to write to it. So
inactivation fails and migration fails as expected.
Corresponding test now passes, so, enable it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform. The
main difference is that this time it is implemented as part of the
general block layer code.
The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the
implementation is outside of qemu, there is little we can do about its
performance.
We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts. So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.
To address this, we want to cache data regions. Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.
(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine. While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)
We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Add function to transactionally replace bs inside BdrvChild.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210824083856.17408-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Currently, the block driver whitelists are only applied for the system
emulator. All other binaries still give unrestricted access to all block
drivers. There are use cases where this made sense because the main
concern was avoiding customers running VMs on less optimised block
drivers and getting bad performance. Allowing the same image format e.g.
as a target for 'qemu-img convert' is not a problem then.
However, if the concern is the supportability of the driver in general,
either in full or when used read-write, not applying the list driver
whitelist in tools doesn't help - especially since qemu-nbd and
qemu-storage-daemon now give access to more or less the same operations
in block drivers as running a system emulator.
In order to address this, introduce a new configure option that enforces
the driver whitelist in all binaries.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210709164141.254097-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As the BlockReopenQueue can contain nodes in multiple AioContexts, only
one of which may be locked when AIO_WAIT_WHILE() can be called, we can't
let the caller lock the right contexts. Instead, individually lock the
AioContext of a single node when iterating the queue.
Reintroduce bdrv_reopen() as a wrapper for reopening a single node that
drains the node and temporarily drops the AioContext lock for
bdrv_reopen_multiple().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the code to free a BlockReopenQueue to a separate function.
It will be used in a subsequent patch.
[ kwolf: Also free explicit_options and options, and explicitly
qobject_ref() the value when it continues to be used. This makes
future memory leaks less likely. ]
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Back in commit d9f059aa6c (qemu-img: Deprecate use of -b without -F),
we deprecated the ability to create a file with a backing image that
requires qemu to perform format probing. Qemu can still probe older
files for backwards compatibility, but it is time to finish off the
ability to create such images, due to the potential security risk they
present. Update a couple of iotests affected by the change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210503213600.569128-3-eblake@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When the x-blockdev-reopen was added it allowed reconfiguring the
graph by replacing backing files, but changing the 'file' option was
forbidden. Because of this restriction some operations are not
possible, notably inserting and removing block filters.
This patch adds support for replacing the 'file' option. This is
similar to replacing the backing file and the user is likewise
responsible for the correctness of the resulting graph, otherwise this
can lead to data corruption.
Signed-off-by: Alberto Garcia <berto@igalia.com>
[vsementsov: bdrv_reopen_parse_file_or_backing() is modified a lot]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-9-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's used only in bdrv_reopen_commit(). "backing" is covered by the
loop through all children except for case when we removed backing child
during reopen.
Make it more obvious and drop extra boolean field: qdict_del will not
fail if there is no such entry.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-8-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move supports_backing check of bdrv_reopen_parse_backing to called
(through bdrv_set_backing_noperm()) bdrv_set_file_or_backing_noperm()
function. The check applies to general case, so it's appropriate for
bdrv_set_file_or_backing_noperm().
We have to declare backing support for two test drivers, otherwise new
check fails.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-7-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The logic around finding overlay here is not obvious. Actually it does
two simple things:
1. If new bs is already in backing chain, split from parent bs by
several implicit filters we are done, do nothing.
2. Otherwise, don't try to replace implicit filter.
Let's rewrite this in more obvious way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-6-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_backing_noperm() takes care of it (actual check is in
bdrv_set_file_or_backing_noperm()), so we don't need to check it here.
While being here, improve error message a bit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't need this check: bdrv_set_backing_noperm() will do it anyway
(actually in bdrv_attach_child_common()).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-4-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used for reopen in future commit.
Notes:
- It seems OK to update inherits_from if new bs is recursively inherits
from parent bs. Let's just not check for backing_chain_contains, to
support file child of non-filters.
- Simply check child->frozen instead of
bdrv_is_backing_chain_frozen(), as we really interested only in this
one child.
- Role determination of new child is a bit more complex: it remains
the same for backing child, it's obvious for filter driver. But for
non-filter file child let's for now restrict to only replacing
existing child (and keeping its role).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used for reopen in future commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610120537.196183-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210610112618.127378-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have bdrv_replace_child() wrapper on bdrv_replace_child_noperm().
But bdrv_replace_child() doesn't update permissions. It's rather
strange, as normally it's expected that foo() should call foo_noperm()
and update permissions.
Let's rename and add comment.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210610112618.127378-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When creating an image file with a backing file, we generally try to
open the backing file (unless -u was specified), mostly to verify that
it is there, but also to get the file size if none was specified for the
new image.
For neither of these things do we need data I/O, and so we can pass
BDRV_O_NO_IO when opening the backing file. This allows us to open even
encrypted backing images without requiring the user to provide a secret.
This makes the -u switch in iotests 189 and 198 unnecessary (and the
$size parameter), so drop it, because this way we get regression tests
for this patch here.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/441
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210622140030.212487-1-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 3108a15cf (block: introduce bdrv_drop_filter()) introduced
uninitialized variable to_cow_parent in bdrv_replace_node_common
function that is used only when detach_subchain is true. It is used in
two places. First if block properly initialize the variable and second
block use it.
However, compiler may treat these two blocks as two independent cases so
it thinks first block can fail test and second one pass (although both
use same condition). This cause warning that variable can be
uninitialized in second block.
The warning was observed with GCC 8.4.1 and 11.0.1.
To prevent this warning, initialize the variable with NULL.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Message-Id: <1162368493.17178530.1620201543649.JavaMail.zimbra@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some BSD platforms do not have this header.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-3-j@getutm.app>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now permissions are updated as follows:
1. do graph modifications ignoring permissions
2. do permission update
(of course, we rollback [1] if [2] fails)
So, on stage [2] we can't say which users are "old" and which are
"new" and exist only since [1]. And current error message is a bit
outdated. Let's improve it, to make everything clean.
While being here, add also a comment and some good assertions.
iotests 283, 307, qsd-jobs outputs are updated.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210601075218.79249-7-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All child classes have this callback. So, drop unreachable code.
Still add an assertion to bdrv_attach_child_common(), to early detect
bad classes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210601075218.79249-6-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have different types of parents: block nodes, block backends and
jobs. So, it makes sense to specify type together with name.
Next, this handler us used to compose an error message about permission
conflict. And permission conflict occurs in a specific place of block
graph. We shouldn't report name of parent device (as it refers another
place in block graph), but exactly and only the name of the node. So,
use bdrv_get_node_name() directly.
iotest 283 output is updated.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210601075218.79249-4-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The logic around **child is not obvious: this reference is used not
only to return resulting child, but also to rollback NULL value on
transaction abort.
So, let's add documentation and some assertions.
While being here, drop extra declaration of bdrv_attach_child_noperm().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210601075218.79249-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This variable is just a cache for !(bs->open_flags & BDRV_O_RDWR),
which we have to synchronize everywhere. Let's just drop it and
consistently use bdrv_is_read_only().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210527154056.70294-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's better to use accessor function instead of bs->read_only directly.
In some places use bdrv_is_writable() instead of
checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set.
In bdrv_open_common() it's a bit strange to add one more variable, but
we are going to drop bs->read_only in the next patch, so new ro local
variable substitutes it here.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Like other error paths, this one needs to call tran_finalize() and clean
up the BlockReopenQueue, too.
Fixes: CID 1452772
Fixes: 72373e40fb
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210503110555.24001-3-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The error path needs to call tran_finalize(), too.
Fixes: CID 1452773
Fixes: 548a74c0db
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210503110555.24001-2-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They are unused now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Now, bdrv_node_check_perm() is called only with fresh cumulative
permissions, so its actually "refresh_perm".
Move permission calculation to the function. Also, drop unreachable
error message and rewrite the remaining one to be more generic (as now
we don't know which node is added and which was already here).
Add also Virtuozzo copyright, as big work is done at this point.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-37-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't have bdrv_replace_child(), so it's time for
bdrv_replace_child_safe() to take its place.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-36-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Old interfaces dropped, nobody directly calls
bdrv_child_set_perm_abort() and bdrv_child_set_perm_commit(), so we can
use personal state structure for the action and stop exploiting
BdrvChild structure. Also, drop "_safe" suffix which is redundant now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-35-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_replace_child() has only one caller, the second argument is
unused. Inline it now. This triggers deletion of some more unused
interfaces.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-34-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_check_perm_common() has only one caller, so no more sense in
"common".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-33-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-32-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move bdrv_reopen_multiple to new paradigm of permission update:
first update graph relations, then do refresh the permissions.
We have to modify reopen process in file-posix driver: with new scheme
we don't have prepared permissions in raw_reopen_prepare(), so we
should reconfigure fd in raw_check_perm(). Still this seems more native
and simple anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-31-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
During reopen we may add backing bs from other aio context, which may
lead to changing original context of top bs.
We are going to move graph modification to prepare stage. So, it will
be possible that bdrv_flush() in bdrv_reopen_prepare called on bs in
non-original aio context, which we didn't aquire which leads to crash.
To avoid this problem move bdrv_flush() to be a separate reopen stage
before bdrv_reopen_prepare().
This doesn't seem correct to acquire only one aio context and not all
contexts participating in reopen. But it's not obvious how to do it
correctly, keeping in mind:
1. rules of bdrv_set_aio_context_ignore() that requires new_context
lock not being held
2. possible deadlocks because of holding all (or several?) AioContext
locks
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-30-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split out no-perm part of bdrv_set_backing_hd() as a separate
transaction action. Note the in case of existing BdrvChild we reuse it,
not recreate, just to do less actions.
We don't need to create extra reference to backing_hd as we don't lose
it in bdrv_attach_child().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-29-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used in further commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used in the further commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-27-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This argument is always NULL. Drop it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-26-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Using bdrv_replace_node() for removing filter is not good enough: it
keeps child reference of the filter, which may conflict with original
top node during permission update.
Instead let's create new interface, which will do all graph
modifications first and then update permissions.
Let's modify bdrv_replace_node_common(), allowing it additionally drop
backing chain child link pointing to new node. This is quite
appropriate for bdrv_drop_intermediate() and makes possible to add
new bdrv_drop_filter() as a simple wrapper.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-24-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-23-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_append is not very good for inserting filters: it does extra
permission update as part of bdrv_set_backing_hd(). During this update
filter may conflict with other parents of top_bs.
Instead, let's first do all graph modifications and after it update
permissions.
append-greedy-filter test-case in test-bdrv-graph-mod is now works, so
move it out of debug option.
Note: bdrv_append() is still only works for backing-child based
filters. It's something to improve later.
Note2: we use the fact that bdrv_append() is used to append new nodes,
without backing child, so we don't need frozen check and inherits_from
logic from bdrv_set_backing_hd().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-22-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split part of bdrv_replace_node_common() to be used separately.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-21-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split no-perm part of bdrv_attach_child as separate transaction action.
It will be used in later commits.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-20-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split out no-perm part of bdrv_root_attach_child() into separate
transaction action. bdrv_root_attach_child() now moves to new
permission update paradigm: first update graph relations then update
permissions.
qsd-jobs test output updated. Seems now permission update goes in
another order. Still, the test comment say that we only want to check
that command doesn't crash, and it's still so.
Error message is a bit misleading as it looks like job was added first.
But actually in new paradigm of graph update we can't distinguish such
things. We should update the error message, but let's not do it now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210428151804.439460-19-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
inore_children thing doesn't help to track all propagated permissions
of children we want to ignore. The simplest way to correctly update
permissions is update graph first and then do permission update. In
this case we just referesh permissions for the whole subgraph (in
topological-sort defined order) and everything is correctly calculated
automatically without any ignore_children.
So, refactor bdrv_replace_node_common to first do graph update and then
refresh the permissions.
Test test_parallel_exclusive_write() now pass, so move it out of
debugging "if".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-18-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To be used in the following commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-17-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add new interface, allowing use of existing node list. It will be used
to fix bdrv_replace_node() in the further commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-16-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Refactor calling driver callbacks to a separate transaction action to
be used later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-15-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
to update nodes in topological sort order instead of simple DFS. With
topologically sorted nodes, we update a node only when all its parents
already updated. With DFS it's not so.
Consider the following example:
A -+
| |
| v
| B
| |
v |
C<-+
A is parent for B and C, B is parent for C.
Obviously, to update permissions, we should go in order A B C, so, when
we update C, all parent permissions already updated. But with current
approach (simple recursion) we can update in sequence A C B C (C is
updated twice). On first update of C, we consider old B permissions, so
doing wrong thing. If it succeed, all is OK, on second C update we will
finish with correct graph. But if the wrong thing failed, we break the
whole process for no reason (it's possible that updated B permission
will be less strict, but we will never check it).
Also new approach gives a way to simultaneously and correctly update
several nodes, we just need to run bdrv_topological_dfs() several times
to add all nodes and their subtrees into one topologically sorted list
(next patch will update bdrv_replace_node() in this manner).
Test test_parallel_perm_update() is now passing, so move it out of
debugging "if".
We also need to support ignore_children in
bdrv_parent_perms_conflict()
For test 283 order of conflicting parents check is changed.
Note also that in bdrv_check_perm() we don't check for parents conflict
at root bs, as we may be in the middle of permission update in
bdrv_reopen_multiple(). bdrv_reopen_multiple() will be updated soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-14-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Each of them has only one caller. Open-coding simplifies further
pemission-update system changes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-13-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are going to drop recursive bdrv_child_* functions, so stop use them
in bdrv_child_try_set_perm() as a first step.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-12-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split out non-recursive parts, and refactor as block graph transaction
action.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-11-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add additional check that node parents do not interfere with each
other. This should not hurt existing callers and allows in further
patch use bdrv_refresh_perms() to update a subtree of changed
BdrvChild (check that change is correct).
New check will substitute bdrv_check_update_perm() in following
permissions refactoring, so keep error messages the same to avoid
unit test result changes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-10-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These functions are called only from bdrv_reopen_multiple() in block.c.
No reason to publish them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-8-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Passing parent aio context is redundant, as child_class and parent
opaque pointer are enough to retrieve it. Drop the argument and use new
bdrv_child_get_parent_aio_context() interface.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-7-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add new handler to get aio context and implement it in all child
classes. Add corresponding public interface to be used soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-6-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have too much comments for this feature. It seems better just don't
do it. Most of real users (tests don't count) have to create additional
reference.
Drop also comment in external_snapshot_prepare:
- bdrv_append doesn't "remove" old bs in common sense, it sounds
strange
- the fact that bdrv_append can fail is obvious from the context
- the fact that we must rollback all changes in transaction abort is
known (it's the direct role of abort)
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QemuOpts is usually created merging the QemuOptsList of format
and protocol. So, when the format calls bdr_create_file(), the 'opts'
parameter contains a QemuOptsList with a combination of format and
protocol default values.
The format properly removes its options before calling
bdr_create_file(), but the default values remain in 'opts->list'.
So if the protocol has options with the same name (e.g. rbd has
'cluster_size' as qcow2), it will see the default values of the format,
since for overlapping options, the format wins.
To avoid this issue, lets convert QemuOpts to QDict, in this way we take
only the set options, and then convert it back to QemuOpts, using the
'create_opts' of the protocol. So the new QemuOpts, will contain only the
protocol defaults.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210308161232.248833-1-sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- Add Vladimir as NBD co-maintainer
- Fix reporting of holes in NBD_CMD_BLOCK_STATUS
- Improve command-line parsing accuracy of large numbers (anything going
through qemu_strtosz), including the deprecation of hex+suffix
- Improve some error reporting in the block layer
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmBHlmIACgkQp6FrSiUn
Q2q2cQgAqJWNb4J/ShjvzocDDPzJ0iBitFbg0huFPfbt4DScubEZo5wBJG7vOhOW
hIHrWCRzGvRgsn0tcSfrgFaegmHKrLgjkibM7ou8ni9NC1kUBd3R/3FBNIMxhYf7
Q8Kfspl0LRfMJDKF9jdCnQ4Gxcd6h2OIYZqiWVg8V4Tc8WdCpIVOah7e7wjuW8bT
vgZvfboUWm5AmIF9j/MxuMn+HFZ4ArSuFVL80ZaXlD00vRra7u3HZ8pUfcOlOujg
7HeouM1E5j3NNE6aZSN++x/EQ3sg0zmirbWUCcgAyRfdRkAmB15uh2PUzPxEIJKH
UHUIW5LvNtz2+yzOAz2yK29OE523Yg==
=blE1
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-03-09' into staging
nbd patches for 2021-03-09
- Add Vladimir as NBD co-maintainer
- Fix reporting of holes in NBD_CMD_BLOCK_STATUS
- Improve command-line parsing accuracy of large numbers (anything going
through qemu_strtosz), including the deprecation of hex+suffix
- Improve some error reporting in the block layer
# gpg: Signature made Tue 09 Mar 2021 15:38:10 GMT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-03-09:
block/qcow2: refactor qcow2_update_options_prepare error paths
block/qed: bdrv_qed_do_open: deal with errp
block/qcow2: simplify qcow2_co_invalidate_cache()
block/qcow2: read_cache_sizes: return status value
block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
block/qcow2: qcow2_get_specific_info(): drop error propagation
blockjob: return status from block_job_set_speed()
block/mirror: drop extra error propagation in commit_active_start()
block: drop extra error propagation for bdrv_set_backing_hd
blockdev: fix drive_backup_prepare() missed error
block: check return value of bdrv_open_child and drop error propagation
utils: Deprecate hex-with-suffix sizes
utils: Improve qemu_strtosz() to have 64 bits of precision
utils: Enhance testsuite for do_strtosz()
nbd: server: Report holes for raw images
MAINTAINERS: add Vladimir as co-maintainer of NBD
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
bdrv_set_backing_hd now returns status, let's use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Some error messages contain ambiguous representations of the 'node-name'
parameter. This can be particularly confusing when exchanging QMP
messages (C = client, S = server):
C: {"execute": "block_resize", "arguments": { "device": "my_file", "size": 26843545600 }}
S: {"error": {"class": "GenericError", "desc": "Cannot find device=my_file nor node_name="}}
^^^^^^^^^
This error message suggests one could send a message with a key called
'node_name':
C: {"execute": "block_resize", "arguments": { "node_name": "my_file", "size": 26843545600 }}
^^^^^^^^^
but using the underscore is actually incorrect, the parameter should be
'node-name':
S: {"error": {"class": "GenericError", "desc": "Parameter 'node_name' is unexpected"}}
This behavior was uncovered in bz1651437, but I ended up going down a
rabbit hole looking for other areas where this miscommunication might
occur and changing those accordingly as well.
Fixes: https://bugzilla.redhat.com/1651437
Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Message-Id: <20210305151929.1947331-2-ckuehl@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function wraps bdrv_co_delete_file for the common case of removing a file,
which was just created by format driver, on an error condition.
It hides the -ENOTSUPP error, and reports all other errors otherwise.
Use it in luks driver
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now bdrv_append returns status and we can drop all the local_err things
around it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The recommended use of qemu error api assumes returning status together
with setting errp and avoid void functions with errp parameter. Let's
improve bdrv_append and some friends to reduce error-propagation
overhead in further patches.
Choose int return status, because bdrv_replace_node_common() has call
to bdrv_check_update_perm(), which reports int status, which seems
correct to propagate.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210202124956.63146-2-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
bdrv_drain_all_begin().
Export drivers may have coroutines yielding at some point in the block
layer, so we need to shut them down before draining the block layer,
as otherwise they may get stuck blk_wait_while_drained().
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210201125032.44713-3-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some graphs may contain an indirect reference to the first BDS in the
chain that can be reached while walking it bottom->up from one its
children.
Doubling-processing of a BDS is especially problematic for the
aio_notifiers, as they might attempt to work on both the old and the
new AIO contexts.
To avoid this problem, add every child and parent to the ignore list
before actually processing them.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210201125032.44713-2-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Provide API for insertion a node to backing chain.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
vEQDZ+PxubDg
=vGLy
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging
QAPI patches patches for 2020-12-19
# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg: issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
qobject: Make QString immutable
block: Use GString instead of QString to build filenames
keyval: Use GString to accumulate value strings
json: Use GString instead of QString to accumulate strings
migration: Replace migration's JSON writer by the general one
qobject: Factor JSON writer out of qobject_to_json()
qobject: Factor quoted_str() out of to_json()
qobject: Drop qstring_get_try_str()
qobject: Drop qobject_get_try_str()
Revert "qobject: let object_property_get_str() use new API"
block: Avoid qobject_get_try_str()
qmp: Fix tracing of non-string command IDs
qobject: Move internals to qobject-internal.h
hw/rdma: Replace QList by GQueue
Revert "qstring: add qstring_free()"
qobject: Change qobject_to_json()'s value to GString
qobject: Use GString instead of QString to accumulate JSON
qobject: Make qobject_to_json_pretty() take a pretty argument
monitor: Use GString instead of QString for output buffer
hmp: Simplify how qmp_human_monitor_command() gets output
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QString supports modifying its string, but it's quite limited: you can
only append. Just one caller remains:
bdrv_parse_filename_strip_prefix() uses it just for building an
initial string.
Change it to do build the initial string with GString. This is
another step towards making QString immutable.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-20-armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
I'm about to remove qobject_get_try_str(). Use qstring_get_str()
instead. Safe because the argument is known to be a QString here.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-11-armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString. Just one of the callers actually needs a
QString: qemu_rbd_parse_filename(). A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds(). The remainder just need a string.
Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.
qemu_rbd_parse_filename() now has to convert to QString. All others
save a QString temporary. to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
Do generic processing even for drivers which define .bdrv_check_perm
handler. It's needed for further preallocate filter: it will need to do
additional action on bdrv_check_perm, but don't want to reimplement
generic logic.
The patch doesn't change existing behaviour: the only driver that
implements bdrv_check_perm is file-posix, but it never has any
children.
Also, bdrv_set_perm() don't stop processing if driver has
.bdrv_set_perm handler as well.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201021145859.11201-8-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The only users of this thing are:
1. bdrv_child_try_set_perm, to ignore failures on loosen restrictions
2. assertion in bdrv_replace_child
3. assertion in bdrv_inactivate_recurse
Assertions are not enough reason for overcomplication the permission
update system. So, look at bdrv_child_try_set_perm.
We are interested in tighten_restrictions only on failure. But on
failure this field is not reliable: we may fail in the middle of
permission update, some nodes are not touched and we don't know should
their permissions be tighten or not. So, we rely on the fact that if we
loose restrictions on some node (or BdrvChild), we'll not tighten
restriction in the whole subtree as part of this update (assertions 2
and 3 rely on this fact as well). And, if we rely on this fact anyway,
we can just check it on top, and don't pass additional pointer through
the whole recursive infrastructure.
Note also, that further patches will fix real bugs in permission update
system, so now is good time to simplify it, as a help for further
refactorings.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-8-vsementsov@virtuozzo.com>
[mreitz: Fixed rebase conflict]
Signed-off-by: Max Reitz <mreitz@redhat.com>
We must set the permission used for _check_. Assert that we have
backup and drop extra arguments.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-7-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We should never set permissions other than cumulative permissions of
parents. During bdrv_reopen_multiple() we _check_ for synthetic
permissions but when we do _set_ the graph is already updated.
Add an assertion to bdrv_reopen_multiple(), other cases are more
obvious.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-6-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make separate function for common pattern.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-5-vsementsov@virtuozzo.com>
[mreitz: Squashed in
https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00299.html]
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.
It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.
So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.
And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.
Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.
Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.
So, alignment can't be greater than 2^30.
Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.
OK, then, let's apply it to block/io.
Let's consider all block/io entry points of offset/bytes:
4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().
We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.
And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:
file-posix.c: adopted by this patch
write-threshold.c: only read fields
test-write-threshold.c: sets obviously small constant values
Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.
After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.
For all io requests except for discard we keep for now old restriction
of 32bit request length.
iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file. That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.
Right now, we only implement the necessary block export functions to set
it up and shut it down. We do not implement any access functions, so
accessing the mount point only results in errors. This will be
addressed by a followup patch.
We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice. This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle. Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, permission update loop tries to do iterations transactionally,
but the whole update is not transactional: nobody roll-back successful
loop iterations when some iteration fails.
Second, in the iteration we have nested permission update:
c->klass->update_filename may point to bdrv_child_cb_update_filename()
which calls bdrv_backing_update_filename(), which may do node reopen to
RW.
Permission update system is not prepared to nested updates, at least it
has intermediate permission-update state stored in BdrvChild
structures: has_backup_perm, backup_perm and backup_shared_perm.
So, let's first do bdrv_replace_node_common() (which is more
transactional than open-coded update in bdrv_drop_intermediate()) and
then call update_filename() in separate. We still do not rollback
changes in case of update_filename() failure but it's not much worse
than pre-patch behavior.
Note that bdrv_replace_node_common() does check for frozen children,
so corresponding check is dropped in bdrv_drop_intermediate().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add new parameter to bdrv_replace_node(): auto_skip. With
auto_skip=false we'll have stricter behavior: update _all_ from
parents or fail. New behaviour will be used in the following commit in
block.c, so keep original function name as public interface.
Note: new error message is a bit funny in contrast with further
"Cannot" in case of frozen child, but we'd better keep some difference
to make it possible to distinguish one from another on failure. Still,
actually we'd better refactor should_update_child() call to distinguish
also different kinds of "should not". Let's do it later.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201106124241.16950-2-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Back in 2015, we attempted to fix error reporting for images that
claimed to have more than INT64_MAX/512 sectors, but due to the type
promotions caused by BDRV_SECTOR_SIZE being unsigned, this
inadvertently forces all negative ret values to be slammed into -EFBIG
rather than the original error. While we're at it, we can avoid the
confusing ?: by spelling the logic more directly.
Fixes: 4a9c9ea0d3
Reported-by: Guoyi Tu <tu.guoyi@h3c.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201105155122.60943-1-eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
block.c has a useful macro QAPI_LIST_ADD() for inserting at the front
of any QAPI-generated list; move it from block.c to qapi/util.h so
more places can use it, including one earlier place in block.c, and
rename it to something more obvious (since we also have a lot of
places that append, rather than prepend, to a list).
There are many more places in the codebase that can benefit from using
the macro, but converting them will be left to later patches.
In theory, all QAPI list types are child classes of GenericList; but
in practice, that relationship is not explicitly spelled out in the C
type declarations (rather, it is something that happens implicitly due
to C compatible layouts), and the macro does not actually depend on
the GenericList type. We considered moving GenericList from visitor.h
into util.h to group related code; however, such a move would be
awkward if we do not also move GenericAlternate. Unfortunately,
moving GenericAlternate would introduce its own problems of
declaration circularity (qapi-builtin-types.h needs a complete
definition of QEnumLookup from util.h, but GenericAlternate needs a
complete definition of QType from qapi-builtin-types.h).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201027050556.269064-3-eblake@redhat.com>
[eblake: s/ADD/PREPEND/ per suggestion by Markus]
There is no need to rely on the verbosity of the gcc/clang compiler
extension of g_new(typeof(X), 1) when we can instead use the standard
g_malloc(sizeof(X)). In general, we like g_new over g_malloc for
returning type X rather than void* to let the compiler catch more
potential typing mistakes, but in this particular macro, our other use
of typeof on the same line already ensures we are getting correct
results.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.
This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.
It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.
Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.
BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Inside of coroutine context, we can't directly use aio_context_acquire()
for the AioContext of a block node because we already own the lock of
the current AioContext and we need to avoid double locking to prevent
deadlocks.
This provides helper functions to lock the AioContext of a node only if
it's not the same as the current AioContext.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a pair of functions to temporarily move the current coroutine to the
AioContext of a given BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use code generation implemented in previous commit to generated
coroutine wrappers in block.c and block/io.c
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
We are going to keep coroutine-wrappers code (structure-packing
parameters, BDRV_POLL wrapper functions) in separate auto-generated
files. So, we'll need a header with declaration of original _co_
functions, for those which are static now. As well, we'll need
declarations for wrapper functions. Do these declarations now, as a
preparation step.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>
This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.
Also, bdrv_invalidate_cache is a void function, returning error only
through **errp parameter, which is considered to be bad practice, as
it forces callers to define and propagate local_err variable, so
conversion is good anyway.
This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-2-vsementsov@virtuozzo.com>
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.
As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
I found that there are many spelling errors in the comments of qemu,
so I used the spellcheck tool to check the spelling errors
and finally found some spelling errors in the folder.
Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: Alex Bennee <alex.bennee@linaro.org>
Message-Id: <20200917075029.313-2-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.
Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).
Before this patch:
$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
After this patch:
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file. If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
Note that this changes the QAPI output for a node's backing_file. We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image). This
inconsistent output was effectively useless, so we have to decide one
way or the other. Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on. If you want to receive the image
header information, you have to refer to full-backing-filename.
This necessitates a change to iotest 228. The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file. Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.
Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well. This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.
iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.
273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot. However, the image header
never says "file" anywhere, so it now reports $IMGFMT.
Signed-off-by: Max Reitz <mreitz@redhat.com>
There are two practical problems with bdrv_get_allocated_file_size()'s
default right now:
(1) For drivers with children, we should generally sum all their sizes
instead of just passing the request through to bs->file. The latter
is good for filters, but not so much for format drivers.
(2) Filters need not have bs->file, so we should actually go to the
filtered child instead of hard-coding bs->file.
Fix this by splitting the default implementation into three branches:
(1) For filter drivers: Return the size of the filtered child
(2) For protocol drivers: Return -ENOTSUP, because the default
implementation cannot make a guess
(3) For other drivers: Sum all data-bearing children's sizes
Signed-off-by: Max Reitz <mreitz@redhat.com>
When looking for a blkdebug node (which implements debug breakpoints),
use bdrv_primary_bs() to iterate through the graph, because that is
where a blkdebug node would be.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_refresh_filename() and the kind of related bdrv_dirname() should
look to the primary child when they wish to copy the underlying file's
filename.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reopening a node's backing child needs a bit of special handling because
the "backing" child has different defaults than all other children
(among other things). Adding filter support here is a bit more
difficult than just using the child access functions. In fact, we often
have to directly use bs->backing because these functions are about the
"backing" child (which may or may not be the COW backing file).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Use child access functions when iterating through backing chains so
filters do not break the chain.
In addition, bdrv_find_overlay() will now always return the actual
overlay; that is, it will never return a filter node but only one with a
COW backing file (there may be filter nodes between that node and @bs).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Places that use patterns like
if (bs->drv->is_filter && bs->file) {
... something about bs->file->bs ...
}
should be
BlockDriverState *filtered = bdrv_filter_bs(bs);
if (filtered) {
... something about @filtered ...
}
instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Filters cannot compress data themselves but they have to implement
.bdrv_co_pwritev_compressed() still (or they cannot forward compressed
writes). Therefore, checking whether
bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
know whether the node can actually handle compressed writes. This
function looks down the filter chain to see whether there is a
non-filter that can actually convert the compressed writes into
compressed data (and thus normal writes).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not. It has not
been used for that purpose for quite some time.
Actually, it is not even fit for that purpose, because to answer that
question, it would have recursively query all of the given node's
children.
So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?
Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not. For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
In order to make filters work in backing chains, the associated
functions must be able to deal with them and freeze both COW and filter
child links.
While at it, add some comments that note which functions require their
caller to ensure that a given child link is not frozen, and how the
callers do so.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_backing_hd() is a function that explicitly cares about the
bs->backing child. Highlight that in its description and use
child_bs(bs->backing) instead of backing_bs(bs) to make it more obvious.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_has_zero_init() should use bdrv_cow_child() if it wants to check
whether the given BDS has a COW backing file.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
There are BDS children that the general block layer code can access,
namely bs->file and bs->backing. Since the introduction of filters and
external data files, their meaning is not quite clear. bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.
This overloading really is not helpful. This patch adds functions that
retrieve the correct child for each exact purpose. Later patches in
this series will make use of them. Doing so will allow us to handle
filter nodes in a meaningful way.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Providing an empty string for the backing file parameter like so:
qemu-img create -f qcow2 -b '' /tmp/foo
allows the flow of control to reach and subsequently fail an assert
statement because passing an empty string to
bdrv_get_full_backing_filename_from_filename()
simply results in NULL being returned without an error being raised.
To fix this, let's check for an empty string when getting the value from
the opts list.
Reported-by: Attila Fazekas <afazekas@redhat.com>
Fixes: https://bugzilla.redhat.com/1809553
Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Message-Id: <20200813134722.802180-1-ckuehl@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the create-config logic to meson.build; create a
configuration_data object and let meson handle the
quoting and output.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Unaligned requests will automatically be aligned to bl.request_alignment
and we can't extend write requests to access space beyond the end of the
image without resizing the image, so if we have the WRITE permission,
but not the RESIZE one, it's required that the image size is aligned.
Failing to meet this requirement could cause assertion failures like
this if RESIZE permissions weren't requested:
qemu-img: block/io.c:1910: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.
This was e.g. triggered by qemu-img converting to a target image with 4k
request alignment when the image was only aligned to 512 bytes, but not
to 4k.
Turn this into a graceful error in bdrv_check_perm() so that WRITE
without RESIZE can only be taken if the image size is aligned. If a user
holds both permissions and drops only RESIZE, the function will return
an error, but bdrv_child_try_set_perm() will ignore the failure silently
if permissions are only requested to be relaxed and just keep both
permissions while returning success.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200716142601.111237-2-kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating an image that requires format probing of the backing image is
potentially unsafe (we've had several CVEs over the years based on
probes leaking information to the guest on a subsequent boot, although
these days tools like libvirt are aware of the issue enough to prevent
the worst effects). For example, if our probing algorithm ever
changes, or if other tools like libvirt determine a different probe
result than we do, then subsequent use of that backing file under a
different format will present corrupted data to the guest.
Fortunately, the worst effects occur only when the backing image is
originally raw, and we at least prevent commit into a probed raw
backing file that would change its probed type.
Still, it is worth starting a deprecation clock so that future
qemu-img can refuse to create backing chains that would rely on
probing, to encourage clients to avoid unsafe practices. Most
warnings are intentionally emitted from bdrv_img_create() in the block
layer, but qemu-img convert uses bdrv_create() which cannot emit its
own warning without causing spurious warnings on other code paths. In
the end, all command-line image creation or backing file rewriting now
performs a check.
Furthermore, if we probe a backing file as non-raw, then it is safe to
explicitly record that result (rather than relying on future probes);
only where we probe a raw image do we care about further warnings to
the user when using such an image (for example, commits into a
probed-raw backing file are prevented), to help them improve their
tooling. But whether or not we make the probe results explicit, we
still warn the user to remind them to upgrade their workflow to supply
-F always.
iotest 114 specifically wants to create an unsafe image for later
amendment rather than defaulting to our new default of recording a
probed format, so it needs an update. While touching it, expand it to
cover all of the various warnings enabled by this patch. iotest 301
also shows a change to qcow messages.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-11-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, this is a mechanical addition; all callers pass false. But
the next patch will use it to improve 'qemu-img rebase -u' when
selecting a backing file with no format.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-Id: <20200706203954.341758-10-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Back in commit 6e6e55f5 (Jul 2017, v2.10), we tweaked the code to warn
if the backing file could not be opened but the user gave a size,
unless the user also passes the -u option to bypass the open of the
backing file. As one common reason for failure to open the backing
file is when there is mismatch in the requested backing format in
relation to what the backing file actually contains, we actually want
to open the backing file and ensure that it has the right format in as
many cases as possible. iotest 301 for qcow demonstrates how
detecting explicit format mismatch is useful to prevent the creation
of an image that would probe differently than the user requested. Now
is the time to finally turn the warning an error, as promised.
Note that the original warning was added prior to our documentation of
an official deprecation policy (eb22aeca, also Jul 2017), and because
the warning didn't mention the word "deprecated", we never actually
remembered to document it as such. But the warning has been around
long enough that I don't see prolonging it another two releases.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-7-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
During 'qemu-img create ... 2>&1', if --quiet is not in force, we can
end up with buffered I/O in stdout that was produced before failure,
but which appears in output after failure. This is confusing; the fix
is to flush stdout prior to attempting anything that might produce an
error message. Several iotests demonstrate the resulting ordering
change now that the merged outputs now reflect chronology. (An even
better fix would be to avoid printf from within block.c altogether,
but that's much more invasive...)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200706203954.341758-2-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away, even when we need to keep error_propagate() for other
error paths.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-38-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. The previous commit did that with a Coccinelle script I
consider fairly trustworthy. This commit uses the same script with
the matching of return taken out, i.e. we convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
}
to
if (!foo(..., errp)) {
...
...
}
This is unsound: @err could still be read between afterwards. I don't
know how to express "no read of @err without an intervening write" in
Coccinelle. Instead, I manually double-checked for uses of @err.
Suboptimal line breaks tweaked manually. qdev_realize() simplified
further to placate scripts/checkpatch.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-36-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. Convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
return ...
}
to
if (!foo(..., errp)) {
...
...
return ...
}
where nothing else needs @err. Coccinelle script:
@rule1 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
binary operator op;
constant c1, c2;
symbol false;
@@
if (
(
- fun(args, &err, args2)
+ fun(args, errp, args2)
|
- !fun(args, &err, args2)
+ !fun(args, errp, args2)
|
- fun(args, &err, args2) op c1
+ fun(args, errp, args2) op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
)
}
@rule2 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
expression var;
binary operator op;
constant c1, c2;
symbol false;
@@
- var = fun(args, &err, args2);
+ var = fun(args, errp, args2);
... when != err
if (
(
var
|
!var
|
var op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
|
return var;
)
}
@depends on rule1 || rule2@
identifier err;
@@
- Error *err = NULL;
... when != err
Not exactly elegant, I'm afraid.
The "when != lbl:" is necessary to avoid transforming
if (fun(args, &err)) {
goto out
}
...
out:
error_propagate(errp, err);
even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().
Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly. I don't know what exactly "when strict" does, only that
it helps here.
The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err". For
an example where it's too narrow, see vfio_intx_enable().
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there. Converted manually.
Line breaks tidied up manually. One nested declaration of @local_err
deleted manually. Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
When creating an image fails because the format doesn't support option
"backing_file" or "backing_fmt", bdrv_img_create() first has
qemu_opt_set() put a generic error into @local_err, then puts the real
error into @errp with error_setg(), and then propagates the former to
the latter, which throws away the generic error. A bit complicated,
but works.
Now that qemu_opt_set() returns a useful value, we can simply ignore
the generic error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-16-armbru@redhat.com>
The function has only one user: bdrv_co_block_status(). Inline it to
simplify reviewing of the following patches, which will finally drop
unallocated_blocks_are_zero field too.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
'force' option will be used for some unsafe amend operations.
This includes things like erasing last keyslot in luks based formats
(which destroys the data, unless the master key is backed up
by external means), but that _might_ be desired result.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Commit 93676c88 relaxed our NBD client code to request export names up
to the NBD protocol maximum of 4096 bytes without NUL terminator, even
though the block layer can't store anything longer than 4096 bytes
including NUL terminator for display to the user. Since this means
there are some export names where we have to truncate things, we can
at least try to make the truncation a bit more obvious for the user.
Note that in spite of the truncated display name, we can still
communicate with an NBD server using such a long export name; this was
deemed nicer than refusing to even connect to such a server (since the
server may not be under our control, and since determining our actual
length limits gets tricky when nbd://host:port/export and
nbd+unix:///export?socket=/path are themselves variable-length
expansions beyond the export name but count towards the block layer
name length).
Reported-by: Xueqiang Wei <xuwei@redhat.com>
Fixes: https://bugzilla.redhat.com/1843684
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200610163741.3745251-3-eblake@redhat.com>
Implementations should decide the necessary permissions based on @role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-35-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-33-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-32-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-31-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Replace child_file by child_of_bds in all remaining places (excluding
tests).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-28-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-25-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make all parents of backing files pass the appropriate BdrvChildRole.
By doing so, we can switch their BdrvChildClass over to the generic
child_of_bds, which will do the right thing when given a correct
BdrvChildRole.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-24-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-23-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This callback can be used by BDSs that use child_of_bds with the
appropriate BdrvChildRole for their children.
Also, make bdrv_format_default_perms() use it for child_of_bds children
(just a temporary solution until we can drop bdrv_format_default_perms()
altogether).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-20-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We can be less restrictive about pure data children than those with
metadata on them, so let bdrv_default_perms_for_storage() handle
metadata children differently from pure data children.
As explained in the code, the restrictions on metadata children are
strictly stricter than those for pure data children, so in theory we
just have to distinguish between pure-data and all other storage
children (pure metadata or data+metadata). In practice, that is not
obvious, though, so we have two independent code paths for metadata and
for data children, and data+metadata children will go through both
(without the path for data children doing anything meaningful).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-19-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, bdrv_format_default_perms() is used by format parents
(generally). We want to switch to a model where most parents use a
single BdrvChildClass, which then decides the permissions based on the
child role. To do so, we have to split bdrv_format_default_perms() into
separate functions for each such role.
Note that bdrv_default_perms_for_storage() currently handles all DATA |
METADATA children. A follow-up patch is going to split it further into
one function for each case.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-18-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, bdrv_format_default_perms() is used by format parents
(generally). We want to switch to a model where most parents use a
single BdrvChildClass, which then decides the permissions based on the
child role. To do so, we have to split bdrv_format_default_perms() into
separate functions for each such role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-17-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_format_default_perms() has one code path for backing files, and one
for storage files. We want to pull them out into their own functions,
so make sure they are completely distinct before so the next patches
will be a bit cleaner.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-16-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Any current user of child_file, child_format, and child_backing can and
should use this generic BdrvChildClass instead, as it can handle all of
these cases. However, to be able to do so, the users must pass the
appropriate BdrvChildRole when the child is created/attached. (The
following commits will take care of that.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-15-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_child_cb_detach() call bdrv_backing_detach() for children with
a COW role (and drop the reverse call from bdrv_backing_detach()), so it
can be used for any child (with a proper role set).
Because so far no child has a proper role set, we need a temporary new
callback for child_backing.detach that ensures bdrv_backing_detach() is
called for all COW children that do not have their role set yet.
(Also, move bdrv_child_cb_detach() down to group it with
bdrv_inherited_options() and bdrv_child_cb_attach().)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-14-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_child_cb_attach() call bdrv_backing_attach() for children with
a COW role (and drop the reverse call from bdrv_backing_attach()), so it
can be used for any child (with a proper role set).
Because so far no child has a proper role set, we need a temporary new
callback for child_backing.attach that ensures bdrv_backing_attach() is
called for all COW children that do not have their role set yet.
(Also, move bdrv_child_cb_attach() down to group it with
bdrv_inherited_options().)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-13-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let child_file's, child_format's, and child_backing's .inherit_options()
implementations fall back to bdrv_inherited_options() to show that it
would really work for all of these cases, if only the parents passed the
appropriate BdrvChildRole and parent_is_format values.
(Also, make bdrv_open_inherit(), the only place to explicitly call
bdrv_backing_options(), call bdrv_inherited_options() instead.)
This patch should incur only two visible changes, both for child_format
children, both of which are effectively bug fixes:
First, they no longer have discard=unmap set by default. This reason it
was set is because bdrv_inherited_fmt_options() fell through to
bdrv_protocol_options(), and that set it because "format drivers take
care to send flushes and respect unmap policy". None of the drivers
that use child_format for their children (quorum and blkverify) are
format drivers, though, so this reasoning does not apply here.
Second, they no longer have BDRV_O_NO_IO force-cleared. child_format
was used solely for children that do not store any metadata and as such
will not be accessed by their parents as long as those parents do not
receive I/O themselves. Thus, such children should inherit
BDRV_O_NO_IO.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-12-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After the series this patch belongs to, we want to have a common
BdrvChildClass that encompasses all of child_file, child_format, and
child_backing. Such a single class needs a single .inherit_options()
implementation, and this patch introduces it.
The next patch will show how the existing implementations can fall back
to it just by passing appropriate BdrvChildRole and parent_is_format
values.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200513110544.176672-11-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The other two .inherit_options implementations specify exactly for what
case they are used in their name, so do it for this one as well.
(The actual intention behind this patch is to follow it up with a
generic bdrv_inherited_options() that works for all three cases.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-10-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We plan to unify the generic .inherit_options() functions. The
resulting common function will need to decide whether to force-enable
format probing, force-disable it, or leave it as-is. To make this
decision, it will need to know whether the parent node is a format node
or not (because we never want format probing if the parent is a format
node already (except for the backing chain)).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-9-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers (effectively) pass 0 and no callee evaluates thie
value. Later patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-8-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, all callers pass 0 and no callee evaluates this value. Later
patches will change both.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, it is always set to 0. Later patches in this series will
ensure that all callers pass an appropriate combination of flags.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200513110544.176672-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This structure nearly only contains parent callbacks for child state
changes. It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200513110544.176672-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, all users of bdrv_make_empty() call the BlockDriver method
directly. That is not only bad style, it is also wrong, unless the
caller has a BdrvChild with a WRITE or WRITE_UNCHANGED permission.
(WRITE_UNCHANGED suffices, because callers generally use this function
to clear a node with a backing file after a commit operation.)
Introduce bdrv_make_empty() that verifies that it does.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200429141126.85159-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that node level interface bdrv_truncate() supports passing request
flags to the block driver, expose this on the BlockBackend level, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The previous few commits have made this more obvious, and removed the
one exception. Time to clarify the documentation, and drop dead error
checking.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200424084338.26803-13-armbru@redhat.com>
bdrv_root_attach_child promises to drop child_bs reference on failure.
It does it on first handled failure path, but not on the second. Fix
that.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324155921.23822-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of checking the .bdrv_co_create_opts to see if we need the
fallback, just implement the .bdrv_co_create_opts in the drivers that
need it.
This way we don't break various places that need to know if the
underlying protocol/format really supports image creation, and this way
we still allow some drivers to not support image creation.
Fixes: fd17146cd9
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007
Note that technically this driver reverts the image creation fallback
for the vxhs driver since I don't have a means to test it, and IMHO it
is better to leave it not supported as it was prior to generic image
creation patches.
Also drop iscsi_create_opts which was left accidentally.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
[mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and
bdrv_create_opts_simple from block.h into block_int.h]
Signed-off-by: Max Reitz <mreitz@redhat.com>
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
There is a use-after-free possible: bdrv_unref_child() leaves
bs->backing freed but not NULL. bdrv_attach_child may produce nested
polling loop due to drain, than access of freed pointer is possible.
I've produced the following crash on 30 iotest with modified code. It
does not reproduce on master, but still seems possible:
#0 __strcmp_avx2 () at /lib64/libc.so.6
#1 bdrv_backing_overridden (bs=0x55c9d3cc2060) at block.c:6350
#2 bdrv_refresh_filename (bs=0x55c9d3cc2060) at block.c:6404
#3 bdrv_backing_attach (c=0x55c9d48e5520) at block.c:1063
#4 bdrv_replace_child_noperm
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2290
#5 bdrv_replace_child
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2320
#6 bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
opaque=0x55c9d3c5a3d0, errp=0x7ffd117108e0) at block.c:2424
#7 bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3c5a3d0,
child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
errp=errp@entry=0x7ffd117108e0) at block.c:5876
#8 in bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3c5a3d0,
backing_hd=backing_hd@entry=0x55c9d3cc2060,
errp=errp@entry=0x7ffd117108e0)
at block.c:2576
#9 stream_prepare (job=0x55c9d49d84a0) at block/stream.c:150
#10 job_prepare (job=0x55c9d49d84a0) at job.c:761
#11 job_txn_apply (txn=<optimized out>, fn=<optimized out>) at
job.c:145
#12 job_do_finalize (job=0x55c9d49d84a0) at job.c:778
#13 job_completed_txn_success (job=0x55c9d49d84a0) at job.c:832
#14 job_completed (job=0x55c9d49d84a0) at job.c:845
#15 job_completed (job=0x55c9d49d84a0) at job.c:836
#16 job_exit (opaque=0x55c9d49d84a0) at job.c:864
#17 aio_bh_call (bh=0x55c9d471a160) at util/async.c:117
#18 aio_bh_poll (ctx=ctx@entry=0x55c9d3c46720) at util/async.c:117
#19 aio_poll (ctx=ctx@entry=0x55c9d3c46720,
blocking=blocking@entry=true)
at util/aio-posix.c:728
#20 bdrv_parent_drained_begin_single (poll=true, c=0x55c9d3d558f0)
at block/io.c:121
#21 bdrv_parent_drained_begin_single (c=c@entry=0x55c9d3d558f0,
poll=poll@entry=true)
at block/io.c:114
#22 bdrv_replace_child_noperm
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2258
#23 bdrv_replace_child
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2320
#24 bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
ctx=<optimized out>, perm=<optimized out>, shared_perm=21,
opaque=0x55c9d3cc2060, errp=0x7ffd11710c60) at block.c:2424
#25 bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3cc2060,
child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 <child_backing>,
errp=errp@entry=0x7ffd11710c60) at block.c:5876
#26 bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3cc2060,
backing_hd=backing_hd@entry=0x55c9d3d27300,
errp=errp@entry=0x7ffd11710c60)
at block.c:2576
#27 stream_prepare (job=0x55c9d495ead0) at block/stream.c:150
...
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200316060631.30052-2-vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
bdrv_do_find_format() calls strcmp() using BlockDriver::format_name
as argument, which must not be NULL. Assert this field is not null
when we register a block driver in bdrv_register().
Reported-by: Mansour Ahmadi <ManSoSec@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200318222235.23856-1-philmd@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Using the new 'bdrv_co_delete_file' interface, a pure co_routine function
'bdrv_co_delete_file' inside block.c can can be used in a way similar of
the existing bdrv_create_file to to clean up a created file.
We're creating a pure co_routine because the only caller of
'bdrv_co_delete_file' will be already in co_routine context, thus there
is no need to add all the machinery to check for qemu_in_coroutine() and
create a separated co_routine to do the job.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>