We have to perform an active commit whenever the top node has a parent
that has taken the WRITE permission on it.
This means that block-commit's @backing-file parameter is no longer
allowed for such nodes, and that users will have to issue a
block-job-complete command. Neither should pose a problem in practice,
because this case was basically just broken until now.
(Since this commit already touches block-commit's documentation, it also
moves up the chunk explaining general block-commit behavior that for
some reason was situated under @backing-file.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
We want to make it explicit where bs->backing is used, and we have done
so. The old role of backing_bs() is now effectively taken by
bdrv_cow_bs().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
This changes iotest 204's output, because blkdebug on top of a COW node
used to make qemu-img map disregard the rest of the backing chain (the
backing chain was broken by the filter). With this patch, the
allocation in the base image is reported correctly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
When looking for a dirty bitmap to share, we should handle filters by
just including them in the search (so they do not break backing chains).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
This includes some permission limiting (for example, we only need to
take the RESIZE permission if the base is smaller than the top).
Signed-off-by: Max Reitz <mreitz@redhat.com>
This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).
base_overlay is introduced so we can query bdrv_is_allocated_above() on
it - we cannot do that with base itself, because a filter's block_status
is the same as its child node, so if there are filters on base,
bdrv_is_allocated_above() on base would return information including
base.
Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
query-block, query-named-block-nodes, and query-blockstats now return
any filtered child under "backing", not just bs->backing or COW
children. This is so that filters do not interrupt the reported backing
chain. This changes the output for iotest 184, as the throttled node
now appears as a backing child.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
It makes no sense to report the block stats of a purely metadata-storing
child in query-blockstats. So if the primary child does not have any
data, try to find a unique data-storing child.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
This allows us to differentiate between filters and nodes with COW
backing files: Filters cannot be used as overlays at all (for this
function).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
It is trivial, so we might as well do it.
Remove _filter_actual_image_size from iotest 184, so we get to see the
result in its reference output.
Signed-off-by: Max Reitz <mreitz@redhat.com>
There are two practical problems with bdrv_get_allocated_file_size()'s
default right now:
(1) For drivers with children, we should generally sum all their sizes
instead of just passing the request through to bs->file. The latter
is good for filters, but not so much for format drivers.
(2) Filters need not have bs->file, so we should actually go to the
filtered child instead of hard-coding bs->file.
Fix this by splitting the default implementation into three branches:
(1) For filter drivers: Return the size of the filtered child
(2) For protocol drivers: Return -ENOTSUP, because the default
implementation cannot make a guess
(3) For other drivers: Sum all data-bearing children's sizes
Signed-off-by: Max Reitz <mreitz@redhat.com>
When looking for a blkdebug node (which implements debug breakpoints),
use bdrv_primary_bs() to iterate through the graph, because that is
where a blkdebug node would be.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
If the top node's driver does not provide snapshot functionality and we
want to fall back to a node down the chain, we need to snapshot all
non-COW children. For simplicity's sake, just do not fall back if there
is more than one such child. Furthermore, we really only can fall back
to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
the child link (notably, set it to NULL).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
If a node whose driver does not provide VM state functions has a
metadata child, the VM state should probably go there; if it is a
filter, the VM state should probably go there. It follows that we
should generally go down to the primary child.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_refresh_filename() and the kind of related bdrv_dirname() should
look to the primary child when they wish to copy the underlying file's
filename.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Instead of looking at just bs->file and bs->backing, we should look at
all children that could end up receiving forwarded requests.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Before HEAD^, we needed this because bdrv_co_flush() by itself would
only flush bs->file. With HEAD^, bdrv_co_flush() will flush all
children on which a WRITE or WRITE_UNCHANGED permission has been taken.
Thus, vmdk no longer needs to do it itself.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
itself has to flush the children of the given node, it should not flush
just bs->file->bs, but in fact all children that might have been written
to (judging from the permissions taken on them).
This is a bug fix for qcow2 images with an external data file, as they
so far did not flush that data_file node.
In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is any.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reopening a node's backing child needs a bit of special handling because
the "backing" child has different defaults than all other children
(among other things). Adding filter support here is a bit more
difficult than just using the child access functions. In fact, we often
have to directly use bs->backing because these functions are about the
"backing" child (which may or may not be the COW backing file).
Signed-off-by: Max Reitz <mreitz@redhat.com>
The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Use child access functions when iterating through backing chains so
filters do not break the chain.
In addition, bdrv_find_overlay() will now always return the actual
overlay; that is, it will never return a filter node but only one with a
COW backing file (there may be filter nodes between that node and @bs).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Places that use patterns like
if (bs->drv->is_filter && bs->file) {
... something about bs->file->bs ...
}
should be
BlockDriverState *filtered = bdrv_filter_bs(bs);
if (filtered) {
... something about @filtered ...
}
instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Filters cannot compress data themselves but they have to implement
.bdrv_co_pwritev_compressed() still (or they cannot forward compressed
writes). Therefore, checking whether
bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
know whether the node can actually handle compressed writes. This
function looks down the filter chain to see whether there is a
non-filter that can actually convert the compressed writes into
compressed data (and thus normal writes).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not. It has not
been used for that purpose for quite some time.
Actually, it is not even fit for that purpose, because to answer that
question, it would have recursively query all of the given node's
children.
So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?
Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not. For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
In order to make filters work in backing chains, the associated
functions must be able to deal with them and freeze both COW and filter
child links.
While at it, add some comments that note which functions require their
caller to ensure that a given child link is not frozen, and how the
callers do so.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_set_backing_hd() is a function that explicitly cares about the
bs->backing child. Highlight that in its description and use
child_bs(bs->backing) instead of backing_bs(bs) to make it more obvious.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
bdrv_has_zero_init() should use bdrv_cow_child() if it wants to check
whether the given BDS has a COW backing file.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
There are BDS children that the general block layer code can access,
namely bs->file and bs->backing. Since the introduction of filters and
external data files, their meaning is not quite clear. bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.
This overloading really is not helpful. This patch adds functions that
retrieve the correct child for each exact purpose. Later patches in
this series will make use of them. Doing so will allow us to handle
filter nodes in a meaningful way.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
In preparation of using multiple IRQ (thus multiple eventfds)
make BDRVNVMeState::irq_notifier an array (for now of a single
element, the admin queue notifier).
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-16-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As we want to do per-queue polling, extract the nvme_poll_queue()
method which operates on a single queue.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-15-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_create_queue_pair() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState and AioContext to simplify.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-14-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BDRV_POLL_WHILE() is defined as:
#define BDRV_POLL_WHILE(bs, cond) ({ \
BlockDriverState *bs_ = (bs); \
AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \
cond); })
As we will remove the BlockDriverState use in the next commit,
start by using the exploded version of BDRV_POLL_WHILE().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-13-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_init_queue() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState to simplify.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-12-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_try_blockalign() is a generic API that call back to the
block driver to return its page alignment. As we call from
within the very same driver, we already know to page alignment
stored in our state. Remove indirections and use the value from
BDRVNVMeState.
This change is required to later remove the BlockDriverState
argument, to make nvme_init_queue() per hardware, and not per
block driver.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-11-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the next commit we'll get rid of qemu_try_blockalign().
To ease review, first replace qemu_try_blockalign0() by explicit
calls to qemu_try_blockalign() and memset().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-10-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We allocate an unique chunk of memory then use it for two
different structures. By using an union, we make it clear
the data is overlapping (and we can remove the casts).
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-9-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are going to modify the code in the next commit. Renaming
the 'resp' variable to 'id' first makes the next commit easier
to review. No logical changes.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-8-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rearrange nvme_add_io_queue() by using a common error path.
This will be proven useful in few commits where we add IRQ
notification to the IO queues.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-7-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Do not use the same error message for different failures.
Display a different error whether it is the CQ or the SQ.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-6-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use definitions instead of '0' or '1' indexes. Also this will
be useful when using multi-queues later.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-5-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As nvme_create_queue_pair() is allowed to fail, replace the
alloc() calls by try_alloc() to avoid aborting QEMU.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-4-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is
not enabled. This is an untested intend of performance optimization.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-3-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use self-explicit SCALE_MS definition instead of magic value.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-2-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>