Commit Graph

4921 Commits

Author SHA1 Message Date
Alberto Garcia
184581fa4d qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs
When a write request needs to allocate new clusters (or change the L2
bitmap of existing ones) a QCowL2Meta structure is created so the L2
metadata can be later updated and any copy-on-write can be performed
if necessary.

A write request can span a region consisting of an arbitrary
combination of previously unallocated and allocated clusters, and if
the unallocated ones can be put contiguous to the existing ones then
QEMU will do so in order to minimize the number of write operations.

In practice this means that a write request has not just one but a
number of QCowL2Meta structures. All of them are added to the
cluster_allocs list that is stored in BDRVQcow2State and is used to
detect overlapping requests. After the write request finishes all its
associated QCowL2Meta are removed from that list. calculate_l2_meta()
takes care of creating and putting those structures in the list, and
qcow2_handle_l2meta() takes care of removing them.

The problem is that the error path in handle_alloc() also tries to
remove an item in that list, a remnant from the time when this was
handled there (that code would not even be correct anymore because
it only removes one struct and not all the ones from the same write
request).

This can trigger a double removal of the same item from the list,
causing a crash. This is not easy to reproduce in practice because
it requires that do_alloc_cluster_offset() fails after a successful
previous allocation during the same write request, but it can be
reproduced with the included test case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
02b1ecfa10 qcow2: Use macros for the L1, refcount and bitmap table entry sizes
This patch replaces instances of sizeof(uint64_t) in the qcow2 driver
with macros that indicate what those sizes are actually referring to.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200828110828.13833-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Lukas Straub
5eb9a3c7b0 block/quorum.c: stable children names
If we remove the child with the highest index from the quorum,
decrement s->next_child_index. This way we get stable children
names as long as we only remove the last child.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Fixes: https://bugs.launchpad.net/bugs/1881231
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Peter Maydell
2499453eb1 Block layer patches:
- qemu-img create: Fail gracefully when backing file is an empty string
 - Fixes related to filter block nodes ("Deal with filters" series)
 - block/nvme: Various cleanups required to use multiple queues
 - block/nvme: Use NvmeBar structure from "block/nvme.h"
 - file-win32: Fix "locking" option
 - iotests: Allow running from different directory
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W
 XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D
 ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL
 ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc
 CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN
 LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT
 YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP
 j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/
 Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf
 j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl
 +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY
 yzPmyHaVK84=
 =gaaW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory

# gpg: Signature made Thu 10 Sep 2020 10:11:19 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (65 commits)
  block/qcow2-cluster: Add missing "fallthrough" annotation
  block/nvme: Pair doorbell registers
  block/nvme: Use generic NvmeBar structure
  block/nvme: Group controller registers in NVMeRegs structure
  file-win32: Fix "locking" option
  iotests: Allow running from different directory
  iotests: Test committing to overridden backing
  iotests: Add test for commit in sub directory
  iotests: Add filter mirror test cases
  iotests: Add filter commit test cases
  iotests: Let complete_and_wait() work with commit
  iotests: Test that qcow2's data-file is flushed
  block: Leave BDS.backing_{file,format} constant
  block: Inline bdrv_co_block_status_from_*()
  blockdev: Fix active commit choice
  block: Drop backing_bs()
  qemu-img: Use child access functions
  nbd: Use CAF when looking for dirty bitmap
  commit: Deal with filters
  backup: Deal with filters
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-11 14:47:49 +01:00
Thomas Huth
b9be6faed1 block/qcow2-cluster: Add missing "fallthrough" annotation
When compiling with -Werror=implicit-fallthrough, the compiler currently
complains:

../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’:
../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall
 through [-Werror=implicit-fallthrough=]
         if (l2_entry & QCOW_OFLAG_COPIED) {
            ^
../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here
     case QCOW2_CLUSTER_UNALLOCATED:
     ^~~~

It's quite obvious that the fallthrough is intended here, so let's add
a comment to silence the compiler warning.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200908070028.193298-1-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
e5ff22ba9f block/nvme: Pair doorbell registers
For each queue doorbell registers are paired as:
- Submission Queue Tail Doorbell
- Completion Queue Head Doorbell

Reflect that in the NVMeRegs structure, and adapt
nvme_create_queue_pair() accordingly.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-4-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
c7100f0a0b block/nvme: Use generic NvmeBar structure
Commit f3c507adcd ("NVMe: Initial commit for new storage interface")
introduced the NvmeBar structure. Unfortunately in commit bdd6a90a9e
("block: Add VFIO based NVMe driver") we duplicated it.

Apparently in commit a3d9a352d4 ("block: Move NVMe constants to
a separate header") we tried to unify headers but forgot to remove
the structure declared in the block/nvme.c source file.

Do it now, and remove the structure size check which is redundant
with the header check added in commit 74e18435c0 ("hw/block/nvme:
Align I/O BAR to 4 KiB").

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
0ea32f34ce block/nvme: Group controller registers in NVMeRegs structure
We want to use the NvmeBar structure from "block/nvme.h" in the
next commit. As a preliminary step, group all the NVMe controller
registers in the 'ctrl' field, keeping the doorbells registers
out of it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Kevin Wolf
3b079ac0ff file-win32: Fix "locking" option
The intended behaviour was that locking=off/auto work and have no
effect (to remain compatible with file-posix), whereas locking=on would
return an error. Unfortunately, the code forgot to remove "locking" from
the options QDict, so any attempt to use the option would fail.

Replace the option parsing code for "locking" with something that is
part of the raw_runtime_opts QemuOptsList (so it is properly removed
from the QDict) and looks more like file-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200907092739.9988-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Markus Armbruster
b15e402fc8 trace-events: Fix attribution of trace points to source
Some trace points are attributed to the wrong source file.  Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.

Clean up with help of scripts/cleanup-trace-events.pl.  Funnies
requiring manual post-processing:

* accel/tcg/cputlb.c trace points are in trace-events.

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
  guard debug code.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* linux-user/trace-events abbreviates a tedious list of filenames to
  */signal.c.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:58 +01:00
Markus Armbruster
6ec9379870 trace-events: Delete unused trace points
Tracked down with the help of scripts/cleanup-trace-events.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200806141334.3646302-4-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:02 +01:00
Max Reitz
0b877d09df block: Leave BDS.backing_{file,format} constant
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot.  However, the image header
never says "file" anywhere, so it now reports $IMGFMT.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
549ec0d978 block: Inline bdrv_co_block_status_from_*()
With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, except bdrv_co_block_status_from_file() needs to be
inlined there.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
9a71b9de3f commit: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission if the base is smaller than the top).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
2b088c60bb backup: Deal with filters
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
3f072a7fb7 mirror: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

base_overlay is introduced so we can query bdrv_is_allocated_above() on
it - we cannot do that with base itself, because a filter's block_status
is the same as its child node, so if there are filters on base,
bdrv_is_allocated_above() on base would return information including
base.

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c6f6d8462c block-copy: Use CAF to find sync=top base
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
0a7585dbba block: Use child access functions for QAPI queries
query-block, query-named-block-nodes, and query-blockstats now return
any filtered child under "backing", not just bs->backing or COW
children.  This is so that filters do not interrupt the reported backing
chain.  This changes the output for iotest 184, as the throttled node
now appears as a backing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
3f26191c73 block: Report data child for query-blockstats
It makes no sense to report the block stats of a purely metadata-storing
child in query-blockstats.  So if the primary child does not have any
data, try to find a unique data-storing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
07cd7b659a block/null: Implement bdrv_get_allocated_file_size
It is trivial, so we might as well do it.

Remove _filter_actual_image_size from iotest 184, so we get to see the
result in its reference output.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c8af87573f block/snapshot: Fix fallback
If the top node's driver does not provide snapshot functionality and we
want to fall back to a node down the chain, we need to snapshot all
non-COW children.  For simplicity's sake, just do not fall back if there
is more than one such child.  Furthermore, we really only can fall back
to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
the child link (notably, set it to NULL).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c4db2e25df block: Use CAF in bdrv_co_rw_vmstate()
If a node whose driver does not provide VM state functions has a
metadata child, the VM state should probably go there; if it is a
filter, the VM state should probably go there.  It follows that we
should generally go down to the primary child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
66b129ac5e block: Iterate over children in refresh_limits
Instead of looking at just bs->file and bs->backing, we should look at
all children that could end up receiving forwarded requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
fb787f02a6 vmdk: Drop vmdk_co_flush()
Before HEAD^, we needed this because bdrv_co_flush() by itself would
only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
children on which a WRITE or WRITE_UNCHANGED permission has been taken.
Thus, vmdk no longer needs to do it itself.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
883833e29c block: Flush all children in generic code
If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
itself has to flush the children of the given node, it should not flush
just bs->file->bs, but in fact all children that might have been written
to (judging from the permissions taken on them).

This is a bug fix for qcow2 images with an external data file, as they
so far did not flush that data_file node.

In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is any.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
23b93525a2 block: Use bdrv_cow_child() in bdrv_co_truncate()
The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
67acfd2188 stream: Deal with filters
Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
cb8503159a block: Use CAFs in block status functions
Use the child access functions in the block status inquiry functions as
appropriate.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
93393e698c block: Use bdrv_filter_(bs|child) where obvious
Places that use patterns like

    if (bs->drv->is_filter && bs->file) {
        ... something about bs->file->bs ...
    }

should be

    BlockDriverState *filtered = bdrv_filter_bs(bs);
    if (filtered) {
        ... something about @filtered ...
    }

instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
4935e8be22 copy-on-read: Support compressed writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
e7e754aec3 throttle: Support compressed writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
8b8277cdb0 block: Drop bdrv_is_encrypted()
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not.  It has not
been used for that purpose for quite some time.

Actually, it is not even fit for that purpose, because to answer that
question, it would have recursively query all of the given node's
children.

So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?

Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not.  For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
b111b3fcde block/nvme: Use an array of EventNotifier
In preparation of using multiple IRQ (thus multiple eventfds)
make BDRVNVMeState::irq_notifier an array (for now of a single
element, the admin queue notifier).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-16-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
7a1fb2ef40 block/nvme: Extract nvme_poll_queue()
As we want to do per-queue polling, extract the nvme_poll_queue()
method which operates on a single queue.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-15-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
0a28b02ef9 block/nvme: Simplify nvme_create_queue_pair() arguments
nvme_create_queue_pair() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState and AioContext to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-14-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
073a06978c block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE
BDRV_POLL_WHILE() is defined as:

  #define BDRV_POLL_WHILE(bs, cond) ({          \
      BlockDriverState *bs_ = (bs);             \
      AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \
                     cond); })

As we will remove the BlockDriverState use in the next commit,
start by using the exploded version of BDRV_POLL_WHILE().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-13-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
3a6d34d066 block/nvme: Simplify nvme_init_queue() arguments
nvme_init_queue() doesn't require BlockDriverState anymore.
Replace it by BDRVNVMeState to simplify.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-12-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
38e1f8186f block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz)
qemu_try_blockalign() is a generic API that call back to the
block driver to return its page alignment. As we call from
within the very same driver, we already know to page alignment
stored in our state. Remove indirections and use the value from
BDRVNVMeState.
This change is required to later remove the BlockDriverState
argument, to make nvme_init_queue() per hardware, and not per
block driver.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-11-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
2ed846930d block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset
In the next commit we'll get rid of qemu_try_blockalign().
To ease review, first replace qemu_try_blockalign0() by explicit
calls to qemu_try_blockalign() and memset().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-10-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé
7d3b214ae4 block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures
We allocate an unique chunk of memory then use it for two
different structures. By using an union, we make it clear
the data is overlapping (and we can remove the casts).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-9-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé
4d98093937 block/nvme: Rename local variable
We are going to modify the code in the next commit. Renaming
the 'resp' variable to 'id' first makes the next commit easier
to review. No logical changes.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-8-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
c8edbfb2cc block/nvme: Use common error path in nvme_add_io_queue()
Rearrange nvme_add_io_queue() by using a common error path.
This will be proven useful in few commits where we add IRQ
notification to the IO queues.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-7-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
bf6ce5ec6d block/nvme: Improve error message when IO queue creation failed
Do not use the same error message for different failures.
Display a different error whether it is the CQ or the SQ.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-6-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
73159e52e6 block/nvme: Define INDEX macros to ease code review
Use definitions instead of '0' or '1' indexes. Also this will
be useful when using multi-queues later.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-5-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
0ea45f76eb block/nvme: Let nvme_create_queue_pair() fail gracefully
As nvme_create_queue_pair() is allowed to fail, replace the
alloc() calls by try_alloc() to avoid aborting QEMU.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-4-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
e266f52cfb block/nvme: Avoid further processing if trace event not enabled
Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is
not enabled. This is an untested intend of performance optimization.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-3-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé
e4f310fe7f block/nvme: Replace magic value by SCALE_MS definition
Use self-explicit SCALE_MS definition instead of magic value.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200821195359.1285345-2-philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:23:55 +02:00
Peter Maydell
df8176274a nbd patches for 2020-09-02
- fix a few iotests affected by earlier nbd changes
 - avoid blocking qemu by nbd client in connect()
 - build qemu-nbd for mingw
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn
 Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V
 uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN
 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS
 uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+
 gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM
 ERk1d4d7VSlyqIy7w43v3+IB8M99ag==
 =7Cn5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging

nbd patches for 2020-09-02

- fix a few iotests affected by earlier nbd changes
- avoid blocking qemu by nbd client in connect()
- build qemu-nbd for mingw

# gpg: Signature made Wed 02 Sep 2020 22:52:31 BST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2020-09-02:
  nbd: disable signals and forking on Windows builds
  nbd: skip SIGTERM handler if NBD device support is not built
  block: add missing socket_init() calls to tools
  block/nbd: use non-blocking connect: fix vm hang on connect()
  iotests/259: Fix reference output
  iotests/059: Fix reference output

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy
1dc4718d84 block/nbd: use non-blocking connect: fix vm hang on connect()
This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).

Realization notes:

 - We don't want to implement non-blocking connect() over non-blocking
 socket, because getaddrinfo() doesn't have portable non-blocking
 realization anyway, so let's just use a thread for both getaddrinfo()
 and connect().

 - We can't use qio_channel_socket_connect_async (which behaves
 similarly and starts a thread to execute connect() call), as it's relying
 on someone iterating main loop (g_main_loop_run() or something like
 this), which is not always the case.

 - We can't use thread_pool_submit_co API, as thread pool waits for all
 threads to finish (but we don't want to wait for blocking reconnect
 attempt on shutdown.

 So, we just create the thread by hand. Some additional difficulties
 are:

 - We want our connect to avoid blocking drained sections and aio context
 switches. To achieve this, we make it possible to "cancel" synchronous
 wait for the connect (which is a coroutine yield actually), still,
 the thread continues in background, and if successful, its result may be
 reused on next reconnect attempt.

 - We don't want to wait for reconnect on shutdown, so there is
 CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block
 layer is no longer interested in a result, and thread should close new
 connected socket on finish and free the state.

How to reproduce the bug, fixed with this commit:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
    file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
    -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:

5.1 Kill NBD server on node1

5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
    again. The command should fail and a lot of error messages about
    failing disk may appear as well.

    Now NBD client driver in Qemu tries to reconnect.
    Still, VM works well.

6. Make node1 unavailable on NBD port, so connect() from node2 will
   last for a long time:

   On node1 (Note, that 10809 is just a default NBD port):

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

   After some time the guest hangs, and you may check in gdb that Qemu
   hangs in connect() call, issued from the main thread. This is the
   BUG.

7. Don't forget to drop iptables rule from your node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-09-02 16:47:23 -05:00
Peter Maydell
e4d8b7c1a9 qemu-nvme
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP
 bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ
 sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM
 Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s
 oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt
 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf
 uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT
 T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk
 YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n
 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj
 DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd
 OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M=
 =Y23H
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging

qemu-nvme

# gpg: Signature made Wed 02 Sep 2020 15:39:10 BST
# gpg:                using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5  02EC 6251 5661 0A4F 6CC0

* remotes/nvme/tags/pull-nvme-20200902: (39 commits)
  hw/block/nvme: remove explicit qsg/iov parameters
  hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp
  hw/block/nvme: consolidate qsg/iov clearing
  hw/block/nvme: add ns/cmd references in NvmeRequest
  hw/block/nvme: be consistent about zeros vs zeroes
  hw/block/nvme: add check for mdts
  hw/block/nvme: refactor request bounds checking
  hw/block/nvme: verify validity of prp lists in the cmb
  hw/block/nvme: add request mapping helper
  hw/block/nvme: add tracing to nvme_map_prp
  hw/block/nvme: refactor dma read/write
  hw/block/nvme: destroy request iov before reuse
  hw/block/nvme: remove redundant has_sg member
  hw/block/nvme: replace dma_acct with blk_acct equivalent
  hw/block/nvme: add mapping helpers
  hw/block/nvme: memset preallocated requests structures
  hw/block/nvme: bump supported version to v1.3
  hw/block/nvme: provide the mandatory subnqn field
  hw/block/nvme: enforce valid queue creation sequence
  hw/block/nvme: reject invalid nsid values in active namespace id list
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-02 21:20:20 +01:00