Commit Graph

3425 Commits

Author SHA1 Message Date
Peter Maydell
79b2a13aa8 nbd patches for 2017-10-14
- Marc-André Lureau - NBD: use g_new() family of functions
 - Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read
 -----BEGIN PGP SIGNATURE-----
 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
 
 iQEcBAABCAAGBQJZ4q4XAAoJEKeha0olJ0NqEooH/R8NKYACELA39xrLdEMUQuZY
 1Lm3/OtpBIICKx7OiZ7LniqApAI++FgjNxOf6PAfNG0TmEA+wMFaZ6NJEdi9DAmv
 kJVLsxiqKLDD+WIKMq5XfZQoFMJ8rV8W2/BYx9cF3Pl4KMT20qDsumsncZJ7DGOR
 jjsbAI8Q6g45VBx6TJbxXiTMDj87nIyNaydAGzRQTmEHtnmh8mllPiuEhJu24l6G
 7CQKfcu4/7Te/5PvJIPn7CxHdVjLYalgWDRkU3kXcwmO8vGQEkYoiHPoc8lGsGtw
 oXJ2YIODYBIjeICkF0/PjT9aoeJQG8EuHR1hT0CW5dVBZz/DlVP/j+EZ6IDV/8k=
 =ud0Z
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-10-14' into staging

nbd patches for 2017-10-14

- Marc-André Lureau - NBD: use g_new() family of functions
- Vladimir Sementsov-Ogievskiy - first half of 00/13 nbd minimal structured read

# gpg: Signature made Sun 15 Oct 2017 01:38:47 BST
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-10-14:
  nbd: header constants indenting
  nbd/server: simplify reply transmission
  nbd/server: refactor nbd_co_send_simple_reply parameters
  nbd/server: do not use NBDReply structure
  nbd/server: structurize simple reply header sending
  nbd: rename some simple-request related objects to be _simple_
  block/nbd-client: refactor nbd_co_receive_reply
  block/nbd-client: assert qiov len once in nbd_co_request
  NBD: use g_new() family of functions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-16 15:54:42 +01:00
Manos Pitsidianakis
b867eaa17b block/throttle.c: add bdrv_co_drain_begin/end callbacks
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Manos Pitsidianakis
f8ea8dacf0 block: rename bdrv_co_drain to bdrv_co_drain_begin
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Manos Pitsidianakis
481cad48e5 block: add bdrv_co_drain_end callback
BlockDriverState has a bdrv_co_drain() callback but no equivalent for
the end of the drain. The throttle driver (block/throttle.c) needs a way
to mark the end of the drain in order to toggle io_limits_disabled
correctly, thus bdrv_co_drain_end is needed.

Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-13 12:38:41 +01:00
Vladimir Sementsov-Ogievskiy
ed397b2fe7 block/nbd-client: refactor nbd_co_receive_reply
Pass handle parameter directly, as the whole request isn't needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12 16:20:27 -05:00
Vladimir Sementsov-Ogievskiy
4bfe4478d1 block/nbd-client: assert qiov len once in nbd_co_request
Also improve the assertion: check that qiov is NULL for other commands
than CMD_READ and CMD_WRITE.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171012095319.136610-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12 16:19:35 -05:00
Eduardo Habkost
e5766d6ec7 config: qemu_config_parse() return number of config groups
Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.

All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Vladimir Sementsov-Ogievskiy
ce960aa906 block/mirror: check backing in bdrv_mirror_top_flush
Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin
163bc39d2c qcow2: truncate the tail of the image file after shrinking the image
Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin
76a2a30a99 qcow2: fix return error code in qcow2_truncate()
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Vladimir Sementsov-Ogievskiy
18775ff326 block/mirror: check backing in bdrv_mirror_top_refresh_filename
Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
d67a6b09b4 block: support passthrough of BDRV_REQ_FUA in crypto driver
The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver
as a passthrough to the underlying block driver.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-7-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
4609742a49 block: convert qcrypto_block_encrypt|decrypt to take bytes offset
Instead of sector offset, take the bytes offset when encrypting
or decrypting data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-6-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
a73466fbad block: convert crypto driver to bdrv_co_preadv|pwritev
Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
31376555c7 block: fix data type casting for crypto payload offset
The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.

Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.

Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
161253e2d0 block: use 1 MB bounce buffers for crypto instead of 16KB
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Eric Blake
cb2e28780c block: Perform copy-on-read in loop
Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
d855ebcd3c block: Add blkdebug hook for copy-on-read
Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
9cdcfd9f7a block: Uniform handling of 0-length bdrv_get_block_status()
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.

Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled.  While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop).  However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
bde70715b6 commit: Remove overlay_bs
We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.

bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.

The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.

With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
61f09cea01 commit: Support multiple roots above top node
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ca75962244 dirty-bitmap: Convert internal hbitmap size/granularity
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
0fdf1a4f68 dirty-bitmap: Switch bdrv_set_dirty() to bytes
Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
49d741b504 qcow2: Switch store_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space.  But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster.  In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ab94db6f76 qcow2: Switch load_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
b85ee45334 qcow2: Switch qcow2_measure() to byte-based iteration
This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
23ca459a45 mirror: Switch mirror_dirty_init() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
e0d7f73e63 dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
3b5d4df0c6 dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
9a46dba7b7 dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
f798184cfd dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
715a74d819 dirty-bitmap: Set iterator start by offset, not sector
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration.  Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
c7e7c87ac8 qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
86f6ae67e1 dirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes
Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap.  It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size.  The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere.  Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
993e6525bf dirty-bitmap: Track bitmap size by bytes
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies).  A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.

Use is_power_of_2() while at it, instead of open-coding that.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ebfcd2e75f dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes
We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion.  A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.

The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
1b6cc579de dirty-bitmap: Avoid size query failure during truncate
We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors().  Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.

Note that bdrv_truncate() is still a bit awkward.  We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back).  But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
dfe55c3577 dirty-bitmap: Drop unused functions
We had several functions that no one is currently using, and which
use sector-based interfaces.  I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
113754f3a8 qcow2: Ensure bitmap serialization is aligned
When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap).  We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.

Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.

Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ecbfa2817d hbitmap: Rename serialization_granularity to serialization_align
The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align().  Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
765d9df962 block: Typo fix in copy_on_readv()
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Peter Maydell
cfe4cade05 Block layer patches
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZylugAAoJEH8JsnLIjy/Wp1QP/jtwre8WwtlX3SZ82cyiwonc
 fODcokF6iEsX7vs9vMr8lkbwF4mjxaU2Xf/knC98J/6rCxM8DRP/MW4BZttTxDI9
 ++U926cIHHzjtSDalcjfTKnJD7PikHbLXz3SJ+u9hPnIeGs56hybCU6iHrrIcOTs
 YSMtD3eUQ8B8JaoegKgNyqkhvfhEdHlWFCq9/T2YEWwYMzGt1jvSTRqe3vr8vIu3
 v7QKvh35gm965yaUTGpv7Ej7TOBTWOaYBo9D1TBKNEZOFTBjqyldciyBoDhCF2P9
 +4EsNTkZ7u20Rko41dzsZPzhjG10gm/lNZNj3Cul9ta4kQ0hGeOjEujd9L9ANTGl
 gwnPPHKwgax5O+ctCPmrU7yHG+XIQD3gckC69qQeRXnPYQ4Jeo/LqhwjU+FcZfHs
 97Lz6CHQHgtBP9JJwBMtUp77HJ58KPnnWIxGkb9u2vm4CpvRFkMrx5ekmj//9klX
 5niRiqkNdrkEUnu/FIXOXxSmwnlhedAGQNq7ALSoW95El1QCy8Mm0eKEvmHyvZzd
 z2gSufLX6ynOaG4x5oY5eezKm6F4Hxwt+w8Svj9PHXIrmrEIop11LG5MVsDGDjyh
 XKiLddEIVKTYGwffX0CGTLYA34m2uHPkVVrMOIEvni3G6byXkb10+4pBpuTu/O/h
 wQPFraquH1I2B5YETsMa
 =7Bt2
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Tue 26 Sep 2017 14:52:32 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  block/qcow2-bitmap: fix use of uninitialized pointer
  qemu-iotests: add shrinking image test
  qcow2: add shrink image support
  qcow2: add qcow2_cache_discard
  qemu-img: add --shrink flag for resize
  iotests: fix 181: enable postcopy-ram capability on target
  qemu-iotests: Test change-backing-file command
  block: Fix permissions after bdrv_reopen()
  block: reopen: Queue children after their parents
  block: Base permissions on rw state after reopen
  block: Add reopen queue to bdrv_check_perm()
  block: Add reopen_queue to bdrv_child_perm()
  qemu-io: Drop write permissions before read-only reopen
  block: Clean up some bad code in the vvfat driver
  block/throttle-groups.c: allocate RestartData on the heap
  throttle: Assert that bkt->max is valid in throttle_compute_wait()
  iotests: Print full path of bad output if mismatch
  iotests: use virtio aliases for 067
  iotests: use -ccw on s390x for 051
  iotests: use -ccw on s390x for 040, 139, and 182
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 16:48:39 +01:00
Vladimir Sementsov-Ogievskiy
5330f32b71 block/qcow2-bitmap: fix use of uninitialized pointer
Without initialization to zero dirty_bitmap field may be not zero
for a bitmap which should not be stored and
qcow2_store_persistent_dirty_bitmaps will erroneously call
store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170922144353.4220-1-vsementsov@virtuozzo.com
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
46b732cdf3 qcow2: add shrink image support
This patch add shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and shrink is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
f71c08ea8e qcow2: add qcow2_cache_discard
Whenever l2/refcount table clusters are discarded from the file we can
automatically drop unnecessary content of the cache tables. This reduces
the chance of eviction useful cache data and eliminates inconsistent data
in the cache with the data in the file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Kevin Wolf
e0995dc3da block: Add reopen_queue to bdrv_child_perm()
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Thomas Huth
7a6ab45e19 block: Clean up some bad code in the vvfat driver
Remove the unnecessary home-grown redefinition of the assert() macro here,
and remove the unusable debug code at the end of the checkpoint() function.
The code there uses assert() with side-effects (assignment to the "mapping"
variable), which should be avoided. Looking more closely, it seems as it is
apparently also only usable for one certain directory layout (with a file
named USB.H in it) and thus is of no use for the rest of the world.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Manos Pitsidianakis
43a5dc02fd block/throttle-groups.c: allocate RestartData on the heap
RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.

Cc: qemu-stable@nongnu.org
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Fam Zheng
97ec9117c3 file-posix: Clear out first sector in hdev_create
People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Vladimir Sementsov-Ogievskiy
a693437037 block/nbd-client: nbd_co_send_request: fix return code
It's incorrect to return success rc >= 0 if we skip qio_channel_writev_all()
call due to s->quit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25 08:21:26 -05:00