Commit Graph

214 Commits

Author SHA1 Message Date
Max Reitz
ed3d2ec98a block: Add errp to b{lk,drv}_truncate()
For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-3-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-28 16:02:02 +02:00
Kevin Wolf
52cdbc5869 block: Pass BdrvChild to bdrv_truncate()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-24 16:09:23 +01:00
Alberto Garcia
7061a07898 qcow2: Optimize the refcount-block overlap check
The metadata overlap checks introduced in a40f1c2add help detect
corruption in the qcow2 image by verifying that data writes don't
overlap with existing metadata sections.

The 'refcount-block' check in particular iterates over the refcount
table in order to get the addresses of all refcount blocks and check
that none of them overlap with the region where we want to write.

The problem with the refcount table is that since it always occupies
complete clusters its size is usually very big. With the default
values of cluster_size=64KB and refcount_bits=16 this table holds 8192
entries, each one of them enough to map 2GB worth of host clusters.

So unless we're using images with several TB of allocated data this
table is going to be mostly empty, and iterating over it is a waste of
CPU. If the storage backend is fast enough this can have an effect on
I/O performance.

This patch keeps the index of the last used (i.e. non-zero) entry in
the refcount table and updates it every time the table changes. The
refcount-block overlap check then uses that index instead of reading
the whole table.

In my tests with a 4GB qcow2 file stored in RAM this doubles the
amount of write IOPS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170201123828.4815-1-berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-12 00:47:43 +01:00
Eric Blake
0c51a893b6 block: Convert bdrv_discard() to byte-based
Another step towards byte-based interfaces everywhere.  Replace
the sector-based bdrv_discard() with a new byte-based
bdrv_pdiscard(), which silently ignores any unaligned head
or tail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20 14:11:55 +01:00
Peter Maydell
f1f7a1ddf3 block/qcow2: Don't use cpu_to_*w()
Don't use the cpu_to_*w() functions, which we are trying to deprecate.
Instead either just use cpu_to_*() to do the byteswap, or use
st*_be_p() if we need to do the store somewhere other than to a
variable that's already the correct type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466093177-17890-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-05 16:54:04 +02:00
Kevin Wolf
d9ca2ea2e2 block: Convert bdrv_pwrite(v/_sync) to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf
cf2ab8fc34 block: Convert bdrv_pread(v) to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Kevin Wolf
18d51c4bac block: Convert bdrv_write() to BdrvChild
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05 16:46:27 +02:00
Eduardo Habkost
9be385980d coccinelle: Remove unnecessary variables for function return value
Use Coccinelle script to replace 'ret = E; return ret' with
'return E'. The script will do the substitution only when the
function return type and variable type are the same.

Manual fixups:

* audio/audio.c: coding style of "read (...)" and "write (...)"
* block/qcow2-cluster.c: wrap line to make it shorter
* block/qcow2-refcount.c: change indentation of wrapped line
* target-tricore/op_helper.c: fix coding style of
  "remainder|quotient"
* target-mips/dsp_helper.c: reverted changes because I don't
  want to argue about checkpatch.pl
* ui/qemu-pixman.c: fix line indentation
* block/rbd.c: restore blank line between declarations and
  statements

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-4-git-send-email-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Unused Coccinelle rule name dropped along with a redundant comment;
whitespace touched up in block/qcow2-cluster.c; stale commit message
paragraph deleted]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Laurent Vivier
d737b78cc1 qcow/qcow2: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:24 +03:00
Paolo Bonzini
58369e22cf qemu-common: stop including qemu/bswap.h from qemu-common.h
Move it to the actual users.  There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Peter Maydell
80c71a241a block: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-01-20 13:36:23 +01:00
Max Reitz
03bb78ed25 qcow2: Point to amend function in check
If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Max Reitz
791c9a004e qcow2: Add function for refcount order amendment
Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-18 14:34:43 +01:00
Kevin Wolf
c2551b47c9 qcow2: Fix potential qemu-img check crash on 32 bit hosts
This crash was caught with qemu-iotests test case 138.

Commit b6d36de already fixed a few 32 bit truncation bugs that could
cause qemu-img check to allocate too little memory and consequently
it would segfault. On 32 bit hosts, there is one more place that needs
to be fixed because size_t was involved in the calculation and is a
32 bit type there.

Cc: qemu-stable@nongnu.org
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 13:22:29 +01:00
John Snow
9533423063 qcow2: avoid misaligned 64bit bswap
If we create a buffer directly on the stack by using 12 bytes, there's
no guarantee the 64bit value we want to swap will be aligned, which
could cause errors with undefined behavior.

Spotted with clang -fsanitize=undefined and observed in iotests 15, 26,
44, 115 and 121.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Kevin Wolf
9a4f4c3156 block: Convert bs->file to BdrvChild
This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16 15:34:29 +02:00
Max Reitz
2ac01520be qcow2: Make qcow2_alloc_bytes() more explicit
In case of -EAGAIN returned by update_refcount(), we should discard the
cluster offset we were trying to allocate and request a new one, because
in theory that old offset might now be taken by a refcount block.

In practice, this was not the case due to update_refcount() generally
returning strictly monotonic increasing cluster offsets. However, this
behavior is not set in stone, and it is also not obvious when looking at
qcow2_alloc_bytes() alone, so we should not rely on it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:37 +02:00
Max Reitz
b6d36def6d qcow2: Make size_to_clusters() return uint64_t
Sadly, some images may have more clusters than what can be represented
using a plain int. We should be prepared for that case (in
qcow2_check_refcounts() we actually were trying to catch that case, but
since size_to_clusters() truncated the returned value, that check never
did anything useful).

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-09-14 16:51:37 +02:00
Kevin Wolf
ff99129ab8 qcow2: Rename BDRVQcowState to BDRVQcow2State
BDRVQcowState is already used by qcow1, and gdb is always confused which
one to use. Rename the qcow2 one so they can be distinguished.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-09-14 16:51:36 +02:00
Daniel P. Berrange
b6af097528 maint: remove / fix many doubled words
Many source files have doubled words (eg "the the", "to to",
and so on). Most of these can simply be removed, but a couple
were actual mis-spellings (eg "to to" instead of "to do").
There was even one triple word score "to to to" :-)

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 10:21:38 +03:00
Jindřich Makovička
3e5feb6202 qcow2: Handle EAGAIN returned from update_refcount
Fixes a crash during image compression

Signed-off-by: Jindřich Makovička <makovick@gmail.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02 09:20:18 +01:00
Alberto Garcia
a3f1afb43a qcow2: make qcow2_cache_put() a void function
This function never receives an invalid table pointer, so we can make
it void and remove all the error checking code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22 17:08:01 +02:00
Alberto Garcia
72e80b8901 qcow2: use one single memory block for the L2/refcount cache tables
The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.

Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.

In my tests measuring freshly created caches with sizes 128MB (L2) and
32MB (refcount) this uses around 10MB of RAM less.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22 17:08:01 +02:00
Kevin Wolf
ecbda7a225 qcow2: Flush pending discards before allocating cluster
Before a freed cluster can be reused, pending discards for this cluster
must be processed.

The original assumption was that this was not a problem because discards
are only cached during discard/write zeroes operations, which are
synchronous so that no concurrent write requests can cause cluster
allocations.

However, the discard/write zeroes operation itself can allocate a new L2
table (and it has to in order to put zero flags there), so make sure we
can cope with the situation.

This fixes https://bugs.launchpad.net/bugs/1349972.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-05-22 17:08:00 +02:00
Stefan Hajnoczi
786a4ea82e Convert (ffs(val) - 1) to ctz32(val)
This commit was generated mechanically by coccinelle from the following
semantic patch:

@@
expression val;
@@
- (ffs(val) - 1)
+ ctz32(val)

The call sites have been audited to ensure the ffs(0) - 1 == -1 case
never occurs (due to input validation, asserts, etc).  Therefore we
don't need to worry about the fact that ctz32(0) == 32.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Max Reitz
14a58a4e0c qcow2: Respect new_block in alloc_refcount_block()
When choosing a new place for the refcount table, alloc_refcount_block()
tries to infer the number of clusters used so far from its argument
cluster_index (which comes from the idea that if any cluster with an
index greater than cluster_index was in use, the refcount table would
have to be big enough already to describe cluster_index).

However, there is a cluster that may be at or after cluster_index, and
which is not covered by the refcount structures, and that is the new
refcount block new_block. Therefore, it should be taken into account for
the blocks_used calculation.

Also, because new_block already describes (or is intended to describe)
cluster_index, we may not put the new refcount structures there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1423598552-24301-2-git-send-email-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-03-16 12:10:30 -04:00
Max Reitz
59c0cb7830 qcow2: More helpers for refcount modification
Add helper functions for getting and setting refcounts in a refcount
array for any possible refcount order, and choose the correct one during
refcount initialization.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
7453c96b78 qcow2: Helper function for refcount modification
Since refcounts do not always have to be a uint16_t, all refcount blocks
and arrays in memory should not have a specific type (thus they become
pointers to void) and for accessing them, two helper functions are used
(a getter and a setter). Those functions are called indirectly through
function pointers in the BDRVQcowState so they may later be exchanged
for different refcount orders.

With the check and repair functions using this function, the refcount
array they are creating will be in big endian byte order; additionally,
using realloc_refcount_array() makes the size of this refcount array
always cluster-aligned. Both combined allow rebuild_refcount_structure()
to drop the bounce buffer which was used to convert parts of the
refcount array to big endian byte order and store them on disk. Instead,
those parts can now be written directly.

[ kwolf: Fixed a build failure on 32 bit and another with old glib ]

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
5fee192efd qcow2: Helper for refcount array reallocation
Add a helper function for reallocating a refcount array, independent of
the refcount order. The newly allocated space is zeroed and the function
handles failed reallocations gracefully.

The helper function will always align the buffer size to a cluster
boundary; if storing the refcounts in such an array in big endian byte
order, this makes it possible to write parts of the array directly as
refcount blocks into the image file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
0e06528e98 qcow2: Use 64 bits for refcount values
Refcounts may have a width of up to 64 bits, so qemu should use the same
width to represent refcount values internally.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
2aabe7c7a1 qcow2: Use unsigned addend for update_refcount()
update_refcount() and qcow2_update_cluster_refcount() currently take a
signed addend. At least one caller passes a value directly derived from
an absolute refcount that should be reached ("l2_refcount - 1" in
expand_zero_clusters_in_l1()). Therefore, the addend should be unsigned
as well; this will be especially important for 64 bit refcounts.

Because update_refcount() then no longer knows whether the refcount
should be increased or decreased, it now requires an additional flag
which specified exactly that. The same applies to
qcow2_update_cluster_refcount().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
7324c10f96 qcow2: Only return status from qcow2_get_refcount
Refcounts can theoretically be of type uint64_t; in order to be able to
represent the full range, qcow2_get_refcount() cannot use a single
variable to represent both all refcount values and also keep some values
reserved for errors.

One solution would be to add an Error pointer parameter to
qcow2_get_refcount(); however, no caller could (currently) pass that
error message, so it would have to be emitted immediately and be
passed to the next caller by returning -EIO or something similar.
Therefore, an Error parameter does not offer any advantages here.

The solution applied by this patch is simpler to use. Because no caller
would be able to pass the error message, they would have to print it and
free it, whereas with this patch the caller only needs to pass the
returned integer (which is often a no-op from the code perspective,
because that integer will be stored in a variable "ret" which will be
returned by the fail path of many callers).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
c6e9d8ae66 qcow2: Do not return new value after refcount update
qcow2_update_cluster_refcount() does not have any quick access to the
new refcount value, it has to call qcow2_get_refcount(). Some callers do
not need that new value at all, others call qcow2_get_refcount()
themselves anyway (albeit in a different code path, which can however be
easily changed), therefore there is no advantage in making
qcow2_update_cluster_refcount() return the new value. Drop it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz
346a53df38 qcow2: Add two new fields to BDRVQcowState
Add two new fields regarding refcount information (the bit width of
every entry and the maximum refcount value) to the BDRVQcowState.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:20 +01:00
Max Reitz
8c44dfbc62 qcow2: Rewrite qcow2_alloc_bytes()
qcow2_alloc_bytes() is a function with insufficient error handling and
an unnecessary goto. This patch rewrites it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:22 +01:00
Max Reitz
44751917db block/qcow2: Make get_refcount() global
Reading the refcount of a cluster is an operation which can be useful in
all of the qcow2 code, so make that function globally available.

While touching this function, amend the comment describing the "addend"
parameter: It is (no longer, if it ever was) necessary to have it set to
-1 or 1; any value is fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:49 +00:00
Max Reitz
17bd5f4727 qcow2: Drop REFCOUNT_SHIFT
With BDRVQcowState.refcount_block_bits, we don't need REFCOUNT_SHIFT
anymore.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:02 +02:00
Max Reitz
791230d8bb qcow2: Clean up after refcount rebuild
Because the old refcount structure will be leaked after having rebuilt
it, we need to recalculate the refcounts and run a leak-fixing operation
afterwards (if leaks should be fixed at all).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
c7c0681bc8 qcow2: Rebuild refcount structure during check
The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.

The old refcount structure will be leaked, however. This leak will be
dealt with in a follow-up commit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
f307b2558f qcow2: Do not perform potentially damaging repairs
If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.

In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.

To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
001c158def qcow2: Fix refcount blocks beyond image end
If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
9696df219a qcow2: Reuse refcount table in calculate_refcounts()
We will later call calculate_refcounts multiple times, so reuse the
refcount table if possible.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
641bb63cd6 qcow2: Let inc_refcounts() resize the reftable
Now that the refcount table can be passed around by reference, do that
for inc_refcounts() (and subsequently check_refcounts_l1() and
check_refcounts_l2()) and use it for resizing it when a cluster after
the image end is encountered.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
fef4d3d564 qcow2: Let inc_refcounts() return -errno
As of a future patch, inc_refcounts() will have to throw errors which
are generally signaled by returning -errno. Therefore, let it return an
integer which is either 0 for success or -errno and handle the -errno
case in all callers.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
ad27390c85 qcow2: Split fail code in L1 and L2 checks
Instead of printing out an error message, incrementing check_errors and
returning a fixed -errno, just do cleanups and return -ret, with ret set
by the code which threw the exception (jumped to the fail label).

Also, increment check_errors on error in check_refcounts_l2().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
713d9675e0 qcow2: Use int64_t for in-memory reftable size
Use int64_t for the entry count of the in-memory refcount table
throughout the check functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
057a3fe57e qcow2: Pull check_refblocks() up
Pull check_refblocks() before calculate_refcounts() so we can drop its
static declaration.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
78fb328e85 qcow2: Use sizeof(**refcount_table)
When implementing variable refcounts, we want to be able to easily find
all the places in qemu which are tied to a certain refcount order.
Replace sizeof(uint16_t) in the check code by sizeof(**refcount_table)
so we can later find it more easily.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
6ca56bf5e9 qcow2: Split qcow2_check_refcounts()
Put the code for calculating the reference counts and comparing them
during qemu-img check into own functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Zhang Haoyu
d8bb71b622 qcow2: fix leak of Qcow2DiscardRegion in update_refcount_discard
When the Qcow2DiscardRegion is adjacent to another one referenced by "d",
free this Qcow2DiscardRegion metadata referenced by "p" after
it was removed from s->discards queue.

Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Max Reitz
a97c67ee6c qcow2: Check L1/L2/reftable entries for alignment
Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something different than qemu, thus check all
offsets taken from these tables for correct cluster alignment.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:28 +01:00
Max Reitz
adb435522b qcow2: Use qcow2_signal_corruption() for overlaps
Use the new function in case of a failed overlap check.

This changes output in case of corruption, so adapt iotest 060's
reference output accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:28 +01:00
Max Reitz
9bf040b962 qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:26 +01:00
Markus Armbruster
5839e53bbc block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

Patch created with Coccinelle, with two manual changes on top:

* Add const to bdrv_iterate_format() to keep the types straight

* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
  inexplicably misses

Coccinelle semantic patch:

    @@
    type T;
    @@
    -g_malloc(sizeof(T))
    +g_new(T, 1)
    @@
    type T;
    @@
    -g_try_malloc(sizeof(T))
    +g_try_new(T, 1)
    @@
    type T;
    @@
    -g_malloc0(sizeof(T))
    +g_new0(T, 1)
    @@
    type T;
    @@
    -g_try_malloc0(sizeof(T))
    +g_try_new0(T, 1)
    @@
    type T;
    expression n;
    @@
    -g_malloc(sizeof(T) * (n))
    +g_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc(sizeof(T) * (n))
    +g_try_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_malloc0(sizeof(T) * (n))
    +g_new0(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc0(sizeof(T) * (n))
    +g_try_new0(T, n)
    @@
    type T;
    expression p, n;
    @@
    -g_realloc(p, sizeof(T) * (n))
    +g_renew(T, p, n)
    @@
    type T;
    expression p, n;
    @@
    -g_try_realloc(p, sizeof(T) * (n))
    +g_try_renew(T, p, n)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Stefan Hajnoczi
39ba3bf69c qcow2: fix new_blocks double-free in alloc_refcount_block()
Commit de82815db1 ("qcow2: Handle failure
for potentially large allocations") introduced a double-free of
new_blocks in the alloc_refcount_block() error path.

The qemu-iotests qcow2 026 test case was failing because qemu-io
segfaulted.

Make sure new_blocks is NULL after we free it the first time.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:26 +01:00
Max Reitz
8fcffa9853 qcow2: Return useful error code in refcount_init()
If bdrv_pread() returns an error, it is very unlikely that it was
ENOMEM. In this case, the return value should be passed along; as
bdrv_pread() will always either return the number of bytes read or a
negative value (the error code), the condition for checking whether
bdrv_pread() failed can be simplified (and clarified) as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
de82815db1 qcow2: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qcow2 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Wenchao Xia
c120f0fa14 qapi event: convert BLOCK_IMAGE_CORRUPTED
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:27 -04:00
Max Reitz
65f33bc002 qcow2: Fix alloc_clusters_noref() overflow detection
If the very first allocation has a length of 0, the free_cluster_index
is still 0 after the for loop, which means that subtracting one from it
will underflow and signal an invalid range of clusters by returning
-EFBIG. However, there is no such range, as its length is 0.

Fix this by preventing underflows on free_cluster_index during the
check.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 13:32:16 +02:00
Max Reitz
a49139af77 qcow2: Catch bdrv_getlength() error
The call to bdrv_getlength() from qcow2_check_refcounts() may result in
an error. Check this and abort if necessary.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-30 14:46:17 +02:00
Max Reitz
91f827dcff qcow2: Avoid overflow in alloc_clusters_noref()
alloc_clusters_noref() stores the cluster index in a uint64_t. However,
offsets are often represented as int64_t (as for example the return
value of alloc_clusters_noref() itself demonstrates). Therefore, we
should make sure all offsets in the allocated range of clusters are
representable using int64_t without overflows.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-30 14:46:13 +02:00
Kevin Wolf
0abe740f1d qcow2: Protect against some integer overflows in bdrv_check
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:35 +02:00
Kevin Wolf
bb572aefbd qcow2: Fix types in qcow2_alloc_clusters and alloc_clusters_noref
In order to avoid integer overflows.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:34 +02:00
Kevin Wolf
2b5d5953ee qcow2: Check new refcount table size on growth
If the size becomes larger than what qcow2_open() would accept, fail the
growing operation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:34 +02:00
Kevin Wolf
db8a31d11d qcow2: Avoid integer overflow in get_refcount (CVE-2014-0143)
This ensures that the checks catch all invalid cluster indexes
instead of returning the refcount of a wrong cluster.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:34 +02:00
Kevin Wolf
b106ad9185 qcow2: Don't rely on free_cluster_index in alloc_refcount_block() (CVE-2014-0147)
free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.

So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.

The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)

[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:21:03 +02:00
Kevin Wolf
5dab2faddc qcow2: Check refcount table size (CVE-2014-0144)
Limit the in-memory reference count table size to 8 MB, it's enough in
practice. This fixes an unbounded allocation as well as a buffer
overflow in qcow2_refcount_init().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 14:19:09 +02:00
Max Reitz
a134d90f50 qcow2: Fix fail path in realloc_refcount_block()
If qcow2_alloc_clusters() fails, new_offset and ret will both be
negative after the fail label, thus passing the first if condition and
subsequently resulting in a call of qcow2_free_clusters() with an
invalid (negative) offset parameter. Fix this by introducing a new label
"fail_free_cluster" which is only invoked if new_offset is indeed
pointing to a newly allocated cluster that should be cleaned up by
freeing it.

While we're at it, clean up the whole fail path. qcow2_cache_put()
should (and actually can) never fail, hence the return value can safely
be ignored (aside from asserting that it indeed did not fail).

Furthermore, there is no reason to give QCOW2_DISCARD_ALWAYS to
qcow2_free_clusters(), a mere QCOW2_DISCARD_OTHER will suffice.

Ultimately, rename the "fail" label to "done", as it is invoked both on
failure and success.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-19 09:39:41 +01:00
Max Reitz
8a15b813e6 qcow2: Correct comment for realloc_refcount_block()
Contrary to the comment describing this function's behavior, it does not
return 0 on success, but rather the offset of the newly allocated
cluster. This patch adjusts the comment accordingly to reflect the
actual behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-19 09:39:41 +01:00
Max Reitz
26d49c4675 qcow2-refcount: Sanitize refcount table entry
When reading the refcount table entry in get_refcount(), only bits which
are actually significant for the refcount block offset should be taken
into account.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Hu Tao
33304ec9fa qcow2: fix offset overflow in qcow2_alloc_clusters_at()
When cluster size is big enough it can lead to an offset overflow
in qcow2_alloc_clusters_at(). This patch fixes it.

The allocation is stopped each time at L2 table boundary
(see handle_alloc()), so the possible maximum bytes could be

  2^(cluster_bits - 3 + cluster_bits)

cluster_bits - 3 is used to compute the number of entry by L2
and the additional cluster_bits is to take into account each
clusters referenced by the L2 entries.

so int is safe for cluster_bits<=17, unsafe otherwise.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-09 09:12:39 +01:00
Hu Tao
ac95acdb8e qcow2: use start_of_cluster() and offset_into_cluster() everywhere
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-06 16:53:50 +01:00
Max Reitz
3e3553905c qcow2: Make overlap check mask variable
Replace the QCOW2_OL_DEFAULT macro by a variable overlap_check in
BDRVQcowState.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:50:00 +02:00
Max Reitz
231bb26764 qcow2: Use negated overflow check mask
In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check,
change the parameter signifying the checks to perform from its current
positive form to a negative one, i.e., it will no longer explicitly
specify every check to perform but rather a mask of checks not to
perform.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:50:00 +02:00
Max Reitz
8f730dd24e qcow2: Free preallocated zero clusters
In qcow2_free_any_clusters, preallocated zero clusters should be freed
just as normal clusters are.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz
998b959c1e qcow2: Use pread for inactive L1 in overlap check
Currently, qcow2_check_metadata_overlap uses bdrv_read to read inactive
L1 tables from disk. The number of sectors to read is calculated through
a truncating integer division, therefore, if the L1 table size is not a
multiple of the sector size, the final entries will not be read and
their entries in memory remain undefined (from the g_malloc).
Using bdrv_pread fixes this.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz
db0749012b qcow2: CHECK_OFLAG_COPIED is obsolete
CHECK_OFLAG_COPIED as a parameter to check_refcounts_l1 and
check_refcounts_l2 is obselete now, since the OFLAG_COPIED consistency
check is actually no longer performed by these functions (but by
check_oflag_copied).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:40:41 +02:00
Max Reitz
1e242b5544 qcow2: Correct endianness in overlap check
If an inactive L1 table is loaded from disk, its entries are in big
endian and have to be converted to host byte order before using them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:06:35 +02:00
Max Reitz
7454d60045 qcow2: Don't shadow return value
When trying to update the refcounts for a snapshot, the return value of
update_refcount on a compressed cluster was pretty much ignored,
cancelling the update on error but returning 0. This is caused by an
inner "ret" variable shadowing the outer one (the latter is used in the
return statement).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Max Reitz
32b6444d23 qcow2-cluster: Expand zero clusters
Add functionality for expanding zero clusters. This is necessary for
downgrading the image version to one without zero cluster support.

For non-backed images, this function may also just discard zero clusters
instead of truly expanding them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Max Reitz
afa50193cd qcow2-refcount: Repair shared refcount blocks
If the refcount of a refcount block is greater than one, we can at least
try to repair that problem by duplicating the affected block.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-02 10:06:59 +02:00
Max Reitz
e23e400ec6 qcow2-refcount: Repair OFLAG_COPIED errors
Since the OFLAG_COPIED checks are now executed after the refcounts have
been repaired (if repairing), it is safe to assume that they are correct
but the OFLAG_COPIED flag may be not. Therefore, if its value differs
from what it should be (considering the according refcount), that
discrepancy can be repaired by correctly setting (or clearing that flag.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:44 +02:00
Max Reitz
4f6ed88c03 qcow2-refcount: Move OFLAG_COPIED checks
Move the OFLAG_COPIED checks out of check_refcounts_l1 and
check_refcounts_l2 and after the actual refcount checks/fixes (since the
refcounts might actually change there).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:44 +02:00
Max Reitz
a40f1c2add qcow2: Metadata overlap checks
Two new functions are added; the first one checks a given range in the
image file for overlaps with metadata (main header, L1 tables, L2
tables, refcount table and blocks).

The second one should be used immediately before writing to the image
file as it calls the first function and, upon collision, marks the
image as corrupt and makes the BDS unusable, thereby preventing
further access.

Both functions take a bitmask argument specifying the structures which
should be checked for overlaps, making it possible to also check
metadata writes against colliding with other structures.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:43 +02:00
Max Reitz
8b81a7b6ba qcow2-refcount: Snapshot update for zero clusters
Account for all cluster types in qcow2_update_snapshot_refcounts;
this prevents this function from updating the refcount of unallocated
zero clusters which effectively led to wrong adjustments of the refcount
of cluster 0 (the main qcow2 header). This in turn resulted in images
with (unallocated) zero clusters having a cluster 0 refcount greater
than one after creating a snapshot.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Kevin Wolf
0b919fae31 qcow2: Batch discards
This optimises the discard operation for freed clusters by batching
discard requests (both snapshot deletion and bdrv_discard end up
updating the refcounts cluster by cluster).

Note that we don't discard asynchronously, but keep s->lock held. This
is to avoid that a freed cluster is reallocated and written to while the
discard is still in flight.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:17 +02:00
Kevin Wolf
67af674e47 qcow2: Options to enable discard for freed clusters
Deleted snapshots are discarded in the image file by default, discard
requests take their default from the -drive discard=... option and other
places that free clusters must always be enabled explicitly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:17 +02:00
Kevin Wolf
6cfcb9b8b9 qcow2: Add refcount update reason to all callers
This adds a refcount update reason to all callers of update_refcounts(),
so that a follow-up patch can use this information to decide whether
clusters that reach a refcount of 0 should be discarded in the image
file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:17 +02:00
Kevin Wolf
c2b6ff51e4 qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcount
It ignored the error code, and at least the 'goto fail' is obvious
nonsense as it creates an endless loop (if the next attempt doesn't
magically succeed) and leaves the in-memory L1 table in big-endian
instead of converting it back.

In error cases, there's no point in writing an updated L1 table, so
skip this part for them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Kevin Wolf
c2bc78b6a9 qcow2: Return real error in qcow2_update_snapshot_refcount
This fixes the error message triggered by the following script:

    cat > /tmp/blkdebug.cfg <<EOF
    [inject-error]
    event = "cluster_free"
    errno = "28"
    immediately = "off"
    EOF

    $qemu_img create -f qcow2 test.qcow2 10G
    $qemu_img snapshot -c snap test.qcow2
    $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Kevin Wolf
c349ca4bb2 qcow2: Fix "total clusters" number in bdrv_check
This should be based on the virtual disk size, not on the size of the
image.

Interesting observation: With some VM state stored in the image file,
percentages higher than 100% are possible, even though snapshots
themselves are ignored. This is a qcow2 bug to be fixed another day: The
VM state should be discarded in the active L2 tables after completing
the snapshot creation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:42 +01:00
Stefan Hajnoczi
3647917919 qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount()
We already flush when the function completes.  There is no need to flush
after every compressed cluster.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
f9cb2860bd qcow2: drop flush in update_cluster_refcount()
The update_cluster_refcount() function increments/decrements a cluster's
refcount and then returns the new refcount value.

There is no need to flush since both update_cluster_refcount() callers
already take care of this:

1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed
   sectors will be appended to an existing cluster with enough free
   space.  qcow2_alloc_bytes() already flushes so there is no need to do
   so in update_cluster_refcount().

2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts
   if it needs to update L2 entries.  It also flushes before completing.

Removing this flush significantly speeds up qcow2 snapshot creation:

  $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata
  $ time qemu-img snapshot -c new test.qcow2

Time drops from more than 3 minutes to under 1 second.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
2154f24e4e qcow2: flush in qcow2_update_snapshot_refcount()
Users of qcow2_update_snapshot_refcount() do not flush consistently.
qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and
qcow2_snapshot_delete() do not.

Solve this by moving the bdrv_flush() into
qcow2_update_snapshot_refcount().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
c1f5bafd70 qcow2: set L2 cache dependency in qcow2_alloc_bytes()
Compressed writes use qcow2_alloc_bytes() to allocate space with byte
granularity.  The affected clusters' refcounts will be incremented but
we do not need to flush yet.

Set a L2 cache dependency on the refcount block cache, so that the
refcounts get written out before the L2 updates.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
9991923b26 qcow2: flush refcount cache correctly in alloc_refcount_block()
update_refcount() affects the refcount cache, it does not write to disk.
Therefore bdrv_flush(bs->file) does nothing.  We need to flush the
refcount cache in order to write out the refcount updates!

While we're here also add error returns when qcow2_cache_flush() fails.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:49 +01:00
Stefan Hajnoczi
4db35162ea qcow2: support compressed clusters in BlockFragInfo
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Stefan Hajnoczi
fba31bae2d qcow2: record fragmentation statistics during check
The qemu-img check command can display fragmentation statistics:
 * Total number of clusters in virtual disk
 * Number of allocated clusters
 * Number of fragmented clusters

This patch adds fragmentation statistics support to qcow2.

Compressed and normal clusters count as allocated.  Zero clusters are
not counted as allocated unless their L2 entry has a non-zero offset
(e.g. preallocation).

Only the current L1 table counts towards the statistics - snapshots are
ignored.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00