Commit Graph

104 Commits

Author SHA1 Message Date
Kevin Wolf
2cf7cfa1cd qcow2: Catch some L1 table index overflows
This catches the situation that is described in the bug report at
https://bugs.launchpad.net/qemu/+bug/865518 and goes like this:

    $ qemu-img create -f qcow2 huge.qcow2 $((1024*1024))T
    Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
    $ qemu-io /tmp/huge.qcow2 -c "write $((1024*1024*1024*1024*1024*1024 - 1024)) 512"
    Segmentation fault

With this patch applied the segfault will be avoided, however the case
will still fail, though gracefully:

    $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((1024*1024))T
    Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
    qemu-img: The image size is too large for file format 'qcow2'

Note that even long before these overflow checks kick in, you get
insanely high memory usage (up to INT_MAX * sizeof(uint64_t) = 16 GB for
the L1 table), so with somewhat smaller image sizes you'll probably see
qemu aborting for a failed g_malloc().

If you need huge image sizes, you should increase the cluster size to
the maximum of 2 MB in order to get higher limits.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-05-14 16:44:33 +02:00
Kevin Wolf
ecdd5333ab qcow2: Gather clusters in a looping loop
Instead of just checking once in exactly this order if there are
dependendies, non-COW clusters and new allocation, this starts looping
around these. This way we can, for example, gather non-COW clusters after
new allocations as long as the host cluster offsets stay contiguous.

Once handle_dependencies() is extended so that COW areas of in-flight
allocations can be overwritten, this allows to continue with gathering
other clusters (we wouldn't be able to do that without this change
because we would have missed a possible second dependency in one of the
next clusters).

This means that in the typical sequential write case, we can combine the
COW overwrite of one cluster with the allocation of the next cluster as
soon as something like Delayed COW gets actually implemented. It is only
by avoiding splitting requests this way that Delayed COW actually starts
improving performance noticably.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
2c3b32d256 qcow2: Move cluster gathering to a non-looping loop
This patch is mainly to separate the indentation change from the
semantic changes. All that really changes here is that everything moves
into a while loop, all 'goto done' become 'break' and at the end of the
loop a new 'break is inserted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
88c6588c51 qcow2: Allow requests with multiple l2metas
Instead of expecting a single l2meta, have a list of them. This allows
to still have a single I/O request for the guest data, even though
multiple l2meta may be needed in order to describe both a COW overwrite
and a new cluster allocation (typical sequential write case).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
710c2496d8 qcow2: Use byte granularity in qcow2_alloc_cluster_offset()
This gets rid of the nb_clusters and keep_clusters and the associated
complicated calculations. Just advance the number of bytes that have
been processed and everything is fine.

This patch advances the variables even after the last operation even
though they aren't used any more afterwards to make things look more
uniform. A later patch will turn the whole thing into a loop and then
it actually starts making sense.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
411d62b04b qcow2: Prepare handle_alloc/copied() for byte granularity
This makes handle_alloc() and handle_copied() return byte-granularity
host offsets instead of returning always the cluster start. This is
required so that qcow2_alloc_cluster_offset() can stop aligning
everything to cluster boundaries.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
e62daaf679 qcow2: handle_copied(): Implement non-zero host_offset
Look only for clusters that start at a given physical offset.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
c53ede9f6d qcow2: handle_copied(): Get rid of keep_clusters parameter
Now *bytes is used to return the length of the area that can be written
to without performing an allocation or COW.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
acb0467f8d qcow2: handle_copied(): Get rid of nb_clusters parameter
handle_copied() uses its bytes parameter now to determine how many
clusters it should try to find.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
0af729ec00 qcow2: Factor out handle_copied()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
83baa9a471 qcow2: Clean up handle_alloc()
Things can be simplified a bit now. No semantic changes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
c37f4cd71d qcow2: Finalise interface of handle_alloc()
The interface works completely on a byte granularity now and duplicated
parameters are removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
3b8e2e260c qcow2: handle_alloc(): Get rid of keep_clusters parameter
handle_alloc() is now called with the offset at which the actual new
allocation starts instead of the offset at which the whole write request
starts, part of which may already be processed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
f5bc635094 qcow2: handle_alloc(): Get rid of nb_clusters parameter
We already communicate the same information in *bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
10f0ed8b2f qcow2: Factor out handle_alloc()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
037689d896 qcow2: Decouple cluster allocation from cluster reuse code
This moves some code that prepares the allocation of new clusters to
where the actual allocation happens. This is the minimum required to be
able to move it to a separate function in the next patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
65eb2e35c0 qcow2: Change handle_dependency to byte granularity
This is a more precise description of what really constitutes a
dependency. The behaviour doesn't change at this point because the COW
area of the old request is still aligned to cluster boundaries and
therefore an overlap is detected wheneven the requests touch any part of
the same cluster.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
d9d74f4177 qcow2: Improve check for overlapping allocations
The old code detected an overlapping allocation even when the
allocations didn't actually overlap, but were only adjacent.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
17a71e5823 qcow2: Handle dependencies earlier
Handling overlapping allocations isn't just a detail of cluster
allocation. It is rather one of three ways to get the host cluster
offset for a write request:

1. If a request overlaps an in-flight allocations, the cluster offset
   can be taken from there (this is what handle_dependencies will evolve
   into) or the request must just wait until the allocation has
   completed. Accessing the L2 is not valid in this case, it has
   outdated information.

2. Outside overlapping areas, check the clusters that can be written to
   as they are, with no COW involved.

3. If a COW is required, allocate new clusters

Changing the code to reflect this doesn't change the behaviour because
overlaps cannot exist for clusters that are kept in step 2. It does
however make it easier for later patches to work on clusters that belong
to an allocation that is still in flight.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:42 +01:00
Paolo Bonzini
381b487d54 qcow2: make is_allocated return true for zero clusters
Otherwise, live migration of the top layer will miss zero clusters and
let the backing file show through.  This also matches what is done in qed.

QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files.  Check this
directly in qcow2_get_cluster_offset instead of replicating the test
everywhere.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:50 +01:00
Kevin Wolf
74c4510a3c qcow2: Allow lazy refcounts to be enabled on the command line
qcow2 images now accept a boolean lazy_refcounts options. Use it like
this:

  -drive file=test.qcow2,lazy_refcounts=on

If the option is specified on the command line, it overrides the default
specified by the qcow2 header flags that were set when creating the
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Paolo Bonzini
737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Kevin Wolf
226c3c26b9 qcow2: Factor out handle_dependencies()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
280d373579 qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
f50f88b9fe qcow2: Allocate l2meta only for cluster allocations
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
060bee8943 qcow2: Drop l2meta.cluster_offset
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
593fb83cac qcow2: Introduce Qcow2COWRegion
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
1d3afd649b qcow2: Round QCowL2Meta.offset down to cluster boundary
The offset within the cluster is already present as n_start and this is
what the code uses. QCowL2Meta.offset is only needed at a cluster
granularity.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Stefan Hajnoczi
bfe8043e92 qcow2: implement lazy refcounts
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty.  In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.

Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM).  Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.

The lazy refcount optimization must be enabled at image creation time:

  qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
  qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Kevin Wolf
b7ab0fea37 qcow2: Fix avail_sectors in cluster allocation code
avail_sectors should really be the number of sectors from the start of
the allocation, not from the start of the write request.

We're lucky enough that this mistake didn't cause any real bug.
avail_sectors is only used in the intialiser of QCowL2Meta:

  .nb_available   = MIN(requested_sectors, avail_sectors),

m->nb_available in turn is only used for COW at the end of the
allocation. A COW occurs only if the request wasn't cluster aligned,
which in turn would imply that requested_sectors was less than
avail_sectors (both in the original and in the fixed version). In this
case avail_sectors is ignored and therefore the mistake doesn't cause
any misbehaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Kevin Wolf
cdba7fee1d qcow2: Simplify calculation for COW area at the end
copy_sectors() always uses the sum (cluster_offset + n_start) or
(start_sect + n_start), so if some value is added to both cluster_offset
and start_sect, and subtracted from n_start, it's cancelled out anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Zhi Yong Wu
833e40858c qcow2: remove a line of unnecessary code
Commit 3948d1d4 removed the pointer argument we filled in with l2_offset
but forgot to remove the unnecessary l2_offset assignment.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Kevin Wolf
1417d7e40e qcow2: Silence false warning
Some gcc versions seem not to be able to figure out that the switch
statement covers all possible values and that c is therefore always
initialised. Add a default branch for them.

Reported-by: malc <av1474@comtv.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
2012-06-15 15:52:45 +04:00
Kevin Wolf
df02179189 qcow2: Check qcow2_alloc_clusters_at() return value
When using qcow2_alloc_clusters_at(), the cluster allocation code
checked the wrong variable for an error code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-25 18:12:54 +02:00
Kevin Wolf
54e6814360 qcow2: Limit COW to where it's needed
This fixes a regression introduced in commit 250196f1. The bug leads to
data corruption, found during an Autotest run with a Fedora 8 guest.

Consider a write request whose first part is covered by an already
allocated cluster, but additional clusters need to be newly allocated.
When counting the number of clusters to allocate, the qcow2 code would
decide to do COW for all remaining clusters of the write request, even
if some of them are already allocated.

If during this COW operation another write request is issued that touches
the same cluster, it will still refer to the old cluster. When the COW
completes, the first request will update the L2 table and the second
write request will be lost. Note that the requests need not overlap, it's
enough for them to touch the same cluster.

This patch ensures that only clusters that really require COW are
considered for allocation. In this case any other request writing to the
same cluster will be an allocating write and gets serialised.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Kevin Wolf
7242411460 qcow2: Don't hold cache references across yield
If cache references are held while the coroutine has yielded, the cache
may get used up and abort() when it can't find a free entry.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Kevin Wolf
60651f901a qcow2: Remove unused parameter in do_alloc_cluster_offset
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02 18:39:39 +02:00
Kevin Wolf
621f058940 qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:30 +02:00
Kevin Wolf
6377af48b0 qcow2: Support reading zero clusters
This adds support for reading zero clusters in version 3 images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:29 +02:00
Kevin Wolf
143550a83e qcow2: Simplify count_cow_clusters
count_cow_clusters() tries to reuse existing functions, and all it
achieves is to make things much more complicated than they really are:
Everything needs COW, unless it's a normal cluster with refcount 1.

This patch implements the obvious way of doing this, and by using
qcow2_get_cluster_type() it gets rid of all flag magic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:28 +02:00
Kevin Wolf
8e37f681d5 qcow2: Ignore reserved bits in L1/L2 entries
This changes the still existing places that assume that the only flags
are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out
reserved bits.

It does not convert bdrv_check yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:28 +02:00
Kevin Wolf
b0b6862e5e qcow2: Fail write_compressed when overwriting data
qcow2_alloc_compressed_cluster_offset() already fails if the copied flag
is set, because qcow2_write_compressed() doesn't perform COW as it would
have to do to allow this.

However, what we really want to check here is whether the cluster is
allocated or not. With internal snapshots the copied flag may not be set
on allocated clusters. Check the cluster offset instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf
2bfcc4a0a0 qcow2: Ignore reserved bits in count_contiguous_clusters()
Until now, count_contiguous_clusters() has an argument that allowed to
specify flags that should be ignored in the comparison, i.e. that are
allowed to change between contiguous clusters.

This patch changes the function so that it ignores all flags by default
now and you need to pass the flags on which it should stop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf
68d000a390 qcow2: Ignore reserved bits in get_cluster_offset
With this change, reading from a qcow2 image ignores all reserved bits
that are set in an L1 or L2 table entry.

Now get_cluster_offset() assigns *cluster_offset only the offset without
any other flags. The cluster type is not longer encoded in the offset,
but a positive return value in case of success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:27 +02:00
Kevin Wolf
8dc0a5e7a0 qcow2: Fix error handling in qcow2_alloc_cluster_offset
If do_alloc_cluster_offset() fails, the error handling code tried to
remove the request from the in-flight queue, to which it wasn't added
yet, resulting in a NULL pointer dereference.

m->nb_clusters really only becomes != 0 when the request is in the list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19 16:03:27 +02:00
Kevin Wolf
3948d1d487 qcow2: Remove unused parameter in get_cluster_table()
Since everything goes through the cache, callers don't use the L2 table
offset any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-04-05 14:54:39 +02:00
Kevin Wolf
250196f19c qcow2: Reduce number of I/O requests
If the first part of a write request is allocated, but the second isn't
and it can be allocated so that the resulting area is contiguous, handle
it at once. This is a common case for sequential writes.

After this patch, alloc_cluster_offset() only checks if the clusters are
already allocated or how many new clusters can be allocated contigouosly.
The actual cluster allocation is split off into a new function
do_alloc_cluster_offset().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf
bf319ece56 qcow2: Factor out count_cow_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf
3cce16f44d qcow2: Add some tracing
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:06 +01:00
Stefan Hajnoczi
aef4acb661 qcow2: avoid reentrant bdrv_read() in copy_sectors()
A BlockDriverState should not issue requests on itself through the
public block layer interface.  Nested, or reentrant, requests are
problematic because they do I/O throttling and request tracking twice.

Features like block layer copy-on-read use request tracking to avoid
race conditions between concurrent requests.  The reentrant request will
have to "wait" for its parent request to complete.  But the parent is
waiting for the reentrant request to make progress so we have reached
deadlock.

The solution is for block drivers to avoid the public block layer
interfaces for reentrant requests.   Instead they should call their own
internal functions if they wish to perform reentrant requests.

This is also a good opportunity to make copy_sectors() a true
coroutine_fn.  That means calling bdrv_co_writev() instead of
bdrv_write().  Behavior is unchanged but we're being explicit that this
executes in coroutine context.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:49:47 +01:00