Commit Graph

157 Commits

Author SHA1 Message Date
Kevin Wolf
bf319ece56 qcow2: Factor out count_cow_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:07 +01:00
Kevin Wolf
3cce16f44d qcow2: Add some tracing
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-03-12 15:14:06 +01:00
Stefan Hajnoczi
aef4acb661 qcow2: avoid reentrant bdrv_read() in copy_sectors()
A BlockDriverState should not issue requests on itself through the
public block layer interface.  Nested, or reentrant, requests are
problematic because they do I/O throttling and request tracking twice.

Features like block layer copy-on-read use request tracking to avoid
race conditions between concurrent requests.  The reentrant request will
have to "wait" for its parent request to complete.  But the parent is
waiting for the reentrant request to make progress so we have reached
deadlock.

The solution is for block drivers to avoid the public block layer
interfaces for reentrant requests.   Instead they should call their own
internal functions if they wish to perform reentrant requests.

This is also a good opportunity to make copy_sectors() a true
coroutine_fn.  That means calling bdrv_co_writev() instead of
bdrv_write().  Behavior is unchanged but we're being explicit that this
executes in coroutine context.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:49:47 +01:00
Kevin Wolf
1b9f1491f8 qcow2: Unlock during COW
Unlocking during COW allows for more parallelism. One change it requires is
that buffers are dynamically allocated instead of just using a per-image
buffer.

While touching the code, drop the synchronous qcow2_read() function and replace
it by a bdrv_read() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:49:40 +01:00
Kevin Wolf
8f1efd00c4 qcow2: Fix bdrv_write_compressed error handling
If during allocation of compressed clusters the cluster was already allocated
uncompressed, fail and properly release the l2_table (the latter avoids a
failed assertion).

While at it, make it return some real error numbers instead of -1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
2011-10-21 17:34:13 +02:00
Frediano Ziglio
ee18e73023 qcow2: fix range check
QCowL2Meta::offset is not cluster aligned but only sector aligned
however nb_clusters count cluster from cluster start.
This fix range check. Note that old code have no corruption issues
related to this check cause it only cause intersection to occur
when shouldn't.

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Frediano Ziglio
05140499d3 qcow2: initialize metadata before inserting in cluster_allocs
QCow2Meta structure was inserted into list before many fields are
initialized. Currently is not a problem cause all occur in a lock
but if qcow2_alloc_clusters would in a future unlock this lock
some issues could arise.
Initializing fields before inserting fix the problem.

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Frediano Ziglio
a791236992 qcow2: removed unused depends_on field
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:17 +02:00
Frediano Ziglio
35ee5e39c5 qcow2: use always stderr for debugging
let all DEBUG_ALLOC2 printf goes to stderr

Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 15:22:25 +02:00
Devin Nakamura
d57237f291 qcow2: fix typo in documentation for qcow2_get_cluster_offset()
Documentation states the num is measured in clusters, but its
actually measured in sectors

Signed-off-by: Devin Nakamura <devin122@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Anthony Liguori
7267c0947d Use glib memory allocation and free functions
qemu_malloc/qemu_free no longer exist after this commit.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20 23:01:08 -05:00
Kevin Wolf
68d100e905 qcow2: Use coroutines
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:41 +02:00
Kevin Wolf
9e2a3701a1 qcow2: Fix in-flight list after qcow2_cache_put failure
If qcow2_cache_put returns an error during cluster allocation and the
allocation fails, it must be removed from the list of in-flight allocations.
Otherwise we'd get a loop in the list when the ACB is used for the next
allocation.

Luckily, this qcow2_cache_put shouldn't fail anyway because the L2 table is
only read, so that qcow2_cache_put doesn't even involve I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-06-15 14:36:15 +02:00
Kevin Wolf
80fa3341a7 qcow2: Fix memory leaks in error cases
This fixes memory leaks that may be caused by I/O errors during L1 table growth
(can happen during save_vm) and in qemu-img check.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08 11:56:40 +02:00
Kevin Wolf
16fde5f2c2 qcow2: Fix order in L2 table COW
When copying L2 tables (this happens only with internal snapshots), the order
wasn't completely safe, so that after a crash you could end up with a L2 table
that has too low refcount, possibly leading to corruption in the long run.

This patch puts the operations in the right order: First allocate the new
L2 table and replace the reference, and only then decrease the refcount of the
old table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-10 13:24:29 +01:00
Kevin Wolf
8af3648843 qcow2: Fix error handling for reading compressed clusters
When reading a compressed cluster failed, qcow2 falsely returned success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2011-02-10 13:23:44 +01:00
Kevin Wolf
5ea929e3d1 qcow2: Add bdrv_discard support
This adds a bdrv_discard function to qcow2 that frees the discarded clusters.
It does not yet pass the discard on to the underlying file system driver, but
the space can be reused by future writes to the image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-01-31 10:03:00 +01:00
Kevin Wolf
3de0a2944b qcow2: Batch flushes for COW
qcow2 calls bdrv_flush() after performing COW in order to ensure that the
L2 table change is never written before the copy is safe on disk. Now that the
L2 table is cached, we can wait with flushing until we write out the next L2
table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:41:49 +01:00
Kevin Wolf
29c1a7301a qcow2: Use QcowCache
Use the new functions of qcow2-cache.c for everything that works on refcount
block and L2 tables.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:41:49 +01:00
Aurelien Jarno
653df36bbe qcow2: fix unaligned access
cpu_to_be64w() is called with an obviously non-aligned pointer. Use
cpu_to_be64wu() instead. It fixes unaligned accesses errors on IA64
hosts.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 11:08:50 +01:00
Jes Sorensen
7c80ab3f21 block/qcow2.c: rename qcow_ functions to qcow2_
It doesn't really make sense for functions in qcow2.c to be named
qcow_ so convert the names to match correctly.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:15:01 +01:00
Kevin Wolf
1c02e2a171 qcow2: Invalidate cache after failed read
The cache content may be destroyed after a failed read, better not use it any
more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 13:54:37 +01:00
Stefan Hajnoczi
72893756e0 qcow2: Support exact L1 table growth
The L1 table grow operation includes a size calculation that bumps up
the new L1 table size in order to anticipate the size needs of vmstate
data.  This helps reduce the number of times that the L1 table has to be
grown when vmstate data is appended.

This size overhead is not necessary during image creation,
bdrv_truncate(), or snapshot goto operations.  In fact, existing
qemu-iotests that exercise table growth are no longer able to trigger it
because image creation preallocates an L1 table that is too large after
changes to qcow_create2().

This patch keeps the size calculation but also adds exact growth for
callers that do not want to inflate the L1 table size unnecessarily.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Kevin Wolf
bd28f83565 qcow2: Avoid bounce buffers for AIO read requests
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO read path, and constrains
them to a constant size for encrypted images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf
9f8e668eb1 qcow2: Get rid of additional sync on COW
We always have a sync for the refcount update when a new cluster is
allocated. If we move this past the COW, we can save an additional sync.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf
29216ed14f qcow2: Move sync out of qcow2_alloc_clusters
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-21 15:39:42 +02:00
Kevin Wolf
7ec5e6a4ca qcow2: Remove unnecessary flush after L2 write
When a new cluster was allocated, we only need a flush after the write to the
L2 table if it was a COW and we need to decrease the refcounts of the old
clusters.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-09-08 12:39:24 +02:00
Kevin Wolf
8b3b720620 qcow2: Use bdrv_(p)write_sync for metadata writes
Use bdrv_(p)write_sync to ensure metadata integrity in case of a crash.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Kevin Wolf
68dba0bf45 qcow2: Restore L1 entry on l2_allocate failure
If writing the L1 table to disk failed, we need to restore its old content in
memory to avoid inconsistencies.

Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
Kevin Wolf
55c17e9821 qcow2: Change l2_load to return 0/-errno
Provide the error code to the caller instead of just indicating success/error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:29:12 +02:00
Kevin Wolf
1c46efaa0a qcow2: Allow qcow2_get_cluster_offset to return errors
qcow2_get_cluster_offset() looks up a given virtual disk offset and returns the
offset of the corresponding cluster in the image file. Errors (e.g. L2 table
can't be read) are currenctly indicated by a return value of 0, which is
unfortuately the same as for any unallocated cluster. So in effect we can't
check for errors.

This makes the old return value a by-reference parameter and returns the usual
0/-errno error code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:29:11 +02:00
Kevin Wolf
175e11526e qcow2: Fix error handling in l2_allocate
l2_allocate has some intermediate states in which the image is inconsistent.
Change the order to write to the L1 table only after the new L2 table has
successfully been initialized.

Also reset the L2 cache in failure case, it's very likely wrong.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Kevin Wolf
1b7c801b40 qcow2: Clear L2 table cache after write error
If the L2 table was already updated in cache, but writing it to disk has
failed, we must not continue using the changed version in the cache to stay
consistent with what's on the disk.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Kevin Wolf
66f82ceed6 block: Open the underlying image file in generic code
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.

For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
c46e116723 qcow2: Return 0/-errno in l2_allocate
Returning NULL on error doesn't allow distinguishing between different errors.
Change the interface to return an integer for -errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
f7defcb627 qcow2: Return 0/-errno in write_l1_entry
Change write_l1_entry to return the real error code instead of -1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
c835d00fc8 qcow2: Fix error return code in qcow2_alloc_cluster_link_l2
Fix qcow2_alloc_cluster_link_l2 to return the real error code like it does in
all other error cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
79a31189d4 qcow2: Return 0/-errno in write_l2_entries
Change write_l2_entries to return the real error code instead of -1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
8252278afb qcow2: Trigger blkdebug events
This adds blkdebug events to qcow2 to allow injecting I/O errors in specific
places.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
c644db3d53 qcow2: Remove request from in-flight list after error
If we complete a request with a failure we need to remove it from the list of
requests that are in flight. If we don't do it, the next time the same AIOCB is
used for a cluster allocation it will create a loop in the list and qemu will
hang in an endless loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 01:25:30 +02:00
Kevin Wolf
4805bb6696 qcow2: Fix access after end of array
If a write requests crosses a L2 table boundary and all clusters until the
end of the L2 table are usable for the request, we must not look at the next
L2 entry because we already have arrived at the end of the array.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:53:54 -06:00
Kevin Wolf
f4f0d391b2 qcow2: Fix signedness bugs
Checking for return codes < 0 isn't really going to work with unsigned
types. Use signed types instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:56:57 -06:00
Kevin Wolf
5d757b563d qcow2: Don't ignore qcow2_alloc_clusters return value
Now that qcow2_alloc_clusters can return error codes, we must handle them in
the callers of qcow2_alloc_clusters.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Kevin Wolf
148da7ea9d qcow2: Return 0/-errno in qcow2_alloc_cluster_offset
Returning 0/-errno allows it to distingush different errors classes. The
cluster offset of newly allocated clusters is now returned in the QCowL2Meta
struct.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Kevin Wolf
1e3e8f1a43 qcow2: Return 0/-errno in get_cluster_table
Switching to 0/-errno allows it to distinguish different error cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Kevin Wolf
fb8fa77ce1 qcow2: Fix error handling in qcow2_grow_l1_table
Return the appropriate error value instead of always using EIO. Don't free the
L1 table on errors, we still need it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Stefan Weil
d191d12d5f qcow2: Allow qcow2 disk images with size zero
Images with disk size 0 may be used for
VM snapshots, but not to save normal block data.

It is possible to create such images using
qemu-img, but opening them later fails.

So even "qemu-img info image.qcow2" is not
possible for an image created with
"qemu-img create -f qcow2 image.qcow2 0".

This is fixed here.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:01 -06:00
Kevin Wolf
72ecf02d7d Revert "qcow2: Bring synchronous read/write back to life"
It was merely a workaround and the real fix is done now.
This reverts commit ef845c3bf4.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Kevin Wolf
ef845c3bf4 qcow2: Bring synchronous read/write back to life
When the synchronous read and write functions were dropped, they were replaced
by generic emulation functions. Unfortunately, these emulation functions don't
provide the same semantics as the original functions did.

The original bdrv_read would mean that we read some data synchronously and that
we won't be interrupted during this read. The latter assumption is no longer
true with the emulation function which needs to use qemu_aio_poll and therefore
allows the callback of any other concurrent AIO request to be run during the
read. Which in turn means that (meta)data read earlier could have changed and
be invalid now. qcow2 is not prepared to work in this way and it's just scary
how many places there are where other requests could run.

I'm not sure yet where exactly it breaks, but you'll see breakage with virtio
on qcow2 with a backing file. Providing synchronous functions again fixes the
problem for me.

Patchworks-ID: 35437
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-15 09:32:04 -05:00
Kevin Wolf
80ee15a6b2 qcow2: Increase maximum cluster size to 2 MB
This patch increases the maximum qcow2 cluster size to 2 MB. Starting with 128k
clusters, L2 tables span 2 GB or more of virtual disk space, causing 32 bit
truncation and wraparound of signed integers. Therefore some variables need to
use a larger data type.

While being at reviewing data types, change some integers that are used for
array indices to unsigned. In some places they were checked against some upper
limit but not for negative values. This could avoid potential segfaults with
corrupted qcow2 images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-05 09:32:52 -05:00
Blue Swirl
72cf2d4f0e Fix sys-queue.h conflict for good
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.

Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 07:36:22 +00:00
Kevin Wolf
f214978a42 qcow2: Order concurrent AIO requests on the same unallocated cluster
When two AIO requests write to the same cluster, and this cluster is
unallocated, currently both requests allocate a new cluster and the second one
merges the first one when it is completed. This means an cluster allocation, a
read and a cluster deallocation which cause some overhead. If we simply let the
second request wait until the first one is done, we improve overall performance
with AIO requests (specifially, qcow2/virtio combinations).

This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster is limited so that it
either doesn't touch the allocation of the first request (so it can have a
non-overlapping allocation) or it waits for the first request to complete.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 17:31:26 -05:00
Kevin Wolf
3f6a3ee51e qcow2: Fix L1 table memory allocation
Contrary to what one could expect, the size of L1 tables is not cluster
aligned. So as we're writing whole sectors now instead of single entries,
we need to ensure that the L1 table in memory is large enough; otherwise
write would access memory after the end of the L1 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:29 -05:00
Kevin Wolf
4c1612d954 alloc_cluster_link_l2: Write complete sectors
When updating the L2 tables in alloc_cluster_link_l2(), write complete
sectors instead of updating single entries.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
6583e3c7e8 l2_allocate: Write complete sectors
When modifying the L1 table, l2_allocate() needs to write complete sectors
instead of single entries. The L1 table is already in memory, reading it from
disk in the block layer to align the request is wasted performance.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
ed6ccf0f51 qcow2: Rename global functions
The qcow2 source is now split into several more manageable files. During the
conversion quite some functions that were static before needed to be changed to
be global to make the source compile again.

We were lucky enough not to get name conflicts with these additional global
names, but they are not nice. This patch adds a qcow2_ prefix to all of the
global functions in qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
45aba42fba qcow2: Split out guest cluster functions
qcow2-cluster.c contains all functions related to the management of guest
clusters, i.e. what the guest sees on its virtual disk. This code is about
mapping these guest clusters to host clusters in the image file using the
two-level lookup tables.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00