Commit Graph

1196 Commits

Author SHA1 Message Date
Max Reitz
998b959c1e qcow2: Use pread for inactive L1 in overlap check
Currently, qcow2_check_metadata_overlap uses bdrv_read to read inactive
L1 tables from disk. The number of sectors to read is calculated through
a truncating integer division, therefore, if the L1 table size is not a
multiple of the sector size, the final entries will not be read and
their entries in memory remain undefined (from the g_malloc).
Using bdrv_pread fixes this.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz
37764dfb71 qcow2: Add support for ImageInfoSpecific
Add a new ImageInfoSpecificQCow2 type as a subtype of ImageInfoSpecific.
This contains the compatibility level as a string and an optional
lazy_refcounts boolean (optional means mandatory for compat >= 1.1 and
not available for compat == 0.10).

Also, add qcow2_get_specific_info, which returns this information.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 14:03:57 +02:00
Max Reitz
a8d8ecb77f block/qapi: Human-readable ImageInfoSpecific dump
Add a function for generically dumping the ImageInfoSpecific information
in a human-readable format to block/qapi.c.

Use this function in bdrv_image_info_dump and qemu-io-cmds.c:info_f to
allow qemu-img info resp. qemu-io -c info to print that format specific
information.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Max Reitz
eae041fe6f block: Add bdrv_get_specific_info
Add a function for retrieving an ImageInfoSpecific object from a block
driver.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Fam Zheng
79e14bf778 qapi: make use of new BlockJobType
Switch the string to enum type BlockJobType in BlockJobDriver.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Fam Zheng
3fc4b10af0 blockjob: rename BlockJobType to BlockJobDriver
We will use BlockJobType as the enum type name of block jobs in QAPI,
rename current BlockJobType to BlockJobDriver, which will eventually
become a set of operations, similar to block drivers.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Anthony Liguori
634ebf4b17 Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Asias He (1) and Peter Lieven (1)
# Via Paolo Bonzini
* bonzini/scsi-next:
  scsi: Allocate SCSITargetReq r->buf dynamically [CVE-2013-4344]
  block/iscsi: reenable iscsi_co_get_block_status

Message-id: 1381332391-8781-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2013-10-10 10:03:00 -07:00
Peter Lieven
24c7608a5d block/iscsi: reenable iscsi_co_get_block_status
Commit f35c934a accidently disabled iscsi_co_get_block_status for all
libiscsi versions. Its not possible to check for enumeration constants
in the C preprocessor. This patch changes the check to the preprocessor
constant LIBISCSI_FEATURE_IOVECTOR which was introduced shortly after
get_lba_status support was added to libiscsi.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-09 10:43:42 +02:00
Max Reitz
e3b21ef9e0 qcow2: Free allocated L2 cluster on error
If an error occurs in l2_allocate, the allocated (but unused) L2 cluster
should be freed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-07 13:23:19 +02:00
Max Reitz
fda74f826b qcow2: Switch L1 table in a single sequence
Switching the L1 table in memory should be an atomic operation, as far
as possible. Calling qcow2_free_clusters on the old L1 table on disk is
not a good idea when the old L1 table is no longer valid and the address
to the new one hasn't yet been written into the corresponding
BDRVQcowState field. To be more specific, this can lead to segfaults due
to qcow2_check_metadata_overlap trying to access the L1 table during the
free operation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 15:38:29 +02:00
Jeff Cody
5641bf4056 block: vhdx - add migration blocker
This blocks migration for VHDX image files, until the
functionality can be supported.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 15:24:39 +02:00
Max Reitz
db0749012b qcow2: CHECK_OFLAG_COPIED is obsolete
CHECK_OFLAG_COPIED as a parameter to check_refcounts_l1 and
check_refcounts_l2 is obselete now, since the OFLAG_COPIED consistency
check is actually no longer performed by these functions (but by
check_oflag_copied).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:40:41 +02:00
Max Reitz
1e242b5544 qcow2: Correct endianness in overlap check
If an inactive L1 table is loaded from disk, its entries are in big
endian and have to be converted to host byte order before using them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:06:35 +02:00
Kevin Wolf
61653008ad qcow2: Remove useless count_contiguous_clusters() parameter
All callers pass start = 0, and it's doubtful if any other value would
actually do what you expect. Remove the parameter.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2013-09-27 17:22:43 +02:00
Max Reitz
22f0dd29af qcow2: COMPRESSED on count_contiguous_clusters
Compressed clusters can never be contiguous, therefore the corresponding
flag does not need to be given explicitly to count_contiguous_clusters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 17:22:43 +02:00
Max Reitz
15684a4742 qcow2: count_contiguous_clusters and compression
The function is not intended to be used on compressed clusters and will
not work correctly, if used anyway, since L2E_OFFSET_MASK is not the
right mask for determining the offset of compressed clusters. Therefore,
assert that the first cluster is not compressed and always include the
compression flag in the mask of significant flags, i.e., stop the search
as soon as a compressed cluster occurs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 17:22:43 +02:00
Max Reitz
320c706666 qcow2: Free only newly allocated clusters on error
In expand_zero_clusters_in_l1, a new cluster is only allocated if it was
not already preallocated. On error, such preallocated clusters should
not be freed, but only the newly allocated ones.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 17:22:43 +02:00
Max Reitz
be0b742ee3 qcow2: Always use error path in l2_allocate
Just returning -errno in some cases prevents
trace_qcow2_l2_allocate_done from being executed (and, in one case, also
the unused allocated L2 table from being freed). Always going down the
error path fixes this.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 17:22:43 +02:00
Max Reitz
8585afd813 qcow2: Don't put invalid L2 table into cache
In l2_allocate, the fail path is executed if qcow2_cache_flush fails.
However, the L2 table has not yet been fetched from the L2 table cache.
The qcow2_cache_put in the fail path therefore basically gives an
undefined argument as the L2 table address (in this case).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 11:31:59 +02:00
Max Reitz
e390cf5a97 qcow2: Correct bitmap size in zero expansion
Since the expanded_clusters bitmap is addressed using host offsets in
the underlying image file, the correct size to use for allocating the
bitmap is not determined by the guest disk image but by the underlying
host image file.

Furthermore, this size may change during the expansion due to cluster
allocations on growable image files. In this case, the bitmap needs to
be resized as well to reflect the growth.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-27 11:16:35 +02:00
Max Reitz
c01dbccbad qcow2: Assert against currently impossible overflow
If qcow2_alloc_cluster_link_l2 is called with a QCowL2Meta describing a
request crossing L2 boundaries, a buffer overflow will occur. This is
impossible right now since such requests are never generated (every
request is shortened to L2 boundaries before) and probably also
completely unintended (considering the name "QCowL2Meta"), however, it
is still worth an assertion.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 21:57:44 +02:00
Jeff Cody
687fb89366 block: qed - use QEMU_PACKED for on-disk structures
QEDHeader is read, and written, directly from on-disk images
via bdrv_pread()/write().  To avoid any unintentional padding,
these structs should be packed.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 20:51:15 +02:00
Jeff Cody
c4217f645d block: qcow2 - used QEMU_PACKED for on-disk structures
QCowHeader and QCowExtension are structs that reside in the on-disk
image format, and are read and written directly via bdrv_pread()/write(),
and as such should be packed to avoid any unintentional struct padding.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 20:51:13 +02:00
Jeff Cody
e54835c06d block: vpc - use QEMU_PACKED for on-disk structures
The VHD footer and header structs (vhd_footer and vhd_dyndisk_header)
are on-disk structures for the image format, and as such should be
packed.

Go ahead and make these typedefs as well, with the preferred QEMU
naming convention, so that the packed attribute is used consistently
with the struct.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 20:51:10 +02:00
Jeff Cody
8368febd81 block: vdi - use QEMU_PACKED for on-disk structures
The header struct VdiHeader is an on-disk structure for the image
format, and as such should be packed.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 20:51:05 +02:00
Stefan Hajnoczi
9e6337d081 rbd: avoid qemu_rbd_snap_list() memory leaks
When there are no snapshots qemu_rbd_snap_list() returns 0 and the
snapshot table pointer is NULL.  Don't forget to free the snaps buffer
we allocated for librbd rbd_snap_list().

When the function succeeds don't forget to free the snaps buffer after
calling rbd_snap_list_end().

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:22:00 +02:00
Stefan Weil
c3e4f43a99 block: Fix compiler warning (-Werror=uninitialized)
The patch fixes a warning from gcc (Debian 4.6.3-14+rpi1) 4.6.3:

block/stream.c:141:22: error:
‘copy’ may be used uninitialized in this function [-Werror=uninitialized]

This is not a real bug - a better compiler would not complain.

Now 'copy' has always a defined value, so the check for ret >= 0
can be removed.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Benoît Canet
030be32184 block: introduce BlockDriver.bdrv_needs_filename to enable some drivers.
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.

The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.

The first exception to this rule will be the quorum driver.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Fam Zheng
301c7d38a0 vmdk: fix cluster size check for flat extents
We use the extent size as cluster size for flat extents (where no L1/L2
table is allocated so it's safe) reuse sector calculating code with
sparse extents.

Don't pass in the cluster size for adding flat extent, just set it to
sectors later, then the cluster size checking will not fail.

The cluster_sectors is changed to int64_t to allow big flat extent.

Without this, flat extent opening is broken:

    # qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/a.vmdk 100G
    Formatting '/tmp/a.vmdk', fmt=vmdk size=107374182400 compat6=off subformat='monolithicFlat' zeroed_grain=off
    # qemu-img info /tmp/a.vmdk
    image: /tmp/a.vmdk
    file format: raw
    virtual size: 0 (0 bytes)
    disk size: 4.0K

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Max Reitz
7454d60045 qcow2: Don't shadow return value
When trying to update the refcounts for a snapshot, the return value of
update_refcount on a compressed cluster was pretty much ignored,
cancelling the update on error but returning 0. This is caused by an
inner "ret" variable shadowing the outer one (the latter is used in the
return statement).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Anthony Liguori
16121fa39e Merge remote-tracking branch 'stefanha/block' into staging
# By Stefan Hajnoczi (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
  virtio-blk: do not relay a previous driver's WCE configuration to the current
  blockdev: do not default cache.no-flush to true
  block: don't lose data from last incomplete sector
  qcow2: Correct snapshots size for overlap check
  coroutine: fix /perf/nesting coroutine benchmark
  coroutine: add qemu_coroutine_yield benchmark
  qemu-timer: do not take the lock in timer_pending
  qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe
  qemu-timer: drop outdated signal safety comments
  osdep: warn if open(O_DIRECT) on fails with EINVAL
  libcacard: link against qemu-error.o for error_report()

Message-id: 1379698931-946-1-git-send-email-stefanha@redhat.com
2013-09-23 11:53:05 -05:00
Anthony Liguori
f3ca508f00 Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Hervé Poussineau (5) and Stefan Weil (1)
# Via Paolo Bonzini
* bonzini/scsi-next:
  block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi
  lsi: add 53C810 variant
  lsi: remove todo
  lsi: ignore write accesses to CTEST0 registers
  lsi: check ssid versus sdid only if ssid is valid
  lsi: use constant name instead of its value
2013-09-23 11:52:32 -05:00
Max Reitz
0f39ac9a07 qcow2: Correct snapshots size for overlap check
Using s->snapshots_size instead of snapshots_size for the metadata
overlap check in qcow2_write_snapshots leads to the detection of an
overlap with the main qcow2 image header when deleting the last
snapshot, since s->snapshots_size has not yet been updated and is
therefore non-zero. However, the offset returned by qcow2_alloc_clusters
will be zero since snapshots_size is zero. Therefore, an overlap is
detected albeit no such will occur.

This patch fixes this by replacing s->snapshots_size by snapshots_size
when calling qcow2_pre_write_overlap_check.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-20 12:48:03 +02:00
Stefan Weil
f35c934a5a block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi
Debian wheezy includes libiscsi-dev 1.4.0 which does not provide
SCSI_PROVISIONING_TYPE_DEALLOCATED. Drop iscsi_co_get_block_status
in this case to allow compilation without errors.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-18 01:28:50 +02:00
Anthony Liguori
5dc11192b2 Merge remote-tracking branch 'kwolf/for-anthony' into staging
# By Max Reitz (16) and others
# Via Kevin Wolf
* kwolf/for-anthony: (33 commits)
  qemu-iotests: Fix test 038
  block: Assert validity of BdrvActionOps
  qemu-iotests: Cleanup test image in test number 007
  qemu-img: fix invalid JSON
  coroutine: add ./configure --disable-coroutine-pool
  qemu-iotests: Adjustments due to error propagation
  qcow2: Use Error parameter
  qemu-img create: Emit filename on error
  block: Error parameter for create functions
  block: Error parameter for open functions
  bdrv: Use "Error" for creating images
  bdrv: Use "Error" for opening images
  qemu-iotests: add 057 internal snapshot for block device test case
  hmp: add interface hmp_snapshot_delete_blkdev_internal
  hmp: add interface hmp_snapshot_blkdev_internal
  qmp: add interface blockdev-snapshot-delete-internal-sync
  qmp: add interface blockdev-snapshot-internal-sync
  qmp: add internal snapshot support in qmp_transaction
  snapshot: distinguish id and name in snapshot delete
  snapshot: new function bdrv_snapshot_find_by_id_and_name()
  ...

Message-id: 1379073063-14963-1-git-send-email-kwolf@redhat.com
2013-09-17 09:51:40 -05:00
Peter Lieven
65f3e33964 iscsi: split discard requests in multiple parts
Replace .bdrv_aio_discard with .bdrv_co_discard so that discard
requests can be split in multiple parts, each for a small amount
of sectors.

This is useful because we expose a generic API with no limit
on the amount of sectors that can be unmapped in one request.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-12 13:14:19 +02:00
Max Reitz
3ef6c40ad0 qcow2: Use Error parameter
Employ usage of the new Error ** parameter in qcow2_open, qcow2_create
and associated functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
cc84d90ff5 block: Error parameter for create functions
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
34b5d2c68e block: Error parameter for open functions
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
d5124c00d8 bdrv: Use "Error" for creating images
Add an Error ** parameter to BlockDriver.bdrv_create to allow more
specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
015a1036a7 bdrv: Use "Error" for opening images
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:47 +02:00
Wenchao Xia
a89d89d3e6 snapshot: distinguish id and name in snapshot delete
Snapshot creation actually already distinguish id and name since it take
a structured parameter *sn, but delete can't. Later an accurate delete
is needed in qmp_transaction abort and blockdev-snapshot-delete-sync,
so change its prototype. Also *errp is added to tip error, but return
value is kepted to let caller check what kind of error happens. Existing
caller for it are savevm, delvm and qemu-img, they are not impacted by
introducing a new function bdrv_snapshot_delete_by_id_or_name(), which
check the return value and do the operation again.

Before this patch:
  For qcow2, it search id first then name to find the one to delete.
  For rbd, it search name.
  For sheepdog, it does nothing.

After this patch:
  For qcow2, logic is the same by call it twice in caller.
  For rbd, it always fails in delete with id, but still search for name
in second try, no change to user.

Some code for *errp is based on Pavel's patch.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:47 +02:00
Wenchao Xia
2ea1dd758c snapshot: new function bdrv_snapshot_find_by_id_and_name()
To make it clear about id and name in searching, add this API
to distinguish them. Caller can choose to search by id or name,
*errp will be set only for exception.

Some code are modified based on Pavel's patch.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:47 +02:00
Max Reitz
9296b3ed70 qcow2: Implement bdrv_amend_options
Implement bdrv_amend_options for compat, size, backing_file, backing_fmt
and lazy_refcounts.

Downgrading images from compat=1.1 to compat=0.10 is achieved through
handling all incompatible flags accordingly, clearing all compatible and
autoclear flags and expanding all zero clusters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Max Reitz
b6481f376b qcow2: Save refcount order in BDRVQcowState
Save the image refcount order in BDRVQcowState. This will be relevant
for future code supporting different refcount orders than four and also
for code that needs to verify a certain refcount order for an opened
image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Max Reitz
32b6444d23 qcow2-cluster: Expand zero clusters
Add functionality for expanding zero clusters. This is necessary for
downgrading the image version to one without zero cluster support.

For non-backed images, this function may also just discard zero clusters
instead of truly expanding them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Max Reitz
e7108feaac qcow2-cache: Empty cache
Add a function for emptying a cache, i.e., flushing it and marking all
elements invalid.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Tal Kain
56e023af80 raw-win32.c: Fix incorrect handling behaviour of small block files
It is a valid case that the read data's size is smaller than the
requested size since there could be files that are smaller than
the minimum block size (For ex. when a VMDK disk descriptor file)

Signed-off-by: Tal Kain <tal.kain@ravellosystems.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Kevin Wolf
1ebf561c11 qcow2: Discard VM state in active L1 after creating snapshot
During savevm, the VM state is written to the active L1 of the image and
then a snapshot is taken. After that, the VM state isn't needed any more
in the active L1 and should be discarded. This is implemented by this
patch.

The impact of not discarding the VM state is that a snapshot can never
become smaller than any previous snapshot (because it would be padded
with old VM state), and more importantly that future savevm operations
cause unnecessary COWs (with associated flushes), which makes subsequent
snapshots much slower.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:46 +02:00
Kevin Wolf
670df5e3b4 qcow2: Pass discard type to qcow2_discard_clusters()
The function will be used internally instead of only being called for
guest discard requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:46 +02:00
Peter Lieven
54a5c1d5db iscsi: add .bdrv_get_block_status
this patch adds a coroutine for .bdrv_co_block_status as well as
a generic framework that can be used to build coroutines in block/iscsi.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-12 08:46:21 +02:00
Peter Lieven
f18a7cbb09 iscsi: add logical block provisioning information to iscsilun
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-12 08:46:21 +02:00
Paolo Bonzini
5accc8408f scsi: prefer UUID to VM name for the initiator name
The UUID is unique even across multiple hosts, thus it is
better than a VM name even if it is less user-friendly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-12 08:46:21 +02:00
Paolo Bonzini
f5f7abcfd5 raw-posix: report unwritten extents as zero
These are created for example with XFS_IOC_ZERO_RANGE.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
63390a8d14 raw-posix: return get_block_status data and flags
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
4bc74be997 block: return get_block_status data and flags for formats
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
b6b8a33354 block: introduce bdrv_get_block_status API
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.

This also touches all block drivers with a mostly mechanical rename.  The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
d663640c04 block: expect errors from bdrv_co_is_allocated
Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.

Fix the callers to always look for errors.

Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
4f5786376e block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
bdad13b9de block: make bdrv_co_is_allocated static
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini
e641c1e81e cow: do not call bdrv_co_is_allocated
As we change bdrv_is_allocated to gather more information from bs and
bs->file, it will become a bit slower.  It is still appropriate for online
jobs, but not for reads/writes.  Call the internal function instead.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini
26ae980492 cow: make writes go at a less indecent speed
Only sync once per write, rather than once per sector.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini
276cbc7f2f cow: make reads go at a decent speed
Do not do two reads for each sector; load each sector of the bitmap
and use bitmap operations to process it.

Writes are still dog slow!

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
4f6fd3491c block: make bdrv_delete() static
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.

This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
13c91cb7e2 iscsi: use bdrv_new() instead of stack structure
BlockDriverState structure needs bdrv_new() to initialize refcnt, don't
allocate a local structure variable and memset to 0, becasue with coming
refcnt implementation, bdrv_unref will crash if bs->refcnt not
initialized to 1.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
3d34c6cd99 vvfat: use bdrv_new() to allocate BlockDriverState
we need bdrv_new() to properly initialize BDS, don't allocate memory
manually.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Stefan Weil
68dc036488 w32: Fix access to host devices (regression)
QEMU failed to open host devices like \\.\PhysicalDrive0 (first hard disk)
since some time (commit 8a79380b8ef1b02d2abd705dd026a18863b09020?).

Those devices use hdev_open which did not use the latest API for options.
This resulted in a fatal runtime error:

  Block protocol 'host_device' doesn't support the option 'filename'

Duplicate code from raw_open to fix this.

Cc: qemu-stable@nongnu.org
Reported-by: David Brenner <david.brenner3@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Benoît Canet
2024c1df43 block: Add iops_size to do the iops accounting for a given io size.
This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Benoît Canet
3e9fab690d block: Add support for throttling burst max in QMP and the command line.
The max parameter of the leaky bucket throttling algorithm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Benoît Canet
cc0681c454 block: Enable the new throttling code in the block layer.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Anthony Liguori
bb7d4d82b6 Merge remote-tracking branch 'kwolf/for-anthony' into staging
# By Max Reitz (11) and others
# Via Kevin Wolf
* kwolf/for-anthony: (26 commits)
  qemu-iotests: Overlapping cluster allocations
  qcow2_check: Mark image consistent
  qcow2-refcount: Repair shared refcount blocks
  qcow2-refcount: Repair OFLAG_COPIED errors
  qcow2-refcount: Move OFLAG_COPIED checks
  qcow2: Employ metadata overlap checks
  qcow2: Metadata overlap checks
  qcow2: Add corrupt bit
  qemu-iotests: Snapshotting zero clusters
  qcow2-refcount: Snapshot update for zero clusters
  option: Add assigned flag to QEMUOptionParameter
  gluster: Abort on AIO completion failure
  block: Remove old raw driver
  switch raw block driver from "raw.o" to "raw_bsd.o"
  raw_bsd: register bdrv_raw
  raw_bsd: add raw_create_options
  raw_bsd: introduce "special members"
  raw_bsd: add raw_create()
  raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev()
  add skeleton for BSD licensed "raw" BlockDriver
  ...

Message-id: 1378111792-20436-1-git-send-email-kwolf@redhat.com
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
2013-09-03 12:32:46 -05:00
Max Reitz
24530f3e06 qcow2_check: Mark image consistent
If no corruptions remain after an image repair (and no errors have been
encountered), clear the corrupt flag in qcow2_check.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-02 10:15:15 +02:00
Max Reitz
afa50193cd qcow2-refcount: Repair shared refcount blocks
If the refcount of a refcount block is greater than one, we can at least
try to repair that problem by duplicating the affected block.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-02 10:06:59 +02:00
Stefan Hajnoczi
5b21a2ae4d curl: qemu_bh_new() can never return NULL
Drop error code path which cannot be taken since qemu_bh_new() does not
return NULL.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-09-01 19:11:56 +04:00
Stefan Weil
4c293dc6e4 misc: Fix some typos in names and comments
Most typos were found using a modified version of codespell:

accross -> across
issueing -> issuing
TICNT_THRESHHOLD -> TICNT_THRESHOLD
bandwith -> bandwidth
VCARD_7816_PROPIETARY -> VCARD_7816_PROPRIETARY
occured -> occurred
gaurantee -> guarantee
sofware -> software

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-09-01 18:59:24 +04:00
Max Reitz
e23e400ec6 qcow2-refcount: Repair OFLAG_COPIED errors
Since the OFLAG_COPIED checks are now executed after the refcounts have
been repaired (if repairing), it is safe to assume that they are correct
but the OFLAG_COPIED flag may be not. Therefore, if its value differs
from what it should be (considering the according refcount), that
discrepancy can be repaired by correctly setting (or clearing that flag.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:44 +02:00
Max Reitz
4f6ed88c03 qcow2-refcount: Move OFLAG_COPIED checks
Move the OFLAG_COPIED checks out of check_refcounts_l1 and
check_refcounts_l2 and after the actual refcount checks/fixes (since the
refcounts might actually change there).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:44 +02:00
Max Reitz
cf93980e77 qcow2: Employ metadata overlap checks
The pre-write overlap check function is now called before most of the
qcow2 writes (aborting it on collision or other error).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:43 +02:00
Max Reitz
a40f1c2add qcow2: Metadata overlap checks
Two new functions are added; the first one checks a given range in the
image file for overlaps with metadata (main header, L1 tables, L2
tables, refcount table and blocks).

The second one should be used immediately before writing to the image
file as it calls the first function and, upon collision, marks the
image as corrupt and makes the BDS unusable, thereby preventing
further access.

Both functions take a bitmask argument specifying the structures which
should be checked for overlaps, making it possible to also check
metadata writes against colliding with other structures.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:43 +02:00
Max Reitz
69c9872653 qcow2: Add corrupt bit
This adds an incompatible bit indicating corruption to qcow2. Any image
with this bit set may not be written to unless for repairing (and
subsequently clearing the bit if the repair has been successful).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:48:43 +02:00
Max Reitz
8b81a7b6ba qcow2-refcount: Snapshot update for zero clusters
Account for all cluster types in qcow2_update_snapshot_refcounts;
this prevents this function from updating the refcount of unallocated
zero clusters which effectively led to wrong adjustments of the refcount
of cluster 0 (the main qcow2 header). This in turn resulted in images
with (unallocated) zero clusters having a cluster 0 refcount greater
than one after creating a snapshot.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Bharata B Rao
9faa574f7d gluster: Abort on AIO completion failure
Currently if gluster AIO callback thread fails to notify the QEMU thread about
AIO completion, we try graceful recovery by marking the disk drive as
inaccessible. This error recovery code is race-prone as found by Asias and
Stefan. However as found out by Paolo, this kind of error is impossible and
hence simplify the code that handles this error recovery.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Kevin Wolf
e5b1d99f55 block: Remove old raw driver
This is unused code now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
7a6d3fc594 switch raw block driver from "raw.o" to "raw_bsd.o"
"Incoming" function prototypes and "outgoing" function calls must match
reality. Implemented using the "struct BlockDriver" definition in
"include/block/block_int.h", and gcc errors & warnings.

v1->v2:

On 08/20/13 09:51, Kevin Wolf wrote:
> Am 18.08.2013 um 16:29 hat Paolo Bonzini geschrieben:
>> Il 16/08/2013 16:15, Laszlo Ersek ha scritto:
>>> +static int raw_reopen_prepare(BDRVReopenState *reopen_state,
>>> +                              BlockReopenQueue *queue, Error **errp)
>>>  {
>>> -    return bdrv_reopen_prepare(bs->file);
>>> +    BDRVReopenState tmp = *reopen_state;
>>> +
>>> +    tmp.bs = tmp.bs->file;
>>> +    return bdrv_reopen_prepare(&tmp, queue, errp);
>>>  }
>>
>> This should just return zero, my fault.
>
> Which is because bdrv_reopen_queue() already queues bs->file for reopen.
> The simple return 0; implementation is shared by all other format drivers
> that support reopening images.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
775d6afd5c raw_bsd: register bdrv_raw
On 08/05/13 15:03, Paolo Bonzini wrote:
>
> [...]
>
> 5) Formats are registered with bdrv_register (takes a BlockDriver*). You
> also need to pass the caller of bdrv_register to block_init.

Fill in the BlockDriver structure with the raw_*() functions that have
been added to "block/raw_bsd.c", in the order the fields are defined in
"include/block/block_int.h".

I needed more explanation / naming examples for registering the driver
than what Paolo gave me, so I copied / adapted from "block/qcow2.c". The
parts I took as basis for modification are blamed on

    commit 5efa9d5a8b
    Author: Anthony Liguori <aliguori@us.ibm.com>
    Date:   Sat May 9 17:03:42 2009 -0500

        Convert block infrastructure to use new module init functionality

    commit 20d97356c9
    Author: Blue Swirl <blauwirbel@gmail.com>
    Date:   Fri Apr 23 20:19:47 2010 +0000

        Fix OpenBSD build

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
ff369a483d raw_bsd: add raw_create_options
On 08/05/13 15:03, Paolo Bonzini wrote:
>
> [...]
>
> 4) There is another member, .create_options, which is an array of
> QEMUOptionParameter structs, terminated by an all-zero item.  The only
> option you need is for the virtual disk size.  You will find something
> to copy from in other block drivers, for example block/qcow2.c.

Code taken and adapted from "block/qcow2.c", as suggested. The code being
copied/modified is blamed on

    commit 20d97356c9
    Author: Blue Swirl <blauwirbel@gmail.com>
    Date:   Fri Apr 23 20:19:47 2010 +0000

        Fix OpenBSD build

and

    commit 7c80ab3f21
    Author: Jes Sorensen <Jes.Sorensen@redhat.com>
    Date:   Fri Dec 17 16:02:39 2010 +0100

        block/qcow2.c: rename qcow_ functions to qcow2_

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
01dd96d8f4 raw_bsd: introduce "special members"
On 08/05/13 15:03, Paolo Bonzini wrote:
>
> [...]
>
> 3) These members are special
>
>     .format_name   is the string "raw"
>     .bdrv_open     raw_open should set bs->sg to bs->file->sg and return 0
>     .bdrv_close    raw_close should do nothing
>     .bdrv_probe    raw_probe should just return 1.

v1->v2:

On 08/20/13 10:11, Kevin Wolf wrote:
> Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben:

>> +static int raw_probe(void)
>> +{
>> +    return 1;
>> +}
>
> Maybe add a comment here like "smallest possible positive score so that
> raw is used if and only if no other block driver works".

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
1565262c37 raw_bsd: add raw_create()
On 08/05/13 15:03, Paolo Bonzini wrote:
>
> [...]
>
> 2) This is also a simple forwarder function:
>
>     .bdrv_create
>
> but there is no BlockDriverState argument so the forwarded-to function
> does not have a bs->file argument either.  The forwarded-to function is
> bdrv_create_file.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
9eaafd90d1 raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev()
On 08/05/13 15:03, Paolo Bonzini wrote:
>
> [...]
>
> 1) BlockDriver is a struct in which these function members are
> interesting:
>
>     .bdrv_reopen_prepare
>     .bdrv_co_readv
>     .bdrv_co_writev
>     .bdrv_co_is_allocated
>     .bdrv_co_write_zeroes
>     .bdrv_co_discard
>     .bdrv_getlength
>     .bdrv_get_info
>     .bdrv_truncate
>     .bdrv_is_inserted
>     .bdrv_media_changed
>     .bdrv_eject
>     .bdrv_lock_medium
>     .bdrv_ioctl
>     .bdrv_aio_ioctl
>     .bdrv_has_zero_init
>
> They should be implemented as simple forwarders (see above). There are
> 16 functions listed here, you can easily see how this already accounts
> for 100+ SLOC roughly...
>
> The implementations of bdrv_co_readv and bdrv_co_writev should also call
> BLKDBG_EVENT on bs->file too, before forwarding to bs->file.  The events
> to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Laszlo Ersek
e1c66c6d82 add skeleton for BSD licensed "raw" BlockDriver
On 08/05/13 15:03, Paolo Bonzini wrote:
>
>
> ----- Original Message -----
>> From: "Laszlo Ersek" <lersek@redhat.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Sent: Monday, August 5, 2013 2:43:46 PM
>> Subject: Re: [PATCH 1/2] raw: add license header
>>
>> On 08/02/13 00:27, Paolo Bonzini wrote:
>>> On 08/01/2013 10:13 AM, Christoph Hellwig wrote:
>>>> On Wed, Jul 31, 2013 at 08:19:51AM +0200, Paolo Bonzini wrote:
>>>>> Most of the block layer is under the BSD license, thus it is
>>>>> reasonable to license block/raw.c the same way.  CCed people should
>>>>> ACK by replying with a Signed-off-by line.
>>>>
>>>> The coded was intended to be GPLv2.
>>>
>>> Laszlo, would you be willing to do clean-room reverse engineering?
>>>
>>> (No rants, please. :))
>>
>> What's the scope exactly?
>
> It's quite small, it's a file full of forwarders like
>
> static void raw_foo(BlockDriverState *bs)
> {
>     return bdrv_foo(bs->file);
> }
>
> It's 170 lines of code, all as boring as this.  I only picked you
> because I'm quite certain you have never seen the file (and the answer
> confirmed it).
>
> Basically:
>
> 1) BlockDriver is a struct in which these function members are
> interesting:
>
>     .bdrv_reopen_prepare
>     .bdrv_co_readv
>     .bdrv_co_writev
>     .bdrv_co_is_allocated
>     .bdrv_co_write_zeroes
>     .bdrv_co_discard
>     .bdrv_getlength
>     .bdrv_get_info
>     .bdrv_truncate
>     .bdrv_is_inserted
>     .bdrv_media_changed
>     .bdrv_eject
>     .bdrv_lock_medium
>     .bdrv_ioctl
>     .bdrv_aio_ioctl
>     .bdrv_has_zero_init
>
> They should be implemented as simple forwarders (see above).
> There are 16 functions listed here, you can easily see how this
> already accounts for 100+ SLOC roughly...
>
> The implementations of bdrv_co_readv and bdrv_co_writev should also
> call BLKDBG_EVENT on bs->file too, before forwarding to bs->file.  The
> events to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO.
>
> 2) This is also a simple forwarder function:
>
>     .bdrv_create
>
> but there is no BlockDriverState argument so the forwarded-to function
> does not have a bs->file argument either.  The forwarded-to function
> is bdrv_create_file.
>
> 3) These members are special
>
>     .format_name   is the string "raw"
>     .bdrv_open     raw_open should set bs->sg to bs->file->sg and return 0
>     .bdrv_close    raw_close should do nothing
>     .bdrv_probe    raw_probe should just return 1.
>
> 4) There is another member, .create_options, which is an array of
> QEMUOptionParameter structs, terminated by an all-zero item.  The only
> option you need is for the virtual disk size.  You will find something
> to copy from in other block drivers, for example block/qcow2.c.
>
> 5) Formats are registered with bdrv_register (takes a BlockDriver*).
> You also need to pass the caller of bdrv_register to block_init.
>
> 6) I'm not sure how to organize the patch series, so I'll leave this to
> your creativity.  I guess in this case move/copy detection of git should
> be disabled.  I would definitely include this spec in the commit
> message as a proof of clean-room reverse engineering.
>
> 7) Remember a BSD header like the one in block.c.
>
> Paolo

This patch implements the email up to the paragraph ending with "100+ SLOC
roughly". The skeleton is generated from the list there, with a simple
shell loop using "sed" and the raw_foo() template.

The BSD license block is copied (and reflowed) from
"util/qemu-progress.c".

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Peter Maydell
127c84e1a5 block/qcow2.h: Avoid "1LL << 63" (shifts into sign bit)
The expression "1LL << 63" tries to shift the 1 into the sign bit of a
'long long', which provokes a clang sanitizer warning:

runtime error: left shift of 1 by 63 places cannot be represented in type 'long long'

Use "1ULL << 63" as the definition of QCOW_OFLAG_COPIED instead
to avoid this. For consistency, we also update the other QCOW_OFLAG
definitions to use the ULL suffix rather than LL, though only the
shift by 63 is undefined behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Kevin Wolf
9117b47717 qcow2: Change default for new images to compat=1.1
By the time that qemu 1.7 will be released, enough time will have passed
since qemu 1.1, which is the first version to understand version 3
images, that changing the default shouldn't hurt many people any more
and the benefits of using the new format outweigh the pain.

qemu-iotests already runs with compat=1.1 by default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-08-30 15:28:51 +02:00
Stefan Hajnoczi
b10577df13 win32-aio: drop win32_aio_flush_cb()
The io_flush argument to qemu_aio_set_event_notifier() has been removed
since the block layer learnt to drain requests by itself.  Fix the
Windows build for win32-aio.o by updating the
qemu_aio_set_event_notifier() call and dropping win32_aio_flush_cb().

Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 22:05:04 +02:00
Alex Bligh
bc72ad6754 aio / timers: Switch entire codebase to the new timer API
This is an autogenerated patch using scripts/switch-timer-api.

Switch the entire code base to using the new timer API.

Note this patch may introduce some line length issues.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:14:24 +02:00
Alex Bligh
7483d1e547 aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API
Convert block_job_sleep_ns and co_sleep_ns to use the new timer
API.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:14:24 +02:00
Paolo Bonzini
04d542c8b8 vmdk: support vmfs files
VMware ESX hosts also use different create and extent types for flat
files, respectively "vmfs" and "VMFS".  This is not documented, but it
can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing
virtual machine disk (VMDK) descriptor file).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 15:35:58 +02:00
Fam Zheng
daac8fdc68 vmdk: support vmfsSparse files
VMware ESX hosts use a variant of the VMDK3 format, identified by the
vmfsSparse create type ad the VMFSSPARSE extent type.

It has 16 KB grain tables (L2) and a variable-size grain directory (L1).
In addition, the grain size is always 512, but that is not a problem
because it is included in the header.

The format of the extents is documented in the VMDK spec.  The format
of the descriptor file is not documented precisely, but it can be
found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual
machine disk (VMDK) descriptor file for delta disks).

With these patches, vmfsSparse files only work if opened through the
descriptor file.  Data files without descriptor files, as far as I
could understand, are not supported by ESX.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>

--
v2: Rebase to patch 01.
    Change le64_to_cpu to le32_to_cpu.
    Rename vmdk_open_vmdk3 to vmdk_open_vmfs_sparse, which represents the
    current usage of this format.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 15:35:58 +02:00
Fam Zheng
f6b61e54bd vmdk: fix L1 and L2 table size in vmdk3 open
VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the
value. This patch honors the header field.

And the L2 table size is 4096 according to VMDK spec[1], instead of
1 << 9 (512).

[1]:
http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 15:35:58 +02:00
Fam Zheng
b0651b8c24 vmdk: Move l1_size check into vmdk_add_extent()
This header check is common to VMDK3 and VMDK4, so move it into
vmdk_add_extent().

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 15:35:58 +02:00
Asias He
0d51b4debe block: Introduce bs->zero_beyond_eof
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).

We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.

[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:10:21 +02:00
Kevin Wolf
8ad1898cf1 qcow2: Change default for new images to compat=1.1
By the time that qemu 1.7 will be released, enough time will have passed
since qemu 1.1, which is the first version to understand version 3
images, that changing the default shouldn't hurt many people any more
and the benefits of using the new format outweigh the pain.

qemu-iotests already runs with compat=1.1 by default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-21 14:41:09 +02:00
Stefan Hajnoczi
f2e5dca46b aio: drop io_flush argument
The .io_flush() handler no longer exists and has no users.  Drop the
io_flush argument to aio_set_fd_handler() and related functions.

The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
f0d3576599 block/ssh: drop return_true()
.io_flush() is no longer called so drop return_true().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
d6d94c6785 block/sheepdog: drop have_co_req() and aio_flush_request()
.io_flush() is no longer called so drop have_co_req() and
aio_flush_request().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
5d289cc724 block/rbd: drop qemu_rbd_aio_flush_cb()
.io_flush() is no longer called so drop qemu_rbd_aio_flush_cb().
qemu_aio_count is unused now so drop it too.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
bed2e759eb block/nbd: drop nbd_have_request()
.io_flush() is no longer called so drop nbd_have_request().  We cannot
drop in_flight since it is still used by other block/nbd.c code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
94473d0c06 block/linux-aio: drop qemu_laio_completion_cb()
.io_flush() is no longer called so drop qemu_laio_completion_cb().  It
turns out that count is now unused so drop that too.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
70ecdc6e4e block/iscsi: drop iscsi_process_flush()
.io_flush() is no longer called so drop iscsi_process_flush().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Stefan Hajnoczi
372835fbc3 block/gluster: drop qemu_gluster_aio_flush_cb()
Since .io_flush() is no longer called we do not need
qemu_gluster_aio_flush_cb() anymore.  It turns out that qemu_aio_count
is unused now and can be dropped.

Thanks to Bharata B Rao <bharata@linux.vnet.ibm.com> for catching a
build failure with CONFIG_GLUSTERFS_DISCARD, which has been fixed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:51:09 +02:00
Stefan Hajnoczi
0d1460226f block/curl: drop curl_aio_flush()
.io_flush() is no longer called so drop curl_aio_flush().  The acb[]
array that the function checks is still used in other parts of
block/curl.c.  Therefore we cannot remove acb[], it is needed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:35 +02:00
Stefan Hajnoczi
88266f5aa7 block: stop relying on io_flush() in bdrv_drain_all()
If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush().  This is used to spin
during synchronous I/O.

Stop relying on .io_flush() and instead check
QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active
requests.

This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O.  Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.

Note this patch moves bs->throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it.  In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.

We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain.  This is necessary since the new bdrv_drain_all() traverses the
backing file chain.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:34 +02:00
Paolo Bonzini
7748c1bd50 raw: add license header
Most of the block layer is under the BSD license, thus it is reasonable
to license block/raw.c the same way.  CCed people should ACK by replying
with a Signed-off-by line.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1375251592-2537-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-08-12 09:15:11 -05:00
Fam Zheng
ca8804ced9 vmdk: rename num_gtes_per_gte to num_gtes_per_gt
num_gtes_per_gte is a historical typo, rename it to a more sensible
name. It means "number of GrainTableEntries per GrainTable".

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
bf81507de3 vmdk: use heap allocation for whole_grain
We should never grow the stack beyond 1 MB, otherwise we'll fall off the
end.  Thread stacks and coroutine stacks (1 MB) do not grow.
get_cluster_offset() allocates a big stack offset, it will fail for big
cluster images, change to heap allocated buffer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
2c43e43c8c vmdk: check l1 size before opening image
L1 table size is calculated from capacity, granularity and l2 table
size. If capacity is too big or later two are too small, the L1 table
will be too big to allocate in memory. Limit it to a reasonable range.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
f8ce04036e vmdk: check l2 table size when opening
header.num_gtes_per_gte determines size for L2 table. Check for too big
value before using it. Limit to 512M entries (2GB per one L2 table).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
8aa1331c09 vmdk: check granularity field in opening
Granularity is used to calculate the cluster size and allocate r/w
buffer. Check the value from image before using it, so we don't abort()
for unbounded memory allocation.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
e98768d437 vmdk: use unsigned values for on disk header fields
The size and offset fields are all non-negative values, use uint64_t for
them to avoid getting negative in memory value by int overflow.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Fam Zheng
5d8caa543c vmdk: Make VMDK3Header and VmdkGrainMarker QEMU_PACKED
It's best to make it consistent that all on disk structures are
QEMU_PACKED.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 15:27:32 +02:00
Liu Yuan
e4f5c1bf8f sheepdog: add missing .bdrv_has_zero_init
Commit 3ac21627 changed the behaviour of bdrv_has_zero_init() to default
to 0. In the review for Sheepdog it turned out that enabling it is safe,
so that commit updated one BlockDriver definition of sheepdog to use
bdrv_has_zero_init_1, missed however that there are more BlockDrivers in
the driver. Fix these now.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-06 10:41:56 +02:00
Fam Zheng
8e50724313 vmdk: fix comment for vmdk_co_write_zeroes
The comment was truncated. Add the missing parts, especially explain why
we need zero_dry_run.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-08-02 18:07:04 +04:00
Richard W.M. Jones
f5075224d6 block/iscsi.c: Fix printf format error.
The error on armv7hl was:

block/iscsi.c: In function ‘is_request_lun_aligned’:
block/iscsi.c:251:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Werror=format=]
                          iscsilun->block_size, sector_num, nb_sectors);
                          ^

This also splits the long line to comply with qemu coding guidelines.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-08-02 18:02:48 +04:00
Peter Maydell
2440a2c3df block/sheepdog: Rename 'dprintf' to 'DPRINTF'
'dprintf' is the name of a POSIX standard function so we should not be
stealing it for our debug macro. Rename to 'DPRINTF' (in line with
a number of other source files.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Acked-by: Richard Henderson <rth@twiddle.net>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1375100199-13934-2-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 19:33:53 -05:00
Anthony Liguori
eddbf0ab9d Merge remote-tracking branch 'stefanha/block' into staging
# By Stefan Hajnoczi (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
  dataplane: refuse to start if device is already in use
  dataplane: enable virtio-blk x-data-plane=on live migration
  migration: fix spice migration
  migration: notify migration state before starting thread
  block: Repair the throttling code.
  gluster: Add image resize support

Message-id: 1375112172-24863-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29 11:33:48 -05:00
Paolo Bonzini
42ec24e285 gluster: Add image resize support
Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend
can support image resizing.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-29 17:07:37 +02:00
Stefan Weil
52f350227f misc: Fix new typos in comments and strings
All these typos were found by codespell.

sould -> should
emperical -> empirical
intialization -> initialization
successfuly -> successfully
gaurantee -> guarantee

Fix also another error (before before) in the same context.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-07-27 11:22:54 +04:00
Ian Main
fc5d3f8432 Implement sync modes for drive-backup.
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.

FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.

For sync mode TOP are creating a new target image using the same backing
file as the original disk image.  Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.

Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.

For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made.  This preserves the point in time data by only copying the parts
that are *going to change* to the target image.  This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image.  This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.

Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case).  The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.

Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-07-26 22:01:31 +02:00
Kevin Wolf
64aa99d3e0 qcow2: Use dashes instead of underscores in options
This is what QMP wants to use. The options haven't been enabled in any
release yet, so we're still free to change them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-07-26 21:59:56 +02:00
Peter Lieven
a23fdf3559 block/raw: add .bdrv_get_info
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 15:27:37 +08:00
Peter Lieven
8bf9344ad6 block/raw: add bdrv_co_write_zeroes
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Fam Zheng
78f27bd02c block: fix vvfat error path for enable_write_target
s->qcow and s->qcow_filename are allocated but not freed on error. Fix the
possible leaks, remove unnecessary check for bdrv_new(), propagate ret code of
bdrv_create() and also the one of enable_write_target().

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Bharata B Rao
0c14fb47ec gluster: Add discard support for GlusterFS block driver.
Implement bdrv_aio_discard for gluster.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Peter Lieven
0777b5dde4 iscsi: factor out sector conversions
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-17 17:01:41 +02:00
Peter Lieven
91bea4e2bb iscsi: assert that sectors are aligned to LUN blocksize
if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
it is possible that sector_num or nb_sectors are not correctly
aligned.

to avoid corruption we fail requests which are misaligned.

Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-17 17:01:41 +02:00
Peter Lieven
7e4d5a9f94 iscsi: remove support for misaligned nb_sectors in aio_readv
this hask is not working (anymore). support for misaligned offsets should
be handled at the block layer.

Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-17 17:01:41 +02:00
Peter Lieven
d3bda7bc16 iscsi: fix -ENOSPC in iscsi_create()
the -ENOPSC case did not work due to the missing goto.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-17 17:00:28 +02:00
Ronnie Sahlberg
0a53f01074 Fix iSCSI crash on SG_IO with an iovector
Don't assume that SG_IO is always invoked with a simple buffer,
check the iovec_count and if it is >= 1 then we need to pass an array
of iovectors to libiscsi instead of just a plain buffer.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-17 17:00:26 +02:00
Kevin Wolf
98289620e0 block: Don't parse protocol from file.filename
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).

This allows opening relative file names that contain a colon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-07-15 09:49:00 +02:00
Fam Zheng
3494d65027 curl: refuse to open URL from HTTP server without range support
CURL driver requests partial data from server on guest IO req. For HTTP
and HTTPS, it uses "Range: ***" in requests, and this will not work if
server not accepting range. This patch does this check when open.

 * Removed curl_size_cb, which is not used: On one hand it's registered to
   libcurl as CURLOPT_WRITEFUNCTION, instead of CURLOPT_HEADERFUNCTION,
   which will get called with *data*, not *header*. On the other hand the
   s->len is assigned unconditionally later.

   In this gone function, the sscanf for "Content-Length: %zd", on
   (void *)ptr, which is not guaranteed to be zero-terminated, is
   potentially a security bug. So this patch fixes it as a side-effect. The
   bug is reported as: https://bugs.launchpad.net/qemu/+bug/1188943
   (Note the bug is marked "private" so you might not be able to see it)

 * Introduced curl_header_cb, which is used to parse header and mark the
   server as accepting range if "Accept-Ranges: bytes" line is seen from
   response header. If protocol is HTTP or HTTPS, but server response has
   no not this support, refuse to open this URL.

Note that python builtin module SimpleHTTPServer is an example of not
supporting range, if you need to test this driver, get a better server
or use internet URLs.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-05 09:40:18 +02:00
Fam Zheng
da7a50f938 vmdk: Implement .bdrv_has_zero_init
Depending on the subformat, has_zero_init queries underlying storage for
flat extent. If it has a flat extent and its underlying storage doesn't
have zero init, return 0. Otherwise return 1.

Aligns the operator assignments.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-05 09:40:18 +02:00
Peter Lieven
3ac216270a block: change default of .has_zero_init to 0
.has_zero_init defaults to 1 for all formats and protocols.

this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().

if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.

during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.

vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.

a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 13:52:35 +02:00
Kevin Wolf
72c6cc94da vpc: Implement .bdrv_has_zero_init
Depending on the subformat, has_zero_init on VHD must behave like raw
and query the underlying storage (fixed) or like other sparse formats
that can always return 1 (dynamic, differencing).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 10:21:00 +02:00
Fam Zheng
8ed610a1c9 vmdk: remove wrong calculation of relative path
When creating image with backing file, the driver tries to calculate the
relative path from created image file to backing file, but the path
computation is incorrect. e.g.:

    $ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1
    Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240
    backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off

    $ qemu-img info vmdk-data-snapshot1
    image: vmdk-data-snapshot1
    file format: vmdk
    virtual size: 10G (10737418240 bytes)
    disk size: 12K
->  backing file: disk.vmdk

The common part in file names, "vmdk-data-", is incorrectly forgotten by
relative_path(). As the VMDK specification has no restriction on
parentNameHint to be relative path, we simply remove this by using the
backing_file option.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:27 +02:00
Kevin Wolf
8ab6feec2c gluster: Return bdrv_has_zero_init = 0
GlusterFS volumes can be backed by block devices, in which case
bdrv_create() doesn't make sure that the image is zeroed out. It is
currently not possibly to detect whether a given image is backed by a
file or a block device, and incorrectly assuming that it is zeroed
corrupts images during qemu-img convert, so let's err on the side of
caution and always return 0.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:27 +02:00
Richard W.M. Jones
0b3f21e6a9 block/ssh: Set bdrv_has_zero_init according to the file type.
If the remote is a regular file, set it to true (ie. reads of
uninitialized areas in a newly created file will return zeroes).
If we can't prove that, return false (a safe default).

Tested by adding a debugging print statement [not part of this commit]
and creating a remote file and a remote block device:

  $ ./qemu-img create ssh://localhost/tmp/new 100M
  Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600
  filename ssh://localhost/tmp/new: has_zero_init = 1
  $ sudo lvcreate -L 1G -n tmp /dev/fedora
    Logical volume "tmp" created
  $ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G
  Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824
  filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:27 +02:00
Kevin Wolf
f59fee8d50 block: Make BlockJobTypes const
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:27 +02:00
Dietmar Maurer
98d2c6f2cd block: add basic backup support to block driver
backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb
 * function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors
 * directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c
   tracked requests.  This means a little code duplication but is much
   simpler than trying to share the tracked requests list and use the
   backup block size.
 * Add on_source_error and on_target_error error handling.
 * Use trace events instead of DPRINTF()

-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:26 +02:00
Kevin Wolf
a5c5ea3f60 raw-posix: Fix /dev/cdrom magic on OS X
The raw-posix driver has code to provide a /dev/cdrom on OS X even
though it doesn't really exist. However, since commit c66a6157 the real
filename is dismissed after finding it, so opening /dev/cdrom fails.
Put the filename back into the options QDict to make this work again.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-28 09:20:26 +02:00
Fam Zheng
96c51eb5e4 vmdk: refuse to open higher version than supported
Refuse to open higher version for safety.

Although we try to be compatible with published VMDK spec, VMware has
newer version from ESXi 5.1 exported OVF/OVA, which we have no knowledge
what's changed in it. And it is very likely to have more new versions in
the future, so it's not safe to open them blindly.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:43 +02:00
Kevin Wolf
0b919fae31 qcow2: Batch discards
This optimises the discard operation for freed clusters by batching
discard requests (both snapshot deletion and bdrv_discard end up
updating the refcounts cluster by cluster).

Note that we don't discard asynchronously, but keep s->lock held. This
is to avoid that a freed cluster is reallocated and written to while the
discard is still in flight.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:17 +02:00