mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Max Reitz	320c706666	qcow2: Free only newly allocated clusters on error In expand_zero_clusters_in_l1, a new cluster is only allocated if it was not already preallocated. On error, such preallocated clusters should not be freed, but only the newly allocated ones. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	be0b742ee3	qcow2: Always use error path in l2_allocate Just returning -errno in some cases prevents trace_qcow2_l2_allocate_done from being executed (and, in one case, also the unused allocated L2 table from being freed). Always going down the error path fixes this. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	8585afd813	qcow2: Don't put invalid L2 table into cache In l2_allocate, the fail path is executed if qcow2_cache_flush fails. However, the L2 table has not yet been fetched from the L2 table cache. The qcow2_cache_put in the fail path therefore basically gives an undefined argument as the L2 table address (in this case). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 11:31:59 +02:00
Max Reitz	e390cf5a97	qcow2: Correct bitmap size in zero expansion Since the expanded_clusters bitmap is addressed using host offsets in the underlying image file, the correct size to use for allocating the bitmap is not determined by the guest disk image but by the underlying host image file. Furthermore, this size may change during the expansion due to cluster allocations on growable image files. In this case, the bitmap needs to be resized as well to reflect the growth. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 11:16:35 +02:00
Max Reitz	c01dbccbad	qcow2: Assert against currently impossible overflow If qcow2_alloc_cluster_link_l2 is called with a QCowL2Meta describing a request crossing L2 boundaries, a buffer overflow will occur. This is impossible right now since such requests are never generated (every request is shortened to L2 boundaries before) and probably also completely unintended (considering the name "QCowL2Meta"), however, it is still worth an assertion. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 21:57:44 +02:00
Jeff Cody	687fb89366	block: qed - use QEMU_PACKED for on-disk structures QEDHeader is read, and written, directly from on-disk images via bdrv_pread()/write(). To avoid any unintentional padding, these structs should be packed. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:15 +02:00
Jeff Cody	c4217f645d	block: qcow2 - used QEMU_PACKED for on-disk structures QCowHeader and QCowExtension are structs that reside in the on-disk image format, and are read and written directly via bdrv_pread()/write(), and as such should be packed to avoid any unintentional struct padding. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:13 +02:00
Jeff Cody	e54835c06d	block: vpc - use QEMU_PACKED for on-disk structures The VHD footer and header structs (vhd_footer and vhd_dyndisk_header) are on-disk structures for the image format, and as such should be packed. Go ahead and make these typedefs as well, with the preferred QEMU naming convention, so that the packed attribute is used consistently with the struct. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:10 +02:00
Jeff Cody	8368febd81	block: vdi - use QEMU_PACKED for on-disk structures The header struct VdiHeader is an on-disk structure for the image format, and as such should be packed. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:05 +02:00
Stefan Hajnoczi	9e6337d081	rbd: avoid qemu_rbd_snap_list() memory leaks When there are no snapshots qemu_rbd_snap_list() returns 0 and the snapshot table pointer is NULL. Don't forget to free the snaps buffer we allocated for librbd rbd_snap_list(). When the function succeeds don't forget to free the snaps buffer after calling rbd_snap_list_end(). Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:22:00 +02:00
Stefan Weil	c3e4f43a99	block: Fix compiler warning (-Werror=uninitialized) The patch fixes a warning from gcc (Debian 4.6.3-14+rpi1) 4.6.3: block/stream.c:141:22: error: ‘copy’ may be used uninitialized in this function [-Werror=uninitialized] This is not a real bug - a better compiler would not complain. Now 'copy' has always a defined value, so the check for ret >= 0 can be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Benoît Canet	030be32184	block: introduce BlockDriver.bdrv_needs_filename to enable some drivers. Some drivers will have driver specifics options but no filename. This new bool allow the block layer to treat them correctly. The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and not having .bdrv_open. The first exception to this rule will be the quorum driver. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Fam Zheng	301c7d38a0	vmdk: fix cluster size check for flat extents We use the extent size as cluster size for flat extents (where no L1/L2 table is allocated so it's safe) reuse sector calculating code with sparse extents. Don't pass in the cluster size for adding flat extent, just set it to sectors later, then the cluster size checking will not fail. The cluster_sectors is changed to int64_t to allow big flat extent. Without this, flat extent opening is broken: # qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/a.vmdk 100G Formatting '/tmp/a.vmdk', fmt=vmdk size=107374182400 compat6=off subformat='monolithicFlat' zeroed_grain=off # qemu-img info /tmp/a.vmdk image: /tmp/a.vmdk file format: raw virtual size: 0 (0 bytes) disk size: 4.0K Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Max Reitz	7454d60045	qcow2: Don't shadow return value When trying to update the refcounts for a snapshot, the return value of update_refcount on a compressed cluster was pretty much ignored, cancelling the update on error but returning 0. This is caused by an inner "ret" variable shadowing the outer one (the latter is used in the return statement). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 10:08:56 +02:00
Anthony Liguori	16121fa39e	Merge remote-tracking branch 'stefanha/block' into staging # By Stefan Hajnoczi (4) and others # Via Stefan Hajnoczi * stefanha/block: virtio-blk: do not relay a previous driver's WCE configuration to the current blockdev: do not default cache.no-flush to true block: don't lose data from last incomplete sector qcow2: Correct snapshots size for overlap check coroutine: fix /perf/nesting coroutine benchmark coroutine: add qemu_coroutine_yield benchmark qemu-timer: do not take the lock in timer_pending qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe qemu-timer: drop outdated signal safety comments osdep: warn if open(O_DIRECT) on fails with EINVAL libcacard: link against qemu-error.o for error_report() Message-id: 1379698931-946-1-git-send-email-stefanha@redhat.com	2013-09-23 11:53:05 -05:00
Anthony Liguori	f3ca508f00	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Hervé Poussineau (5) and Stefan Weil (1) # Via Paolo Bonzini * bonzini/scsi-next: block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi lsi: add 53C810 variant lsi: remove todo lsi: ignore write accesses to CTEST0 registers lsi: check ssid versus sdid only if ssid is valid lsi: use constant name instead of its value	2013-09-23 11:52:32 -05:00
Max Reitz	0f39ac9a07	qcow2: Correct snapshots size for overlap check Using s->snapshots_size instead of snapshots_size for the metadata overlap check in qcow2_write_snapshots leads to the detection of an overlap with the main qcow2 image header when deleting the last snapshot, since s->snapshots_size has not yet been updated and is therefore non-zero. However, the offset returned by qcow2_alloc_clusters will be zero since snapshots_size is zero. Therefore, an overlap is detected albeit no such will occur. This patch fixes this by replacing s->snapshots_size by snapshots_size when calling qcow2_pre_write_overlap_check. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-20 12:48:03 +02:00
Stefan Weil	f35c934a5a	block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi Debian wheezy includes libiscsi-dev 1.4.0 which does not provide SCSI_PROVISIONING_TYPE_DEALLOCATED. Drop iscsi_co_get_block_status in this case to allow compilation without errors. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-18 01:28:50 +02:00
Anthony Liguori	5dc11192b2	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Max Reitz (16) and others # Via Kevin Wolf * kwolf/for-anthony: (33 commits) qemu-iotests: Fix test 038 block: Assert validity of BdrvActionOps qemu-iotests: Cleanup test image in test number 007 qemu-img: fix invalid JSON coroutine: add ./configure --disable-coroutine-pool qemu-iotests: Adjustments due to error propagation qcow2: Use Error parameter qemu-img create: Emit filename on error block: Error parameter for create functions block: Error parameter for open functions bdrv: Use "Error" for creating images bdrv: Use "Error" for opening images qemu-iotests: add 057 internal snapshot for block device test case hmp: add interface hmp_snapshot_delete_blkdev_internal hmp: add interface hmp_snapshot_blkdev_internal qmp: add interface blockdev-snapshot-delete-internal-sync qmp: add interface blockdev-snapshot-internal-sync qmp: add internal snapshot support in qmp_transaction snapshot: distinguish id and name in snapshot delete snapshot: new function bdrv_snapshot_find_by_id_and_name() ... Message-id: 1379073063-14963-1-git-send-email-kwolf@redhat.com	2013-09-17 09:51:40 -05:00
Peter Lieven	65f3e33964	iscsi: split discard requests in multiple parts Replace .bdrv_aio_discard with .bdrv_co_discard so that discard requests can be split in multiple parts, each for a small amount of sectors. This is useful because we expose a generic API with no limit on the amount of sectors that can be unmapped in one request. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 13:14:19 +02:00
Max Reitz	3ef6c40ad0	qcow2: Use Error parameter Employ usage of the new Error ** parameter in qcow2_open, qcow2_create and associated functions. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	cc84d90ff5	block: Error parameter for create functions Add an Error ** parameter to bdrv_create and its associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	34b5d2c68e	block: Error parameter for open functions Add an Error ** parameter to bdrv_open, bdrv_file_open and associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	d5124c00d8	bdrv: Use "Error" for creating images Add an Error ** parameter to BlockDriver.bdrv_create to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	015a1036a7	bdrv: Use "Error" for opening images Add an Error ** parameter to BlockDriver.bdrv_open and BlockDriver.bdrv_file_open to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	a89d89d3e6	snapshot: distinguish id and name in snapshot delete Snapshot creation actually already distinguish id and name since it take a structured parameter sn, but delete can't. Later an accurate delete is needed in qmp_transaction abort and blockdev-snapshot-delete-sync, so change its prototype. Also errp is added to tip error, but return value is kepted to let caller check what kind of error happens. Existing caller for it are savevm, delvm and qemu-img, they are not impacted by introducing a new function bdrv_snapshot_delete_by_id_or_name(), which check the return value and do the operation again. Before this patch: For qcow2, it search id first then name to find the one to delete. For rbd, it search name. For sheepdog, it does nothing. After this patch: For qcow2, logic is the same by call it twice in caller. For rbd, it always fails in delete with id, but still search for name in second try, no change to user. Some code for *errp is based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	2ea1dd758c	snapshot: new function bdrv_snapshot_find_by_id_and_name() To make it clear about id and name in searching, add this API to distinguish them. Caller can choose to search by id or name, *errp will be set only for exception. Some code are modified based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Max Reitz	9296b3ed70	qcow2: Implement bdrv_amend_options Implement bdrv_amend_options for compat, size, backing_file, backing_fmt and lazy_refcounts. Downgrading images from compat=1.1 to compat=0.10 is achieved through handling all incompatible flags accordingly, clearing all compatible and autoclear flags and expanding all zero clusters. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	b6481f376b	qcow2: Save refcount order in BDRVQcowState Save the image refcount order in BDRVQcowState. This will be relevant for future code supporting different refcount orders than four and also for code that needs to verify a certain refcount order for an opened image. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	32b6444d23	qcow2-cluster: Expand zero clusters Add functionality for expanding zero clusters. This is necessary for downgrading the image version to one without zero cluster support. For non-backed images, this function may also just discard zero clusters instead of truly expanding them. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	e7108feaac	qcow2-cache: Empty cache Add a function for emptying a cache, i.e., flushing it and marking all elements invalid. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Tal Kain	56e023af80	raw-win32.c: Fix incorrect handling behaviour of small block files It is a valid case that the read data's size is smaller than the requested size since there could be files that are smaller than the minimum block size (For ex. when a VMDK disk descriptor file) Signed-off-by: Tal Kain <tal.kain@ravellosystems.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	1ebf561c11	qcow2: Discard VM state in active L1 after creating snapshot During savevm, the VM state is written to the active L1 of the image and then a snapshot is taken. After that, the VM state isn't needed any more in the active L1 and should be discarded. This is implemented by this patch. The impact of not discarding the VM state is that a snapshot can never become smaller than any previous snapshot (because it would be padded with old VM state), and more importantly that future savevm operations cause unnecessary COWs (with associated flushes), which makes subsequent snapshots much slower. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	670df5e3b4	qcow2: Pass discard type to qcow2_discard_clusters() The function will be used internally instead of only being called for guest discard requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Peter Lieven	54a5c1d5db	iscsi: add .bdrv_get_block_status this patch adds a coroutine for .bdrv_co_block_status as well as a generic framework that can be used to build coroutines in block/iscsi. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Peter Lieven	f18a7cbb09	iscsi: add logical block provisioning information to iscsilun Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Paolo Bonzini	5accc8408f	scsi: prefer UUID to VM name for the initiator name The UUID is unique even across multiple hosts, thus it is better than a VM name even if it is less user-friendly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Paolo Bonzini	f5f7abcfd5	raw-posix: report unwritten extents as zero These are created for example with XFS_IOC_ZERO_RANGE. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	63390a8d14	raw-posix: return get_block_status data and flags Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	4bc74be997	block: return get_block_status data and flags for formats Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	b6b8a33354	block: introduce bdrv_get_block_status API For now, bdrv_get_block_status is just another name for bdrv_is_allocated. The next patches will add more flags. This also touches all block drivers with a mostly mechanical rename. The sole exception is cow; because it calls cow_co_is_allocated from the read code, we keep that function and make cow_co_get_block_status a wrapper. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	d663640c04	block: expect errors from bdrv_co_is_allocated Some bdrv_is_allocated callers do not expect errors, but the fallback in qcow2.c might make other callers trip on assertion failures or infinite loops. Fix the callers to always look for errors. Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	4f5786376e	block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Now that bdrv_is_allocated detects coroutine context, the two can use the same code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	bdad13b9de	block: make bdrv_co_is_allocated static bdrv_is_allocated can detect coroutine context and go through a fast path, similar to other block layer functions. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	e641c1e81e	cow: do not call bdrv_co_is_allocated As we change bdrv_is_allocated to gather more information from bs and bs->file, it will become a bit slower. It is still appropriate for online jobs, but not for reads/writes. Call the internal function instead. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	26ae980492	cow: make writes go at a less indecent speed Only sync once per write, rather than once per sector. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	276cbc7f2f	cow: make reads go at a decent speed Do not do two reads for each sector; load each sector of the bitmap and use bitmap operations to process it. Writes are still dog slow! Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	4f6fd3491c	block: make bdrv_delete() static Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no longer public and should be called by bdrv_unref() if refcnt is decreased to 0. This is an identical change because effectively, there's no multiple reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	13c91cb7e2	iscsi: use bdrv_new() instead of stack structure BlockDriverState structure needs bdrv_new() to initialize refcnt, don't allocate a local structure variable and memset to 0, becasue with coming refcnt implementation, bdrv_unref will crash if bs->refcnt not initialized to 1. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	3d34c6cd99	vvfat: use bdrv_new() to allocate BlockDriverState we need bdrv_new() to properly initialize BDS, don't allocate memory manually. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Stefan Weil	68dc036488	w32: Fix access to host devices (regression) QEMU failed to open host devices like \\.\PhysicalDrive0 (first hard disk) since some time (commit 8a79380b8ef1b02d2abd705dd026a18863b09020?). Those devices use hdev_open which did not use the latest API for options. This resulted in a fatal runtime error: Block protocol 'host_device' doesn't support the option 'filename' Duplicate code from raw_open to fix this. Cc: qemu-stable@nongnu.org Reported-by: David Brenner <david.brenner3@gmail.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Benoît Canet	2024c1df43	block: Add iops_size to do the iops accounting for a given io size. This feature can be used in case where users are avoiding the iops limit by doing jumbo I/Os hammering the storage backend. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Benoît Canet	3e9fab690d	block: Add support for throttling burst max in QMP and the command line. The max parameter of the leaky bucket throttling algorithm can be used to allow the guest to do bursts. The max value is a pool of I/O that the guest can use without being throttled at all. Throttling is triggered once this pool is empty. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Benoît Canet	cc0681c454	block: Enable the new throttling code in the block layer. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Anthony Liguori	bb7d4d82b6	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Max Reitz (11) and others # Via Kevin Wolf * kwolf/for-anthony: (26 commits) qemu-iotests: Overlapping cluster allocations qcow2_check: Mark image consistent qcow2-refcount: Repair shared refcount blocks qcow2-refcount: Repair OFLAG_COPIED errors qcow2-refcount: Move OFLAG_COPIED checks qcow2: Employ metadata overlap checks qcow2: Metadata overlap checks qcow2: Add corrupt bit qemu-iotests: Snapshotting zero clusters qcow2-refcount: Snapshot update for zero clusters option: Add assigned flag to QEMUOptionParameter gluster: Abort on AIO completion failure block: Remove old raw driver switch raw block driver from "raw.o" to "raw_bsd.o" raw_bsd: register bdrv_raw raw_bsd: add raw_create_options raw_bsd: introduce "special members" raw_bsd: add raw_create() raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev() add skeleton for BSD licensed "raw" BlockDriver ... Message-id: 1378111792-20436-1-git-send-email-kwolf@redhat.com Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>	2013-09-03 12:32:46 -05:00
Max Reitz	24530f3e06	qcow2_check: Mark image consistent If no corruptions remain after an image repair (and no errors have been encountered), clear the corrupt flag in qcow2_check. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-02 10:15:15 +02:00
Max Reitz	afa50193cd	qcow2-refcount: Repair shared refcount blocks If the refcount of a refcount block is greater than one, we can at least try to repair that problem by duplicating the affected block. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-02 10:06:59 +02:00
Stefan Hajnoczi	5b21a2ae4d	curl: qemu_bh_new() can never return NULL Drop error code path which cannot be taken since qemu_bh_new() does not return NULL. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-09-01 19:11:56 +04:00
Stefan Weil	4c293dc6e4	misc: Fix some typos in names and comments Most typos were found using a modified version of codespell: accross -> across issueing -> issuing TICNT_THRESHHOLD -> TICNT_THRESHOLD bandwith -> bandwidth VCARD_7816_PROPIETARY -> VCARD_7816_PROPRIETARY occured -> occurred gaurantee -> guarantee sofware -> software Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-09-01 18:59:24 +04:00
Max Reitz	e23e400ec6	qcow2-refcount: Repair OFLAG_COPIED errors Since the OFLAG_COPIED checks are now executed after the refcounts have been repaired (if repairing), it is safe to assume that they are correct but the OFLAG_COPIED flag may be not. Therefore, if its value differs from what it should be (considering the according refcount), that discrepancy can be repaired by correctly setting (or clearing that flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:44 +02:00
Max Reitz	4f6ed88c03	qcow2-refcount: Move OFLAG_COPIED checks Move the OFLAG_COPIED checks out of check_refcounts_l1 and check_refcounts_l2 and after the actual refcount checks/fixes (since the refcounts might actually change there). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:44 +02:00
Max Reitz	cf93980e77	qcow2: Employ metadata overlap checks The pre-write overlap check function is now called before most of the qcow2 writes (aborting it on collision or other error). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	a40f1c2add	qcow2: Metadata overlap checks Two new functions are added; the first one checks a given range in the image file for overlaps with metadata (main header, L1 tables, L2 tables, refcount table and blocks). The second one should be used immediately before writing to the image file as it calls the first function and, upon collision, marks the image as corrupt and makes the BDS unusable, thereby preventing further access. Both functions take a bitmask argument specifying the structures which should be checked for overlaps, making it possible to also check metadata writes against colliding with other structures. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	69c9872653	qcow2: Add corrupt bit This adds an incompatible bit indicating corruption to qcow2. Any image with this bit set may not be written to unless for repairing (and subsequently clearing the bit if the repair has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	8b81a7b6ba	qcow2-refcount: Snapshot update for zero clusters Account for all cluster types in qcow2_update_snapshot_refcounts; this prevents this function from updating the refcount of unallocated zero clusters which effectively led to wrong adjustments of the refcount of cluster 0 (the main qcow2 header). This in turn resulted in images with (unallocated) zero clusters having a cluster 0 refcount greater than one after creating a snapshot. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Bharata B Rao	9faa574f7d	gluster: Abort on AIO completion failure Currently if gluster AIO callback thread fails to notify the QEMU thread about AIO completion, we try graceful recovery by marking the disk drive as inaccessible. This error recovery code is race-prone as found by Asias and Stefan. However as found out by Paolo, this kind of error is impossible and hence simplify the code that handles this error recovery. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Kevin Wolf	e5b1d99f55	block: Remove old raw driver This is unused code now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	7a6d3fc594	switch raw block driver from "raw.o" to "raw_bsd.o" "Incoming" function prototypes and "outgoing" function calls must match reality. Implemented using the "struct BlockDriver" definition in "include/block/block_int.h", and gcc errors & warnings. v1->v2: On 08/20/13 09:51, Kevin Wolf wrote: > Am 18.08.2013 um 16:29 hat Paolo Bonzini geschrieben: >> Il 16/08/2013 16:15, Laszlo Ersek ha scritto: >>> +static int raw_reopen_prepare(BDRVReopenState reopen_state, >>> + BlockReopenQueue queue, Error *errp) >>> { >>> - return bdrv_reopen_prepare(bs->file); >>> + BDRVReopenState tmp = reopen_state; >>> + >>> + tmp.bs = tmp.bs->file; >>> + return bdrv_reopen_prepare(&tmp, queue, errp); >>> } >> >> This should just return zero, my fault. > > Which is because bdrv_reopen_queue() already queues bs->file for reopen. > The simple return 0; implementation is shared by all other format drivers > that support reopening images. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	775d6afd5c	raw_bsd: register bdrv_raw On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 5) Formats are registered with bdrv_register (takes a BlockDriver). You > also need to pass the caller of bdrv_register to block_init. Fill in the BlockDriver structure with the raw_() functions that have been added to "block/raw_bsd.c", in the order the fields are defined in "include/block/block_int.h". I needed more explanation / naming examples for registering the driver than what Paolo gave me, so I copied / adapted from "block/qcow2.c". The parts I took as basis for modification are blamed on commit `5efa9d5a8b` Author: Anthony Liguori <aliguori@us.ibm.com> Date: Sat May 9 17:03:42 2009 -0500 Convert block infrastructure to use new module init functionality commit `20d97356c9` Author: Blue Swirl <blauwirbel@gmail.com> Date: Fri Apr 23 20:19:47 2010 +0000 Fix OpenBSD build Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	ff369a483d	raw_bsd: add raw_create_options On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 4) There is another member, .create_options, which is an array of > QEMUOptionParameter structs, terminated by an all-zero item. The only > option you need is for the virtual disk size. You will find something > to copy from in other block drivers, for example block/qcow2.c. Code taken and adapted from "block/qcow2.c", as suggested. The code being copied/modified is blamed on commit `20d97356c9` Author: Blue Swirl <blauwirbel@gmail.com> Date: Fri Apr 23 20:19:47 2010 +0000 Fix OpenBSD build and commit `7c80ab3f21` Author: Jes Sorensen <Jes.Sorensen@redhat.com> Date: Fri Dec 17 16:02:39 2010 +0100 block/qcow2.c: rename qcow_ functions to qcow2_ Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	01dd96d8f4	raw_bsd: introduce "special members" On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 3) These members are special > > .format_name is the string "raw" > .bdrv_open raw_open should set bs->sg to bs->file->sg and return 0 > .bdrv_close raw_close should do nothing > .bdrv_probe raw_probe should just return 1. v1->v2: On 08/20/13 10:11, Kevin Wolf wrote: > Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben: >> +static int raw_probe(void) >> +{ >> + return 1; >> +} > > Maybe add a comment here like "smallest possible positive score so that > raw is used if and only if no other block driver works". Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	1565262c37	raw_bsd: add raw_create() On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 2) This is also a simple forwarder function: > > .bdrv_create > > but there is no BlockDriverState argument so the forwarded-to function > does not have a bs->file argument either. The forwarded-to function is > bdrv_create_file. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	9eaafd90d1	raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev() On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 1) BlockDriver is a struct in which these function members are > interesting: > > .bdrv_reopen_prepare > .bdrv_co_readv > .bdrv_co_writev > .bdrv_co_is_allocated > .bdrv_co_write_zeroes > .bdrv_co_discard > .bdrv_getlength > .bdrv_get_info > .bdrv_truncate > .bdrv_is_inserted > .bdrv_media_changed > .bdrv_eject > .bdrv_lock_medium > .bdrv_ioctl > .bdrv_aio_ioctl > .bdrv_has_zero_init > > They should be implemented as simple forwarders (see above). There are > 16 functions listed here, you can easily see how this already accounts > for 100+ SLOC roughly... > > The implementations of bdrv_co_readv and bdrv_co_writev should also call > BLKDBG_EVENT on bs->file too, before forwarding to bs->file. The events > to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	e1c66c6d82	add skeleton for BSD licensed "raw" BlockDriver On 08/05/13 15:03, Paolo Bonzini wrote: > > > ----- Original Message ----- >> From: "Laszlo Ersek" <lersek@redhat.com> >> To: "Paolo Bonzini" <pbonzini@redhat.com> >> Sent: Monday, August 5, 2013 2:43:46 PM >> Subject: Re: [PATCH 1/2] raw: add license header >> >> On 08/02/13 00:27, Paolo Bonzini wrote: >>> On 08/01/2013 10:13 AM, Christoph Hellwig wrote: >>>> On Wed, Jul 31, 2013 at 08:19:51AM +0200, Paolo Bonzini wrote: >>>>> Most of the block layer is under the BSD license, thus it is >>>>> reasonable to license block/raw.c the same way. CCed people should >>>>> ACK by replying with a Signed-off-by line. >>>> >>>> The coded was intended to be GPLv2. >>> >>> Laszlo, would you be willing to do clean-room reverse engineering? >>> >>> (No rants, please. :)) >> >> What's the scope exactly? > > It's quite small, it's a file full of forwarders like > > static void raw_foo(BlockDriverState bs) > { > return bdrv_foo(bs->file); > } > > It's 170 lines of code, all as boring as this. I only picked you > because I'm quite certain you have never seen the file (and the answer > confirmed it). > > Basically: > > 1) BlockDriver is a struct in which these function members are > interesting: > > .bdrv_reopen_prepare > .bdrv_co_readv > .bdrv_co_writev > .bdrv_co_is_allocated > .bdrv_co_write_zeroes > .bdrv_co_discard > .bdrv_getlength > .bdrv_get_info > .bdrv_truncate > .bdrv_is_inserted > .bdrv_media_changed > .bdrv_eject > .bdrv_lock_medium > .bdrv_ioctl > .bdrv_aio_ioctl > .bdrv_has_zero_init > > They should be implemented as simple forwarders (see above). > There are 16 functions listed here, you can easily see how this > already accounts for 100+ SLOC roughly... > > The implementations of bdrv_co_readv and bdrv_co_writev should also > call BLKDBG_EVENT on bs->file too, before forwarding to bs->file. The > events to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO. > > 2) This is also a simple forwarder function: > > .bdrv_create > > but there is no BlockDriverState argument so the forwarded-to function > does not have a bs->file argument either. The forwarded-to function > is bdrv_create_file. > > 3) These members are special > > .format_name is the string "raw" > .bdrv_open raw_open should set bs->sg to bs->file->sg and return 0 > .bdrv_close raw_close should do nothing > .bdrv_probe raw_probe should just return 1. > > 4) There is another member, .create_options, which is an array of > QEMUOptionParameter structs, terminated by an all-zero item. The only > option you need is for the virtual disk size. You will find something > to copy from in other block drivers, for example block/qcow2.c. > > 5) Formats are registered with bdrv_register (takes a BlockDriver). > You also need to pass the caller of bdrv_register to block_init. > > 6) I'm not sure how to organize the patch series, so I'll leave this to > your creativity. I guess in this case move/copy detection of git should > be disabled. I would definitely include this spec in the commit > message as a proof of clean-room reverse engineering. > > 7) Remember a BSD header like the one in block.c. > > Paolo This patch implements the email up to the paragraph ending with "100+ SLOC roughly". The skeleton is generated from the list there, with a simple shell loop using "sed" and the raw_foo() template. The BSD license block is copied (and reflowed) from "util/qemu-progress.c". Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Peter Maydell	127c84e1a5	block/qcow2.h: Avoid "1LL << 63" (shifts into sign bit) The expression "1LL << 63" tries to shift the 1 into the sign bit of a 'long long', which provokes a clang sanitizer warning: runtime error: left shift of 1 by 63 places cannot be represented in type 'long long' Use "1ULL << 63" as the definition of QCOW_OFLAG_COPIED instead to avoid this. For consistency, we also update the other QCOW_OFLAG definitions to use the ULL suffix rather than LL, though only the shift by 63 is undefined behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Kevin Wolf	9117b47717	qcow2: Change default for new images to compat=1.1 By the time that qemu 1.7 will be released, enough time will have passed since qemu 1.1, which is the first version to understand version 3 images, that changing the default shouldn't hurt many people any more and the benefits of using the new format outweigh the pain. qemu-iotests already runs with compat=1.1 by default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-08-30 15:28:51 +02:00
Stefan Hajnoczi	b10577df13	win32-aio: drop win32_aio_flush_cb() The io_flush argument to qemu_aio_set_event_notifier() has been removed since the block layer learnt to drain requests by itself. Fix the Windows build for win32-aio.o by updating the qemu_aio_set_event_notifier() call and dropping win32_aio_flush_cb(). Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 22:05:04 +02:00
Alex Bligh	bc72ad6754	aio / timers: Switch entire codebase to the new timer API This is an autogenerated patch using scripts/switch-timer-api. Switch the entire code base to using the new timer API. Note this patch may introduce some line length issues. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Alex Bligh	7483d1e547	aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API Convert block_job_sleep_ns and co_sleep_ns to use the new timer API. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Paolo Bonzini	04d542c8b8	vmdk: support vmfs files VMware ESX hosts also use different create and extent types for flat files, respectively "vmfs" and "VMFS". This is not documented, but it can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing virtual machine disk (VMDK) descriptor file). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	daac8fdc68	vmdk: support vmfsSparse files VMware ESX hosts use a variant of the VMDK3 format, identified by the vmfsSparse create type ad the VMFSSPARSE extent type. It has 16 KB grain tables (L2) and a variable-size grain directory (L1). In addition, the grain size is always 512, but that is not a problem because it is included in the header. The format of the extents is documented in the VMDK spec. The format of the descriptor file is not documented precisely, but it can be found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual machine disk (VMDK) descriptor file for delta disks). With these patches, vmfsSparse files only work if opened through the descriptor file. Data files without descriptor files, as far as I could understand, are not supported by ESX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> -- v2: Rebase to patch 01. Change le64_to_cpu to le32_to_cpu. Rename vmdk_open_vmdk3 to vmdk_open_vmfs_sparse, which represents the current usage of this format. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	f6b61e54bd	vmdk: fix L1 and L2 table size in vmdk3 open VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the value. This patch honors the header field. And the L2 table size is 4096 according to VMDK spec[1], instead of 1 << 9 (512). [1]: http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	b0651b8c24	vmdk: Move l1_size check into vmdk_add_extent() This header check is common to VMDK3 and VMDK4, so move it into vmdk_add_extent(). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Asias He	0d51b4debe	block: Introduce bs->zero_beyond_eof In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when protocols reading beyond end of file), we break qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to disable ->zero_beyond_eof temporarily in addition to enable ->growable. [Since the broken patch "block: Produce zeros when protocols reading beyond end of file" has not been merged yet, I have applied this fix first and will then apply the next patch to keep the tree bisectable. -- Stefan] Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 14:10:21 +02:00
Kevin Wolf	8ad1898cf1	qcow2: Change default for new images to compat=1.1 By the time that qemu 1.7 will be released, enough time will have passed since qemu 1.1, which is the first version to understand version 3 images, that changing the default shouldn't hurt many people any more and the benefits of using the new format outweigh the pain. qemu-iotests already runs with compat=1.1 by default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-21 14:41:09 +02:00
Stefan Hajnoczi	f2e5dca46b	aio: drop io_flush argument The .io_flush() handler no longer exists and has no users. Drop the io_flush argument to aio_set_fd_handler() and related functions. The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no longer used and are dropped too. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	f0d3576599	block/ssh: drop return_true() .io_flush() is no longer called so drop return_true(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	d6d94c6785	block/sheepdog: drop have_co_req() and aio_flush_request() .io_flush() is no longer called so drop have_co_req() and aio_flush_request(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	5d289cc724	block/rbd: drop qemu_rbd_aio_flush_cb() .io_flush() is no longer called so drop qemu_rbd_aio_flush_cb(). qemu_aio_count is unused now so drop it too. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	bed2e759eb	block/nbd: drop nbd_have_request() .io_flush() is no longer called so drop nbd_have_request(). We cannot drop in_flight since it is still used by other block/nbd.c code. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	94473d0c06	block/linux-aio: drop qemu_laio_completion_cb() .io_flush() is no longer called so drop qemu_laio_completion_cb(). It turns out that count is now unused so drop that too. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	70ecdc6e4e	block/iscsi: drop iscsi_process_flush() .io_flush() is no longer called so drop iscsi_process_flush(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	372835fbc3	block/gluster: drop qemu_gluster_aio_flush_cb() Since .io_flush() is no longer called we do not need qemu_gluster_aio_flush_cb() anymore. It turns out that qemu_aio_count is unused now and can be dropped. Thanks to Bharata B Rao <bharata@linux.vnet.ibm.com> for catching a build failure with CONFIG_GLUSTERFS_DISCARD, which has been fixed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:51:09 +02:00
Stefan Hajnoczi	0d1460226f	block/curl: drop curl_aio_flush() .io_flush() is no longer called so drop curl_aio_flush(). The acb[] array that the function checks is still used in other parts of block/curl.c. Therefore we cannot remove acb[], it is needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:45:35 +02:00
Stefan Hajnoczi	88266f5aa7	block: stop relying on io_flush() in bdrv_drain_all() If a block driver has no file descriptors to monitor but there are still active requests, it can return 1 from .io_flush(). This is used to spin during synchronous I/O. Stop relying on .io_flush() and instead check QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active requests. This is the first step in removing .io_flush() so that event loops no longer need to have the concept of synchronous I/O. Eventually we may be able to kill synchronous I/O completely by running everything in a coroutine, but that is future work. Note this patch moves bs->throttled_reqs initialization to bdrv_new() so that bdrv_requests_pending(bs) can safely access it. In practice bs is g_malloc0() so the memory is already zeroed but it's safer to initialize the queue properly. We also need to fix up block/stream.c:close_unused_images() to prevent traversing a dangling pointer while it rearranges the backing file chain. This is necessary since the new bdrv_drain_all() traverses the backing file chain. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:45:34 +02:00
Paolo Bonzini	7748c1bd50	raw: add license header Most of the block layer is under the BSD license, thus it is reasonable to license block/raw.c the same way. CCed people should ACK by replying with a Signed-off-by line. Cc: Christoph Hellwig <hch@lst.de> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1375251592-2537-2-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-08-12 09:15:11 -05:00
Fam Zheng	ca8804ced9	vmdk: rename num_gtes_per_gte to num_gtes_per_gt num_gtes_per_gte is a historical typo, rename it to a more sensible name. It means "number of GrainTableEntries per GrainTable". Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	bf81507de3	vmdk: use heap allocation for whole_grain We should never grow the stack beyond 1 MB, otherwise we'll fall off the end. Thread stacks and coroutine stacks (1 MB) do not grow. get_cluster_offset() allocates a big stack offset, it will fail for big cluster images, change to heap allocated buffer. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	2c43e43c8c	vmdk: check l1 size before opening image L1 table size is calculated from capacity, granularity and l2 table size. If capacity is too big or later two are too small, the L1 table will be too big to allocate in memory. Limit it to a reasonable range. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	f8ce04036e	vmdk: check l2 table size when opening header.num_gtes_per_gte determines size for L2 table. Check for too big value before using it. Limit to 512M entries (2GB per one L2 table). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	8aa1331c09	vmdk: check granularity field in opening Granularity is used to calculate the cluster size and allocate r/w buffer. Check the value from image before using it, so we don't abort() for unbounded memory allocation. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	e98768d437	vmdk: use unsigned values for on disk header fields The size and offset fields are all non-negative values, use uint64_t for them to avoid getting negative in memory value by int overflow. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	5d8caa543c	vmdk: Make VMDK3Header and VmdkGrainMarker QEMU_PACKED It's best to make it consistent that all on disk structures are QEMU_PACKED. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Liu Yuan	e4f5c1bf8f	sheepdog: add missing .bdrv_has_zero_init Commit `3ac21627` changed the behaviour of bdrv_has_zero_init() to default to 0. In the review for Sheepdog it turned out that enabling it is safe, so that commit updated one BlockDriver definition of sheepdog to use bdrv_has_zero_init_1, missed however that there are more BlockDrivers in the driver. Fix these now. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 10:41:56 +02:00
Fam Zheng	8e50724313	vmdk: fix comment for vmdk_co_write_zeroes The comment was truncated. Add the missing parts, especially explain why we need zero_dry_run. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-08-02 18:07:04 +04:00
Richard W.M. Jones	f5075224d6	block/iscsi.c: Fix printf format error. The error on armv7hl was: block/iscsi.c: In function ‘is_request_lun_aligned’: block/iscsi.c:251:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Werror=format=] iscsilun->block_size, sector_num, nb_sectors); ^ This also splits the long line to comply with qemu coding guidelines. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-08-02 18:02:48 +04:00
Peter Maydell	2440a2c3df	block/sheepdog: Rename 'dprintf' to 'DPRINTF' 'dprintf' is the name of a POSIX standard function so we should not be stealing it for our debug macro. Rename to 'DPRINTF' (in line with a number of other source files.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Acked-by: Richard Henderson <rth@twiddle.net> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1375100199-13934-2-git-send-email-peter.maydell@linaro.org Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 19:33:53 -05:00
Anthony Liguori	eddbf0ab9d	Merge remote-tracking branch 'stefanha/block' into staging # By Stefan Hajnoczi (4) and others # Via Stefan Hajnoczi * stefanha/block: dataplane: refuse to start if device is already in use dataplane: enable virtio-blk x-data-plane=on live migration migration: fix spice migration migration: notify migration state before starting thread block: Repair the throttling code. gluster: Add image resize support Message-id: 1375112172-24863-1-git-send-email-stefanha@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 11:33:48 -05:00
Paolo Bonzini	42ec24e285	gluster: Add image resize support Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend can support image resizing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-29 17:07:37 +02:00
Stefan Weil	52f350227f	misc: Fix new typos in comments and strings All these typos were found by codespell. sould -> should emperical -> empirical intialization -> initialization successfuly -> successfully gaurantee -> guarantee Fix also another error (before before) in the same context. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-07-27 11:22:54 +04:00
Ian Main	fc5d3f8432	Implement sync modes for drive-backup. This patch adds sync-modes to the drive-backup interface and implements the FULL, NONE and TOP modes of synchronization. FULL performs as before copying the entire contents of the drive while preserving the point-in-time using CoW. NONE only copies new writes to the target drive. TOP copies changes to the topmost drive image and preserves the point-in-time using CoW. For sync mode TOP are creating a new target image using the same backing file as the original disk image. Then any new data that has been laid on top of it since creation is copied in the main backup_run() loop. There is an extra check in the 'TOP' case so that we don't bother to copy all the data of the backing file as it already exists in the target. This is where the bdrv_co_is_allocated() is used to determine if the data exists in the topmost layer or below. Also any new data being written is intercepted via the write_notifier hook which ends up calling backup_do_cow() to copy old data out before it gets overwritten. For mode 'NONE' we create the new target image and only copy in the original data from the disk image starting from the time the call was made. This preserves the point in time data by only copying the parts that are going to change to the target image. This way we can reconstruct the final image by checking to see if the given block exists in the new target image first, and if it does not, you can get it from the original image. This is basically an optimization allowing you to do point-in-time snapshots with low overhead vs the 'FULL' version. Since there is no old data to copy out the loop in backup_run() for the NONE case just calls qemu_coroutine_yield() which only wakes up after an event (usually cancel in this case). The rest is handled by the before_write notifier which again calls backup_do_cow() to write out the old data so it can be preserved. Signed-off-by: Ian Main <imain@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-07-26 22:01:31 +02:00
Kevin Wolf	64aa99d3e0	qcow2: Use dashes instead of underscores in options This is what QMP wants to use. The options haven't been enabled in any release yet, so we're still free to change them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-26 21:59:56 +02:00
Peter Lieven	a23fdf3559	block/raw: add .bdrv_get_info Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 15:27:37 +08:00
Peter Lieven	8bf9344ad6	block/raw: add bdrv_co_write_zeroes Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Fam Zheng	78f27bd02c	block: fix vvfat error path for enable_write_target s->qcow and s->qcow_filename are allocated but not freed on error. Fix the possible leaks, remove unnecessary check for bdrv_new(), propagate ret code of bdrv_create() and also the one of enable_write_target(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Bharata B Rao	0c14fb47ec	gluster: Add discard support for GlusterFS block driver. Implement bdrv_aio_discard for gluster. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Peter Lieven	0777b5dde4	iscsi: factor out sector conversions Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	91bea4e2bb	iscsi: assert that sectors are aligned to LUN blocksize if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE it is possible that sector_num or nb_sectors are not correctly aligned. to avoid corruption we fail requests which are misaligned. Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	7e4d5a9f94	iscsi: remove support for misaligned nb_sectors in aio_readv this hask is not working (anymore). support for misaligned offsets should be handled at the block layer. Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	d3bda7bc16	iscsi: fix -ENOSPC in iscsi_create() the -ENOPSC case did not work due to the missing goto. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:00:28 +02:00
Ronnie Sahlberg	0a53f01074	Fix iSCSI crash on SG_IO with an iovector Don't assume that SG_IO is always invoked with a simple buffer, check the iovec_count and if it is >= 1 then we need to pass an array of iovectors to libiscsi instead of just a plain buffer. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:00:26 +02:00
Kevin Wolf	98289620e0	block: Don't parse protocol from file.filename One of the major reasons for doing something new for -blockdev and blockdev-add was that the old block layer code parses filenames instead of just taking them literally. So we should really leave it untouched when it's passing using the new interfaces (like -drive file.filename=...). This allows opening relative file names that contain a colon. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-15 09:49:00 +02:00
Fam Zheng	3494d65027	curl: refuse to open URL from HTTP server without range support CURL driver requests partial data from server on guest IO req. For HTTP and HTTPS, it uses "Range: **" in requests, and this will not work if server not accepting range. This patch does this check when open. Removed curl_size_cb, which is not used: On one hand it's registered to libcurl as CURLOPT_WRITEFUNCTION, instead of CURLOPT_HEADERFUNCTION, which will get called with data, not header. On the other hand the s->len is assigned unconditionally later. In this gone function, the sscanf for "Content-Length: %zd", on (void )ptr, which is not guaranteed to be zero-terminated, is potentially a security bug. So this patch fixes it as a side-effect. The bug is reported as: https://bugs.launchpad.net/qemu/+bug/1188943 (Note the bug is marked "private" so you might not be able to see it) Introduced curl_header_cb, which is used to parse header and mark the server as accepting range if "Accept-Ranges: bytes" line is seen from response header. If protocol is HTTP or HTTPS, but server response has no not this support, refuse to open this URL. Note that python builtin module SimpleHTTPServer is an example of not supporting range, if you need to test this driver, get a better server or use internet URLs. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-05 09:40:18 +02:00
Fam Zheng	da7a50f938	vmdk: Implement .bdrv_has_zero_init Depending on the subformat, has_zero_init queries underlying storage for flat extent. If it has a flat extent and its underlying storage doesn't have zero init, return 0. Otherwise return 1. Aligns the operator assignments. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-05 09:40:18 +02:00
Peter Lieven	3ac216270a	block: change default of .has_zero_init to 0 .has_zero_init defaults to 1 for all formats and protocols. this is a dangerous default since this means that all new added drivers need to manually overwrite it to 0 if they do not ensure that a device is zero initialized after bdrv_create(). if a driver needs to explicitly set this value to 1 its easier to verify the correctness in the review process. during review of the existing drivers it turned out that ssh and gluster had a wrong default of 1. both protocols support host_devices as backend which are not by default zero initialized. this wrong assumption will lead to possible corruption if qemu-img convert is used to write to such a backend. vpc and vmdk also defaulted to 1 altough they support fixed respectively flat extends. this has to be addresses in separate patches. both formats as well as the mentioned ssh and gluster are turned to the default of 0 with this patch for safety. a similar problem with the wrong default existed for iscsi most likely because the driver developer did oversee the default value of 1. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 13:52:35 +02:00
Kevin Wolf	72c6cc94da	vpc: Implement .bdrv_has_zero_init Depending on the subformat, has_zero_init on VHD must behave like raw and query the underlying storage (fixed) or like other sparse formats that can always return 1 (dynamic, differencing). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 10:21:00 +02:00
Fam Zheng	8ed610a1c9	vmdk: remove wrong calculation of relative path When creating image with backing file, the driver tries to calculate the relative path from created image file to backing file, but the path computation is incorrect. e.g.: $ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1 Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240 backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off $ qemu-img info vmdk-data-snapshot1 image: vmdk-data-snapshot1 file format: vmdk virtual size: 10G (10737418240 bytes) disk size: 12K -> backing file: disk.vmdk The common part in file names, "vmdk-data-", is incorrectly forgotten by relative_path(). As the VMDK specification has no restriction on parentNameHint to be relative path, we simply remove this by using the backing_file option. Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Kevin Wolf	8ab6feec2c	gluster: Return bdrv_has_zero_init = 0 GlusterFS volumes can be backed by block devices, in which case bdrv_create() doesn't make sure that the image is zeroed out. It is currently not possibly to detect whether a given image is backed by a file or a block device, and incorrectly assuming that it is zeroed corrupts images during qemu-img convert, so let's err on the side of caution and always return 0. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Richard W.M. Jones	0b3f21e6a9	block/ssh: Set bdrv_has_zero_init according to the file type. If the remote is a regular file, set it to true (ie. reads of uninitialized areas in a newly created file will return zeroes). If we can't prove that, return false (a safe default). Tested by adding a debugging print statement [not part of this commit] and creating a remote file and a remote block device: $ ./qemu-img create ssh://localhost/tmp/new 100M Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600 filename ssh://localhost/tmp/new: has_zero_init = 1 $ sudo lvcreate -L 1G -n tmp /dev/fedora Logical volume "tmp" created $ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824 filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0 Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Kevin Wolf	f59fee8d50	block: Make BlockJobTypes const Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Dietmar Maurer	98d2c6f2cd	block: add basic backup support to block driver backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing ->total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. * Use trace events instead of DPRINTF() -- stefanha] Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:26 +02:00
Kevin Wolf	a5c5ea3f60	raw-posix: Fix /dev/cdrom magic on OS X The raw-posix driver has code to provide a /dev/cdrom on OS X even though it doesn't really exist. However, since commit `c66a6157` the real filename is dismissed after finding it, so opening /dev/cdrom fails. Put the filename back into the options QDict to make this work again. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-28 09:20:26 +02:00
Fam Zheng	96c51eb5e4	vmdk: refuse to open higher version than supported Refuse to open higher version for safety. Although we try to be compatible with published VMDK spec, VMware has newer version from ESXi 5.1 exported OVF/OVA, which we have no knowledge what's changed in it. And it is very likely to have more new versions in the future, so it's not safe to open them blindly. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:43 +02:00
Kevin Wolf	0b919fae31	qcow2: Batch discards This optimises the discard operation for freed clusters by batching discard requests (both snapshot deletion and bdrv_discard end up updating the refcounts cluster by cluster). Note that we don't discard asynchronously, but keep s->lock held. This is to avoid that a freed cluster is reallocated and written to while the discard is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	67af674e47	qcow2: Options to enable discard for freed clusters Deleted snapshots are discarded in the image file by default, discard requests take their default from the -drive discard=... option and other places that free clusters must always be enabled explicitly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	6cfcb9b8b9	qcow2: Add refcount update reason to all callers This adds a refcount update reason to all callers of update_refcounts(), so that a follow-up patch can use this information to decide whether clusters that reach a refcount of 0 should be discarded in the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Anthony Liguori	3ed8a8430a	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Paolo Bonzini (3) and others # Via Paolo Bonzini * bonzini/scsi-next: iscsi: reorganize iscsi_readcapacity_sync iscsi: simplify freeing of tasks vhost-scsi: fix k->set_guest_notifiers() NULL dereference scsi-disk: scsi-block device for scsi pass-through should not be removable scsi-generic: check the return value of bdrv_aio_ioctl in execute_command scsi-generic: fix sign extension of READ CAPACITY(10) data scsi: reset cdrom tray statuses on scsi_disk_reset Message-id: 1371565016-2643-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-06-18 10:06:47 -05:00
Paolo Bonzini	1288844e7c	iscsi: reorganize iscsi_readcapacity_sync Avoid the goto, and use the same retry logic for the 10- and 16- byte versions. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 12:43:03 +02:00
Paolo Bonzini	f0d2a4d4d6	iscsi: simplify freeing of tasks Always free them in the iscsi_aio_*_acb functions and remove the checks in their callers. Remove ifs when the task struct was previously dereferenced (spotted by Coverity). Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 12:43:03 +02:00
Ján Tomko	2330790879	nbd: strip braces from literal IPv6 address in URI Otherwise they would get passed to getaddrinfo and fail with: address resolution failed for [::1]🔢 Name or service not known (Broken by commit v1.4.0-736-gf17c90b) Signed-off-by: Ján Tomko <jtomko@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 11:43:00 +02:00
Anthony Liguori	21a885a7e2	Merge remote-tracking branch 'luiz/queue/qmp' into staging # By Luiz Capitulino # Via Luiz Capitulino * luiz/queue/qmp: qerror: drop QERR_OPEN_FILE_FAILED macro block: bdrv_reopen_prepare(): don't use QERR_OPEN_FILE_FAILED savevm: qmp_xen_save_devices_state(): use error_setg_file_open() dump: qmp_dump_guest_memory(): use error_setg_file_open() cpus: use error_setg_file_open() blockdev: use error_setg_file_open() block: mirror_complete(): use error_setg_file_open() rng-random: use error_setg_file_open() error: add error_setg_file_open() helper Message-id: 1371484631-29510-1-git-send-email-lcapitulino@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-06-17 13:14:46 -05:00
Evgeny Budilovsky	0bed087df2	vmdk: Allow reading variable size descriptor files the hard-coded 2k buffer on the stack won't allow reading big descriptor files which can be generated when storing big images. For example 500G vmdk splitted to 2G chunks. Signed-off-by: Evgeny Budilovsky <evgeny.budilovsky@ravellosystems.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Richard W.M. Jones	8da1aa15db	curl: Don't set curl options on the handle just before it's going to be deleted. (Found by Kamil Dudka) Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Stefan Hajnoczi	5a394b9e96	vmdk: byteswap VMDK4Header.desc_offset field Remember to byteswap VMDK4Header.desc_offset on big-endian machines. Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Richard W.M. Jones	a7cea2ba47	block/curl.c: Refuse to open the handle for writes. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Liu Yuan	cede621ffc	sheepdog: support 'qemu-img snapshot -a' Just call sd_create_branch() in the snapshot_goto to rollback the image is good enough. With this patch, 'loadvm' process for sheepdog is modified: Suppose we have a snapshot chain A --> B --> C, we do 'loadvm A' so as to get a new chain, A --> B \| V C1 in the old code: 1 reload inode of A (in snapshot_goto) 2 read vmstate via A's vdi_id (loadvm_state) 3 delete C and create C1, reload inode of C1 (sd_create_branch on write) with this patch applied: 1 reload inode of A, delete C and create C1 (in snapshot_goto) 2 read vmstate via C1's parent, that is A's vdi_id (loadvm_state) This will fix the possible bug that QEMU exit between 2 and 3 in the old code Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:58 +02:00
Liu Yuan	b579ffb3fd	sheepdog: fix snapshot tag initialization This is an old and obvious bug. We should pass snapshot_id to the tag. Or simple command like 'qemu-img snapshot -a tag sheepdog:image' will fail Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:58 +02:00
Luiz Capitulino	dacc26aae5	block: mirror_complete(): use error_setg_file_open() Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 11:01:14 -04:00
Richard W.M. Jones	9e5e2b23d3	curl: Whitespace only changes. Trivial patch to remove odd whitespace. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-06-11 23:45:43 +04:00
Wenchao Xia	553a7e8718	qmp: add ImageInfo in BlockDeviceInfo used by query-block Now image info will be retrieved as an embbed json object inside BlockDeviceInfo, backing chain info and all related internal snapshot info can be got in the enhanced recursive structure of ImageInfo. New recursive member *backing-image is added to reflect the backing chain status. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Wenchao Xia	43526ec8d1	block: add image info query function bdrv_query_image_info() This patch adds function bdrv_query_image_info(), which will retrieve image info in qmp object format. The implementation is based on the code moved from qemu-img.c, but uses block layer function to get snapshot info. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Wenchao Xia	fb0ed4539c	block: add snapshot info query function bdrv_query_snapshot_info_list() This patch adds function bdrv_query_snapshot_info_list(), which will retrieve snapshot info of an image in qmp object format. The implementation is based on the code moved from qemu-img.c with modification to fit more for qmp based block layer API. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Kevin Wolf	bf736fe34c	blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-06 11:27:22 +02:00
Wenchao Xia	5b91704469	block: dump snapshot and image info to specified output bdrv_snapshot_dump() and bdrv_image_info_dump() do not dump to a buffer now, some internal buffers are still used for format control, which have no chance to be truncated. As a result, these two functions have no more issue of truncation, and they can be used by both qemu and qemu-img with correct parameter specified. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Wenchao Xia	f364ec65b5	block: move qmp and info dump related code to block/qapi.c This patch is a pure code move patch, except following modification: 1 get_human_readable_size() is changed to static function. 2 dump_human_image_info() is renamed to bdrv_image_info_dump(). 3 in qmp_query_block() and qmp_query_blockstats, use bdrv_next(bs) instead of direct traverse of global array 'bdrv_states'. 4 collect_snapshots() and collect_image_info() are renamed, unused parameter *fmt in collect_image_info() is removed. 5 code style fix. To avoid conflict and tip better, macro in header file is BLOCK_QAPI_H instead of QAPI_H. Now block.h and snapshot.h are at the same level in include path, block_int.h and qapi.h will both include them. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Wenchao Xia	de08c606f9	block: move snapshot code in block.c to block/snapshot.c All snapshot related code, except bdrv_snapshot_dump() and bdrv_is_snapshot(), is moved to block/snapshot.c. bdrv_snapshot_dump() will be moved to another file later. bdrv_is_snapshot() is not related with internal snapshot. It also fixes small code style errors reported by check script. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Qiao Nuohan	ce3a4718fe	Remove twice include of qemu-common.h This patch is used to remove twice include of "qemu-common.h" in block/win32-aio.c Signed-off-by: Qiao Nuohan <qiaonuohan@cn.fujitsu.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-05-18 16:35:11 +04:00
Kevin Wolf	2cf7cfa1cd	qcow2: Catch some L1 table index overflows This catches the situation that is described in the bug report at https://bugs.launchpad.net/qemu/+bug/865518 and goes like this: $ qemu-img create -f qcow2 huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-io /tmp/huge.qcow2 -c "write $((102410241024102410241024 - 1024)) 512" Segmentation fault With this patch applied the segfault will be avoided, however the case will still fail, though gracefully: $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off qemu-img: The image size is too large for file format 'qcow2' Note that even long before these overflow checks kick in, you get insanely high memory usage (up to INT_MAX sizeof(uint64_t) = 16 GB for the L1 table), so with somewhat smaller image sizes you'll probably see qemu aborting for a failed g_malloc(). If you need huge image sizes, you should increase the cluster size to the maximum of 2 MB in order to get higher limits. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-14 16:44:33 +02:00
Dong Xu Wang	c7e775e4dd	remove double semicolons Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-05-12 13:25:55 +04:00
Fam Zheng	cdeaf1f159	vmdk: add bdrv_co_write_zeroes Use special offset to write zeroes efficiently, when zeroed-grain GTE is available. If zero-write an allocated cluster, cluster is leaked because its offset pointer is overwritten by "0x1". Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:49 +02:00
Fam Zheng	e304e8e5a0	vmdk: store fields of VmdkMetaData in cpu endian Previously VmdkMetaData.offset is stored little endian while other fields are cpu endian. This changes offset to cpu endian and convert before writing to image. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:46 +02:00
Fam Zheng	95b0aa4231	vmdk: change magic number to macro Two hard coded flag bits are changed to macros. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:43 +02:00
Fam Zheng	69e0b6dfa4	vmdk: Add option to create zeroed-grain image Add image create option "zeroed-grain" to enable zeroed-grain GTE feature of vmdk sparse extents. When this option is on, header version of newly created extent will be 2 and VMDK4_FLAG_ZERO_GRAIN flag bit will be set. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:41 +02:00
Fam Zheng	14ead646fe	vmdk: add support for “zeroed‐grain” GTE Introduced support for zeroed-grain GTE, as specified in Virtual Disk Format 5.0[1]. Recent VMware hosted platform products support a new “zeroed‐grain” grain table entry (GTE). The zeroed‐grain GTE returns all zeros on read. In other words, the zeroed‐grain GTE indicates that a grain in the child disk is zero‐filled but does not actually occupy space in storage. A sparse extent with zeroed‐grain GTE has the following in its header: * SparseExtentHeader.version = 2 * SparseExtentHeader.flags has bit 2 set Other than the new flag and the possibly zeroed‐grain GTE, version 2 sparse extents are identical to version 1. Also, a zeroed‐grain GTE has value 0x1 in the GT table. [1] Virtual Disk Format 5.0, http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:38 +02:00
Fam Zheng	65f7472577	vmdk: named return code. Internal routines in vmdk.c previously return -1 on error and 0 on success. More return values are useful for future changes such as zeroed-grain GTE. Change all the magic `return 0` and `return -1` to macro names: * VMDK_OK 0 * VMDK_ERROR (-1) * VMDK_UNALLOC (-2) * VMDK_ZEROED (-3) Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:35 +02:00
Jeff Cody	059e2fbbca	block: add read-only support to VHDX image format. This adds in read-only support to the VHDX image format. This supports reads for fixed-size, and dynamic sized VHDX images. Differencing files are still unsupported. The image must be opened without BDRV_O_RDWR set, because we do not yet update the headers. I.e., pass 'readonly=on' in the drive image options from the QEMU commandline. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Jeff Cody	e8d4e5ffdb	block: initial VHDX driver support framework - supports open and probe This is the initial block driver framework for VHDX image support (i.e. Hyper-V image file formats), that supports opening VHDX files, and parsing the headers. This commit does not yet enable: - reading - writing - updating the header - differencing files (images with parents) - log replay / dirty logs (only clean images) This is based on Microsoft's VHDX specification: "VHDX Format Specification v0.95", published 4/12/2012 https://www.microsoft.com/en-us/download/details.aspx?id=29681 Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Jeff Cody	203cdba3bc	block: vhdx header for the QEMU support of VHDX images This is based on Microsoft's VHDX specification: "VHDX Format Specification v0.95", published 4/12/2012 https://www.microsoft.com/en-us/download/details.aspx?id=29681 These structures define the various header, metadata, and other block structures defined in the VHDX specification. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Liu Yuan	859e5553a4	sheepdog: fix loadvm operation Currently the 'loadvm' opertaion works as following: 1. switch to the snapshot 2. mark current working VDI as a snapshot 3. rely on sd_create_branch to create a new working VDI based on the snapshot This works not the same as other format as QCOW2. For e.g, qemu > savevm # get a live snapshot snap1 qemu > savevm # snap2 qemu > loadvm 1 # This will steally create snap3 of the working VDI Which will result in following snapshot chain: base <-- snap1 <-- snap2 <-- snap3 ^ \| working VDI snap3 was unnecessarily created and might be annoying users. This patch discard the unnecessary 'snap3' creation. and implement rollback(loadvm) operation to the specified snapshot by 1. switch to the snapshot 2. delete working VDI 3. rely on sd_create_branch to create a new working VDI based on the snapshot The snapshot chain for above example will be: base <-- snap1 <-- snap2 ^ \| working VDI Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:37:51 +02:00
MORITA Kazutaka	13c31de2fd	sheepdog: resend write requests when SD_RES_READONLY is received When a snapshot is taken from out side of qemu (e.g. qemu-img snapshot), write requests to the current vdi return SD_RES_READONLY. In this case, the sheepdog block driver needs to update the current inode to the latest one and resend the write requests. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	9ff53a0eb8	sheepdog: add helper function to reload inode This adds a helper function to update the current inode state with the specified vdi object. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	6a0b549033	sheepdog: add SD_RES_READONLY result code Sheepdog returns SD_RES_READONLY when qemu sends write requests to the snapshot vdi. This adds the result code and makes sd_strerror() print its error reason. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	982dcbf4cb	sheepdog: cleanup find_vdi_name This makes 'filename' and 'tag' constant variables, and renames 'for_snapshot' to 'lock' to clear how it works. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Kevin Wolf	c3ca988d2b	rbd: Fix use after free in rbd_open() Commit `a9ccedc3` frees the QemuOpts for the driver-specific options immediately, even though it still needs the filename string that is contained there. This doesn't work. Move the deletion of the QemuOpts to the end of the function where its content isn't needed any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	8d71c63137	sheepdog: implement .bdrv_co_is_allocated() Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	e8bfaa2fae	sheepdog: use BDRV_SECTOR_SIZE Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	cac8f4a60f	sheepdog: add discard/trim support for sheepdog The 'TRIM' command from VM that is to release underlying data storage for better thin-provision is already supported by the Sheepdog. This patch adds the TRIM support at QEMU part. For older Sheepdog that doesn't support it, we return 0(success) to upper layer. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:27 +02:00
Anthony Liguori	f1ab7a5acf	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Kevin Wolf (16) and Stefan Hajnoczi (4) # Via Kevin Wolf * kwolf/for-anthony: qemu-iotests: add 053 unaligned compressed image size test block: Allow overriding backing.file.filename block: Remove filename parameter from .bdrv_file_open() vvfat: Use bdrv_open options instead of filename sheepdog: Use bdrv_open options instead of filename rbd: Use bdrv_open options instead of filename iscsi: Use bdrv_open options instead of filename gluster: Use bdrv_open options instead of filename curl: Use bdrv_open options instead of filename blkverify: Use bdrv_open options instead of filename blkdebug: Use bdrv_open options instead of filename raw-win32: Use bdrv_open options instead of filename raw-posix: Use bdrv_open options instead of filename block: Enable filename option block: Add driver-specific options for backing files block: Fail gracefully when using a format driver on protocol level qemu-iotests: Fix _filter_qemu qemu-img: do not zero-pad the compressed write buffer qcow: allow sub-cluster compressed write to last cluster qcow2: allow sub-cluster compressed write to last cluster Message-id: 1366630294-18984-1-git-send-email-kwolf@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-04-22 08:08:22 -05:00
Anthony Liguori	25690739f1	Merge remote-tracking branch 'bonzini/nbd-next' into staging # By Stefan Hajnoczi # Via Paolo Bonzini * bonzini/nbd-next: nbd: set TCP_NODELAY nbd: use TCP_CORK in nbd_co_send_request() nbd: unlock mutex in nbd_co_send_request() error path Message-id: 1366381830-11267-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-04-22 08:05:14 -05:00
Kevin Wolf	56d1b4d21d	block: Remove filename parameter from .bdrv_file_open() It is unused now in all block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 11:34:35 +02:00
Kevin Wolf	7ad9be64e8	vvfat: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	c8c96350e0	sheepdog: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	a9ccedc3da	rbd: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	60beb3412d	iscsi: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	b489477653	gluster: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	8e6d58cd5b	curl: Use bdrv_open options instead of filename As a bonus, going through the QemuOpts QEMU_OPT_SIZE parser for the readahead option gives us proper error reporting that the previous use of atoi() lacked. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	16c790926b	blkverify: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	f468121290	blkdebug: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	8a79380b8e	raw-win32: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	c66a615723	raw-posix: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	31ca6d077c	block: Add driver-specific options for backing files Options starting in "backing." are passed to the backing file now. If you don't need to specify the filename for the backing file, you can add it on the command line instead of in the image file: $ qemu-nbd -t /tmp/test.img $ qemu-img create -f qcow2 empty.qcow2 1G $ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\ backing.file.host=localhost Note that this doesn't override the backing filename from the image. If the image has one, this will fail because NBD doesn't want the options and a filename at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Stefan Hajnoczi	16b3c5cd9f	qcow: allow sub-cluster compressed write to last cluster Compression in qcow requires image length to be a multiple of the cluster size. Lift this requirement by zero-padding the final cluster when necessary. The virtual disk size is still not cluster-aligned, so the guest cannot access the zero sectors. Note that this is almost identical to the qcow2 version of this code. qcow2's compression code is drawn from qcow. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-22 10:27:58 +02:00
Stefan Hajnoczi	f4d38bef7c	qcow2: allow sub-cluster compressed write to last cluster Compression in qcow2 requires image length to be a multiple of the cluster size. Lift this requirement by zero-padding the final cluster when necessary. The virtual disk size is still not cluster-aligned, so the guest cannot access the zero sectors. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-22 10:27:58 +02:00
Richard W.M. Jones	c7a101f529	ssh: Remove unnecessary use of strlen function. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-19 11:45:38 +02:00
Stefan Weil	6ae7d660a0	block/ssh: Add missing gcc format attributes Now gcc will check whether format string and variable arguments match. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-19 11:44:16 +02:00
Stefan Hajnoczi	97ebbab0e3	nbd: set TCP_NODELAY Disable the Nagle algorithm to reduce latency. Note this means we must also use TCP_CORK when sending header followed by payload to avoid fragmenting lots of little packets. The previous patch took care of that. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Tested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:35:17 +02:00
Stefan Hajnoczi	0fcece25c0	nbd: use TCP_CORK in nbd_co_send_request() Use TCP_CORK to defer packet transmission until both the header and the payload have been written. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:30:40 +02:00
Stefan Hajnoczi	6760c47aa4	nbd: unlock mutex in nbd_co_send_request() error path Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:29:46 +02:00
Josh Durgin	dc7588c1eb	rbd: add an asynchronous flush The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is sychronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	9a2d462e7b	block: ssh: Use libssh2_sftp_fsync (if supported by libssh2) to flush to disk. libssh2_sftp_fsync is an extension to libssh2 to support fsync(2) over sftp, which is itself an extension of OpenSSH. If both libssh2 and the ssh daemon support it, this will allow bdrv_flush_to_disk to commit changes through to disk on the remote server. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	0a12ec87a5	block: Add support for Secure Shell (ssh) block device. qemu-system-x86_64 -drive file=ssh://hostname/some/image QEMU will ssh into 'hostname' and open '/some/image' which is made available as a standard block device. You can specify a username (ssh://user@host/...) and/or a port number (ssh://host:port/...). You can also use an alternate syntax using properties (file.user, file.host, file.port, file.path). Current limitations: - Authentication must be done without passwords or passphrases, using ssh-agent. Other authentication methods are not supported. - Uses a single connection, instead of concurrent AIO with multiple SSH connections. This is implemented using libssh2 on the client side. The server just requires a regular ssh daemon with sftp-server support. Most ssh daemons on Unix/Linux systems will work out of the box. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Kevin Wolf	8d3b1a2d0b	block: Introduce bdrv_pwritev() for qcow2_save_vmstate Directly pass the QEMUIOVector on instead of linearising it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Kevin Wolf	cf8074b382	block: Introduce bdrv_writev_vmstate Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Aurelien Jarno	753d9b82c5	aes: move aes.h from include/block to include/qemu Move aes.h from include/block to include/qemu to show it can be reused by other subsystems. Cc: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2013-04-13 13:51:57 +02:00
Paolo Bonzini	0d09e41a51	hw: move headers to include/ Many of these should be cleaned up with proper qdev-/QOM-ification. Right now there are many catch-all headers in include/hw/ARCH depending on cpu.h, and this makes it necessary to compile these files per-target. However, fixing this does not belong in these patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-08 18:13:10 +02:00
Kevin Wolf	c2b6ff51e4	qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcount It ignored the error code, and at least the 'goto fail' is obvious nonsense as it creates an endless loop (if the next attempt doesn't magically succeed) and leaves the in-memory L1 table in big-endian instead of converting it back. In error cases, there's no point in writing an updated L1 table, so skip this part for them. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Kevin Wolf	c2bc78b6a9	qcow2: Return real error in qcow2_update_snapshot_refcount This fixes the error message triggered by the following script: cat > /tmp/blkdebug.cfg <<EOF [inject-error] event = "cluster_free" errno = "28" immediately = "off" EOF $qemu_img create -f qcow2 test.qcow2 10G $qemu_img snapshot -c snap test.qcow2 $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Stefan Hajnoczi	f9e8cacc55	oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock() The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2013-04-02 11:47:37 -04:00
Kevin Wolf	ecdd5333ab	qcow2: Gather clusters in a looping loop Instead of just checking once in exactly this order if there are dependendies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous. Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this allows to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters). This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticably. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	2c3b32d256	qcow2: Move cluster gathering to a non-looping loop This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break is inserted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	88c6588c51	qcow2: Allow requests with multiple l2metas Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	710c2496d8	qcow2: Use byte granularity in qcow2_alloc_cluster_offset() This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine. This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	411d62b04b	qcow2: Prepare handle_alloc/copied() for byte granularity This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	e62daaf679	qcow2: handle_copied(): Implement non-zero host_offset Look only for clusters that start at a given physical offset. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c53ede9f6d	qcow2: handle_copied(): Get rid of keep_clusters parameter Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	acb0467f8d	qcow2: handle_copied(): Get rid of nb_clusters parameter handle_copied() uses its bytes parameter now to determine how many clusters it should try to find. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	0af729ec00	qcow2: Factor out handle_copied() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	83baa9a471	qcow2: Clean up handle_alloc() Things can be simplified a bit now. No semantic changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c37f4cd71d	qcow2: Finalise interface of handle_alloc() The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	3b8e2e260c	qcow2: handle_alloc(): Get rid of keep_clusters parameter handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	f5bc635094	qcow2: handle_alloc(): Get rid of nb_clusters parameter We already communicate the same information in *bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	10f0ed8b2f	qcow2: Factor out handle_alloc() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	037689d896	qcow2: Decouple cluster allocation from cluster reuse code This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	65eb2e35c0	qcow2: Change handle_dependency to byte granularity This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	d9d74f4177	qcow2: Improve check for overlapping allocations The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	17a71e5823	qcow2: Handle dependencies earlier Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	9ee6439e27	qcow2: Remove bogus unlock of s->lock The unlock wakes up the next coroutine, but the currently running coroutine will lock it again before it yields, so this doesn't make a lot of sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	c349ca4bb2	qcow2: Fix "total clusters" number in bdrv_check This should be based on the virtual disk size, not on the size of the image. Interesting observation: With some VM state stored in the image file, percentages higher than 100% are possible, even though snapshots themselves are ignored. This is a qcow2 bug to be fixed another day: The VM state should be discarded in the active L2 tables after completing the snapshot creation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Stefan Weil	ea804cadf8	block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build) The new parameter is unused yet. This part was missing in commit `787e4a8500`. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:53:04 +01:00
Liu Yuan	d43731c758	rbd: fix compile error Commit `787e4a85` [block: Add options QDict to bdrv_file_open() prototypes] didn't update rbd.c accordingly. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:51:43 +01:00
Kevin Wolf	681e7ad024	nbd: Check against invalid option combinations A file name may only specified if no host or socket path is specified. The latter two may not appear at the same time either. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	bebbf7fa9c	nbd: Use default port if only host is specified The URL method already takes care to apply the default port when none is specfied. Directly specifying driver-specific options required the port number until now. Allow leaving it out and apply the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f5866fa438	block: Make find_image_format safe with NULL filename In order to achieve this, the .bdrv_probe callbacks of all drivers must cope with this. The DMG driver is the only one that bases its decision on the filename and it needs to be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	6963a30d82	block: Introduce .bdrv_parse_filename callback If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f53a1febcd	nbd: Accept -drive options for the network connection The existing parsers for the file name now parse everything into the bdrv_open() options QDict. Instead of using these parsers, you can now directly specify the options on the command line, like this: qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1 Clearly the file=... part could use further improvement, but it's a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f17c90bed1	nbd: Keep hostname and port separate The NBD block supports an URL syntax, for which a URL parser returns separate hostname and port fields. It also supports the traditional qemu syntax encoded in a filename. Until now, after parsing the URL to get each piece of information, a new string is built to be fed to socket functions. Instead of building a string in the URL case that is immediately parsed again, parse the string in both cases and use the QemuOpts interface to qemu-sockets.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	787e4a8500	block: Add options QDict to bdrv_file_open() prototypes The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	acdfb480ba	qcow2: Fix segfault in qcow2_invalidate_cache Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Liu Yuan	fca23f0ad2	sheepdog: show error message for halt status Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when number of alive nodes is less than that of copies specified by users. This will return 0x19 to QEMU client which currently doesn't recognize it. This patch adds an error description when QEMU client receives it, other than plainly printing 'Invalid error code' Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Stefan Hajnoczi	c4d9d19645	threadpool: drop global thread pool Now that each AioContext has a ThreadPool and the main loop AioContext can be fetched with bdrv_get_aio_context(), we can eliminate the concept of a global thread pool from thread-pool.c. The submit functions must take a ThreadPool* argument. block/raw-posix.c and block/raw-win32.c use aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's ThreadPool. tests/test-thread-pool.c must be updated to reflect the new thread_pool_submit() function prototypes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-15 16:07:51 +01:00
MORITA Kazutaka	ed9ba72467	sheepdog: set io_flush handler in do_co_req If an io_flush handler is not set, qemu_aio_wait doesn't invoke callbacks. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
MORITA Kazutaka	0d6db300cd	sheepdog: use non-blocking fd in coroutine context Using a blocking socket in the coroutine context reduces the chance of switching to other work. This patch makes the sheepdog driver use a non-blocking fd always. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Paolo Bonzini	381b487d54	qcow2: make is_allocated return true for zero clusters Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed. QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	3647917919	qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount() We already flush when the function completes. There is no need to flush after every compressed cluster. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f9cb2860bd	qcow2: drop flush in update_cluster_refcount() The update_cluster_refcount() function increments/decrements a cluster's refcount and then returns the new refcount value. There is no need to flush since both update_cluster_refcount() callers already take care of this: 1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed sectors will be appended to an existing cluster with enough free space. qcow2_alloc_bytes() already flushes so there is no need to do so in update_cluster_refcount(). 2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts if it needs to update L2 entries. It also flushes before completing. Removing this flush significantly speeds up qcow2 snapshot creation: $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata $ time qemu-img snapshot -c new test.qcow2 Time drops from more than 3 minutes to under 1 second. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	2154f24e4e	qcow2: flush in qcow2_update_snapshot_refcount() Users of qcow2_update_snapshot_refcount() do not flush consistently. qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and qcow2_snapshot_delete() do not. Solve this by moving the bdrv_flush() into qcow2_update_snapshot_refcount(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	c1f5bafd70	qcow2: set L2 cache dependency in qcow2_alloc_bytes() Compressed writes use qcow2_alloc_bytes() to allocate space with byte granularity. The affected clusters' refcounts will be incremented but we do not need to flush yet. Set a L2 cache dependency on the refcount block cache, so that the refcounts get written out before the L2 updates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f6977f1556	qcow2: flush refcount cache correctly in qcow2_write_snapshots() Since qcow2 metadata is cached we need to flush the caches, not just the underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file). Also add the error return path when bdrv_flush() fails and move the flush after checking for qcow2_alloc_clusters() failure so that the qcow2_alloc_clusters() error return value takes precedence. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	9991923b26	qcow2: flush refcount cache correctly in alloc_refcount_block() update_refcount() affects the refcount cache, it does not write to disk. Therefore bdrv_flush(bs->file) does nothing. We need to flush the refcount cache in order to write out the refcount updates! While we're here also add error returns when qcow2_cache_flush() fails. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	74c4510a3c	qcow2: Allow lazy refcounts to be enabled on the command line qcow2 images now accept a boolean lazy_refcounts options. Use it like this: -drive file=test.qcow2,lazy_refcounts=on If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00

... 3 4 5 6 7 ...

1280 Commits