mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Max Reitz	4a273c398b	qcow2: Add more overlap check bitmask macros Introduces the macros QCOW2_OL_CONSTANT and QCOW2_OL_ALL in addition to the already existing QCOW2_OL_CACHED, signifying all metadata overlap checks that can be performed in constant time (regardless of image size etc.) and truly all available overlap checks, respectively. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	4092e99d93	qcow2: Array assigning options to OL check bits Add an array which assigns the option string to its corresponding overlap check bit. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	05de7e86ca	qcow2: Add overlap-check options Add runtime options to tune the overlap checks to be performed before write accesses. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	3e3553905c	qcow2: Make overlap check mask variable Replace the QCOW2_OL_DEFAULT macro by a variable overlap_check in BDRVQcowState. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	231bb26764	qcow2: Use negated overflow check mask In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check, change the parameter signifying the checks to perform from its current positive form to a negative one, i.e., it will no longer explicitly specify every check to perform but rather a mask of checks not to perform. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	00c49b21e7	qcow2: Use better type for numerical snapshot ID When trying to find a new snapshot ID, the existing ones are converted to integers using strtoul. This function returns an unsigned long, therefore its result should be saved in an unsigned long as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	84757f7e67	qcow2: Fix snapshot restoration in snapshot_create If the new snapshot table could not be written in qcow2_snapshot_create, the old snapshot table has to be restored in memory and the new one released. This should include restoration of the old snapshot count as well, which is added by this patch. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	f9bff97143	qcow2: Remove wrong metadata overlap check In qcow2_write_compressed, if the compression fails, a normal cluster is written to disk. This is done through bdrv_write on the qcow2 BDS itself (using the guest offset), thus it is wrong to do a metadata overlap check before. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	9e3f08923a	qcow2: Add missing space in error message The error message in qcow2_downgrade about an unsupported refcount order is missing a space. This patch adds it. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Benoît Canet	f6186f49e2	block: Add BlockDriver.bdrv_check_ext_snapshot. This field is used by blkverify to disable external snapshots creation. It will also be used by block filters like quorum to disable external snapshot creation. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Peter Lieven	92bc50a5ad	block/get_block_status: avoid redundant callouts on raw devices if a raw device like an iscsi target or host device is used the current implementation makes a second call out to get the block status of bs->file. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	88fb153512	qcow2: Assert against snapshot name/ID overflow qcow2_write_snapshots relies on the length of every snapshot ID and name fitting into an unsigned 16 bit integer. This is currently ensured by QEMU through generally only allowing 128 byte IDs and 256 byte names. However, if this should change in the future, the length written to the image file should not be silently truncated (though the name itself would be written completely). Since this is currently not an issue but might require attention due to internal QEMU changes in the future, an assert ensuring sanity is enough for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	9186ad9658	qcow2: Free allocated snapshot table on error If an error occurs during qcow2_write_snapshots, the newly allocated snapshot table clusters are leaked and should thus be freed. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	37d41f0a04	qcow2: Always use error path on writing snapshots qcow2_write_snapshots does contain a fail label and there is no reason not to use it on some errors; therefore, we should always jump there on error. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	8f730dd24e	qcow2: Free preallocated zero clusters In qcow2_free_any_clusters, preallocated zero clusters should be freed just as normal clusters are. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	998b959c1e	qcow2: Use pread for inactive L1 in overlap check Currently, qcow2_check_metadata_overlap uses bdrv_read to read inactive L1 tables from disk. The number of sectors to read is calculated through a truncating integer division, therefore, if the L1 table size is not a multiple of the sector size, the final entries will not be read and their entries in memory remain undefined (from the g_malloc). Using bdrv_pread fixes this. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:49:59 +02:00
Max Reitz	37764dfb71	qcow2: Add support for ImageInfoSpecific Add a new ImageInfoSpecificQCow2 type as a subtype of ImageInfoSpecific. This contains the compatibility level as a string and an optional lazy_refcounts boolean (optional means mandatory for compat >= 1.1 and not available for compat == 0.10). Also, add qcow2_get_specific_info, which returns this information. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 14:03:57 +02:00
Max Reitz	a8d8ecb77f	block/qapi: Human-readable ImageInfoSpecific dump Add a function for generically dumping the ImageInfoSpecific information in a human-readable format to block/qapi.c. Use this function in bdrv_image_info_dump and qemu-io-cmds.c:info_f to allow qemu-img info resp. qemu-io -c info to print that format specific information. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Max Reitz	eae041fe6f	block: Add bdrv_get_specific_info Add a function for retrieving an ImageInfoSpecific object from a block driver. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Fam Zheng	79e14bf778	qapi: make use of new BlockJobType Switch the string to enum type BlockJobType in BlockJobDriver. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Fam Zheng	3fc4b10af0	blockjob: rename BlockJobType to BlockJobDriver We will use BlockJobType as the enum type name of block jobs in QAPI, rename current BlockJobType to BlockJobDriver, which will eventually become a set of operations, similar to block drivers. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 10:52:54 +02:00
Anthony Liguori	634ebf4b17	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Asias He (1) and Peter Lieven (1) # Via Paolo Bonzini * bonzini/scsi-next: scsi: Allocate SCSITargetReq r->buf dynamically [CVE-2013-4344] block/iscsi: reenable iscsi_co_get_block_status Message-id: 1381332391-8781-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@amazon.com>	2013-10-10 10:03:00 -07:00
Peter Lieven	24c7608a5d	block/iscsi: reenable iscsi_co_get_block_status Commit `f35c934a` accidently disabled iscsi_co_get_block_status for all libiscsi versions. Its not possible to check for enumeration constants in the C preprocessor. This patch changes the check to the preprocessor constant LIBISCSI_FEATURE_IOVECTOR which was introduced shortly after get_lba_status support was added to libiscsi. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-10-09 10:43:42 +02:00
Max Reitz	e3b21ef9e0	qcow2: Free allocated L2 cluster on error If an error occurs in l2_allocate, the allocated (but unused) L2 cluster should be freed. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-10-07 13:23:19 +02:00
Max Reitz	fda74f826b	qcow2: Switch L1 table in a single sequence Switching the L1 table in memory should be an atomic operation, as far as possible. Calling qcow2_free_clusters on the old L1 table on disk is not a good idea when the old L1 table is no longer valid and the address to the new one hasn't yet been written into the corresponding BDRVQcowState field. To be more specific, this can lead to segfaults due to qcow2_check_metadata_overlap trying to access the L1 table during the free operation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-10-02 15:38:29 +02:00
Jeff Cody	5641bf4056	block: vhdx - add migration blocker This blocks migration for VHDX image files, until the functionality can be supported. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-10-02 15:24:39 +02:00
Max Reitz	db0749012b	qcow2: CHECK_OFLAG_COPIED is obsolete CHECK_OFLAG_COPIED as a parameter to check_refcounts_l1 and check_refcounts_l2 is obselete now, since the OFLAG_COPIED consistency check is actually no longer performed by these functions (but by check_oflag_copied). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-10-02 11:40:41 +02:00
Max Reitz	1e242b5544	qcow2: Correct endianness in overlap check If an inactive L1 table is loaded from disk, its entries are in big endian and have to be converted to host byte order before using them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-10-02 11:06:35 +02:00
Kevin Wolf	61653008ad	qcow2: Remove useless count_contiguous_clusters() parameter All callers pass start = 0, and it's doubtful if any other value would actually do what you expect. Remove the parameter. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	22f0dd29af	qcow2: COMPRESSED on count_contiguous_clusters Compressed clusters can never be contiguous, therefore the corresponding flag does not need to be given explicitly to count_contiguous_clusters. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	15684a4742	qcow2: count_contiguous_clusters and compression The function is not intended to be used on compressed clusters and will not work correctly, if used anyway, since L2E_OFFSET_MASK is not the right mask for determining the offset of compressed clusters. Therefore, assert that the first cluster is not compressed and always include the compression flag in the mask of significant flags, i.e., stop the search as soon as a compressed cluster occurs. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	320c706666	qcow2: Free only newly allocated clusters on error In expand_zero_clusters_in_l1, a new cluster is only allocated if it was not already preallocated. On error, such preallocated clusters should not be freed, but only the newly allocated ones. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	be0b742ee3	qcow2: Always use error path in l2_allocate Just returning -errno in some cases prevents trace_qcow2_l2_allocate_done from being executed (and, in one case, also the unused allocated L2 table from being freed). Always going down the error path fixes this. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 17:22:43 +02:00
Max Reitz	8585afd813	qcow2: Don't put invalid L2 table into cache In l2_allocate, the fail path is executed if qcow2_cache_flush fails. However, the L2 table has not yet been fetched from the L2 table cache. The qcow2_cache_put in the fail path therefore basically gives an undefined argument as the L2 table address (in this case). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 11:31:59 +02:00
Max Reitz	e390cf5a97	qcow2: Correct bitmap size in zero expansion Since the expanded_clusters bitmap is addressed using host offsets in the underlying image file, the correct size to use for allocating the bitmap is not determined by the guest disk image but by the underlying host image file. Furthermore, this size may change during the expansion due to cluster allocations on growable image files. In this case, the bitmap needs to be resized as well to reflect the growth. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-27 11:16:35 +02:00
Max Reitz	c01dbccbad	qcow2: Assert against currently impossible overflow If qcow2_alloc_cluster_link_l2 is called with a QCowL2Meta describing a request crossing L2 boundaries, a buffer overflow will occur. This is impossible right now since such requests are never generated (every request is shortened to L2 boundaries before) and probably also completely unintended (considering the name "QCowL2Meta"), however, it is still worth an assertion. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 21:57:44 +02:00
Jeff Cody	687fb89366	block: qed - use QEMU_PACKED for on-disk structures QEDHeader is read, and written, directly from on-disk images via bdrv_pread()/write(). To avoid any unintentional padding, these structs should be packed. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:15 +02:00
Jeff Cody	c4217f645d	block: qcow2 - used QEMU_PACKED for on-disk structures QCowHeader and QCowExtension are structs that reside in the on-disk image format, and are read and written directly via bdrv_pread()/write(), and as such should be packed to avoid any unintentional struct padding. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:13 +02:00
Jeff Cody	e54835c06d	block: vpc - use QEMU_PACKED for on-disk structures The VHD footer and header structs (vhd_footer and vhd_dyndisk_header) are on-disk structures for the image format, and as such should be packed. Go ahead and make these typedefs as well, with the preferred QEMU naming convention, so that the packed attribute is used consistently with the struct. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:10 +02:00
Jeff Cody	8368febd81	block: vdi - use QEMU_PACKED for on-disk structures The header struct VdiHeader is an on-disk structure for the image format, and as such should be packed. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 20:51:05 +02:00
Stefan Hajnoczi	9e6337d081	rbd: avoid qemu_rbd_snap_list() memory leaks When there are no snapshots qemu_rbd_snap_list() returns 0 and the snapshot table pointer is NULL. Don't forget to free the snaps buffer we allocated for librbd rbd_snap_list(). When the function succeeds don't forget to free the snaps buffer after calling rbd_snap_list_end(). Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:22:00 +02:00
Stefan Weil	c3e4f43a99	block: Fix compiler warning (-Werror=uninitialized) The patch fixes a warning from gcc (Debian 4.6.3-14+rpi1) 4.6.3: block/stream.c:141:22: error: ‘copy’ may be used uninitialized in this function [-Werror=uninitialized] This is not a real bug - a better compiler would not complain. Now 'copy' has always a defined value, so the check for ret >= 0 can be removed. Signed-off-by: Stefan Weil <sw@weilnetz.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Benoît Canet	030be32184	block: introduce BlockDriver.bdrv_needs_filename to enable some drivers. Some drivers will have driver specifics options but no filename. This new bool allow the block layer to treat them correctly. The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and not having .bdrv_open. The first exception to this rule will be the quorum driver. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Fam Zheng	301c7d38a0	vmdk: fix cluster size check for flat extents We use the extent size as cluster size for flat extents (where no L1/L2 table is allocated so it's safe) reuse sector calculating code with sparse extents. Don't pass in the cluster size for adding flat extent, just set it to sectors later, then the cluster size checking will not fail. The cluster_sectors is changed to int64_t to allow big flat extent. Without this, flat extent opening is broken: # qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/a.vmdk 100G Formatting '/tmp/a.vmdk', fmt=vmdk size=107374182400 compat6=off subformat='monolithicFlat' zeroed_grain=off # qemu-img info /tmp/a.vmdk image: /tmp/a.vmdk file format: raw virtual size: 0 (0 bytes) disk size: 4.0K Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 16:21:28 +02:00
Max Reitz	7454d60045	qcow2: Don't shadow return value When trying to update the refcounts for a snapshot, the return value of update_refcount on a compressed cluster was pretty much ignored, cancelling the update on error but returning 0. This is caused by an inner "ret" variable shadowing the outer one (the latter is used in the return statement). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-25 10:08:56 +02:00
Anthony Liguori	16121fa39e	Merge remote-tracking branch 'stefanha/block' into staging # By Stefan Hajnoczi (4) and others # Via Stefan Hajnoczi * stefanha/block: virtio-blk: do not relay a previous driver's WCE configuration to the current blockdev: do not default cache.no-flush to true block: don't lose data from last incomplete sector qcow2: Correct snapshots size for overlap check coroutine: fix /perf/nesting coroutine benchmark coroutine: add qemu_coroutine_yield benchmark qemu-timer: do not take the lock in timer_pending qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe qemu-timer: drop outdated signal safety comments osdep: warn if open(O_DIRECT) on fails with EINVAL libcacard: link against qemu-error.o for error_report() Message-id: 1379698931-946-1-git-send-email-stefanha@redhat.com	2013-09-23 11:53:05 -05:00
Anthony Liguori	f3ca508f00	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Hervé Poussineau (5) and Stefan Weil (1) # Via Paolo Bonzini * bonzini/scsi-next: block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi lsi: add 53C810 variant lsi: remove todo lsi: ignore write accesses to CTEST0 registers lsi: check ssid versus sdid only if ssid is valid lsi: use constant name instead of its value	2013-09-23 11:52:32 -05:00
Max Reitz	0f39ac9a07	qcow2: Correct snapshots size for overlap check Using s->snapshots_size instead of snapshots_size for the metadata overlap check in qcow2_write_snapshots leads to the detection of an overlap with the main qcow2 image header when deleting the last snapshot, since s->snapshots_size has not yet been updated and is therefore non-zero. However, the offset returned by qcow2_alloc_clusters will be zero since snapshots_size is zero. Therefore, an overlap is detected albeit no such will occur. This patch fixes this by replacing s->snapshots_size by snapshots_size when calling qcow2_pre_write_overlap_check. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-20 12:48:03 +02:00
Stefan Weil	f35c934a5a	block/iscsi: Drop iscsi_co_get_block_status for older versions of libiscsi Debian wheezy includes libiscsi-dev 1.4.0 which does not provide SCSI_PROVISIONING_TYPE_DEALLOCATED. Drop iscsi_co_get_block_status in this case to allow compilation without errors. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-18 01:28:50 +02:00
Anthony Liguori	5dc11192b2	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Max Reitz (16) and others # Via Kevin Wolf * kwolf/for-anthony: (33 commits) qemu-iotests: Fix test 038 block: Assert validity of BdrvActionOps qemu-iotests: Cleanup test image in test number 007 qemu-img: fix invalid JSON coroutine: add ./configure --disable-coroutine-pool qemu-iotests: Adjustments due to error propagation qcow2: Use Error parameter qemu-img create: Emit filename on error block: Error parameter for create functions block: Error parameter for open functions bdrv: Use "Error" for creating images bdrv: Use "Error" for opening images qemu-iotests: add 057 internal snapshot for block device test case hmp: add interface hmp_snapshot_delete_blkdev_internal hmp: add interface hmp_snapshot_blkdev_internal qmp: add interface blockdev-snapshot-delete-internal-sync qmp: add interface blockdev-snapshot-internal-sync qmp: add internal snapshot support in qmp_transaction snapshot: distinguish id and name in snapshot delete snapshot: new function bdrv_snapshot_find_by_id_and_name() ... Message-id: 1379073063-14963-1-git-send-email-kwolf@redhat.com	2013-09-17 09:51:40 -05:00
Peter Lieven	65f3e33964	iscsi: split discard requests in multiple parts Replace .bdrv_aio_discard with .bdrv_co_discard so that discard requests can be split in multiple parts, each for a small amount of sectors. This is useful because we expose a generic API with no limit on the amount of sectors that can be unmapped in one request. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 13:14:19 +02:00
Max Reitz	3ef6c40ad0	qcow2: Use Error parameter Employ usage of the new Error ** parameter in qcow2_open, qcow2_create and associated functions. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	cc84d90ff5	block: Error parameter for create functions Add an Error ** parameter to bdrv_create and its associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	34b5d2c68e	block: Error parameter for open functions Add an Error ** parameter to bdrv_open, bdrv_file_open and associated functions to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	d5124c00d8	bdrv: Use "Error" for creating images Add an Error ** parameter to BlockDriver.bdrv_create to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:48 +02:00
Max Reitz	015a1036a7	bdrv: Use "Error" for opening images Add an Error ** parameter to BlockDriver.bdrv_open and BlockDriver.bdrv_file_open to allow more specific error messages. Signed-off-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	a89d89d3e6	snapshot: distinguish id and name in snapshot delete Snapshot creation actually already distinguish id and name since it take a structured parameter sn, but delete can't. Later an accurate delete is needed in qmp_transaction abort and blockdev-snapshot-delete-sync, so change its prototype. Also errp is added to tip error, but return value is kepted to let caller check what kind of error happens. Existing caller for it are savevm, delvm and qemu-img, they are not impacted by introducing a new function bdrv_snapshot_delete_by_id_or_name(), which check the return value and do the operation again. Before this patch: For qcow2, it search id first then name to find the one to delete. For rbd, it search name. For sheepdog, it does nothing. After this patch: For qcow2, logic is the same by call it twice in caller. For rbd, it always fails in delete with id, but still search for name in second try, no change to user. Some code for *errp is based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Wenchao Xia	2ea1dd758c	snapshot: new function bdrv_snapshot_find_by_id_and_name() To make it clear about id and name in searching, add this API to distinguish them. Caller can choose to search by id or name, *errp will be set only for exception. Some code are modified based on Pavel's patch. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:47 +02:00
Max Reitz	9296b3ed70	qcow2: Implement bdrv_amend_options Implement bdrv_amend_options for compat, size, backing_file, backing_fmt and lazy_refcounts. Downgrading images from compat=1.1 to compat=0.10 is achieved through handling all incompatible flags accordingly, clearing all compatible and autoclear flags and expanding all zero clusters. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	b6481f376b	qcow2: Save refcount order in BDRVQcowState Save the image refcount order in BDRVQcowState. This will be relevant for future code supporting different refcount orders than four and also for code that needs to verify a certain refcount order for an opened image. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	32b6444d23	qcow2-cluster: Expand zero clusters Add functionality for expanding zero clusters. This is necessary for downgrading the image version to one without zero cluster support. For non-backed images, this function may also just discard zero clusters instead of truly expanding them. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	e7108feaac	qcow2-cache: Empty cache Add a function for emptying a cache, i.e., flushing it and marking all elements invalid. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Tal Kain	56e023af80	raw-win32.c: Fix incorrect handling behaviour of small block files It is a valid case that the read data's size is smaller than the requested size since there could be files that are smaller than the minimum block size (For ex. when a VMDK disk descriptor file) Signed-off-by: Tal Kain <tal.kain@ravellosystems.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	1ebf561c11	qcow2: Discard VM state in active L1 after creating snapshot During savevm, the VM state is written to the active L1 of the image and then a snapshot is taken. After that, the VM state isn't needed any more in the active L1 and should be discarded. This is implemented by this patch. The impact of not discarding the VM state is that a snapshot can never become smaller than any previous snapshot (because it would be padded with old VM state), and more importantly that future savevm operations cause unnecessary COWs (with associated flushes), which makes subsequent snapshots much slower. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Kevin Wolf	670df5e3b4	qcow2: Pass discard type to qcow2_discard_clusters() The function will be used internally instead of only being called for guest discard requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2013-09-12 10:12:46 +02:00
Peter Lieven	54a5c1d5db	iscsi: add .bdrv_get_block_status this patch adds a coroutine for .bdrv_co_block_status as well as a generic framework that can be used to build coroutines in block/iscsi. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Peter Lieven	f18a7cbb09	iscsi: add logical block provisioning information to iscsilun Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Paolo Bonzini	5accc8408f	scsi: prefer UUID to VM name for the initiator name The UUID is unique even across multiple hosts, thus it is better than a VM name even if it is less user-friendly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-09-12 08:46:21 +02:00
Paolo Bonzini	f5f7abcfd5	raw-posix: report unwritten extents as zero These are created for example with XFS_IOC_ZERO_RANGE. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	63390a8d14	raw-posix: return get_block_status data and flags Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	4bc74be997	block: return get_block_status data and flags for formats Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	b6b8a33354	block: introduce bdrv_get_block_status API For now, bdrv_get_block_status is just another name for bdrv_is_allocated. The next patches will add more flags. This also touches all block drivers with a mostly mechanical rename. The sole exception is cow; because it calls cow_co_is_allocated from the read code, we keep that function and make cow_co_get_block_status a wrapper. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	d663640c04	block: expect errors from bdrv_co_is_allocated Some bdrv_is_allocated callers do not expect errors, but the fallback in qcow2.c might make other callers trip on assertion failures or infinite loops. Fix the callers to always look for errors. Cc: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	4f5786376e	block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Now that bdrv_is_allocated detects coroutine context, the two can use the same code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:09 +02:00
Paolo Bonzini	bdad13b9de	block: make bdrv_co_is_allocated static bdrv_is_allocated can detect coroutine context and go through a fast path, similar to other block layer functions. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	e641c1e81e	cow: do not call bdrv_co_is_allocated As we change bdrv_is_allocated to gather more information from bs and bs->file, it will become a bit slower. It is still appropriate for online jobs, but not for reads/writes. Call the internal function instead. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	26ae980492	cow: make writes go at a less indecent speed Only sync once per write, rather than once per sector. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Paolo Bonzini	276cbc7f2f	cow: make reads go at a decent speed Do not do two reads for each sector; load each sector of the bitmap and use bitmap operations to process it. Writes are still dog slow! Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	4f6fd3491c	block: make bdrv_delete() static Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no longer public and should be called by bdrv_unref() if refcnt is decreased to 0. This is an identical change because effectively, there's no multiple reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	13c91cb7e2	iscsi: use bdrv_new() instead of stack structure BlockDriverState structure needs bdrv_new() to initialize refcnt, don't allocate a local structure variable and memset to 0, becasue with coming refcnt implementation, bdrv_unref will crash if bs->refcnt not initialized to 1. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Fam Zheng	3d34c6cd99	vvfat: use bdrv_new() to allocate BlockDriverState we need bdrv_new() to properly initialize BDS, don't allocate memory manually. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Stefan Weil	68dc036488	w32: Fix access to host devices (regression) QEMU failed to open host devices like \\.\PhysicalDrive0 (first hard disk) since some time (commit 8a79380b8ef1b02d2abd705dd026a18863b09020?). Those devices use hdev_open which did not use the latest API for options. This resulted in a fatal runtime error: Block protocol 'host_device' doesn't support the option 'filename' Duplicate code from raw_open to fix this. Cc: qemu-stable@nongnu.org Reported-by: David Brenner <david.brenner3@gmail.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:08 +02:00
Benoît Canet	2024c1df43	block: Add iops_size to do the iops accounting for a given io size. This feature can be used in case where users are avoiding the iops limit by doing jumbo I/Os hammering the storage backend. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Benoît Canet	3e9fab690d	block: Add support for throttling burst max in QMP and the command line. The max parameter of the leaky bucket throttling algorithm can be used to allow the guest to do bursts. The max value is a pool of I/O that the guest can use without being throttled at all. Throttling is triggered once this pool is empty. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Benoît Canet	cc0681c454	block: Enable the new throttling code in the block layer. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-09-06 15:25:07 +02:00
Anthony Liguori	bb7d4d82b6	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Max Reitz (11) and others # Via Kevin Wolf * kwolf/for-anthony: (26 commits) qemu-iotests: Overlapping cluster allocations qcow2_check: Mark image consistent qcow2-refcount: Repair shared refcount blocks qcow2-refcount: Repair OFLAG_COPIED errors qcow2-refcount: Move OFLAG_COPIED checks qcow2: Employ metadata overlap checks qcow2: Metadata overlap checks qcow2: Add corrupt bit qemu-iotests: Snapshotting zero clusters qcow2-refcount: Snapshot update for zero clusters option: Add assigned flag to QEMUOptionParameter gluster: Abort on AIO completion failure block: Remove old raw driver switch raw block driver from "raw.o" to "raw_bsd.o" raw_bsd: register bdrv_raw raw_bsd: add raw_create_options raw_bsd: introduce "special members" raw_bsd: add raw_create() raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev() add skeleton for BSD licensed "raw" BlockDriver ... Message-id: 1378111792-20436-1-git-send-email-kwolf@redhat.com Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>	2013-09-03 12:32:46 -05:00
Max Reitz	24530f3e06	qcow2_check: Mark image consistent If no corruptions remain after an image repair (and no errors have been encountered), clear the corrupt flag in qcow2_check. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-02 10:15:15 +02:00
Max Reitz	afa50193cd	qcow2-refcount: Repair shared refcount blocks If the refcount of a refcount block is greater than one, we can at least try to repair that problem by duplicating the affected block. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-02 10:06:59 +02:00
Stefan Hajnoczi	5b21a2ae4d	curl: qemu_bh_new() can never return NULL Drop error code path which cannot be taken since qemu_bh_new() does not return NULL. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-09-01 19:11:56 +04:00
Stefan Weil	4c293dc6e4	misc: Fix some typos in names and comments Most typos were found using a modified version of codespell: accross -> across issueing -> issuing TICNT_THRESHHOLD -> TICNT_THRESHOLD bandwith -> bandwidth VCARD_7816_PROPIETARY -> VCARD_7816_PROPRIETARY occured -> occurred gaurantee -> guarantee sofware -> software Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-09-01 18:59:24 +04:00
Max Reitz	e23e400ec6	qcow2-refcount: Repair OFLAG_COPIED errors Since the OFLAG_COPIED checks are now executed after the refcounts have been repaired (if repairing), it is safe to assume that they are correct but the OFLAG_COPIED flag may be not. Therefore, if its value differs from what it should be (considering the according refcount), that discrepancy can be repaired by correctly setting (or clearing that flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:44 +02:00
Max Reitz	4f6ed88c03	qcow2-refcount: Move OFLAG_COPIED checks Move the OFLAG_COPIED checks out of check_refcounts_l1 and check_refcounts_l2 and after the actual refcount checks/fixes (since the refcounts might actually change there). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:44 +02:00
Max Reitz	cf93980e77	qcow2: Employ metadata overlap checks The pre-write overlap check function is now called before most of the qcow2 writes (aborting it on collision or other error). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	a40f1c2add	qcow2: Metadata overlap checks Two new functions are added; the first one checks a given range in the image file for overlaps with metadata (main header, L1 tables, L2 tables, refcount table and blocks). The second one should be used immediately before writing to the image file as it calls the first function and, upon collision, marks the image as corrupt and makes the BDS unusable, thereby preventing further access. Both functions take a bitmask argument specifying the structures which should be checked for overlaps, making it possible to also check metadata writes against colliding with other structures. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	69c9872653	qcow2: Add corrupt bit This adds an incompatible bit indicating corruption to qcow2. Any image with this bit set may not be written to unless for repairing (and subsequently clearing the bit if the repair has been successful). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Max Reitz	8b81a7b6ba	qcow2-refcount: Snapshot update for zero clusters Account for all cluster types in qcow2_update_snapshot_refcounts; this prevents this function from updating the refcount of unallocated zero clusters which effectively led to wrong adjustments of the refcount of cluster 0 (the main qcow2 header). This in turn resulted in images with (unallocated) zero clusters having a cluster 0 refcount greater than one after creating a snapshot. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Bharata B Rao	9faa574f7d	gluster: Abort on AIO completion failure Currently if gluster AIO callback thread fails to notify the QEMU thread about AIO completion, we try graceful recovery by marking the disk drive as inaccessible. This error recovery code is race-prone as found by Asias and Stefan. However as found out by Paolo, this kind of error is impossible and hence simplify the code that handles this error recovery. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Kevin Wolf	e5b1d99f55	block: Remove old raw driver This is unused code now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	7a6d3fc594	switch raw block driver from "raw.o" to "raw_bsd.o" "Incoming" function prototypes and "outgoing" function calls must match reality. Implemented using the "struct BlockDriver" definition in "include/block/block_int.h", and gcc errors & warnings. v1->v2: On 08/20/13 09:51, Kevin Wolf wrote: > Am 18.08.2013 um 16:29 hat Paolo Bonzini geschrieben: >> Il 16/08/2013 16:15, Laszlo Ersek ha scritto: >>> +static int raw_reopen_prepare(BDRVReopenState reopen_state, >>> + BlockReopenQueue queue, Error *errp) >>> { >>> - return bdrv_reopen_prepare(bs->file); >>> + BDRVReopenState tmp = reopen_state; >>> + >>> + tmp.bs = tmp.bs->file; >>> + return bdrv_reopen_prepare(&tmp, queue, errp); >>> } >> >> This should just return zero, my fault. > > Which is because bdrv_reopen_queue() already queues bs->file for reopen. > The simple return 0; implementation is shared by all other format drivers > that support reopening images. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	775d6afd5c	raw_bsd: register bdrv_raw On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 5) Formats are registered with bdrv_register (takes a BlockDriver). You > also need to pass the caller of bdrv_register to block_init. Fill in the BlockDriver structure with the raw_() functions that have been added to "block/raw_bsd.c", in the order the fields are defined in "include/block/block_int.h". I needed more explanation / naming examples for registering the driver than what Paolo gave me, so I copied / adapted from "block/qcow2.c". The parts I took as basis for modification are blamed on commit `5efa9d5a8b` Author: Anthony Liguori <aliguori@us.ibm.com> Date: Sat May 9 17:03:42 2009 -0500 Convert block infrastructure to use new module init functionality commit `20d97356c9` Author: Blue Swirl <blauwirbel@gmail.com> Date: Fri Apr 23 20:19:47 2010 +0000 Fix OpenBSD build Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	ff369a483d	raw_bsd: add raw_create_options On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 4) There is another member, .create_options, which is an array of > QEMUOptionParameter structs, terminated by an all-zero item. The only > option you need is for the virtual disk size. You will find something > to copy from in other block drivers, for example block/qcow2.c. Code taken and adapted from "block/qcow2.c", as suggested. The code being copied/modified is blamed on commit `20d97356c9` Author: Blue Swirl <blauwirbel@gmail.com> Date: Fri Apr 23 20:19:47 2010 +0000 Fix OpenBSD build and commit `7c80ab3f21` Author: Jes Sorensen <Jes.Sorensen@redhat.com> Date: Fri Dec 17 16:02:39 2010 +0100 block/qcow2.c: rename qcow_ functions to qcow2_ Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	01dd96d8f4	raw_bsd: introduce "special members" On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 3) These members are special > > .format_name is the string "raw" > .bdrv_open raw_open should set bs->sg to bs->file->sg and return 0 > .bdrv_close raw_close should do nothing > .bdrv_probe raw_probe should just return 1. v1->v2: On 08/20/13 10:11, Kevin Wolf wrote: > Am 16.08.2013 um 16:15 hat Laszlo Ersek geschrieben: >> +static int raw_probe(void) >> +{ >> + return 1; >> +} > > Maybe add a comment here like "smallest possible positive score so that > raw is used if and only if no other block driver works". Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	1565262c37	raw_bsd: add raw_create() On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 2) This is also a simple forwarder function: > > .bdrv_create > > but there is no BlockDriverState argument so the forwarded-to function > does not have a bs->file argument either. The forwarded-to function is > bdrv_create_file. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	9eaafd90d1	raw_bsd: emit debug events in bdrv_co_readv() and bdrv_co_writev() On 08/05/13 15:03, Paolo Bonzini wrote: > > [...] > > 1) BlockDriver is a struct in which these function members are > interesting: > > .bdrv_reopen_prepare > .bdrv_co_readv > .bdrv_co_writev > .bdrv_co_is_allocated > .bdrv_co_write_zeroes > .bdrv_co_discard > .bdrv_getlength > .bdrv_get_info > .bdrv_truncate > .bdrv_is_inserted > .bdrv_media_changed > .bdrv_eject > .bdrv_lock_medium > .bdrv_ioctl > .bdrv_aio_ioctl > .bdrv_has_zero_init > > They should be implemented as simple forwarders (see above). There are > 16 functions listed here, you can easily see how this already accounts > for 100+ SLOC roughly... > > The implementations of bdrv_co_readv and bdrv_co_writev should also call > BLKDBG_EVENT on bs->file too, before forwarding to bs->file. The events > to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO. Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Laszlo Ersek	e1c66c6d82	add skeleton for BSD licensed "raw" BlockDriver On 08/05/13 15:03, Paolo Bonzini wrote: > > > ----- Original Message ----- >> From: "Laszlo Ersek" <lersek@redhat.com> >> To: "Paolo Bonzini" <pbonzini@redhat.com> >> Sent: Monday, August 5, 2013 2:43:46 PM >> Subject: Re: [PATCH 1/2] raw: add license header >> >> On 08/02/13 00:27, Paolo Bonzini wrote: >>> On 08/01/2013 10:13 AM, Christoph Hellwig wrote: >>>> On Wed, Jul 31, 2013 at 08:19:51AM +0200, Paolo Bonzini wrote: >>>>> Most of the block layer is under the BSD license, thus it is >>>>> reasonable to license block/raw.c the same way. CCed people should >>>>> ACK by replying with a Signed-off-by line. >>>> >>>> The coded was intended to be GPLv2. >>> >>> Laszlo, would you be willing to do clean-room reverse engineering? >>> >>> (No rants, please. :)) >> >> What's the scope exactly? > > It's quite small, it's a file full of forwarders like > > static void raw_foo(BlockDriverState bs) > { > return bdrv_foo(bs->file); > } > > It's 170 lines of code, all as boring as this. I only picked you > because I'm quite certain you have never seen the file (and the answer > confirmed it). > > Basically: > > 1) BlockDriver is a struct in which these function members are > interesting: > > .bdrv_reopen_prepare > .bdrv_co_readv > .bdrv_co_writev > .bdrv_co_is_allocated > .bdrv_co_write_zeroes > .bdrv_co_discard > .bdrv_getlength > .bdrv_get_info > .bdrv_truncate > .bdrv_is_inserted > .bdrv_media_changed > .bdrv_eject > .bdrv_lock_medium > .bdrv_ioctl > .bdrv_aio_ioctl > .bdrv_has_zero_init > > They should be implemented as simple forwarders (see above). > There are 16 functions listed here, you can easily see how this > already accounts for 100+ SLOC roughly... > > The implementations of bdrv_co_readv and bdrv_co_writev should also > call BLKDBG_EVENT on bs->file too, before forwarding to bs->file. The > events to be generated are BLKDBG_READ_AIO and BLKDBG_WRITE_AIO. > > 2) This is also a simple forwarder function: > > .bdrv_create > > but there is no BlockDriverState argument so the forwarded-to function > does not have a bs->file argument either. The forwarded-to function > is bdrv_create_file. > > 3) These members are special > > .format_name is the string "raw" > .bdrv_open raw_open should set bs->sg to bs->file->sg and return 0 > .bdrv_close raw_close should do nothing > .bdrv_probe raw_probe should just return 1. > > 4) There is another member, .create_options, which is an array of > QEMUOptionParameter structs, terminated by an all-zero item. The only > option you need is for the virtual disk size. You will find something > to copy from in other block drivers, for example block/qcow2.c. > > 5) Formats are registered with bdrv_register (takes a BlockDriver). > You also need to pass the caller of bdrv_register to block_init. > > 6) I'm not sure how to organize the patch series, so I'll leave this to > your creativity. I guess in this case move/copy detection of git should > be disabled. I would definitely include this spec in the commit > message as a proof of clean-room reverse engineering. > > 7) Remember a BSD header like the one in block.c. > > Paolo This patch implements the email up to the paragraph ending with "100+ SLOC roughly". The skeleton is generated from the list there, with a simple shell loop using "sed" and the raw_foo() template. The BSD license block is copied (and reflowed) from "util/qemu-progress.c". Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Peter Maydell	127c84e1a5	block/qcow2.h: Avoid "1LL << 63" (shifts into sign bit) The expression "1LL << 63" tries to shift the 1 into the sign bit of a 'long long', which provokes a clang sanitizer warning: runtime error: left shift of 1 by 63 places cannot be represented in type 'long long' Use "1ULL << 63" as the definition of QCOW_OFLAG_COPIED instead to avoid this. For consistency, we also update the other QCOW_OFLAG definitions to use the ULL suffix rather than LL, though only the shift by 63 is undefined behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:28:52 +02:00
Kevin Wolf	9117b47717	qcow2: Change default for new images to compat=1.1 By the time that qemu 1.7 will be released, enough time will have passed since qemu 1.1, which is the first version to understand version 3 images, that changing the default shouldn't hurt many people any more and the benefits of using the new format outweigh the pain. qemu-iotests already runs with compat=1.1 by default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-08-30 15:28:51 +02:00
Stefan Hajnoczi	b10577df13	win32-aio: drop win32_aio_flush_cb() The io_flush argument to qemu_aio_set_event_notifier() has been removed since the block layer learnt to drain requests by itself. Fix the Windows build for win32-aio.o by updating the qemu_aio_set_event_notifier() call and dropping win32_aio_flush_cb(). Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 22:05:04 +02:00
Alex Bligh	bc72ad6754	aio / timers: Switch entire codebase to the new timer API This is an autogenerated patch using scripts/switch-timer-api. Switch the entire code base to using the new timer API. Note this patch may introduce some line length issues. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Alex Bligh	7483d1e547	aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API Convert block_job_sleep_ns and co_sleep_ns to use the new timer API. Signed-off-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 19:14:24 +02:00
Paolo Bonzini	04d542c8b8	vmdk: support vmfs files VMware ESX hosts also use different create and extent types for flat files, respectively "vmfs" and "VMFS". This is not documented, but it can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing virtual machine disk (VMDK) descriptor file). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	daac8fdc68	vmdk: support vmfsSparse files VMware ESX hosts use a variant of the VMDK3 format, identified by the vmfsSparse create type ad the VMFSSPARSE extent type. It has 16 KB grain tables (L2) and a variable-size grain directory (L1). In addition, the grain size is always 512, but that is not a problem because it is included in the header. The format of the extents is documented in the VMDK spec. The format of the descriptor file is not documented precisely, but it can be found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual machine disk (VMDK) descriptor file for delta disks). With these patches, vmfsSparse files only work if opened through the descriptor file. Data files without descriptor files, as far as I could understand, are not supported by ESX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> -- v2: Rebase to patch 01. Change le64_to_cpu to le32_to_cpu. Rename vmdk_open_vmdk3 to vmdk_open_vmfs_sparse, which represents the current usage of this format. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	f6b61e54bd	vmdk: fix L1 and L2 table size in vmdk3 open VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the value. This patch honors the header field. And the L2 table size is 4096 according to VMDK spec[1], instead of 1 << 9 (512). [1]: http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Fam Zheng	b0651b8c24	vmdk: Move l1_size check into vmdk_add_extent() This header check is common to VMDK3 and VMDK4, so move it into vmdk_add_extent(). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 15:35:58 +02:00
Asias He	0d51b4debe	block: Introduce bs->zero_beyond_eof In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when protocols reading beyond end of file), we break qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to disable ->zero_beyond_eof temporarily in addition to enable ->growable. [Since the broken patch "block: Produce zeros when protocols reading beyond end of file" has not been merged yet, I have applied this fix first and will then apply the next patch to keep the tree bisectable. -- Stefan] Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-22 14:10:21 +02:00
Kevin Wolf	8ad1898cf1	qcow2: Change default for new images to compat=1.1 By the time that qemu 1.7 will be released, enough time will have passed since qemu 1.1, which is the first version to understand version 3 images, that changing the default shouldn't hurt many people any more and the benefits of using the new format outweigh the pain. qemu-iotests already runs with compat=1.1 by default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-21 14:41:09 +02:00
Stefan Hajnoczi	f2e5dca46b	aio: drop io_flush argument The .io_flush() handler no longer exists and has no users. Drop the io_flush argument to aio_set_fd_handler() and related functions. The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no longer used and are dropped too. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	f0d3576599	block/ssh: drop return_true() .io_flush() is no longer called so drop return_true(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	d6d94c6785	block/sheepdog: drop have_co_req() and aio_flush_request() .io_flush() is no longer called so drop have_co_req() and aio_flush_request(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	5d289cc724	block/rbd: drop qemu_rbd_aio_flush_cb() .io_flush() is no longer called so drop qemu_rbd_aio_flush_cb(). qemu_aio_count is unused now so drop it too. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	bed2e759eb	block/nbd: drop nbd_have_request() .io_flush() is no longer called so drop nbd_have_request(). We cannot drop in_flight since it is still used by other block/nbd.c code. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	94473d0c06	block/linux-aio: drop qemu_laio_completion_cb() .io_flush() is no longer called so drop qemu_laio_completion_cb(). It turns out that count is now unused so drop that too. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	70ecdc6e4e	block/iscsi: drop iscsi_process_flush() .io_flush() is no longer called so drop iscsi_process_flush(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:52:19 +02:00
Stefan Hajnoczi	372835fbc3	block/gluster: drop qemu_gluster_aio_flush_cb() Since .io_flush() is no longer called we do not need qemu_gluster_aio_flush_cb() anymore. It turns out that qemu_aio_count is unused now and can be dropped. Thanks to Bharata B Rao <bharata@linux.vnet.ibm.com> for catching a build failure with CONFIG_GLUSTERFS_DISCARD, which has been fixed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:51:09 +02:00
Stefan Hajnoczi	0d1460226f	block/curl: drop curl_aio_flush() .io_flush() is no longer called so drop curl_aio_flush(). The acb[] array that the function checks is still used in other parts of block/curl.c. Therefore we cannot remove acb[], it is needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:45:35 +02:00
Stefan Hajnoczi	88266f5aa7	block: stop relying on io_flush() in bdrv_drain_all() If a block driver has no file descriptors to monitor but there are still active requests, it can return 1 from .io_flush(). This is used to spin during synchronous I/O. Stop relying on .io_flush() and instead check QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active requests. This is the first step in removing .io_flush() so that event loops no longer need to have the concept of synchronous I/O. Eventually we may be able to kill synchronous I/O completely by running everything in a coroutine, but that is future work. Note this patch moves bs->throttled_reqs initialization to bdrv_new() so that bdrv_requests_pending(bs) can safely access it. In practice bs is g_malloc0() so the memory is already zeroed but it's safer to initialize the queue properly. We also need to fix up block/stream.c:close_unused_images() to prevent traversing a dangling pointer while it rearranges the backing file chain. This is necessary since the new bdrv_drain_all() traverses the backing file chain. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-08-19 15:45:34 +02:00
Paolo Bonzini	7748c1bd50	raw: add license header Most of the block layer is under the BSD license, thus it is reasonable to license block/raw.c the same way. CCed people should ACK by replying with a Signed-off-by line. Cc: Christoph Hellwig <hch@lst.de> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1375251592-2537-2-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-08-12 09:15:11 -05:00
Fam Zheng	ca8804ced9	vmdk: rename num_gtes_per_gte to num_gtes_per_gt num_gtes_per_gte is a historical typo, rename it to a more sensible name. It means "number of GrainTableEntries per GrainTable". Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	bf81507de3	vmdk: use heap allocation for whole_grain We should never grow the stack beyond 1 MB, otherwise we'll fall off the end. Thread stacks and coroutine stacks (1 MB) do not grow. get_cluster_offset() allocates a big stack offset, it will fail for big cluster images, change to heap allocated buffer. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	2c43e43c8c	vmdk: check l1 size before opening image L1 table size is calculated from capacity, granularity and l2 table size. If capacity is too big or later two are too small, the L1 table will be too big to allocate in memory. Limit it to a reasonable range. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	f8ce04036e	vmdk: check l2 table size when opening header.num_gtes_per_gte determines size for L2 table. Check for too big value before using it. Limit to 512M entries (2GB per one L2 table). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	8aa1331c09	vmdk: check granularity field in opening Granularity is used to calculate the cluster size and allocate r/w buffer. Check the value from image before using it, so we don't abort() for unbounded memory allocation. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	e98768d437	vmdk: use unsigned values for on disk header fields The size and offset fields are all non-negative values, use uint64_t for them to avoid getting negative in memory value by int overflow. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Fam Zheng	5d8caa543c	vmdk: Make VMDK3Header and VmdkGrainMarker QEMU_PACKED It's best to make it consistent that all on disk structures are QEMU_PACKED. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 15:27:32 +02:00
Liu Yuan	e4f5c1bf8f	sheepdog: add missing .bdrv_has_zero_init Commit `3ac21627` changed the behaviour of bdrv_has_zero_init() to default to 0. In the review for Sheepdog it turned out that enabling it is safe, so that commit updated one BlockDriver definition of sheepdog to use bdrv_has_zero_init_1, missed however that there are more BlockDrivers in the driver. Fix these now. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-06 10:41:56 +02:00
Fam Zheng	8e50724313	vmdk: fix comment for vmdk_co_write_zeroes The comment was truncated. Add the missing parts, especially explain why we need zero_dry_run. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-08-02 18:07:04 +04:00
Richard W.M. Jones	f5075224d6	block/iscsi.c: Fix printf format error. The error on armv7hl was: block/iscsi.c: In function ‘is_request_lun_aligned’: block/iscsi.c:251:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Werror=format=] iscsilun->block_size, sector_num, nb_sectors); ^ This also splits the long line to comply with qemu coding guidelines. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-08-02 18:02:48 +04:00
Peter Maydell	2440a2c3df	block/sheepdog: Rename 'dprintf' to 'DPRINTF' 'dprintf' is the name of a POSIX standard function so we should not be stealing it for our debug macro. Rename to 'DPRINTF' (in line with a number of other source files.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Acked-by: Richard Henderson <rth@twiddle.net> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1375100199-13934-2-git-send-email-peter.maydell@linaro.org Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 19:33:53 -05:00
Anthony Liguori	eddbf0ab9d	Merge remote-tracking branch 'stefanha/block' into staging # By Stefan Hajnoczi (4) and others # Via Stefan Hajnoczi * stefanha/block: dataplane: refuse to start if device is already in use dataplane: enable virtio-blk x-data-plane=on live migration migration: fix spice migration migration: notify migration state before starting thread block: Repair the throttling code. gluster: Add image resize support Message-id: 1375112172-24863-1-git-send-email-stefanha@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 11:33:48 -05:00
Paolo Bonzini	42ec24e285	gluster: Add image resize support Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend can support image resizing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-29 17:07:37 +02:00
Stefan Weil	52f350227f	misc: Fix new typos in comments and strings All these typos were found by codespell. sould -> should emperical -> empirical intialization -> initialization successfuly -> successfully gaurantee -> guarantee Fix also another error (before before) in the same context. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-07-27 11:22:54 +04:00
Ian Main	fc5d3f8432	Implement sync modes for drive-backup. This patch adds sync-modes to the drive-backup interface and implements the FULL, NONE and TOP modes of synchronization. FULL performs as before copying the entire contents of the drive while preserving the point-in-time using CoW. NONE only copies new writes to the target drive. TOP copies changes to the topmost drive image and preserves the point-in-time using CoW. For sync mode TOP are creating a new target image using the same backing file as the original disk image. Then any new data that has been laid on top of it since creation is copied in the main backup_run() loop. There is an extra check in the 'TOP' case so that we don't bother to copy all the data of the backing file as it already exists in the target. This is where the bdrv_co_is_allocated() is used to determine if the data exists in the topmost layer or below. Also any new data being written is intercepted via the write_notifier hook which ends up calling backup_do_cow() to copy old data out before it gets overwritten. For mode 'NONE' we create the new target image and only copy in the original data from the disk image starting from the time the call was made. This preserves the point in time data by only copying the parts that are going to change to the target image. This way we can reconstruct the final image by checking to see if the given block exists in the new target image first, and if it does not, you can get it from the original image. This is basically an optimization allowing you to do point-in-time snapshots with low overhead vs the 'FULL' version. Since there is no old data to copy out the loop in backup_run() for the NONE case just calls qemu_coroutine_yield() which only wakes up after an event (usually cancel in this case). The rest is handled by the before_write notifier which again calls backup_do_cow() to write out the old data so it can be preserved. Signed-off-by: Ian Main <imain@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-07-26 22:01:31 +02:00
Kevin Wolf	64aa99d3e0	qcow2: Use dashes instead of underscores in options This is what QMP wants to use. The options haven't been enabled in any release yet, so we're still free to change them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-26 21:59:56 +02:00
Peter Lieven	a23fdf3559	block/raw: add .bdrv_get_info Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 15:27:37 +08:00
Peter Lieven	8bf9344ad6	block/raw: add bdrv_co_write_zeroes Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Fam Zheng	78f27bd02c	block: fix vvfat error path for enable_write_target s->qcow and s->qcow_filename are allocated but not freed on error. Fix the possible leaks, remove unnecessary check for bdrv_new(), propagate ret code of bdrv_create() and also the one of enable_write_target(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Bharata B Rao	0c14fb47ec	gluster: Add discard support for GlusterFS block driver. Implement bdrv_aio_discard for gluster. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-19 12:29:21 +08:00
Peter Lieven	0777b5dde4	iscsi: factor out sector conversions Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	91bea4e2bb	iscsi: assert that sectors are aligned to LUN blocksize if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE it is possible that sector_num or nb_sectors are not correctly aligned. to avoid corruption we fail requests which are misaligned. Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	7e4d5a9f94	iscsi: remove support for misaligned nb_sectors in aio_readv this hask is not working (anymore). support for misaligned offsets should be handled at the block layer. Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:01:41 +02:00
Peter Lieven	d3bda7bc16	iscsi: fix -ENOSPC in iscsi_create() the -ENOPSC case did not work due to the missing goto. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:00:28 +02:00
Ronnie Sahlberg	0a53f01074	Fix iSCSI crash on SG_IO with an iovector Don't assume that SG_IO is always invoked with a simple buffer, check the iovec_count and if it is >= 1 then we need to pass an array of iovectors to libiscsi instead of just a plain buffer. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-07-17 17:00:26 +02:00
Kevin Wolf	98289620e0	block: Don't parse protocol from file.filename One of the major reasons for doing something new for -blockdev and blockdev-add was that the old block layer code parses filenames instead of just taking them literally. So we should really leave it untouched when it's passing using the new interfaces (like -drive file.filename=...). This allows opening relative file names that contain a colon. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-07-15 09:49:00 +02:00
Fam Zheng	3494d65027	curl: refuse to open URL from HTTP server without range support CURL driver requests partial data from server on guest IO req. For HTTP and HTTPS, it uses "Range: **" in requests, and this will not work if server not accepting range. This patch does this check when open. Removed curl_size_cb, which is not used: On one hand it's registered to libcurl as CURLOPT_WRITEFUNCTION, instead of CURLOPT_HEADERFUNCTION, which will get called with data, not header. On the other hand the s->len is assigned unconditionally later. In this gone function, the sscanf for "Content-Length: %zd", on (void )ptr, which is not guaranteed to be zero-terminated, is potentially a security bug. So this patch fixes it as a side-effect. The bug is reported as: https://bugs.launchpad.net/qemu/+bug/1188943 (Note the bug is marked "private" so you might not be able to see it) Introduced curl_header_cb, which is used to parse header and mark the server as accepting range if "Accept-Ranges: bytes" line is seen from response header. If protocol is HTTP or HTTPS, but server response has no not this support, refuse to open this URL. Note that python builtin module SimpleHTTPServer is an example of not supporting range, if you need to test this driver, get a better server or use internet URLs. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-05 09:40:18 +02:00
Fam Zheng	da7a50f938	vmdk: Implement .bdrv_has_zero_init Depending on the subformat, has_zero_init queries underlying storage for flat extent. If it has a flat extent and its underlying storage doesn't have zero init, return 0. Otherwise return 1. Aligns the operator assignments. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-07-05 09:40:18 +02:00
Peter Lieven	3ac216270a	block: change default of .has_zero_init to 0 .has_zero_init defaults to 1 for all formats and protocols. this is a dangerous default since this means that all new added drivers need to manually overwrite it to 0 if they do not ensure that a device is zero initialized after bdrv_create(). if a driver needs to explicitly set this value to 1 its easier to verify the correctness in the review process. during review of the existing drivers it turned out that ssh and gluster had a wrong default of 1. both protocols support host_devices as backend which are not by default zero initialized. this wrong assumption will lead to possible corruption if qemu-img convert is used to write to such a backend. vpc and vmdk also defaulted to 1 altough they support fixed respectively flat extends. this has to be addresses in separate patches. both formats as well as the mentioned ssh and gluster are turned to the default of 0 with this patch for safety. a similar problem with the wrong default existed for iscsi most likely because the driver developer did oversee the default value of 1. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 13:52:35 +02:00
Kevin Wolf	72c6cc94da	vpc: Implement .bdrv_has_zero_init Depending on the subformat, has_zero_init on VHD must behave like raw and query the underlying storage (fixed) or like other sparse formats that can always return 1 (dynamic, differencing). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 10:21:00 +02:00
Fam Zheng	8ed610a1c9	vmdk: remove wrong calculation of relative path When creating image with backing file, the driver tries to calculate the relative path from created image file to backing file, but the path computation is incorrect. e.g.: $ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1 Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240 backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off $ qemu-img info vmdk-data-snapshot1 image: vmdk-data-snapshot1 file format: vmdk virtual size: 10G (10737418240 bytes) disk size: 12K -> backing file: disk.vmdk The common part in file names, "vmdk-data-", is incorrectly forgotten by relative_path(). As the VMDK specification has no restriction on parentNameHint to be relative path, we simply remove this by using the backing_file option. Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Kevin Wolf	8ab6feec2c	gluster: Return bdrv_has_zero_init = 0 GlusterFS volumes can be backed by block devices, in which case bdrv_create() doesn't make sure that the image is zeroed out. It is currently not possibly to detect whether a given image is backed by a file or a block device, and incorrectly assuming that it is zeroed corrupts images during qemu-img convert, so let's err on the side of caution and always return 0. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Richard W.M. Jones	0b3f21e6a9	block/ssh: Set bdrv_has_zero_init according to the file type. If the remote is a regular file, set it to true (ie. reads of uninitialized areas in a newly created file will return zeroes). If we can't prove that, return false (a safe default). Tested by adding a debugging print statement [not part of this commit] and creating a remote file and a remote block device: $ ./qemu-img create ssh://localhost/tmp/new 100M Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600 filename ssh://localhost/tmp/new: has_zero_init = 1 $ sudo lvcreate -L 1G -n tmp /dev/fedora Logical volume "tmp" created $ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824 filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0 Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Kevin Wolf	f59fee8d50	block: Make BlockJobTypes const Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:27 +02:00
Dietmar Maurer	98d2c6f2cd	block: add basic backup support to block driver backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing ->total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. * Use trace events instead of DPRINTF() -- stefanha] Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-28 09:20:26 +02:00
Kevin Wolf	a5c5ea3f60	raw-posix: Fix /dev/cdrom magic on OS X The raw-posix driver has code to provide a /dev/cdrom on OS X even though it doesn't really exist. However, since commit `c66a6157` the real filename is dismissed after finding it, so opening /dev/cdrom fails. Put the filename back into the options QDict to make this work again. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-28 09:20:26 +02:00
Fam Zheng	96c51eb5e4	vmdk: refuse to open higher version than supported Refuse to open higher version for safety. Although we try to be compatible with published VMDK spec, VMware has newer version from ESXi 5.1 exported OVF/OVA, which we have no knowledge what's changed in it. And it is very likely to have more new versions in the future, so it's not safe to open them blindly. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:43 +02:00
Kevin Wolf	0b919fae31	qcow2: Batch discards This optimises the discard operation for freed clusters by batching discard requests (both snapshot deletion and bdrv_discard end up updating the refcounts cluster by cluster). Note that we don't discard asynchronously, but keep s->lock held. This is to avoid that a freed cluster is reallocated and written to while the discard is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	67af674e47	qcow2: Options to enable discard for freed clusters Deleted snapshots are discarded in the image file by default, discard requests take their default from the -drive discard=... option and other places that free clusters must always be enabled explicitly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Kevin Wolf	6cfcb9b8b9	qcow2: Add refcount update reason to all callers This adds a refcount update reason to all callers of update_refcounts(), so that a follow-up patch can use this information to decide whether clusters that reach a refcount of 0 should be discarded in the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-24 10:25:17 +02:00
Anthony Liguori	3ed8a8430a	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Paolo Bonzini (3) and others # Via Paolo Bonzini * bonzini/scsi-next: iscsi: reorganize iscsi_readcapacity_sync iscsi: simplify freeing of tasks vhost-scsi: fix k->set_guest_notifiers() NULL dereference scsi-disk: scsi-block device for scsi pass-through should not be removable scsi-generic: check the return value of bdrv_aio_ioctl in execute_command scsi-generic: fix sign extension of READ CAPACITY(10) data scsi: reset cdrom tray statuses on scsi_disk_reset Message-id: 1371565016-2643-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-06-18 10:06:47 -05:00
Paolo Bonzini	1288844e7c	iscsi: reorganize iscsi_readcapacity_sync Avoid the goto, and use the same retry logic for the 10- and 16- byte versions. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 12:43:03 +02:00
Paolo Bonzini	f0d2a4d4d6	iscsi: simplify freeing of tasks Always free them in the iscsi_aio_*_acb functions and remove the checks in their callers. Remove ifs when the task struct was previously dereferenced (spotted by Coverity). Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 12:43:03 +02:00
Ján Tomko	2330790879	nbd: strip braces from literal IPv6 address in URI Otherwise they would get passed to getaddrinfo and fail with: address resolution failed for [::1]🔢 Name or service not known (Broken by commit v1.4.0-736-gf17c90b) Signed-off-by: Ján Tomko <jtomko@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-06-18 11:43:00 +02:00
Anthony Liguori	21a885a7e2	Merge remote-tracking branch 'luiz/queue/qmp' into staging # By Luiz Capitulino # Via Luiz Capitulino * luiz/queue/qmp: qerror: drop QERR_OPEN_FILE_FAILED macro block: bdrv_reopen_prepare(): don't use QERR_OPEN_FILE_FAILED savevm: qmp_xen_save_devices_state(): use error_setg_file_open() dump: qmp_dump_guest_memory(): use error_setg_file_open() cpus: use error_setg_file_open() blockdev: use error_setg_file_open() block: mirror_complete(): use error_setg_file_open() rng-random: use error_setg_file_open() error: add error_setg_file_open() helper Message-id: 1371484631-29510-1-git-send-email-lcapitulino@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-06-17 13:14:46 -05:00
Evgeny Budilovsky	0bed087df2	vmdk: Allow reading variable size descriptor files the hard-coded 2k buffer on the stack won't allow reading big descriptor files which can be generated when storing big images. For example 500G vmdk splitted to 2G chunks. Signed-off-by: Evgeny Budilovsky <evgeny.budilovsky@ravellosystems.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Richard W.M. Jones	8da1aa15db	curl: Don't set curl options on the handle just before it's going to be deleted. (Found by Kamil Dudka) Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Stefan Hajnoczi	5a394b9e96	vmdk: byteswap VMDK4Header.desc_offset field Remember to byteswap VMDK4Header.desc_offset on big-endian machines. Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Richard W.M. Jones	a7cea2ba47	block/curl.c: Refuse to open the handle for writes. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:59 +02:00
Liu Yuan	cede621ffc	sheepdog: support 'qemu-img snapshot -a' Just call sd_create_branch() in the snapshot_goto to rollback the image is good enough. With this patch, 'loadvm' process for sheepdog is modified: Suppose we have a snapshot chain A --> B --> C, we do 'loadvm A' so as to get a new chain, A --> B \| V C1 in the old code: 1 reload inode of A (in snapshot_goto) 2 read vmstate via A's vdi_id (loadvm_state) 3 delete C and create C1, reload inode of C1 (sd_create_branch on write) with this patch applied: 1 reload inode of A, delete C and create C1 (in snapshot_goto) 2 read vmstate via C1's parent, that is A's vdi_id (loadvm_state) This will fix the possible bug that QEMU exit between 2 and 3 in the old code Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:58 +02:00
Liu Yuan	b579ffb3fd	sheepdog: fix snapshot tag initialization This is an old and obvious bug. We should pass snapshot_id to the tag. Or simple command like 'qemu-img snapshot -a tag sheepdog:image' will fail Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 17:47:58 +02:00
Luiz Capitulino	dacc26aae5	block: mirror_complete(): use error_setg_file_open() Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2013-06-17 11:01:14 -04:00
Richard W.M. Jones	9e5e2b23d3	curl: Whitespace only changes. Trivial patch to remove odd whitespace. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-06-11 23:45:43 +04:00
Wenchao Xia	553a7e8718	qmp: add ImageInfo in BlockDeviceInfo used by query-block Now image info will be retrieved as an embbed json object inside BlockDeviceInfo, backing chain info and all related internal snapshot info can be got in the enhanced recursive structure of ImageInfo. New recursive member *backing-image is added to reflect the backing chain status. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Wenchao Xia	43526ec8d1	block: add image info query function bdrv_query_image_info() This patch adds function bdrv_query_image_info(), which will retrieve image info in qmp object format. The implementation is based on the code moved from qemu-img.c, but uses block layer function to get snapshot info. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Wenchao Xia	fb0ed4539c	block: add snapshot info query function bdrv_query_snapshot_info_list() This patch adds function bdrv_query_snapshot_info_list(), which will retrieve snapshot info of an image in qmp object format. The implementation is based on the code moved from qemu-img.c with modification to fit more for qmp based block layer API. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-07 13:37:45 +02:00
Kevin Wolf	bf736fe34c	blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-06-06 11:27:22 +02:00
Wenchao Xia	5b91704469	block: dump snapshot and image info to specified output bdrv_snapshot_dump() and bdrv_image_info_dump() do not dump to a buffer now, some internal buffers are still used for format control, which have no chance to be truncated. As a result, these two functions have no more issue of truncation, and they can be used by both qemu and qemu-img with correct parameter specified. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Wenchao Xia	f364ec65b5	block: move qmp and info dump related code to block/qapi.c This patch is a pure code move patch, except following modification: 1 get_human_readable_size() is changed to static function. 2 dump_human_image_info() is renamed to bdrv_image_info_dump(). 3 in qmp_query_block() and qmp_query_blockstats, use bdrv_next(bs) instead of direct traverse of global array 'bdrv_states'. 4 collect_snapshots() and collect_image_info() are renamed, unused parameter *fmt in collect_image_info() is removed. 5 code style fix. To avoid conflict and tip better, macro in header file is BLOCK_QAPI_H instead of QAPI_H. Now block.h and snapshot.h are at the same level in include path, block_int.h and qapi.h will both include them. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Wenchao Xia	de08c606f9	block: move snapshot code in block.c to block/snapshot.c All snapshot related code, except bdrv_snapshot_dump() and bdrv_is_snapshot(), is moved to block/snapshot.c. bdrv_snapshot_dump() will be moved to another file later. bdrv_is_snapshot() is not related with internal snapshot. It also fixes small code style errors reported by check script. Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-06-04 13:56:30 +02:00
Qiao Nuohan	ce3a4718fe	Remove twice include of qemu-common.h This patch is used to remove twice include of "qemu-common.h" in block/win32-aio.c Signed-off-by: Qiao Nuohan <qiaonuohan@cn.fujitsu.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-05-18 16:35:11 +04:00
Kevin Wolf	2cf7cfa1cd	qcow2: Catch some L1 table index overflows This catches the situation that is described in the bug report at https://bugs.launchpad.net/qemu/+bug/865518 and goes like this: $ qemu-img create -f qcow2 huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-io /tmp/huge.qcow2 -c "write $((102410241024102410241024 - 1024)) 512" Segmentation fault With this patch applied the segfault will be avoided, however the case will still fail, though gracefully: $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((10241024))T Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off qemu-img: The image size is too large for file format 'qcow2' Note that even long before these overflow checks kick in, you get insanely high memory usage (up to INT_MAX sizeof(uint64_t) = 16 GB for the L1 table), so with somewhat smaller image sizes you'll probably see qemu aborting for a failed g_malloc(). If you need huge image sizes, you should increase the cluster size to the maximum of 2 MB in order to get higher limits. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-14 16:44:33 +02:00
Dong Xu Wang	c7e775e4dd	remove double semicolons Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2013-05-12 13:25:55 +04:00
Fam Zheng	cdeaf1f159	vmdk: add bdrv_co_write_zeroes Use special offset to write zeroes efficiently, when zeroed-grain GTE is available. If zero-write an allocated cluster, cluster is leaked because its offset pointer is overwritten by "0x1". Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:49 +02:00
Fam Zheng	e304e8e5a0	vmdk: store fields of VmdkMetaData in cpu endian Previously VmdkMetaData.offset is stored little endian while other fields are cpu endian. This changes offset to cpu endian and convert before writing to image. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:46 +02:00
Fam Zheng	95b0aa4231	vmdk: change magic number to macro Two hard coded flag bits are changed to macros. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:43 +02:00
Fam Zheng	69e0b6dfa4	vmdk: Add option to create zeroed-grain image Add image create option "zeroed-grain" to enable zeroed-grain GTE feature of vmdk sparse extents. When this option is on, header version of newly created extent will be 2 and VMDK4_FLAG_ZERO_GRAIN flag bit will be set. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:41 +02:00
Fam Zheng	14ead646fe	vmdk: add support for “zeroed‐grain” GTE Introduced support for zeroed-grain GTE, as specified in Virtual Disk Format 5.0[1]. Recent VMware hosted platform products support a new “zeroed‐grain” grain table entry (GTE). The zeroed‐grain GTE returns all zeros on read. In other words, the zeroed‐grain GTE indicates that a grain in the child disk is zero‐filled but does not actually occupy space in storage. A sparse extent with zeroed‐grain GTE has the following in its header: * SparseExtentHeader.version = 2 * SparseExtentHeader.flags has bit 2 set Other than the new flag and the possibly zeroed‐grain GTE, version 2 sparse extents are identical to version 1. Also, a zeroed‐grain GTE has value 0x1 in the GT table. [1] Virtual Disk Format 5.0, http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:38 +02:00
Fam Zheng	65f7472577	vmdk: named return code. Internal routines in vmdk.c previously return -1 on error and 0 on success. More return values are useful for future changes such as zeroed-grain GTE. Change all the magic `return 0` and `return -1` to macro names: * VMDK_OK 0 * VMDK_ERROR (-1) * VMDK_UNALLOC (-2) * VMDK_ZEROED (-3) Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:33:35 +02:00
Jeff Cody	059e2fbbca	block: add read-only support to VHDX image format. This adds in read-only support to the VHDX image format. This supports reads for fixed-size, and dynamic sized VHDX images. Differencing files are still unsupported. The image must be opened without BDRV_O_RDWR set, because we do not yet update the headers. I.e., pass 'readonly=on' in the drive image options from the QEMU commandline. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Jeff Cody	e8d4e5ffdb	block: initial VHDX driver support framework - supports open and probe This is the initial block driver framework for VHDX image support (i.e. Hyper-V image file formats), that supports opening VHDX files, and parsing the headers. This commit does not yet enable: - reading - writing - updating the header - differencing files (images with parents) - log replay / dirty logs (only clean images) This is based on Microsoft's VHDX specification: "VHDX Format Specification v0.95", published 4/12/2012 https://www.microsoft.com/en-us/download/details.aspx?id=29681 Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Jeff Cody	203cdba3bc	block: vhdx header for the QEMU support of VHDX images This is based on Microsoft's VHDX specification: "VHDX Format Specification v0.95", published 4/12/2012 https://www.microsoft.com/en-us/download/details.aspx?id=29681 These structures define the various header, metadata, and other block structures defined in the VHDX specification. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-05-03 10:31:58 +02:00
Liu Yuan	859e5553a4	sheepdog: fix loadvm operation Currently the 'loadvm' opertaion works as following: 1. switch to the snapshot 2. mark current working VDI as a snapshot 3. rely on sd_create_branch to create a new working VDI based on the snapshot This works not the same as other format as QCOW2. For e.g, qemu > savevm # get a live snapshot snap1 qemu > savevm # snap2 qemu > loadvm 1 # This will steally create snap3 of the working VDI Which will result in following snapshot chain: base <-- snap1 <-- snap2 <-- snap3 ^ \| working VDI snap3 was unnecessarily created and might be annoying users. This patch discard the unnecessary 'snap3' creation. and implement rollback(loadvm) operation to the specified snapshot by 1. switch to the snapshot 2. delete working VDI 3. rely on sd_create_branch to create a new working VDI based on the snapshot The snapshot chain for above example will be: base <-- snap1 <-- snap2 ^ \| working VDI Cc: qemu-devel@nongnu.org Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:37:51 +02:00
MORITA Kazutaka	13c31de2fd	sheepdog: resend write requests when SD_RES_READONLY is received When a snapshot is taken from out side of qemu (e.g. qemu-img snapshot), write requests to the current vdi return SD_RES_READONLY. In this case, the sheepdog block driver needs to update the current inode to the latest one and resend the write requests. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	9ff53a0eb8	sheepdog: add helper function to reload inode This adds a helper function to update the current inode state with the specified vdi object. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	6a0b549033	sheepdog: add SD_RES_READONLY result code Sheepdog returns SD_RES_READONLY when qemu sends write requests to the snapshot vdi. This adds the result code and makes sd_strerror() print its error reason. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
MORITA Kazutaka	982dcbf4cb	sheepdog: cleanup find_vdi_name This makes 'filename' and 'tag' constant variables, and renames 'for_snapshot' to 'lock' to clear how it works. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Kevin Wolf	c3ca988d2b	rbd: Fix use after free in rbd_open() Commit `a9ccedc3` frees the QemuOpts for the driver-specific options immediately, even though it still needs the filename string that is contained there. This doesn't work. Move the deletion of the QemuOpts to the end of the function where its content isn't needed any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	8d71c63137	sheepdog: implement .bdrv_co_is_allocated() Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	e8bfaa2fae	sheepdog: use BDRV_SECTOR_SIZE Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:28 +02:00
Liu Yuan	cac8f4a60f	sheepdog: add discard/trim support for sheepdog The 'TRIM' command from VM that is to release underlying data storage for better thin-provision is already supported by the Sheepdog. This patch adds the TRIM support at QEMU part. For older Sheepdog that doesn't support it, we return 0(success) to upper layer. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-26 13:26:27 +02:00
Anthony Liguori	f1ab7a5acf	Merge remote-tracking branch 'kwolf/for-anthony' into staging # By Kevin Wolf (16) and Stefan Hajnoczi (4) # Via Kevin Wolf * kwolf/for-anthony: qemu-iotests: add 053 unaligned compressed image size test block: Allow overriding backing.file.filename block: Remove filename parameter from .bdrv_file_open() vvfat: Use bdrv_open options instead of filename sheepdog: Use bdrv_open options instead of filename rbd: Use bdrv_open options instead of filename iscsi: Use bdrv_open options instead of filename gluster: Use bdrv_open options instead of filename curl: Use bdrv_open options instead of filename blkverify: Use bdrv_open options instead of filename blkdebug: Use bdrv_open options instead of filename raw-win32: Use bdrv_open options instead of filename raw-posix: Use bdrv_open options instead of filename block: Enable filename option block: Add driver-specific options for backing files block: Fail gracefully when using a format driver on protocol level qemu-iotests: Fix _filter_qemu qemu-img: do not zero-pad the compressed write buffer qcow: allow sub-cluster compressed write to last cluster qcow2: allow sub-cluster compressed write to last cluster Message-id: 1366630294-18984-1-git-send-email-kwolf@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-04-22 08:08:22 -05:00
Anthony Liguori	25690739f1	Merge remote-tracking branch 'bonzini/nbd-next' into staging # By Stefan Hajnoczi # Via Paolo Bonzini * bonzini/nbd-next: nbd: set TCP_NODELAY nbd: use TCP_CORK in nbd_co_send_request() nbd: unlock mutex in nbd_co_send_request() error path Message-id: 1366381830-11267-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-04-22 08:05:14 -05:00
Kevin Wolf	56d1b4d21d	block: Remove filename parameter from .bdrv_file_open() It is unused now in all block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 11:34:35 +02:00
Kevin Wolf	7ad9be64e8	vvfat: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	c8c96350e0	sheepdog: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	a9ccedc3da	rbd: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	60beb3412d	iscsi: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	b489477653	gluster: Use bdrv_open options instead of filename This is only to convert the internal interface that is used for passing the "filename" to be parsed, but converting to actual fine grained options is left for another day, as it doesn't look trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	8e6d58cd5b	curl: Use bdrv_open options instead of filename As a bonus, going through the QemuOpts QEMU_OPT_SIZE parser for the readahead option gives us proper error reporting that the previous use of atoi() lacked. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	16c790926b	blkverify: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	f468121290	blkdebug: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	8a79380b8e	raw-win32: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	c66a615723	raw-posix: Use bdrv_open options instead of filename Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Kevin Wolf	31ca6d077c	block: Add driver-specific options for backing files Options starting in "backing." are passed to the backing file now. If you don't need to specify the filename for the backing file, you can add it on the command line instead of in the image file: $ qemu-nbd -t /tmp/test.img $ qemu-img create -f qcow2 empty.qcow2 1G $ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\ backing.file.host=localhost Note that this doesn't override the backing filename from the image. If the image has one, this will fail because NBD doesn't want the options and a filename at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-04-22 10:27:59 +02:00
Stefan Hajnoczi	16b3c5cd9f	qcow: allow sub-cluster compressed write to last cluster Compression in qcow requires image length to be a multiple of the cluster size. Lift this requirement by zero-padding the final cluster when necessary. The virtual disk size is still not cluster-aligned, so the guest cannot access the zero sectors. Note that this is almost identical to the qcow2 version of this code. qcow2's compression code is drawn from qcow. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-22 10:27:58 +02:00
Stefan Hajnoczi	f4d38bef7c	qcow2: allow sub-cluster compressed write to last cluster Compression in qcow2 requires image length to be a multiple of the cluster size. Lift this requirement by zero-padding the final cluster when necessary. The virtual disk size is still not cluster-aligned, so the guest cannot access the zero sectors. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-22 10:27:58 +02:00
Richard W.M. Jones	c7a101f529	ssh: Remove unnecessary use of strlen function. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-19 11:45:38 +02:00
Stefan Weil	6ae7d660a0	block/ssh: Add missing gcc format attributes Now gcc will check whether format string and variable arguments match. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-19 11:44:16 +02:00
Stefan Hajnoczi	97ebbab0e3	nbd: set TCP_NODELAY Disable the Nagle algorithm to reduce latency. Note this means we must also use TCP_CORK when sending header followed by payload to avoid fragmenting lots of little packets. The previous patch took care of that. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Tested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:35:17 +02:00
Stefan Hajnoczi	0fcece25c0	nbd: use TCP_CORK in nbd_co_send_request() Use TCP_CORK to defer packet transmission until both the header and the payload have been written. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:30:40 +02:00
Stefan Hajnoczi	6760c47aa4	nbd: unlock mutex in nbd_co_send_request() error path Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:29:46 +02:00
Josh Durgin	dc7588c1eb	rbd: add an asynchronous flush The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is sychronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	9a2d462e7b	block: ssh: Use libssh2_sftp_fsync (if supported by libssh2) to flush to disk. libssh2_sftp_fsync is an extension to libssh2 to support fsync(2) over sftp, which is itself an extension of OpenSSH. If both libssh2 and the ssh daemon support it, this will allow bdrv_flush_to_disk to commit changes through to disk on the remote server. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	0a12ec87a5	block: Add support for Secure Shell (ssh) block device. qemu-system-x86_64 -drive file=ssh://hostname/some/image QEMU will ssh into 'hostname' and open '/some/image' which is made available as a standard block device. You can specify a username (ssh://user@host/...) and/or a port number (ssh://host:port/...). You can also use an alternate syntax using properties (file.user, file.host, file.port, file.path). Current limitations: - Authentication must be done without passwords or passphrases, using ssh-agent. Other authentication methods are not supported. - Uses a single connection, instead of concurrent AIO with multiple SSH connections. This is implemented using libssh2 on the client side. The server just requires a regular ssh daemon with sftp-server support. Most ssh daemons on Unix/Linux systems will work out of the box. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Kevin Wolf	8d3b1a2d0b	block: Introduce bdrv_pwritev() for qcow2_save_vmstate Directly pass the QEMUIOVector on instead of linearising it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Kevin Wolf	cf8074b382	block: Introduce bdrv_writev_vmstate Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Aurelien Jarno	753d9b82c5	aes: move aes.h from include/block to include/qemu Move aes.h from include/block to include/qemu to show it can be reused by other subsystems. Cc: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2013-04-13 13:51:57 +02:00
Paolo Bonzini	0d09e41a51	hw: move headers to include/ Many of these should be cleaned up with proper qdev-/QOM-ification. Right now there are many catch-all headers in include/hw/ARCH depending on cpu.h, and this makes it necessary to compile these files per-target. However, fixing this does not belong in these patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-08 18:13:10 +02:00
Kevin Wolf	c2b6ff51e4	qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcount It ignored the error code, and at least the 'goto fail' is obvious nonsense as it creates an endless loop (if the next attempt doesn't magically succeed) and leaves the in-memory L1 table in big-endian instead of converting it back. In error cases, there's no point in writing an updated L1 table, so skip this part for them. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Kevin Wolf	c2bc78b6a9	qcow2: Return real error in qcow2_update_snapshot_refcount This fixes the error message triggered by the following script: cat > /tmp/blkdebug.cfg <<EOF [inject-error] event = "cluster_free" errno = "28" immediately = "off" EOF $qemu_img create -f qcow2 test.qcow2 10G $qemu_img snapshot -c snap test.qcow2 $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Stefan Hajnoczi	f9e8cacc55	oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock() The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2013-04-02 11:47:37 -04:00
Kevin Wolf	ecdd5333ab	qcow2: Gather clusters in a looping loop Instead of just checking once in exactly this order if there are dependendies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous. Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this allows to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters). This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticably. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	2c3b32d256	qcow2: Move cluster gathering to a non-looping loop This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break is inserted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	88c6588c51	qcow2: Allow requests with multiple l2metas Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	710c2496d8	qcow2: Use byte granularity in qcow2_alloc_cluster_offset() This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine. This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	411d62b04b	qcow2: Prepare handle_alloc/copied() for byte granularity This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	e62daaf679	qcow2: handle_copied(): Implement non-zero host_offset Look only for clusters that start at a given physical offset. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c53ede9f6d	qcow2: handle_copied(): Get rid of keep_clusters parameter Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	acb0467f8d	qcow2: handle_copied(): Get rid of nb_clusters parameter handle_copied() uses its bytes parameter now to determine how many clusters it should try to find. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	0af729ec00	qcow2: Factor out handle_copied() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	83baa9a471	qcow2: Clean up handle_alloc() Things can be simplified a bit now. No semantic changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c37f4cd71d	qcow2: Finalise interface of handle_alloc() The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	3b8e2e260c	qcow2: handle_alloc(): Get rid of keep_clusters parameter handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	f5bc635094	qcow2: handle_alloc(): Get rid of nb_clusters parameter We already communicate the same information in *bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	10f0ed8b2f	qcow2: Factor out handle_alloc() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	037689d896	qcow2: Decouple cluster allocation from cluster reuse code This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	65eb2e35c0	qcow2: Change handle_dependency to byte granularity This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	d9d74f4177	qcow2: Improve check for overlapping allocations The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	17a71e5823	qcow2: Handle dependencies earlier Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	9ee6439e27	qcow2: Remove bogus unlock of s->lock The unlock wakes up the next coroutine, but the currently running coroutine will lock it again before it yields, so this doesn't make a lot of sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	c349ca4bb2	qcow2: Fix "total clusters" number in bdrv_check This should be based on the virtual disk size, not on the size of the image. Interesting observation: With some VM state stored in the image file, percentages higher than 100% are possible, even though snapshots themselves are ignored. This is a qcow2 bug to be fixed another day: The VM state should be discarded in the active L2 tables after completing the snapshot creation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Stefan Weil	ea804cadf8	block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build) The new parameter is unused yet. This part was missing in commit `787e4a8500`. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:53:04 +01:00
Liu Yuan	d43731c758	rbd: fix compile error Commit `787e4a85` [block: Add options QDict to bdrv_file_open() prototypes] didn't update rbd.c accordingly. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:51:43 +01:00
Kevin Wolf	681e7ad024	nbd: Check against invalid option combinations A file name may only specified if no host or socket path is specified. The latter two may not appear at the same time either. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	bebbf7fa9c	nbd: Use default port if only host is specified The URL method already takes care to apply the default port when none is specfied. Directly specifying driver-specific options required the port number until now. Allow leaving it out and apply the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f5866fa438	block: Make find_image_format safe with NULL filename In order to achieve this, the .bdrv_probe callbacks of all drivers must cope with this. The DMG driver is the only one that bases its decision on the filename and it needs to be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	6963a30d82	block: Introduce .bdrv_parse_filename callback If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f53a1febcd	nbd: Accept -drive options for the network connection The existing parsers for the file name now parse everything into the bdrv_open() options QDict. Instead of using these parsers, you can now directly specify the options on the command line, like this: qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1 Clearly the file=... part could use further improvement, but it's a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f17c90bed1	nbd: Keep hostname and port separate The NBD block supports an URL syntax, for which a URL parser returns separate hostname and port fields. It also supports the traditional qemu syntax encoded in a filename. Until now, after parsing the URL to get each piece of information, a new string is built to be fed to socket functions. Instead of building a string in the URL case that is immediately parsed again, parse the string in both cases and use the QemuOpts interface to qemu-sockets.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	787e4a8500	block: Add options QDict to bdrv_file_open() prototypes The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	acdfb480ba	qcow2: Fix segfault in qcow2_invalidate_cache Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Liu Yuan	fca23f0ad2	sheepdog: show error message for halt status Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when number of alive nodes is less than that of copies specified by users. This will return 0x19 to QEMU client which currently doesn't recognize it. This patch adds an error description when QEMU client receives it, other than plainly printing 'Invalid error code' Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Stefan Hajnoczi	c4d9d19645	threadpool: drop global thread pool Now that each AioContext has a ThreadPool and the main loop AioContext can be fetched with bdrv_get_aio_context(), we can eliminate the concept of a global thread pool from thread-pool.c. The submit functions must take a ThreadPool* argument. block/raw-posix.c and block/raw-win32.c use aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's ThreadPool. tests/test-thread-pool.c must be updated to reflect the new thread_pool_submit() function prototypes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-15 16:07:51 +01:00
MORITA Kazutaka	ed9ba72467	sheepdog: set io_flush handler in do_co_req If an io_flush handler is not set, qemu_aio_wait doesn't invoke callbacks. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
MORITA Kazutaka	0d6db300cd	sheepdog: use non-blocking fd in coroutine context Using a blocking socket in the coroutine context reduces the chance of switching to other work. This patch makes the sheepdog driver use a non-blocking fd always. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Paolo Bonzini	381b487d54	qcow2: make is_allocated return true for zero clusters Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed. QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	3647917919	qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount() We already flush when the function completes. There is no need to flush after every compressed cluster. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f9cb2860bd	qcow2: drop flush in update_cluster_refcount() The update_cluster_refcount() function increments/decrements a cluster's refcount and then returns the new refcount value. There is no need to flush since both update_cluster_refcount() callers already take care of this: 1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed sectors will be appended to an existing cluster with enough free space. qcow2_alloc_bytes() already flushes so there is no need to do so in update_cluster_refcount(). 2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts if it needs to update L2 entries. It also flushes before completing. Removing this flush significantly speeds up qcow2 snapshot creation: $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata $ time qemu-img snapshot -c new test.qcow2 Time drops from more than 3 minutes to under 1 second. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	2154f24e4e	qcow2: flush in qcow2_update_snapshot_refcount() Users of qcow2_update_snapshot_refcount() do not flush consistently. qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and qcow2_snapshot_delete() do not. Solve this by moving the bdrv_flush() into qcow2_update_snapshot_refcount(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	c1f5bafd70	qcow2: set L2 cache dependency in qcow2_alloc_bytes() Compressed writes use qcow2_alloc_bytes() to allocate space with byte granularity. The affected clusters' refcounts will be incremented but we do not need to flush yet. Set a L2 cache dependency on the refcount block cache, so that the refcounts get written out before the L2 updates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f6977f1556	qcow2: flush refcount cache correctly in qcow2_write_snapshots() Since qcow2 metadata is cached we need to flush the caches, not just the underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file). Also add the error return path when bdrv_flush() fails and move the flush after checking for qcow2_alloc_clusters() failure so that the qcow2_alloc_clusters() error return value takes precedence. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	9991923b26	qcow2: flush refcount cache correctly in alloc_refcount_block() update_refcount() affects the refcount cache, it does not write to disk. Therefore bdrv_flush(bs->file) does nothing. We need to flush the refcount cache in order to write out the refcount updates! While we're here also add error returns when qcow2_cache_flush() fails. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	74c4510a3c	qcow2: Allow lazy refcounts to be enabled on the command line qcow2 images now accept a boolean lazy_refcounts options. Use it like this: -drive file=test.qcow2,lazy_refcounts=on If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	de9c0cec6c	block: Add options QDict to bdrv_open() prototype It doesn't do anything yet except storing the options QDict in the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	1a86938f04	block: Add options QDict to .bdrv_open() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Peter Lieven	cb1b83e740	iscsi: add iscsi_truncate support this patch adds iscsi_truncate which effectively allows for online resizing of iscsi volumes. for this to work you have to resize the volume on your storage and then call block_resize command in qemu which will issue a readcapacity16 to update the capacity. v4: - factor out complete readcapacity logic into a separate function - handle capacity change check condition in readcapacity function (this happens if the block_resize cmd is the first iscsi task executed after a resize on the storage) v3: - remove switch statement in iscsi_open - create separate patch for brdv_drain_all() in bdrv_truncate() v2: - add a general bdrv_drain_all() before bdrv_truncate() to avoid in-flight AIOs while the device is truncated - since no AIOs are in flight we can use a sync libiscsi call to re-read the capacity - factor out the readcapacity16 logic as it is redundant to iscsi_open() and iscsi_truncate(). Signed-off-by: Peter Lieven <pl@kamp.de> [allow any type of unit attention check condition in iscsi_readcapacity_sync(), as in Message-ID: <51263A2A.6070304@dlhnet.de> - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-05 17:51:51 +01:00
Peter Lieven	1dde716ed6	iscsi: retry read, write, flush and unmap on unit attention check conditions the storage might return a check condition status for various reasons. (e.g. bus reset, capacity change, thin-provisioning info etc.) currently all these informative status responses lead to an I/O error which is populated to the guest. this patch introduces a retry mechanism to avoid this. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-05 17:51:50 +01:00
MORITA Kazutaka	1b8bbb46e7	sheepdog: add support for connecting to unix domain socket This patch adds support for a unix domain socket for a connection between qemu and local sheepdog server. You can use the unix domain socket with the following syntax: $ qemu sheepdog+unix:///<vdiname>?socket=<socket path>[#snapid] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	25af257d21	sheepdog: use inet_connect to simplify connect code This uses the form "<host>:<port>" for the representation of the sheepdog server to use inet_connect. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	5d6768e3b8	sheepdog: accept URIs The URI syntax is consistent with the NBD and Gluster syntax. The syntax is sheepdog[+tcp]://[host:port]/vdiname[#snapid\|#tag] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	bf1c852aa9	move socket_set_nodelay to osdep.c Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
Stefan Hajnoczi	4db35162ea	qcow2: support compressed clusters in BlockFragInfo Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Stefan Hajnoczi	fba31bae2d	qcow2: record fragmentation statistics during check The qemu-img check command can display fragmentation statistics: * Total number of clusters in virtual disk * Number of allocated clusters * Number of fragmented clusters This patch adds fragmentation statistics support to qcow2. Compressed and normal clusters count as allocated. Zero clusters are not counted as allocated unless their L2 entry has a non-zero offset (e.g. preallocation). Only the current L1 table counts towards the statistics - snapshots are ignored. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Stefan Hajnoczi	801f704452	qcow2: introduce check_refcounts_l1/l2() flags The check_refcounts_l1/l2() functions have a check_copied argument to check that the QCOW_O_COPIED flag is consistent with refcount == 1. This should be a bool, not an int. However, the next patch introduces qcow2 fragmentation statistics and also needs to pass an option to check_refcounts_l1/l2(). This is a good opportunity to use an int flags field. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Federico Simoncelli	c6bb9ad198	qemu-img: find the image end offset during check This patch adds the support for reporting the image end offset (in bytes). This is particularly useful after a conversion (or a rebase) where the destination is a block device in order to find the first unused byte at the end of the image. Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-02-22 21:21:08 +01:00
Stefan Hajnoczi	8a8f584008	block/curl: only restrict protocols with libcurl>=7.19.4 The curl_easy_setopt(state->curl, CURLOPT_PROTOCOLS, ...) interface was introduced in libcurl 7.19.4. Therefore we cannot protect against CVE-2013-0249 when linking against an older libcurl. This fixes the build failure introduced by `fb6d1bbd24`. Reported-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Andreas Färber <andreas.faeber@web.de> Message-id: 1360743934-8337-1-git-send-email-stefanha@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-13 11:57:35 -06:00
Stefan Hajnoczi	33ccf6675f	Revert "block/vpc: Fix size calculation" This reverts commit `f880defbb0`. Jeff Cody's testing revealed that the interpretation of size differs even between VirtualPC and HyperV. Revert this so there is time to consider the impact of any backwards incompatible behavior this change creates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-12 12:25:15 +01:00
Stefan Hajnoczi	da888d37b0	block/raw-posix: detect readonly Linux block devices using BLKROGET Linux block devices can be set read-only with "blockdev --setro <device>". The same thing can be done for LVM volumes using "lvchange --permission r <volume>". This read-only setting is independent of device node permissions. Therefore the device can still be opened O_RDWR but actual writes will fail. This results in odd behavior for QEMU. bdrv_open() is supposed to fail if a read-only image is being opened with BDRV_O_RDWR. By not failing for Linux block devices, the guest boots up but every write produces an I/O error. This patch checks whether the block device is read-only so that Linux block devices behave like regular files. Reported-by: Sibiao Luo <sluo@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-02-12 12:22:49 +01:00
Stefan Weil	f880defbb0	block/vpc: Fix size calculation The size calculated from the CHS values is not the real image (disk) size, but usually a smaller value. This is caused by rounding effects. Only older operating systems use CHS. Such guests won't be able to use the whole disk. All modern operating systems use the real size. This patch fixes https://bugs.launchpad.net/qemu/+bug/1105670/. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 1360265212-22037-1-git-send-email-sw@weilnetz.de Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-11 08:14:41 -06:00
Markus Armbruster	312fd5f290	error: Strip trailing '\n' from error string arguments (again) Commit `6daf194d` and `be62a2eb` got rid of a bunch, but they keep coming back. Tracked down with this Coccinelle semantic patch: @r@ expression err, eno, cls, fmt; position p; @@ ( error_report(fmt, ...)@p \| error_set(err, cls, fmt, ...)@p \| error_set_errno(err, eno, cls, fmt, ...)@p \| error_setg(err, fmt, ...)@p \| error_setg_errno(err, eno, fmt, ...)@p ) @script:python@ fmt << r.fmt; p << r.p; @@ if "\\n" in str(fmt): print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-11 08:13:19 -06:00
Stefan Hajnoczi	fb6d1bbd24	block/curl: disable extra protocols to prevent CVE-2013-0249 There is a buffer overflow in libcurl POP3/SMTP/IMAP. The workaround is simple: disable extra protocols so that they cannot be exploited. Full details here: http://curl.haxx.se/docs/adv_20130206.html QEMU only cares about HTTP, HTTPS, FTP, FTPS, and TFTP. I have tested that this fix prevents the exploit on my host with libcurl-7.27.0-5.fc18. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-08 11:14:20 -06:00
Andreas Färber	fdf263f63f	block/raw-posix: Build fix for O_ASYNC Commit `eeb6b45d48` (block: raw-posix image file reopen) broke the build on OpenIndiana. illumos has no O_ASYNC. Exclude it from flags to be compared and instead assert that it is not set where defined. Cf. `e61ab1da7e` for qemu-ga. Cc: qemu-stable@nongnu.org (1.3.x) Cc: Jeff Cody <jcody@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 15:11:12 +01:00
Philipp Hahn	cd92347575	vmdk: Allow space in file name The previous scanf() format string stopped parsing the file name on the first white white space, which seems to be allowed at least by VMware Workstation. Change the format string to collect everything between the first and second quote as the file name, disallowing line breaks. Signed-off-by: Philipp Hahn <hahn@univention.de> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	46536235d8	parallels: Fix bdrv_open() error handling Return -errno instead of -1 on errors. Hey, no memory leak to fix here while we're touching it! Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	4f8aa2e19f	dmg: Use g_free instead of free The buffers are allocated with g_(re)alloc, so use g_free to free them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	69d34a360d	dmg: Fix bdrv_open() error handling Return -errno instead of -1 on errors and add error checks in some places that didn't have one. Passing things by reference requires more correct typing, replaced a few off_ts therefore - with a 32-bit off_t this is even a fix for truncation bugs. While touching the code, fix even some more memory leaks than in the other drivers... Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	59294e4659	vpc: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Kevin Wolf	1a60657f57	cloop: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Kevin Wolf	5b7d7dfd19	bochs: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Liu Yuan	6f74c260b4	sheepdog: pass vdi_id to sheep daemon for sd_close() Sheep daemon needs vdi_id to identify which vdi is closed to release resources such as object cache. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Othmar Pasteka	7f2039f611	vmdk: Allow selecting SCSI adapter in image creation Introduce a new option "adapter_type" when converting to vmdk images. It can be one of the following: ide (default), buslogic, lsilogic or legacyESX (according to the vmdk spec from vmware). In case of a non-ide adapter, heads is set to 255 instead of the 16. The latter is used for "ide". Also see LP#545089 Signed-off-by: Othmar Pasteka <pasteka@kabsi.at> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Markus Armbruster	6528499fa4	g_malloc(0) and g_malloc0(0) return NULL; simplify Once upon a time, it was decided that qemu_malloc(0) should abort. Switching to glib retired that bright idea. Some code that was added to cope with it (e.g. in commits `702ef63`, `b76b6e9`) is still around. Bury it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-30 11:14:46 +01:00
Paolo Bonzini	88ff0e48ee	mirror: do nothing on zero-sized disk On a zero-sized disk we need to break out of the job successfully before bdrv_dirty_iter_init is called, otherwise you will get an assertion failure with the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	0e87ba2ccb	block/vdi: Check for bad signature vdi_open did not check for a bad signature. This check was only in vdi_probe. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	8937f8222c	block/vdi: Improved return values from vdi_open vdi_open returned -1 in case of any error, but it should return an error code (negative value of errno or -EMEDIUMTYPE). Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	9f0470bb2d	block/vdi: Improve debug output for signature The signature is a 32 bit value and needs up to 8 hex digits for printing. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	15bac0d54f	block: Use error code EMEDIUMTYPE for wrong format in some block drivers This improves error reports for bochs, cow, qcow, qcow2, qed and vmdk when a file with the wrong format is selected. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	884fea4e87	mirror: support arbitrarily-sized iterations Yet another optimization is to extend the mirroring iteration to include more adjacent dirty blocks. This limits the number of I/O operations and makes mirroring efficient even with a small granularity. Most of the infrastructure is already in place; we only need to put a loop around the computation of the origin and sector count of the iteration. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	402a47411b	mirror: support more than one in-flight AIO operation With AIO support in place, we can start copying more than one chunk in parallel. This patch introduces the required infrastructure for this: the buffer is split into multiple granularity-sized chunks, and there is a free list to access them. Because of copy-on-write, a single operation may already require multiple chunks to be available on the free list. In addition, two different iterations on the HBitmap may want to copy the same cluster. We avoid this by keeping a bitmap of in-flight I/O operations, and blocking until the previous iteration completes. This should be a pretty rare occurrence, though; as long as there is no overlap the next iteration can start before the previous one finishes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	08e4ed6cde	mirror: add buf-size argument to drive-mirror This makes sense when the next commit starts using the extra buffer space to perform many I/O operations asynchronously. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	bd48bde8f0	mirror: switch mirror_iteration to AIO There is really no change in the behavior of the job here, since there is still a maximum of one in-flight I/O operation between the source and the target. However, this patch already introduces the AIO callbacks (which are unmodified in the next patch) and some of the logic to count in-flight operations and only complete the job when there is none. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	eee13dfe30	mirror: allow customizing the granularity The desired granularity may be very different depending on the kind of operation (e.g. continuous replication vs. collapse-to-raw) and whether the VM is expected to perform lots of I/O while mirroring is in progress. Allow the user to customize it, while providing a sane default so that in general there will be no extra allocated space in the target compared to the source. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	50717e941b	block: allow customizing the granularity of the dirty bitmap Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	acc906c6c5	block: return count of dirty sectors, not chunks Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	b812f6719c	mirror: perform COW if the cluster size is bigger than the granularity When mirroring runs, the backing files for the target may not yet be ready. However, this means that a copy-on-write operation on the target would fill the missing sectors with zeros. Copy-on-write only happens if the granularity of the dirty bitmap is smaller than the cluster size (and only for clusters that are allocated in the source after the job has started copying). So far, the granularity was fixed to 1MB; to avoid the problem we detected the situation and required the backing files to be available in that case only. However, we want to lower the granularity for efficiency, so we need a better solution. The solution is to always copy a whole cluster the first time it is touched. The code keeps a bitmap of clusters that have already been allocated by the mirroring job, and only does "manual" copy-on-write if the chunk being copied is zero in the bitmap. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	8f0720ecbc	block: implement dirty bitmap using HBitmap This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter. Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Peter Lieven	7371d56fb2	iscsi: add support for iovectors This patch adds support for directly passing the iovec array from QEMUIOVector if libiscsi supports it (1.8.0 or newer). Signed-off-by: Peter Lieven <pl@kamp.de> [Preserve the improvements from commit `4cc841b`, iscsi: partly avoid iovec linearization in iscsi_aio_writev, 2012-11-19 - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-24 15:37:55 +01:00
Paolo Bonzini	4790b03d30	iscsi: do not leak acb->buf when commands are aborted acb->buf is freed in the WRITE(16) callback, but this may not get called at all when commands are aborted. Add another free in the ABORT TASK callback, which requires setting acb->buf to NULL everywhere. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-24 15:37:55 +01:00
Anthony Liguori	177f7fc688	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Peter Lieven (3) and others # Via Paolo Bonzini * bonzini/scsi-next: scsi: Drop useless null test in scsi_unit_attention() lsi: use qbus_reset_all to reset SCSI bus scsi: fix segfault with 0-byte disk iscsi: add support for iSCSI NOPs [v2] iscsi: partly avoid iovec linearization in iscsi_aio_writev iscsi: add iscsi_create support	2013-01-23 09:08:54 -06:00
Peter Lieven	5b5d34ec98	iscsi: add support for iSCSI NOPs [v2] This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target. If a consecutive number of NOP-In replies fail a reconnect is initiated. iSCSI NOPs help to ensure that the connection to the target is still operational. This should not, but in reality may be the case even if the TCP connection is still alive if there are bugs in either the target or the initiator implementation. v2: - track the NOPs inside libiscsi so libiscsi can reset the counter in case it initiates a reconnect. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Peter Lieven	4cc841b57c	iscsi: partly avoid iovec linearization in iscsi_aio_writev libiscsi expects all write16 data in a linear buffer. If the iovec only contains one buffer we can skip the linearization step as well as the additional malloc/free and pass the buffer directly. Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Peter Lieven	de8864e5ae	iscsi: add iscsi_create support This patch adds support for bdrv_create. This allows e.g. to use qemu-img to convert from any supported device to an iscsi backed storage as destination. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Anthony Liguori	8b17ed4caa	Merge remote-tracking branch 'stefanha/block' into staging # By Kevin Wolf (4) and others # Via Stefan Hajnoczi * stefanha/block: dataplane: support viostor virtio-pci status bit setting dataplane: avoid reentrancy during virtio_blk_data_plane_stop() win32-aio: use iov utility functions instead of open-coding them win32-aio: Fix memory leak win32-aio: Fix vectored reads aio: Fix return value of aio_poll() ide: Remove wrong assertion block: fix null-pointer bug on error case in block commit	2013-01-20 11:01:10 -06:00
Andreas Färber	c36dd8a09f	block/raw-posix: Make hdev_aio_discard() available outside Linux Fixes the build on OpenBSD among others. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 14:35:02 +00:00
Michael Tokarev	3249dbe661	win32-aio: use iov utility functions instead of open-coding them We have iov_from_buf() and iov_to_buf(), use them instead of open-coding these in block/win32-aio.c Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-18 09:57:51 +01:00
Kevin Wolf	e8bccad5ac	win32-aio: Fix memory leak The buffer is allocated for both reads and writes, and obviously it should be freed even if an error occurs. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:58:09 +01:00
Kevin Wolf	bcbbd234d4	win32-aio: Fix vectored reads Copying data in the right direction really helps a lot! Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:57:13 +01:00
Jeff Cody	6d759117d3	block: fix null-pointer bug on error case in block commit This is a bug that was caught by a coverity run by Markus. In the error case when we errored out to exit_restore_open early in the function, 'overlay_bs' was still NULL at that point, although it is used to look up flags and perform a bdrv_reopen(). Move the overlay_bs lookup to where it is needed, and check for NULL before restoring the flags. Also get rid of the unneeded parameter initialization. Reported-By: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:51:11 +01:00
Markus Armbruster	7191bf311e	block: Fix how mirror_run() frees its buffer It allocates with qemu_blockalign(), therefore it must free with qemu_vfree(), not g_free(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 17:28:55 +01:00
Markus Armbruster	7479acdbce	win32-aio: Fix how win32_aio_process_completion() frees buffer win32_aio_submit() allocates it with qemu_blockalign(), therefore it must be freed with qemu_vfree(), not g_free(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 16:47:45 +01:00
Liu Yuan	f700f8e346	sheepdog: clean up sd_aio_setup() The last two parameters of sd_aio_setup() are never used, so remove them. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 13:40:10 +01:00
Liu Yuan	4778307278	sheepdog: multiplex the rw FD to flush cache This will reduce sockfds connected to the sheep server to one, which simply the future hacks. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 11:18:49 +01:00
Paolo Bonzini	8238010b26	block: make discard asynchronous This is easy with the thread pool, because we can use s->is_xfs and s->has_discard from the worker function. QEMU has a widespread assumption that each I/O operation writes less than 2^32 bytes. This patch doesn't fix it throughout of course, but it starts correcting struct RawPosixAIOData so that there is no regression with respect to the synchronous discard implementation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Paolo Bonzini	fcd9d45552	raw: support discard on block devices Block devices use a ioctl instead of fallocate, so add a separate implementation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Paolo Bonzini	c85191e5c9	raw-posix: remember whether discard failed Avoid sending system calls repeatedly if they shall fail. This does not apply to XFS: if the filesystem-specific ioctl fails, something weird is happening. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Kusanagi Kouichi	3d4fa43e64	raw-posix: support discard on more filesystems Linux 2.6.38 introduced the filesystem independent interface to deallocate part of a file. As of Linux 3.7, btrfs, ext4, ocfs2, tmpfs and xfs support it. Even though the system calls here are in practice issued on Linux, the code is structured to allow plugging in alternatives for other Unix variants. EOPNOTSUPP is used unconditionally in this patch, but it is supported in both OpenBSD and Mac OS X since forever (see for example http://lists.debian.org/debian-glibc/2006/02/msg00337.html). Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Kevin Wolf	8d2497c355	qcow2: Fix segfault on zero-length write One of the recent refactoring patches (commit `f50f88b9`) didn't take care to initialise l2meta properly, so with zero-length writes, which don't even enter the write loop, qemu just segfaulted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 09:08:55 +01:00
Anthony Liguori	da758bd7a3	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: dataplane: handle misaligned virtio-blk requests dataplane: extract virtio-blk read/write processing into do_rdwr_cmd() block: make qiov_is_aligned() public raw-posix: fix bdrv_aio_ioctl sheepdog: implement direct write semantics block: do not probe zero-sized disks Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-01-14 10:26:26 -06:00
Stefan Hajnoczi	c53b1c5114	block: make qiov_is_aligned() public The qiov_is_aligned() function checks whether a QEMUIOVector meets a BlockDriverState's alignment requirements. This is needed by virtio-blk-data-plane so: 1. Move the function from block/raw-posix.c to block/block.c. 2. Make it public in block/block.h. 3. Rename to bdrv_qiov_is_aligned(). 4. Change return type from int to bool. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Paolo Bonzini	b608c8dc02	raw-posix: fix bdrv_aio_ioctl When the raw-posix aio=thread code was moved from posix-aio-compat.c to block/raw-posix.c, there was an unintended change to the ioctl code. The code used to return the ioctl command, which posix_aio_read() would later morph into a zero. This hack is not necessary anymore, and in fact breaks scsi-generic (which expects a zero return code). Remove it. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Liu Yuan	0e7106d8b5	sheepdog: implement direct write semantics Sheepdog supports both writeback/writethrough write but has not yet supported DIRECTIO semantics which bypass the cache completely even if Sheepdog daemon is set up with cache enabled. Suppose cache is enabled on Sheepdog daemon size, the new cache control is cache=writeback # enable the writeback semantics for write cache=writethrough # enable the emulated writethrough semantics for write cache=directsync # disable cache competely Guest WCE toggling on the run time to toggle writeback/writethrough is also supported. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Paolo Bonzini	4d4545743f	qemu-option: move standard option definitions out of qemu-config.c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-12 17:17:53 +01:00
Stefan Weil	eb7ff6fb0b	Replace remaining gmtime, localtime by gmtime_r, localtime_r This allows removing of MinGW specific code and improves reentrancy for POSIX hosts. [Removed unused ret variable in qemu_get_timedate() to fix warning: vl.c: In function ‘qemu_get_timedate’: vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable] -- Stefan Hajnoczi] Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-11 09:44:37 +01:00
Liu Yuan	d6b1ef89a1	sheepdog: pass oid directly to send_pending_req() Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:09:00 +01:00
Liu Yuan	bd751f2204	sheepdog: don't update inode when create_and_write fails For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap to avoid the scenario that the object is allocated but wasn't created at the server side. This will result in VM's IO error on the failed object. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:08:58 +01:00
Stefan Weil	fccedc624c	block/raw-win32: Fix compiler warnings (wrong format specifiers) Commit `fbcad04d6b` added fprintf statements with wrong format specifiers. GetLastError() returns a DWORD which is unsigned long, so %lu must be used. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:08:57 +01:00
Stefan Hajnoczi	4065742ac0	raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane The raw_get_aio_fd() function allows virtio-blk-data-plane to get the file descriptor of a raw image file with Linux AIO enabled. This interface is really a layering violation that can be resolved once the block layer is able to run outside the global mutex - at that point virtio-blk-data-plane will switch from custom Linux AIO code to using the block layer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 15:31:39 +01:00
Paolo Bonzini	9c17d615a6	softmmu: move include files to include/sysemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:45 +01:00
Paolo Bonzini	1de7afc984	misc: move include files to include/qemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:39 +01:00
Paolo Bonzini	caf71f86a3	migration: move include files to include/migration/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:32 +01:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	7b1b5d1913	qapi: move include files to include/qobject/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	f8fe796407	janitor: do not include qemu-char everywhere Touching char/char.h basically causes the whole of QEMU to be rebuilt. Avoid this, it is usually unnecessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:59 +01:00
Paolo Bonzini	077805fa92	janitor: do not rely on indirect inclusions of or from qemu-char.h Various header files rely on qemu-char.h including qemu-config.h or main-loop.h, but they really do not need qemu-char.h at all (particularly interesting is the case of the block layer!). Clean this up, and also add missing inclusions of qemu-char.h itself. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:52 +01:00
Paolo Bonzini	525877c999	build: move rules from Makefile to */Makefile.objs Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:06 +01:00
Kevin Wolf	226c3c26b9	qcow2: Factor out handle_dependencies() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	4e95314e2b	qcow2: Execute run_dependent_requests() without lock There's no reason for run_dependent_requests() to hold s->lock, and a later patch will require that in fact the lock is not held. Also, before this patch, run_dependent_requests() not only does what its name suggests, but also removes the l2meta from the list of in-flight requests. When changing this, it becomes an one-liner, so just inline it completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	280d373579	qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 This is closer to where the dirty flag is really needed, and it avoids having checks for special cases related to cluster allocation directly in the writev loop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	f50f88b9fe	qcow2: Allocate l2meta only for cluster allocations Even for writes to already allocated clusters, an l2meta is allocated, though it stays effectively unused. After this patch, only allocating requests still have one. Each l2meta now describes an in-flight request that writes to clusters that are not yet hooked up in the L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	060bee8943	qcow2: Drop l2meta.cluster_offset There's no real reason to have an l2meta for normal requests that don't allocate anything. Before we can get rid of it, we must return the host cluster offset in a different way. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	cf5c1a231e	qcow2: Allocate l2meta dynamically As soon as delayed COW is introduced, the l2meta struct is needed even after completion of the request, so it can't live on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	593fb83cac	qcow2: Introduce Qcow2COWRegion This makes it easier to address the areas for which a COW must be performed. As a nice side effect, the COW code in qcow2_alloc_cluster_link_l2 becomes really trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	1d3afd649b	qcow2: Round QCowL2Meta.offset down to cluster boundary The offset within the cluster is already present as n_start and this is what the code uses. QCowL2Meta.offset is only needed at a cluster granularity. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	67a7a0ebe5	qcow2: Move BLKDBG_EVENT out of the lock We want to use these events to suspend requests for testing concurrent AIO requests. Suspending requests while they are holding the CoMutex is rather boring for this purpose. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	3c90c65d7a	blkdebug: Implement suspend/resume of AIO requests This allows more systematic AIO testing. The patch adds three new operations to blkdebug: * Setting a "breakpoint" on a blkdebug event. The next request that triggers this breakpoint is suspended and is tagged with a name. The breakpoint is removed after a request has triggered it. * A suspended request (identified by it's tag) can be resumed * It's possible to check whether a suspended request with a given tag exists. This can be used for waiting for an event. Ideally, we would instead tag requests right when they are created and set breakpoints for individual requests. However, at this point the block layer doesn't allow this easily, and breakpoints that trigger for any request already allow a lot of useful testing. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	9e35542b0f	blkdebug: Factor out remove_rule() The cleanup work to remove a rule depends on the type of the rule. It's easy for the existing rules as there is no data that must be cleaned up and is specific to a type yet, but the next patch will change this. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	312a2ba0eb	blkdebug: Allow usage without config file As soon as new rules can be set during runtime, as introduced by the next patch, blkdebug makes sense even without a config file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Fabien Chouteau	fbcad04d6b	Fix error code checking for SetFilePointer() call An error has occurred if the return value is invalid_set_file_pointer and getlasterror doesn't return no_error. Signed-off-by: Fabien Chouteau <chouteau@adacore.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-11 11:36:57 +01:00
Stefan Priebe	473c7f0255	rbd: Fix race between aio completition and aio cancel This one fixes a race which qemu had also in iscsi block driver between cancellation and io completition. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To archieve this it introduces a new status flag which uses -EINPROGRESS. Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-11 11:05:11 +01:00
Paolo Bonzini	c208e8c2d8	raw-posix: inline paio_ioctl into hdev_aio_ioctl clang now warns about an unused function: CC block/raw-posix.o block/raw-posix.c:707:26: warning: unused function paio_ioctl [-Wunused-function] static BlockDriverAIOCB paio_ioctl(BlockDriverState bs, int fd, ^ 1 warning generated. because the only use of paio_ioctl() is inside a #if defined(__linux__) guard and it is static now. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-11 11:04:26 +01:00
Charles Arnold	258d2edbcd	block: vpc support for ~2 TB disks The VHD specification allows for up to a 2 TB disk size. The current implementation in qemu emulates EIDE and ATA-2 hardware which only allows for up to 127 GB. This disk size limitation can be overridden by allowing up to 255 heads instead of the normal 4 bit limitation of 16. Doing so allows disk images to be created of up to nearly 2 TB. This change does not violate the VHD format specification nor does it change how smaller disks (ie, <=127GB) are defined. [Charles Arnold also writes: "In analyzing a 160 GB VHD fixed disk image created on Windows 2008 R2, it appears that MS is also ignoring the CHS values in the footer geometry field in whatever driver they use for accessing the image. The CHS values are set at 65535,16,255 which obviously doesn't represent an image size of 160 GB." -- Stefan] Signed-off-by: Charles Arnold <carnold@suse.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-11 11:04:26 +01:00
Charles Arnold	1fe1fa510a	block: vpc initialize the uuid footer field Initialize the uuid field in the footer with a generated uuid. Signed-off-by: Charles Arnold <carnold@suse.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-11 11:04:25 +01:00
Kevin Wolf	c57b6656c3	aio: Get rid of qemu_aio_flush() There are no remaining users, and new users should probably be using bdrv_drain_all() in the first place. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-11 11:04:25 +01:00
Peter Lieven	f807ecd574	iscsi: do not assume device is zero initialized Without any complex checks we can't assume that an iscsi target is initialized to zero. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-11-28 12:51:58 +01:00
Peter Lieven	e829b0bb05	iscsi: fix deadlock during login If the connection is interrupted before the first login is successfully completed qemu-kvm is waiting forever in qemu_aio_wait(). This is fixed by performing an sync login to the target. If the connection breaks after the first successful login errors are handled internally by libiscsi. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-11-28 12:50:56 +01:00
Peter Lieven	8da1e18b0c	iscsi: fix segfault in url parsing If an invalid URL is specified iscsi_get_error(iscsi) is called with iscsi == NULL. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-11-28 12:46:13 +01:00
Stefan Priebe	08448d5195	use int64_t for return values from rbd instead of int rbd / rados tends to return pretty often length of writes or discarded blocks. These values might be bigger than int. The steps to reproduce are: mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. Important is that you use scsi-hd and set discard_granularity=512. Otherwise rbd disabled discard support. Signed-off-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-11-21 09:43:23 +01:00
Stefan Hajnoczi	8ba2aae32c	vdi: don't override libuuid symbols It's poor symbol hygiene to provide a global symbols that collide with a common library like libuuid. If QEMU links against a shared library that depends on uuid_generate() it can end up calling our stub version of the function. This exact scenario happened with GlusterFS libgfapi.so, which depends on libglusterfs.so's uuid_generate(). Scope the uuid stubs for vdi.c only and avoid affecting other shared objects. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2012-11-21 09:40:29 +01:00
Jeff Cody	1bc6b705ee	block: add bdrv_reopen() support for raw hdev, floppy, and cdrom For hdev, floppy, and cdrom, the reopen() handlers are the same as for the file reopen handler. For floppy and cdrom types, however, we keep O_NONBLOCK, as in the _open function. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-11-21 09:40:29 +01:00
Gerhard Wiesinger	b1649fae49	vmdk: Fix data corruption bug in WRITE and READ handling Fixed a MAJOR BUG in VMDK files on file boundaries on reads and ALSO ON WRITES WHICH MIGHT CORRUPT THE IMAGE AND DATA!!!!!! Triggered for example with the following VMDK file (partly listed): RW 4193792 FLAT "XP-W1-f001.vmdk" 0 RW 2097664 FLAT "XP-W1-f002.vmdk" 0 RW 4193792 FLAT "XP-W1-f003.vmdk" 0 RW 512 FLAT "XP-W1-f004.vmdk" 0 RW 4193792 FLAT "XP-W1-f005.vmdk" 0 RW 2097664 FLAT "XP-W1-f006.vmdk" 0 RW 4193792 FLAT "XP-W1-f007.vmdk" 0 RW 512 FLAT "XP-W1-f008.vmdk" 0 Patch includes: 1.) Patch fixes wrong calculation on extent boundaries. Especially it fixes the relativeness of the sector number to the current extent. Verfied correctness with: 1.) Converted either with Virtualbox to VDI and then with qemu-img and then with qemu-img only: VBoxManage clonehd --format vdi /VM/XP-W/new/XP-W1.vmdk ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi ./qemu-img convert -O raw ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi /root/QEMU/VM-XP-W1/XP-W1-via-VBOX.img md5sum /root/QEMU/VM-XP-W/XP-W1-direct.img md5sum /root/QEMU/VM-XP-W/XP-W1-via-VBOX.img => same MD5 hash 2.) Verified debug log files 3.) Run Windows XP successfully 4.) chkdsk run successfully without any errors Signed-off-by: Gerhard Wiesinger <lists@wiesinger.com> Acked-by: Fam Zheng <famcool@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-11-14 18:19:23 +01:00
Stefan Hajnoczi	d7331bed11	aio: rename AIOPool to AIOCBInfo Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-11-14 18:19:21 +01:00
Stefan Weil	cee40d2d2d	block: Workaround for older versions of MinGW gcc Versions before gcc-4.6 don't support unnamed fields in initializers (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676). Offset and OffsetHigh belong to an unnamed struct which is part of an unnamed union. Therefore the original code does not work with older versions of gcc. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-11-14 18:19:21 +01:00
Kevin Wolf	a354807706	qcow2: Fix refcount table size calculation A missing factor for the refcount table entry size in the calculation could mean that too little memory was allocated for the in-memory representation of the table, resulting in a buffer overflow. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Tested-by: Michael Tokarev <mjt@tls.msk.ru>	2012-11-14 18:19:21 +01:00
Paolo Bonzini	1d7d2a9d21	nbd: accept URIs The URI syntax is consistent with the Gluster syntax. Export names are specified in the path, preceded by one or more (otherwise unused) slashes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-11-12 14:38:28 +01:00
Paolo Bonzini	d04b0bbbc9	nbd: accept relative path to Unix socket Adding the "is_unix" member now will simplify the parsing of NBD URIs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-11-12 11:33:29 +01:00
Paolo Bonzini	f563a5d7a8	Merge remote-tracking branch 'origin/master' into threadpool Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:42:51 +01:00
Paolo Bonzini	a27365265c	raw-win32: implement native asynchronous I/O With the new support for EventNotifiers in the AIO event loop, we can hook a completion port to every opened file and use asynchronous I/O on them. Wine's support is extremely inefficient, also because it really does the I/O synchronously on regular files. (!) But it works, and it is good to keep the Win32 and POSIX ports as similar as possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:13 +01:00
Paolo Bonzini	10fb6e0682	raw-posix: move linux-aio.c to block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:13 +01:00
Paolo Bonzini	fc4edb84bf	raw-win32: add emulated AIO support Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:13 +01:00
Paolo Bonzini	9f8540ecef	raw-posix: rename raw-posix-aio.h, hide unavailable prototypes Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:12 +01:00
Paolo Bonzini	de81a16936	raw: merge posix-aio-compat.c into block/raw-posix.c Making the qemu_paiocb specific to raw devices will let us access members of the BDRVRawState arbitrarily. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:12 +01:00
Paolo Bonzini	47e6b251a5	block: switch posix-aio-compat to threadpool This is not meant for portability, but to remove code duplication. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-10-31 10:38:12 +01:00

... 6 7 8 9 10 ...

1461 Commits