mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	3edf1e73d5	dmg: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	5cd230819e	cloop: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3b8fd33011	bochs: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3fb06697ae	block: Introduce .bdrv_co_preadv/pwritev BlockDriver function Many parts of the block layer are already byte granularity. The block driver interface, however, was still missing an interface that allows making use of this. This patch introduces a new BlockDriver interface, which is based on coroutines, vectored, has flags and uses a byte granularity. This is now the preferred interface for new drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	cab3a3563c	block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev It used to be an internal helper function just for implementing bdrv_co_do_readv/writev(), but now that it's a public interface, it deserves a name without "do" in it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	0884447382	block: Support AIO drivers in bdrv_driver_preadv/pwritev() Instead of registering emulation functions as .bdrv_co_writev, just directly check whether the function is there or not, and use the AIO interface if it isn't. This makes the read/write functions more consistent with how things are done in other places (flush, discard, etc.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	78a07294d5	block: Introduce bdrv_driver_pwritev() This is a function that simply calls into the block driver for doing a write, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. This one is a bit more interesting than the version for reads: It adds support for .bdrv_co_writev_flags() everywhere, so that drivers implementing this function can drop .bdrv_co_writev() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	166fe96051	block: Introduce bdrv_driver_preadv() This is a function that simply calls into the block driver for doing a read, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. For now, this is just a wrapper for calling bs->drv->bdrv_co_readv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	dd7f7ed104	linux-aio: make it more type safe Replace void* with an opaque LinuxAioState type. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	6b98bd6495	block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	b6e84c97ed	block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain Do not call bdrv_drain_recurse twice in bdrv_co_drain. A small tweak to the logic in Fam's patch, which is harmless since no one implements bdrv_drain anyway. But better get it right. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	a72f641407	block: move restarting of throttled reqs to block/throttle-groups.c We want to remove throttled_reqs from block/io.c. This is the easy part---hide the handling of throttled_reqs during disable/enable of throttling within throttle-groups.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	733bbc8cea	block: make bdrv_start_throttled_reqs return void The return value is unused and I am not sure why it would be useful. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	90c78624f1	block: Don't disable I/O throttling on sync requests We had to disable I/O throttling with synchronous requests because we didn't use to run timers in nested event loops when the code was introduced. This isn't true any more, and throttling works just fine even when using the synchronous API. The removed code is in fact dead code since commit `a8823a3b` ('block: Use blk_co_pwritev() for blk_write()') because I/O throttling can only be set on the top layer, but BlockBackend always uses the coroutine interface now instead of using the sync API emulation in block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Eric Blake	15c2f669e3	qapi: Split visit_end_struct() into pieces As mentioned in previous patches, we want to call visit_end_struct() functions unconditionally, so that visitors can release resources tied up since the matching visit_start_struct() without also having to worry about error priority if more than one error occurs. Even though error_propagate() can be safely used to ignore a second error during cleanup caused by a first error, it is simpler if the cleanup cannot set an error. So, split out the error checking portion (basically, input visitors checking for unvisited keys) into a new function visit_check_struct(), which can be safely skipped if any earlier errors are encountered, and leave the cleanup portion (which never fails, but must be called unconditionally if visit_start_struct() succeeded) in visit_end_struct(). Generated code in qapi-visit.c has diffs resembling: \|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v, \| goto out_obj; \| } \| visit_type_ACPIOSTInfo_members(v, obj, &err); \|- error_propagate(errp, err); \|- err = NULL; \|+ if (err) { \|+ goto out_obj; \|+ } \|+ visit_check_struct(v, &err); \| out_obj: \|- visit_end_struct(v, &err); \|+ visit_end_struct(v); \| out: and in qapi-event.c: @@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP \| goto out; \| } \| visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err); \|- visit_end_struct(v, err ? NULL : &err); \|+ if (!err) { \|+ visit_check_struct(v, &err); \|+ } \|+ visit_end_struct(v); \| if (err) { \| goto out; Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com> [Conflict with a doc fixup resolved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-05-12 09:47:55 +02:00
Kevin Wolf	d208c50d9d	vvfat: Fix default volume label Commit `d5941dd` documented that it leaves the default volume name as it was ("QEMU VVFAT"), but it doesn't actually implement this. You get an empty name (eleven space characters) instead. This fixes the implementation to apply the advertised default. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:13 +02:00
Kevin Wolf	ebb72c9f06	vvfat: Fix volume name assertion Commit `d5941dd` made the volume name configurable, but it didn't consider that the rw code compares the volume name string to assert that the first directory entry is the volume name. This made vvfat crash in rw mode. This fixes the assertion to compare with the configured volume name instead of a literal string. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:08 +02:00
Fam Zheng	ab27c3b5e7	mirror: Workaround for unexpected iohandler events during completion Commit `5a7e7a0ba` moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit `f3926945c8` made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, `3f09bfbc7b` added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-22 16:44:09 +02:00
Fam Zheng	4150ae60eb	mirror: Don't extend the last sub-chunk The last sub-chunk is rounded up to the copy granularity in the target image, resulting in a larger size than the source. Add a function to clip the copied sectors to the end. This undoes the "wrong" changes to tests/qemu-iotests/109.out in `e5b43573e2`. The remaining two offset changes are okay. [ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ] Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2016-04-20 16:52:55 +02:00
Max Reitz	f27a274259	block/mirror: Refresh stale bitmap iterator cache If the drive's dirty bitmap is dirtied while the mirror operation is running, the cache of the iterator used by the mirror code may become stale and not contain all dirty bits. This only becomes an issue if we are looking for contiguously dirty chunks on the drive. In that case, we can easily detect the discrepancy and just refresh the iterator if one occurs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-20 16:52:55 +02:00
Max Reitz	9c83625bdd	block/mirror: Revive dead yielding code mirror_iteration() is supposed to wait if the current chunk is subject to a still in-flight mirroring operation. However, it mixed checking this conflict situation with checking the dirty status of a chunk. A simplification for the latter condition (the first chunk encountered is always dirty) led to neglecting the former: We just skip the first chunk and thus never test whether it conflicts with an in-flight operation. To fix this, pull out the code which waits for in-flight operations on the first chunk of the range to be mirrored to settle. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-20 16:52:55 +02:00
Jeff Cody	d85fa9eb87	block/gluster: prevent data loss after i/o error Upon receiving an I/O error after an fsync, by default gluster will dump its cache. However, QEMU will retry the fsync, which is especially useful when encountering errors such as ENOSPC when using the werror=stop option. When using caching with gluster, however, the last written data will be lost upon encountering ENOSPC. Using the write-behind-cache xlator option of 'resync-failed-syncs-after-fsync' should cause gluster to retain the cached data after a failed fsync, so that ENOSPC and other transient errors are recoverable. Unfortunately, we have no way of knowing if the 'resync-failed-syncs-after-fsync' xlator option is supported, so for now close the fd and set the BDS driver to NULL upon fsync error. Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Jeff Cody	5d4343e6c2	block/gluster: code movement of qemu_gluster_close() Move qemu_gluster_close() further up in the file, in preparation for the next patch, to avoid a forward declaration. Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Jeff Cody	a882745356	block/gluster: return correct error value Upon error, gluster will call the aio callback function with a ret value of -1, with errno set to the proper error value. If we set the acb->ret value to the return value in the callback, that results in every error being EPERM (i.e. 1). Instead, set it to the proper error result. Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Kevin Wolf	16aaf975ee	block: Don't ignore flags in blk_{,co,aio}_write_zeroes() Commit `57d6a428` neglected to pass the given flags to blk_aio_prwv(), which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be ignored). Commit `fc1453cd` introduced the same bug for blk_write_zeroes(). This is used for 'qemu-img convert' without has_zero_init (e.g. on a block device) and for preallocation=falloc in parallels. Commit `8896e088` is the version for blk_co_write_zeroes(). This function is only used in qemu-io. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	9c057d0b68	block/vpc: update comments to be compliant w/coding guidelines Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	32f6439cf7	block/vpc: set errp in vpc_open Add more useful error information to failure paths in vpc_open Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	66176fc6a7	block/vpc: make checks on max table size a bit more lax The check on the max_table_size field not being larger than required is valid, and in accordance with the VHD spec. However, there have been VHD images encountered in the wild that have an out-of-spec max table size that is technically too large. There is no issue in allowing this larger table size, as we also later verify that the computed size (used for the pagetable) is large enough to fit all sectors. In addition, max_table_entries is bounds checked against SIZE_MAX and INT_MAX. Remove the strict check, so that we can accomodate these sorts of images that are benignly out of spec. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reported-by: Grant Wu <grantwwu@gmail.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	c23fb11bbb	block/vpc: Use the correct max sector count for VHD images The old VHD_MAX_SECTORS value is incorrect, and is a throwback to the CHS calculations. The VHD specification allows images up to 2040 GiB, which (using 512 byte sectors) corresponds to a maximum number of sectors of 0xff000000, rather than the old value of 0xfe0001ff. Update VHD_MAX_SECTORS to reflect the correct value. Also, update comment references to the actual size limit, and correct one compare so that we can have sizes up to the limit. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	bab246db1d	block/vpc: use current_size field for XenConverter VHD images XenConverter VHD images are another VHD image where current_size is different from the CHS values in the the format header. Use current_size as the default, by looking at the creator_app signature field. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Stefan Hajnoczi	9bdfb9e8ac	vpc: use current_size field for XenServer VHD images The vpc driver has two methods of determining virtual disk size. The correct one to use depends on the software that generated the image file. Add the XenServer creator_app signature so that image size is correctly detected for those images. Reported-by: Grant Wu <grantwwu@gmail.com> Reported-by: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	0211b9becc	block/vpc: set errp in vpc_create Add more useful error information to failure paths in vpc_create(). Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:11 +02:00
Kevin Wolf	7fa84cd8d4	block: Fix blk_aio_write_zeroes() Commit `57d6a428` broke blk_aio_write_zeroes() because in some write functions in the call path don't have an explicit length argument but reuse qiov->size instead. Which is great, except that write_zeroes doesn't have a qiov, which this commit interprets as 0 bytes. Consequently, blk_aio_write_zeroes() didn't effectively do anything. This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses that instead of acb->rwco.size. The synchronous version of the function is okay because it does pass a qiov (with the right size and a NULL pointer as its base). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-04-15 17:22:11 +02:00
Max Reitz	4e876bcf2b	qcow2: Prevent backing file names longer than 1023 We reject backing file names with a length of more than 1023 characters when opening a qcow2 file, so we should not produce such files ourselves. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-12 18:06:51 +02:00
Paolo Bonzini	40a99aace3	vpc: fix return value check for blk_pwrite bdrv_pwrite_sync used to return zero or negative error, while blk_pwrite returns the number of written bytes when successful. This caused VPC image creation to fail spectacularly: it wrote the first 512 bytes, and then exited immediately because of the non-zero answer from blk_pwrite. But the truly spectacular part is that it returns a positive value (the 512 that blk_pwrite returned) causing everyone to believe that it succeeded. This fixes qemu-iotests with vpc format. Fixes: `b8f45cdf78` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-12 18:06:51 +02:00
Fam Zheng	39bf92dd70	mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs) Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-11 16:59:09 +01:00
Fam Zheng	a77fd4bb29	block: Fix bdrv_drain in coroutine Using the nested aio_poll() in coroutine is a bad idea. This patch replaces the aio_poll loop in bdrv_drain with a BH, if called in coroutine. For example, the bdrv_drain() in mirror.c can hang when a guest issued request is pending on it in qemu_co_mutex_lock(). Mirror coroutine in this case has just finished a request, and the block job is about to complete. It calls bdrv_drain() which waits for the other coroutine to complete. The other coroutine is a scsi-disk request. The deadlock happens when the latter is in turn pending on the former to yield/terminate, in qemu_co_mutex_lock(). The state flow is as below (assuming a qcow2 image): mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain while (has tracked request) aio_poll() In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return because the mirror coroutine is blocked in the aio_poll(blocking=true). With this patch, the added qemu_coroutine_yield() allows the scsi-disk coroutine to make progress as expected: mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain.enter > schedule BH > qemu_coroutine_yield() > qcow2:qemu_co_mutex_lock.return > ... tracked request end ... (resumed from BH callback) bdrv_drain.return ... Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-11 16:59:09 +01:00
Peter Maydell	31370dbe5d	Block layer patches for 2.6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJXA9qJAAoJEH8JsnLIjy/W+W8P/2xTXH8+h0qurUvv5Rz6HUbD HHNlGnqa3M5yMLMqmtlb0J/9dj0pTlGNkp7d/9blh3MlZH/ZpeUMtq8Lro23YhUF 0J94sRKGhK3T5GqYSA/BFbVvXQJ3yX7cKcYaQjmh7rK6Ua+65Mv/dulci+jbfGuu BkiVgumGAalSeaFqXZR685g61ZHbz+mQJnd3VFcvletnPBu0j1GMkuU0THAcy09q CTUjwWlL9CHu1lYkAa0KxgFtj6mZ+gEu5ws5Lvk8yFtSB+af/mJtzoHuq/7+Ske2 7SiVXotFW8kR7ic1TnWiEku8+31FSBVJp6xUcRVTDOHVG7oSQxBDg5bGwPn8TxXy bvLvTJDIFodGhkiDTuGLuttvX+U2xCl4GmBS01OiFF53UGWgjjY+pkDZiaNC4nFW vwItj7/KGKL2Nq6cVfGCDOYYjFtHAPGI3yyJ2babXecv+9nKr0WeJpk6cfKVnP17 rZs28Y3Ub/P2M4oOt4YdhRSanQZbe5eIQOsdfWX4q12hujL0zbsCtV6dpjeTsY74 J4CBLzBYCj6y9Jc8R+D6XLYJBtJQGaSj99Oqe9WdUuHEqTGQt4HMuYHAj77wImVG ccURYiBpmB+FChLhG+yIlo1PHS0kpgeD+ZkZDHC0gYdiFqdKnFeQ7lj3Jj2tSWXY 7Y7qbaOhtXp+20M6oO+G =ax/b -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches for 2.6 # gpg: Signature made Tue 05 Apr 2016 16:32:25 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: crypto: Avoid memory leak on failure qemu-iotests: 149: Use "/usr/bin/env python" block: Forbid I/O throttling on nodes with multiple parents for 2.6 block: forbid x-blockdev-del from acting on DriveInfo Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-04-05 17:03:32 +01:00
Eric Blake	95c3df5a24	crypto: Avoid memory leak on failure Commit `7836857` introduced a memory leak due to invalid use of Error vs. visit_type_end(). If visiting the intermediate members fails, we clear the error and unconditionally use visit_end_struct() on the same error object; but if that cleanup succeeds, we then skip the qapi_free call. Until a later patch adds visit_check_struct(), the only safe approach is to use two separate error objects. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1459526222-30052-1-git-send-email-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-04-05 17:23:21 +02:00
Eric Blake	a89ef0c357	nbd: don't request FUA on FLUSH The NBD protocol does not clearly document what will happen if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH. Historically, both the qemu and upstream NBD servers silently ignored that flag, but that feels a bit risky. Meanwhile, the qemu NBD client unconditionally sends the flag (without even bothering to check whether the caller cares; at least with NBD_CMD_WRITE the client only sends FUA if requested by a higher layer). There is ongoing discussion on the NBD list to fix the protocol documentation to require that the server MUST ignore the flag (unless the kernel folks can better explain what FUA means for a flush), but until those doc improvements land, the current nbd.git master was recently changed to reject the flag with EINVAL (see nbd commit ab22e082), which now makes it impossible for a qemu client to use FLUSH with an upstream NBD server. We should not send FUA with flush unless the upstream protocol documents what it will do, and even then, it should be something that the caller can opt into, rather than being unconditional. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-04-05 11:46:52 +02:00
Stefan Hajnoczi	0d94b74655	block/nfs: add missing #include "qemu/cutils.h" parse_uint_full() used to be included from qemu-common.h but was moved to qemu/cutils.h in commit `f348b6d1a5` ("util: move declarations out of qemu-common.h"). Cc: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1459341994-20567-3-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-30 16:50:39 -04:00
Stefan Hajnoczi	d165b8cb8b	block/nfs: add missing #include "qapi/error.h" error_setg() used to be included indirectly through qemu/osdep.h. Since commit `da34e65cb4` ("include/qemu/osdep.h: Don't include qapi/error.h") it requires an explicit include. Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1459341994-20567-2-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-30 16:50:39 -04:00
Max Reitz	a90639270d	block/null-{co,aio}: Implement get_block_status() Signed-off-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:16:04 +02:00
Max Reitz	cd219eb1e5	block/null-{co,aio}: Allow reading zeroes This is optional so that it does not impede the null block driver's performance unless this behavior is desired. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	09cf9db1bc	block: Remove bdrv_(set_)enable_write_cache() The only remaining users were block jobs (mirror and backup) which unconditionally enabled WCE on the BlockBackend of the target image. As these block jobs don't go through BlockBackend for their I/O requests, they aren't affected by this setting anyway but always get a writeback mode, so that call can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	61de4c6808	block: Remove BDRV_O_CACHE_WB The previous patches have successively made blk->enable_write_cache the true source for the information whether a writethrough mode must be implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage we're carrying around, so now's the time to remove it. At the same time, we remove the 'cache.writeback' option parsing on the BDS level as the only effect was setting the BDRV_O_CACHE_WB flag. This change requires test cases that explicitly enabled the option to drop it. Other than that and the change of the error message when writethrough is enabled on the BDS level (from "Can't set writethrough mode" to "doesn't support the option"), there should be no change in behaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	5481531154	raw: Support BDRV_REQ_FUA Pass through the FUA flag to the lower layer so that the separate flush can be saved in practically relevant cases where a (raw) format driver sits on top of the protocol driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	2b556518c3	nbd: Support BDRV_REQ_FUA The NBD server already used to send a FUA flag when the writethrough mode was set. This code was a remnant from the times where protocol drivers actually had to implement writethrough modes. Since nowadays the block layer sends flushes in writethrough mode and non-root nodes are always writeback, this was mostly dead code - only mostly because if NBD was configured to be used without a format, we sent _both_ FUA and an explicit flush afterwards, which makes the code not technically dead, but useless overhead. This patch changes the code so that the block layer's FUA flag is recognised and translated into a NBD FUA flag. The additional flush is avoided now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	9f0eb9e129	iscsi: Support BDRV_REQ_FUA This replaces the existing hack in the iscsi driver that sent the FUA bit in writethrough mode and ignored the following flush in order to optimise the number of roundtrips (see commit `73b5394e`). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	93f5e6d88a	block: Introduce bdrv_co_writev_flags() This function will allow drivers to implement BDRV_REQ_FUA natively instead of sending a separate flush after the write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	c83f9fba2a	block/qapi: Use blk_enable_write_cache() Now that WCE is handled on the BlockBackend level, the flag is meaningless for BDSes. As the schema requires us to fill the field, we return an enabled write cache for them. Note that this means that querying the BlockBackend name may return writethrough as the cache information, whereas querying the node-name of the root of that same BlockBackend will return writeback. This may appear odd at first, but it actually makes sense because it correctly repesents the layer that implements the WCE handling. This becomes more apparent when you consider nodes that are the root node of multiple BlockBackends, where each BB can have its own WCE setting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	bfd18d1e0b	block: Move enable_write_cache to BB level Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	855a6a93a1	block: Handle flush error in bdrv_pwrite_sync() We don't want to silently ignore a flush error. Also, there is little point in avoiding the flush for writethrough modes and once WCE is moved to the BB layer, we definitely need the flush here because bdrv_pwrite() won't involve one any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:01 +02:00
Kevin Wolf	72e775c7d9	block: Always set writeback mode in blk_new_open() All callers of blk_new_open() either don't rely on the WCE bit set after blk_new_open() because they explicitly set it anyway, or they pass BDRV_O_CACHE_WB unconditionally. This patch changes blk_new_open() so that it always enables writeback mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk	63785678f3	replay: introduce block devices record/replay This patch introduces block driver that implement recording and replaying of block devices' operations. All block completion operations are added to the queue. Queue is flushed at checkpoints and information about processed requests is recorded to the log. In replay phase the queue is matched with events read from the log. Therefore block devices requests are processed deterministically. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> [ kwolf: Rebased onto modified and already applied part of the series ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:15:57 +02:00
Pavel Dovgalyuk	c32b82afaf	block: add flush callback This patch adds callback for flush request. This callback is responsible for flushing whole block devices stack. bdrv_flush function does not proceed to underlying devices. It should be performed by this callback function, if needed. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	e6ff69bf5e	block: move encryption deprecation warning into qcow code For a couple of releases we have been warning Encrypted images are deprecated Support for them will be removed in a future release. You can use 'qemu-img convert' to convert your image to an unencrypted one. This warning was issued by system emulators, qemu-img, qemu-nbd and qemu-io. Such a broad warning was issued because the original intention was to rip out all the code for dealing with encryption inside the QEMU block layer APIs. The new block encryption framework used for the LUKS driver does not rely on the unloved block layer API for encryption keys, instead using the QOM 'secret' object type. It is thus no longer appropriate to warn about encryption unconditionally. When the qcow/qcow2 drivers are converted to use the new encryption framework too, it will be practical to keep AES-CBC support present for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability with older QEMU versions and liberation of data from existing encrypted qcow2 files. This change moves the warning out of the generic block code and into the qcow/qcow2 drivers. Further, the warning is set to only appear when running the system emulators, since qemu-img, qemu-io, qemu-nbd are expected to support qcow2 encryption long term now that the maint burden has been eliminated. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	78368575a6	block: add generic full disk encryption driver Add a block driver that is capable of supporting any full disk encryption format. This utilizes the previously added block encryption code, and at this time supports the LUKS format. The driver code is capable of supporting any format supported by the QCryptoBlock module, so it registers one block driver for each format. This patch only registers the "luks" driver since the "qcow" driver is there only for back-compatibility with existing qcow built-in encryption. New LUKS compatible volumes can be formatted using qemu-img with defaults for all settings. $ qemu-img create --object secret,data=123456,id=sec0 \ -f luks -o key-secret=sec0 demo.luks 10G Alternatively the cryptographic settings can be explicitly set $ qemu-img create --object secret,data=123456,id=sec0 \ -f luks -o key-secret=sec0,cipher-alg=aes-256,\ cipher-mode=cbc,ivgen-alg=plain64,hash-alg=sha256 \ demo.luks 10G And query its size $ qemu-img info demo.img image: demo.img file format: luks virtual size: 10G (10737418240 bytes) disk size: 132K encrypted: yes Note that it was not necessary to provide the password when querying info for the volume. The password is only required when performing I/O on the volume All volumes created by this new 'luks' driver should be capable of being opened by the kernel dm-crypt driver. The only algorithms listed in the LUKS spec that are not currently supported by this impl are sha512 and ripemd160 hashes and cast6 cipher. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> [ kwolf - Added #include to resolve conflict with `da34e65c` ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:11:26 +02:00
Daniel P. Berrange	abb06c5ac1	block: add flag to indicate that no I/O will be performed When opening an image it is useful to know whether the caller intends to perform I/O on the image or not. In the case of encrypted images this will allow the block driver to avoid having to prompt for decryption keys when we merely want to query header metadata about the image. eg qemu-img info This flag is enforced at the top level only, since even if we don't want todo I/O on the 'qcow2' file payload, the underlying 'file' driver will still need todo I/O to read the qcow2 header, for example. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Max Reitz	5430215699	block/qapi: Pass bdrv_query_blk_stats() s->stats bdrv_query_blk_stats() does not need access to all of BlockStats, BlockDeviceStats is enough and is what this function is actually supposed to fill. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Max Reitz	0e8f44bee9	block/qapi: Set s->device in bdrv_query_stats() This is the only instance of bdrv_query_blk_stats() accessing anything in the BlockStats structure other than s->stats, so let us move it to its caller (where it makes just as much sense) allowing us to make bdrv_query_blk_stats() take a pointer to the BlockDeviceStats instead of BlockStats. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Xu	5eda622768	block/qapi: fix unbounded stack for dump_qdict Using heap instead of stack for better safety. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Xu	853ccfed8f	block/qapi: make two printf() formats literal Fix two places to use literal printf format when possible. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	72f41b6fbd	block: Remove blk_set_bs() The function is unused since commit `f21d96d0` ('block: Use BdrvChild in BlockBackend'). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Programmingkid	d0855f1235	block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host Mac OS X can be picky when it comes to allowing the user to use physical devices in QEMU. Most mounted volumes appear to be off limits to QEMU. If an issue is detected, a message is displayed showing the user how to unmount a volume. Now QEMU uses both CD and DVD media. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Maydell	553934db66	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW+dDJAAoJEL2+eyfA3jBXk6oP/R/zX4foUVFMTvDbxHwWc41t gXGk1BpIjFnteab/tzUBDIdgs/DPxzM6bClhe45gNInVBgnOyeVmpUwRGGNYQKbn FdkrAcC6Vy6BJv+xRTMMS+h4i6ebJ6HqqQPwkz0VulxsAknDPQsBebe0tM8uO7k9 G+ccMYOyUUiGTIRC3pBkRCu8APEialPSv3MpUTMtp71R3US+pEwmo1AgyOFq/lDu B/8LUBoR48XCEGfOA6ZixzoMwF1lTWpezx5/KF+fQ26sgnNzjpwYWnJk+LG7Gtvj 8PHYsHDoXSISlIgxzLpS0AA6s54+mutgIeNJG5FBXGrSSNlAB1+cKZsnZw42YjfI BVIHQkmcGT+h9UEDdekiOfQorypSYRm51ueTGO/lUbxNifvJ5LQA97F0G/filoCj ovGIfOwgpWaEBPCb//U1TRGhhTg+dNyCeC4GoxDEFyWmLPYp8p7Xtz+vsZOIdH4O Wl9i6BzzeNEgJyutKqn2qpNLl6Pfd548MOJJqAUkGxDGrCJMkmn2lJSpSSji6cdm y4Az/tPY0/xpxwjSRakaIMOlhDoGXmrQG+I6JG1TZLSH7x1+Ajhr2ryx4CBONceV 1quibAqoG1GwxCyYn7dv4aeJrDlg3XzEWQW6nJhuE91d9ZH+jF5u2+i+IZcQCDBe Cd6d0SZlcOnq3M5LiOrA =1ekF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 29 Mar 2016 01:48:09 BST using RSA key ID C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" * remotes/cody/tags/block-pull-request: qemu-iotests: add no-op streaming test qemu-iotests: fix test_stream_partial() block: never cancel a streaming job without running stream_complete() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-03-29 19:54:49 +01:00
Alberto Garcia	6578629e08	block: never cancel a streaming job without running stream_complete() We need to call stream_complete() in order to do all the necessary clean-ups, even if there's an early failure. At the moment it's only useful to make sure that s->backing_file_str is not leaked, but it will become more important if we introduce support for streaming to any intermediate node. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.berto@igalia.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-28 13:56:44 -04:00
Veronia Bahaa	f348b6d1a5	util: move declarations out of qemu-common.h Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Rutuja Shah	73bcb24d93	Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND This patch replaces get_ticks_per_sec() calls with the macro NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec() is then removed. This replacement improves the readability and understandability of code. For example, timer_mod(fdctrl->result_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50)); NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns matches the unit of the expression on the right side of the plus. Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Markus Armbruster	daf015ef5a	include/qemu/iov.h: Don't include qemu-common.h qemu-common.h should only be included by .c files. Its file comment explains why: "No header file should depend on qemu-common.h, as this would easily lead to circular header dependencies." qemu/iov.h includes qemu-common.h for QEMUIOVector stuff. Move all that to qemu/iov.h and drop the ill-advised include. Include qemu/iov.h where the QEMUIOVector stuff is now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:16 +01:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
Eric Blake	32bafa8fdd	qapi: Don't special-case simple union wrappers Simple unions were carrying a special case that hid their 'data' QMP member from the resulting C struct, via the hack method QAPISchemaObjectTypeVariant.simple_union_type(). But by using the work we started by unboxing flat union and alternate branches, coupled with the ability to visit the members of an implicit type, we can now expose the simple union's implicit type in qapi-types.h: \| struct q_obj_ImageInfoSpecificQCow2_wrapper { \| ImageInfoSpecificQCow2 data; \| }; \| \| struct q_obj_ImageInfoSpecificVmdk_wrapper { \| ImageInfoSpecificVmdk data; \| }; ... \| struct ImageInfoSpecific { \| ImageInfoSpecificKind type; \| union { /* union tag is @type / \| void data; \|- ImageInfoSpecificQCow2 qcow2; \|- ImageInfoSpecificVmdk vmdk; \|+ q_obj_ImageInfoSpecificQCow2_wrapper qcow2; \|+ q_obj_ImageInfoSpecificVmdk_wrapper vmdk; \| } u; \| }; Doing this removes asymmetry between QAPI's QMP side and its C side (both sides now expose 'data'), and means that the treatment of a simple union as sugar for a flat union is now equivalent in both languages (previously the two approaches used a different layer of dereferencing, where the simple union could be converted to a flat union with equivalent C layout but different {} on the wire, or to an equivalent QMP wire form but with different C representation). Using the implicit type also lets us get rid of the simple_union_type() hack. Of course, now all clients of simple unions have to adjust from using su->u.member to using su->u.member.data; while this touches a number of files in the tree, some earlier cleanup patches helped minimize the change to the initialization of a temporary variable rather than every single member access. The generated qapi-visit.c code is also affected by the layout change: \|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member \| } \| switch (obj->type) { \| case IMAGE_INFO_SPECIFIC_KIND_QCOW2: \|- visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err); \|+ visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err); \| break; \| case IMAGE_INFO_SPECIFIC_KIND_VMDK: \|- visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err); \|+ visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err); \| break; \| default: \| abort(); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-03-18 10:29:26 +01:00
Alberto Garcia	6049490df4	quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode If there's an I/O error in one of Quorum children then QEMU should emit QUORUM_REPORT_BAD. However this is not working with read-pattern=fifo. This patch fixes this problem. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-17 16:43:30 +01:00
Kevin Wolf	8896e08814	block: Use blk_co_pwritev() in blk_co_write_zeroes() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 16:30:00 +01:00
Kevin Wolf	57d6a42883	block: Use blk_aio_prwv() for aio_read/write/write_zeroes Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 16:30:00 +01:00
Kevin Wolf	a55d3fba99	block: Use blk_prw() in blk_pread()/blk_pwrite() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	fc1453cdfc	block: Use blk_co_pwritev() in blk_write_zeroes() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	5bd5119667	block: Pull up blk_read_unthrottled() implementation Use blk_read(), so that it goes through blk_co_preadv() like all read requests from the BB to the BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	a8823a3bfd	block: Use blk_co_pwritev() for blk_write() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	1bf1cbc91f	block: Use blk_co_preadv() for blk_read() This patch introduces blk_co_preadv() as a central function on the BlockBackend level that is supposed to handle all read requests from the BB to its root BDS eventually. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	f21d96d04b	block: Use BdrvChild in BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Max Reitz	981f4f578e	block: Add blk_next_root_bs() This function iterates over all BDSs attached to a BB. We are going to need it when rewriting bdrv_next() so it no longer uses bdrv_states. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	fe1a9cbc33	block: Move some bdrv_*_all() functions to BB Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	7c735873d9	blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del() We can basically inline it in hmp_drive_del(); monitor_remove_blk() is called already, so we just need to call bdrv_make_anon(), too. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	efaa7c4eeb	blockdev: Split monitor reference from BB creation Before this patch, blk_new() automatically assigned a name to the new BlockBackend and considered it referenced by the monitor. This patch removes the implicit monitor_add_blk() call from blk_new() (and consequently the monitor_remove_blk() call from blk_delete(), too) and thus blk_new() (and related functions) no longer take a BB name argument. In fact, there is only a single point where blk_new()/blk_new_open() is called and the new BB is monitor-owned, and that is in blockdev_init(). Besides thus relieving us from having to invent names for all of the BBs we use in qemu-img, this fixes a bug where qemu cannot create a new image if there already is a monitor-owned BB named "image". If a BB and its BDS tree are created in a single operation, as of this patch the BDS tree will be created before the BB is given a name (whereas it was the other way around before). This results in minor change to the output of iotest 087, whose reference output is amended accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	e5e785500b	blockdev: Separate BB name management Introduce separate functions (monitor_add_blk() and monitor_remove_blk()) which set or unset a BB name. Since the name is equivalent to the monitor's reference to a BB, adding a name the same as declaring the BB to be monitor-owned and removing it revokes this status, hence the function names. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	2cf22d6a1a	blockdev: Add list of all BlockBackends While monitor_block_backends contains nearly all BBs, we sometimes really need all BBs. To this end, this patch adds the block_backend list. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	9492b0b928	blockdev: Rename blk_backends The blk_backends list does not contain all BlockBackends but only the ones which are referenced by the monitor, and that is not necessarily true for every BlockBackend. Rename the list to monitor_block_backends to make that fact clear. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	a55448b368	qapi: Drop QERR_UNKNOWN_BLOCK_FORMAT_FEATURE Just specifying a custom string is simpler in basically all places that used it, and in addition, specifying the BB or node name is something we generally do not do in other error messages when opening a BDS, so we should not do it here. This changes the output for iotest 036 (to the better, in my opinion), so the reference output needs to be changed accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	1393f21270	block: Add blk_commit_all() Later, we will remove bdrv_commit_all() and move its contents here, and in order to replace bdrv_commit_all() calls by calls to blk_commit_all() before doing so, we need to add it as an alias now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Max Reitz	74d1b8fc27	block: Use blk_next() in block-backend.c Instead of iterating directly through blk_backends, we can use blk_next() instead. This gives us some abstraction from the list itself which we can use to rename it, for example. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Stefan Hajnoczi	1f3ddfcb25	Revert "qed: Implement .bdrv_drain" This reverts commit `df9a681dc9`. Note that commit `df9a681dc9` included some unrelated hunks, possibly due to a merge failure or an overlooked squash. This only reverts the qed .bdrv_drain() implementation. The qed .bdrv_drain() implementation is unsafe and can lead to a double request completion. Paolo Bonzini reports: "The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs unconditionally, but this is not correct if an allocating write is queued. In this case, qed_unplug_allocating_write_reqs will restart the allocating write and possibly cause it to complete. The aiocb however is still in use for the L2/L1 table writes, and will then be completed again as soon as the table writes are stable." For QEMU 2.6 we can simply revert this commit. A full solution for the qed need check timer may be added if the bdrv_drain() implementation is extended. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1457431876-8475-1-git-send-email-stefanha@redhat.com	2016-03-17 09:50:14 +00:00
Jeff Cody	03c698f0a2	block/sheepdog: fix argument passed to qemu_strtoul() The function qemu_strtoul() reads 'unsigned long' sized data, which is larger than uint32_t on 64-bit machines. Even though the snap_id field in the header is 32-bits, we must accommodate the full size in qemu_strtoul(). This patch also adds more meaningful error handling to the qemu_strtoul() call, and subsequent results. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Message-id: e56fc50abedd9a112e0683342c8eafda063cd2f9.1456935548.git.jcody@redhat.com	2016-03-16 13:25:29 -04:00
Alberto Garcia	b9c600d207	quorum: Fix crash in quorum_aio_cb() quorum_aio_cb() emits the QUORUM_REPORT_BAD event if there's an I/O error in a Quorum child. However sacb->aiocb must be correctly initialized for this to happen. read_quorum_children() and read_fifo_child() are not doing this, which results in a QEMU crash. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 8138570d071ba7e25db3736979234a1fd71dbd05.1457610443.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-14 17:35:06 +01:00
Fam Zheng	ebab225910	block: Move block dirty bitmap code to separate files The only code change is making bdrv_dirty_bitmap_truncate public. It is used in block.c. Also two long lines (bdrv_get_dirty) are wrapped. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-14 17:35:05 +01:00
Fam Zheng	b2f56462d5	backup: Use Bitmap to replace "s->bitmap" "s->bitmap" tracks done sectors, we only check bit states without using any iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and more memory efficient. Meanwhile, rename it to done_bitmap, to reflect the intention. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1457412306-18940-2-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-14 17:35:05 +01:00
Kevin Wolf	b8f45cdf78	vpc: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:44 +01:00
Kevin Wolf	c4bea1690e	vmdk: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	10bf03af12	vhdx: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	a08f0c3b5f	vdi: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	fba98d455a	sheepdog: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	8a56fdadaf	qed: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	23588797b6	qcow2: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	6af4016020	qcow: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	8942764f54	parallels: Use BB functions in .bdrv_create() All users of the block layers are supposed to go through a BlockBackend. The .bdrv_create() implementation is one such user, so this patch converts it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	c10c9d9615	block: Introduce blk_set_allow_write_beyond_eof() We check that the guest can't write beyond the end of its disk, but for other internal users it can make sense to allow growing a file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Kevin Wolf	6340472c54	block: Use writeback in .bdrv_create() implementations There's no reason to use a writethrough cache mode while creating an image. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Fam Zheng	71968dbfd8	vmdk: Switch to heap arrays for vmdk_parent_open Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Fam Zheng	5997c210b9	vmdk: Switch to heap arrays for vmdk_read_cid Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Fam Zheng	965415eb20	vmdk: Switch to heap arrays for vmdk_write_cid It is only called once for each opened image, so we can do it the easy way. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Changlong Xie	924e8a2bbc	quorum: modify vote rules for flush operation Keep flush interface the same logic as quorum read/write, Otherwise in following scenario, we'll encounter unexpected errors. Quorum has two children(A, B). A do flush sucessfully, but B flush failed. This cause the filesystem of guest become read-only with following errors: end_request: I/O error, dev vda, sector 11159960 Aborting journal on device vda3-8 EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal EXT4-fs (vda3): Remounting filesystem read-only Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Changlong Xie	0ae053b7e1	qmp event: Refactor QUORUM_REPORT_BAD Introduce QuorumOpType, and make QUORUM_REPORT_BAD compatible with it. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:43 +01:00
Jeff Cody	fb9245c261	block/vpc: give option to force the current_size field in .bdrv_create When QEMU creates a VHD image, it goes by the original spec, calculating the current_size based on the nearest CHS geometry (with an exception for disks > 127GB). Apparently, Azure will only allow images that are sized to the nearest MB, and the current_size as calculated from CHS cannot guarantee that. Allow QEMU to create images similar to how Hyper-V creates images, by setting current_size to the specified virtual disk size. This introduces an option, force_size, to be passed to the vpc format during image creation, e.g.: qemu-img convert -f raw -o force_size -O vpc test.img test.vhd When using the "force_size" option, the creator app field used by QEMU will be "qem2" instead of "qemu", to indicate the difference. In light of this, we also add parsing of the "qem2" field during vpc_open. Bug reference: https://bugs.launchpad.net/qemu/+bug/1490611 Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:42 +01:00
Jeff Cody	c540d53ac8	block/vpc: choose size calculation method based on creator_app field The VHD file format is used by both Virtual PC, and Hyper-V. However, how the virtual disk size is calculated varies between the two. Virtual PC uses the CHS drive parameters to determine the drive size. Hyper-V, on the other hand, uses the current_size field in the footer when determining image size. This is problematic for a few reasons: * VHD images from Hyper-V, using CHS calculations, will likely be trunctated. * If we just rely always on current_size, then QEMU may have data compatibility issues with Virtual PC (we may write too much data into a VHD file to be used by Virtual PC, for instance). * Existing VHD images created by QEMU have used the CHS calculations, except for images exceeding the 127GB limit. We want to remain compatible with our own generated images. Luckily, the VHD specification defines a 'Creator App' field, that is used to indicate what software created the VHD file. This patch does two things: 1. Uses the 'Creator App' field to help determine how to calculate size, and 2. Adds a VPC format option 'force_size_calc', so that the user can override the 'Creator App' auto-detection, in case there exist VHD images with unknown or contradictory 'Creator App' entries. N.B.: We currently use the maximum CHS value as an indication to use the current_size field. This patch does not change that, even with the 'force_size_calc' option. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-14 16:46:42 +01:00
Kevin Wolf	c21cc6ca98	block/qapi: Include empty drives in query-blockstats Since commit `5ec18f8c`, query-blockstats didn't return the statistics of drives without media any more because such drives have only a BB now, but not a BDS any more. This patch fixes the regression so that query-blockstats iterates over BBs by default and empty drives are displayed again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-14 16:46:42 +01:00
Kevin Wolf	b07363a1a3	block/qapi: Factor out bdrv_query_bds_stats() The new functions handles the data that is taken from the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-14 16:46:42 +01:00
Kevin Wolf	2b77e60ab8	block/qapi: Factor out bdrv_query_blk_stats() The new functions handles the data that is taken from the BlockBackend. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-14 16:46:42 +01:00
Daniel P. Berrange	b16a44e13e	osdep: remove use of socket_error() from all code Now that QEMU wraps the Win32 sockets methods to automatically set errno upon failure, there is no reason for callers to use the socket_error() method. They can rely on accessing errno even on Win32. Remove all use of socket_error() from general code, leaving it as a static method in oslib-win32.c only. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2016-03-10 17:19:34 +00:00
Eric Blake	0399293e5b	util: Shorten references into SocketAddress An upcoming patch will alter how simple unions, like SocketAddress, are laid out, which will impact all lines of the form 'addr->u.XXX' (expanding it to the longer 'addr->u.XXX.data'). For better legibility in that patch, and less need for line wrapping, it's better to use a temporary variable to reduce the effect of a layout change to just the variable initializations, rather than every reference within a SocketAddress. Also, take advantage of some C99 initialization where it makes sense (simplifying g_new0() to g_new()). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1457021813-10704-7-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-03-05 10:41:52 +01:00
John Snow	4c9bca7e39	block/backup: avoid copying less than full target clusters During incremental backups, if the target has a cluster size that is larger than the backup cluster size and we are backing up to a target that cannot (for whichever reason) pull clusters up from a backing image, we may inadvertantly create unusable incremental backup images. For example: If the bitmap tracks changes at a 64KB granularity and we transmit 64KB of data at a time but the target uses a 128KB cluster size, it is possible that only half of a target cluster will be recognized as dirty by the backup block job. When the cluster is allocated on the target image but only half populated with data, we lose the ability to distinguish between zero padding and uninitialized data. This does not happen if the target image has a backing file that points to the last known good backup. Even if we have a backing file, though, it's likely going to be faster to just buffer the redundant data ourselves from the live image than fetching it from the backing file, so let's just always round up to the target granularity. The same logic applies to backup modes top, none, and full. Copying fractional clusters without the guarantee of COW is dangerous, but even if we can rely on COW, it's likely better to just re-copy the data. Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1456433911-24718-3-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:55:14 -05:00
John Snow	16096a4d47	block/backup: make backup cluster size configurable 64K might not always be appropriate, make this a runtime value. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1456433911-24718-2-git-send-email-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:55:14 -05:00
Fam Zheng	21cd917ff5	mirror: Add mirror_wait_for_io The three lines are duplicated a number of times now, refactor a function. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1454637630-10585-3-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Fam Zheng	e5b43573e2	mirror: Rewrite mirror_iteration The "pnum < nb_sectors" condition in deciding whether to actually copy data is unnecessarily strict, and the qiov initialization is unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard. Rewrite mirror_iteration to fix both flaws. The output of iotests 109 is updated because we now report the offset and len slightly differently in mirroring progress. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1454637630-10585-2-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Max Reitz	04a3615860	vhdx: Simplify vhdx_set_shift_bits() For values which are powers of two (and we do assume all of these to be), sizeof(x) * 8 - 1 - clz(x) == ctz(x). Therefore, use ctz(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 1450451066-13335-3-git-send-email-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Max Reitz	939901dcd2	vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries() We have DIV_ROUND_UP(), so we can use it to produce more easily readable code. It may be slower than the bit shifting currently performed (because it actually performs a division), but since vhdx_calc_bat_entries() is never used in a hot path, this is completely fine. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 1450451066-13335-2-git-send-email-mreitz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Daniel P. Berrange	b189346eb1	iscsi: add support for getting CHAP password via QCryptoSecret API The iSCSI driver currently accepts the CHAP password in plain text as a block driver property. This change adds a new "password-secret" property that accepts the ID of a QCryptoSecret instance. $QEMU \ -object secret,id=sec0,filename=/home/berrange/example.pw \ -drive driver=iscsi,url=iscsi://example.com/target-foo/lun1,\ user=dan,password-secret=sec0 Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1453385961-10718-4-git-send-email-berrange@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Daniel P. Berrange	1bff960642	curl: add support for HTTP authentication parameters If connecting to a web server which has authentication turned on, QEMU gets a 401 as curl has not been configured with any authentication credentials. This adds 4 new parameters to the curl block driver options 'username', 'password-secret', 'proxy-username' and 'proxy-password-secret'. Passwords are provided using the recently added 'secret' object type $QEMU \ -object secret,id=sec0,filename=/home/berrange/example.pw \ -object secret,id=sec1,filename=/home/berrange/proxy.pw \ -drive driver=http,url=http://example.com/some.img,\ username=dan,password-secret=sec0,\ proxy-username=dan,proxy-password-secret=sec1 Of course it is possible to use the same secret for both the proxy & server passwords if desired, or omit the proxy auth details, or the server auth details as required. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1453385961-10718-3-git-send-email-berrange@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:31 -05:00
Daniel P. Berrange	60390a2192	rbd: add support for getting password from QCryptoSecret object Currently RBD passwords must be provided on the command line via $QEMU -drive file=rbd:pool/image:id=myname:\ key=QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=:\ auth_supported=cephx This is insecure because the key is visible in the OS process listing. This adds support for an 'password-secret' parameter in the RBD parameters that can be used with the QCryptoSecret object to provide the password via a file: echo "QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=" > poolkey.b64 $QEMU -object secret,id=secret0,file=poolkey.b64,format=base64 \ -drive driver=rbd,filename=rbd:pool/image:id=myname:\ auth_supported=cephx,password-secret=secret0 Reviewed-by: Josh Durgin <jdurgin@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1453385961-10718-2-git-send-email-berrange@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:30 -05:00
Vasiliy Tolstov	eab8eb8db3	sheepdog: allow to delete snapshot This patch implements a blockdriver function bdrv_snapshot_delete() in the sheepdog driver. With the new function, snapshots of sheepdog can be deleted from libvirt. Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru> Message-id: 1450873346-22334-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:30 -05:00
Peter Lieven	7725b8bf12	block/nfs: add support for setting debug level recent libnfs versions support logging debug messages. Add support for it in qemu through an URL parameter. Example: qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2 Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1447052973-14513-1-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-02-29 14:54:30 -05:00
Alberto Garcia	398befdf50	qapi: Add burst length fields to BlockDeviceInfo This patch adds the new bps__max_length and iops__max_length parameters to the BlockDeviceInfo struct. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-22 14:08:06 +01:00
Changlong Xie	f38738e212	quorum: fix segfault when read fails in fifo mode Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-22 09:49:46 +01:00
Daniel P. Berrange	75822a12c0	nbd: enable use of TLS with NBD block driver This modifies the NBD driver so that it is possible to request use of TLS. This is done by providing the 'tls-creds' parameter with the ID of a previously created QCryptoTLSCreds object. For example $QEMU -object tls-creds-x509,id=tls0,endpoint=client,\ dir=/home/berrange/security/qemutls \ -drive driver=nbd,host=localhost,port=9000,tls-creds=tls0 The client will drop the connection if the NBD server does not provide TLS. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1455129674-17255-15-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 17:16:33 +01:00
Daniel P. Berrange	f95910fe6b	nbd: implement TLS support in the protocol negotiation This extends the NBD protocol handling code so that it is capable of negotiating TLS support during the connection setup. This involves requesting the STARTTLS protocol option before any other NBD options. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1455129674-17255-14-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 17:16:28 +01:00
Daniel P. Berrange	1c778ef729	nbd: convert to using I/O channels for actual socket I/O Now that all callers are converted to use I/O channels for initial connection setup, it is possible to switch the core NBD protocol handling core over to use QIOChannel APIs for actual sockets I/O. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1455129674-17255-7-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 17:13:57 +01:00
Daniel P. Berrange	064097d919	nbd: convert block client to use I/O channels for connection setup This converts the NBD block driver client to use the QIOChannelSocket class for initial connection setup. The NbdClientSession struct has two pointers, one to the master QIOChannelSocket providing the raw data channel, and one to a QIOChannel which is the current channel used for I/O. Initially the two point to the same object, but when TLS support is added, they will point to different objects. The qemu-img & qemu-io tools now need to use MODULE_INIT_QOM to ensure the QIOChannel object classes are registered. The qemu-nbd tool already did this. In this initial conversion though, all I/O is still actually done using the raw POSIX sockets APIs. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1455129674-17255-4-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-02-16 17:13:22 +01:00
Peter Maydell	f075c89f0a	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWugGdAAoJEJykq7OBq3PIE5cH/RY0iIsvDU4dD05Xs/xlDPNL 0lPwqmYIhGPCYdEQIhLsnf8bfYkyXGaS/bgWFzKk6mb4XI6HhcDdEoOXOSa3Rreq bm58ngE9xwC5ZiDuBFxzMXuoU/+iAU/YHiTxL84JaMpBjUQsUCvHlLMVXJXGphzG pXE5neYuy+MftiZ5f0nlVaIGCdd9rHWkKMgpViR0Xanx1cZe0TSkh12qlCRLXNJG SHDq2wNrFp7lV4IK2evcU59GvJ5tfXyaDEh7yCMrvMUFM96jBjSgtgKbuE+t4VWi 6Awli6z9OvygSu0s/GuJ7yW41RSiTv/Qu1TagltkNjuSB5Gzw3lTgmorgVlLS+Y= =PZTe -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Tue 09 Feb 2016 15:11:25 GMT using RSA key ID 81AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" * remotes/stefanha/tags/block-pull-request: block: add missing call to bdrv_drain_recurse blockjob: Fix hang in block_job_finish_sync iov: avoid memcpy for "simple" iov_from_buf/iov_to_buf Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-02-09 17:56:46 +00:00
Paolo Bonzini	9dcf8ecd9e	block: add missing call to bdrv_drain_recurse This is also needed in bdrv_drain_all, not just in bdrv_drain. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1450867706-19860-3-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-02-09 13:52:26 +00:00
Eric Blake	51e72bc1dd	qapi: Swap visit_* arguments for consistent 'name' placement JSON uses "name":value, but many of our visitor interfaces were called with visit_type_FOO(v, &value, name, errp). This can be a bit confusing to have to mentally swap the parameter order to match JSON order. It's particularly bad for visit_start_struct(), where the 'name' parameter is smack in the middle of the otherwise-related group of 'obj, kind, size' parameters! It's time to do a global swap of the parameter ordering, so that the 'name' parameter is always immediately after the Visitor argument. Additional reason in favor of the swap: the existing include/qjson.h prefers listing 'name' first in json_prop_(), and I have plans to unify that file with the qapi visitors; listing 'name' first in qapi will minimize churn to the (admittedly few) qjson.h clients. Later patches will then fix docs, object.h, visitor-impl.h, and those clients to match. Done by first patching scripts/qapi.py by hand to make generated files do what I want, then by running the following Coccinelle script to affect the rest of the code base: $ spatch --sp-file script `git grep -l '\bvisit_' -- '*/.[ch]'` I then had to apply some touchups (Coccinelle insisted on TAB indentation in visitor.h, and botched the signature of visit_type_enum() by rewriting 'const char const strings[]' to the syntactically invalid 'const charconst[] strings'). The movement of parameters is sufficient to provoke compiler errors if any callers were missed. // Part 1: Swap declaration order @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_start_struct -(TV v, TObj OBJ, T1 ARG1, const char name, T2 ARG2, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type bool, TV, T1; identifier ARG1; @@ bool visit_optional -(TV v, T1 ARG1, const char name) +(TV v, const char name, T1 ARG1) { ... } @@ type TV, TErr, TObj, T1; identifier OBJ, ARG1; @@ void visit_get_next_type -(TV v, TObj OBJ, T1 ARG1, const char name, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, TErr errp) { ... } @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_type_enum -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char name, TErr errp) +(TV v, const char name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type TV, TErr, TObj; identifier OBJ; identifier VISIT_TYPE =~ "^visit_type_"; @@ void VISIT_TYPE -(TV v, TObj OBJ, const char name, TErr errp) +(TV v, const char name, TObj OBJ, TErr errp) { ... } // Part 2: swap caller order @@ expression V, NAME, OBJ, ARG1, ARG2, ERR; identifier VISIT_TYPE =~ "^visit_type_"; @@ ( -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR) +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR) \| -visit_optional(V, ARG1, NAME) +visit_optional(V, NAME, ARG1) \| -visit_get_next_type(V, OBJ, ARG1, NAME, ERR) +visit_get_next_type(V, NAME, OBJ, ARG1, ERR) \| -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR) +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR) \| -VISIT_TYPE(V, OBJ, NAME, ERR) +VISIT_TYPE(V, NAME, OBJ, ERR) ) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-02-08 17:29:56 +01:00
Fam Zheng	ac987b30d0	block: Use returned *file in bdrv_co_get_block_status Now that all drivers return the right "file" pointer, we can use it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1453780743-16806-14-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	e0f100f57c	vmdk: Return extent's file in bdrv_get_block_status Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-13-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	d0a18f1025	vmdk: Fix calculation of block status's offset "offset" is the offset of cluster and sector_num doesn't necessarily refer to the start of it, it should add index_in_cluster. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-12-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	7429e20788	vpc: Assign bs->file->bs to file in vpc_co_get_block_status Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-11-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	8bfb137152	vdi: Assign bs->file->bs to file in vdi_co_get_block_status Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-10-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	d234c92931	sheepdog: Assign bs to file in sd_co_get_block_status Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-9-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	53f1dfd1ff	qed: Assign bs->file->bs to file in bdrv_qed_co_get_block_status Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-8-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	ddf4987d76	parallels: Assign bs->file->bs to file in parallels_co_get_block_status Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-7-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	3399833f14	iscsi: Assign bs to file in iscsi_co_get_block_status Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-6-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	02650acbc6	raw: Assign bs to file in raw_co_get_block_status Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-5-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	178b4db7e5	qcow2: Assign bs->file->bs to file in qcow2_co_get_block_status Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-4-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	3064bf6fff	qcow: Assign bs->file->bs to file in qcow_co_get_block_status Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-3-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	67a0fd2a9b	block: Add "file" output parameter to block status query functions The added parameter can be used to return the BDS pointer which the valid offset is referring to. Its value should be ignored unless BDRV_BLOCK_OFFSET_VALID in ret is set. Until block drivers fill in the right value, let's clear it explicitly right before calling .bdrv_get_block_status. The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest case 102 passing, and will be fixed once all drivers return the right file pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Paolo Bonzini	1963f8d52e	block: acquire in bdrv_query_image_info NFS calls aio_poll inside bdrv_get_allocated_size. This requires acquiring the AioContext. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1450867706-19860-1-git-send-email-pbonzini@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Max Reitz	d8da3cef3b	block: Add blk_remove_all_bs() When bdrv_close_all() is called, instead of force-closing all root BlockDriverStates, it is better to just drop the reference from all BlockBackends and let them be closed automatically. This prevents BDS from getting closed that are still referenced by other BDS, which may result in loss of cached data. This patch adds a function for doing that, but does not yet incorporate it in bdrv_close_all(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	13855c6b9f	block: Use blk_remove_bs() in blk_delete() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	033cb5659a	block: Remove BDS close notifier It is unused now, so we can remove it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Max Reitz	3301f6c6e9	block: Add BB-BDS remove/insert notifiers bdrv_close() no longer signifies ejection of a medium, this is now done by removing the BDS from the BB. Therefore, we want to have a notifier for that in the BB instead of a close notifier in the BDS. The former is added now, the latter is removed later. Symmetrically, another notifier list is added that is invoked whenever a BDS is inserted. We will need that for virtio-blk and virtio-scsi, which can then remove their op blockers on BDS ejection and set them up on insertion. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-02-02 17:50:46 +01:00
Fam Zheng	3db1d98a20	vmdk: Fix converting to streamOptimized Commit `d62d9dc4b8` lifted streamOptimized images's version to 3, but we now refuse to open version 3 images read-write. We need to make streamOptimized an exception to allow converting to it. This fixes the accidentally broken iotests case 059 for the same reason. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:49:34 +01:00
Max Reitz	327032ce74	block/qapi: Emit tray_open only if there is a tray Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1454096953-31773-5-git-send-email-mreitz@redhat.com	2016-02-02 17:47:06 +01:00
Max Reitz	8f3a73bc57	block: Add blk_dev_has_tray() Pull out the check whether a block device has a tray from blk_dev_is_tray_open() into its own function so both attributes (whether there is a tray vs. whether that tray is open) can be queried independently. Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1454096953-31773-2-git-send-email-mreitz@redhat.com	2016-02-02 17:46:56 +01:00
Fam Zheng	d62d9dc4b8	vmdk: Create streamOptimized as version 3 VMware products accept only version 3 for streamOptimized, let's bump the version. Reported-by: Radoslav Gerganov <rgerganov@vmware.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-20 13:36:24 +01:00
Kevin Wolf	191fb11bdf	qcow2: Make image inaccessible after failed qcow2_invalidate_cache() If qcow2_invalidate_cache() fails, we are in a state where qcow2_close() has already been completed, but the image hasn't been reopened yet. Calling into any qcow2 function for an image in this state will cause crashes. The real solution would be to get rid of the close/open pair and instead do an atomic reset of the involved data structures, but this isn't trivial, so let's just make the image inaccessible for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:24 +01:00
Kevin Wolf	140fd5a69c	qcow2: Fix BDRV_O_INACTIVE handling in qcow2_invalidate_cache() What qcow2_invalidate_cache() should do is close the image with BDRV_O_INACTIVE set and reopen it with the flag cleared. In fact, it used to do exactly the opposite: qcow2_close() relied on bs->open_flags, which is already updated to have cleared BDRV_O_INACTIVE at this point, whereas qcow2_open() was called with s->flags, which has the flag still set. Fix this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	ec6d891224	qcow2: Implement .bdrv_inactivate The callback has to ensure that closing or flushing the image afterwards wouldn't cause a write access to the image files. This means that just the caches have to be written out, which is part of the existing .bdrv_close implementation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	04c01a5c8f	block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE Instead of covering only the state of images on the migration destination before the migration is completed, the flag will also cover the state of images on the migration source after completion. This common state implies that the image is technically still open, but no writes will happen and any cached contents will be reloaded from disk if and when the image leaves this state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	09e0c771e4	block: Assert no write requests under BDRV_O_INCOMING As long as BDRV_O_INCOMING is set, the image file is only opened so we have a file descriptor for it. We're definitely not supposed to modify the image, it's still owned by the migration source. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	b527c9b392	qcow2: Write full header on image creation When creating a qcow2 image, we didn't necessarily call qcow2_update_header(), but could end up with the basic header that qcow2_create2() created manually. One thing that this basic header lacks is the feature table. Let's make sure that it's always present. This requires a few updates to test cases as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	1a4828c793	qcow2: Write feature table only for v3 images Version 2 images don't have feature bits, so writing a feature table to those images is kind of pointless. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Peter Maydell	80c71a241a	block: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-20 13:36:23 +01:00
Christian Borntraeger	972b543c6b	block/raw-posix: avoid bogus fixup for cylinders on DASD disks large volume DASD that have > 64k cylinders do claim to have 0xFFFE cylinders as special value in the old 16 bit field. We want to pass this "token" along to the guest, instead of calculating the real number. Otherwise qemu might fail with "cyls must be between 1 and 65535" Cc: qemu-stable@nongnu.org Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-19 17:43:55 +01:00
Paolo Bonzini	f1c17521e7	nbd-server: do not exit on failed memory allocation The amount of memory allocated in nbd_co_receive_request is driven by the NBD client (possibly a virtual machine). Parallel I/O can cause the server to allocate a large amount of memory; check for failures and return ENOMEM in that case. Cc: qemu-block@nongnu.org Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-15 18:58:02 +01:00
Zhu Lingshan	1cb6d137ff	iscsi: send readcapacity10 when readcapacity16 failed When play with Dell MD3000 target, for sure it is a TYPE_DISK, but readcapacity16 would fail. Then we find that readcapacity10 succeeded. It looks like the target just support readcapacity10 even through it is a TYPE_DISK or have some TYPE_ROM characteristics. This patch can give a chance to send readcapacity16 when readcapacity10 failed. This patch is not harmful to original pathes Signed-off-by: Zhu Lingshan <lszhu@suse.com> Message-Id: <1451359934-9236-1-git-send-email-lszhu@suse.com> [Don't fall through on UNIT ATTENTION. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-01-15 18:58:01 +01:00
Markus Armbruster	bf89e87427	vhdx: Fix "log that needs to be replayed" error message The arguments of error_setg_errno() should yield a short error string without newlines. Here, we try to append additional help to the error message by embedding newlines in the error string. That's nice, but it's doesn't play nicely with the errno part. tests/qemu-iotests/070.out shows the resulting mess: can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed. To replay the log, execute: qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx': Operation not permitted Switch to error_setg() and error_append_hint(). Result: can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log that needs to be replayed To replay the log, run: qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1450452927-8346-21-git-send-email-armbru@redhat.com>	2016-01-13 15:16:18 +01:00
Markus Armbruster	d28d737fb9	vmdk: Clean up "Invalid extent lines" error message vmdk_parse_extents() reports parse errors like this: error_setg(errp, "Invalid extent lines:\n%s", p); where p points to the beginning of the malformed line in the image descriptor. This results in a multi-line error message Invalid extent lines: <first line that doesn't parse> <remaining text that may or may not parse, if any> Error messages should not have newlines embedded. Since the remaining text is not helpful, we can simply report: Invalid extent line: <first line that doesn't parse> Cc: Fam Zheng <famz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1450452927-8346-19-git-send-email-armbru@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-01-13 15:16:18 +01:00
Markus Armbruster	e4937694b6	vmdk: Clean up control flow in vmdk_parse_extents() a bit Factor out loop stepping to turn a while-loop with goto into a for-loop with continue. Cc: Fam Zheng <famz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1450452927-8346-18-git-send-email-armbru@redhat.com>	2016-01-13 15:16:18 +01:00
Markus Armbruster	9af9e0fed7	error: Strip trailing '\n' from error string arguments (again) Commit `6daf194d`, `be62a2eb` and `312fd5f` got rid of a bunch, but they keep coming back. Tracked down with the Coccinelle semantic patch from commit `312fd5f`. Cc: Fam Zheng <famz@redhat.com> Cc: Peter Crosthwaite <crosthwaitepeter@gmail.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Jason J. Herne <jjherne@linux.vnet.ibm.com> Cc: Stefan Berger <stefanb@linux.vnet.ibm.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Changchun Ouyang <changchun.ouyang@intel.com> Cc: zhanghailiang <zhang.zhanghailiang@huawei.com> Cc: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Markus Armbruster <armbru@pond.sub.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1450452927-8346-17-git-send-email-armbru@redhat.com>	2016-01-13 15:16:18 +01:00
Markus Armbruster	e43bfd9c87	error: Use error_prepend() where it makes obvious sense Done with this Coccinelle semantic patch @@ expression FMT, E1, E2; expression list ARGS; @@ - error_setg(E1, FMT, ARGS, error_get_pretty(E2)); + error_propagate(E1, E2);/###/ + error_prepend(E1, FMT/@@@/, ARGS); followed by manual cleanup, first because I can't figure out how to make Coccinelle transform strings, and second to get rid of now superfluous error_propagate(). We now use or propagate the original error whole instead of just its message obtained with error_get_pretty(). This avoids suppressing its hint (see commit `50b7b00`), but I can't see how the errors touched in this commit could come with hints. It also improves the message printed with &error_abort when we screw up (see commit `1e9b65b`). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-13 15:16:17 +01:00
Markus Armbruster	c29b77f955	error: Use error_reportf_err() where it makes obvious sense Done with this Coccinelle semantic patch @@ expression FMT, E, S; expression list ARGS; @@ - error_report(FMT, ARGS, error_get_pretty(E)); + error_reportf_err(E, FMT/@@@/, ARGS); ( - error_free(E); \| exit(S); \| abort(); ) followed by a replace of '%s"/@@@/' by '"' and some line rewrapping, because I can't figure out how to make Coccinelle transform strings. We now use the error whole instead of just its message obtained with error_get_pretty(). This avoids suppressing its hint (see commit `50b7b00`), but I can't see how the errors touched in this commit could come with hints. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1450452927-8346-12-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-13 15:16:17 +01:00
Markus Armbruster	4fffeb5e19	error: Use error_report_err() where appropriate (again) Same Coccinelle semantic patch as in commit `565f65d`. We now use the original error whole instead of just its message obtained with error_get_pretty(). This avoids suppressing its hint (see commit `50b7b00`), but I don't think the errors touched in this commit can come with hints. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1450452927-8346-3-git-send-email-armbru@redhat.com>	2016-01-13 15:16:16 +01:00
Zhu Lingshan	240125bc49	iscsi: fix readcapacity error message fix:The error message for readcapacity 16 incorrectly mentioned a readcapacity 10 failure, fixed the error message. Signed-off-by: Zhu Lingshan <lszhu@suse.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-01-11 11:39:28 +03:00
Fam Zheng	0fa296eb00	block/qapi: Clear err for further error Since `a5002d5` (block/qapi: allow best-effort query) we don't return at this error, however err must be cleared before passing to bdrv_query_snapshot_info_list below, as required by error API. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1450779107-26765-1-git-send-email-famz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-01-07 21:30:17 +01:00
Stefan Hajnoczi	3515727f31	block/mirror: replace IOV_MAX with blk_get_max_iov() Use blk_get_max_iov() instead of hardcoding IOV_MAX, which may not apply to all BlockDrivers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Stefan Hajnoczi	222565f65c	block: replace IOV_MAX with BlockLimits.max_iov Request merging must not result in a huge request that exceeds the maximum number of iovec elements. Use BlockLimits.max_iov instead of hardcoding IOV_MAX. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Stefan Hajnoczi	648296e067	block-backend: add blk_get_max_iov() Add a function to query BlockLimits.max_iov. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Stefan Hajnoczi	bd44feb754	block: add BlockLimits.max_iov field The maximum number of struct iovec elements depends on the BlockDriverState. The raw-posix and iSCSI protocols have a maximum of IOV_MAX but others could have different values. Cc: Peter Lieven <pl@kamp.de> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
John Snow	a5002d5302	block/qapi: allow best-effort query For more complex BDS trees that can be created under normal circumstances, we lose the ability to issue query commands because of our inability to re-construct the absolute filename. Instead, omit this field when it is a problem and present as much information as we can. This will change the expected output in iotest 110, where we will now see a json filename and the lack of an absolute filename instead of an error. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1450122916-4706-6-git-send-email-jsnow@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:36:17 +01:00
John Snow	5c9d9ca597	block/qapi: explicitly warn if !has_full_backing_filename Disambiguate "Backing filename and full backing filename are equivalent" from "full backing filename could not be determined." Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1450122916-4706-4-git-send-email-jsnow@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:36:17 +01:00
John Snow	12dcb1c018	block/qapi: always report full_backing_filename Always report full_backing_filename, even if it's the same as backing_filename. In the next patch, full_backing_filename may be omitted if it cannot be generated instead of allowing e.g. drive_query to abort if it runs into this scenario. The presence or absence of the "full" field becomes useful information. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1450122916-4706-3-git-send-email-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:36:17 +01:00
John Snow	548e1ff379	block/qapi: do not redundantly print "actual path" If it happens to match the backing path, that was the actual path. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1450122916-4706-2-git-send-email-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:36:17 +01:00
Paolo Bonzini	ba88944495	block: fix bdrv_ioctl called from coroutine When called from a coroutine, bdrv_ioctl must be asynchronous just like e.g. bdrv_flush. The code was incorrectly making it synchronous, fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:44 +01:00
Paolo Bonzini	27a7649a48	block: use drained section around bdrv_snapshot_delete Do not use bdrv_drain, since by itself it does not guarantee anything. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:44 +01:00
Kevin Wolf	d657c0c289	raw-posix: Make aio=native option binding Traditionally, aio=native was treated as an advice that could simply be ignored if an error occurs while initialising Linux AIO or the feature wasn't compiled in. This behaviour was deprecated in commit `96518254` (qemu 2.3; error during init) and commit `1501ecc1` (qemu 2.5; not compiled in). This patch changes raw-posix to error out in these cases instead of printing a deprecation warning. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-18 14:34:43 +01:00
Denis V. Lunev	b1fc8f934b	qcow2: insert assert into qcow2_get_specific_info() s->qcow_version is always set to 2 or 3. Let's assert if this is wrong. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	03bb78ed25	qcow2: Point to amend function in check If a reference count is not representable with the current refcount order, the image check should point to qemu-img amend for increasing the refcount order. However, qemu-img amend needs write access to the image which cannot be provided if the image is marked corrupt; and the image check will not mark the image consistent unless everything actually is consistent. Therefore, if an image is marked corrupt and the image check encounters a reference count overflow, it cannot be fixed by using qemu-img amend to increase the refcount order. Instead, one has to use qemu-img convert to create a completely new copy of the image in this case. Alternatively, we may want to give the user a way of manually removing the corrupt flag, maybe through qemu-img amend, but this is not part of this patch. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	61ce55fc02	qcow2: Invoke refcount order amendment function Make use of qcow2_change_refcount_order() to support changing the refcount order with qemu-img amend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	791c9a004e	qcow2: Add function for refcount order amendment Add a function qcow2_change_refcount_order() which allows changing the refcount order of a qcow2 image. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	c293a80927	qcow2: Use intermediate helper CB for amend If there is more than one time-consuming operation to be performed for qcow2_amend_options(), we need an intermediate CB which coordinates the progress of the individual operations and passes the result to the original status callback. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	1038bbb803	qcow2: Split upgrade/downgrade paths for amend If the image version should be upgraded, that is the first we should do; if it should be downgraded, that is the last we should do. So split the version change block into an upgrade part at the start and a downgrade part at the end. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	164e0f89cc	qcow2: Use abort() instead of assert(false) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	29d72431ef	qcow2: Use error_report() in qcow2_amend_options() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Max Reitz	8b13976d3f	block: Add opaque value to the amend CB Add an opaque value which is to be passed to the bdrv_amend_options() status callback. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	c5e8bfb7cd	blkdebug: Enable reopen Just reopening the children (as block.c does now) is enough. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:43 +01:00
Kevin Wolf	4cdd01d32e	block: Pass driver-specific options to .bdrv_refresh_filename() In order to decide whether a blkdebug: filename can be produced or a json: one is necessary, blkdebug checked whether bs->options had more options than just "config", "x-image" or "image" (the latter including nested options). That doesn't work well when generic block layer options are present. This patch passes an option QDict to the driver that contains only driver-specific options, i.e. the options for the general block layer as well as child nodes are already filtered out. Works much better this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	d9b7b05703	block: Allow references for backing files For bs->file, using references to existing BDSes has been possible for a while already. This patch enables the same for bs->backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	40365552c2	mirror: Error out when a BDS would get two BBs bdrv_replace_in_backing_chain() asserts that not both old and new BlockDdriverState have a BlockBackend attached to them because both would have to end up pointing to the new BDS and we don't support more than one BB per BDS yet. Before we can safely allow references to existing nodes as backing files, we need to make sure that even if a backing file has a BB on it, this doesn't crash qemu. There are probably also some cases with the 'replaces' option set where drive-mirror could fail this assertion today. They are fixed with this error check as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-12-18 14:34:42 +01:00
Kevin Wolf	5365f44dfa	qcow2: Add .bdrv_join_options callback qcow2 accepts a few driver-specific options that overlap semantically (e.g. "overlap-check" is an alias of "overlap-check.template", and any missing cache size option is derived from the given ones). When bdrv_reopen() merges the set of updated options with left out options that should be kept at their old value, we need to consider this and filter out any duplicates (which would generally cause errors because new and old value would contradict each other). This patch adds a .bdrv_join_options callback to BlockDriver and implements it for qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-12-18 14:34:42 +01:00
Eric Blake	1310a3d3bd	qobject: Rename qtype_code to QType The name QType matches our CODING_STYLE conventions for type names in CamelCase. It also matches the fact that we are already naming all the enum members with a prefix of QTYPE, not QTYPE_CODE. And doing the rename will also make it easier for the next patch to use QAPI for providing the enum, which also wants CamelCase type names. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1449033659-25497-3-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:28 +01:00
Eric Blake	7fb1cf1606	qapi: Don't let implicit enum MAX member collide Now that we guarantee the user doesn't have any enum values beginning with a single underscore, we can use that for our own purposes. Renaming ENUM_MAX to ENUM__MAX makes it obvious that the sentinel is generated. This patch was mostly generated by applying a temporary patch: \|diff --git a/scripts/qapi.py b/scripts/qapi.py \|index e6d014b..b862ec9 100644 \|--- a/scripts/qapi.py \|+++ b/scripts/qapi.py \|@@ -1570,6 +1570,7 @@ const char const %(c_name)s_lookup[] = { \| max_index = c_enum_const(name, 'MAX', prefix) \| ret += mcgen(''' \| [%(max_index)s] = NULL, \|+// %(max_index)s \| }; \| ''', \| max_index=max_index) then running: $ cat qapi-{types,event}.c tests/test-qapi-types.c \| sed -n 's,^// $.$MAX,s\|\1MAX\|\1_MAX\|g,p' > list $ git grep -l _MAX \| xargs sed -i -f list The only things not generated are the changes in scripts/qapi.py. Rejecting enum members named 'MAX' is now useless, and will be dropped in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447836791-369-23-git-send-email-eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> [Rebased to current master, commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:28 +01:00
Eric Blake	a31939e6c8	blkdebug: Merge hand-rolled and qapi BlkdebugEvent enum No need to keep two separate enums, where editing one is likely to forget the other. Now that we can specify a qapi enum prefix, we don't even have to change the bulk of the uses. get_event_by_name() could perhaps be replaced by qapi_enum_parse(), but I left that for another day. CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1447836791-369-20-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-12-17 08:21:27 +01:00
Fam Zheng	61408b250e	block: Don't wait serialising for non-COR read requests The assertion problem was noticed in `06c3916b35`, but it wasn't completely fixed, because even though the req is not marked as serialising, it still gets serialised by wait_serialising_requests against other serialising requests, which could lead to the same assertion failure. Fix it by even more explicitly skipping the serialising for this specific case. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-03 11:08:07 +08:00
Peter Maydell	cf22132367	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJWXxTvAAoJEH8JsnLIjy/Wg5gQAKREoHJUx9x35EHJO+bIfCzd vBYbMS53VZAO308twPmA4Jxg0GyEna65Fn7qoeI0fCPSKSNibHxg+uKi/xopIsaD aKo+7ypp9aCqsZXDpMKMYn0JKBXIdGObE/FuQRv7ALqe9uEts8TLUzOtPNmn95YL nx7BJsJ+mCJucBP8owOPCNHMDnYvIpHj6bG2JxOZB+K5AFeT92gEgRKd9oPQOOY8 N/oAqLaMkSPLDmG0ox+xbVrQflTTgktENRJpDIiYiSTRXg/5UGZs2UdqbAzCs9GA IBvUlrbDpEzCfnydX6r4lg7Ve5al0uWuHM31wzjeJw4zLvx7sLf5qfn9PZws/WOr 33sq2ar0rcVpOJD03m+IYtAU8EOVrc/ek012AELJBA2pSmR//GrMVlGRfqbNWq5m 7dgsnVaMrzESqJkiRQGn4STHAPOFHmOcK3sDq37En9rB2dbzPZr0J8YTyTp075ah /vujdzfacH7fkmqa4Pcz984pZoUohce5Gj8zWx8j3acl50sHCyg8tdb0P6VB+Mau +cL+kRPj0Lzk3opJsyylTSwyn8srLc+KMuZ2KhG30d8Dg7ogRIcoICRdq6yEM5kg hx3c5MS/3ArVno9OMxAz111JEr5t4bBUk1AUJszrqPuOEnCEmzktFa2ihFZ6BWbo 3sqrpUwInznKu1mhW+Ab =Dtb6 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Wed 02 Dec 2015 15:57:35 GMT using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: blkdebug: silence warning under qtest qcow2: Fix potential qemu-img check crash on 32 bit hosts Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-12-02 17:05:34 +00:00
Fam Zheng	176c36997f	mirror: Quiesce source during "mirror_exit" With dataplane, the ioeventfd events could be dispatched after mirror_run releases the dirty bitmap, but before mirror_exit actually does the device switch, because the iothread will still be running, and it will cause silent data loss. Fix this by adding a bdrv_drained_begin/end pair around the window, so that no new external request will be handled. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-12-02 10:44:06 -05:00
Michael S. Tsirkin	20873526a3	blkdebug: silence warning under qtest make check always outputs warnings, this is not nice. Disable blkdebug warnings under qtest. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 1448883874-17933-1-git-send-email-mst@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-12-02 16:28:10 +01:00
Kevin Wolf	c2551b47c9	qcow2: Fix potential qemu-img check crash on 32 bit hosts This crash was caught with qemu-iotests test case 138. Commit `b6d36de` already fixed a few 32 bit truncation bugs that could cause qemu-img check to allocate too little memory and consequently it would segfault. On 32 bit hosts, there is one more place that needs to be fixed because size_t was involved in the calculation and is a 32 bit type there. Cc: qemu-stable@nongnu.org Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael S. Tsirkin <mst@redhat.com>	2015-12-02 13:22:29 +01:00
Markus Armbruster	903c341d57	block/qapi: Plug memory leak on query-block error path Spotted by Coverity. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-25 14:27:43 +01:00
Programmingkid	98caa5bc00	raw-posix.c: Make GetBSDPath() handle caching options Add support for caching options that can be specified from the command line. The CD-ROM raw char device bypasses the host page cache and therefore has alignment requirements. Alignment probing is necessary so only use the raw char device if BDRV_O_NOCACHE is set. This patch fixes -cdrom /dev/cdrom on Mac OS X hosts, where bdrv_read() used to fail due to misaligned requests during image format probing. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-25 14:27:43 +01:00
Vladimir Sementsov-Ogievskiy	c9f6856ded	parallels: dirty BAT properly for continuous allocations This patch marks part of the BAT dirty properly. There is a possibility that multy-block allocation could have one block allocated on one BAT page and next block on the next page. The code without the patch could not save updated position to the file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1447779778-26062-1-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-11-24 09:25:36 +08:00
Denis V. Lunev	7cb1448149	migration: implement bdrv_all_find_vmstate_bs helper The patch also ensures proper locking for the operation. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	a9085f9b55	snapshot: create bdrv_all_create_snapshot helper to create snapshot for all loaded block drivers. The patch also ensures proper locking. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	723ccda1a0	snapshot: create bdrv_all_find_snapshot helper to check that snapshot is available for all loaded block drivers. The check bs != bs1 in hmp_info_snapshots is an optimization. The check for availability of this snapshot will return always true as the list of snapshots was collected from that image. The patch also ensures proper locking. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	4c1cdbaad0	snapshot: create bdrv_all_goto_snapshot helper to switch to snapshot on all loaded block drivers. The patch also ensures proper locking. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	9b00ea376d	snapshot: create bdrv_all_delete_snapshot helper to delete snapshots from all loaded block drivers. The patch also ensures proper locking. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	25af925fff	snapshot: return error code from bdrv_snapshot_delete_by_id_or_name this will make code better in the next patch Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Denis V. Lunev	e9ff957ac2	snapshot: create helper to test that block drivers supports snapshots The patch enforces proper locking for this operation. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2015-11-19 11:50:00 +01:00
Peter Lieven	ca78ecfa72	block: add blk_abort_aio_request Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1447345846-15624-3-git-send-email-pl@kamp.de Signed-off-by: John Snow <jsnow@redhat.com>	2015-11-17 15:06:21 -05:00
Alberto Garcia	aece5edc96	block: Update copyright of the accounting code Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 80a2278e3ec2dafd5daab20a7cb2d6a9b83371e4.1446044838.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:47 +01:00
Alberto Garcia	918a17a464	block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode, and makes the latency of the operation constant. This way we can perform tests on the accounting code with reproducible results. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 35ed0501450fa572684e9b5e92c361ab6cce565b.1446044838.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:46 +01:00
Alberto Garcia	96e4dedaff	block: Add average I/O queue depth to BlockDeviceTimedStats This patch adds two new fields to BlockDeviceTimedStats that track the average number of pending read and write requests for a block device. The values are calculated for the period of time defined for that interval. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: fd31fef53e2714f2f30d59ed58ca2f67ec9ab926.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:46 +01:00
Alberto Garcia	979e9b03fc	block: Compute minimum, maximum and average I/O latencies This patch keeps track of the minimum, maximum and average latencies of I/O operations during a certain interval of time. The values are exposed in the BlockDeviceTimedStats structure. An option to define the intervals to collect these statistics will be added in a separate patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: c7382dc89622c64f918d09f32815827772628f8e.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:45 +01:00
Alberto Garcia	362e9299b3	block: Allow configuring whether to account failed and invalid ops This patch adds two options, "stats-account-invalid" and "stats-account-failed", that can be used to decide whether invalid and failed I/O operations must be used when collecting statistics for latency and last access time. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: ebc7e5966511a342cad428a392c5f5ad56b15213.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:45 +01:00
Alberto Garcia	7ee12dafe9	block: Add statistics for failed and invalid I/O operations This patch adds the block_acct_failed() and block_acct_invalid() functions to allow keeping track of failed and invalid I/O operations. The number of failed and invalid operations is exposed in BlockDeviceStats. We don't keep track of the time spent on invalid operations because they are cancelled immediately when they are started. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: a7256ccb883a86356b1c6c46b5a29ed5448546a5.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:45 +01:00
Alberto Garcia	cb38fffbc9	block: Add idle_time_ns to BlockDeviceStats This patch adds the new field 'idle_time_ns' to the BlockDeviceStats structure, indicating the time that has passed since the previous I/O operation. It also adds the block_acct_idle_time_ns() call, to ensure that all references to the clock type used for accounting are in the same place. This will later allow us to use a different clock for iotests. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 7d8cfcf931453e1a2443e6626e8c1edc347c7c8a.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:45 +01:00
Alberto Garcia	5519593c07	block: define 'clock_type' for the accounting code Its value is still QEMU_CLOCK_REALTIME, but having it in a variable will allow us to change its value easily in the future when running in qtest mode. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 547485eb841cf9e3b2770c96539ae9ae5996e214.1446044837.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:45 +01:00
John Snow	78f51fde88	block: Add BlockJobTxn support to backup_run Allow a BlockJobTxn to be passed into backup_run, which will allow the job to join a transactional group if present. Propagate this new parameter outward into new QMP helper functions in blockdev.c to allow transaction commands to pass forward their BlockJobTxn object in a forthcoming patch. [split up from a patch originally by Stefan and Fam. --js] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-12-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:44 +01:00
John Snow	c347b2c62a	block/backup: Rely on commit/abort for cleanup Switch over to the new .commit/.abort handlers for cleaning up incremental bitmaps. [split up from a patch originally by Stefan and Fam. --js] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:44 +01:00
Fam Zheng	18930ba3d1	blockjob: Introduce reference count and fix reference to job->bs Add reference count to block job, meanwhile move the ownership of the reference to job->bs from the caller (which is released in two completion callbacks) to the block job itself. It is necessary for block_job_complete_sync to work, because block job shouldn't live longer than its bs, as asserted in bdrv_delete. Now block_job_complete_sync can be simplified. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-6-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	b976ea3cf5	backup: Extract dirty bitmap handling as a separate function This will be reused by the coming new transactional completion code. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	df9a681dc9	qed: Implement .bdrv_drain The "need_check_timer" is used to clear the "NEED_CHECK" flag in the image header after a grace period once metadata update has finished. In compliance to the bdrv_drain semantics we should make sure it remains deleted once .bdrv_drain is called. We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy the assertion. Do the "plug" and "flush" calls manually. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	67da1dc5ce	block: Introduce BlockDriver.bdrv_drain callback Drivers can have internal request sources that generate IO, like the need_check_timer in QED. Since we want quiesced periods that contain nested event loops in block layer, we need to have a way to disable such event sources. Block drivers must implement the "bdrv_drain" callback if it has any internal sources that can generate I/O activity, like a timer or a worker thread (even in a library) that can schedule QEMUBH in an asynchronous callback. Update the comments of bdrv_drain and bdrv_drained_begin accordingly. Like bdrv_requests_pending(), we should consider all the children of bs. Before, the while loop just works, as bdrv_requests_pending() already tracks its children; now we mustn't miss the callback, so recurse down explicitly. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	83c98d7b92	block: Drop BlockDriver.bdrv_ioctl Now the callback is not used any more, drop the field along with all implementations in block drivers, which are iscsi and raw. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	5c5ae76acb	block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both Currently all drivers that support .bdrv_aio_ioctl also implement .bdrv_ioctl redundantly. To track ioctl requests in block layer it is easier if we unify the two paths, because we'll need to run it in a coroutine, as required by tracked_request_begin. While we're at it, use .bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl(). Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	4bb17ab51a	iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl iscsi_ioctl emulates SG_GET_VERSION_NUM and SG_GET_SCSI_ID. Now that bdrv_ioctl() will be emulated with .bdrv_aio_ioctl, replicate the logic into iscsi_aio_ioctl to make them consistent. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-5-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:42 +01:00
Fam Zheng	b1066c8755	block: Track discard requests Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard, so add tracked_request_begin/end calls around the loop. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:42 +01:00
Fam Zheng	cdb5e3155e	block: Track flush requests Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add tracked_request_begin and tracked_request_end pair in that function so that all flush requests are now tracked. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:42 +01:00
Fam Zheng	ebde595ce6	block: Add more types for tracked request We'll track more request types besides read and write, change the boolean field to an enum. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:08 +01:00
Peter Maydell	7497b8dddc	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWQ4IFAAoJEL2+eyfA3jBXJI4P/21rybu1IWAVWHBVsrYw25cx eO8FTtiU6/g+OuYgqYM5mRqE0ZvV+mE9nv1t40EvGlP+W8bVbue9vIGoz1523ygh U3GABsicHVOjDHQA1yw7PcPfMPrPxSJE96rpxeYvingkXtfA9aFEtYGTGcNSRcNk w7WtQdsOEPWfMLtvfcz3pkyZWMVTN7Uo2AETmQgkZBkZhI4kPFzkXk+p+WbeTjRp S329GKJkMgcsZ6XNWp27rIeb0T4O9AfZOOQtShNOS8IQSq1lEf6smfF6swQqSplI E6tgmeK+oJi2SV7wSURYb/6ypM3v408mCjxHdCmNdHzbwoPnFd5FF7qmz+00mQ2C dFduVlgNLpq26uzvR/3+nBJIffjTU4mF2aCa4mMCEKfD4qgHuCkMLJme4e3KBdTW RvzefSVWML9zcgOYA/NVCy9WHIZEvyleXAVupbABJ+/w1iJYTqaphMhecfYux0S0 BGyFSPRLYS1/JC4OSzVhW0wHj3QYgs/edMtIh6jRXuFWdhV1kurlcXR7pZvqjUbS xRWKZeyaWpIEbvR9BcYiHiidQPNXwT1VikIJ6CEtSe5bmyTyClQ3j6laCVhJGxRW ymWzQd7RUDvy4w7aIZJSyUubTtj7NohPn86BHGIWRXa6+kSy5C+Gz98ijR1Jb2gl xB2CkNsDHi4f8nWO/syI =tlqg -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Wed 11 Nov 2015 17:59:33 GMT using RSA key ID C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" * remotes/cody/tags/block-pull-request: gluster: allocate GlusterAIOCBs on the stack Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-11-11 23:20:07 +00:00
Kevin Wolf	a99dfb45f2	qcow2: Fix qcow2_get_cluster_offset() for zero clusters When searching for contiguous zero clusters, we only need to check the cluster type. Before this patch, an increasing offset (L2E_OFFSET_MASK) was expected, so that the function never returned more than a single zero cluster in practice. This patch fixes it to actually return as many contiguous zero clusters as it can. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1446657384-5907-1-git-send-email-kwolf@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-11-11 16:55:29 +01:00
Alberto Garcia	f636ae85f3	block: Add blk_get_refcnt() This function returns the reference count of a given BlockBackend. For convenience, it returns 0 if the BlockBackend pointer is NULL. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: dfdd8a17dbe3288842840636d2cfe5bb895abcb0.1446475331.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-11-11 16:55:28 +01:00
Alberto Garcia	10f3cd15dd	mirror: block all operations on the target image during the job There's nothing preventing the target image from being used by other operations during the 'drive-mirror' job, so we should block them all until the job is done. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 82b88fd04cde918a08a6f9a4ab85626d7fd7b502.1446475331.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-11-11 16:55:28 +01:00

... 3 4 5 6 7 ...

2580 Commits