mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Max Reitz	5e003f17ec	block: Make bdrv_next() keep strong references On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Alberto Garcia	c89bcf3af0	block: Leave valid throttle timers when removing a BDS from a backend If a BlockBackend has I/O limits set then its ThrottleGroupMember structure uses the AioContext from its attached BlockDriverState. Those two contexts must be kept in sync manually. This is not ideal and will be fixed in the future by removing the throttling configuration from the BlockBackend and storing it in an implicit filter node instead, but for now we have to live with this. When you remove the BlockDriverState from the backend then the throttle timers are destroyed. If a new BlockDriverState is later inserted then they are created again using the new AioContext. There are a couple of problems with this: a) The code manipulates the timers directly, leaving the ThrottleGroupMember.aio_context field in an inconsisent state. b) If you remove the I/O limits (e.g by destroying the backend) when the timers are gone then throttle_group_unregister_tgm() will attempt to destroy them again, crashing QEMU. While b) could be fixed easily by allowing the timers to be freed twice, this would result in a situation in which we can no longer guarantee that a valid ThrottleState has a valid AioContext and timers. This patch ensures that the timers and AioContext are always valid when I/O limits are set, regardless of whether the BlockBackend has a BlockDriverState inserted or not. [Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com> --Stefan] Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 15:43:49 +00:00
Alberto Garcia	48bf7ea81a	block: Check for inserted BlockDriverState in blk_io_limits_disable() When you set I/O limits using block_set_io_throttle or the command line throttling.* options they are kept in the BlockBackend regardless of whether a BlockDriverState is attached to the backend or not. Therefore when removing the limits using blk_io_limits_disable() we need to check if there's a BDS before attempting to drain it, else it will crash QEMU. This can be reproduced very easily using HMP: (qemu) drive_add 0 if=none,throttling.iops-total=5000 (qemu) drive_del none0 Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:38:46 +00:00
Stefan Hajnoczi	dc868fb03b	throttle-groups: drain before detaching ThrottleState I/O requests hang after stop/cont commands at least since QEMU 2.10.0 with -drive iops=100: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000 (qemu) stop (qemu) cont ...I/O is stuck... This happens because blk_set_aio_context() detaches the ThrottleState while requests may still be in flight: if (tgm->throttle_state) { throttle_group_detach_aio_context(tgm); throttle_group_attach_aio_context(tgm, new_context); } This patch encloses the detach/attach calls in a drained region so no I/O request is left hanging. Also add assertions so we don't make the same mistake again in the future. Reported-by: Yongxue Hong <yhong@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20171110151934.16883-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:09 +00:00
Zhengui	632a773543	block: all I/O should be completed before removing throttle timers. In blk_remove_bs, all I/O should be completed before removing throttle timers. If there has inflight I/O, removing throttle timers here will cause the inflight I/O never return. This patch add bdrv_drained_begin before throttle_timers_detach_aio_context to let all I/O completed before removing throttle timers. [Moved declaration of bs as suggested by Alberto Garcia <berto@igalia.com>. --Stefan] Signed-off-by: Zhengui <lizhengui@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-11-13 14:02:05 +00:00
Manos Pitsidianakis	f738cfc843	block: tidy ThrottleGroupMember initializations Move the CoMutex and CoQueue inits inside throttle_group_register_tgm() which is called whenever a ThrottleGroupMember is initialized. There's no need for them to be separate. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	c61791fc23	block: add aio_context field in ThrottleGroupMember timer_cb() needs to know about the current Aio context of the throttle request that is woken up. In order to make ThrottleGroupMember backend agnostic, this information is stored in an aio_context field instead of accessing it from BlockBackend. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:52 +02:00
Manos Pitsidianakis	022cdc9f40	block: move ThrottleGroup membership to ThrottleGroupMember This commit eliminates the 1:1 relationship between BlockBackend and throttle group state. Users will be able to create multiple throttle nodes, each with its own throttle group state, in the future. The throttle group state cannot be per-BlockBackend anymore, it must be per-throttle node. This is done by gathering ThrottleGroup membership details from BlockBackendPublic into ThrottleGroupMember and refactoring existing code to use the structure. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-09-05 16:47:51 +02:00
Fam Zheng	ca2e214411	block-backend: Allow more "can inactivate" cases These two conditions corresponds to mirror job's source and target, which need to be allowed as they are part of the non-shared storage migration workflow: failing to inactivate either will result in a failure during migration completion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	c16de8f59a	block-backend: Refactor inactivate check The logic will be fixed (extended), move it to a separate function. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-2-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-23 10:21:55 -05:00
Fam Zheng	5f7772c4d0	block-backend: Defer shared_perm tightening migration completion As in the case of nbd_export_new(), bdrv_invalidate_cache() can be called when migration is still in progress. In this case we are not ready to tighten the shared permissions fenced by blk->disable_perm. Defer to a VM state change handler. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-4-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2017-08-15 10:03:28 -05:00
Kevin Wolf	a429b9b5f4	block: Make blk_all_next() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:36 +02:00
Kevin Wolf	77beef8365	block: Make blk_get_attached_dev_id() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-07-18 15:14:35 +02:00
Max Reitz	3a691c50f1	block: Add PreallocMode to blk_truncate() blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Max Reitz	7ea37c3066	block: Add PreallocMode to bdrv_truncate() For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Manos Pitsidianakis	f5a5ca7969	block: change variable names in BlockDriverState Change the 'int count' parameter in pwrite_zeros, pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-06-26 14:54:46 +02:00
Paolo Bonzini	9caa6f3dbe	block: split BlockAcctStats creation and setup block_acct_destroy is called unconditionally in blk_delete, but there is no BlockAcctStats function that is called unconditionally in blk_new. Split block_acct_init in two, so that it will be possible to create a QemuMutex in block_acct_init and destroy it in block_acct_cleanup. Cc: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-19-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	93001e9d87	throttle-groups: protect throttled requests with a CoMutex Another possibility is to use tg->lock, which we're holding anyway in both schedule_next_request and throttle_group_co_io_limits_intercept. This would require open-coding the CoQueue however, so I've chosen this alternative. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-10-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	d993b85804	block: access io_limits_disabled with atomic ops Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-4-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Kevin Wolf	93c26503e0	block: Fix anonymous BBs in blk_root_inactivate() blk->name isn't an array, but a pointer that can be NULL. Checking for an anonymous BB must involve a NULL check first, otherwise we get crashes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-06-09 11:45:03 +02:00
Kevin Wolf	cfa1a5723f	block: Drop permissions when migration completes With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Max Reitz	ed3d2ec98a	block: Add errp to b{lk,drv}_truncate() For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Krzysztof Kozlowski	0731a50feb	block: Constify data passed by pointer to blk_name blk_name() is not modifying data passed to it through pointer and it returns also a pointer to const so the argument can be made const for code safeness. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Eric Blake	1606e4cf8a	throttle: Remove block from group on hot-unplug When a block device that is part of a throttle group is hot-unplugged, we forgot to remove it from the throttle group. This leaves stale memory around, and causes an easily reproducible crash: $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -device virtio-scsi-pci,bus=pci.0 -drive \ id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image2,drive=drive_image2 -drive \ id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image3,drive=drive_image3 {'execute':'qmp_capabilities'} {'execute':'device_del','arguments':{'id':'image3'}} {'execute':'system_reset'} Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810 Suggested-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170406190847.29347-1-eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-11 15:33:00 +02:00
Fam Zheng	e92f0e1910	block: Use bdrv_coroutine_enter to start I/O coroutines BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs / continues... / aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2017-04-11 20:07:15 +08:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
John Snow	f4d9cc88ee	block-backend: add drained_begin / drained_end ops Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2017-03-22 13:26:27 -04:00
Fam Zheng	50bfbe93b2	block: Don't use error_abort in blk_new_open We have an errp and bdrv_root_attach_child can fail permission check, error_abort is not the best choice here. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-07 14:53:29 +01:00
Kevin Wolf	887354bd13	hmp: Request permissions in qemu-io The HMP command 'qemu-io' is a bit tricky because it wants to work on the original BlockBackend, but additional permissions could be required. The details are explained in a comment in the code, but in summary, just request whatever permissions the current qemu-io command needs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	b541155587	block: Add BdrvChildRole.get_parent_desc() For meaningful error messages in the permission system, we need to get some human-readable description of the parent of a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:37 +01:00
Kevin Wolf	39829a01ae	block: Allow error return in BlockDevOps.change_media_cb() Some devices allow a media change between read-only and read-write media. They need to adapt the permissions in their .change_media_cb() implementation, which can fail. So add an Error parameter to the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	c62d32f503	block: Request real permissions in blk_new_open() We can figure out the necessary permissions from the flags that the caller passed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d7086422b1	block: Add error parameter to blk_insert_bs() Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6d0eb64d5c	block: Add permissions to blk_new() We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	981776b348	block: Add permissions to BlockBackend The BlockBackend can now store the permissions that its user requires. This is necessary because nodes can be ejected from or inserted into a BlockBackend and all of these operations must make sure that the user still gets what it requested initially. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	d5e6f437c5	block: Let callers request permissions when attaching a child node When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	52cdbc5869	block: Pass BdrvChild to bdrv_truncate() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
Paolo Bonzini	b9e413dd37	block: explicitly acquire aiocontext in aio callbacks that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-16-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	1919631e6b	block: explicitly acquire aiocontext in bottom halves that need it Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-15-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:39:39 +00:00
Paolo Bonzini	35f106e684	block-backend: allow blk_prw from coroutine context qcow2_create2 calls this. Do not run a nested event loop, as that breaks when aio_co_wake tries to queue the coroutine on the co_queue_wakeup list of the currently running one. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-02-21 11:14:07 +00:00
John Snow	c47ee043dc	block-backend: Always notify on blk_eject blk_eject is only used by scsi-disk and atapi, and in both cases we only attempt to invoke blk_eject if we have a bona-fide change in tray state. The "issue" here is that the tray state does not generate a QMP event unless there is a medium/BDS attached to the device, so if libvirt et al are waiting for a tray event to occur from an empty-but-closed drive, software opening that drive will not emit an event and libvirt will wait forever. Change this by modifying blk_eject to always emit an event, instead of conditionally on a "real" backend eject. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1373264 Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1478553214-497-2-git-send-email-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2016-11-14 11:15:54 -05:00
Paolo Bonzini	88b062c203	block: introduce BDRV_POLL_WHILE We want the BDS event loop to run exclusively in the iothread that owns the BDS's AioContext. This macro will provide the synchronization between the two event loops; for now it just wraps the common idiom of a while loop around aio_poll. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1477565348-5458-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Paolo Bonzini	9972354856	block: add BDS field to count in-flight requests Unlike tracked_requests, this field also counts throttled requests, and remains non-zero if an AIO operation needs a BH to be "really" completed. With this change, it is no longer necessary to have a dummy BdrvTrackedRequest for requests that are never serialising, and it is no longer necessary to poll the AioContext once after bdrv_requests_pending(bs) returns false. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1477565348-5458-5-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2016-10-28 21:50:18 +08:00
Kevin Wolf	48af776a5b	block: Use blk_co_ioctl() for all BB level ioctls All read/write functions already have a single coroutine-based function on the BlockBackend level through which all requests go (no matter what API style the external caller used) and which passes the requests down to the block node level. This patch exports a bdrv_co_ioctl() function and uses it to extend this mode of operation to ioctls. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-10-27 19:05:22 +02:00
Kevin Wolf	8c2e3dd55f	block: Use blk_co_pdiscard() for all BB level discard All read/write functions already have a single coroutine-based function on the BlockBackend level through which all requests go (no matter what API style the external caller used) and which passes the requests down to the block node level. This patch extends this mode of operation to discards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-10-27 19:05:22 +02:00
Kevin Wolf	be07a88981	block: Use blk_co_flush() for all BB level flushes All read/write functions already have a single coroutine-based function on the BlockBackend level through which all requests go (no matter what API style the external caller used) and which passes the requests down to the block node level. This patch extends this mode of operation to flushes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-10-27 19:05:22 +02:00
Kevin Wolf	2d76e724cf	block: Add qdev ID to DEVICE_TRAY_MOVED The event currently only contains the BlockBackend name. However, with anonymous BlockBackends, this is always the empty string. Add the qdev ID (or if none was given, the QOM path) so that the user can still see which device caused the event. Event generation has to be moved from bdrv_eject() to the BlockBackend because the BDS doesn't know the attached device, but that's easy because blk_eject() is the only user of it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-10-07 13:34:22 +02:00
Kevin Wolf	bbc8ea98bc	block-backend: Remember if attached device is non-qdev Almost all block devices are qdevified by now. This allows us to go back from the BlockBackend to the DeviceState. xen_disk is the last device that is missing. We'll remember in the BlockBackend if a xen_disk is attached and can then disable any features that require going from a BB to the DeviceState. While at it, clearly mark the function used by xen_disk as legacy even in its name, not just in TODO comments. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-10-07 13:34:22 +02:00
Kevin Wolf	2bf7e10f78	block: Add node name to BLOCK_IO_ERROR event The event currently only contains the BlockBackend name. However, with anonymous BlockBackends, this is always the empty string. Add the node name so that the user can still see which block device caused the event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-10-07 13:34:22 +02:00
Paolo Bonzini	fffb6e1223	block: use aio_bh_schedule_oneshot This simplifies bottom half handlers by removing calls to qemu_bh_delete and thus removing the need to stash the bottom half pointer in the opaque datum. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-10-07 13:34:07 +02:00
Kevin Wolf	b85114f8cf	block: Use 'detect-zeroes' option for 'blockdev-change-medium' Instead of modifying the new BDS after it has been opened, use the newly supported 'detect-zeroes' option in bdrv_open_common() so that all requirements are checked (detect-zeroes=unmap requires discard=unmap). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-09-29 14:13:39 +02:00
John Snow	49137bf684	block-backend: remove blk_flush_all We can teach Xen to drain and flush each device as it needs to, instead of trying to flush ALL devices. This removes the last user of blk_flush_all. The function is therefore removed under the premise that any new uses of blk_flush_all would be the wrong paradigm: either flush the single device that requires flushing, or use an appropriate flush_all mechanism from outside of the BlkBackend layer. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-29 14:13:38 +02:00
Kevin Wolf	1c89e1fa2f	block: Add blk_by_dev() This finds a BlockBackend given the device model that is attached to it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-09-23 13:36:10 +02:00
Pavel Butsykin	35fadca80e	block: remove BlockDriver.bdrv_write_compressed There are no block drivers left that implement the old .bdrv_write_compressed interface, so it can be removed. Also now we have no need to use the bdrv_pwrite_compressed function and we can remove it entirely. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-05 19:06:48 +02:00
Pavel Butsykin	751e2f0698	block: Convert bdrv_pwrite_compressed() to BdrvChild Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-05 19:06:47 +02:00
Pavel Butsykin	fe5c1355e7	block: switch blk_write_compressed() to byte-based interface This is a preparatory patch, which continues the general trend of the transition to the byte-based interfaces. bdrv_check_request() and blk_check_request() are no longer used, thus we can remove them. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-09-05 19:06:47 +02:00
Kevin Wolf	b6c1bae5df	block: Accept node-name for block-stream In order to remove the necessity to use BlockBackend names in the external API, we want to allow node-names everywhere. This converts block-stream to accept a node-name without lifting the restriction that we're operating at a root node. In case of an invalid device name, the command returns the GenericError error class now instead of DeviceNotFound, because this is what qmp_get_root_bs() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-09-05 19:06:47 +02:00
Eric Blake	1c6c4bb7f0	block: Convert BB interface to byte-based discards Change sector-based blk_discard(), blk_co_discard(), and blk_aio_discard() to instead be byte-based blk_pdiscard(), blk_co_pdiscard(), and blk_aio_pdiscard(). NBD gets a lot simpler now that ignoring the unaligned portion of a byte-based discard request is handled under the hood by the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Eric Blake	60ebac16bc	block: Convert bdrv_aio_discard() to byte-based Another step towards byte-based interfaces everywhere. Replace the sector-based bdrv_aio_discard() with a new byte-based bdrv_aio_pdiscard(), which silently ignores any unaligned head or tail. Driver callbacks will be converted in followup patches. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1468624988-423-5-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Eric Blake	0c51a893b6	block: Convert bdrv_discard() to byte-based Another step towards byte-based interfaces everywhere. Replace the sector-based bdrv_discard() with a new byte-based bdrv_pdiscard(), which silently ignores any unaligned head or tail. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-3-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:55 +01:00
Eric Blake	9f1963b3f7	block: Convert bdrv_co_discard() to byte-based Another step towards byte-based interfaces everywhere. Replace the sector-based bdrv_co_discard() with a new byte-based bdrv_co_pdiscard(), which silently ignores any unaligned head or tail. Driver callbacks will be converted in followup patches. By calculating the alignment outside of the loop, and clamping the max discard to an aligned value, we can simplify the actions done within the loop. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-2-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-20 14:11:54 +01:00
Kevin Wolf	8c39825218	block/qdev: Allow configuring rerror/werror with qdev properties The rerror/werror policies are implemented in the devices, so that's where they should be configured. In comparison to the old options in -drive, the qdev properties are only added to those devices that actually support them. If the option isn't given (or "auto" is specified), the setting of the BlockBackend is used for compatibility with the old options. For block jobs, "auto" is the same as "enospc". Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:32:27 +02:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Kevin Wolf	a03ef88f77	block: Convert bdrv_co_preadv/pwritev to BdrvChild This is the final patch for converting the common I/O path to take a BdrvChild parameter instead of BlockDriverState. The completion of this conversion means that all users that perform I/O on an image need to actually hold a reference (in the form of BdrvChild, possible as part of a BlockBackend) to that image. This also protects against inconsistent use of BlockBackend vs. BlockDriverState functions because direct use of a BlockDriverState isn't possible any more and blk->root is private for block-backends.c. In addition, we can now distinguish different users in the I/O path, and the future op blockers work is going to add assertions based on permissions stored in BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	720ff280e7	block: Convert bdrv_pwrite_zeroes() to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Eric Blake	5def6b80e1	block: Switch transfer length bounds to byte-based Sector-based limits are awkward to think about; in our on-going quest to move to byte-based interfaces, convert max_transfer_length and opt_transfer_length. Rename them (dropping the _length suffix) so that the compiler will help us catch the change in semantics across any rebased code, and improve the documentation. Use unsigned values, so that we don't have to worry about negative values and so that bit-twiddling is easier; however, we are still constrained by 2^31 of signed int in most APIs. When a value comes from an external source (iscsi and raw-posix), sanitize the results to ensure that opt_transfer is a power of 2. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Eric Blake	24ce9a2026	block: Give nonzero result to blk_get_max_transfer_length() Making all callers special-case 0 as unlimited is awkward, and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given our current block layer API limits. In the case of scsi, this means that we now always advertise a limit to the guest, even in cases where the underlying layers previously use 0 for no inherent limit beyond the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-05 16:46:25 +02:00
Kevin Wolf	1e98fefd95	block: Make blk_co_preadv/pwritev() public Also add trace points now that the function can be directly called. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	0c3169dffa	block: Default to enabled write cache in blk_new() The existing users of the function are: 1. blk_new_open(), which already enabled the write cache 2. Some test cases that don't care about the setting 3. blockdev_init() for empty drives, where the cache mode is overridden with the value from the options when a medium is inserted Therefore, this patch doesn't change the current behaviour. It will be convenient, however, for additional users of blk_new() (like block jobs) if the most sensible WCE setting is the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2016-05-25 19:04:21 +02:00
Eric Blake	d004bd52aa	block: Rename blk_write_zeroes() Commit `983a1600` changed the semantics of blk_write_zeroes() to be byte-based rather than sector-based, but did not change the name, which is an open invitation for other code to misuse the function. Renaming to pwrite_zeroes() makes it more in line with other byte-based interfaces, and will help make it easier to track which remaining write_zeroes interfaces still need conversion. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-25 19:04:21 +02:00
Kevin Wolf	36fe13317b	block: Fix reconfiguring graph with drained nodes When changing the BlockDriverState that a BdrvChild points to while the node is currently drained, we must call the .drained_end() parent callback. Conversely, when this means attaching a new node that is already drained, we need to call .drained_begin(). bdrv_root_attach_child() takes now an opaque parameter, which is needed because the callbacks must also be called if we're attaching a new child to the BlockBackend when the root node is already drained, and they need a way to identify the BlockBackend. Previously, child->opaque was set too late and the callbacks would still see it as NULL. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	109525ad6a	block: Drop errp parameter from blk_new() blk_new() cannot fail so its Error ** parameter has become superfluous. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	5b3639371c	block: Make bdrv_open() return a BDS There are no callers to bdrv_open() or bdrv_open_inherit() left that pass a pointer to a non-NULL BDS pointer as the first argument of these functions, so we can finally drop that parameter and just make them return the new BDS. Generally, the following pattern is applied: bs = NULL; ret = bdrv_open(&bs, ..., &local_err); if (ret < 0) { error_propagate(errp, local_err); ... } by bs = bdrv_open(..., errp); if (!bs) { ret = -EINVAL; ... } Of course, there are only a few instances where the pattern is really pure. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Max Reitz	28eb9b12f7	block: Drop blk_new_with_bs() Its only caller is blk_new_open(), so we can just inline it there. The bdrv_new_root() call is dropped in the process because we can just let bdrv_open() create the BDS. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	88be7b4be4	block: Fix bdrv_next() memory leak The bdrv_next() users all leaked the BdrvNextIterator after completing the iteration. Simply changing bdrv_next() to free the iterator before returning NULL at the end of list doesn't work because some callers exit the loop before looking at all BDSes. This patch moves the BdrvNextIterator from the heap to the stack of the caller and switches to a bdrv_first()/bdrv_next() interface for initialising the iterator. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Kevin Wolf	1f0c461b82	block: Remove BlockDriverState.blk This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7c8eece45b	block: Avoid bs->blk in bdrv_next() We need to introduce a separate BdrvNextIterator struct that can keep more state than just the current BDS in order to avoid using the bs->blk pointer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	dde33812a8	block: Add bdrv_has_blk() In many cases we just want to know whether a BDS has at least one BB attached, without needing to know the exact BB that is attached. In contrast to bs->blk, this is still a valid question when more than one BB can be attached, so just answer it by checking the parents list. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	91c6e4b7bb	block: Remove bdrv_aio_multiwrite() Since virtio-blk implements request merging itself these days, the only remaining users are test cases for the function. That doesn't make the function exactly useful any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	4c265bf9f4	block: User BdrvChild callback for device name In order to get rid of bs->blk for bdrv_get_device_name() and bdrv_get_device_or_node_name(), ask all parents for their name and simply pick the first one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	5c8cab4808	block: Use BdrvChild callbacks for change_media/resize We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Kevin Wolf	7ca7f0f6db	block: Decouple throttling from BlockDriverState This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	c2066af051	block: Drain throttling queue with BdrvChild callback This removes the last part of I/O throttling from block/io.c and moves it to the BlockBackend. Instead of having knowledge about throttling inside io.c, we can call a BdrvChild callback .drained_begin/end, which happens to drain the throttled requests for BlockBackend parents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	22aa8b246a	block: Introduce BdrvChild.opaque BlockBackends use it to get a back pointer from BdrvChild to BlockBackend in any BdrvChildRole callbacks. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	97148076e8	block: Move I/O throttling configuration functions to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	441565b279	block: Move actual I/O throttling to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	27ccdd5259	block: Move throttling fields from BDS to BB This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:30 +02:00
Kevin Wolf	49d2165d7d	block: Convert throttle_group_get_name() to BlockBackend Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Kevin Wolf	f2cd875d54	block: Introduce BlockBackendPublic Some features, like I/O throttling, are implemented outside block-backend.c, but still want to keep information in BlockBackend, e.g. list entries that allow keeping a list of BlockBackends. In order to avoid exposing the whole struct layout in the public header file, this patch introduces an embedded public struct where such information can be added and a pair of functions to convert between BlockBackend and BlockBackendPublic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Kevin Wolf	a5614993d7	block: Make sure throttled BDSes always have a BB It was already true in principle that a throttled BDS always has a BB attached, except that the order of operations while attaching or detaching a BDS to/from a BB wasn't careful enough. This commit breaks graph manipulations while I/O throttling is enabled. It would have been possible to keep things working with some temporary hacks, but quite cumbersome, so it's not worth the hassle. We'll fix things again in a minute. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-05-19 16:45:29 +02:00
Eric Blake	7b1deac84e	block: Kill unused sector-based blk_* functions Now that there are no remaining clients, we can drop the sector-based blk_read(), blk_write(), blk_aio_readv(), and blk_aio_writev(). Sadly, there are still remaining sector-based interfaces, such as blk_*discard(), or blk_write_compressed(); those will have to wait for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	60cb2fa7eb	block: Introduce byte-based aio read/write blk_aio_readv() and blk_aio_writev() are annoying in that they can't access sub-sector granularity, and cannot pass flags. Also, they require the caller to pass redundant information about the size of the I/O (qiov->size in bytes must match nb_sectors in sectors). Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix the flaws. The next few patches will upgrade callers, then finally delete the old interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	983a160050	block: Switch blk_*write_zeroes() to byte interface Sector-based blk_write() should die; convert the one-off variant blk_write_zeroes() to use an offset/count interface instead. Likewise for blk_co_write_zeroes() and blk_aio_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	b7d17f9fa4	block: Switch blk_read_unthrottled() to byte interface Sector-based blk_read() should die; convert the one-off variant blk_read_unthrottled(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	8341f00dc2	block: Allow BDRV_REQ_FUA through blk_pwrite() We have several block drivers that understand BDRV_REQ_FUA, and emulate it in the block layer for the rest by a full flush. But without a way to actually request BDRV_REQ_FUA during a pass-through blk_pwrite(), FUA-aware block drivers like NBD are forced to repeat the emulation logic of a full flush regardless of whether the backend they are writing to could do it more efficiently. This patch just wires up a flags argument; followup patches will actually make use of it in the NBD driver and in qemu-io. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	cab3a3563c	block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev It used to be an internal helper function just for implementing bdrv_co_do_readv/writev(), but now that it's a public interface, it deserves a name without "do" in it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	16aaf975ee	block: Don't ignore flags in blk_{,co,aio}_write_zeroes() Commit `57d6a428` neglected to pass the given flags to blk_aio_prwv(), which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be ignored). Commit `fc1453cd` introduced the same bug for blk_write_zeroes(). This is used for 'qemu-img convert' without has_zero_init (e.g. on a block device) and for preallocation=falloc in parallels. Commit `8896e088` is the version for blk_co_write_zeroes(). This function is only used in qemu-io. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-04-15 17:22:12 +02:00
Kevin Wolf	7fa84cd8d4	block: Fix blk_aio_write_zeroes() Commit `57d6a428` broke blk_aio_write_zeroes() because in some write functions in the call path don't have an explicit length argument but reuse qiov->size instead. Which is great, except that write_zeroes doesn't have a qiov, which this commit interprets as 0 bytes. Consequently, blk_aio_write_zeroes() didn't effectively do anything. This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses that instead of acb->rwco.size. The synchronous version of the function is okay because it does pass a qiov (with the right size and a NULL pointer as its base). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-04-15 17:22:11 +02:00

1 2 3 4 5

224 Commits