mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Paolo Bonzini	733bbc8cea	block: make bdrv_start_throttled_reqs return void The return value is unused and I am not sure why it would be useful. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	90c78624f1	block: Don't disable I/O throttling on sync requests We had to disable I/O throttling with synchronous requests because we didn't use to run timers in nested event loops when the code was introduced. This isn't true any more, and throttling works just fine even when using the synchronous API. The removed code is in fact dead code since commit `a8823a3b` ('block: Use blk_co_pwritev() for blk_write()') because I/O throttling can only be set on the top layer, but BlockBackend always uses the coroutine interface now instead of using the sync API emulation in block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Fam Zheng	a77fd4bb29	block: Fix bdrv_drain in coroutine Using the nested aio_poll() in coroutine is a bad idea. This patch replaces the aio_poll loop in bdrv_drain with a BH, if called in coroutine. For example, the bdrv_drain() in mirror.c can hang when a guest issued request is pending on it in qemu_co_mutex_lock(). Mirror coroutine in this case has just finished a request, and the block job is about to complete. It calls bdrv_drain() which waits for the other coroutine to complete. The other coroutine is a scsi-disk request. The deadlock happens when the latter is in turn pending on the former to yield/terminate, in qemu_co_mutex_lock(). The state flow is as below (assuming a qcow2 image): mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain while (has tracked request) aio_poll() In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return because the mirror coroutine is blocked in the aio_poll(blocking=true). With this patch, the added qemu_coroutine_yield() allows the scsi-disk coroutine to make progress as expected: mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain.enter > schedule BH > qemu_coroutine_yield() > qcow2:qemu_co_mutex_lock.return > ... tracked request end ... (resumed from BH callback) bdrv_drain.return ... Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-11 16:59:09 +01:00
Kevin Wolf	93f5e6d88a	block: Introduce bdrv_co_writev_flags() This function will allow drivers to implement BDRV_REQ_FUA natively instead of sending a separate flush after the write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	bfd18d1e0b	block: Move enable_write_cache to BB level Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	855a6a93a1	block: Handle flush error in bdrv_pwrite_sync() We don't want to silently ignore a flush error. Also, there is little point in avoiding the flush for writethrough modes and once WCE is moved to the BB layer, we definitely need the flush here because bdrv_pwrite() won't involve one any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk	c32b82afaf	block: add flush callback This patch adds callback for flush request. This callback is responsible for flushing whole block devices stack. bdrv_flush function does not proceed to underlying devices. It should be performed by this callback function, if needed. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	abb06c5ac1	block: add flag to indicate that no I/O will be performed When opening an image it is useful to know whether the caller intends to perform I/O on the image or not. In the case of encrypted images this will allow the block driver to avoid having to prompt for decryption keys when we merely want to query header metadata about the image. eg qemu-img info This flag is enforced at the top level only, since even if we don't want todo I/O on the 'qcow2' file payload, the underlying 'file' driver will still need todo I/O to read the qcow2 header, for example. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Veronia Bahaa	f348b6d1a5	util: move declarations out of qemu-common.h Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
Kevin Wolf	5bd5119667	block: Pull up blk_read_unthrottled() implementation Use blk_read(), so that it goes through blk_co_preadv() like all read requests from the BB to the BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	a8823a3bfd	block: Use blk_co_pwritev() for blk_write() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	1bf1cbc91f	block: Use blk_co_preadv() for blk_read() This patch introduces blk_co_preadv() as a central function on the BlockBackend level that is supposed to handle all read requests from the BB to its root BDS eventually. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Max Reitz	fe1a9cbc33	block: Move some bdrv_*_all() functions to BB Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:56 +01:00
Paolo Bonzini	9dcf8ecd9e	block: add missing call to bdrv_drain_recurse This is also needed in bdrv_drain_all, not just in bdrv_drain. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1450867706-19860-3-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-02-09 13:52:26 +00:00
Fam Zheng	ac987b30d0	block: Use returned *file in bdrv_co_get_block_status Now that all drivers return the right "file" pointer, we can use it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1453780743-16806-14-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Fam Zheng	67a0fd2a9b	block: Add "file" output parameter to block status query functions The added parameter can be used to return the BDS pointer which the valid offset is referring to. Its value should be ignored unless BDRV_BLOCK_OFFSET_VALID in ret is set. Until block drivers fill in the right value, let's clear it explicitly right before calling .bdrv_get_block_status. The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest case 102 passing, and will be fixed once all drivers return the right file pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-02-02 17:50:47 +01:00
Kevin Wolf	04c01a5c8f	block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE Instead of covering only the state of images on the migration destination before the migration is completed, the flag will also cover the state of images on the migration source after completion. This common state implies that the image is technically still open, but no writes will happen and any cached contents will be reloaded from disk if and when the image leaves this state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	09e0c771e4	block: Assert no write requests under BDRV_O_INCOMING As long as BDRV_O_INCOMING is set, the image file is only opened so we have a file descriptor for it. We're definitely not supposed to modify the image, it's still owned by the migration source. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-01-20 13:36:23 +01:00
Peter Maydell	80c71a241a	block: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-20 13:36:23 +01:00
Stefan Hajnoczi	222565f65c	block: replace IOV_MAX with BlockLimits.max_iov Request merging must not result in a huge request that exceeds the maximum number of iovec elements. Use BlockLimits.max_iov instead of hardcoding IOV_MAX. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Stefan Hajnoczi	bd44feb754	block: add BlockLimits.max_iov field The maximum number of struct iovec elements depends on the BlockDriverState. The raw-posix and iSCSI protocols have a maximum of IOV_MAX but others could have different values. Cc: Peter Lieven <pl@kamp.de> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-22 16:01:07 +08:00
Paolo Bonzini	ba88944495	block: fix bdrv_ioctl called from coroutine When called from a coroutine, bdrv_ioctl must be asynchronous just like e.g. bdrv_flush. The code was incorrectly making it synchronous, fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-12-18 14:34:44 +01:00
Fam Zheng	61408b250e	block: Don't wait serialising for non-COR read requests The assertion problem was noticed in `06c3916b35`, but it wasn't completely fixed, because even though the req is not marked as serialising, it still gets serialised by wait_serialising_requests against other serialising requests, which could lead to the same assertion failure. Fix it by even more explicitly skipping the serialising for this specific case. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-12-03 11:08:07 +08:00
Fam Zheng	67da1dc5ce	block: Introduce BlockDriver.bdrv_drain callback Drivers can have internal request sources that generate IO, like the need_check_timer in QED. Since we want quiesced periods that contain nested event loops in block layer, we need to have a way to disable such event sources. Block drivers must implement the "bdrv_drain" callback if it has any internal sources that can generate I/O activity, like a timer or a worker thread (even in a library) that can schedule QEMUBH in an asynchronous callback. Update the comments of bdrv_drain and bdrv_drained_begin accordingly. Like bdrv_requests_pending(), we should consider all the children of bs. Before, the while loop just works, as bdrv_requests_pending() already tracks its children; now we mustn't miss the callback, so recurse down explicitly. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	5c5ae76acb	block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both Currently all drivers that support .bdrv_aio_ioctl also implement .bdrv_ioctl redundantly. To track ioctl requests in block layer it is easier if we unify the two paths, because we'll need to run it in a coroutine, as required by tracked_request_begin. While we're at it, use .bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl(). Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:43 +01:00
Fam Zheng	b1066c8755	block: Track discard requests Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard, so add tracked_request_begin/end calls around the loop. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:42 +01:00
Fam Zheng	cdb5e3155e	block: Track flush requests Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add tracked_request_begin and tracked_request_end pair in that function so that all flush requests are now tracked. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:42 +01:00
Fam Zheng	ebde595ce6	block: Add more types for tracked request We'll track more request types besides read and write, change the boolean field to an enum. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-11-12 16:22:08 +01:00
Kevin Wolf	37a639a7fb	block: Consider all child nodes in bdrv_requests_pending() The function manually recursed into bs->file and bs->backing to check whether there were any requests pending, but it ignored other children. There's no need to special case file and backing here, so just replace these two explicit recursions by a loop recursing for all child nodes. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1446029211-27148-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-29 17:59:27 +00:00
Fam Zheng	51288d7917	block: Introduce "drained begin/end" API The semantics is that after bdrv_drained_begin(bs), bs will not get new external requests until the matching bdrv_drained_end(bs). Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:24 +02:00
Max Reitz	7f0e9da6f1	block: Move BlockAcctStats into BlockBackend As the comment above bdrv_get_stats() says, BlockAcctStats is something which belongs to the device instead of each BlockDriverState. This patch therefore moves it into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Max Reitz	53d8f9d8fb	block: Remove wr_highest_sector from BlockAcctStats BlockAcctStats contains statistics about the data transferred from and to the device; wr_highest_sector does not fit in with the rest. Furthermore, those statistics are supposed to be specific for a certain device and not necessarily for a BDS (see the comment above bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather important information to know for each BDS. When BlockAcctStats is finally removed from the BDS, we will want to keep wr_highest_sector in the BDS. Finally, wr_highest_sector is renamed to wr_highest_offset and given the appropriate meaning. Externally, it is represented as an offset so there is no point in doing something different internally. Its definition is changed to match that in qapi/block-core.json which is "the offset after the greatest byte written to". Doing so should not cause any harm since if external programs tried to calculate the volume usage by (wr_highest_offset + 512) / volume_size, after this patch they will just assume the volume to be full slightly earlier than before. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-23 18:18:23 +02:00
Kevin Wolf	439db28cf9	block/io: Make bdrv_requests_pending() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Paolo Bonzini	c84b31926f	block: switch from g_slice allocator to malloc Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-12 11:17:45 +01:00
Wen Congyang	9568b511c9	block: Introduce a new API bdrv_co_no_copy_on_readv() In some cases, we need to disable copy-on-read, and just read the data. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Stefan Hajnoczi	7a63f3cdc4	block: update bdrv_drain_all()/bdrv_drain() comments The doc comments for bdrv_drain_all() and bdrv_drain() are outdated: * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock() which Fam Zheng is currently developing. Unfortunately this warning was never really enough because devices keep submitting I/O and op blockers don't prevent that. * The bdrv_drain_all() comment is still partially correct but reflects the nature of the implementation rather than API documentation. Do make it clear that bdrv_drain() is only appropriate within an AioContext. For anything spanning AioContexts you need bdrv_drain_all(). Cc: Markus Armbruster <armbru@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com	2015-07-07 10:31:08 +01:00
Alberto Garcia	764ba3ae51	block: remove redundant check before g_slist_find() An empty GSList is represented by a NULL pointer, therefore it's a perfectly valid argument for g_slist_find() and there's no need to make any additional check. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1435583533-5758-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	508249952c	block: Fix dirty bitmap in bdrv_co_discard Unsetting dirty globally with discard is not very correct. The discard may zero out sectors (depending on can_write_zeroes_with_unmap), we should replicate this change to destination side to make sure that the guest sees the same data. Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator doesn't expect unsetting of bits after current position. So let's do it the opposite way which fixes both problems: set the dirty bits if we are to discard it. Reported-by: wangxiaolong@ucloud.cn Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	ba3f0e2545	block: Add bdrv_get_block_status_above Like bdrv_is_allocated_above, this function follows the backing chain until seeing BDRV_BLOCK_ALLOCATED. Base is not included. Reimplement bdrv_is_allocated on top. [Initialized bdrv_co_get_block_status_above() ret to 0 to silence mingw64 compiler warning about the unitialized variable. assert(bs != base) prevents that case but I suppose the program could be compiled with -DNDEBUG. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:03:50 +01:00
Dimitris Aragiorgis	1b6bc94d5d	Fix migration in case of scsi-generic During migration, QEMU uses fsync()/fdatasync() on the open file descriptor for read-write block devices to flush data just before stopping the VM. However, fsync() on a scsi-generic device returns -EINVAL which causes the migration to fail. This patch skips flushing data in case of an SG device, since submitting SCSI commands directly via an SG character device (e.g. /dev/sg0) bypasses the page cache completely, anyway. Note that fsync() not only flushes the page cache but also the disk cache. The scsi-generic device never sends flushes, and for migration it assumes that the same SCSI device is used by the destination host, so it does not issue any SCSI SYNCHRONIZE CACHE (10) command. Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since this is now redundant (we flush the underlying protocol at the end of bdrv_co_flush() which, with this patch, we never reach). Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Alexander Yarygin	f406c03c09	block: Let bdrv_drain_all() to call aio_poll() for each AioContext After the commit `9b536adc` ("block: acquire AioContext in bdrv_drain_all()") the aio_poll() function got called for every BlockDriverState, in assumption that every device may have its own AioContext. If we have thousands of disks attached, there are a lot of BlockDriverStates but only a few AioContexts, leading to tons of unnecessary aio_poll() calls. This patch changes the bdrv_drain_all() function allowing it find shared AioContexts and to call aio_poll() only for unique ones. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-id: 1433936297-7098-4-git-send-email-yarygin@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:16 +01:00
Markus Armbruster	d49b683644	qerror: Move #include out of qerror.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Alberto Garcia	76f4afb40f	throttle: Add throttle group support The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Benoît Canet	0e5b0a2d54	throttle: Extract timers from ThrottleState into a separate structure Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Paolo Bonzini	a53f1a95f9	block: get_block_status: use "else" when testing the opposite condition A bit of Boolean algebra (and common sense) tells us that the second "if" here is looking for blocks that are not allocated. This is the opposite of the "if" that sets BDRV_BLOCK_ALLOCATED, and thus it can use an "else". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1431599702-10431-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Fam Zheng	9eeb6dd1b2	block: Fix NULL deference for unaligned write if qiov is NULL For zero write, callers pass in NULL qiov (qemu-io "write -z" or scsi-disk "write same"). Commit `fc3959e466` fixed bdrv_co_write_zeroes which is the common case for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler fix would be in bdrv_co_do_pwritev which is the NULL dereference point and covers both cases. So don't access it in bdrv_co_do_pwritev in this case, use three aligned writes. [Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized variable warning with gcc 4.9.2. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Fam Zheng	d01c07f222	Revert "block: Fix unaligned zero write" This reverts commit `fc3959e466`. The core write code already handles the case, so remove this duplication. Because commit `61007b316` moved the touched code from block.c to block/io.c, the change is manually reverted. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431522721-3266-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR \| O_CREAT \| O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Paolo Bonzini	eaf5fe2dd4	block: return EPERM on writes or discards to read-only devices This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1431013548-22492-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Stefan Hajnoczi	61007b316c	block: move I/O request processing to block/io.c The block.c file has grown to over 6000 lines. It is time to split this file so there are fewer conflicts and the code is easier to maintain. Extract I/O request processing code: * Read * Write * Zero writes and making the image empty * Flush * Discard * ioctl * Tracked requests and queuing * Throttling and copy-on-read * Block status and allocated functions * Refreshing block limits * Reading/writing vmstate * qemu_blockalign() and friends The patch simply moves code from block.c into block/io.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-04-28 15:36:17 +02:00

1 2 3 4 5

204 Commits