mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Stefan Hajnoczi	97ebbab0e3	nbd: set TCP_NODELAY Disable the Nagle algorithm to reduce latency. Note this means we must also use TCP_CORK when sending header followed by payload to avoid fragmenting lots of little packets. The previous patch took care of that. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Tested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:35:17 +02:00
Stefan Hajnoczi	0fcece25c0	nbd: use TCP_CORK in nbd_co_send_request() Use TCP_CORK to defer packet transmission until both the header and the payload have been written. Suggested-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:30:40 +02:00
Stefan Hajnoczi	6760c47aa4	nbd: unlock mutex in nbd_co_send_request() error path Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-15 16:29:46 +02:00
Josh Durgin	dc7588c1eb	rbd: add an asynchronous flush The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is sychronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	9a2d462e7b	block: ssh: Use libssh2_sftp_fsync (if supported by libssh2) to flush to disk. libssh2_sftp_fsync is an extension to libssh2 to support fsync(2) over sftp, which is itself an extension of OpenSSH. If both libssh2 and the ssh daemon support it, this will allow bdrv_flush_to_disk to commit changes through to disk on the remote server. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Richard W.M. Jones	0a12ec87a5	block: Add support for Secure Shell (ssh) block device. qemu-system-x86_64 -drive file=ssh://hostname/some/image QEMU will ssh into 'hostname' and open '/some/image' which is made available as a standard block device. You can specify a username (ssh://user@host/...) and/or a port number (ssh://host:port/...). You can also use an alternate syntax using properties (file.user, file.host, file.port, file.path). Current limitations: - Authentication must be done without passwords or passphrases, using ssh-agent. Other authentication methods are not supported. - Uses a single connection, instead of concurrent AIO with multiple SSH connections. This is implemented using libssh2 on the client side. The server just requires a regular ssh daemon with sftp-server support. Most ssh daemons on Unix/Linux systems will work out of the box. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 10:18:05 +02:00
Kevin Wolf	8d3b1a2d0b	block: Introduce bdrv_pwritev() for qcow2_save_vmstate Directly pass the QEMUIOVector on instead of linearising it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Kevin Wolf	cf8074b382	block: Introduce bdrv_writev_vmstate Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-04-15 08:26:18 +02:00
Aurelien Jarno	753d9b82c5	aes: move aes.h from include/block to include/qemu Move aes.h from include/block to include/qemu to show it can be reused by other subsystems. Cc: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2013-04-13 13:51:57 +02:00
Paolo Bonzini	0d09e41a51	hw: move headers to include/ Many of these should be cleaned up with proper qdev-/QOM-ification. Right now there are many catch-all headers in include/hw/ARCH depending on cpu.h, and this makes it necessary to compile these files per-target. However, fixing this does not belong in these patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-04-08 18:13:10 +02:00
Kevin Wolf	c2b6ff51e4	qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcount It ignored the error code, and at least the 'goto fail' is obvious nonsense as it creates an endless loop (if the next attempt doesn't magically succeed) and leaves the in-memory L1 table in big-endian instead of converting it back. In error cases, there's no point in writing an updated L1 table, so skip this part for them. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Kevin Wolf	c2bc78b6a9	qcow2: Return real error in qcow2_update_snapshot_refcount This fixes the error message triggered by the following script: cat > /tmp/blkdebug.cfg <<EOF [inject-error] event = "cluster_free" errno = "28" immediately = "off" EOF $qemu_img create -f qcow2 test.qcow2 10G $qemu_img snapshot -c snap test.qcow2 $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2 Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-04-05 18:58:05 +02:00
Stefan Hajnoczi	f9e8cacc55	oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock() The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>	2013-04-02 11:47:37 -04:00
Kevin Wolf	ecdd5333ab	qcow2: Gather clusters in a looping loop Instead of just checking once in exactly this order if there are dependendies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous. Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this allows to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters). This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticably. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	2c3b32d256	qcow2: Move cluster gathering to a non-looping loop This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break is inserted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	88c6588c51	qcow2: Allow requests with multiple l2metas Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	710c2496d8	qcow2: Use byte granularity in qcow2_alloc_cluster_offset() This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine. This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:44 +01:00
Kevin Wolf	411d62b04b	qcow2: Prepare handle_alloc/copied() for byte granularity This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	e62daaf679	qcow2: handle_copied(): Implement non-zero host_offset Look only for clusters that start at a given physical offset. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c53ede9f6d	qcow2: handle_copied(): Get rid of keep_clusters parameter Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	acb0467f8d	qcow2: handle_copied(): Get rid of nb_clusters parameter handle_copied() uses its bytes parameter now to determine how many clusters it should try to find. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	0af729ec00	qcow2: Factor out handle_copied() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	83baa9a471	qcow2: Clean up handle_alloc() Things can be simplified a bit now. No semantic changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	c37f4cd71d	qcow2: Finalise interface of handle_alloc() The interface works completely on a byte granularity now and duplicated parameters are removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	3b8e2e260c	qcow2: handle_alloc(): Get rid of keep_clusters parameter handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	f5bc635094	qcow2: handle_alloc(): Get rid of nb_clusters parameter We already communicate the same information in *bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	10f0ed8b2f	qcow2: Factor out handle_alloc() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	037689d896	qcow2: Decouple cluster allocation from cluster reuse code This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	65eb2e35c0	qcow2: Change handle_dependency to byte granularity This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected wheneven the requests touch any part of the same cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	d9d74f4177	qcow2: Improve check for overlapping allocations The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:43 +01:00
Kevin Wolf	17a71e5823	qcow2: Handle dependencies earlier Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request: 1. If a request overlaps an in-flight allocations, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information. 2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved. 3. If a COW is required, allocate new clusters Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	9ee6439e27	qcow2: Remove bogus unlock of s->lock The unlock wakes up the next coroutine, but the currently running coroutine will lock it again before it yields, so this doesn't make a lot of sense. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Kevin Wolf	c349ca4bb2	qcow2: Fix "total clusters" number in bdrv_check This should be based on the virtual disk size, not on the size of the image. Interesting observation: With some VM state stored in the image file, percentages higher than 100% are possible, even though snapshots themselves are ignored. This is a qcow2 bug to be fixed another day: The VM state should be discarded in the active L2 tables after completing the snapshot creation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-28 11:52:42 +01:00
Stefan Weil	ea804cadf8	block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build) The new parameter is unused yet. This part was missing in commit `787e4a8500`. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:53:04 +01:00
Liu Yuan	d43731c758	rbd: fix compile error Commit `787e4a85` [block: Add options QDict to bdrv_file_open() prototypes] didn't update rbd.c accordingly. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-25 09:51:43 +01:00
Kevin Wolf	681e7ad024	nbd: Check against invalid option combinations A file name may only specified if no host or socket path is specified. The latter two may not appear at the same time either. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	bebbf7fa9c	nbd: Use default port if only host is specified The URL method already takes care to apply the default port when none is specfied. Directly specifying driver-specific options required the port number until now. Allow leaving it out and apply the default. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f5866fa438	block: Make find_image_format safe with NULL filename In order to achieve this, the .bdrv_probe callbacks of all drivers must cope with this. The DMG driver is the only one that bases its decision on the filename and it needs to be changed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	6963a30d82	block: Introduce .bdrv_parse_filename callback If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f53a1febcd	nbd: Accept -drive options for the network connection The existing parsers for the file name now parse everything into the bdrv_open() options QDict. Instead of using these parsers, you can now directly specify the options on the command line, like this: qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1 Clearly the file=... part could use further improvement, but it's a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:32 +01:00
Kevin Wolf	f17c90bed1	nbd: Keep hostname and port separate The NBD block supports an URL syntax, for which a URL parser returns separate hostname and port fields. It also supports the traditional qemu syntax encoded in a filename. Until now, after parsing the URL to get each piece of information, a new string is built to be fed to socket functions. Instead of building a string in the URL case that is immediately parsed again, parse the string in both cases and use the QemuOpts interface to qemu-sockets.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	787e4a8500	block: Add options QDict to bdrv_file_open() prototypes The new parameter is unused yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2013-03-22 17:51:31 +01:00
Kevin Wolf	acdfb480ba	qcow2: Fix segfault in qcow2_invalidate_cache Need to pass an options QDict to qcow2_open() now. This fixes a segfault on the migration target with qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Liu Yuan	fca23f0ad2	sheepdog: show error message for halt status Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when number of alive nodes is less than that of copies specified by users. This will return 0x19 to QEMU client which currently doesn't recognize it. This patch adds an error description when QEMU client receives it, other than plainly printing 'Invalid error code' Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-19 11:48:36 +01:00
Stefan Hajnoczi	c4d9d19645	threadpool: drop global thread pool Now that each AioContext has a ThreadPool and the main loop AioContext can be fetched with bdrv_get_aio_context(), we can eliminate the concept of a global thread pool from thread-pool.c. The submit functions must take a ThreadPool* argument. block/raw-posix.c and block/raw-win32.c use aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's ThreadPool. tests/test-thread-pool.c must be updated to reflect the new thread_pool_submit() function prototypes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-15 16:07:51 +01:00
MORITA Kazutaka	ed9ba72467	sheepdog: set io_flush handler in do_co_req If an io_flush handler is not set, qemu_aio_wait doesn't invoke callbacks. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
MORITA Kazutaka	0d6db300cd	sheepdog: use non-blocking fd in coroutine context Using a blocking socket in the coroutine context reduces the chance of switching to other work. This patch makes the sheepdog driver use a non-blocking fd always. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Paolo Bonzini	381b487d54	qcow2: make is_allocated return true for zero clusters Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed. QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	3647917919	qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount() We already flush when the function completes. There is no need to flush after every compressed cluster. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f9cb2860bd	qcow2: drop flush in update_cluster_refcount() The update_cluster_refcount() function increments/decrements a cluster's refcount and then returns the new refcount value. There is no need to flush since both update_cluster_refcount() callers already take care of this: 1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed sectors will be appended to an existing cluster with enough free space. qcow2_alloc_bytes() already flushes so there is no need to do so in update_cluster_refcount(). 2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts if it needs to update L2 entries. It also flushes before completing. Removing this flush significantly speeds up qcow2 snapshot creation: $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata $ time qemu-img snapshot -c new test.qcow2 Time drops from more than 3 minutes to under 1 second. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	2154f24e4e	qcow2: flush in qcow2_update_snapshot_refcount() Users of qcow2_update_snapshot_refcount() do not flush consistently. qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and qcow2_snapshot_delete() do not. Solve this by moving the bdrv_flush() into qcow2_update_snapshot_refcount(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	c1f5bafd70	qcow2: set L2 cache dependency in qcow2_alloc_bytes() Compressed writes use qcow2_alloc_bytes() to allocate space with byte granularity. The affected clusters' refcounts will be incremented but we do not need to flush yet. Set a L2 cache dependency on the refcount block cache, so that the refcounts get written out before the L2 updates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	f6977f1556	qcow2: flush refcount cache correctly in qcow2_write_snapshots() Since qcow2 metadata is cached we need to flush the caches, not just the underlying file. Use bdrv_flush(bs) instead of bdrv_flush(bs->file). Also add the error return path when bdrv_flush() fails and move the flush after checking for qcow2_alloc_clusters() failure so that the qcow2_alloc_clusters() error return value takes precedence. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:50 +01:00
Stefan Hajnoczi	9991923b26	qcow2: flush refcount cache correctly in alloc_refcount_block() update_refcount() affects the refcount cache, it does not write to disk. Therefore bdrv_flush(bs->file) does nothing. We need to flush the refcount cache in order to write out the refcount updates! While we're here also add error returns when qcow2_cache_flush() fails. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	74c4510a3c	qcow2: Allow lazy refcounts to be enabled on the command line qcow2 images now accept a boolean lazy_refcounts options. Use it like this: -drive file=test.qcow2,lazy_refcounts=on If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	de9c0cec6c	block: Add options QDict to bdrv_open() prototype It doesn't do anything yet except storing the options QDict in the BlockDriverState. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Kevin Wolf	1a86938f04	block: Add options QDict to .bdrv_open() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-15 16:07:49 +01:00
Peter Lieven	cb1b83e740	iscsi: add iscsi_truncate support this patch adds iscsi_truncate which effectively allows for online resizing of iscsi volumes. for this to work you have to resize the volume on your storage and then call block_resize command in qemu which will issue a readcapacity16 to update the capacity. v4: - factor out complete readcapacity logic into a separate function - handle capacity change check condition in readcapacity function (this happens if the block_resize cmd is the first iscsi task executed after a resize on the storage) v3: - remove switch statement in iscsi_open - create separate patch for brdv_drain_all() in bdrv_truncate() v2: - add a general bdrv_drain_all() before bdrv_truncate() to avoid in-flight AIOs while the device is truncated - since no AIOs are in flight we can use a sync libiscsi call to re-read the capacity - factor out the readcapacity16 logic as it is redundant to iscsi_open() and iscsi_truncate(). Signed-off-by: Peter Lieven <pl@kamp.de> [allow any type of unit attention check condition in iscsi_readcapacity_sync(), as in Message-ID: <51263A2A.6070304@dlhnet.de> - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-05 17:51:51 +01:00
Peter Lieven	1dde716ed6	iscsi: retry read, write, flush and unmap on unit attention check conditions the storage might return a check condition status for various reasons. (e.g. bus reset, capacity change, thin-provisioning info etc.) currently all these informative status responses lead to an I/O error which is populated to the guest. this patch introduces a retry mechanism to avoid this. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-03-05 17:51:50 +01:00
MORITA Kazutaka	1b8bbb46e7	sheepdog: add support for connecting to unix domain socket This patch adds support for a unix domain socket for a connection between qemu and local sheepdog server. You can use the unix domain socket with the following syntax: $ qemu sheepdog+unix:///<vdiname>?socket=<socket path>[#snapid] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	25af257d21	sheepdog: use inet_connect to simplify connect code This uses the form "<host>:<port>" for the representation of the sheepdog server to use inet_connect. Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	5d6768e3b8	sheepdog: accept URIs The URI syntax is consistent with the NBD and Gluster syntax. The syntax is sheepdog[+tcp]://[host:port]/vdiname[#snapid\|#tag] Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
MORITA Kazutaka	bf1c852aa9	move socket_set_nodelay to osdep.c Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-03-04 09:54:17 +01:00
Stefan Hajnoczi	4db35162ea	qcow2: support compressed clusters in BlockFragInfo Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Stefan Hajnoczi	fba31bae2d	qcow2: record fragmentation statistics during check The qemu-img check command can display fragmentation statistics: * Total number of clusters in virtual disk * Number of allocated clusters * Number of fragmented clusters This patch adds fragmentation statistics support to qcow2. Compressed and normal clusters count as allocated. Zero clusters are not counted as allocated unless their L2 entry has a non-zero offset (e.g. preallocation). Only the current L1 table counts towards the statistics - snapshots are ignored. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Stefan Hajnoczi	801f704452	qcow2: introduce check_refcounts_l1/l2() flags The check_refcounts_l1/l2() functions have a check_copied argument to check that the QCOW_O_COPIED flag is consistent with refcount == 1. This should be a bool, not an int. However, the next patch introduces qcow2 fragmentation statistics and also needs to pass an option to check_refcounts_l1/l2(). This is a good opportunity to use an int flags field. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-22 21:21:09 +01:00
Federico Simoncelli	c6bb9ad198	qemu-img: find the image end offset during check This patch adds the support for reporting the image end offset (in bytes). This is particularly useful after a conversion (or a rebase) where the destination is a block device in order to find the first unused byte at the end of the image. Signed-off-by: Federico Simoncelli <fsimonce@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-02-22 21:21:08 +01:00
Stefan Hajnoczi	8a8f584008	block/curl: only restrict protocols with libcurl>=7.19.4 The curl_easy_setopt(state->curl, CURLOPT_PROTOCOLS, ...) interface was introduced in libcurl 7.19.4. Therefore we cannot protect against CVE-2013-0249 when linking against an older libcurl. This fixes the build failure introduced by `fb6d1bbd24`. Reported-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Andreas Färber <andreas.faeber@web.de> Message-id: 1360743934-8337-1-git-send-email-stefanha@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-13 11:57:35 -06:00
Stefan Hajnoczi	33ccf6675f	Revert "block/vpc: Fix size calculation" This reverts commit `f880defbb0`. Jeff Cody's testing revealed that the interpretation of size differs even between VirtualPC and HyperV. Revert this so there is time to consider the impact of any backwards incompatible behavior this change creates. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-12 12:25:15 +01:00
Stefan Hajnoczi	da888d37b0	block/raw-posix: detect readonly Linux block devices using BLKROGET Linux block devices can be set read-only with "blockdev --setro <device>". The same thing can be done for LVM volumes using "lvchange --permission r <volume>". This read-only setting is independent of device node permissions. Therefore the device can still be opened O_RDWR but actual writes will fail. This results in odd behavior for QEMU. bdrv_open() is supposed to fail if a read-only image is being opened with BDRV_O_RDWR. By not failing for Linux block devices, the guest boots up but every write produces an I/O error. This patch checks whether the block device is read-only so that Linux block devices behave like regular files. Reported-by: Sibiao Luo <sluo@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-02-12 12:22:49 +01:00
Stefan Weil	f880defbb0	block/vpc: Fix size calculation The size calculated from the CHS values is not the real image (disk) size, but usually a smaller value. This is caused by rounding effects. Only older operating systems use CHS. Such guests won't be able to use the whole disk. All modern operating systems use the real size. This patch fixes https://bugs.launchpad.net/qemu/+bug/1105670/. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 1360265212-22037-1-git-send-email-sw@weilnetz.de Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-11 08:14:41 -06:00
Markus Armbruster	312fd5f290	error: Strip trailing '\n' from error string arguments (again) Commit `6daf194d` and `be62a2eb` got rid of a bunch, but they keep coming back. Tracked down with this Coccinelle semantic patch: @r@ expression err, eno, cls, fmt; position p; @@ ( error_report(fmt, ...)@p \| error_set(err, cls, fmt, ...)@p \| error_set_errno(err, eno, cls, fmt, ...)@p \| error_setg(err, fmt, ...)@p \| error_setg_errno(err, eno, fmt, ...)@p ) @script:python@ fmt << r.fmt; p << r.p; @@ if "\\n" in str(fmt): print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-11 08:13:19 -06:00
Stefan Hajnoczi	fb6d1bbd24	block/curl: disable extra protocols to prevent CVE-2013-0249 There is a buffer overflow in libcurl POP3/SMTP/IMAP. The workaround is simple: disable extra protocols so that they cannot be exploited. Full details here: http://curl.haxx.se/docs/adv_20130206.html QEMU only cares about HTTP, HTTPS, FTP, FTPS, and TFTP. I have tested that this fix prevents the exploit on my host with libcurl-7.27.0-5.fc18. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-02-08 11:14:20 -06:00
Andreas Färber	fdf263f63f	block/raw-posix: Build fix for O_ASYNC Commit `eeb6b45d48` (block: raw-posix image file reopen) broke the build on OpenIndiana. illumos has no O_ASYNC. Exclude it from flags to be compared and instead assert that it is not set where defined. Cf. `e61ab1da7e` for qemu-ga. Cc: qemu-stable@nongnu.org (1.3.x) Cc: Jeff Cody <jcody@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 15:11:12 +01:00
Philipp Hahn	cd92347575	vmdk: Allow space in file name The previous scanf() format string stopped parsing the file name on the first white white space, which seems to be allowed at least by VMware Workstation. Change the format string to collect everything between the first and second quote as the file name, disallowing line breaks. Signed-off-by: Philipp Hahn <hahn@univention.de> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	46536235d8	parallels: Fix bdrv_open() error handling Return -errno instead of -1 on errors. Hey, no memory leak to fix here while we're touching it! Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	4f8aa2e19f	dmg: Use g_free instead of free The buffers are allocated with g_(re)alloc, so use g_free to free them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	69d34a360d	dmg: Fix bdrv_open() error handling Return -errno instead of -1 on errors and add error checks in some places that didn't have one. Passing things by reference requires more correct typing, replaced a few off_ts therefore - with a 32-bit off_t this is even a fix for truncation bugs. While touching the code, fix even some more memory leaks than in the other drivers... Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:29 +01:00
Kevin Wolf	59294e4659	vpc: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Kevin Wolf	1a60657f57	cloop: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Kevin Wolf	5b7d7dfd19	bochs: Fix bdrv_open() error handling Return -errno instead of -1 on errors. While touching the code, fix a memory leak. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Liu Yuan	6f74c260b4	sheepdog: pass vdi_id to sheep daemon for sd_close() Sheep daemon needs vdi_id to identify which vdi is closed to release resources such as object cache. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Othmar Pasteka	7f2039f611	vmdk: Allow selecting SCSI adapter in image creation Introduce a new option "adapter_type" when converting to vmdk images. It can be one of the following: ide (default), buslogic, lsilogic or legacyESX (according to the vmdk spec from vmware). In case of a non-ide adapter, heads is set to 255 instead of the 16. The latter is used for "ide". Also see LP#545089 Signed-off-by: Othmar Pasteka <pasteka@kabsi.at> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-02-01 14:58:28 +01:00
Markus Armbruster	6528499fa4	g_malloc(0) and g_malloc0(0) return NULL; simplify Once upon a time, it was decided that qemu_malloc(0) should abort. Switching to glib retired that bright idea. Some code that was added to cope with it (e.g. in commits `702ef63`, `b76b6e9`) is still around. Bury it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-30 11:14:46 +01:00
Paolo Bonzini	88ff0e48ee	mirror: do nothing on zero-sized disk On a zero-sized disk we need to break out of the job successfully before bdrv_dirty_iter_init is called, otherwise you will get an assertion failure with the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	0e87ba2ccb	block/vdi: Check for bad signature vdi_open did not check for a bad signature. This check was only in vdi_probe. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	8937f8222c	block/vdi: Improved return values from vdi_open vdi_open returned -1 in case of any error, but it should return an error code (negative value of errno or -EMEDIUMTYPE). Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	9f0470bb2d	block/vdi: Improve debug output for signature The signature is a 32 bit value and needs up to 8 hex digits for printing. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Stefan Weil	15bac0d54f	block: Use error code EMEDIUMTYPE for wrong format in some block drivers This improves error reports for bochs, cow, qcow, qcow2, qed and vmdk when a file with the wrong format is selected. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	884fea4e87	mirror: support arbitrarily-sized iterations Yet another optimization is to extend the mirroring iteration to include more adjacent dirty blocks. This limits the number of I/O operations and makes mirroring efficient even with a small granularity. Most of the infrastructure is already in place; we only need to put a loop around the computation of the origin and sector count of the iteration. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	402a47411b	mirror: support more than one in-flight AIO operation With AIO support in place, we can start copying more than one chunk in parallel. This patch introduces the required infrastructure for this: the buffer is split into multiple granularity-sized chunks, and there is a free list to access them. Because of copy-on-write, a single operation may already require multiple chunks to be available on the free list. In addition, two different iterations on the HBitmap may want to copy the same cluster. We avoid this by keeping a bitmap of in-flight I/O operations, and blocking until the previous iteration completes. This should be a pretty rare occurrence, though; as long as there is no overlap the next iteration can start before the previous one finishes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:35 +01:00
Paolo Bonzini	08e4ed6cde	mirror: add buf-size argument to drive-mirror This makes sense when the next commit starts using the extra buffer space to perform many I/O operations asynchronously. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	bd48bde8f0	mirror: switch mirror_iteration to AIO There is really no change in the behavior of the job here, since there is still a maximum of one in-flight I/O operation between the source and the target. However, this patch already introduces the AIO callbacks (which are unmodified in the next patch) and some of the logic to count in-flight operations and only complete the job when there is none. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	eee13dfe30	mirror: allow customizing the granularity The desired granularity may be very different depending on the kind of operation (e.g. continuous replication vs. collapse-to-raw) and whether the VM is expected to perform lots of I/O while mirroring is in progress. Allow the user to customize it, while providing a sane default so that in general there will be no extra allocated space in the target compared to the source. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	50717e941b	block: allow customizing the granularity of the dirty bitmap Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:34 +01:00
Paolo Bonzini	acc906c6c5	block: return count of dirty sectors, not chunks Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	b812f6719c	mirror: perform COW if the cluster size is bigger than the granularity When mirroring runs, the backing files for the target may not yet be ready. However, this means that a copy-on-write operation on the target would fill the missing sectors with zeros. Copy-on-write only happens if the granularity of the dirty bitmap is smaller than the cluster size (and only for clusters that are allocated in the source after the job has started copying). So far, the granularity was fixed to 1MB; to avoid the problem we detected the situation and required the backing files to be available in that case only. However, we want to lower the granularity for efficiency, so we need a better solution. The solution is to always copy a whole cluster the first time it is touched. The code keeps a bitmap of clusters that have already been allocated by the mirroring job, and only does "manual" copy-on-write if the chunk being copied is zero in the bitmap. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Paolo Bonzini	8f0720ecbc	block: implement dirty bitmap using HBitmap This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter. Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts) Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-25 18:18:33 +01:00
Peter Lieven	7371d56fb2	iscsi: add support for iovectors This patch adds support for directly passing the iovec array from QEMUIOVector if libiscsi supports it (1.8.0 or newer). Signed-off-by: Peter Lieven <pl@kamp.de> [Preserve the improvements from commit `4cc841b`, iscsi: partly avoid iovec linearization in iscsi_aio_writev, 2012-11-19 - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-24 15:37:55 +01:00
Paolo Bonzini	4790b03d30	iscsi: do not leak acb->buf when commands are aborted acb->buf is freed in the WRITE(16) callback, but this may not get called at all when commands are aborted. Add another free in the ABORT TASK callback, which requires setting acb->buf to NULL everywhere. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-24 15:37:55 +01:00
Anthony Liguori	177f7fc688	Merge remote-tracking branch 'bonzini/scsi-next' into staging # By Peter Lieven (3) and others # Via Paolo Bonzini * bonzini/scsi-next: scsi: Drop useless null test in scsi_unit_attention() lsi: use qbus_reset_all to reset SCSI bus scsi: fix segfault with 0-byte disk iscsi: add support for iSCSI NOPs [v2] iscsi: partly avoid iovec linearization in iscsi_aio_writev iscsi: add iscsi_create support	2013-01-23 09:08:54 -06:00
Peter Lieven	5b5d34ec98	iscsi: add support for iSCSI NOPs [v2] This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target. If a consecutive number of NOP-In replies fail a reconnect is initiated. iSCSI NOPs help to ensure that the connection to the target is still operational. This should not, but in reality may be the case even if the TCP connection is still alive if there are bugs in either the target or the initiator implementation. v2: - track the NOPs inside libiscsi so libiscsi can reset the counter in case it initiates a reconnect. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Peter Lieven	4cc841b57c	iscsi: partly avoid iovec linearization in iscsi_aio_writev libiscsi expects all write16 data in a linear buffer. If the iovec only contains one buffer we can skip the linearization step as well as the additional malloc/free and pass the buffer directly. Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Peter Lieven	de8864e5ae	iscsi: add iscsi_create support This patch adds support for bdrv_create. This allows e.g. to use qemu-img to convert from any supported device to an iscsi backed storage as destination. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-22 15:07:03 +01:00
Anthony Liguori	8b17ed4caa	Merge remote-tracking branch 'stefanha/block' into staging # By Kevin Wolf (4) and others # Via Stefan Hajnoczi * stefanha/block: dataplane: support viostor virtio-pci status bit setting dataplane: avoid reentrancy during virtio_blk_data_plane_stop() win32-aio: use iov utility functions instead of open-coding them win32-aio: Fix memory leak win32-aio: Fix vectored reads aio: Fix return value of aio_poll() ide: Remove wrong assertion block: fix null-pointer bug on error case in block commit	2013-01-20 11:01:10 -06:00
Andreas Färber	c36dd8a09f	block/raw-posix: Make hdev_aio_discard() available outside Linux Fixes the build on OpenBSD among others. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 14:35:02 +00:00
Michael Tokarev	3249dbe661	win32-aio: use iov utility functions instead of open-coding them We have iov_from_buf() and iov_to_buf(), use them instead of open-coding these in block/win32-aio.c Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-18 09:57:51 +01:00
Kevin Wolf	e8bccad5ac	win32-aio: Fix memory leak The buffer is allocated for both reads and writes, and obviously it should be freed even if an error occurs. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:58:09 +01:00
Kevin Wolf	bcbbd234d4	win32-aio: Fix vectored reads Copying data in the right direction really helps a lot! Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:57:13 +01:00
Jeff Cody	6d759117d3	block: fix null-pointer bug on error case in block commit This is a bug that was caught by a coverity run by Markus. In the error case when we errored out to exit_restore_open early in the function, 'overlay_bs' was still NULL at that point, although it is used to look up flags and perform a bdrv_reopen(). Move the overlay_bs lookup to where it is needed, and check for NULL before restoring the flags. Also get rid of the unneeded parameter initialization. Reported-By: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-17 10:51:11 +01:00
Markus Armbruster	7191bf311e	block: Fix how mirror_run() frees its buffer It allocates with qemu_blockalign(), therefore it must free with qemu_vfree(), not g_free(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 17:28:55 +01:00
Markus Armbruster	7479acdbce	win32-aio: Fix how win32_aio_process_completion() frees buffer win32_aio_submit() allocates it with qemu_blockalign(), therefore it must be freed with qemu_vfree(), not g_free(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 16:47:45 +01:00
Liu Yuan	f700f8e346	sheepdog: clean up sd_aio_setup() The last two parameters of sd_aio_setup() are never used, so remove them. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 13:40:10 +01:00
Liu Yuan	4778307278	sheepdog: multiplex the rw FD to flush cache This will reduce sockfds connected to the sheep server to one, which simply the future hacks. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 11:18:49 +01:00
Paolo Bonzini	8238010b26	block: make discard asynchronous This is easy with the thread pool, because we can use s->is_xfs and s->has_discard from the worker function. QEMU has a widespread assumption that each I/O operation writes less than 2^32 bytes. This patch doesn't fix it throughout of course, but it starts correcting struct RawPosixAIOData so that there is no regression with respect to the synchronous discard implementation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Paolo Bonzini	fcd9d45552	raw: support discard on block devices Block devices use a ioctl instead of fallocate, so add a separate implementation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Paolo Bonzini	c85191e5c9	raw-posix: remember whether discard failed Avoid sending system calls repeatedly if they shall fail. This does not apply to XFS: if the filesystem-specific ioctl fails, something weird is happening. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Kusanagi Kouichi	3d4fa43e64	raw-posix: support discard on more filesystems Linux 2.6.38 introduced the filesystem independent interface to deallocate part of a file. As of Linux 3.7, btrfs, ext4, ocfs2, tmpfs and xfs support it. Even though the system calls here are in practice issued on Linux, the code is structured to allow plugging in alternatives for other Unix variants. EOPNOTSUPP is used unconditionally in this patch, but it is supported in both OpenBSD and Mac OS X since forever (see for example http://lists.debian.org/debian-glibc/2006/02/msg00337.html). Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 10:03:47 +01:00
Kevin Wolf	8d2497c355	qcow2: Fix segfault on zero-length write One of the recent refactoring patches (commit `f50f88b9`) didn't take care to initialise l2meta properly, so with zero-length writes, which don't even enter the write loop, qemu just segfaulted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-15 09:08:55 +01:00
Anthony Liguori	da758bd7a3	Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: dataplane: handle misaligned virtio-blk requests dataplane: extract virtio-blk read/write processing into do_rdwr_cmd() block: make qiov_is_aligned() public raw-posix: fix bdrv_aio_ioctl sheepdog: implement direct write semantics block: do not probe zero-sized disks Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-01-14 10:26:26 -06:00
Stefan Hajnoczi	c53b1c5114	block: make qiov_is_aligned() public The qiov_is_aligned() function checks whether a QEMUIOVector meets a BlockDriverState's alignment requirements. This is needed by virtio-blk-data-plane so: 1. Move the function from block/raw-posix.c to block/block.c. 2. Make it public in block/block.h. 3. Rename to bdrv_qiov_is_aligned(). 4. Change return type from int to bool. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Paolo Bonzini	b608c8dc02	raw-posix: fix bdrv_aio_ioctl When the raw-posix aio=thread code was moved from posix-aio-compat.c to block/raw-posix.c, there was an unintended change to the ioctl code. The code used to return the ioctl command, which posix_aio_read() would later morph into a zero. This hack is not necessary anymore, and in fact breaks scsi-generic (which expects a zero return code). Remove it. Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Liu Yuan	0e7106d8b5	sheepdog: implement direct write semantics Sheepdog supports both writeback/writethrough write but has not yet supported DIRECTIO semantics which bypass the cache completely even if Sheepdog daemon is set up with cache enabled. Suppose cache is enabled on Sheepdog daemon size, the new cache control is cache=writeback # enable the writeback semantics for write cache=writethrough # enable the emulated writethrough semantics for write cache=directsync # disable cache competely Guest WCE toggling on the run time to toggle writeback/writethrough is also supported. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-01-14 10:06:56 +01:00
Paolo Bonzini	4d4545743f	qemu-option: move standard option definitions out of qemu-config.c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2013-01-12 17:17:53 +01:00
Stefan Weil	eb7ff6fb0b	Replace remaining gmtime, localtime by gmtime_r, localtime_r This allows removing of MinGW specific code and improves reentrancy for POSIX hosts. [Removed unused ret variable in qemu_get_timedate() to fix warning: vl.c: In function ‘qemu_get_timedate’: vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable] -- Stefan Hajnoczi] Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-11 09:44:37 +01:00
Liu Yuan	d6b1ef89a1	sheepdog: pass oid directly to send_pending_req() Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:09:00 +01:00
Liu Yuan	bd751f2204	sheepdog: don't update inode when create_and_write fails For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap to avoid the scenario that the object is allocated but wasn't created at the server side. This will result in VM's IO error on the failed object. Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:08:58 +01:00
Stefan Weil	fccedc624c	block/raw-win32: Fix compiler warnings (wrong format specifiers) Commit `fbcad04d6b` added fprintf statements with wrong format specifiers. GetLastError() returns a DWORD which is unsigned long, so %lu must be used. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 16:08:57 +01:00
Stefan Hajnoczi	4065742ac0	raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane The raw_get_aio_fd() function allows virtio-blk-data-plane to get the file descriptor of a raw image file with Linux AIO enabled. This interface is really a layering violation that can be resolved once the block layer is able to run outside the global mutex - at that point virtio-blk-data-plane will switch from custom Linux AIO code to using the block layer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2013-01-02 15:31:39 +01:00
Paolo Bonzini	9c17d615a6	softmmu: move include files to include/sysemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:45 +01:00
Paolo Bonzini	1de7afc984	misc: move include files to include/qemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:39 +01:00
Paolo Bonzini	caf71f86a3	migration: move include files to include/migration/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:32 +01:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	7b1b5d1913	qapi: move include files to include/qobject/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	f8fe796407	janitor: do not include qemu-char everywhere Touching char/char.h basically causes the whole of QEMU to be rebuilt. Avoid this, it is usually unnecessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:59 +01:00
Paolo Bonzini	077805fa92	janitor: do not rely on indirect inclusions of or from qemu-char.h Various header files rely on qemu-char.h including qemu-config.h or main-loop.h, but they really do not need qemu-char.h at all (particularly interesting is the case of the block layer!). Clean this up, and also add missing inclusions of qemu-char.h itself. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:52 +01:00
Paolo Bonzini	525877c999	build: move rules from Makefile to */Makefile.objs Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:29:06 +01:00
Kevin Wolf	226c3c26b9	qcow2: Factor out handle_dependencies() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	4e95314e2b	qcow2: Execute run_dependent_requests() without lock There's no reason for run_dependent_requests() to hold s->lock, and a later patch will require that in fact the lock is not held. Also, before this patch, run_dependent_requests() not only does what its name suggests, but also removes the l2meta from the list of in-flight requests. When changing this, it becomes an one-liner, so just inline it completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	280d373579	qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 This is closer to where the dirty flag is really needed, and it avoids having checks for special cases related to cluster allocation directly in the writev loop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	f50f88b9fe	qcow2: Allocate l2meta only for cluster allocations Even for writes to already allocated clusters, an l2meta is allocated, though it stays effectively unused. After this patch, only allocating requests still have one. Each l2meta now describes an in-flight request that writes to clusters that are not yet hooked up in the L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	060bee8943	qcow2: Drop l2meta.cluster_offset There's no real reason to have an l2meta for normal requests that don't allocate anything. Before we can get rid of it, we must return the host cluster offset in a different way. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	cf5c1a231e	qcow2: Allocate l2meta dynamically As soon as delayed COW is introduced, the l2meta struct is needed even after completion of the request, so it can't live on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	593fb83cac	qcow2: Introduce Qcow2COWRegion This makes it easier to address the areas for which a COW must be performed. As a nice side effect, the COW code in qcow2_alloc_cluster_link_l2 becomes really trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	1d3afd649b	qcow2: Round QCowL2Meta.offset down to cluster boundary The offset within the cluster is already present as n_start and this is what the code uses. QCowL2Meta.offset is only needed at a cluster granularity. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-13 15:37:59 +01:00
Kevin Wolf	67a7a0ebe5	qcow2: Move BLKDBG_EVENT out of the lock We want to use these events to suspend requests for testing concurrent AIO requests. Suspending requests while they are holding the CoMutex is rather boring for this purpose. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	3c90c65d7a	blkdebug: Implement suspend/resume of AIO requests This allows more systematic AIO testing. The patch adds three new operations to blkdebug: * Setting a "breakpoint" on a blkdebug event. The next request that triggers this breakpoint is suspended and is tagged with a name. The breakpoint is removed after a request has triggered it. * A suspended request (identified by it's tag) can be resumed * It's possible to check whether a suspended request with a given tag exists. This can be used for waiting for an event. Ideally, we would instead tag requests right when they are created and set breakpoints for individual requests. However, at this point the block layer doesn't allow this easily, and breakpoints that trigger for any request already allow a lot of useful testing. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	9e35542b0f	blkdebug: Factor out remove_rule() The cleanup work to remove a rule depends on the type of the rule. It's easy for the existing rules as there is no data that must be cleaned up and is specific to a type yet, but the next patch will change this. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Kevin Wolf	312a2ba0eb	blkdebug: Allow usage without config file As soon as new rules can be set during runtime, as introduced by the next patch, blkdebug makes sense even without a config file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-12-12 12:33:48 +01:00
Fabien Chouteau	fbcad04d6b	Fix error code checking for SetFilePointer() call An error has occurred if the return value is invalid_set_file_pointer and getlasterror doesn't return no_error. Signed-off-by: Fabien Chouteau <chouteau@adacore.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-11 11:36:57 +01:00

1 2 3 4 5 ...

985 Commits