If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.
Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest. A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.
Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing. Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing. QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.
All of these additions that allow format probing to be avoided have to
be enabled explicitly. The default still allows the attack.
In order to fix this, commit 79368c8 (July 2010) put probed raw images
into a restricted mode, in which they could not overwrite the first few
bytes of the image in a way that would make them identify as a different
image format. If a write to the first sector would produce one of the
signatures of another driver, qemu would instead zero out the first four
bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.
Today's block layer, which is based on coroutines and has qiov utility
functions, makes it much easier to get this functionality right, so this
patch implements it.
The other differences between this patch and the old one are that it
doesn't silently write something different from what the guest requested
by zeroing out some bytes (it fails the request instead) and that it
doesn't maintain a list of signatures in the raw driver (it calls the
usual probe functions instead).
Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.
Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.
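To illustrate the write-rejection idea, here is a minimal standalone
sketch (not the actual QEMU code, which calls every driver's usual probe
function): a first-sector write is refused if the new data would later
be probed as another format; only the QCOW/QCOW2 magic is checked here.

    #include <stdint.h>
    #include <string.h>
    #include <errno.h>

    /* Reject a first-sector write on a probed raw image if the new data
     * would make a subsequent probe recognize another format.  Only the
     * QCOW/QCOW2 magic is checked here as an illustration. */
    static int probed_raw_check_first_sector(const uint8_t *buf, size_t len)
    {
        static const uint8_t qcow_magic[4] = { 'Q', 'F', 'I', 0xfb };

        if (len >= sizeof(qcow_magic) &&
            memcmp(buf, qcow_magic, sizeof(qcow_magic)) == 0) {
            return -EPERM;   /* fail the request instead of rewriting it */
        }
        return 0;
    }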
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Adding something like a "delete notifier" to a BlockBackend would not
make much sense, because whoever is interested in registering there will
probably hold a reference to that BlockBackend; therefore, the notifier
will never be called (or only when the notifiee already relinquished its
reference and thus most probably is no longer interested in that
notification).
Therefore, this patch just passes through the close notifier interface
of the root BDS. This will be called when the device is ejected, for
instance, and therefore does make sense.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are already some blk_aio_* functions, so we might as well have
blk_co_* functions (as far as we need them). This patch adds
blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which
is not a blk_co_* function but is needed nonetheless).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of actually recreating the options from scratch, just reuse the
options given for creating the BDS, which are the configuration file
name and additional options. In case there are no additional options we
can thus create a plain filename.
This obviously results in a different output for qemu-iotest 099 which
exactly tests this filename generation. Fix it up as well.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1415697825-26678-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This bool option allows querying all of the node names. It iterates
over all BDSes that have been assigned a name; in this case, too, the
backing chain is not queried.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A node name is a better identifier for a BDS.
We will want to query statistics of a BDS node buried in the BDS graph,
so reporting the node's name if there is one will do the trick.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts commit 000c4dfff4.
The main reason for reverting this commit before the 2.2 release is that
it adds a QAPI interface that we don't want to keep: The 'nocow' flag
doesn't generally make sense for block nodes, but only for the raw-posix
driver. It should therefore be part of ImageInfoSpecific rather than
ImageInfo.
The commit contains more problems, but unlike the API stability issue
they wouldn't justify reverting it.
Conflicts:
block/qapi.c
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
fsync() may fail, and that case should be handled.
Reported-by: László Érsek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
write() may write fewer bytes than requested; in this case, the number
of bytes written is returned. This is the byte count we should subtract
from the number of bytes still to be written, not the byte count we
requested to write.
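As a standalone illustration of the fix (not the qemu-img code itself),
a short-write loop has to advance by the value returned by write(), not
by the requested count:

    #include <unistd.h>
    #include <errno.h>

    /* Write the whole buffer, handling short writes and EINTR. */
    static int write_all(int fd, const char *buf, size_t bytes)
    {
        while (bytes > 0) {
            ssize_t written = write(fd, buf, bytes);
            if (written < 0) {
                if (errno == EINTR) {
                    continue;         /* interrupted before writing, retry */
                }
                return -errno;
            }
            buf   += written;         /* advance by what was actually written */
            bytes -= written;
        }
        return 0;
    }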
Reported-by: László Érsek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.
Additionally, unlikely lseek() failures are treated badly:
* When SEEK_HOLE fails, try_seek_hole() reports trailing data. For
-ENXIO, there's in fact a trailing hole. Can happen only when
something truncated the file since we opened it.
* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
then try_seek_hole() reports a trailing hole. This is okay only
when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
found by SEEK_HOLE has since become trailing somehow). For other
failures (unlikely), it's wrong.
* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
then try_seek_hole() reports bogus data [-1,start), which its caller
raw_co_get_block_status() turns into zero sectors of data. Could
theoretically lead to infinite loops in code that attempts to scan
data vs. hole forward.
Rewrite from scratch, with very careful comments.
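The following is a condensed standalone sketch of that logic (assuming
POSIX SEEK_DATA/SEEK_HOLE semantics as on Linux and Solaris), not the
exact QEMU function: asking SEEK_DATA first means a trailing hole can no
longer be misreported as trailing data, and SEEK_HOLE is only used
afterwards to find where known data ends.

    #include <sys/types.h>
    #include <unistd.h>
    #include <errno.h>

    /* Classify the byte at "start".  If it is in data, sets *data = start
     * and *hole = end of that data.  If it is in a hole, sets *hole = start
     * and *data = start of the next data.  Returns -errno (e.g. -ENXIO for
     * a trailing hole) when nothing could be determined. */
    static int find_data_and_hole(int fd, off_t start, off_t *data, off_t *hole)
    {
    #if defined(SEEK_DATA) && defined(SEEK_HOLE)
        off_t offs = lseek(fd, start, SEEK_DATA);
        if (offs < 0) {
            return -errno;      /* ENXIO: no data at or after start */
        }
        if (offs > start) {
            *hole = start;      /* start is in a hole, next data at offs */
            *data = offs;
            return 0;
        }
        /* start is in data; now find where that data ends */
        offs = lseek(fd, start, SEEK_HOLE);
        if (offs < 0) {
            return -errno;      /* e.g. truncated behind our back */
        }
        if (offs == start) {
            return -EBUSY;      /* a hole was punched behind our back */
        }
        *data = start;
        *hole = offs;
        return 0;
    #else
        (void)fd; (void)start; (void)data; (void)hole;
        return -ENOTSUP;
    #endif
    }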
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:
1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
3. Else pretend there are no holes
Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().
Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."
Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it. Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.
As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow. I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].
"Fortunately", the FIEMAP code is used only when
* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it
was introduced for ext4 and btrfs) and 2012.
* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
Unlikely.
Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for
bugs. Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.
I don't want to worry about this crap, not even theoretically. Get
rid of it.
[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Missed in commit 705be72.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1415938161-16217-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The block layer read and write functions do not like requests which are
bigger than INT_MAX bytes. Since the VDI bmap is read and written in a
single operation, its size is therefore limited accordingly. This
reduces the maximum VDI image size supported by QEMU to half of what it
currently is (down to approximately 512 TB).
The VDI test 084 has to be adapted accordingly. Actually, one could
clearly see that it was broken from the "Could not open
'TEST_DIR/t.IMGFMT': Invalid argument" line for an image which was
supposed to work just fine.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Mon 03 Nov 2014 11:50:53 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/block-pull-request: (53 commits)
block: declare blockjobs and dataplane friends!
block: let commit blockjob run in BDS AioContext
block: let mirror blockjob run in BDS AioContext
block: let stream blockjob run in BDS AioContext
block: let backup blockjob run in BDS AioContext
block: add bdrv_drain()
blockjob: add block_job_defer_to_main_loop()
blockdev: add note that block_job_cb() must be thread-safe
blockdev: acquire AioContext in blockdev_mark_auto_del()
blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
block: acquire AioContext in generic blockjob QMP commands
iotests: Expand test 061
block/qcow2: Simplify shared L2 handling in amend
block/qcow2: Make get_refcount() global
block/qcow2: Implement status CB for amend
qemu-img: Fix insignificant memleak
qemu-img: Add progress output for amend
block: Add status callback to bdrv_amend_options()
block: qemu-iotest 107 supports NFS
iotests: Add test for qcow2's bdrv_make_empty
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-02' into staging
trivial patches for 2014-11-02
# gpg: Signature made Sun 02 Nov 2014 11:54:43 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
* remotes/mjt/tags/pull-trivial-patches-2014-11-02: (23 commits)
vdi: wrapped uuid_unparse() in #ifdef
tap: fix possible fd leak in net_init_tap
tap: do not close(fd) in net_init_tap_one
target-i386: Remove unused model_features_t struct
tap_int.h: remove repeating NETWORK_SCRIPT defines
os-posix: reorder parent notification for -daemonize
pidfile: stop making pidfile error a special case
os-posix: replace goto again with a proper loop
os-posix: use global daemon_pipe instead of cryptic fds[1]
dump: Fix dump-guest-memory termination and use-after-close
virtio-9p-proxy: improve error messages in connect_namedsocket()
virtio-9p-proxy: fix error return in proxy_init()
virtio-9p-proxy: Fix sockfd leak
target-tricore: check return value before using it
net/slirp: specify logbase for smbd
Revert "os-posix: report error message when lock file failed"
util: Improve os_mem_prealloc error message
sparse: fix build
target-arm: A64: remove redundant store
target-xtensa: mark XtensaConfig structs as unused
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The commit block job must run in the BlockDriverState AioContext so that
it works with dataplane.
Acquire the AioContext in blockdev.c so starting the block job is safe.
One detail here is that the bdrv_drain_all() must be moved inside the
aio_context_acquire() region so requests cannot sneak in between the
drain and acquire.
The completion code in block/commit.c must perform backing chain
manipulation and bdrv_reopen() from the main loop. Use
block_job_defer_to_main_loop() to achieve that.
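Schematically, the ordering in blockdev.c looks like the fragment below
(using the real function names, but trimmed down and not compilable on
its own); the point is that the drain happens while the context is
already held:

    AioContext *aio_context = bdrv_get_aio_context(bs);

    aio_context_acquire(aio_context);
    bdrv_drain_all();   /* drained while the context is held, so no request
                         * can sneak in between the drain and the acquire */
    /* ... checks, commit_start() ... */
    aio_context_release(aio_context);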
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-11-git-send-email-stefanha@redhat.com
The mirror block job must run in the BlockDriverState AioContext so that
it works with dataplane.
Acquire the AioContext in blockdev.c so starting the block job is safe.
Note that to_replace is treated separately from other BlockDriverStates
in that it does not need to be in the same AioContext. Explicitly
acquire/release to_replace's AioContext when accessing it.
The completion code in block/mirror.c must perform BDS graph
manipulation and bdrv_reopen() from the main loop. Use
block_job_defer_to_main_loop() to achieve that.
The bdrv_drain_all() call is not allowed outside the main loop since it
could lead to lock ordering problems. Use bdrv_drain(bs) instead
because we have acquired the AioContext so nothing else can sneak in
I/O.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com
The stream block job must run in the BlockDriverState AioContext so that
it works with dataplane.
The basics of acquiring the AioContext are easy in blockdev.c.
The tricky part is the completion code which drops part of the backing
file chain. This must be done in the main loop where bdrv_unref() and
bdrv_close() are safe to call. Use block_job_defer_to_main_loop() to
achieve that.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-9-git-send-email-stefanha@redhat.com
The backup block job must run in the BlockDriverState AioContext so that
it works with dataplane.
The basics of acquiring the AioContext are easy in blockdev.c.
The completion code in block/backup.c must call bdrv_unref() from the
main loop. Use block_job_defer_to_main_loop() to achieve that.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-8-git-send-email-stefanha@redhat.com
Currently, we have a bitmap for keeping track of which clusters have
been created during the zero cluster expansion process. This was
necessary because we need to properly increase the refcount for shared
L2 tables.
However, now we can simply take the L2 refcount and use it for the
cluster allocated for expansion. This will be the correct refcount and
therefore we no longer have to remember that the cluster has been
allocated.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-7-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reading the refcount of a cluster is an operation which can be useful in
all of the qcow2 code, so make that function globally available.
While touching this function, amend the comment describing the "addend"
parameter: It is no longer necessary (if it ever was) to have it set to
-1 or 1; any value is fine.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The only really time-consuming operation potentially performed by
qcow2_amend_options() is zero cluster expansion when downgrading qcow2
images from compat=1.1 to compat=0.10, so report status of that
operation and that operation only through the status CB.
For this, approximate the progress as the number of L1 entries visited
during the operation.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Depending on the changed options and the image format,
bdrv_amend_options() may take a significant amount of time. In these
cases, a way to be informed about the operation's status is desirable.
Since the operation is rather complex and may fundamentally change the
image, implementing it as AIO or a coroutine does not seem feasible. On
the other hand, implementing it as a block job would be significantly
more difficult than a simple callback and would not add any benefit
other than progress reporting to the amend operation, because it should
not actually be run as a block job at all.
A callback may not be very pretty, but it's very easy to implement and
perfectly fits its purpose here.
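Sketched generically (the exact prototype added to bdrv_amend_options()
is not reproduced here), the callback shape being described is simply a
progress report that the caller can render however it likes:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative status callback type: reports how much of the total
     * work has been done so far. */
    typedef void AmendStatusCB(int64_t offset, int64_t total_work_size,
                               void *opaque);

    /* A possible caller-side implementation that prints a percentage. */
    static void print_progress(int64_t offset, int64_t total_work_size,
                               void *opaque)
    {
        (void)opaque;
        if (total_work_size > 0) {
            fprintf(stderr, "\r%6.2f%%", 100.0 * offset / total_work_size);
        }
    }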
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu-img should use QMP commands whenever possible in order to ensure
feature completeness of both online and offline image operations. As
qemu-img itself has no access to QMP (since this would basically
require linking just about everything into qemu-img), imitate QMP's
implementation of block-commit by using commit_active_start() and then
waiting for the block job to finish.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-9-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of taking the total length of the block device as the block
job's length, use the number of dirty sectors. The progress is now the
number of sectors mirrored to the target block device. Note that this
may result in the job's length increasing during operation, which is
however in fact desirable.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-8-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by emptying the L1 and
refcount table (while having the dirty flag set, which only works for
compat=1.1) and creating a trivial refcount structure.
If there are snapshots or for compat=0.10, fall back to the simple
implementation (discard all clusters).
[Applied s/clusters/cluster/ typo fix suggested by Eric Blake
--Stefan]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Implement this function by making all clusters in the image file fall
through to the backing file (by using the recently extended discard).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Normally, discarded sectors should read back as zero. However, there
are cases in which a sector (or rather, a cluster) should be discarded
as if it had never been written in the first place; that is, reading it
should fall through to the backing file again.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of generating the full return value thrice in try_fiemap(),
try_seek_hole() and as a fall-back in raw_co_get_block_status() itself,
generate the value only in raw_co_get_block_status().
While at it, also remove the pnum parameter from try_fiemap() and
try_seek_hole().
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414148280-17949-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As its comment states, raw_co_get_block_status() should unconditionally
return 0 and set *pnum to 0 for positions after EOF.
An assertion after lseek(..., SEEK_HOLE) tried to catch this case by
asserting that errno != -ENXIO (which would indicate a position after
the EOF); but it should be errno != ENXIO instead. Regardless of that,
there should be no such assertion at all. If bdrv_getlength() returned
an outdated value and the image has been resized outside of qemu,
lseek() will return with errno == ENXIO. Just return that value as an
error then.
Setting *pnum to 0 and returning 0 should not be done here, as in that
case we should update the device length as well. So, from qemu's
perspective, the file has not been resized; it's just that there was an
error querying sectors beyond a certain point (the actual file size).
Additionally, nb_sectors should be clamped against the image end. This
was probably not an issue if FIEMAP or SEEK_HOLE/SEEK_DATA worked, but
the fallback did not take this case into account.
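A standalone sketch of that clamping step (names are illustrative):
never report more sectors than actually remain between the queried
position and the end of the image.

    #include <stdint.h>

    #define SECTOR_SIZE 512

    /* Clamp a sector count so that [sector_num, sector_num + result)
     * stays within an image of total_size_bytes. */
    static int clamp_nb_sectors(int nb_sectors, int64_t sector_num,
                                int64_t total_size_bytes)
    {
        int64_t total_sectors = (total_size_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
        int64_t remaining = total_sectors - sector_num;

        if (remaining <= 0) {
            return 0;                   /* queried at or past the end */
        }
        return remaining < nb_sectors ? (int)remaining : nb_sectors;
    }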
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414148280-17949-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu_opt_get_number returns a uint64_t, and curl_easy_setopt expects a
long (not an int). There is no warning about the latter type error
because curl_easy_setopt uses a varargs argument.
Store the timeout (which is a positive number of seconds) as a
uint64_t. Check that the number given by the user is reasonable.
Zero is permissible (meaning no timeout is enforced by cURL).
Cast it to long before calling curl_easy_setopt to fix the type error.
Example error message after this change has been applied:
$ ./qemu-img create -f qcow2 /tmp/test.qcow2 \
-b 'json: { "file.driver":"https",
"file.url":"https://foo/bar",
"file.timeout":-1 }'
qemu-img: /tmp/test.qcow2: Could not open 'json: { "file.driver":"https", "file.url":"https://foo/bar", "file.timeout":-1 }': timeout parameter is too large or negative: Invalid argument
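As a standalone sketch of the same idea (the bound here is illustrative,
not a libcurl constant): validate the 64-bit value first, then hand
libcurl an explicit long, since curl_easy_setopt()'s varargs would
silently accept the wrong type.

    #include <curl/curl.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TIMEOUT_MAX_SECONDS 10000   /* illustrative upper bound */

    static int set_curl_timeout(CURL *curl, uint64_t timeout)
    {
        if (timeout > TIMEOUT_MAX_SECONDS) {
            fprintf(stderr, "timeout parameter is too large or negative\n");
            return -1;
        }
        /* explicit cast: CURLOPT_TIMEOUT expects a long */
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, (long)timeout);
        return 0;
    }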
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If there is still pending I/O while a snapshot is being deleted, a
concurrency problem can arise, because snapshot deletion is done in
non-coroutine context while the pending read/write I/O (bdrv_co_do_rw)
is done in coroutine context.
Add bdrv_drain_all() to bdrv_snapshot_delete() to avoid this problem.
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 201410211637596311287@sangfor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes Ceph issue 2467: http://tracker.ceph.com/issues/2467
[Dropped return r in void function as suggested by Josh Durgin
<josh.durgin@inktank.com>.
--Stefan]
Signed-off-by: Adam Crume <adamcrume@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1412880272-3154-1-git-send-email-adamcrume@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Found by valgrind.
Command: ./qemu-img convert -f parallels -O qcow2 1.hds 1.img
Invalid read of size 4
at 0x17D0EF: parallels_co_read (parallels.c:357)
by 0x11FEE4: bdrv_aio_rw_vector (block.c:4640)
by 0x11FFBF: bdrv_aio_readv_em (block.c:4652)
by 0x11F55F: bdrv_co_readv_em (block.c:4862)
by 0x123428: bdrv_aligned_preadv (block.c:3056)
by 0x1239FA: bdrv_co_do_preadv (block.c:3162)
by 0x125424: bdrv_rw_co_entry (block.c:2706)
by 0x155DD9: coroutine_trampoline (coroutine-ucontext.c:118)
by 0x6975B6F: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)
The problem is that s->catalog_bitmap is allocated/filled with
g_malloc(s->catalog_size), thus the index validity check must be
inclusive, i.e. index >= s->catalog_size is invalid.
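In other words (a trivial standalone illustration), an index into an
array with catalog_size elements is valid only when it is strictly less
than catalog_size:

    #include <stdint.h>
    #include <errno.h>

    /* Valid indices are 0 .. catalog_size - 1; catalog_size itself is
     * already one past the end of the allocation. */
    static int check_catalog_index(uint32_t index, uint32_t catalog_size)
    {
        if (index >= catalog_size) {
            return -EINVAL;
        }
        return 0;
    }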
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1412759610-2257-4-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cancel oversized requests early. They would generate
an iSCSI protocol error anyway, after having possibly transferred
a lot of data over the wire.
Suggested-By: Max Reitz <mreitz@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As Max pointed out, there is a hidden cast from int64_t to int for all
limits. So use the newly introduced sector_limits_lun2qemu for all
limits received from the target.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Copy the max_xfer_len from the BlockLimits VPD or use the
maximum value fitting in the CDB.
The helper function sector_limits_lun2qemu is introduced to convert
and cap the limits from the VPD to the maximum power of two fitting
in an integer; int is the type used for nb_sectors throughout the
block layer.
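A standalone sketch of that conversion (names and details are
illustrative, not the driver's own code): scale a limit given in target
blocks to 512-byte sectors, then cap it at the largest power of two
that still fits in an int.

    #include <stdint.h>
    #include <limits.h>

    #define QEMU_SECTOR_SIZE 512

    static int limit_lun2qemu(int64_t lun_blocks, int lun_block_size)
    {
        /* assumes lun_block_size is a multiple of 512 (typically 512 or 4096) */
        int64_t sectors = lun_blocks * (lun_block_size / QEMU_SECTOR_SIZE);
        int64_t cap = (int64_t)INT_MAX / 2 + 1;     /* 2^30, a power of two */

        return sectors > cap ? (int)cap : (int)sectors;
    }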
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Wrapped uuid_unparse() in #ifdef to avoid "-Wunused-function"
on clang 3.4 or later.
Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Before this patch, when a write-protected iSCSI target is attached as a
scsi-disk with BDRV_O_RDWR, we report it as writable, while in fact all
writes will fail.
One way to improve this is to report the write protect flag as true to
the guest, but an even better way is to refuse to attach a
write-protected LUN to the guest.
The target's write protect flag is checked with a mode sense query.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a new flag to mark devices that require requests to be aligned and
replace the usage of BDRV_O_NOCACHE and O_DIRECT with this flag when
appropriate.
If a character device is used as a backend on a FreeBSD host, set this
flag unconditionally.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
While writing an L1 table sector, qcow2_write_l1_entry() copies the
respective range from s->l1_table to the local "buf" array. The size of
s->l1_table does not have to be a multiple of L1_ENTRIES_PER_SECTOR;
thus, limit the index which is used for copying all entries to the L1
size.
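A standalone sketch of the bounded copy (the types and names here are
illustrative): a full sector's worth of entries is still written out,
but no entry beyond l1_size is ever read from the in-memory table.

    #include <stdint.h>

    #define ENTRIES_PER_SECTOR ((int)(512 / sizeof(uint64_t)))

    static void fill_l1_sector_buf(uint64_t *buf, const uint64_t *l1_table,
                                   int l1_size, int l1_start_index)
    {
        int i;

        for (i = 0; i < ENTRIES_PER_SECTOR && l1_start_index + i < l1_size; i++) {
            buf[i] = l1_table[l1_start_index + i];  /* bounded by l1_size */
        }
        for (; i < ENTRIES_PER_SECTOR; i++) {
            buf[i] = 0;                             /* pad the written sector */
        }
    }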
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With BDRVQcowState.refcount_block_bits, we don't need REFCOUNT_SHIFT
anymore.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Because the old refcount structure will be leaked after having rebuilt
it, we need to recalculate the refcounts and run a leak-fixing operation
afterwards (if leaks should be fixed at all).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.
The old refcount structure will be leaked, however. This leak will be
dealt with in a follow-up commit.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.
In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.
To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We will later call calculate_refcounts multiple times, so reuse the
refcount table if possible.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that the refcount table can be passed around by reference, do that
for inc_refcounts() (and subsequently check_refcounts_l1() and
check_refcounts_l2()) and use it for resizing it when a cluster after
the image end is encountered.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As of a future patch, inc_refcounts() will have to throw errors which
are generally signaled by returning -errno. Therefore, let it return an
integer which is either 0 for success or -errno and handle the -errno
case in all callers.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of printing out an error message, incrementing check_errors and
returning a fixed -errno, just do cleanups and return -ret, with ret set
by the code which threw the exception (jumped to the fail label).
Also, increment check_errors on error in check_refcounts_l2().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use int64_t for the entry count of the in-memory refcount table
throughout the check functions.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pull check_refblocks() before calculate_refcounts() so we can drop its
static declaration.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When implementing variable refcounts, we want to be able to easily find
all the places in qemu which are tied to a certain refcount order.
Replace sizeof(uint16_t) in the check code by sizeof(**refcount_table)
so we can later find it more easily.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Put the code for calculating the reference counts and comparing them
during qemu-img check into their own functions.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When opening dirty images, qcow2's repair function should not only
repair errors but leaks as well.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size of a refblock entry is (in theory) variable; therefore,
calculate the number of entries per refblock and the corresponding bit
shift (1 << x == entry count) when opening an image.
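A standalone sketch of the calculation: a refblock occupies one cluster,
each entry takes (1 << refcount_order) bits (refcount_order is 4 today,
i.e. 16-bit entries), so the entry count is itself a power of two and
can be stored as a shift.

    #include <assert.h>

    static void refblock_geometry(int cluster_bits, int refcount_order,
                                  int *refcount_block_bits,
                                  int *refcount_block_size)
    {
        /* cluster bytes -> bits (+3), divided by bits per entry (order) */
        *refcount_block_bits = cluster_bits + 3 - refcount_order;
        assert(*refcount_block_bits > 0);
        *refcount_block_size = 1 << *refcount_block_bits;
    }

    /* e.g. 64 KiB clusters (cluster_bits = 16) with 16-bit entries
     * (refcount_order = 4) give 1 << 15 == 32768 entries per refblock. */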
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are macros for these operations, so make use of them.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Doesn't make a difference just yet, but it's the right thing to do.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move device model attachment / detachment and the BlockDevOps device
model callbacks and their wrappers from BlockDriverState to
BlockBackend.
Wrapper calls in block.c change from
bdrv_dev_FOO_cb(bs, ...)
to
if (bs->blk) {
bdrv_dev_FOO_cb(bs->blk, ...);
}
No change, because both bdrv_dev_change_media_cb() and
bdrv_dev_resize_cb() do nothing when no device model is attached, and
a device model can be attached only when bs->blk.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Much more command code needs conversion. I start with this one
because it's using bdrv_dev_* functions, which I'm about to lift into
BlockBackend.
While there, give bdrv_query_info() internal linkage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blockdev_init() always creates a DriveInfo, but only drive_new() fills
it in. qmp_blockdev_add() leaves it blank. This results in a drive
with type = IF_IDE, bus = 0, unit = 0. Screwed up in commit ee13ed1c.
Board initialization code looking for IDE drive (0,0) can pick up one
of these bogus drives. The QMP command has to execute really early to
be visible. Not sure how likely that is in practice.
Fix by creating DriveInfo in drive_new(). Block backends created by
blockdev-add don't get one.
Breaks the test for "has been created by qmp_blockdev_add()" in
blockdev_mark_auto_del() and do_drive_del(), because it changes the
value of dinfo && !dinfo->enable_auto_del from true to false. Simply
test !dinfo instead.
Leaves DriveInfo member enable_auto_del unused. Drop it.
A few places assume a block backend always has a DriveInfo. Fix them
up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Device models should access their block backends only through the
block-backend.h API. Convert them, and drop direct includes of
inappropriate headers.
Just four uses of BlockDriverState are left:
* The Xen paravirtual block device backend (xen_disk.c) opens images
itself when set up via xenbus, bypassing blockdev.c. I figure it
should go through qmp_blockdev_add() instead.
* Device model "usb-storage" prompts for keys. No other device model
does, and this one probably shouldn't do it, either.
* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
which has only the BlockDriverState.
* PC87312State has an unused BlockDriverState[] member.
The next two commits take care of the latter two.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use it with block backends shortly, and the name is going to fit
badly there. It's a block layer thing anyway, not just a block driver
thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there. It's a block layer thing anyway, not just a
block driver thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockBackend's name space is separate only to keep the initial patches
simple. Time to merge the two.
Retain bdrv_find() and bdrv_get_device_name() for now, to keep this
series manageable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields(). The latter is used only to undo damage
done by bdrv_swap(). The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa. Furthermore,
blk_new_with_bs() keeps the two names equal.
Therefore, device_name[] is redundant. Eliminate it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On BlockBackend destruction, unref its BlockDriverState. Replaces the
callers' unrefs.
This turns the pointer from BlockBackend to BlockDriverState into a
strong reference, managed with bdrv_ref() / bdrv_unref(). The
back-pointer remains weak.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make the BlockBackend own the DriveInfo. Change blockdev_init() to
return the BlockBackend instead of the DriveInfo.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState. Callers have to unref both. The commit after next
will relieve them of the need to unref the BlockDriverState.
Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon(). To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the
BlockBackend's name into the empty string. Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A block device consists of a frontend device model and a backend.
A block backend has a tree of block drivers doing the actual work.
The tree is managed by the block layer.
We currently use a single abstraction BlockDriverState both for tree
nodes and the backend as a whole. Drawbacks:
* Its API includes both stuff that makes sense only at the block
backend level (root of the tree) and stuff that's only for use
within the block layer. This makes the API bigger and more complex
than necessary. Moreover, it's not obvious which interfaces are
meant for device models, and which really aren't.
* Since device models keep a reference to their backend, the backend
object can't just be destroyed. But for media change, we need to
replace the tree. Our solution is to make the BlockDriverState
generic, with actual driver state in a separate object, pointed to
by member opaque. That lets us replace the tree by deinitializing
and reinitializing its root. This special need of the root makes
the data structure awkward everywhere in the tree.
The general plan is to separate the APIs into "block backend", for use
by device models, monitor and whatever other code dealing with block
backends, and "block driver", for use by the block layer and whatever
other code (if any) dealing with trees and tree nodes.
Code dealing with block backends, device models in particular, should
become completely oblivious of BlockDriverState. This should let us
clean up both APIs, and the tree data structures.
This commit is a first step. It creates a minimal "block backend"
API: type BlockBackend and functions to create, destroy and find them.
BlockBackend objects are created and destroyed exactly when root
BlockDriverState objects are created and destroyed. "Root" in the
sense of "in bdrv_states". They're not yet used for anything; that'll
come shortly.
A root BlockDriverState is created with bdrv_new_root(), so where to
create a BlockBackend is obvious. Where these roots get destroyed
isn't always as obvious.
It is obvious in qemu-img.c, qemu-io.c and qemu-nbd.c, and in error
paths of blockdev_init(), blk_connect(). That leaves destruction of
objects successfully created by blockdev_init() and blk_connect().
blockdev_init() is used only by drive_new() and qmp_blockdev_add().
Objects created by the latter are currently indestructible (see commit
48f364d "blockdev: Refuse to drive_del something added with
blockdev-add" and commit 2d246f0 "blockdev: Introduce
DriveInfo.enable_auto_del"). Objects created by the former get
destroyed by drive_del().
Objects created by blk_connect() get destroyed by blk_disconnect().
BlockBackend is reference-counted. Its reference count never exceeds
one so far, but that's going to change.
In drive_del(), the BB's reference count is surely one now. The BDS's
reference count is greater than one when something else is holding a
reference, such as a block job. In this case, the BB is destroyed
right away, but the BDS lives on until all extra references get
dropped.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating an anonymous BDS can't fail. Make that obvious.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Export names may be used with nbd+unix, too, fix nbd_refresh_filename()
accordingly. Also, for nbd+tcp, the documented path schema is
"nbd://host[:port]/export", so use it. Furthermore, as can be seen from
that schema, the port is optional.
That makes six distinct cases for how the filename can be formatted; it is
not easy to generalize these cases without the resulting statement being
completely unreadable, thus there is simply one snprintf() per case.
Finally, taking the options from BDRVNBDState::socket_opts is wrong,
because those will not contain the export name. Just use
BlockDriverState::options instead.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
try_fiemap() uses FIEMAP_FLAG_SYNC which has a significant performance
impact.
Prefer seek_hole() over fiemap() to avoid this impact where possible.
seek_hole is more widely used and, arguably, has potential to be
optimised in the kernel.
Reported-By: Michael Steffens <michael_steffens@posteo.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Pádraig Brady <pbrady@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Using fiemap without FIEMAP_FLAG_SYNC is a known corrupter.
Add the FIEMAP_FLAG_SYNC flag to the FS_IOC_FIEMAP ioctl. This has
the downside of significantly reducing performance.
Reported-By: Michael Steffens <michael_steffens@posteo.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Pádraig Brady <pbrady@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When the Qcow2DiscardRegion is adjacent to another one referenced by "d",
free the Qcow2DiscardRegion metadata referenced by "p" after it has been
removed from the s->discards queue.
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Just like lazy-refcounts, this field will be present iff the qcow2
compat level is 1.1 (or probably any future revision).
As expected, this breaks some tests due to the new field present in
qemu-img info output; so fix their output accordingly.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1412105489-7681-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
$ ./qemu-img create -f qcow2 overlay \
-b 'json: { "file.driver":"ssh",
"file.host":"localhost",
"file.host_key_check":"no" }'
qemu-img: qobject/qdict.c:193: qdict_get_obj: Assertion `obj != ((void *)0)' failed.
Aborted
A similar crash also happens if the file.host field is omitted.
https://bugzilla.redhat.com/show_bug.cgi?id=1147343
Bug found and reported by Jun Li.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
This update brings dataplane to virtio-scsi (NOT
yet 100% thread-safe, though, which makes it really, really
experimental. It also brings asynchronous cancellation to
the SCSI subsystem and implements it in virtio-scsi. This
is a pretty important feature. Almost all the work here
was done by Fam Zheng.
I also included the virtio refcount fixes from Gonglei,
because they had a small conflict with virtio-scsi dataplane.
This pull request is using the new subkey 4E6B09D7.
# gpg: Signature made Tue 30 Sep 2014 12:31:02 BST using RSA key ID 4E6B09D7
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream: (39 commits)
block/iscsi: handle failure on malloc of the allocationmap
util: introduce bitmap_try_new
virtio-scsi: Handle TMF request cancellation asynchronously
scsi: Introduce scsi_req_cancel_async
scsi: Introduce scsi_req_cancel_complete
scsi: Drop SCSIReqOps.cancel_io
scsi: Unify request unref in scsi_req_cancel
scsi-generic: Handle canceled request in scsi_command_complete
scsi: Drop scsi_req_abort
virtio-scsi: Process ".iothread" property
virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling
virtio-scsi: Batched prepare for cmd reqs
virtio-scsi: Two stages processing of cmd request
virtio-scsi: Add migration state notifier for dataplane code
virtio-scsi: Hook up with dataplane
virtio-scsi-dataplane: Code to run virtio-scsi on iothread
virtio-scsi: Add VirtIOSCSIVring in VirtIOSCSIReq
virtio-scsi: Add 'iothread' property to virtio-scsi
virtio: add a wrapper for virtio-backend initialization
virtio-9p: fix virtio-9p child refcount in transports
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Check for the presence of posix_fallocate() in configure and only
compile in support for PREALLOC_MODE_FALLOC when it's there.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 26 Sep 2014 19:57:52 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream:
qemu-iotests: Fail test if explicit test case number is unknown
block: Validate node-name
vpc: fix beX_to_cpu() and cpu_to_beX() confusion
docs: add blkdebug block driver documentation
block: Catch simultaneous usage of options and their aliases
block: Specify -drive legacy option aliases in array
block: Improve message for device name clashing with node name
qemu-nbd: Destroy the BlockDriverState properly
block: Keep DriveInfo alive until BlockDriverState dies
blockdev: Disentangle BlockDriverState and DriveInfo creation
blkdebug: show an error for invalid event names
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
iscsi_aio_write16_cb, iscsi_aio_writev, iscsi_aio_read16_cb, and
iscsi_aio_readv have not been in use since commit
063c3378a9 ("block/iscsi: introduce
bdrv_co_{readv, writev, flush_to_disk}").
These were the only trace events in block/iscsi.c, so drop the
trace.h include.
Cc: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-4-git-send-email-stefanha@redhat.com
The beX_to_cpu() and cpu_to_beX() functions perform the same operation -
they do a byteswap if the host CPU endianness is little-endian or
nothing otherwise.
The point of two names for the same operation is that it documents which
direction the data is being converted. This makes it clear whether the
data is suitable for CPU processing or in its external representation.
This patch fixes incorrect beX_to_cpu()/cpu_to_beX() usage.
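A standalone illustration of why the direction matters even though the
byteswap itself is identical (htonl()/ntohl() stand in for
cpu_to_be32()/be32_to_cpu() here): the name documents whether a value is
in CPU order or in its big-endian external representation.

    #include <stdint.h>
    #include <arpa/inet.h>

    struct disk_header {
        uint32_t magic;     /* stored big-endian in the file */
    };

    static uint32_t read_magic(const struct disk_header *h)
    {
        return ntohl(h->magic);     /* like be32_to_cpu(): external -> CPU */
    }

    static void write_magic(struct disk_header *h, uint32_t magic)
    {
        h->magic = htonl(magic);    /* like cpu_to_be32(): CPU -> external */
    }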
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is easy to typo a blkdebug configuration and waste a lot of time
figuring out why no rules are matching.
Push the Error** down into add_rule() so we can report an error when the
event name is invalid.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In vhdx_create_metadata(), we allocate 40 bytes to entry_buffer for
the various metadata table entries. However, we write out 64kB from
that buffer into the new file. Only write out the correct 40 bytes.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch removes support for the cow file format.
Normally we do not break backwards compatibility but in this case there
is no impact and it is the most logical option. Extraordinary claims
require extraordinary evidence so I will show why removing the cow block
driver is the right thing to do.
The cow file format is the disk image format for Usermode Linux, a way
of running a Linux system in userspace. The performance of UML was
never great and it was hacky, but it enjoyed some popularity before
hardware virtualization support became mainstream.
QEMU's block/cow.c is supposed to read this image file format.
Unfortunately the file format was underspecified:
1. Earlier Linux versions used the MAXPATHLEN constant for the backing
filename field. The value of MAXPATHLEN can change, so Linux
switched to a 4096 literal but QEMU has a 1024 literal.
2. Padding was not used on the header struct (both in the Linux kernel
and in QEMU) so the struct layout varied across architectures. In
particular, i386 and x86_64 were different due to int64_t alignment
differences. Linux now uses __attribute__((packed)), QEMU does not.
Therefore:
1. QEMU cow images do not conform to the Linux cow image file format.
2. cow images cannot be shared between different host architectures.
This means QEMU cow images are useless and QEMU has not had bug reports
from users actually hitting these issues.
Let's get rid of this thing, it serves no purpose and no one will be
affected.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1410877464-20481-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Being able to set the overlap-check option to a string and then refine
it via the overlap-check.* options is a nice idea for the command line
but does not work so well for non-flattened dicts. In that case, one can
only specify either but not both, so add a field to overlap-check.*
which does the same as directly specifying overlap-check but can be used
in conjunction with the other fields in non-flattened dicts.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1408557576-14574-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently, the QemuOpts object opts is leaked if anything fails from its
creation up to and including the image repair block. Fix this by freeing
that object in the fail path.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1408557576-14574-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something other than qemu, so check all offsets
taken from these tables for correct cluster alignment.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use the new function in case of a failed overlap check.
This changes output in case of corruption, so adapt iotest 060's
reference output accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a helper function for easily marking an image corrupt (on fatal
corruptions) while outputting an informative message to stderr and via
QAPI.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image that is only read from may of course be corrupted, and this can
be detected during normal operation. In this case, a non-fatal event
should be emitted, but the image should not be marked corrupt (in
accordance with "fatal" being set to false).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an analogue to Linux null_blk. It can be used for testing or
benchmarking block device emulation and general block layer
functionalities such as coroutines and throttling, where disk IO is not
necessary or wanted.
Use null-aio:// for the AIO version and null-co:// for the coroutine
version.
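For example, a quick way to exercise the driver with qemu-io (an
illustrative invocation, not taken from this patch):
$ qemu-io -c 'read 0 4k' null-co://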
[Resolved conflict with Fam's async bdrv_aio_cancel() series:
1. Drop .bdrv_aio_cancel() since it is now done by block.c
2. Rename qemu_aio_release() to qemu_aio_unref()
--Stefan]
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1410415798-20673-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Also drop the now unused SheepdogAIOCB.finished field. Note that this
AIOCB is internal to the sheepdog driver and has NULL cb and opaque
fields, so they are never used.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before, we cancelled all the child requests with bdrv_aio_cancel and
then freed the acb.
Now we just kick off asynchronous cancellation of the child requests and
return. We know quorum_aio_cb will be called later, so in the end
quorum_aio_finalize will take care of calling the caller's cb.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For a FIFO read pattern, we only have one running AIO request (other
cases with fewer requests than num_children are possible in the future),
so we need to check whether .acb is NULL before calling
bdrv_aio_cancel() to avoid a segfault.
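A sketch of the resulting guard (the child request array name qcrs is
assumed here for illustration only):
    for (i = 0; i < s->num_children; i++) {
        /* with a FIFO read pattern only one child request is in flight */
        if (acb->qcrs[i].acb) {
            bdrv_aio_cancel(acb->qcrs[i].acb);
        }
    }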
Cc: Eric Blake <eblake@redhat.com>
Cc: Benoit Canet <benoit@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The cancelled flag is no longer useful. Later the request will complete
as before, and cb will be called.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Just call io_cancel(2); if it fails, it means the request was not
cancelled, so the event loop will eventually call
qemu_laio_process_completion.
In qemu_laio_process_completion, change it to call the cb
unconditionally. This is required by bdrv_aio_cancel_async.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The parent_vhdx_guid variable is defined but never used, which provokes
complaints from newer versions of clang. Since the variable definition
is here acting as documentation of the image format, mark it with the
'unused' attribute to keep the compiler happy rather than simply
deleting it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When trying to create a fixed vhd image qemu-img will return the
following error:
qemu-img: test.vhdx: Could not create image: Cannot allocate memory
This happens because of an incorrect check in vhdx.c. Specifically,
in vhdx_create_bat(), after allocating memory for the BAT entry,
there is a check to determine whether the allocation was unsuccessful.
The problem is that it checks whether s->bat is not NULL, which is
true in case of a successful allocation, and then exits with the
error ENOMEM.
Signed-off-by: Adelina Tuvenie <atuvenie@cloudbasesolutions.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
preallocation=falloc allocates disk space by posix_fallocate(),
preallocation=full allocates disk space by writing zeros to disk.
Both modes imply preallocation=metadata.
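For example (an illustrative invocation; image name and size are
arbitrary):
$ qemu-img create -f qcow2 -o preallocation=falloc test.qcow2 4G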
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a new option preallocation for raw format, and implements
falloc and full preallocation.
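For example (an illustrative invocation):
$ qemu-img create -f raw -o preallocation=full test.img 4G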
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch prepares for the subsequent patches.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
and avoid converting it back later.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently the file size requested by the user is rounded down to the
nearest sector, so the actual file size can be a bit smaller than the
size the user requested. Since some formats (like qcow2) record the
virtual disk size in bytes, this can make the last few bytes
inaccessible.
This patch fixes it by rounding the file size up to the nearest sector
so that the actual file size is no less than the requested size.
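A minimal sketch of the change, assuming the usual QEMU rounding helper
(illustrative only):
    /* round up instead of down so no requested bytes are lost */
    total_size = ROUND_UP(total_size, BDRV_SECTOR_SIZE);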
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models' structures.
Note that bdrv_get_stats was introduced so device models can retrieve
the BlockAcctStats structure of a BlockDriverState without being aware
of its layout.
This function should go away once BlockAcctStats is embedded in the
device models' structures.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The medium-term goal is to move the BlockAcctStats structure into the
device models. (Capturing I/O accounting statistics in the device models
is good for billing.)
This patch makes a small step in this direction by removing a reference
to BDRV.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The plan is to add new accounting metrics (latency, invalid requests,
failed requests, queue depth), and block.c is overpopulated, so it will
be better to work in a separate module.
Moreover, the long-term plan is to have statistics in each BDS of the
graph for metrology purposes; this means that the device model
statistics must move from the topmost BDS to the device model.
So we need to decouple the statistics code from BlockDriverState.
This is another argument for extracting the code into a separate module.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Benoit Canet <benoit@irqsave.net>
CC: Fam Zheng <famz@redhat.com>
CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extract the block accounting statistics into a structure so the block device
models can hold them in the future.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cpu_to_be32() is wrong since vhd_type is an enum constant
(just a regular CPU-endian integer).
Signed-off-by: Xiaodong Gong <gordongong0350@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vmdk_open_sparse() does not take ownership of buf so the caller always
needs to free it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Replace __sync builtins with ones provided by QEMU
for atomic operations.
Special thanks goes to Paolo Bonzini for his refactoring
suggestion in order to use the already existing atomic builtins
interface.
Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In commit 63f0f45f2e the following
mechanical change was made:
if (!state) {
- qemu_aio_wait();
+ aio_poll(state->s->aio_context, true);
}
The new code now checks if state is NULL and then dereferences it
('state->s') which is obviously incorrect.
This commit replaces state->s->aio_context with
bdrv_get_aio_context(bs), fixing this problem. The two other hunks
are concerned with getting the BlockDriverState pointer bs to where it
is needed.
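The corrected call then looks roughly like this (a sketch based on the
description above):
    if (!state) {
        /* state may be NULL here, so do not dereference it */
        aio_poll(bdrv_get_aio_context(bs), true);
    }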
The original bug causes a segfault when using libguestfs to access a
VMware vCenter Server and doing any kind of complex read-heavy
operations. With this commit the segfault goes away.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In order to access VMware ESX efficiently, we need to send a session
cookie. This patch is very simple and just allows you to send that
session cookie. It punts on the question of how you get the session
cookie in the first place, but in practice you can just run a `curl'
command against the server and extract the cookie that way.
To use it, add file.cookie to the curl URL. For example:
$ qemu-img info 'json: {
"file.driver":"https",
"file.url":"https://vcenter/folder/Windows%202003/Windows%202003-flat.vmdk?dcPath=Datacenter&dsName=datastore1",
"file.sslverify":"off",
"file.cookie":"vmware_soap_session=\"52a01262-bf93-ccce-d379-8dabb3e55560\""}'
image: [...]
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: unavailable
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If two Linux AIO request completions are fetched in the same
io_getevents() call, QEMU will deadlock if request A's callback waits
for request B to complete using an aio_poll() loop. This was reported
to happen with the mirror blockjob.
This patch moves completion processing into a BH and makes it resumable.
Nested event loops can resume completion processing so that request B
will complete and the deadlock will not occur.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Marcin Gibuła <m.gibula@beyond.pl>
Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
We should reinitialize local_err as NULL inside the while loop, or
g_free() will report corruption and abort QEMU when the sheepdog driver
tries reconnecting.
This was broken in commit 356b4ca.
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: (null)
[xcb] Unknown sequence number while awaiting reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
qemu-system-x86_64: ../../src/xcb_io.c:298: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)
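A sketch of the corrected pattern (do_reconnect() is a hypothetical
stand-in for the sheepdog reconnect code):
    for (;;) {
        Error *local_err = NULL;   /* reinitialized on every iteration */
        do_reconnect(s, &local_err);
        if (!local_err) {
            break;
        }
        /* safe now that local_err is always a fresh pointer or NULL */
        error_free(local_err);
    }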
Cc: qemu-devel@nongnu.org
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Uses the same select/WSAEventSelect scheme as main-loop.c.
WSAEventSelect() is edge-triggered, so it cannot be used
directly, but it is still used as a way to exit from a
blocking g_poll().
Before g_poll() is called, we poll sockets with a non-blocking
select() to achieve the level-triggered semantics we require:
if a socket is ready, the g_poll() is made non-blocking too.
Based on a patch from Or Goshen.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch adds a single read pattern to the quorum driver; quorum vote
remains the default pattern.
Currently we do a quorum vote on all reads. This is designed for
unreliable underlying storage, such as non-redundant NFS, to ensure data
integrity at the cost of read performance.
Consider the following use case:
VM
--------------
| |
v v
A B
Both A and B have hardware RAID storage that guarantees data integrity
on its own. So it would help performance to do a single read instead of
reading from all the nodes. Further, if we run the VM on either of the
storage nodes, we can make a local read request for better performance.
This patch generalizes the above two-node case to N nodes: the VM writes
to all N nodes but reads from just one of them. If the single read
fails, we try to read from the next node in the FIFO order specified by
the startup command.
The two-node case is very similar to DRBD [1], though it currently lacks
auto-sync functionality after a single device/node failure. But compared
with DRBD we still have some advantages:
- Suppose we have 20 VMs running on one (say A) of two nodes with
DRBD-backed storage. If A crashes, we need to restart all the VMs on
node B. In practice we may not be able to, because B might not have
enough resources to set up 20 VMs at once. If instead we run our 20 VMs
with the quorum driver and scatter the replicated images over the data
center, we can very likely restart the 20 VMs without any resource
problem.
Overall, I think we can build more powerful replicated image
functionality on top of quorum and block jobs (block mirror) to meet
various high-availability needs.
E.g., to enable the single read pattern on 2 children:
-drive driver=quorum,children.0.file.filename=0.qcow2,\
children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1
[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device
[Dropped \n from an error_setg() error message
--Stefan]
Cc: Benoit Canet <benoit@irqsave.net>
Cc: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Recently, sheepdog revived its VDI locking functionality. This patch
updates QEMU's sheepdog driver for this feature. It changes the error
code for the case of failed locking; -EBUSY is a suitable one.
Reported-by: Valerio Pachera <sirio81@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The update is required to support iSCSI multipath. It doesn't affect
the behavior of the QEMU driver, but adding a new field to the VDI
request struct is required.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The curl driver's hardcoded timeout (5 seconds) is sometimes not long
enough, depending on the remote server configuration and network
traffic. The user should be able to set how long they are willing to
wait for the connection.
Adding a new option to set this timeout gives the user this flexibility.
The previous default timeout of 5 seconds is used if this option is not
present.
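For example (the option name is assumed here and the values are
illustrative):
-drive file.driver=https,file.url=https://example/disk.img,file.timeout=30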
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Benoit Canet <benoit.canet@nodalink.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The gcc 4.1.2 compiler warns that delay_ns may be uninitialized in
mirror_iteration().
There are two break statements in the do ... while loop that skip over
the delay_ns assignment. These are probably the cause of the warning.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Instead of bdrv_getlength().
Commit 57322b7 did this all over block, but one more bdrv_getlength()
has crept in since.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Otherwise error_callback_bh will access the already released acb.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The following O_DIRECT read from a <512 byte file fails:
$ truncate -s 320 test.img
$ qemu-io -n -c 'read -P 0 0 512' test.img
qemu-io: can't open device test.img: Could not read image for determining its format: Invalid argument
Note that qemu-io completes successfully without the -n (O_DIRECT)
option.
This patch fixes qemu-iotests ./check -nocache -vmdk 059.
Cc: qemu-stable@nongnu.org
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bs->total_sectors is not yet updated at this point, resulting in memory
corruption if the volume has grown and data is written to the newly
available areas.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Just log to stderr unconditionally, like other similar code does.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Because blkdebug cannot simply create a configuration file, refuse to
reconstruct a plain filename and only generate an options QDict from the
rules instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.
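For example (option names as used by this series; the sizes are
illustrative):
-drive file=test.qcow2,l2-cache-size=4M,refcount-cache-size=1M
-drive file=test.qcow2,cache-size=8M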
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().
While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.
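A minimal sketch of the allocation, with structure names assumed from
the qcow2 cache code (illustrative):
    c = g_new0(Qcow2Cache, 1);
    /* the table array can be huge, so allow this allocation to fail */
    c->entries = g_try_new0(Qcow2CachedTable, num_tables);
    if (!c->entries) {
        goto fail;
    }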
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Specifying the metadata cache sizes in clusters results in fewer
clusters (and far fewer bytes) covered for small cluster sizes and vice
versa. Using a constant byte size reduces this difference and makes it
possible to manually specify the cache size in an easily comprehensible
unit.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
They clutter the code. Unfortunately, I can't figure out how to make
Coccinelle drop all of them, so I have to settle for common special
cases:
@@
type T;
T *pt;
void *pv;
@@
- pt = (T *)pv;
+ pt = pv;
@@
type T;
@@
- (T *)
(\(g_malloc\|g_malloc0\|g_realloc\|g_new\|g_new0\|g_renew\|
g_try_malloc\|g_try_malloc0\|g_try_realloc\|
g_try_new\|g_try_new0\|g_try_renew\)(...))
Topped off with minor manual style cleanups.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
One, it catches multiplication overflowing size_t. Two, it returns
T * rather than void *, which lets the compiler catch more type
errors.
Perhaps a conversion to g_malloc_n() would be neater in places, but
that's merely four years old, and we can't use such newfangled stuff.
This commit only touches allocations with size arguments of the form
sizeof(T), plus two that use 4 instead of sizeof(uint32_t). We can
make the others safe by converting to g_malloc_n() when it becomes
available to us in a couple of years.
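For illustration, with a concrete element type:
    uint32_t *table;
    /* overflow in the product is silent, and the result is void * */
    table = g_malloc(sizeof(uint32_t) * n);
    /* checks the multiplication for overflow and returns uint32_t * */
    table = g_new(uint32_t, n);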
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit de82815db1 ("qcow2: Handle failure
for potentially large allocations") introduced a double-free of
new_blocks in the alloc_refcount_block() error path.
The qemu-iotests qcow2 026 test case was failing because qemu-io
segfaulted.
Make sure new_blocks is NULL after we free it the first time.
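The fix boils down to the classic pattern (a sketch):
    g_free(new_blocks);
    new_blocks = NULL;    /* so the error path's g_free() becomes a no-op */
    ...
fail:
    g_free(new_blocks);   /* harmless when new_blocks is NULL */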
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In recent updates of Parallels Server 5/6, Parallels has released a new
addition to its image format. Images with the signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiples
of 512 bytes) but as offsets in blocks (i.e. header->tracks * 512).
In this case all 64 bits of header->nb_sectors are used for the image
size.
This patch implements support for this in qemu-img and also adds a
specific check for incorrect images. Images with a block size greater
than INT_MAX/513 are not supported. The biggest Parallels image cluster
size seen in the field is 1 MB, so this limit will not hurt anyone.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
and rework the error path a bit. There is no difference at the moment,
but the code will definitely be shorter when additional processing is
required for WithouFreSpacExt.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bits wide. The upper 32 bits are not used
for images with the signature "WithoutFreeSpace" and must be explicitly
zeroed according to Parallels. They will be used for images with the
signature "WithouFreSpacExt".
- inuse is a magic value which means that the image is currently opened
for read/write or was not closed correctly; the magic is 0x746f6e59.
- data_off is the location of the first data block. It can be zero, in
which case data starts just beyond the header, aligned to 512 bytes.
This field does not matter for a read-only driver, though.
This patch adds these values to struct parallels_header and adds proper
handling of nb_sectors for the currently supported WithoutFreeSpace
images.
WithouFreSpacExt will be covered in subsequent patches.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qcow2_alloc_cluster_offset() uses host_offset == 0 as "no preferred
offset" for the (data) cluster range to be allocated. However, this
offset is actually valid and may be allocated on images with a corrupted
refcount table or first refcount block.
In this case, the corruption prevention should normally catch that
write anyway (because it would overwrite the image header). But since 0
is a special value here, the function assumes that nothing has been
allocated at all which it asserts against.
Because this condition is not qemu's fault but rather that of a broken
image, it shouldn't trigger an assertion failure but rather mark the
image corrupt and show an appropriate message, which this patch does by
calling the corruption check earlier than it would normally be called
(before the assertion).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bdrv_pread() returns an error, it is very unlikely that it was
ENOMEM. In this case, the return value should be passed along; as
bdrv_pread() will always either return the number of bytes read or a
negative value (the error code), the condition for checking whether
bdrv_pread() failed can be simplified (and clarified) as well.
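A sketch of the simplified check (the buffer and size variables are
illustrative):
    ret = bdrv_pread(bs->file, offset, buf, bytes);
    if (ret < 0) {
        /* pass the real error along instead of assuming -ENOMEM */
        return ret;
    }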
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the mirror block job.
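The common shape of these conversions, assuming the
qemu_try_blockalign() helper from this series (a sketch, not the exact
hunk):
    buf = qemu_try_blockalign(bs, len);
    if (buf == NULL) {
        ret = -ENOMEM;
        goto fail;
    }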
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vpc block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vmdk block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vhdx block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vdi block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the rbd block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the raw-win32 block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the raw-posix block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qed block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qcow2 block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the qcow1 block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the parallels block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the nfs block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the iscsi block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the dmg block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the curl block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the cloop block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the bochs block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Use the block layer to create, and write to, the image file in the VPC
.bdrv_create() operation.
This has a couple of benefits: images can now be created over protocols,
hacks such as NOCOW are not needed in the image format driver, and the
underlying file protocol appropriate for the host OS can be relied
upon.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Most QEMU code uses 'ret' for function return values. The VDI driver
uses a mix of 'result' and 'ret'. This cleans that up, switching over
to the standard 'ret' usage.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use the block layer to create, and write to, the image file in the
VDI .bdrv_create() operation.
This has a couple of benefits: images can now be created over protocols,
hacks such as NOCOW are not needed in the image format driver, and the
underlying file protocol appropriate for the host OS can be relied
upon.
Also some minor cleanup for error handling.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch contains several changes for endian conversion fixes for
VHDX, particularly for big-endian machines (multibyte values in VHDX are
all on disk in LE format).
Tests were done with existing qemu-iotests on an IBM POWER7 (8406-71Y).
This includes sample images created by Hyper-V, both with dirty logs and
without.
In addition, VHDX image files created (and written to) on a BE machine
were tested on a LE machine, and vice-versa.
Reported-by: Markus Armbruster <armbru@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds an error check for an invalid descriptor entry signature when
flushing the log descriptor entries.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A VM image on an Archipelago volume can also be specified like this:
file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
segment=<segment_name>]]
Examples:
file=archipelago:my_vm_volume
file=archipelago:my_vm_volume/mport=123
file=archipelago:my_vm_volume/mport=123:vport=1234
file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment
Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A VM image on an Archipelago volume is specified like this:
file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
file.vport=<vlmcd_port>][,file.segment=<segment_name>]]
'archipelago' is the protocol.
'mport' is the port number on which mapperd is listening. This is
optional, and if not specified, QEMU will make Archipelago use the
default port.
'vport' is the port number on which vlmcd is listening. This is
optional, and if not specified, QEMU will make Archipelago use the
default port.
'segment' is the name of the shared memory segment the Archipelago stack
is using. This is optional, and if not specified, QEMU will make
Archipelago use the default value, 'archipelago'.
Examples:
file.driver=archipelago,file.volume=my_vm_volume
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234,file.segment=my_segment
Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add nocow info to the 'qemu-img info' output to show whether the file
currently has the NOCOW flag set or not.
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This drops the unnecessary bdrv_truncate() from, and also improves, the
cluster allocation code path.
Before, when we need a new cluster, get_cluster_offset truncates the
image to bdrv_getlength() + cluster_size, and returns the offset of the
added area, i.e. the image length before truncating.
This is not efficient, so it's now rewritten as:
- Save the extent file length when opening.
- When allocating a cluster, use the saved length as the cluster offset.
- Don't truncate the image, because we'll write data there anyway: just
write data at the EOF position, in descending priority:
  * New user data (cluster allocation happens in a write request).
  * Filling data at the beginning and/or end of the new cluster, if not
covered by user data: either backing file content (COW) or zeros for
standalone images.
One major benefit of this change is that on host-mounted NFS images,
even over a fast network, ftruncate is slow (see the example below).
This change significantly speeds up cluster allocation. Comparing by
converting a cirros image (296M) to VMDK on an NFS mount point, over a
1GbE LAN:
$ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk
Before:
real 0m21.796s
user 0m0.130s
sys 0m0.483s
After:
real 0m2.017s
user 0m0.047s
sys 0m0.190s
We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and
get a little more documentation in function comments.
Tested that this passes qemu-iotests for all VMDK subformats.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_get_geometry() hides errors. Use bdrv_nb_sectors() or
bdrv_getlength() instead where that's obviously inappropriate.
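The replacement pattern, roughly (a sketch):
    int64_t nb_sectors = bdrv_nb_sectors(bs);
    if (nb_sectors < 0) {
        return nb_sectors;   /* the error is no longer silently hidden */
    }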
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It returns a multiple of the sector size.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of bdrv_getlength().
Aside: a few of these callers don't handle errors. I didn't
investigate whether they should.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If qemu couldn't find out what O_DIRECT alignment to use with a given
file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c
and confuse users. This adds a more descriptive error message for such
cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>