mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Daniel P. Berrange	10817bf09d	coroutine: move into libqemuutil.a library The coroutine files are currently referenced by the block-obj-y variable. The coroutine functionality though is already used by more than just the block code. eg migration code uses coroutine yield. In the future the I/O channel code will also use the coroutine yield functionality. Since the coroutine code is nicely self-contained it can be easily built as part of the libqemuutil.a library, making it widely available. The headers are also moved into include/qemu, instead of the include/block directory, since they are now part of the util codebase, and the impl was never in the block/ directory either. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2015-10-20 14:59:04 +01:00
Fam Zheng	6b826af7b0	blkdebug: Don't confuse image as backing file The word "backing file" nowadays refers to the backing_hd in the external snapshot sense (i.e. bs->backing_hd), instead of the file sense (bs->file). Correct the comment to use the right term. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-16 15:35:48 +02:00
Kevin Wolf	e394621fbd	qcow2: Remove forward declaration of QCowAIOCB This struct doesn't exist any more since commit `3fc48d09` in August 2011, it's about time to remove its forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-10-16 15:34:30 +02:00
Stefan Hajnoczi	1501ecc1d8	raw-posix: warn about BDRV_O_NATIVE_AIO if libaio is unavailable raw-posix.c silently ignores BDRV_O_NATIVE_AIO if libaio is unavailable. It is confusing when aio=native performance is identical to aio=threads because the binary was accidentally built without libaio. Print a deprecation warning if -drive aio=native is used with a binary that does not support libaio. There are probably users using aio=native who would be inconvenienced if QEMU suddenly refused to start their guests. In the future this will become an error. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	7e39d3a2dd	blkverify: Fix BDS leak in .bdrv_open error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	8e419aefa0	block: Remove bdrv_swap() bdrv_swap() is unused now. Remove it and all functions that have no other users than bdrv_swap(). In particular, this removes the .bdrv_rebind callbacks from block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	3f09bfbc7b	block: Add and use bdrv_replace_in_backing_chain() This cleans up the mess we left behind in the mirror code after the previous patch. Instead of using bdrv_swap(), just change pointers. The interface change of the mirror job that callers must consider is that after job completion, their local BDS pointers still point to the same node now. qemu-img must change its code accordingly (which makes it easier to understand); the other callers stays unchanged because after completion they don't do anything with the BDS, but just with the job, and the job is still owned by the source BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	8ccb9569a9	blockjob: Store device name at job creation Some block jobs change the block device graph on completion. This means that the device that owns the job and originally was addressed with its device name may no longer be what the corresponding BlockBackend points to. Previously, the effects of bdrv_swap() ensured that the job was (at least partially) transferred to the target image. Events that contain the device name could still use bdrv_get_device_name(job->bs) and get the same result. After removing bdrv_swap(), this won't work any more. Instead, save the device name at job creation and use that copy for QMP events and anything else identifying the job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:30 +02:00
Kevin Wolf	a2d6190048	block-backend: Add blk_set_bs() It allows changing the BlockDriverState that a BlockBackend points to. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	439db28cf9	block/io: Make bdrv_requests_pending() public Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	5db15a5769	block: Manage backing file references in bdrv_set_backing_hd() This simplifies the code somewhat, especially when dropping whole backing file subchains. The exception is the mirroring code that does adventurous things with bdrv_swap() and in order to keep it working, I had to duplicate most of bdrv_set_backing_hd() locally. We'll get rid again of this ugliness shortly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	760e006384	block: Convert bs->backing_hd to BdrvChild This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	0bd6e91a7e	quorum: Convert to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	3e586be0b2	blkverify: Convert s->test_file to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	24bc15d1f6	vmdk: Use BdrvChild instead of BDS for references to extents Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Paolo Bonzini	c84b31926f	block: switch from g_slice allocator to malloc Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-12 11:17:45 +01:00
Paolo Bonzini	eab2ac9d3c	block/ssh: remove dead code The "err" label cannot be reached with qp != NULL. Remove the free-ing of qp and avoid future regressions by removing the initializer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> ACKed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-10-08 19:46:01 +03:00
Richard W.M. Jones	73ba05d936	block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue. https://bugzilla.redhat.com/show_bug.cgi?id=1265196 The following command fails on an NFS mountpoint: $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144 Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off cluster_size=65536 preallocation='falloc' lazy_refcounts=off qemu-img: disk.img: Could not preallocate data for the new file: Bad file descriptor The reason turns out to be because NFS doesn't support the posix_fallocate call. glibc emulates it instead. However glibc's emulation involves using the pread(2) syscall. The pread syscall fails with EBADF if the file descriptor is opened without the read open-flag (ie. open (..., O_WRONLY)). I contacted glibc upstream about this, and their response is here: https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9 There are two possible fixes: Use Linux fallocate directly, or (this fix) work around the problem in qemu by opening the file with O_RDWR instead of O_WRONLY. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196 Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-10-02 13:48:29 +02:00
Kevin Wolf	5d555030ba	raw-win32: Fix write request error handling aio_worker() wrote the return code to the wrong variable. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Guangmu Zhu <guangmuzhu@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-10-02 13:48:29 +02:00
Jeff Cody	5279efebcf	block: mirror - fix full sync mode when target does not support zero init During mirror, if the target device does not support zero init, a mirror may result in a corrupted image for sync="full" mode. This is due to how the initial dirty bitmap is set up prior to copying data - we did not mark sectors as dirty that are unallocated. This means those unallocated sectors are skipped over on the target, and for a device without zero init, invalid data may reside in those holes. If both of the following conditions are true, then we will explicitly mark all sectors as dirty: 1.) sync = "full" 2.) bdrv_has_zero_init(target) == false If the target does support zero init, but a target image is passed in with data already present (i.e. an "existing" image), it is assumed the data present in the existing image is valid data for those sectors. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 91ed4bc5bda7e2b09eb508b07c83f4071fe0b3c9.1443705220.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-10-01 15:02:21 -04:00
Peter Maydell	9e071429e6	* First batch of MAINTAINERS updates * IOAPIC fixes (to pass kvm-unit-tests with -machine kernel_irqchip=off) * NBD API upgrades from Daniel * strtosz fixes from Marc-André * improved support for readonly=on on scsi-generic devices * new "info ioapic" and "info lapic" monitor commands * Peter Crosthwaite's ELF_MACHINE cleanups * docs patches from Thomas and Daniel -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJWBSAEAAoJEL/70l94x66DeL4H/21YR4GWCqo30f+W5kx24ZNo by8H2kdZmWKRr/La1JlAReki9GCP1U8Q0cYC8V885gHLKcahWS/75UKwNbw0OSyg 2jj4uREc645TTFAvV5kQ+uAw9F/dchvkXylrVgOoUPipfmYibXY8JLu9AcVnZi6H X5Rvpqo4Uhp2cbRG7rYWrwgpNL+VZmKc8LDdqdlXrkjjanhuAYO2E9NBKaE+xJQQ FHcpkV92iSZFEZ0CB535BTIdNdDM/ae6bw1As27EF10YBTfneCQNazSeh13pLO2n lHit2GZr2VeTSBrPkPsItToY/Gw38duVZK4QM5/wSkHBzyeUJY0ltQrf53veYfk= =uc+I -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * First batch of MAINTAINERS updates * IOAPIC fixes (to pass kvm-unit-tests with -machine kernel_irqchip=off) * NBD API upgrades from Daniel * strtosz fixes from Marc-André * improved support for readonly=on on scsi-generic devices * new "info ioapic" and "info lapic" monitor commands * Peter Crosthwaite's ELF_MACHINE cleanups * docs patches from Thomas and Daniel # gpg: Signature made Fri 25 Sep 2015 11:20:52 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (52 commits) doc: Refresh URLs in the qemu-tech documentation docs: describe the QEMU build system structure / design typedef: add typedef for QemuOpts i386: interrupt poll processing i386: partial revert of interrupt poll fix ppc: Rename ELF_MACHINE to be PPC specific i386: Rename ELF_MACHINE to be x86 specific alpha: Remove ELF_MACHINE from cpu.h mips: Remove ELF_MACHINE from cpu.h sparc: Remove ELF_MACHINE from cpu.h s390: Remove ELF_MACHINE from cpu.h sh4: Remove ELF_MACHINE from cpu.h xtensa: Remove ELF_MACHINE from cpu.h tricore: Remove ELF_MACHINE from cpu.h or32: Remove ELF_MACHINE from cpu.h lm32: Remove ELF_MACHINE from cpu.h unicore: Remove ELF_MACHINE from cpu.h moxie: Remove ELF_MACHINE from cpu.h cris: Remove ELF_MACHINE from cpu.h m68k: Remove ELF_MACHINE from cpu.h ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-09-25 21:52:30 +01:00
Hitoshi Mitake	e6fd57ea29	sheepdog: refine discard support This patch refines discard support of the sheepdog driver. The existing discard mechanism was implemented on SD_OP_DISCARD_OBJ, which was introduced before fine grained reference counting on newer sheepdog. It doesn't care about relations of snapshots and clones and discards objects unconditionally. With this patch, the driver just updates an inode object for updating reference. Removing the object is done in sheep process side. Cc: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Cc: Vasiliy Tolstov <v.tolstov@selfip.ru> Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Tested-by: Vasiliy Tolstov <v.tolstov@selfip.ru> Message-id: 1441076590-8015-3-git-send-email-mitake.hitoshi@lab.ntt.co.jp Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 10:25:19 -04:00
Hitoshi Mitake	498f21405a	sheepdog: use per AIOCB dirty indexes for non overlapping requests In the commit 96b14ff85acf, requests for overlapping areas are serialized. However, it cannot handle a case of non overlapping requests. In such a case, min_dirty_data_idx and max_dirty_data_idx can be overwritten by the requests and invalid inode update can happen e.g. a case like create(1, 2) and create(3, 4) are issued in parallel. This patch lets SheepdogAIOCB have dirty data indexes instead of BDRVSheepdogState for avoiding the above situation. This patch also does trivial renaming for better description: overwrapping -> overlapping Cc: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Cc: Vasiliy Tolstov <v.tolstov@selfip.ru> Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Tested-by: Vasiliy Tolstov <v.tolstov@selfip.ru> Message-id: 1441076590-8015-2-git-send-email-mitake.hitoshi@lab.ntt.co.jp Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 10:25:19 -04:00
Wen Congyang	06c3916b35	Backup: don't do copy-on-read in before_write_notifier We will copy data in before_write_notifier to do backup. It is a nested I/O request, so we cannot do copy-on-read. The steps to reproduce it: 1. -drive copy-on-read=on,... // qemu option 2. drive_backup -f disk0 /path_to_backup.img // monitor command Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Tested-by: Jeff Cody <jcody@redhat.com> Message-id: 1441682913-14320-3-git-send-email-wency@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Wen Congyang	9568b511c9	block: Introduce a new API bdrv_co_no_copy_on_readv() In some cases, we need to disable copy-on-read, and just read the data. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Liu Yuan	4da65c8092	sheepdog: add reopen support With reopen supported, block-commit (and offline commit) is now supported for image files whose base image uses the Sheepdog protocol driver. Cc: qemu-devel@nongnu.org Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <liuyuan@cmss.chinamobile.com> Message-id: 1440730438-24676-1-git-send-email-namei.unix@gmail.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Peter Lieven	18a8056e0b	block/nfs: cache allocated filesize for read-only files If the file is readonly its not expected to grow so save the blocking call to nfs_fstat_async and use the value saved at connection time. Also important the monitor (and thus the main loop) will not hang if block device info is queried and the NFS share is unresponsive. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1440671441-7978-1-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Peter Lieven	055c6f912c	block/nfs: fix calculation of allocated file size st.st_blocks is always counted in 512 byte units. Do not use st.st_blksize as multiplicator which may be larger. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1440067607-14547-1-git-send-email-pl@kamp.de Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-09-25 08:37:07 -04:00
Daniel P. Berrange	7a5ed43764	nbd: convert to use the QAPI SocketAddress object The nbd block driver currently uses a QemuOpts object when setting up sockets. Switch it over to use the QAPI SocketAddress object instead. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442411543-28513-2-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-25 12:04:40 +02:00
Peter Maydell	007e620a75	Block layer patches (v2) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJV9uA2AAoJEH8JsnLIjy/WPa0P/0tjyUtcp+rTd2yAzC+BQlOA dxjq3c3P2HbJKnKa74PwgIBqt7w20TRa8OtMXuJ9XB75iuVRs51dXUDYHUCNvYbW dse33PRAUoSYfiaJ3UrsstM5PJH9sDvPHCBekP9CrUa+S9AdcX/7GUiXaiIwB+sj X2aur6muwFMK6hIHnUTYypx11/pYYvxVOm5xDMHQWtzbtXHeyVyxJvZkLZzT/DJ2 1sP3P65Ku0gZQA3rMOnKV6iYhAxrApgAJzhDzPdjKiD7nfxiatIauTvxXhMM2h6Y bHHAXAHbf8/kBPbklltwuihXX6/OdMM02S7dU42cPp5TFSPYDLLfRoF34pVy8Ycj 9BK8H9NNUg/TbHxWv8JLKcuTvk0wv7TDa+zah/Rt7o6jTSn50sxOWnMbj1KbP+IK 9zkg0hwvUhqDCbkqd1iFYe/5DfVA7eUu5MwhE0Dkncqflmmytw5BZAYFWuPOP4u8 rH66kg8JFIhLp+H0R3lqSBTezLh8GwMQRTNfrbemiDkA8Pd3GXhNHg3tGPTXK+FS 4YwUTL2AaJgDRXzz3CpaYh2Pku5t7LsXKRCG3BR7corkhmTBNiHn6V07D6d1kxHa cnzsG2gvJqDzELzG3tfsTGkfCtNJrqD0Uj+bB+f7V3K7TiN4RcC2b0Nejn54Jp94 YZMLP101bpYIPTkVDnRe =R3AS -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches (v2) # gpg: Signature made Mon 14 Sep 2015 15:56:54 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (23 commits) qcow2: Make qcow2_alloc_bytes() more explicit vmdk: Fix next_cluster_sector for compressed write iotests: Add test for checking large image files qcow2: Make size_to_clusters() return uint64_t qemu-iotests: More qcow2 reopen tests qemu-iotests: Reopen qcow2 with lazy-refcounts change qcow2: Support updating driver-specific options in reopen qcow2: Make qcow2_update_options() suitable for transactions qcow2: Fix memory leak in qcow2_update_options() error path qcow2: Leave s unchanged on qcow2_update_options() failure qcow2: Move rest of option handling to qcow2_update_options() qcow2: Move qcow2_update_options() call up qcow2: Factor out qcow2_update_options() qcow2: Improve error message qemu-io: Add command 'reopen' qemu-io: Remove duplicate 'open' error message block: Allow specifying driver-specific options to reopen qcow2: Rename BDRVQcowState to BDRVQcow2State block: Drop bdrv_find_whitelisted_format() block: Drop drv parameter from bdrv_fill_options() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-09-14 18:51:09 +01:00
Peter Maydell	a2aa09e181	* Support for jemalloc * qemu_mutex_lock_iothread "No such process" fix * cutils: qemu_strto* wrappers * iohandler.c simplification * Many other fixes and misc patches. And some MTTCG work (with Emilio's fixes squashed): * Signal-free TCG kick * Removing spinlock in favor of QemuMutex * User-mode emulation multi-threading fixes/docs -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJV8Tk7AAoJEL/70l94x66Ds3QH/3bi0RRR2NtKIXAQrGo5tfuD NPMu1K5Hy+/26AC6mEVNRh4kh7dPH5E4NnDGbxet1+osvmpjxAjc2JrxEybhHD0j fkpzqynuBN6cA2Gu5GUNoKzxxTmi2RrEYigWDZqCftRXBeO2Hsr1etxJh9UoZw5H dgpU3j/n0Q8s08jUJ1o789knZI/ckwL4oXK4u2KhSC7ZTCWhJT7Qr7c0JmiKReaF JEYAsKkQhICVKRVmC8NxML8U58O8maBjQ62UN6nQpVaQd0Yo/6cstFTZsRrHMHL3 7A2Tyg862cMvp+1DOX3Bk02yXA+nxnzLF8kUe0rYo6llqDBDStzqyn1j9R0qeqA= =nB06 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Support for jemalloc * qemu_mutex_lock_iothread "No such process" fix * cutils: qemu_strto* wrappers * iohandler.c simplification * Many other fixes and misc patches. And some MTTCG work (with Emilio's fixes squashed): * Signal-free TCG kick * Removing spinlock in favor of QemuMutex * User-mode emulation multi-threading fixes/docs # gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (44 commits) cutils: work around platform differences in strto{l,ul,ll,ull} cpu-exec: fix lock hierarchy for user-mode emulation exec: make mmap_lock/mmap_unlock globally available tcg: comment on which functions have to be called with mmap_lock held tcg: add memory barriers in page_find_alloc accesses remove unused spinlock. replace spinlock by QemuMutex. cpus: remove tcg_halt_cond and tcg_cpu_thread globals cpus: protect work list with work_mutex scripts/dump-guest-memory.py: fix after RAMBlock change configure: Add support for jemalloc add macro file for coccinelle configure: factor out adding disas configure vhost-scsi: fix wrong vhost-scsi firmware path checkpatch: remove tests that are not relevant outside the kernel checkpatch: adapt some tests to QEMU CODING_STYLE: update mixed declaration rules qmp: Add example usage of strto*l() qemu wrapper cutils: Add qemu_strtoull() wrapper cutils: Add qemu_strtoll() wrapper ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-09-14 16:13:16 +01:00
Max Reitz	2ac01520be	qcow2: Make qcow2_alloc_bytes() more explicit In case of -EAGAIN returned by update_refcount(), we should discard the cluster offset we were trying to allocate and request a new one, because in theory that old offset might now be taken by a refcount block. In practice, this was not the case due to update_refcount() generally returning strictly monotonic increasing cluster offsets. However, this behavior is not set in stone, and it is also not obvious when looking at qcow2_alloc_bytes() alone, so we should not rely on it. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:37 +02:00
Radoslav Gerganov	3efffc3292	vmdk: Fix next_cluster_sector for compressed write When the VMDK is streamOptimized (or compressed), the next_cluster_sector must not be incremented by a fixed number of sectors. Instead of this, it must be rounded up to the next consecutive sector. Fixing this results in much smaller compressed images. Signed-off-by: Radoslav Gerganov <rgerganov@vmware.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:37 +02:00
Max Reitz	b6d36def6d	qcow2: Make size_to_clusters() return uint64_t Sadly, some images may have more clusters than what can be represented using a plain int. We should be prepared for that case (in qcow2_check_refcounts() we actually were trying to catch that case, but since size_to_clusters() truncated the returned value, that check never did anything useful). Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:37 +02:00
Kevin Wolf	5b0959a7d4	qcow2: Support updating driver-specific options in reopen For updating the cache sizes, disabling lazy refcounts and updating the clean_cache_timer there is a bit more to do than just changing the variables, but otherwise we're all set for changing options during bdrv_reopen(). Just implement the missing pieces and hook the functions up in bdrv_reopen(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:37 +02:00
Kevin Wolf	ee55b17304	qcow2: Make qcow2_update_options() suitable for transactions Before we can allow updating options at runtime with bdrv_reopen(), we need to split the function into prepare/commit/abort parts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:37 +02:00
Kevin Wolf	c1344ded70	qcow2: Fix memory leak in qcow2_update_options() error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	007dbc396c	qcow2: Leave s unchanged on qcow2_update_options() failure On return, either all new options should be applied to BDRVQcowState (on success), or all of the old settings should be preserved (on failure). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	94edf3fbe8	qcow2: Move rest of option handling to qcow2_update_options() With this commit, the handling of driver-specific options in qcow2_open() is completely separated out into qcow2_update_options(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	90efa0eaef	qcow2: Move qcow2_update_options() call up qcow2_update_options() only updates some variables in BDRVQcowState and doesn't really depend on other parts of it being initialised yet, so it can be moved so that it immediately follows the other half of option handling code in qcow2_open(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	4c75d1a157	qcow2: Factor out qcow2_update_options() Eventually we want to be able to change options at runtime. As a first step towards that goal, separate some option handling code from the general initialisation code in qcow2_open(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	f113ae839e	qcow2: Improve error message Eric says that "any" sounds better than "either", and my non-native feeling says the same, so let's change it. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	4d2cb09251	block: Allow specifying driver-specific options to reopen Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-09-14 16:51:36 +02:00
Kevin Wolf	ff99129ab8	qcow2: Rename BDRVQcowState to BDRVQcow2State BDRVQcowState is already used by qcow1, and gdb is always confused which one to use. Rename the qcow2 one so they can be distinguished. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-09-14 16:51:36 +02:00
Max Reitz	6ebf9aa2ef	block: Drop drv parameter from bdrv_open() Now that this parameter is effectively unused, we can drop it and just pass NULL on to bdrv_open_inherit(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Max Reitz	e6641719fe	block: Always pass NULL as drv for bdrv_open() Change all callers of bdrv_open() to pass the driver name in the options QDict instead of passing its BlockDriver pointer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-14 16:51:36 +02:00
Daniel P. Berrange	b6af097528	maint: remove / fix many doubled words Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Daniel P. Berrange	a8f15a2775	maint: remove double semicolons in many files A number of source files have statements accidentally terminated by a double semicolon - eg 'foo = bar;;'. This is harmless but a mistake none the less. The tcg/ia64/tcg-target.c file is whitelisted because it has valid use of ';;' in a comment containing assembly code. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-09-11 10:21:38 +03:00
Peter Lieven	6d1f252d8c	block/iscsi: validate block size returned from target It has been reported that at least tgtd returns a block size of 0 for LUN 0. To avoid running into divide by zero later on and protect against other problematic block sizes validate the block size right at connection time. Cc: qemu-stable@nongnu.org Reported-by: Andrey Korolyov <andrey@xdel.ru> Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1439552016-8557-1-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-09-07 18:14:03 +02:00
Wen Congyang	834cb2ada5	quorum: validate vote threshold against num_children even if read-pattern is fifo We need to use threshold to check if too many write operation fails. If threshold is larger than num children, we always get write error event even if all write operations success. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55962F72.3060003@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	909c260c71	qcow2: reorder fields in Qcow2CachedTable to reduce padding Changing the current ordering saves 8 bytes per cache entry in x86_64. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 0bd55291211df3dfb514d0e7d2031dd5c4f9f807.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	279621c046	qcow2: add option to clean unused cache entries after some time This adds a new 'cache-clean-interval' option that cleans all qcow2 cache entries that haven't been used in a certain interval, given in seconds. This allows setting a large L2 cache size so it can handle scenarios with lots of I/O and at the same time use little memory during periods of inactivity. This feature currently relies on MADV_DONTNEED to free that memory, so it is not useful in systems that don't follow that behavior. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a70d12da60433df9360ada648b3f34b8f6f354ce.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	355ee2d0e8	qcow2: mark the memory as no longer needed after qcow2_cache_empty() After having emptied the cache, the data in the cache tables is no longer useful, so we can tell the kernel that we are done with it. In Linux this frees the resources associated with it. The effect of this can be seen in the HMP commit operation: it moves data from the top to the base image (and fills both caches), then it empties the top image. At this point the data in that cache is no longer needed so it's just wasting memory. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 08538b098e1faf6c92496477cf9b47a20e5aacea.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Max Reitz	bdd03cdf5d	block/raw-posix: Use raw_normalize_devicepath() The filename given to qemu_open() in block/raw-posix.c should generally have been processed by raw_normalize_devicepath(); unless we are only probing (in which case the caller often checks whether the file is a block device or not, and this property will be changed by raw_normalize_devicepath() on NetBSD) or it is about a deprecated device (i.e. floppy). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-09-04 20:59:48 +02:00
Wen Congyang	e12f378409	block: more check for replaced node We use mirror+replace to fix quorum's broken child. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. Without this patch, the replace node can be any node, and it can be top BDS with BB, or another quorum's child. We just check if the broken child is part of the quorum BDS in this patch. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55A86486.1000404@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-09-02 14:56:39 +01:00
Kevin Wolf	e424aff5f3	mirror: Fix coroutine reentrance This fixes a regression introduced by commit `dcfb3beb` ("mirror: Do zero write on target if sectors not allocated"), which was reported to cause aborts with the message "Co-routine re-entered recursively". The cause for this bug is the following code in mirror_iteration_done(): if (s->common.busy) { qemu_coroutine_enter(s->common.co, NULL); } This has always been ugly because - unlike most places that reenter - it doesn't have a specific yield that it pairs with, but is more uncontrolled. What we really mean here is "reenter the coroutine if it's in one of the four explicit yields in mirror.c". This used to be equivalent with s->common.busy because neither mirror_run() nor mirror_iteration() call any function that could yield. However since commit `dcfb3beb` this doesn't hold true any more: bdrv_get_block_status_above() can yield. So what happens is that bdrv_get_block_status_above() wants to take a lock that is already held, so it adds itself to the queue of waiting coroutines and yields. Instead of being woken up by the unlock function, however, it gets woken up by mirror_iteration_done(), which is obviously wrong. In most cases the code actually happens to cope fairly well with such cases, but in this specific case, the unlock must already have scheduled the coroutine for wakeup when mirror_iteration_done() reentered it. And then the coroutine happened to process the scheduled restarts and tried to reenter itself recursively. This patch fixes the problem by pairing the reenter in mirror_iteration_done() with specific yields instead of abusing s->common.busy. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1439455310-11263-1-git-send-email-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-08-14 09:51:31 -04:00
Stefan Hajnoczi	cae98cb87d	block/mirror: limit qiov to IOV_MAX elements If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2) EINVAL failures may be encountered. It is possible to trigger this by setting granularity to a low value like 8192. This patch stops appending chunks once IOV_MAX is reached. The spurious EINVAL failure can be reproduced with a qcow2 image file and the following QMP invocation: qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1', granularity=8192, sync='full', mode='absolute-paths', format='raw') While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct bs=4k. Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1435761950-26714-1-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-08-06 04:41:09 -04:00
Sascha Silbe	e94867ed5f	block: don't register quorum driver if SHA256 support is unavailable Commit `488981a4` [block: convert quorum blockdrv to use crypto APIs] broke qemu-iotest 041 on hosts with GnuTLS < 2.10.0. It converted a compile-time check to a run-time check at device open time. The result is that we now advertise a feature (the quorum block driver) that will never work (on those hosts). There's no way (short of parsing human-readable error messages) for qemu-iotests or any other API consumer to recognise that the quorum block driver isn't _actually_ available and shouldn't be used or tested. Move the run-time check to bdrv_quorum_init() to avoid registering the quorum block driver if we know it cannot work. This way API consumers can recognise it's unavailable. Fixes: `488981a4af` Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1438699705-21761-1-git-send-email-silbe@linux.vnet.ibm.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-08-05 15:19:32 +01:00
Peter Maydell	9f8c5b69c2	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVtwOFAAoJEL2+eyfA3jBXeN8QAJAuQL5Wvwe1epGhiOIAVadX Zx0Yx8DnYbHH184tsEOyuliHUxSN6X6bBlO1rTaI/tpEOeZEBY9EBUBDdcty//Y4 ZHCKDppYSP773wJDLIvHn7gw83Z+zqgRw1WXcyd4piBKiilJ5c2wvpDrTNqwFh0I 1mqTFoVOARK0OVlxuXdhaH+hBI823MaPaSRckOWYgZjRZN2HlCAN8sDigdQtqBGp Udugw4TijlMi5/JmLUmvDHLaVZz2EEKlH7fHjInU98Z2p5UE/dkTI2QIj2smTSEk XpUkmukHgLHyIxDnE2pAhdQG12RTd8HrnkmohauHXsvVzQTvW6tOBfR57sqVSy1g WPxEiAgvs5kWnF7LLmR4AO7xd+r9/EFrNsMpdinKjA68RAnxEfFcCriNMPtPtv2h g/Nw60uqrSiv98/JL7ApcqwcevJOteBY50D//hkpyevlyPkh4hg0vISXfaAaDFrp YC0byq32UtkWZJkgAAj+ZkGJOdeRGl+Ma8+2FwxHAaZepCMGJC2xRAQ28CEHUEfP WAM/VRirgmxV4kG2JTnotlWqj6z0iYdjiL+AV+Mq0cywkboOQoyaOoIGC30kovGZ XMcEDLTr1wke/ODU7JegOX+SPLU+Nf4zXz6QJMaIRlAGO+EpQiJoHjH0PWvjAoqV xcWr+wXm9H0ejX5Rx1is =9QFT -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/jtc-for-upstream-pull-request' into staging # gpg: Signature made Tue Jul 28 05:22:29 2015 BST using RSA key ID C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/jtc-for-upstream-pull-request: block/ssh: Avoid segfault if inet_connect doesn't set errno. sheepdog: serialize requests to overwrapping area Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-07-28 13:22:57 +01:00
Richard W.M. Jones	325e390421	block/ssh: Avoid segfault if inet_connect doesn't set errno. On some (but not all) systems: $ qemu-img create -f qcow2 overlay -b ssh://xen/ Segmentation fault It turns out this happens when inet_connect returns -1 in the following code, but errno == 0. s->sock = inet_connect(s->hostport, errp); if (s->sock < 0) { ret = -errno; goto err; } In the test case above, no host called "xen" exists, so getaddrinfo fails. On Fedora 22, getaddrinfo happens to set errno = ENOENT (although it is not documented to do that), so it doesn't segfault. On RHEL 7, errno is not set by the failing getaddrinfo, so ret = -errno = 0, so the caller doesn't know there was an error and continues with a half-initialized BDRVSSHState struct, and everything goes south from there, eventually resulting in a segfault. Fix this by setting ret to -EIO (same as block/nbd.c and block/sheepdog.c). The real error is saved in the Error** errp struct, so it is printed correctly: $ ./qemu-img create -f qcow2 overlay -b ssh://xen/ qemu-img: overlay: address resolution failed for xen:22: No address associated with hostname Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reported-by: Jun Li BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1147343 Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-28 00:19:05 -04:00
Hitoshi Mitake	6a55c82cec	sheepdog: serialize requests to overwrapping area Current sheepdog driver only serializes create requests in oid unit. This mechanism isn't enough for handling requests to overwrapping area spanning multiple oids, so it can result bugs like below: https://bugs.launchpad.net/sheepdog-project/+bug/1456421 This patch adds a new serialization mechanism for the problem. The difference from the old one is: 1. serialize entire aiocb if their targetting areas overwrap 2. serialize all requests (read, write, and discard), not only creates This patch also removes the old mechanism because the new one can be an alternative. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Cc: Vasiliy Tolstov <v.tolstov@selfip.ru> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Tested-by: Vasiliy Tolstov <v.tolstov@selfip.ru> Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-28 00:16:57 -04:00
Jeff Cody	b15deac795	block: vpc - prevent overflow if max_table_entries >= 0x40000000 When we allocate the pagetable based on max_table_entries, we multiply the max table entry value by 4 to accomodate a table of 32-bit integers. However, max_table_entries is a uint32_t, and the VPC driver accepts ranges for that entry over 0x40000000. So during this allocation: s->pagetable = qemu_try_blockalign(bs->file, s->max_table_entries * 4); The size arg overflows, allocating significantly less memory than expected. Since qemu_try_blockalign() size argument is size_t, cast the multiplication correctly to prevent overflow. The value of "max_table_entries * 4" is used elsewhere in the code as well, so store the correct value for use in all those cases. We also check the Max Tables Entries value, to make sure that it is < SIZE_MAX / 4, so we know the pagetable size will fit in size_t. Cc: qemu-stable@nongnu.org Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-07-27 17:19:06 +02:00
Fam Zheng	9990069758	mirror: Speed up bitmap initial scanning Limiting to sectors_per_chunk for each bdrv_is_allocated_above is slow, because the underlying protocol driver would issue much more queries than necessary. We should coalesce the query. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: <1436413678-7114-4-git-send-email-famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-22 11:14:21 +01:00
Richard W.M. Jones	796a060bc0	block/curl: Don't lose original error when a connection fails. Currently if qemu is connected to a curl source (eg. web server), and the web server fails / times out / dies, you always see a bogus EIO "Input/output error". For example, choose a large file located on any local webserver which you control: $ qemu-img convert -p http://example.com/large.iso /tmp/test Once it starts copying the file, stop the webserver and you will see qemu-img fail with: qemu-img: error while reading sector 61440: Input/output error This patch does two things: Firstly print the actual error from curl so it doesn't get lost. Secondly, change EIO to EPROTO. EPROTO is a POSIX.1 compatible errno which more accurately reflects that there was a protocol error, rather than some kind of hardware failure. After this patch is applied, the error changes to: $ qemu-img convert -p http://example.com/large.iso /tmp/test qemu-img: curl: transfer closed with 469989 bytes remaining to read qemu-img: error while reading sector 16384: Protocol error Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Wen Congyang	48ac0a4df8	mirror: correct buf_size If bus_size is less than 0, the command fails. If buf_size is 0, use DEFAULT_MIRROR_BUF_SIZE. If buf_size % granularity is not 0, mirror_free_init() will do dangerous things. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 5555A588.3080907@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Stefan Hajnoczi	17d9716d7b	block: keep bitmap if incremental backup job is cancelled Reclaim the dirty bitmap if an incremental backup block job is cancelled. The ret variable may be 0 when the job is cancelled so it's not enough to check ret < 0. Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1434380534-7680-1-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Fam Zheng	4c0cbd6fec	block/mirror: Sleep periodically during bitmap scanning Before, we only yield after initializing dirty bitmap, where the QMP command would return. That may take very long, and guest IO will be blocked. Add sleep points like the later mirror iterations. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1431486673-19280-1-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2015-07-14 21:50:13 -04:00
Josh Durgin	e34d8f297d	rbd: fix ceph settings precedence Apply the ceph settings from a config file before any ceph settings from the command line. Since the ceph config file location may be specified on the command line, parse it once to read the config file, and do a second pass to apply the rest of the command line ceph options. Signed-off-by: Josh Durgin <jdurgin@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-07-14 17:15:23 +02:00
Josh Durgin	99a3c89d5d	rbd: make qemu's cache setting override any ceph setting To be safe, when cache=none is used ceph settings should not be able to override it to turn on caching. This was previously possible with rbd_cache=true in the rbd device configuration or a ceph configuration file. Similarly, rbd settings could have turned off caching when qemu requested it, although this would just be a performance problem. Fix this by changing rbd's cache setting to match qemu after all other ceph settings have been applied. Signed-off-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-07-14 17:15:23 +02:00
Josh Durgin	3dbf00e058	rbd: remove unused constants and fields RBDAIOCB.status was only used for cancel, which was removed in `7691e24dbe`. RBDAIOCB.sector_num was never used. RADOSCB.done and rcbid were never used. RBD_FD* are obsolete since the pipe was removed in `e04fb07fd1`. Signed-off-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-07-14 17:15:23 +02:00
Peter Maydell	acf7b7fdf3	Bugfixes and Daniel Berrange's crypto library. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJVnQWdAAoJEL/70l94x66D6OgIAKJlzQfmy5w7Q9WD4vCMhD76 JrpLSsn7Gx/Bws0Nu9nLQlqun5z4hiUxyG2kP/WqD9+tV3cpSMSyrG6ImVdqKnQ5 +Z8WJZuREkQv0aqDUjQVST+eIDZuh2LWJXAjhgsCXUHY77eWb/7WmKT79xJOa+5C 5xB1qxudqX5IsTvpiKKPbmUGYkAcvRX1dUSaFwRIMO0UyKn59B9WfM9a5slIbLW7 XfI8+wEJshTVLuQkkTfdidWQc5M5DwlmO7ESUNR/BRPCPFeyjcDqgQY5pBM5XVo9 C+S0R3zIt3Ew0fhCtLRyjlIT0bGfwjbU5HRiHcyldBKhNUZZjSUoOWJnYRHXUDY= =H8wA -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging Bugfixes and Daniel Berrange's crypto library. # gpg: Signature made Wed Jul 8 12:12:29 2015 BST using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: ossaudio: fix memory leak ui: convert VNC to use generic cipher API block: convert qcow/qcow2 to use generic cipher API ui: convert VNC websockets to use crypto APIs block: convert quorum blockdrv to use crypto APIs crypto: add a nettle cipher implementation crypto: add a gcrypt cipher implementation crypto: introduce generic cipher API & built-in implementation crypto: move built-in D3DES implementation into crypto/ crypto: move built-in AES implementation into crypto/ crypto: introduce new module for computing hash digests vl: move rom_load_all after machine init done Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-07-08 20:46:35 +01:00
Daniel P. Berrange	f6fa64f6d2	block: convert qcow/qcow2 to use generic cipher API Switch the qcow/qcow2 block driver over to use the generic cipher API, this allows it to use the pluggable AES implementations, instead of being hardcoded to use QEMU's built-in impl. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-10-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-08 13:11:01 +02:00
Daniel P. Berrange	488981a4af	block: convert quorum blockdrv to use crypto APIs Get rid of direct use of gnutls APIs in quorum blockdrv in favour of using the crypto APIs. This avoids the need to do conditional compilation of the quorum driver. It can simply report an error at file open file instead if the required hash algorithm isn't supported by QEMU. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-8-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-08 13:11:01 +02:00
Ting Wang	970311646a	blockjob: add block_job_release function There is job resource leak in function mirror_start_job, although bdrv_create_dirty_bitmap is unlikely failed. Add block_job_release for each release when needed. Signed-off-by: Ting Wang <kathy.wangting@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1435311455-56048-1-git-send-email-kathy.wangting@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Richard W.M. Jones	25d9747b64	block/raw-posix: Don't think /dev/fd/<NN> is a floppy drive. In libguestfs we use /dev/fd/<NN> to pass pre-opened file descriptors to qemu-img. Lately I've discovered that although this works, qemu believes that these are floppy disk images. That in itself isn't much of a problem, but now qemu prints a warning about host floppy pass-thru being deprecated. Extend the existing test so that it ignores /dev/fd/ as well as /dev/fdset/ A simple test of this, if you are using the bash shell, is: qemu-img info <( cat /dev/null ) without this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Host floppy pass-through is deprecated Support for it will be removed in a future release. qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek with this patch: $ qemu-img info <( cat /dev/null ) qemu-img: Could not open '/dev/fd/63': Could not refresh total sector count: Illegal seek Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1435761614-31358-1-git-send-email-rjones@redhat.com Fixes: https://bugs.launchpad.net/qemu/+bug/1470536 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Fam Zheng	53ec73e264	block: Use bdrv_drain to replace uncessary bdrv_drain_all There callers work on a single BlockDriverState subtree, where using bdrv_drain() is more accurate. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 14:27:14 +01:00
Daniel P. Berrange	6f2945cde6	crypto: move built-in AES implementation into crypto/ To prepare for a generic internal cipher API, move the built-in AES implementation into the crypto/ directory Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-3-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-07 12:04:13 +02:00
Stefan Hajnoczi	7a63f3cdc4	block: update bdrv_drain_all()/bdrv_drain() comments The doc comments for bdrv_drain_all() and bdrv_drain() are outdated: * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock() which Fam Zheng is currently developing. Unfortunately this warning was never really enough because devices keep submitting I/O and op blockers don't prevent that. * The bdrv_drain_all() comment is still partially correct but reflects the nature of the implementation rather than API documentation. Do make it clear that bdrv_drain() is only appropriate within an AioContext. For anything spanning AioContexts you need bdrv_drain_all(). Cc: Markus Armbruster <armbru@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com	2015-07-07 10:31:08 +01:00
Alberto Garcia	1bd84ee717	qcow2: remove unnecessary check The value of 'i' is guaranteed to be >= 0 Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1435824371-2660-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 10:31:04 +01:00
Alberto Garcia	764ba3ae51	block: remove redundant check before g_slist_find() An empty GSList is represented by a NULL pointer, therefore it's a perfectly valid argument for g_slist_find() and there's no need to make any additional check. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1435583533-5758-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Peter Lieven	29c838cdc9	block/nfs: limit maximum readahead size to 1MB a malicious caller could otherwise specify a very large value via the URI and force libnfs to allocate a large amount of memory for the readahead buffer. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1435317241-25585-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Peter Lieven	9049736ec7	block/iscsi: restore compatiblity with libiscsi 1.9.0 RHEL7 and others are stuck with libiscsi 1.9.0 since there unfortunately was an ABI breakage after that release. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1435313881-19366-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	508249952c	block: Fix dirty bitmap in bdrv_co_discard Unsetting dirty globally with discard is not very correct. The discard may zero out sectors (depending on can_write_zeroes_with_unmap), we should replicate this change to destination side to make sure that the guest sees the same data. Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator doesn't expect unsetting of bits after current position. So let's do it the opposite way which fixes both problems: set the dirty bits if we are to discard it. Reported-by: wangxiaolong@ucloud.cn Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	dcfb3beb51	mirror: Do zero write on target if sectors not allocated If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill. Some protocols do zero upon discard, where it's best to use bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	0fc9f8ea28	qmp: Add optional bool "unmap" to drive-mirror If specified as "true", it allows discarding on target sectors where source is not allocated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:06:23 +01:00
Fam Zheng	ba3f0e2545	block: Add bdrv_get_block_status_above Like bdrv_is_allocated_above, this function follows the backing chain until seeing BDRV_BLOCK_ALLOCATED. Base is not included. Reimplement bdrv_is_allocated on top. [Initialized bdrv_co_get_block_status_above() ret to 0 to silence mingw64 compiler warning about the unitialized variable. assert(bs != base) prevents that case but I suppose the program could be compiled with -DNDEBUG. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 10:03:50 +01:00
John Snow	4b80ab2b7d	qapi: Rename 'dirty-bitmap' mode to 'incremental' If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform. This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Jindřich Makovička	3e5feb6202	qcow2: Handle EAGAIN returned from update_refcount Fixes a crash during image compression Signed-off-by: Jindřich Makovička <makovick@gmail.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Peter Lieven	5dd7a535b7	block/iscsi: add support for request timeouts libiscsi starting with 1.15 will properly support timeout of iscsi commands. The default will remain no timeout, but this can be changed via cmdline parameters, e.g.: qemu -iscsi timeout=30 -drive file=iscsi://... If a timeout occurs a reconnect is scheduled and the timed out command will be requeued for processing after a successful reconnect. The required API call iscsi_set_timeout is present since libiscsi 1.10 which was released in October 2013. However, due to some bugs in the libiscsi code the use is not recommended before version 1.15. Please note that this patch bumps the libiscsi requirement to 1.10 to have all function and macros defined. The patch fixes also a off-by-one error in the NOP timeout calculation which was fixed while touching these code parts. Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1434455107-19328-1-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-02 09:20:18 +01:00
Dimitris Aragiorgis	3307ed7b3f	raw-posix: Introduce hdev_is_sg() Until now, an SG device was identified only by checking if its path started with "/dev/sg". Then, hdev_open() would set the bs->sg flag accordingly. The patch relies on the actual properties of the device instead of the specified file path. To this end, test for an SG device (e.g. /dev/sg0) by ensuring that all of the following holds: - The specified file name corresponds to a character device - The device supports the SG_GET_VERSION_NUM ioctl - The device supports the SG_GET_SCSI_ID ioctl Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Message-id: 1435056300-14924-6-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	a93a3982a6	raw-posix: Use DPRINTF for DEBUG_FLOPPY Get rid of several #ifdef DEBUG_FLOPPY and substitute them with DPRINTF. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-5-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	bcb225550d	raw-posix: DPRINTF instead of DEBUG_BLOCK_PRINT Building the QEMU tools fails if we #define DEBUG_BLOCK inside block/raw-posix.c. Here instead of adding qemu-log.o in block-obj-y so that DEBUG_BLOCK_PRINT can be used, we substitute the latter with a simple DPRINTF() (that does not cause bit-rot). Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-4-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	1b6bc94d5d	Fix migration in case of scsi-generic During migration, QEMU uses fsync()/fdatasync() on the open file descriptor for read-write block devices to flush data just before stopping the VM. However, fsync() on a scsi-generic device returns -EINVAL which causes the migration to fail. This patch skips flushing data in case of an SG device, since submitting SCSI commands directly via an SG character device (e.g. /dev/sg0) bypasses the page cache completely, anyway. Note that fsync() not only flushes the page cache but also the disk cache. The scsi-generic device never sends flushes, and for migration it assumes that the same SCSI device is used by the destination host, so it does not issue any SCSI SYNCHRONIZE CACHE (10) command. Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since this is now redundant (we flush the underlying protocol at the end of bdrv_co_flush() which, with this patch, we never reach). Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Dimitris Aragiorgis	b192af8acc	block: Use bdrv_is_sg() everywhere Instead of checking bs->sg use bdrv_is_sg() consistently throughout the code. Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-2-git-send-email-dimara@arrikto.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:08:52 +01:00
Wolfgang Bumiller	d5941ddae8	vvfat: add a label option Until now the vvfat volume label was hardcoded to be "QEMU VVFAT", now you can pass a file.label=labelname option to the -drive to change it. The FAT structure defines the volume label to be limited to 11 bytes and is filled up spaces when shorter than that. The trailing spaces however aren't exposed to the user by operating systems. [Added missing comment '#' characters in block-core.json to fix build errors. --Stefan] Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Message-id: 1434706529-13895-2-git-send-email-w.bumiller@proxmox.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:17 +01:00
Alexander Yarygin	97b0385a34	block-backend: Introduce blk_drain() This patch introduces the blk_drain() function which allows to replace blk_drain_all() when only one BlockDriverState needs to be drained. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1434537440-28236-2-git-send-email-yarygin@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:16 +01:00
Alberto Garcia	2f388b93a1	throttle: Check current timers before updating any_timer_armed[] Calling throttle_group_config() cancels all timers from a particular BlockDriverState, so any_timer_armed[] should be updated accordingly. However, with the current code it may happen that a timer is armed in a different BlockDriverState from the same group, so any_timer_armed[] would be set to false in a situation where there is still a timer armed. The consequence is that we might end up with two timers armed. This should not have any noticeable impact however, since all accesses to the ThrottleGroup are protected by a lock, and the situation would become normal again shortly thereafter as soon as all timers have been fired. The correct way to solve this is to check that we're actually cancelling a timer before updating any_timer_armed[]. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1434382875-3998-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:16 +01:00
Alexander Yarygin	f406c03c09	block: Let bdrv_drain_all() to call aio_poll() for each AioContext After the commit `9b536adc` ("block: acquire AioContext in bdrv_drain_all()") the aio_poll() function got called for every BlockDriverState, in assumption that every device may have its own AioContext. If we have thousands of disks attached, there are a lot of BlockDriverStates but only a few AioContexts, leading to tons of unnecessary aio_poll() calls. This patch changes the bdrv_drain_all() function allowing it find shared AioContexts and to call aio_poll() only for unique ones. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-id: 1433936297-7098-4-git-send-email-yarygin@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-23 15:06:16 +01:00
Markus Armbruster	cc7a8ea740	Include qapi/qmp/qerror.h exactly where needed In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:41 +02:00
Markus Armbruster	d49b683644	qerror: Move #include out of qerror.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Markus Armbruster	4629ed1e98	qerror: Finally unused, clean up Remove it except for two things in qerror.h: * Two #include to be cleaned up separately to avoid cluttering this patch. * The QERR_ macros. Mark as obsolete. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Markus Armbruster	c6bd8c706a	qerror: Clean up QERR_ macros to expand into a single string These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit `13f59ae`. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>	2015-06-22 18:20:40 +02:00
Eric Blake	fc48ffc39e	qobject: Use 'bool' for qbool We require a C99 compiler, so let's use 'bool' instead of 'int' when dealing with boolean values. There are few enough clients to fix them all in one pass. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2015-06-22 17:40:00 +02:00
Peter Maydell	f3e3b083d4	Block layer core and image format patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJVevYFAAoJEH8JsnLIjy/W3jEP/0hiQ3rCRZ/he8s5maTdT+TR YSeHkB5rKpz0Uopn1DMn1QrIbUVzX7dyb+uf9zQ0/xRQIzf6k8uxqU/NWrdoF3NK qx91dGWedwnG+TEBIMbcR7nMrw4dP6kH7uPz/VWMXDHVLz0HIcD95qhKgs0mSY6J dWqex6ACjXM68zJU5IioagU9evV80WZE1S8z7zfixxtTBx5hCaTVbwalkaCxcrXw PbZle55rjI8B10+OzgBw0fq10nias+NTndU9CwNBboxmEtAjq8/mQ663vcWlmiFo 9a/hkda27Z5ut/0Tqk1v4uLHauylp++rrAabPBAuCFMKes6cdkddP15Q/r52aJ29 5meodQtbet1rGrM+Aq4vuSuWId71PGypEI/3URDdNfYFNISoeLLsk4lcQUu7VrDD sRX3Jt8SI3nkIgOnhPyi7NDPmafxFt8yRt5vM8MyR5ynF8NS/2hiAc3wqnbXGjUj a5GqDCefb1yM0R5HvksuFFt3OnXlKJQ3J+ksXNUJf9DSAZPauqWD696pcTeg8wyy 3PIGkczgUuKTVfFWd3THZxJLAo7ZuqvXBHHV8o1SeMBDxwh4FhTd8Kjvm3rUNFfl VDox4qwZ1AcxLrxgqazKU7sD9iWBDHURRcpOoUsBys7oxQZnhcmMp1fRlEkTOyrD HNiSNByqBrtkfeVzHlSe =QgYk -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer core and image format patches # gpg: Signature made Fri Jun 12 16:08:53 2015 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (25 commits) block: Fix reopen flag inheritance block: Add BlockDriverState.inherits_from block: Add list of children to BlockDriverState queue.h: Add QLIST_FIX_HEAD_PTR() block: Drain requests before swapping nodes in bdrv_swap() block: Move flag inheritance to bdrv_open_inherit() block: Use QemuOpts in bdrv_open_common() block: Use macro for cache option names vmdk: Use bdrv_open_image() quorum: Use bdrv_open_image() check-qdict: Test cases for new functions qdict: Add qdict_{set,copy}_default() qdict: Add qdict_array_entries() iotests: Add tests for overriding BDRV_O_PROTOCOL block: driver should override flags in bdrv_open() block: Change bitmap truncate conditional to assertion block: record new size in bdrv_dirty_bitmap_truncate raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size vmdk: Use vmdk_find_index_in_cluster everywhere vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2015-06-15 10:43:06 +01:00
Kevin Wolf	67251a3113	block: Fix reopen flag inheritance When reopening an image, the block layer already takes care to reopen bs->file as well with recalculated inherited flags. The same must happen for any other child (most notably missing before this patch: backing files). If bs->file (or any other child) didn't originally inherit from bs, e.g. because it was created separately and then only referenced, it must not inherit flags on reopen either, so check the inherited_from field before propagation the reopen down. VMDK already reopened its extents manually; this code can now be dropped. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	f3930ed0bb	block: Move flag inheritance to bdrv_open_inherit() Instead of letting every caller of bdrv_open() determine the right flags for its child node manually and pass them to the function, pass the parent node and the role of the newly opened child (like backing file, protocol layer, etc.). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-06-12 17:04:59 +02:00
Kevin Wolf	a646836784	vmdk: Use bdrv_open_image() Besides standardising on a single interface for opening child nodes, this patch allows the user to specify options to individual extent nodes. Overriding file names isn't possible with this yet, so it's of limited usefulness, but still a step forward. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2015-06-12 16:58:07 +02:00
Kevin Wolf	ea6828d81b	quorum: Use bdrv_open_image() Besides standardising on a single interface for opening child nodes, this simplifies the .bdrv_open() implementation of the quorum block driver by using block layer functionality for handling BlockdevRefs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-06-12 16:58:07 +02:00
Kevin Wolf	b8684454e1	raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size Image files with an unaligned image size have a final hole that starts at EOF, i.e. in the middle of a sector. Currently, *pnum == 0 is returned when checking the status of this sector. In qemu-img, this triggers an assertion failure. In order to fix this, one type for the sector that contains EOF must be found. Treating a hole as data is safe, so this patch rounds the calculated number of data sectors up, so that a partial sector at EOF is treated as a full data sector. This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1229394 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Cole Robinson <crobinso@redhat.com>	2015-06-12 15:54:01 +02:00
Fam Zheng	90df601f06	vmdk: Use vmdk_find_index_in_cluster everywhere Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
Fam Zheng	61f0ed1d54	vmdk: Fix index_in_cluster calculation in vmdk_co_get_block_status It has the similar issue with `b1649fae49`. Since the calculation is repeated for a few times already, introduce a function so it can be reused. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
Max Reitz	bc85ef265a	qcow2: Add DEFAULT_L2_CACHE_CLUSTERS If a relatively large cluster size is chosen, the default of 1 MB L2 cache is not really appropriate. In this case, unless overridden by the user, the default cache size should not be determined by its size in bytes but by the number of L2 tables (clusters) it is supposed to contain. Note that without this patch, MIN_L2_CACHE_SIZE will effectively take over the same role. However, providing space for just two L2 tables is not enough to be the default. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:01 +02:00
Max Reitz	57e2166959	qcow2: Set MIN_L2_CACHE_SIZE to 2 The L2 cache must cover at least two L2 tables, because during COW two L2 tables are accessed simultaneously. Reported-by: Alexander Graf <agraf@suse.de> Cc: qemu-stable <qemu-stable@nongnu.org> Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Alexander Graf <agraf@suse.de> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-06-12 15:54:00 +02:00
Alberto Garcia	b8fe1694e5	throttle: add the name of the ThrottleGroup to BlockDeviceInfo Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 172df91f09c69c6f0440a697bbd1b3f95b077ee4.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	db6283385c	throttle: acquire the ThrottleGroup lock in bdrv_swap() bdrv_swap() touches the fields of a BlockDriverState that are protected by the ThrottleGroup lock. Although those fields end up in their original place, they are temporarily swapped in the process, so there's a chance that an operation on a member of the same group happening on a different thread can try to use them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: d92dc40d7c4f1fc5cda5cbbf4ffb7a4670b79d17.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	76f4afb40f	throttle: Add throttle group support The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Alberto Garcia	2ff1f2e3a3	throttle: Add throttle group infrastructure Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 2fdb4de17210b733a13eb472c33cd08b45f8fd21.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Benoît Canet	0e5b0a2d54	throttle: Extract timers from ThrottleState into a separate structure Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: Benoit Canet <benoit.canet@nodalink.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 14:00:00 +01:00
Kevin Wolf	f4a769abaa	raw-posix: Fix .bdrv_co_get_block_status() for unaligned image size Image files with an unaligned image size have a final hole that starts at EOF, i.e. in the middle of a sector. Currently, *pnum == 0 is returned when checking the status of this sector. In qemu-img, this triggers an assertion failure. In order to fix this, one type for the sector that contains EOF must be found. Treating a hole as data is safe, so this patch rounds the calculated number of data sectors up, so that a partial sector at EOF is treated as a full data sector. This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1229394 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1433840108-9996-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-06-12 13:58:33 +01:00
Markus Armbruster	8809cfc38e	blkdebug: Simplify passing of Error through qemu_opts_foreach() Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2015-06-09 07:40:23 +02:00
Markus Armbruster	28d0de7a4f	QemuOpts: Convert qemu_opts_foreach() to Error Retain the function value for now, to permit selective conversion of its callers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2015-06-09 07:37:37 +02:00
Markus Armbruster	a4c7367f7d	QemuOpts: Drop qemu_opts_foreach() parameter abort_on_failure When the argument is non-zero, qemu_opts_foreach() stops on callback returning non-zero, and returns that value. When the argument is zero, it doesn't stop, and returns the bit-wise inclusive or of all the return values. Funky :) The callers that pass zero could just as well pass one, because their callbacks can't return anything but zero: * qemu_add_globals()'s callback qdev_add_one_global() * qemu_config_write()'s callback config_write_opts() * main()'s callbacks default_driver_check(), drive_enable_snapshot(), vnc_init_func() Drop the parameter, and always stop. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com>	2015-06-08 19:33:20 +02:00
Fam Zheng	44f192f364	iscsi: Remove pointless runtime check of macro value raw_bsd already has QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != 512), so iscsi should relax. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2015-06-03 14:21:23 +03:00
Daniel P. Berrange	8336aafae1	qcow2/qcow: protect against uninitialized encryption key When a qcow[2] file is opened, if the header reports an encryption method, this is used to set the 'crypt_method_header' field on the BDRVQcow[2]State struct, and the 'encrypted' flag in the BDRVState struct. When doing I/O operations, the 'crypt_method' field on the BDRVQcow[2]State struct is checked to determine if encryption needs to be applied. The crypt_method_header value is copied into crypt_method when the bdrv_set_key() method is called. The QEMU code which opens a block device is expected to always do a check if (bdrv_is_encrypted(bs)) { bdrv_set_key(bs, ....key...); } If code forgets to do this, then 'crypt_method' is never set and so when I/O is performed, QEMU writes plain text data into a sector which is expected to contain cipher text, or when reading, will return cipher text instead of plain text. Change the qcow[2] code to consult bs->encrypted when deciding whether encryption is required, and assert(s->crypt_method) to protect against cases where the caller forgets to set the encryption key. Also put an assert in the set_key methods to protect against the case where the caller sets an encryption key on a block device that does not have encryption Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	d1b4efe5c4	qcow2: style fixes in qcow2-cache.c Fix pointer declaration to make it consistent with the rest of the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	a3f1afb43a	qcow2: make qcow2_cache_put() a void function This function never receives an invalid table pointer, so we can make it void and remove all the error checking code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	812e4082ca	qcow2: use a hash to look for entries in the L2 cache The current cache algorithm traverses the array starting always from the beginning, so the average number of comparisons needed to perform a lookup is proportional to the size of the array. By using a hash of the offset as the starting point, lookups are faster and independent from the array size. The hash is computed using the cluster number of the table, multiplied by 4 to make it perform better when there are collisions. In my tests, using a cache with 2048 entries, this reduces the average number of comparisons per lookup from 430 to 2.5. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	fdfbca82a0	qcow2: remove qcow2_cache_find_entry_to_replace() A cache miss means that the whole array was traversed and the entry we were looking for was not found, so there's no need to traverse it again in order to select an entry to replace. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	2693310ecc	qcow2: use an LRU algorithm to replace entries from the L2 cache The current algorithm to evict entries from the cache gives always preference to those in the lowest positions. As the size of the cache increases, the chances of the later elements of being removed decrease exponentially. In a scenario with random I/O and lots of cache misses, entries in positions 8 and higher are rarely (if ever) evicted. This can be seen even with the default cache size, but with larger caches the problem becomes more obvious. Using an LRU algorithm makes the chances of being removed from the cache independent from the position. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	baf07d60f5	qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Since all tables are now stored together, it is possible to obtain the position of a particular table directly from its address, so the operation becomes O(1). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	72e80b8901	qcow2: use one single memory block for the L2/refcount cache tables The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Fam Zheng	13c4941cdd	vmdk: Fix overflow if l1_size is 0x20000000 Richard Jones caught this bug with afl fuzzer. In fact, that's the only possible value to overflow (extent->l1_size = 0x20000000) l1_size: l1_size = extent->l1_size * sizeof(long) => 0x80000000; g_try_malloc returns NULL because l1_size is interpreted as negative during type casting from 'int' to 'gsize', which yields a enormous value. Hence, by coincidence, we get a "not too bad" behavior: qemu-img: Could not open '/tmp/afl6.img': Could not open '/tmp/afl6.img': Cannot allocate memory Values larger than 0x20000000 will be refused by the validation in vmdk_add_extent. Values smaller than 0x20000000 will not overflow l1_size. Cc: qemu-stable@nongnu.org Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Fam Zheng	5e82a31eb9	vmdk: Fix next_cluster_sector for compressed write This fixes the bug introduced by commit `c6ac36e` (vmdk: Optimize cluster allocation). Sometimes, write_len could be larger than cluster size, because it contains both data and marker. We must advance next_cluster_sector in this case, otherwise the image gets corrupted. Cc: qemu-stable@nongnu.org Reported-by: Antoni Villalonga <qemu-list@friki.cat> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:00 +02:00
Kevin Wolf	ecbda7a225	qcow2: Flush pending discards before allocating cluster Before a freed cluster can be reused, pending discards for this cluster must be processed. The original assumption was that this was not a problem because discards are only cached during discard/write zeroes operations, which are synchronous so that no concurrent write requests can cause cluster allocations. However, the discard/write zeroes operation itself can allocate a new L2 table (and it has to in order to put zero flags there), so make sure we can cope with the situation. This fixes https://bugs.launchpad.net/bugs/1349972. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2015-05-22 17:08:00 +02:00
Paolo Bonzini	a53f1a95f9	block: get_block_status: use "else" when testing the opposite condition A bit of Boolean algebra (and common sense) tells us that the second "if" here is looking for blocks that are not allocated. This is the opposite of the "if" that sets BDRV_BLOCK_ALLOCATED, and thus it can use an "else". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1431599702-10431-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Fam Zheng	9eeb6dd1b2	block: Fix NULL deference for unaligned write if qiov is NULL For zero write, callers pass in NULL qiov (qemu-io "write -z" or scsi-disk "write same"). Commit `fc3959e466` fixed bdrv_co_write_zeroes which is the common case for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler fix would be in bdrv_co_do_pwritev which is the NULL dereference point and covers both cases. So don't access it in bdrv_co_do_pwritev in this case, use three aligned writes. [Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized variable warning with gcc 4.9.2. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Fam Zheng	d01c07f222	Revert "block: Fix unaligned zero write" This reverts commit `fc3959e466`. The core write code already handles the case, so remove this duplication. Because commit `61007b316` moved the touched code from block.c to block/io.c, the change is manually reverted. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431522721-3266-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	459b4e6612	block: align bounce buffers to page The following sequence int fd = open(argv[1], O_RDWR \| O_CREAT \| O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	4196d2f030	block: minimal bounce buffer alignment The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Paolo Bonzini	eaf5fe2dd4	block: return EPERM on writes or discards to read-only devices This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1431013548-22492-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:33 +01:00
Denis V. Lunev	ddd2ef2ce8	block/parallels: improve image writing performance further Try to perform IO for the biggest continuous block possible. All blocks abscent in the image are accounted in the same type and preallocation is made for all of them at once. The performance for sequential write is increased from 200 Mb/sec to 235 Mb/sec on my SSD HDD. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-28-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	19f5dc1591	block/parallels: optimize linear image expansion Plain image expansion spends a lot of time to update image file size. This seriously affects the performance. The following simple test qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds could be improved if the format driver will pre-allocate some space in the image file with a reasonable chunk. This patch preallocates 128 Mb using bdrv_write_zeroes, which should normally use fallocate() call inside. Fallback to older truncate() could be used as a fallback using image open options thanks to the previous patch. The benefit is around 15%. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Karan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-27-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	d61790112f	block/parallels: add prealloc-mode and prealloc-size open paramemets This is preparational commit for tweaks in Parallels image expansion. The idea is that enlarge via truncate by one data block is slow. It would be much better to use fallocate via bdrv_write_zeroes and expand by some significant amount at once. Original idea with sequential file writing to the end of the file without fallocate/truncate would be slower than this approach if the image is expanded with several operations: - each image expanding means file metadata update, i.e. filesystem journal write. Truncate/write to newly truncated space update file metadata twice thus truncate removal helps. With fallocate call inside bdrv_write_zeroes file metadata is updated only once and this should happen infrequently thus this approach is the best one for the image expansion - tail writes are ordered, i.e. the guest IO queue could not be sent immediately to the host introducing additional IO delays This patch just adds proper parameters into BDRVParallelsState and performs options parsing in parallels_open. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-26-git-send-email-den@openvz.org CC: Roman Kagan <rkagan@parallels.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	0d31c7c200	block/parallels: delay writing to BAT till bdrv_co_flush_to_os The idea is that we do not need to immediately sync BAT to the image as from the guest point of view there is a possibility that IO is lost even in the physical controller until flush command was finished. bdrv_co_flush_to_os is exactly the right place for this purpose. Technically the patch uses loaded BAT data as a cache and performs actual on-disk metadata updates in parallels_co_flush_to_os callback. This patch speed ups qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds writing from 50-60 Mb/sec to 80-90 Mb/sec on rotational media and from 160 Mb/sec to 190 Mb/sec on SSD disk. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-25-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	2d68e22e94	block/parallels: create bat_entry_off helper calculate offset of the BAT entry in the image file. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-24-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	6953d92078	block/parallels: improve image reading performance Try to perform IO for the biggest continuous block possible. The performance for sequential read is increased from 220 Mb/sec to 360 Mb/sec for continous image on my SSD HDD. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-23-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	6dd6b9f144	block/parallels: implement incorrect close detection The software driver must set inuse field in Parallels header to 0x746F6E59 when the image is opened in read-write mode. The presence of this magic in the header on open forces image consistency check. There is an unfortunate trick here. We can not check for inuse in parallels_check as this will happen too late. It is possible to do that for simple check, but during the fix this would always report an error as the image was opened in BDRV_O_RDWR mode. Thus we save the flag in BDRVParallelsState for this. On the other hand, nothing should be done to clear inuse in parallels_check. Generic close will do the job right. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-21-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	49ad646731	block/parallels: implement parallels_check method of block driver The check is very simple at the moment. It calculates necessary stats and fix only the following errors: - space leak at the end of the image. This would happens due to preallocation - clusters outside the image are zeroed. Nothing else could be done here Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-20-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00
Denis V. Lunev	23d6bd3bd1	block/parallels: move parallels_open/probe to the very end of the file This will help to avoid forward declarations for upcoming parallels_check Some very obvious formatting fixes were made to the moved code to make checkpatch happy. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-19-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-05-22 09:37:32 +01:00

1 2 3 4 5 ...

2199 Commits