mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	718c0fce2f	qcow2: Fix full preallocation with external data file preallocate_co() already gave the data file the full size without forwarding the requested preallocation mode to the protocol. When bdrv_co_truncate() was called later with the preallocation mode, the file didn't actually grow any more, so the data file stayed unallocated even if full preallocation was requested. Pass the right preallocation mode to preallocate_co() and remove the second bdrv_co_truncate() to fix this. As a side effect, the ugly one-byte write in preallocate_co() is replaced with a truncate call, now leaving the last block unallocated on the protocol level as it should be. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	360bd07471	qcow2: Add errp to preallocate_co() We'll add a bdrv_co_truncate() call in the next patch which can return an Error that we don't want to discard. So add an errp parameter to preallocate_co(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	f29fbf7c6b	qcow2: Avoid COW during metadata preallocation Limiting the allocation to INT_MAX bytes isn't particularly clever because it means that the final cluster will be a partial cluster which will be completed through a COW operation. This results in unnecessary data read and write requests which lead to an unwanted non-sparse filesystem block for metadata preallocation. Align the maximum allocation size down to the cluster size to avoid this situation. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	93e32b3e20	qcow2: Fix preallocation bdrv_pwrite to wrong file With an external data file, preallocate_co() must write the final byte to the external data file, not to the qcow2 image file. This is harmless for preallocation of newly created images (only the qcow2 file size is increased to the virtual disk size while it should be much smaller), but with preallocated resize, it could in theory cause visible corruption if the metadata of the image is larger than the data (e.g. lots of bitmaps). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-16 16:23:24 +02:00
Kevin Wolf	a0cf83639c	qcow2: Fix data file error condition in qcow2_co_create() We were trying to check whether bdrv_open_blockdev_ref() returned success, but accidentally checked the wrong variable. Spotted by Coverity (CID 1399703). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>	2019-03-19 15:49:29 +01:00
Peter Maydell	523a2a42c3	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAlyIFSwACgkQfe+BBqr8 OQ7wghAAm16eCEr57oTO7QXR3y8uVFsKqXBn9cNH6nbrFp2PUQSglwMDKBls1Z5m olF23X/JaqSlSmkL9BBuzDZ6Up+kkHKuxPq4/5RKXfiDI0pr3R0eqts0COAlaN9q Bew3ipj99m8gzMi2093AW4+Ob0N3658fuDTGLe1M1Uoy7CEg1QJ7rVOBBEui7vIl RbZ8l/Zmb4ldNpB3lnE4Nh9ue8fy0RAj3Nai161nCnNeXNF/VzD3Ye8bojSBbnux PIMX6/RWmykX4feIf9QP8apDpxX4HkyuPq5EdwT9PD8PwdyXPAXZtsYUNCuNtQuk n5VKFVgFYgqUclBeVHmrMYPU4K4iCFQp4/Fua7wzPEC0iG05NiiDv91oVkEJCp3L ManHeuGfNLCcXaIntKZhuJl1cK8yMM3yDww6/pPTehrPjcyvKa0NOqhQBExektcD R6q7maJRzFaxSxdcs+Zzuog9zESvH1mlJxQCKzeYhAP0kkxInyTELE/Vbx37xuqR RFfZYyVQ6x87Q/sxHx4EMiV97WUM8elZOQdSEC/okt5WUUNpgIu0WF9nSQ1VKZ8C CZmv5xh9ogfwvB/kOm6IVwNkLvVagJQcLwddORI5LLXLbSIUcuwVSuyMp/7iDtQ/ hnHkGs2mIJ2JUYbSSNsSJNs6oTurn8eTFCeGoYKJgd9l4QxaThU= =ekU+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: (22 commits) tests/qemu-iotests: add bitmap resize test 246 block/qcow2-bitmap: Allow resizes with persistent bitmaps block/qcow2-bitmap: Don't check size for IN_USE bitmap docs/interop/qcow2: Improve bitmap flag in_use specification bitmaps: Fix typo in function name block/dirty-bitmaps: implement inconsistent bit block/dirty-bitmaps: disallow busy bitmaps as merge source block/dirty-bitmaps: prohibit removing readonly bitmaps block/dirty-bitmaps: prohibit readonly bitmaps for backups block/dirty-bitmaps: add block_dirty_bitmap_check function block/dirty-bitmap: add inconsistent status block/dirty-bitmaps: add inconsistent bit iotests: add busy/recording bit test to 124 blockdev: remove unused paio parameter documentation block/dirty-bitmaps: move comment block block/dirty-bitmaps: unify qmp_locked and user_locked calls block/dirty-bitmap: explicitly lock bitmaps with successors nbd: change error checking order for bitmaps block/dirty-bitmap: change semantics of enabled predicate block/dirty-bitmap: remove set/reset assertions against enabled bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # tests/qemu-iotests/group	2019-03-13 17:30:34 +00:00
Alberto Garcia	8a2ce0bc1e	block: Add a 'mutable_opts' field to BlockDriver If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-12 20:30:14 +01:00
John Snow	d19c6b36ff	block/qcow2-bitmap: Allow resizes with persistent bitmaps Since we now load all bitmaps into memory anyway, we can just truncate them in-memory and then flush them back to disk. Just in case, we will still check and enforce that this shortcut is valid -- i.e. that any bitmap described on-disk is indeed in-memory and can be modified. If there are any inconsistent bitmaps, we refuse to allow the truncate as we do not actually load these bitmaps into memory, and it isn't safe or reasonable to attempt to truncate corrupted data. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190311185147.52309-4-vsementsov@virtuozzo.com [vsementsov: drop bitmap flushing, fix block comments style] Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 14:57:38 -04:00
Kevin Wolf	6c3944dc62	qcow2: Implement data-file-raw create option Provide an option to force QEMU to always keep the external data file consistent as a standalone read-only raw image. At the moment, this means making sure that write_zeroes requests are forwarded to the data file instead of just updating the metadata, and checking that no backing file is used. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	9b890bdcb6	qcow2: Store data file name in the image Rather than requiring that the external data file node is passed explicitly when creating the qcow2 node, store the filename in the designated header extension during .bdrv_create and read it from there as a default during .bdrv_open. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	dcc98687f8	qcow2: Creating images with external data file This adds a .bdrv_create option to use an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	0e8c08be27	qcow2: Add basic data-file infrastructure This adds a .bdrv_open option to specify the external data file node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	966b000f49	qcow2: External file I/O This changes the qcow2 implementation to direct all guest data I/O to s->data_file rather than bs->file, while metadata I/O still uses bs->file. At the moment, this is still always the same, but soon we'll add options to set s->data_file to an external data file. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:46 +01:00
Kevin Wolf	37be14036b	qcow2: Prepare qcow2_co_block_status() for data file Offset 0 cannot be assumed to mean an unallocated cluster any more. Instead, the cluster type needs to be checked. *file must refer to the data file instead of the image file if a valid offset is returned from qcow2_co_block_status(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	77e023ff79	qcow2: Return 0/-errno in qcow2_alloc_compressed_cluster_offset() qcow2_alloc_compressed_cluster_offset() used to return the cluster offset for success and 0 for error. This doesn't only conflict with 0 as a valid host offset, but also loses the error code. Similar to the change made to qcow2_alloc_cluster_offset() for uncompressed clusters in commit `148da7ea9d`, make the function return 0/-errno and return the allocated cluster offset in a by-reference parameter. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	93c2493646	qcow2: Basic definitions for external data files This adds basic constants, struct fields and helper function for external data file support to the implementation. QCOW2_INCOMPAT_MASK and QCOW2_AUTOCLEAR_MASK are not updated yet so that opening images with an external data file still fails (we don't handle them correctly yet). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Kevin Wolf	c5e86ebc11	qcow2: Simplify preallocation code Image creation already involves a bdrv_co_truncate() call, which allows to specify a preallocation mode. Just pass the right mode there and remove the code that is made redundant by this. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Alberto Garcia	af39bd0d9a	qcow2: Default to 4KB for the qcow2 cache entry size QEMU 2.12 (commit `1221fe6f63`) introduced a new setting called l2-cache-entry-size that allows making entries on the qcow2 L2 cache smaller than the cluster size. I have been performing several tests with different cluster and entry sizes and all of them show that reducing the entry size (aka L2 slice) consistently improves I/O performance, notably during random I/O (all tests done with sequential I/O show similar results). This is to be expected because loading and evicting an L2 slice is more expensive the larger the slice is. Here are some numbers on fully populated 40GB qcow2 images. The rightmost column represents the maximum L2 cache size in both cases. Cluster size = 64 KB \|-------------+--------------+--------------+--------------\| \| \| 1MB L2 cache \| 3MB L2 cache \| 5MB L2 cache \| \|-------------+--------------+--------------+--------------\| \| 4KB slices \| 6545 IOPS \| 12045 IOPS \| 55680 IOPS \| \| 16KB slices \| 5177 IOPS \| 9798 IOPS \| 56278 IOPS \| \| 64KB slices \| 2718 IOPS \| 5326 IOPS \| 57355 IOPS \| \|-------------+--------------+--------------+--------------\| Cluster size = 256 KB \|--------------+----------------+--------------+-----------------\| \| \| 512KB L2 cache \| 1MB L2 cache \| 1280KB L2 cache \| \|--------------+----------------+--------------+-----------------\| \| 4KB slices \| 8539 IOPS \| 21071 IOPS \| 55417 IOPS \| \| 64KB slices \| 3598 IOPS \| 9772 IOPS \| 57687 IOPS \| \| 256KB slices \| 1415 IOPS \| 4120 IOPS \| 58001 IOPS \| \|--------------+----------------+--------------+-----------------\| As can be seen in the numbers, the only exception to the rule is when the cache is large enough to hold all L2 tables. This is also to be expected because in this case no cache entry is ever evicted so reducing its size doesn't bring any benefit. This patch sets the default L2 cache entry size to 4KB except when the cache is large enough for the whole disk. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-08 12:26:45 +01:00
Peter Maydell	adf2e451f3	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb RxVwH5+smzQ9nsNBfQR0 =gaah -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-02-26 19:04:47 +00:00
Stefan Hajnoczi	61914f8906	qcow2: include LUKS payload overhead in qemu-img measure LUKS encryption reserves clusters for its own payload data. The size of this area must be included in the qemu-img measure calculation so that we arrive at the correct minimum required image size. (Ab)use the qcrypto_block_create() API to determine the payload overhead. We discard the payload data that qcrypto thinks will be written to the image. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190218104525.23674-2-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Max Reitz	2654267cc1	block: Add strong_runtime_opts to BlockDriver This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	998c201923	block: Add BDS.auto_backing_file If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Vladimir Sementsov-Ogievskiy	c793d4ff20	block/qcow2: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-10-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-10-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Andrey Shinkevich	b8968c875f	qcow2: Add list of bitmaps to ImageInfoSpecificQCow2 In the 'Format specific information' section of the 'qemu-img info' command output, the supplemental information about existing QCOW2 bitmaps will be shown, such as a bitmap name, flags and granularity: image: /vz/vmprivate/VM1/harddisk.hdd file format: qcow2 virtual size: 64G (68719476736 bytes) disk size: 3.0M cluster_size: 1048576 Format specific information: compat: 1.1 lazy refcounts: true bitmaps: [0]: flags: [0]: in-use [1]: auto name: back-up1 granularity: 65536 [1]: flags: [0]: in-use [1]: auto name: back-up2 granularity: 65536 refcount bits: 16 corrupt: false Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1549638368-530182-3-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-11 14:35:43 -06:00
Andrey Shinkevich	1bf6e9ca92	bdrv_query_image_info Error parameter added Inform a user in case qcow2_get_specific_info fails to obtain QCOW2 image specific information. This patch is preliminary to the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2". Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-11 14:35:43 -06:00
Kevin Wolf	4720cbeea1	block: Fix hangs in synchronous APIs with iothreads In the block layer, synchronous APIs are often implemented by creating a coroutine that calls the asynchronous coroutine-based implementation and then waiting for completion with BDRV_POLL_WHILE(). For this to work with iothreads (more specifically, when the synchronous API is called in a thread that is not the home thread of the block device, so that the coroutine will run in a different thread), we must make sure to call aio_wait_kick() at the end of the operation. Many places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if the condition has long become false. Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is generally not enough for most other operations because they haven't set the return value in the coroutine entry stub yet. To avoid race conditions there, we need to kick after setting the return value. The race window is small enough that the problem doesn't usually surface in the common path. However, it does surface and causes easily reproducible hangs if the operation can return early before even calling bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op success paths). The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is slightly different: These functions even neglected to schedule the coroutine in the home thread of the node. This avoids the hang, but is obviously wrong, too. Fix those to schedule the coroutine in the right AioContext in addition to adding aio_wait_kick() calls. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-01 13:46:44 +01:00
Vladimir Sementsov-Ogievskiy	e23c9d7a1c	qcow2: do decompression in threads Do decompression in threads, like it is already done for compression. This improves asynchronous compressed reads performance. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy	c3c10f7295	qcow2: aio support for compressed cluster read Allocate buffers locally and release qcow2 lock. Than, reads inside qcow2_co_preadv_compressed may be done in parallel, however all decompression is still done synchronously. Let's improve it in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:41 +01:00
Vladimir Sementsov-Ogievskiy	c068a1cd52	qcow2: use byte-based read in qcow2_decompress_cluster We are gradually moving away from sector-based interfaces, towards byte-based. Get rid of it here too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy	341926ab83	qcow2: refactor decompress_buffer - make it look more like a pair of qcow2_compress - rename the function and its parameters - drop extra out_len variable, check filling of output buffer by strm structure itself - fix code style - add some documentation Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy	f4b3e2a960	qcow2: move decompression from qcow2-cluster.c to qcow2.c Compression is done in threads in qcow2.c. We want to do decompression in the same way, so, firstly, move it to the same file. The only change is braces around if-body in decompress_buffer, to satisfy checkpatch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy	6994fd78b9	qcow2: make more generic interface for qcow2_compress Give explicit size both for source and destination buffers, to make it similar with decompression path and than cleanly reuse parameter structure for decompression threads. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy	19a4448853	qcow2: use Z_OK instead of 0 for deflateInit2 return code check Use appropriate macro, corresponding to deflateInit2 spec. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-12-14 11:52:40 +01:00
Vladimir Sementsov-Ogievskiy	c972fa123c	crypto: support multiple threads accessing one QCryptoBlock The two thing that should be handled are cipher and ivgen. For ivgen the solution is just mutex, as iv calculations should not be long in comparison with encryption/decryption. And for cipher let's just keep per-thread ciphers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2018-12-12 11:16:49 +00:00
Alberto Garcia	6f8f015c0c	qcow2: Get the request alignment for encrypted images from QCryptoBlock This doesn't have any practical effect at the moment because the values of BDRV_SECTOR_SIZE, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE and QCRYPTO_BLOCK_QCOW_SECTOR_SIZE are all the same (512 bytes), but future encryption methods could have different requirements. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-11-05 15:09:54 +01:00
Peter Maydell	3b698f52f9	block/qcow2: Don't take address of fields in packed structs Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this. Avoid the bug by not using the "modify in place" byte swapping functions. There are a few places where the in-place swap function is used on something other than a packed struct field; we convert those anyway, for consistency. This patch was produced with the following spatch script (and hand-editing to fold a few resulting overlength lines): @@ expression E; @@ -be16_to_cpus(&E); +E = be16_to_cpu(E); @@ expression E; @@ -be32_to_cpus(&E); +E = be32_to_cpu(E); @@ expression E; @@ -be64_to_cpus(&E); +E = be64_to_cpu(E); @@ expression E; @@ -cpu_to_be16s(&E); +E = cpu_to_be16(E); @@ expression E; @@ -cpu_to_be32s(&E); +E = cpu_to_be32(E); @@ expression E; @@ -cpu_to_be64s(&E); +E = cpu_to_be64(E); Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: John Snow <jsnow@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-11-05 15:09:54 +01:00
Vladimir Sementsov-Ogievskiy	9c98f145df	dirty-bitmaps: clean-up bitmaps loading and migration logic This patch aims to bring the following behavior: 1. We don't load bitmaps, when started in inactive mode. It's the case of incoming migration. In this case we wait for bitmaps migration through migration channel (if 'dirty-bitmaps' capability is enabled) or for invalidation (to load bitmaps from the image). 2. We don't remove persistent bitmaps on inactivation. Instead, we only remove bitmaps after storing. This is the only way to restore bitmaps, if we decided to resume source after [failed] migration with 'dirty-bitmaps' capability enabled (which means, that bitmaps were not stored). 3. We load bitmaps on open and any invalidation, it's ok for all cases: - normal open - migration target invalidation with dirty-bitmaps capability (bitmaps are migrating through migration channel, the are not stored, so they should have IN_USE flag set and will be skipped when loading. However, it would fail if bitmaps are read-only[1]) - migration target invalidation without dirty-bitmaps capability (normal load of the bitmaps, if migrated with shared storage) - source invalidation with dirty-bitmaps capability (skip because IN_USE) - source invalidation without dirty-bitmaps capability (bitmaps were dropped, reload them) [1]: to accurately handle this, migration of read-only bitmaps is explicitly forbidden in this patch. New mechanism for not storing bitmaps when migrate with dirty-bitmaps capability is introduced: migration filed in BdrvDirtyBitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:17 -04:00
Vladimir Sementsov-Ogievskiy	2ea427efff	bloc/qcow2: drop dirty_bitmaps_loaded state variable This variable doesn't work as it should, because it is actually cleared in qcow2_co_invalidate_cache() by memset(). Drop it, as the following patch will introduce new behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:15 -04:00
Vladimir Sementsov-Ogievskiy	132adb6820	block/qcow2: improve error message in qcow2_inactivate Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [Maintainer edit -- touched up error message. --js] Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:15 -04:00
Markus Armbruster	4b5766488f	error: Fix use of error_prepend() with &error_fatal, &error_abort From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Leonid Bloch	bd016b912c	qcow2: Explicit number replaced by a constant Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Leonid Bloch	e957b50b8d	qcow2: Set the default cache-clean-interval to 10 minutes The default cache-clean-interval is set to 10 minutes, in order to lower the overhead of the qcow2 caches (before the default was 0, i.e. disabled). * For non-Linux platforms the default is kept at 0, because cache-clean-interval is not supported there yet. Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Leonid Bloch	45b4949c7b	qcow2: Resize the cache upon image resizing The caches are now recalculated upon image resizing. This is done because the new default behavior of assigning L2 cache relatively to the image size, implies that the cache will be adapted accordingly after an image resize. Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Leonid Bloch	b749562d98	qcow2: Assign the L2 cache relatively to the image size Sufficient L2 cache can noticeably improve the performance when using large images with frequent I/O. Previously, unless 'cache-size' was specified and was large enough, the L2 cache was set to a certain size without taking the virtual image size into account. Now, the L2 cache assignment is aware of the virtual size of the image, and will cover the entire image, unless the cache size needed for that is larger than a certain maximum. This maximum is set to 1 MB by default (enough to cover an 8 GB image with the default cluster size) but can be increased or decreased using the 'l2-cache-size' option. This option was previously documented as the maximum L2 cache size, and this patch makes it behave as such, instead of as a constant size. Also, the existing option 'cache-size' can limit the sum of both L2 and refcount caches, as previously. Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Leonid Bloch	657ada52ab	qcow2: Avoid duplication in setting the refcount cache size The refcount cache size does not need to be set to its minimum value in read_cache_sizes(), as it is set to at least its minimum value in qcow2_update_options_prepare(). Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Leonid Bloch	b6a95c6d10	qcow2: Make sizes more humanly readable Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-10-01 12:51:12 +02:00
Peter Xu	3ab72385b2	qapi: Drop qapi_event_send_FOO()'s Error argument The generated qapi_event_send_FOO() take an Error argument. They can't actually fail, because all they do with the argument is passing it to functions that can't fail: the QObject output visitor, and the @qmp_emit callback, which is either monitor_qapi_event_queue() or event_test_emit(). Drop the argument, and pass &error_abort to the QObject output visitor and @qmp_emit instead. Suggested-by: Eric Blake <eblake@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815133747.25032-4-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message rewritten, update to qapi-code-gen.txt corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-28 18:21:38 +02:00
Leonid Bloch	308999e9d4	qcow2: A grammar fix in conflicting cache sizing error message Signed-off-by: Leonid Bloch <lbloch@janustech.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-30 15:35:37 +02:00
Peter Maydell	7851f1a706	Block layer patches: - Copy offloading fixes for when the copy increases the image size - Temporary revert of the removal of deprecated -drive options - Fix request serialisation in the image fleecing scenario - Fix copy-on-read crash with unaligned image size - Fix another drain crash -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbRNLQAAoJEH8JsnLIjy/WOaQQALlZk01JohETuwGG6HGl0LdI jEEm+N0J+BlGOVjoGU67OKGidUCl5WvBsQTlyYkmlaToGuk/njWxCa/GA6+iNRnt MDq7Ovr8uZI3D+0Fuc6xg/6NBiLkukgh0Q9gMWkzn3jaNWzO2WcTr8WXwepvP6sj YtPhEQOXTT3sXf/MFY8ig7qRrZ6f7LFOoKu7LMnrD+QWDo8TY3QLZaxP9OUFHH7S A6J0LIfuRZlq79a7SgrRkCR2ddtgYyBQ+zD7PD5kf1vLW4+dOhDOutQEsZCMCPgR ft99kNhrZcJGN6n2r8/oVcvRkw5c4I1JPgakm/GoW/NllfPMebuPospKaS4wiJnB zI4YOtmco4Mfxkw/wK+Ep/bPCpxEF43uDcpPiEjsNADrdLq0eKnPn5ctwSyWlGvn ayQWxDoKoYckn/ccjtLxJ2xPws8433cTXrBdIKnJadWxi3iRNzlIKHRuEfXf9zQt G+Nq7ruysT9TPf9ifuCHcZnTsi3SLYLsjCj7pAgBkazBYE2cCI3eKN8kxsDJi7qv cWzFCpwE28pHRJ6FwtdzBVkNcfTlC/XopR1M66OzYZlLqR/4hbNhyHL3hBV+yfrM fC7mPi81ttI6e+JAgC6K8t3Ey242MjSzUYa7pJUNws7RpqUhfhr6EXXbBceJKsVW F8qKZoiIEK7wDacUiEiE =FXOo -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Copy offloading fixes for when the copy increases the image size - Temporary revert of the removal of deprecated -drive options - Fix request serialisation in the image fleecing scenario - Fix copy-on-read crash with unaligned image size - Fix another drain crash # gpg: Signature made Tue 10 Jul 2018 16:37:52 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block: Use common write req handling in truncate block: Fix bdrv_co_truncate overlap check block: Use common req handling in copy offloading block: Use common req handling for discard block: Fix handling of image enlarging write block: Extract common write req handling block: Use uint64_t for BdrvTrackedRequest byte fields block: Use BdrvChild to discard block: Add copy offloading trace points block: Prefix file driver trace points with "file_" Revert "block: Remove deprecated -drive geometry options" Revert "block: Remove deprecated -drive option addr" Revert "block: Remove deprecated -drive option serial" Revert "block: Remove dead deprecation warning code" block/blklogwrites: Make sure the log sector size is not too small qapi/block-core.json: Add missing documentation for blklogwrites log-append option block/backup: fix fleecing scheme: use serialized writes block: add BDRV_REQ_SERIALISING flag block: split flags in copy_range block/io: fix copy_range ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-10 17:28:29 +01:00
Vladimir Sementsov-Ogievskiy	67b51fb998	block: split flags in copy_range Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:04:25 +02:00
Vladimir Sementsov-Ogievskiy	0e4e4318ea	qcow2: add overlap check for bitmap directory Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180705151515.779173-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	65f1899e9b	qcow2: Drop unreachable break Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-4-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	148546c822	qcow2: Drop unused cluster_data Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-2-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Vladimir Sementsov-Ogievskiy	ceb029cd6f	qcow2: add compress threads Do data compression in separate threads. This significantly improve performance for qemu-img convert with -W (allow async writes) and -c (compressed) options. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy	2714f13d69	qcow2: refactor data compression Make a separate function for compression to be parallelized later. - use .avail_out field instead of .next_out to calculate size of compressed data. It looks more natural and it allows to keep dest to be void pointer - set avail_out to be at least one byte less than input, to be sure avoid inefficient compression earlier Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:29:00 +02:00
Fam Zheng	e06f4639d8	qcow2: Fix src_offset in copy offloading Not updating src_offset will result in wrong data being written to dst image. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	8b24cd1415	qcow2: Free allocated clusters on write error If we managed to allocate the clusters, but then failed to write the data, there's a good chance that we'll still be able to free the clusters again in order to avoid cluster leaks (the refcounts are cached, so even if we can't write them out right now, we may be able to do so when the VM is resumed after a werror=stop/enospc pause). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>	2018-06-29 14:20:56 +02:00
Markus Armbruster	796d323945	block/crypto: Simplify block_crypto_{open,create}_opts_init() block_crypto_open_opts_init() and block_crypto_create_opts_init() contain a virtual visit of QCryptoBlockOptions and QCryptoBlockCreateOptions less member "format", respectively. Change their callers to put member "format" in the QDict, so they can use the generated visitors for these types instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	354d930dc6	qcow2: Remove dead check on !ret In the beginning of the function, we initialize the local variable to 0, and in the body of the function, we check the assigned values and exit the loop immediately. So here it can never be non-zero. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	47e86b868d	qcow2: Remove coroutine trampoline for preallocate_co() All callers are coroutine_fns now, so we can just directly call preallocate_co(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	061ca8a368	block: Convert .bdrv_truncate callback to coroutine_fn bdrv_truncate() is an operation that can block (even for a quite long time, depending on the PreallocMode) in I/O paths that shouldn't block. Convert it to a coroutine_fn so that we have the infrastructure for drivers to make their .bdrv_co_truncate implementation asynchronous. This change could potentially introduce new race conditions because bdrv_truncate() isn't necessarily executed atomically any more. Whether this is a problem needs to be evaluated for each block driver that supports truncate: * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The protocol drivers are trivially safe because they don't actually yield yet, so there is no change in behaviour. * copy-on-read, crypto, raw-format: Essentially just filter drivers that pass the request to a child node, no problem. * qcow2: The implementation modifies metadata, so it needs to hold s->lock to be safe with concurrent I/O requests. In order to avoid double locking, this requires pulling the locking out into preallocate_co() and using qcow2_write_caches() instead of bdrv_flush(). * qed: Does a single header update, this is fine without locking. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	ae5475e82f	qcow2: Fix qcow2_truncate() error return value If qcow2_alloc_clusters_at() returns an error, we do need to negate it to get back the positive errno code for error_setg_errno(), but we still need to return the negative error code. Fixes: `772d1f973f` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Anton Nefedov	29cd0403f1	qapi: remove empty flat union branches and types Flat unions may now have uncovered branches, so it is possible to get rid of empty types defined for that purpose only. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-06-22 16:33:46 +02:00
Markus Armbruster	af91062ee1	block: Factor out qobject_input_visitor_new_flat_confused() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	92adf9dbcd	block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts() The following pattern occurs in the .bdrv_co_create_opts() methods of parallels, qcow, qcow2, qed, vhdx and vpc: qobj = qdict_crumple_for_keyval_qiv(qdict, errp); qobject_unref(qdict); qdict = qobject_to(QDict, qobj); if (qdict == NULL) { ret = -EINVAL; goto done; } v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); [...] ret = 0; done: qobject_unref(qdict); [...] return ret; If qobject_to() fails, we return failure without setting errp. That's wrong. As far as I can tell, it cannot fail here. Clean it up anyway, by removing the useless conversion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	e5af0da1dc	block: Fix -blockdev for certain non-string scalars Configuration flows through the block subsystem in a rather peculiar way. Configuration made with -drive enters it as QemuOpts. Configuration made with -blockdev / blockdev-add enters it as QAPI type BlockdevOptions. The block subsystem uses QDict, QemuOpts and QAPI types internally. The precise flow is next to impossible to explain (I tried for this commit message, but gave up after wasting several hours). What I can explain is a flaw in the BlockDriver interface that leads to this bug: $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234 qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid QMP blockdev-add is broken the same way. Here's what happens. The block layer passes configuration represented as flat QDict (with dotted keys) to BlockDriver methods .bdrv_file_open(). The QDict's members are typed according to the QAPI schema. nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with qdict_crumple() and a qobject input visitor. This visitor comes in two flavors. The plain flavor requires scalars to be typed according to the QAPI schema. That's the case here. The keyval flavor requires string scalars. That's not the case here. nfs_file_open() uses the latter, and promptly falls apart for members @user, @group, @tcp-syn-count, @readahead-size, @page-cache-size, @debug. Switching to the plain flavor would fix -blockdev, but break -drive, because there the scalars arrive in nfs_file_open() as strings. The proper fix would be to replace the QDict by QAPI type BlockdevOptions in the BlockDriver interface. Sadly, that's beyond my reach right now. Next best would be to fix the block layer to always pass correctly typed QDicts to the BlockDriver methods. Also beyond my reach. What I can do is throw another hack onto the pile: have nfs_file_open() convert all members to string, so use of the keyval flavor actually works, by replacing qdict_crumple() by new function qdict_crumple_for_keyval_qiv(). The pattern "pass result of qdict_crumple() to qobject_input_visitor_new_keyval()" occurs several times more: * qemu_rbd_open() Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only string members, its only a latent bug. Fix it anyway. * parallels_co_create_opts(), qcow_co_create_opts(), qcow2_co_create_opts(), bdrv_qed_co_create_opts(), sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts() These work, because they create the QDict with qemu_opts_to_qdict_filtered(), which creates only string scalars. The function sports a TODO comment asking for better typing; that's going to be fun. Use qdict_crumple_for_keyval_qiv() to be safe. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Max Reitz	609f45ea95	block: Add block-specific QDict header There are numerous QDict functions that have been introduced for and are used only by the block layer. Move their declarations into an own header file to reflect that. While qdict_extract_subqdict() is in fact used outside of the block layer (in util/qemu-config.c), it is still a function related very closely to how the block layer works with nested QDicts, namely by sometimes flattening them. Therefore, its declaration is put into this header as well and util/qemu-config.c includes it with a comment stating exactly which function it needs. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20180509165530.29561-7-mreitz@redhat.com> [Copyright note tweaked, superfluous includes dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Max Reitz	ddf3b47ef4	qcow2: Do not mark inactive images corrupt When signaling a corruption on a read-only image, qcow2 already makes fatal events non-fatal (i.e., they will not result in the image being closed, and the image header's corrupt flag will not be set). This is necessary because we cannot set the corrupt flag on read-only images, and it is possible because further corruption of read-only images is impossible. Inactive images are effectively read-only, too, so we should do the same for them. bdrv_is_writable() can tell us whether an image can actually be written to, so use its result instead of !bs->read_only. (Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in bdrv_co_pwritev() will fail, crashing qemu.) Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180606193702.7113-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Max Reitz	d1402b5026	block: Add Error parameter to bdrv_amend_options Looking at the qcow2 code that is riddled with error_report() calls, this is really how it should have been from the start. Along the way, turn the target_version/current_version comparisons at the beginning of qcow2_downgrade() into assertions (the caller has to make sure these conditions are met), and rephrase the error message on using compat=1.1 to get refcount widths other than 16 bits. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509210023.20283-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Peter Maydell	0d514fa234	Pull request * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2) If the underlying storage supports copy offloading, qemu-img convert will use it instead of performing reads and writes. This avoids data transfers and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY and Linux copy_file_range(2) are used to implement this optimization. * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO 6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W 7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP 48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU 1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc 3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU= =vZp+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2) If the underlying storage supports copy offloading, qemu-img convert will use it instead of performing reads and writes. This avoids data transfers and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY and Linux copy_file_range(2) are used to implement this optimization. * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning # gpg: Signature made Mon 04 Jun 2018 12:20:08 BST # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: main-loop: drop spin_counter qemu-img: Convert with copy offloading block-backend: Add blk_co_copy_range iscsi: Implement copy offloading iscsi: Create and use iscsi_co_wait_for_task iscsi: Query and save device designator when opening file-posix: Implement bdrv_co_copy_range qcow2: Implement copy offloading raw: Implement copy offloading raw: Check byte range uniformly block: Introduce API for copy offloading Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 18:34:04 +01:00
Peter Maydell	f67c9b693a	acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM= =3+V0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 10:15:16 +01:00
Fam Zheng	fd9fcd37a8	qcow2: Implement copy offloading The two callbacks are implemented quite similarly to the read/write functions: bdrv_co_copy_range_from maps for read and calls into bs->file or bs->backing depending on the allocation status; bdrv_co_copy_range_to maps for write and calls into bs->file. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-5-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Michael S. Tsirkin	0d8c41dae5	block: use local path for local headers When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2018-05-31 04:16:06 +03:00
Alberto Garcia	7af5eea9b3	qcow2: Fix Coverity warning when calculating the refcount cache size MIN_REFCOUNT_CACHE_SIZE is 4 and the cluster size is guaranteed to be at most 2MB, so the minimum refcount cache size (in bytes) is always going to fit in a 32-bit integer. Coverity doesn't know that, and since we're storing the result in a uint64_t (*refcount_cache_size) it thinks that we need the 64 bits and that we probably want to do a 64-bit multiplication to prevent the result from being truncated. This is a false positive in this case, but it's a fair warning. We could do a 64-bit multiplication to get rid of it, but since we know that a 32-bit variable is enough to store this value let's simply reuse min_refcount_cache, make it a normal int and stop doing casts. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-29 20:09:17 +02:00
Alberto Garcia	52253998ec	qcow2: Give the refcount cache the minimum possible size by default The L2 and refcount caches have default sizes that can be overridden using the l2-cache-size and refcount-cache-size (an additional parameter named cache-size sets the combined size of both caches). Unless forced by one of the aforementioned parameters, QEMU will set the unspecified sizes so that the L2 cache is 4 times larger than the refcount cache. This is based on the premise that the refcount metadata needs to be only a fourth of the L2 metadata to cover the same amount of disk space. This is incorrect for two reasons: a) The amount of disk covered by an L2 table depends solely on the cluster size, but in the case of a refcount block it depends on the cluster size and the width of each refcount entry. The 4/1 ratio is only valid with 16-bit entries (the default). b) When we talk about disk space and L2 tables we are talking about guest space (L2 tables map guest clusters to host clusters), whereas refcount blocks are used for host clusters (including L1/L2 tables and the refcount blocks themselves). On a fully populated (and uncompressed) qcow2 file, image size > virtual size so there are more refcount entries than L2 entries. Problem (a) could be fixed by adjusting the algorithm to take into account the refcount entry width. Problem (b) could be fixed by increasing a bit the refcount cache size to account for the clusters used for qcow2 metadata. However this patch takes a completely different approach and instead of keeping a ratio between both cache sizes it assigns as much as possible to the L2 cache and the remainder to the refcount cache. The reason is that L2 tables are used for every single I/O request from the guest and the effect of increasing the cache is significant and clearly measurable. Refcount blocks are however only used for cluster allocation and internal snapshots and in practice are accessed sequentially in most cases, so the effect of increasing the cache is negligible (even when doing random writes from the guest). So, make the refcount cache as small as possible unless the user explicitly asks for a larger one. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Marc-André Lureau	cb3e7f08ae	qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except I broke a long line, and added a cast in monitor_qmp_cleanup_req_queue_locked(). Unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, thus no need to shout them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, semantic conflict resolved, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Vladimir Sementsov-Ogievskiy	605bc8be42	qcow2: try load bitmaps only once Checking reopen by existence of some bitmaps is wrong, as it may be some other bitmaps, or on the other hand, user may remove bitmaps. This criteria is bad. To simplify things and make behavior more predictable let's just add a flag to remember, that we've already tried to load bitmaps on open and do not want do it again. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180411122606.367301-2-vsementsov@virtuozzo.com [mreitz: Changed comment wording according to Eric Blake's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-04-16 13:35:32 +02:00
Vladimir Sementsov-Ogievskiy	2d949dfcef	qcow2: fix bitmaps loading when bitmaps already exist On reopen with existing bitmaps, instead of loading bitmaps, lets reopen them if needed. This also fixes bitmaps migration through shared storage. Consider the case. Persistent bitmaps are stored on bdrv_inactivate. Then, on destination process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps are already loaded on destination start. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180320170521.32152-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-26 21:17:24 +02:00
Max Reitz	7dc847ebba	qapi: Replace qobject_to_X(o) by qobject_to(X, o) This patch was generated using the following Coccinelle script: @@ expression Obj; @@ ( - qobject_to_qnum(Obj) + qobject_to(QNum, Obj) \| - qobject_to_qstring(Obj) + qobject_to(QString, Obj) \| - qobject_to_qdict(Obj) + qobject_to(QDict, Obj) \| - qobject_to_qlist(Obj) + qobject_to(QList, Obj) \| - qobject_to_qbool(Obj) + qobject_to(QBool, Obj) ) and a bit of manual fix-up for overly long lines and three places in tests/check-qjson.c that Coccinelle did not find. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20180224154033.29559-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: swap order from qobject_to(o, X), rebase to master, also a fix to latent false-positive compiler complaint about hw/i386/acpi-build.c] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-19 14:58:36 -05:00
Kevin Wolf	b0292b851b	block: x-blockdev-create QMP command This adds a synchronous x-blockdev-create QMP command that can create qcow2 images on a given node name. We don't want to block while creating an image, so this is not the final interface in all aspects, but BlockdevCreateOptionsQcow2 and .bdrv_co_create() are what they actually might look like in the end. In any case, this should be good enough to test whether we interpret BlockdevCreateOptions as we should. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	b76b4f6045	qcow2: Use visitor for options in qcow2_create() Instead of manually creating the BlockdevCreateOptions object, use a visitor to parse the given options into the QAPI object. This involves translation from the old command line syntax to the syntax mandated by the QAPI schema. Option names are still checked against qcow2_create_opts, so only the old option names are allowed on the command line, even if they are translated in qcow2_create(). In contrast, new option values are optionally recognised besides the old values: 'compat' accepts 'v2'/'v3' as an alias for '0.10'/'1.1', and 'encrypt.format' accepts 'qcow' as an alias for 'aes' now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	e4b5dad826	qcow2: Handle full/falloc preallocation in qcow2_co_create() Once qcow2_co_create() can be called directly on an already existing node, we must provide the 'full' and 'falloc' preallocation modes outside of creating the image on the protocol layer. Fortunately, we have preallocated truncate now which can provide this functionality. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	60900b7bd9	qcow2: Use QCryptoBlockCreateOptions in qcow2_co_create() Instead of passing the encryption format name and the QemuOpts down, use the QCryptoBlockCreateOptions contained in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	e1d74bc6c6	qcow2: Use BlockdevRef in qcow2_co_create() Instead of passing a separate BlockDriverState* into qcow2_co_create(), make use of the BlockdevRef that is included in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	29ca9e450d	qcow2: Pass BlockdevCreateOptions to qcow2_co_create() All of the simple options are now passed to qcow2_co_create() in a BlockdevCreateOptions object. Still missing: node-name and the encryption options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	cbf2b7c404	qcow2: Let qcow2_create() handle protocol layer Currently, qcow2_create() only parses the QemuOpts and then calls qcow2_co_create() for the actual image creation, which includes both the creation of the actual file on the file system and writing a valid empty qcow2 image into that file. The plan is that qcow2_co_create() becomes the function that implements the functionality for a future 'blockdev-create' QMP command, which only creates the qcow2 layer on an already opened file node. This is a first step towards that goal: Let's move out anything that deals with the protocol layer from qcow2_co_create() into qcow2_create(). This means that qcow2_co_create() doesn't need a file name any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Kevin Wolf	8ca5bfdc46	qcow2: Rename qcow2_co_create2() to qcow2_co_create() The functions originally known as qcow2_create() and qcow2_create2() are now called qcow2_co_create_opts() and qcow2_co_create(), which matches the names of the BlockDriver callbacks that they will implement at the end of this patch series. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-09 15:17:47 +01:00
Alberto Garcia	0cf0e5980b	qcow2: Generalize validate_table_offset() into qcow2_validate_table() This function checks that the offset and size of a table are valid. While the offset checks are fine, the size check is too generic, since it only verifies that the total size in bytes fits in a 64-bit integer. In practice all tables used in qcow2 have much smaller size limits, so the size needs to be checked again for each table using its actual limit. This patch generalizes this function by allowing the caller to specify the maximum size for that table. In addition to that it allows passing an Error variable. The function is also renamed and made public since we're going to use it in other parts of the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	2fd6163884	block: convert bdrv_check callback to coroutine_fn Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	2b148f392b	block: convert bdrv_invalidate_cache callback to coroutine_fn QED's bdrv_invalidate_cache implementation would like to reuse functions that acquire/release the metadata locks. Call it from coroutine context to simplify the logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-6-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	1fafcd9368	qcow2: make qcow2_do_open a coroutine_fn It is called from qcow2_invalidate_cache in coroutine context (incoming migration runs in a coroutine), so it's cleaner if metadata is always loaded from a coroutine. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Paolo Bonzini	8b220eb7c8	qcow2: introduce qcow2_write_caches and qcow2_flush_caches They will be used to avoid recursively taking s->lock during bdrv_open or bdrv_check. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-7-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Peter Maydell	58e2e17dba	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJanYJPAAoJEH8JsnLIjy/WxjUQAJA+DTOmGXvaNpMs65BrU79K /r/iGVrzHv/RMLmrWMnqj96W9SnpMuiAP9hVLNsekqClY9q4ME4DpGcXhWfhSvF5 FC51ehvFJdfo8cPorsevcqNj60iWebjcx3lFfUq2606UOyYih3oijYxr6gSwWbRc GAgdGMqsvGYpzgqAQVEWHUhaX0La49/OzY42aR+E+LCBNfTYvlydvyoc+tUTdIpW 1eM/ASGndGsN0Cf2vxlbKgJ0/P6v+cRZuuIDhKZqre+YG+yM+pq7yZb+o7nf/P36 TPR93BsT7FSVAizRK7VFRuPIynHpiaxYygrJERCXF0sxsV4OlKjpmt/uUPamWFh+ 46Jx2NK1AuAx87BdErgmA119ObO3oAPxK0+2p981obb6SphTbbPxDj6SOlYCt4mJ mhff4JtIiwCmDSckAwd2mkBI1Tvl9qqcELrpyd2t2eU4ec2vf7fPd85EsK/Mq6Kr dbfqFvjNaaMxChoqFgkHAveYJ7zYqRFI2IY5o9c1QyZehCGPWjScxHXZZYdpDl59 YF9DkYQDOyvEX2jmMECaO1r/0nnO+BqQHu5ItJuTte9rjP9Q0do3iBISiIefewtf yji6/QNn2hFrnr1HPAwLFFC3kPgc8Mq8mIUb53j8vG/01KhVRCcnJm2K6D4IUwLZ S6ZnQJB97eE4y7YR5dNt =2axz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h	2018-03-06 11:20:44 +00:00
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Markus Armbruster	0dd13589b0	Include qapi/qmp/qerror.h exactly where needed Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-2-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:14:08 -06:00
Alberto Garcia	9e029689e1	qcow2: Replace align_offset() with ROUND_UP() The align_offset() function is equivalent to the ROUND_UP() macro so there's no need to use the former. The ROUND_UP() name is also a bit more explicit. This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP() because align_offset() already requires that the second parameter is a power of two. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180215131008.5153-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-02 18:39:56 +01:00
Stefan Hajnoczi	c274393a3e	qcow2: make qcow2_co_create2() a coroutine_fn qcow2_create2() calls qemu_co_mutex_lock(). Only a coroutine_fn may call another coroutine_fn. In fact, qcow2_create2 is always called from coroutine context. Rename the function to add the "co" moniker and add coroutine_fn. Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-3-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Stefan Hajnoczi	efc75e2a4c	block: rename .bdrv_create() to .bdrv_co_create_opts() BlockDriver->bdrv_create() has been called from coroutine context since commit `5b7e1542cf` ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Eric Blake	a320fb04b6	qcow2: Switch to .bdrv_co_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Update the qcow2 driver accordingly. For now, we are ignoring the 'want_zero' hint. However, it should be relatively straightforward to honor the hint as a way to return larger *pnum values when we have consecutive clusters with the same data/zero status but which differ only in having non-consecutive mappings. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Alberto Garcia	1221fe6f63	qcow2: Allow configuring the L2 slice size Now that the code is ready to handle L2 slices we can finally add an option to allow configuring their size. An L2 slice is the portion of an L2 table that is read by the qcow2 cache. Until now the cache was always reading full L2 tables, and since the L2 table size is equal to the cluster size this was not very efficient with large clusters. Here's a more detailed explanation of why it makes sense to have smaller cache entries in order to load L2 data: https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html This patch introduces a new command-line option to the qcow2 driver named l2-cache-entry-size (cf. l2-cache-size). The cache entry size has the same restrictions as the cluster size: it must be a power of two and it has the same range of allowed values, with the additional requirement that it must not be larger than the cluster size. The L2 cache entry size (L2 slice size) remains equal to the cluster size for now by default, so this feature must be explicitly enabled. Although my tests show that 4KB slices consistently improve performance and give the best results, let's wait and make more tests with different cluster sizes before deciding on an optimal default. Now that the cache entry size is not necessarily equal to the cluster size we need to reflect that in the MIN_L2_CACHE_SIZE documentation. That minimum value is a requirement of the COW algorithm: we need to read two L2 slices (and not two L2 tables) in order to do COW, see l2_allocate() for the actual code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: c73e5611ff4a9ec5d20de68a6c289553a13d2354.1517840877.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-02-13 17:00:00 +01:00

1 2 3 4 5 ...

614 Commits