mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Kevin Wolf	2758be056b	vmdk: Flush only once in vmdk_L2update() If we have a backup L2 table, we currently flush once after writing to the active L2 table and again after writing to the backup table. A single flush is enough and makes things a little less slow. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-6-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	78cae78dbc	vmdk: Don't update L2 table for zero write on zero cluster If a cluster is already zeroed, we don't have to call vmdk_L2update(), which is rather slow because it flushes the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4823cde58e	vmdk: Fix partial overwrite of zero cluster When overwriting a zero cluster, we must not perform copy-on-write from the backing file, but from a zeroed buffer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-4-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	2821c1cc0f	vmdk: Fix zero cluster allocation m_data must contain valid data even for zero clusters when no cluster was allocated in the image file. Without this, zero writes segfault with images that have zeroed_grain=on. For zero writes, we don't want to allocate a cluster in the image file even in compressed files. Fixes: `524089bce4` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4dc20e6465	vmdk: Rename VmdkMetaData.valid to new_allocation m_data is used for zero clusters even though valid == 0. It really only means that a new cluster was allocated in the image file. Rename it to reflect this. While at it, change it from int to bool, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	a3aeeab557	block: Add blk_new_with_bs() helper There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Kevin Wolf	8c6242b6f3	block-backend: Add flags to blk_truncate() Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7b8e485742	block: Add flags to bdrv(_co)_truncate() Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Maxim Levitsky	b92902dfea	block: pass BlockDriver reference to the .bdrv_co_create This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Philippe Mathieu-Daudé	880a7817c1	misc: Replace zero-length arrays with flexible array member (manual) Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being unadvertenly introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do the similar housekeeping in the QEMU codebase (which uses C99 since commit `7be41675f7`). All these instances of code were found with the help of the following command (then manual analysis, without modifying structures only having a single flexible array member, such QEDTable in block/qed.h): git grep -F '[0];' [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 22:07:42 +01:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Max Reitz	bedb8bb419	vmdk: Reject invalid compressed writes Compressed writes generally have to write full clusters, not just in theory but also in practice when it comes to vmdk's streamOptimized subformat. It currently is just silently broken for writes with non-zero in-cluster offsets: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 4096 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec) read failed: Invalid argument (The technical reason is that vmdk_write_extent() just writes the incomplete compressed data actually to offset 4k. When reading the data, vmdk_read_extent() looks at offset 0 and finds the compressed data size to be 0, because that is what it reads from there. This yields an error.) For incomplete writes with zero in-cluster offsets, the error path when reading the rest of the cluster is a bit different, but the result is the same: $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec) read failed: Invalid argument (Here, vmdk_read_extent() finds the data and then sees that the uncompressed data is short.) It is better to reject invalid writes than to make the user believe they might have succeeded and then fail when trying to read it back. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190815153638.4600-5-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	cdc0dd2586	vmdk: Use bdrv_dirname() for relative extent paths This makes iotest 033 pass with e.g. subformat=monolithicFlat. It also turns a former error in 059 into success. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190815153638.4600-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-03 14:55:35 +02:00
Max Reitz	4dd84ac9a7	vmdk: Make block_status recurse for flat extents Fixes: `69f47505ee` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-3-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-08-19 17:13:26 +02:00
Sam Eiderman	98eb9733f4	vmdk: Add read-only support for seSparse snapshots Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in QEMU). This format was lacking in the following: * Grain directory (L1) and grain table (L2) entries were 32-bit, allowing access to only 2TB (slightly less) of data. * The grain size (default) was 512 bytes - leading to data fragmentation and many grain tables. * For space reclamation purposes, it was necessary to find all the grains which are not pointed to by any grain table - so a reverse mapping of "offset of grain in vmdk" to "grain table" must be constructed - which takes large amounts of CPU/RAM. The format specification can be found in VMware's documentation: https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf In ESXi 6.5, to support snapshot files larger than 2TB, a new format was introduced: SESparse (Space Efficient). This format fixes the above issues: * All entries are now 64-bit. * The grain size (default) is 4KB. * Grain directory and grain tables are now located at the beginning of the file. + seSparse format reserves space for all grain tables. + Grain tables can be addressed using an index. + Grains are located in the end of the file and can also be addressed with an index. - seSparse vmdks of large disks (64TB) have huge preallocated headers - mainly due to L2 tables, even for empty snapshots. * The header contains a reverse mapping ("backmap") of "offset of grain in vmdk" to "grain table" and a bitmap ("free bitmap") which specifies for each grain - whether it is allocated or not. Using these data structures we can implement space reclamation efficiently. * Due to the fact that the header now maintains two mappings: * The regular one (grain directory & grain tables) * A reverse one (backmap and free bitmap) These data structures can lose consistency upon crash and result in a corrupted VMDK. Therefore, a journal is also added to the VMDK and is replayed when the VMware reopens the file after a crash. Since ESXi 6.7 - SESparse is the only snapshot format available. Unfortunately, VMware does not provide documentation regarding the new seSparse format. This commit is based on black-box research of the seSparse format. Various in-guest block operations and their effect on the snapshot file were tested. The only VMware provided source of information (regarding the underlying implementation) was a log file on the ESXi: /var/log/hostd.log Whenever an seSparse snapshot is created - the log is being populated with seSparse records. Relevant log records are of the form: [...] Const Header: [...] constMagic = 0xcafebabe [...] version = 2.1 [...] capacity = 204800 [...] grainSize = 8 [...] grainTableSize = 64 [...] flags = 0 [...] Extents: [...] Header : <1 : 1> [...] JournalHdr : <2 : 2> [...] Journal : <2048 : 2048> [...] GrainDirectory : <4096 : 2048> [...] GrainTables : <6144 : 2048> [...] FreeBitmap : <8192 : 2048> [...] BackMap : <10240 : 2048> [...] Grain : <12288 : 204800> [...] Volatile Header: [...] volatileMagic = 0xcafecafe [...] FreeGTNumber = 0 [...] nextTxnSeqNumber = 0 [...] replayJournal = 0 The sizes that are seen in the log file are in sectors. Extents are of the following format: <offset : size> This commit is a strict implementation which enforces: * magics * version number 2.1 * grain size of 8 sectors (4KB) * grain table size of 64 sectors * zero flags * extent locations Additionally, this commit proivdes only a subset of the functionality offered by seSparse's format: * Read-only * No journal replay * No space reclamation * No unmap support Hence, journal header, journal, free bitmap and backmap extents are unused, only the "classic" (L1 -> L2 -> data) grain access is implemented. However there are several differences in the grain access itself. Grain directory (L1): * Grain directory entries are indexes (not offsets) to grain tables. * Valid grain directory entries have their highest nibble set to 0x1. * Since grain tables are always located in the beginning of the file - the index can fit into 32 bits - so we can use its low part if it's valid. Grain table (L2): * Grain table entries are indexes (not offsets) to grains. * If the highest nibble of the entry is: 0x0: The grain in not allocated. The rest of the bytes are 0. 0x1: The grain is unmapped - guest sees a zero grain. The rest of the bits point to the previously mapped grain, see 0x3 case. 0x2: The grain is zero. 0x3: The grain is allocated - to get the index calculate: ((entry & 0x0fff000000000000) >> 48) \| ((entry & 0x0000ffffffffffff) << 12) * The difference between 0x1 and 0x2 is that 0x1 is an unallocated grain which results from the guest using sg_unmap to unmap the grain - but the grain itself still exists in the grain extent - a space reclamation procedure should delete it. Unmapping a zero grain has no effect (0x2 will not change to 0x1) but unmapping an unallocated grain will (0x0 to 0x1) - naturally. In order to implement seSparse some fields had to be changed to support both 32-bit and 64-bit entry sizes. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
Sam Eiderman	59d6ee4850	vmdk: Reduce the max bound for L1 table size 512M of L1 entries is a very loose bound, only 32M are required to store the maximal supported VMDK file size of 2TB. Fixed qemu-iotest 59# - now failure occures before on impossible L1 table size. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
Sam Eiderman	940a2cd5d2	vmdk: Fix comment regarding max l1_size coverage Commit `b0651b8c24` ("vmdk: Move l1_size check into vmdk_add_extent") extended the l1_size check from VMDK4 to VMDK3 but did not update the default coverage in the moved comment. The previous vmdk4 calculation: (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB The added vmdk3 calculation: (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB Adding the calculation of vmdk3 to the comment. In any case, VMware does not offer virtual disks more than 2TB for vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is not implemented yet in qemu. Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com Reviewed-by: yuchenlin <yuchenlin@synology.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-06-24 15:53:02 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
Sam Eiderman	7502be838e	vmdk: Set vmdk parent backing_format to vmdk Commit `b69864e5a` ("vmdk: Support version=3 in VMDK descriptor files") fixed the probe function to correctly guess vmdk descriptors with version=3. This solves the issue where vmdk snapshot with parent vmdk descriptor containing "version=3" would be treated as raw instead vmdk. In the future case where a new vmdk version is introduced, we will again experience this issue, even if the user will provide "-f vmdk" it will only apply to the tip image and not to the underlying "misprobed" parent image. The code in vmdk.c already assumes that the backing file of vmdk must be vmdk (see vmdk_is_cid_valid which returns 0 if backing file is not vmdk). So let's make it official by supplying the backing_format as vmdk. Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-By: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Sam Eiderman	b69864e5a8	vmdk: Support version=3 in VMDK descriptor files Commit `509d39aa22` added support for read only VMDKs of version 3. This commit fixes the probe function to correctly handle descriptors of version 3. This commit has two effects: 1. We no longer need to supply '-f vmdk' when pointing to descriptor files of version 3 in qemu/qemu-img command line arguments. 2. This fixes the scenario where a VMDK points to a parent version 3 descriptor file which is being probed as "raw" instead of "vmdk". Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00
Peter Maydell	adf2e451f3	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJcc/knAAoJEH8JsnLIjy/WptQP/3F8Lh52H4egXaP7NUUuDjQM AhqhuDAp/EZBS+xim9kLTogNJADe/rMWdSX/YB5aLpSPYbjasC66NgaLhd6QewgQ VIcsLUdlYAyZ5ZjJytimfMTLwm1X02RmVIe55y52DTY8LlfViZzOlf3qwqPm00ao EJB2cl8UJLM+PVEu59cCw3R0/06LY+WIJRB32d3tnCBRTkaJwfR9h4lrp/juVcFZ U+2eWU68KMbUHSYiWANowN+KRV3uPY4HVA98v3F0vDmcBxlVHOeBg6S+PcT7tK8p huzCMwcdwUyPMJgVs/+WBtUnbG0jN6SHUYmFLz859UMVgBnCw5tzBMf8qw1wOA4A Iw+zor27Pxj4IlxcLPp5f97YZ8k9acdMR2VKPH6xLJZ1JF+sKa54RfzESd5EJeIj Mfcp773H0lIaWcFJ6RY1F0L1E1ta7QigwNBiWMdYfh0a0EWHnDvGyYeaSPYEQ+rl e8bZOcfrYwVI7DTDiZOIkGA9D8DXEPDNp+sl6s1DxeY69D0NNaXTtCPqFNNAbFbd 20uD7yDRZlWq32cQB/K9D5cSkZRSOzdUpLfLU3nQU2+dz11x6OpM6m7DVboSrztD 1HtPPDzDEvH5dOP7ibd60s+ntjkSiNfNkUgnuVrBE/d/PocC1eHHpZt5V7f43Ofb RxVwH5+smzQ9nsNBfQR0 =gaah -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qmeu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-02-26 19:04:47 +00:00
yuchenlin	26c9296c31	vmdk: false positive of compat6 with hwversion not set In vmdk_co_create_opts, when it finds hw_version is undefined, it will set it to 4, which misleading the compat6 and hwversion in vmdk_co_do_create. Simply set hw_version to NULL after free, let the logic in vmdk_co_do_create to decide the value of hw_version. This bug can be reproduced by: $ qemu-img convert -O vmdk -o subformat=streamOptimized,compat6 /home/yuchenlin/syno.qcow2 /home/yuchenlin/syno.vmdk qemu-img: /home/yuchenlin/syno.vmdk: error while converting vmdk: compat6 cannot be enabled with hwversion set Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-id: 20190221110805.28239-1-yuchenlin@synology.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:28 +01:00
Max Reitz	abc521a9aa	block: Add BlockDriver.bdrv_gather_child_options Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	645ae7d88e	block: bdrv_get_full_backing_filename_from_...'s ret. val. Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	009b03aaa2	block: Make path_combine() return the path Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:26 +01:00
Max Reitz	998c201923	block: Add BDS.auto_backing_file If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Max Reitz	f30c66ba6e	block: Use bdrv_refresh_filename() to pull Before this patch, bdrv_refresh_filename() is used in a pushing manner: Whenever the BDS graph is modified, the parents of the modified edges are supposed to be updated (recursively upwards). However, that is nonviable, considering that we want child changes not to concern parents. Also, in the long run we want a pull model anyway: Here, we would have a bdrv_filename() function which returns a BDS's filename, freshly constructed. This patch is an intermediate step. It adds bdrv_refresh_filename() calls before every place a BDS.filename value is used. The only exceptions are protocol drivers that use their own filename, which clearly would not profit from refreshing that filename before. Also, bdrv_get_encrypted_filename() is removed along the way (as a user of BDS.filename), since it is completely unused. In turn, all of the calls to bdrv_refresh_filename() before this patch are removed, because we no longer have to call this function on graph changes. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:25 +01:00
Vladimir Sementsov-Ogievskiy	199d95b043	block/vmdk: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-12-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-12-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Andrey Shinkevich	1bf6e9ca92	bdrv_query_image_info Error parameter added Inform a user in case qcow2_get_specific_info fails to obtain QCOW2 image specific information. This patch is preliminary to the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2". Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-02-11 14:35:43 -06:00
Kevin Wolf	4a960ece17	vmdk: Reject excess extents in blockdev-create Clarify that the number of extents provided in BlockdevCreateOptionsVmdk must match the number of extents that will actually be used. Providing more extents will result in an error now. This requires adapting the test case to provide the right number of extents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>	2019-02-01 13:46:44 +01:00
Fam Zheng	3015372dd0	vmdk: Implement .bdrv_co_create callback This makes VMDK support blockdev-create. The implementation reuses the image creation code in vmdk_co_create_opts which now acceptes a callback pointer to "retrieve" BlockBackend pointers from the caller. This way we separate the logic between file/extent acquisition and initialization. The QAPI command parameters are mostly the same as the old create_opts except the dropped legacy @compat6 switch, which is redundant with @hwversion. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-01 13:46:44 +01:00
Fam Zheng	5be28490ca	vmdk: Refactor vmdk_create_extent The extracted vmdk_init_extent takes a BlockBackend object and initializes the format metadata. It is the common part between "qemu-img create" and "blockdev-create". Add a "BlockBackend *pbb" parameter to vmdk_create_extent, to return the opened BB to the caller in the next patch. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-01 13:46:44 +01:00
yuchenlin	51b3c6b73a	vmdk: align end of file to a sector boundary There is a rare case which the size of last compressed cluster is larger than the cluster size, which will cause the file is not aligned at the sector boundary. There are three reasons to do it. First, if vmdk doesn't align at the sector boundary, there may be many undefined behaviors, such as, in vbox it will show VMDK: Compressed image is corrupted 'syno-vm-disk1.vmdk' (VERR_ZIP_CORRUPTED) when we try to import an ova with unaligned vmdk. Second, all the cluster_sector is aligned to sector, the last one should be like this, too. Third, it ease reading with sector based I/Os. Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-Id: <20180913082952.3675-1-yuchenlin@synology.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-09-26 10:47:18 +08:00
Max Reitz	439e89fc09	vmdk: Fix possible segfault with non-VMDK backing VMDK performs a probing check in vmdk_co_create_opts() to prevent the user from assigning non-VMDK files as a backing file, because it only supports VMDK backing files. However, with the @backing runtime option, it is possible to assign arbitrary nodes as backing nodes, regardless of what the image header says. Therefore, VMDK may not just access backing nodes assuming they are VMDK nodes -- which it does, because it needs to compare the backing file's CID with the overlay's parentCID value, and naturally the backing file only has a CID when it's a VMDK file. Instead, it should report the CID of non-VMDK backing files not to match the overlay because clearly a non-present CID does not match. Without this change, vmdk_read_cid() reads from the backing file's bs->file, which may be NULL (in which case we get a segfault). Also, it interprets bs->opaque as a BDRVVmdkState and then reads from the .desc_offset field, which usually will just return some arbitrary value which then results in either garbage to be read, or bdrv_pread() to return an error, both of which result in a non-matching CID to be reported. (In a very unlikely case, we could read something that looks like a VMDK descriptor, and then get a CID which might actually match. But that is highly unlikely, and the only result would be that VMDK accepts the backing file which is not too bad (albeit unintentional).) ((And in theory, the seek to .desc_offset might leak data from another block driver's opaque object. But then again, the user should realize very quickly that a non-VMDK backing file does not work (because the read will very likely fail, due to the reasons given above), so this should not be exploitable.)) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180702210721.4847-2-mreitz@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
yuchenlin	a77672ea3d	vmdk: return ERROR when cluster sector is larger than vmdk limitation VMDK has a hard limitation of extent size, which is due to the size of grain table entry is 32 bits. It means it can only point to a grain located at offset = 2^32. To avoid writing the user data beyond limitation and record a useless offset in grain table. We should return ERROR here. Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-id: 20180322133337.28024-1-yuchenlin@synology.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-26 21:17:24 +02:00
Paolo Bonzini	2fd6163884	block: convert bdrv_check callback to coroutine_fn Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-09 15:17:47 +01:00
Stefan Hajnoczi	efc75e2a4c	block: rename .bdrv_create() to .bdrv_co_create_opts() BlockDriver->bdrv_create() has been called from coroutine context since commit `5b7e1542cf` ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Eric Blake	c72080b9b8	vmdk: Switch to .bdrv_co_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Update the vmdk driver accordingly. Drop the now-unused vmdk_find_index_in_cluster(). Also, fix a pre-existing bug: if find_extent() fails (unlikely, since the block layer did a bounds check), then we must return a failure, rather than 0. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Markus Armbruster	922a01a013	Move include qemu/option.h from qemu-common.h to actual users qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit `bdd6a90a9e` in block/nvme.c resolved]	2018-02-09 13:52:16 +01:00
Max Reitz	23c4b2a896	block/vmdk: Add blkdebug events This is certainly not complete, but it includes at least write_aio and read_aio. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171123020832.8165-5-mreitz@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-01-23 12:34:43 +01:00
Max Reitz	3c363575dc	block/vmdk: Fix , instead of ; at end of line Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171123020832.8165-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-01-23 12:34:42 +01:00
Fam Zheng	0e51b9b7c7	vmdk: Fix error handling/reporting of vmdk_check Errors from the callees must be captured and propagated to our caller, ensure this for both find_extent() and bdrv_getlength(). Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-08-08 15:19:16 +02:00
Peter Maydell	9877860e7b	block/vmdk: Report failures in vmdk_read_cid() The function vmdk_read_cid() can fail if the read on the underlying block device fails, or if there's a format error in the VMDK file. However its API doesn't provide a mechanism to report these errors, and in some cases we were returning a CID of 0 and in some cases a CID of 0xffffffff, either of which might potentially be valid values. Change the function to return 0 on success or a negative errno, and return the CID via a uint32_t* argument. Update the callsites to handle and propagate the error appropriately. This fixes in passing a Coverity-spotted issue (CID 1350038) where we weren't checking the return value from sscanf(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-18 15:14:35 +02:00
Max Reitz	3a691c50f1	block: Add PreallocMode to blk_truncate() blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-07-11 17:45:01 +02:00
Juan Quintela	795c40b8bd	migration: Create migration/blocker.h This allows us to remove lots of includes of migration/migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Max Reitz	ed3d2ec98a	block: Add errp to b{lk,drv}_truncate() For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-28 16:02:02 +02:00
Kevin Wolf	55880601d8	block: Add BDRV_O_RESIZE for blk_new_open() blk_new_open() is a convenience function that processes flags rather than QDict options as a simple way to just open an image file. In order to keep it convenient in the future, it must automatically request the necessary permissions. This can easily be inferred from the flags for read and write, but we need another flag that tells us whether to get the resize permission. We can't just always request it because that means that no block jobs can run on the resulting BlockBackend (which is something that e.g. qemu-img commit wants to do), but we also can't request it never because most of the .bdrv_create() implementations call blk_truncate(). The solution is to introduce another flag that is passed by all users that want to resize the image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	862f215fab	block: Request child permissions in format drivers This makes use of the .bdrv_child_perm() implementation for formats that we just added. All format drivers expose the permissions they actually need nows, so that they can be set accordingly and updated when parents are attached or detached. The only format not included here is raw, which was already converted with the other filter drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	4e4bf5c42c	block: Attach bs->file only during .bdrv_open() The way that attaching bs->file worked was a bit unusual in that it was the only child that would be attached to a node which is not opened yet. Because of this, the block layer couldn't know yet which permissions the driver would eventually need. This patch moves the point where bs->file is attached to the beginning of the individual .bdrv_open() implementations, so drivers already know what they are going to do with the child. This is also more consistent with how driver-specific children work. For a moment, bdrv_open() gets its own BdrvChild to perform image probing, but instead of directly assigning this BdrvChild to the BDS, it becomes a temporary one and the node name is passed as an option to the drivers, so that they can simply use bdrv_open_child() to create another reference for their own use. This duplicated child for (the not opened yet) bs is not the final state, a follow-up patch will change the image probing code to use a BlockBackend, which is completely independent of bs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-24 16:09:23 +01:00
QingFeng Hao	4545d4f4af	block/vmdk: Fix the endian problem of buf_len and lba The problem was triggered by qemu-iotests case 055. It failed when it was comparing the compressed vmdk image with original test.img. The cause is that buf_len in vmdk_write_extent wasn't converted to little-endian before it was stored to disk. But later vmdk_read_extent read it and converted it from little-endian to cpu endian. If the cpu is big-endian like s390, the problem will happen and the data length read by vmdk_read_extent will become invalid! The fix is to add the conversion in vmdk_write_extent, meanwhile, repair the endianness problem of lba field which shall also be converted to little-endian before storing to disk. Cc: qemu-stable@nongnu.org Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20161216052040.53067-2-haoqf@linux.vnet.ibm.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-02-12 00:47:42 +01:00

1 2 3 4 5 ...

277 Commits