mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Alberto Garcia	8e8cb375e0	block: Give always priority to unused entries in the qcow2 L2 cache The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two everytime there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Denis V. Lunev	fa21e6faa6	nbd: fix max_discard/max_transfer_length nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Lieven <pl@kamp.de> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Max Reitz	1ce52846d3	nbd: Improve error messages This patch makes use of the Error object for nbd_receive_negotiate() so that errors during negotiation look nicer. Furthermore, this patch adds an additional error message if the received magic was wrong, but would be correct for the other protocol version, respectively: So if an export name was specified, but the NBD server magic corresponds to an old handshake, this condition is explicitly signaled to the user, and vice versa. As these messages are now part of the "Could not open image" error message, additional filtering has to be employed in iotest 083, which this patch does as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Jeff Cody	e729fa6afe	block: fix off-by-one error in qcow and qcow2 This fixes an off-by-one error introduced in `9a29e18`. Both qcow and qcow2 need to make sure to leave room for string terminator '\0' for the backing file, so the max length of the non-terminated string is either 1023 or PATH_MAX - 1. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Stefan Hajnoczi	0adfa1ed65	qed: check for header size overflow Header size is denoted in clusters. The maximum cluster size is 64 MB but there is no limit on header size. Check for uint32_t overflow in case the header size field has a whacky value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1421065893-18875-2-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	177b75104d	block/dmg: improve zeroes handling Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB is seen in the real world). These blocks (type 2) do not need to be extracted into a temporary buffer, there is no need to allocate memory for these blocks nor to check its length. (For the test image, the maximum uncompressed size is 1054371 bytes, probably for a bzip2-compressed block.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-13-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	6b383c08c4	block/dmg: support bzip2 block entry types This patch adds support for bzip2-compressed block entries as introduced with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image). It was tested against a 5.2G "OS X Yosemite" installation image which stores the BLXX block in the XML property list (instead of resource forks) and has over 5k chunks. New configure entries are added (--enable-bzip2 / --disable-bzip2) to control inclusion of bzip2 functionality (which requires linking against libbz2). The help message suggests that this option is needed for DMG files, but the tests are generic enough that other parts of QEMU can use bzip2 if needed. The identifiers are based on http://newosxbook.com/DMG.html. The decompression routines are based on the zlib case, but as there is no way to reset the decompression state (unlike zlib), memory is allocated and deallocated for every decompression. This should not be problematic as the decompression takes most of the time and as blocks are typically about/over 1 MiB in size, only one allocation is done every 2000 sectors. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1420566495-13284-12-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	a8b10c6ead	block/dmg: factor out block type check In preparation for adding bzip2 support, split the type check into a separate function. Make all offsets relative to the begin of a chunk such that it is easier to recognize the position without having to add up all offsets. Some comments are added to describe the fields. There is no functional change. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-11-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	66ec3bba97	block/dmg: use SectorNumber from BLKX header Previously the sector table parsing relied on the previous offset of the DMG file. Now it uses the sector number from the BLKX header (see http://newosxbook.com/DMG.html). The implementation of dmg2img (from vu1tur) does not base the output sector on the location of the terminator (0xffffffff) either so it should be safe to drop this dependency on the previous state. (It makes somehow makes sense, a terminator should halt further processing of a block and is perhaps used to preallocate some space.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-10-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	c6d34865fa	block/dmg: fix sector data offset calculation This patch addresses two issues: - The data fork offset was not taken into account, resulting in failure to read an InstallESD.dmg file (5164763151 bytes) which had a non-zero DataForkOffset field. - The offset of the previous block ("partition") was unconditionally added to the current block because older files would start the input offset of a new block at zero. Newer files (including vlc-2.1.5.dmg, tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in reads because these files have chunk offsets, relative to the begin of a data fork. Now the data offset of the mish is taken into account. While we could check that the data_offset is within the data fork, let's not do that here as it would only result in parse failures on invalid files (rather than gracefully handling such bad files). dmg_read will error out if the offset is incorrect. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-9-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	8daf425794	block/dmg: set virtual size to a non-zero value Right now the virtual size is always reported as zero which makes it impossible to convert between formats. After this patch, the number of sectors will be read from the trailer ("koly" block). To verify the behavior, the output of `dmg2img foo.dmg foo.img` was compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The tests showed that the file contents are exactly the same, except that QEMU creates a slightly larger file (it matches the total sectors count). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-8-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	0599e56ed4	block/dmg: process XML plists The format is simple enough to avoid using a full-blown XML parser. It assumes that all BLKX items begin with the "mish" magic word, therefore it is not a problem if other values get matched which are not a BLKX block. The offsets are based on the description at http://newosxbook.com/DMG.html For compatibility with glib 2.12, use g_base64_decode (which additionally requires an extra buffer allocation) instead of g_base64_decode_inplace (which is only available since glib 2.20). Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-7-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	f6e6652d7c	block/dmg: validate chunk size to avoid overflow Previously the chunk size was not checked, allowing for a large memory allocation. This patch checks whether the chunks size is within the resource fork length, and whether the resource fork is below the trailer of the dmg file. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-6-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	7aee37b93a	block/dmg: process a buffer instead of reading ints As the decoded plist XML is not a pointer in the file, dmg_read_mish_block must be able to process a buffer instead of a file pointer. Since the full buffer must be processed, let's change the return value again to just a success flag. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-5-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	b0e8dc5d54	block/dmg: extract processing of resource forks Besides the offset, also read the resource length. This length is now used in the extracted function to verify the end of the resource fork against "count" from the resource fork. Instead of relying on the value of offset to conclude whether the resource fork is available or not (info_begin==0), check the rsrc_fork_length instead. This would allow a dmg file to begin with a resource fork. This seemingly unnecessary restriction was found while trying to craft a DMG file by hand. Other changes: - Do not require resource data offset to be 0x100 (but check that it is within bounds though). - Further improve boundary checking (resource data must be within the resource fork). - Use correct value for resource data length (spotted by John Snow) - Consider the resource data offset when determining info_end. This fixes an EINVAL on the tuxpaint dmg example. The resource fork format is documented at https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151 Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-4-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	65a1c7c96a	block/dmg: extract mish block decoding functionality Extract the mish block decoder such that this can be used for other formats in the future. A new DmgHeaderState struct is introduced to share state while decoding. The code is kept unchanged as much as possible, a "fail" label is added for example where a simple return would probably do. In dmg_open, the variable "tmp" is renamed to "rsrc_data_offset" for clarity and comments have been added explaining various data. Note that this patch has one subtle difference with the previous version which should not affect functionality. In the previous code, the end of a resource was inferred from the mish block (the offsets would be increased by the fields). In this patch, the resource length is used instead to avoid the need to rely on the previous offsets. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-3-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Wu	fa8354bd22	block/dmg: properly detect the UDIF trailer DMG files have a variable length with a UDIF trailer at the end of a file. This UDIF trailer is essential as it describes the contents of the image. At the moment however, the start of this trailer is almost always incorrect as bdrv_getlength() returns a multiple of the block size (rounded up). This results in a failure to recognize DMG files, resulting in Invalid argument (EINVAL) errors. As there is no API to retrieve the real file size, look for the magic header in the last two sectors to find the start of this 512-byte UDIF trailer (the "koly" block). The resource fork offset ("info_begin") has its offset adjusted as the initial value of offset does not mean "end of file" anymore, but "begin of UDIF trailer". [Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with error_setg(errp, ...) as discussed with Peter. --Stefan] Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1420566495-13284-2-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Francesco Romani	e2462113b2	block: add event when disk usage exceeds threshold Managing applications, like oVirt (http://www.ovirt.org), make extensive use of thin-provisioned disk images. To let the guest run smoothly and be not unnecessarily paused, oVirt sets a disk usage threshold (so called 'high water mark') based on the occupation of the device, and automatically extends the image once the threshold is reached or exceeded. In order to detect the crossing of the threshold, oVirt has no choice but aggressively polling the QEMU monitor using the query-blockstats command. This lead to unnecessary system load, and is made even worse under scale: deployments with hundreds of VMs are no longer rare. To fix this, this patch adds: * A new monitor command `block-set-write-threshold', to set a mark for a given block device. * A new event `BLOCK_WRITE_THRESHOLD', to report if a block device usage exceeds the threshold. * A new `write_threshold' field into the `BlockDeviceInfo' structure, to report the configured threshold. This will allow the managing application to use smarter and more efficient monitoring, greatly reducing the need of polling. [Updated qemu-iotests 067 output to add the new 'write_threshold' property. --Stefan] [Changed g_assert_false() to !g_assert() to fix the build on older glib versions. --Kevin] Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	454057b7d9	block-backend: expose bs->bl.max_transfer_length Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Peter Lieven	f4564d53c6	block: add accounting for merged requests Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Fam Zheng	35f5a49374	qed: Really remove unused field QEDAIOCB.finished The commit `533ffb17a` that removed qed_aiocb_info.cancel said to remove this but didn't do it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:21 +01:00
Denis V. Lunev	1cdc3239f1	block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported. Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more mature. The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due to the following reasons: - FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes sparse. In order to retain original functionality we must allocate disk space afterwards. This is done using fallocate(0) call - fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing if called above already allocated areas of the file, i.e. the content will not be zeroed This should increase the performance a bit for not-so-modern kernels. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	d50d822219	block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes There is a possibility that we are extending our image and thus writing zeroes beyond the end of the file. In this case we do not need to care about the hole to make sure that there is no data in the file under this offset (pre-condition to fallocate(0) to work). We could simply call fallocate(0). This improves the performance of writing zeroes even on really old platforms which do not have even FALLOC_FL_PUNCH_HOLE. Before the patch do_fallocate was used when either CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined. Now the story is different. CONFIG_FALLOCATE is defined when Linux fallocate is defined, posix_fallocate is completely different story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE thus we are on the safe side. CC: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	b953f07500	block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes This efficiently writes zeroes on Linux if the kernel is capable enough. FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not including file expansion. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	37cc9f7f68	block/raw-posix: refactor handle_aiocb_write_zeroes a bit move code dealing with a block device to a separate function. This will allow to implement additional processing for ordinary files. Please note, that xfs_code has been moved before checking for s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside. This makes code a bit more consistent. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	0b99171230	block/raw-posix: create do_fallocate helper The pattern do { if (fallocate(s->fd, mode, offset, len) == 0) { return 0; } } while (errno == EINTR); ret = translate_err(-errno); will be commonly useful in next patches. Create helper for it. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Denis V. Lunev	1486df0e31	block/raw-posix: create translate_err helper to merge errno values actually the code if (ret == -ENODEV \|\| ret == -ENOSYS \|\| ret == -EOPNOTSUPP \|\| ret == -ENOTTY) { ret = -ENOTSUP; } is present twice and will be added a couple more times. Create helper for this. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Fam Zheng <famz@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:20 +01:00
Jeff Cody	cdf9634bdf	block: vhdx - force FileOffsetMB field to '0' for certain block states The v1.0.0 spec calls out PAYLOAD_BLOCK_ZERO FileOffsetMB field as being 'reserved'. In practice, this means that Hyper-V will fail to read a disk image with PAYLOAD_BLOCK_ZERO block states with a FileOffsetMB value other than 0. The other states that indicate a block that is not there (PAYLOAD_BLOCK_UNDEFINED, PAYLOAD_BLOCK_NOT_PRESENT, PAYLOAD_BLOCK_UNMAPPED) have multiple options for what FileOffsetMB may be set to, and '0' is explicitly called out as an option. For all the above states, we will also just set the FileOffsetMB value to 0. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a9fe92f53f07e6ab1693811e4312c0d1e958500b.1421787566.git.jcody@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-23 12:41:32 -05:00
Jeff Cody	9a29e18f7d	block: update string sizes for filename,backing_file,exact_filename The string field entries 'filename', 'backing_file', and 'exact_filename' in the BlockDriverState struct are defined as 1024 bytes. However, many places that use these values accept a maximum of PATH_MAX bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations. This patch makes the BlockDriverStruct field string sizes match usage. This patch also does a few fixes related to the size that needs to happen now: * the block qapi driver is updated to use PATH_MAX bytes * the qcow and qcow2 drivers have an additional safety check * the block vvfat driver is updated to use PATH_MAX bytes for the size of backing_file, for systems where PATH_MAX is < 1024 bytes. * qemu-img uses PATH_MAX rather than 1024. These instances were not changed to be dynamically allocated, however, as the extra temporary 3K in stack usage for qemu-img does not seem worrisome. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	1d33936ea8	block: mirror - change string allocation to 2-bytes The backing_filename string in mirror_run() is only used to check for a NULL string, so we don't need to allocate 1024 bytes (or, later, PATH_MAX bytes), when we only need to copy the first 2 characters. We technically only need 1 byte, as we are just checking for NULL, but since backing_filename[] is populated by bdrv_get_backing_filename(), a string size of 1 will always only return '\0'; Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	564d64bdde	block: qapi - move string allocation from stack to the heap Rather than declaring 'backing_filename2' on the stack in bdrv_query_image_info(), dynamically allocate it on the heap. Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:06 +01:00
Jeff Cody	fe2065629a	block: vmdk - move string allocations from stack to the heap Functions 'vmdk_parse_extents' and 'vmdk_create' allocate several PATH_MAX sized arrays on the stack. Make these dynamically allocated. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Jeff Cody	395a22fae0	block: vmdk - make ret variable usage clear Keep the variable 'ret' something that is returned by the function it is defined in. For the return value of 'sscanf', use a more meaningful variable name. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Max Reitz	8dd93d9339	qcow2: Add two more unalignment checks This adds checks for unaligned L2 table offsets and unaligned data cluster offsets (actually the preallocated offsets for zero clusters) to the zero cluster expansion function. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-23 18:17:05 +01:00
Paolo Bonzini	66552b894b	coroutine: drop qemu_coroutine_adjust_pool_size This is not needed anymore. The new TLS-based algorithm is adaptive. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +00:00
Fam Zheng	c29c1dd312	qmp: Add command 'blockdev-backup' Similar to drive-backup, but this command uses a device id as target instead of creating/opening an image file. Also add blocker on target bs, since the target is also a named device now. Add check and report error for bs == target which became possible but is an illegal case with introduction of blockdev-backup. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1418899027-8445-3-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Vladimir Sementsov-Ogievskiy	c4237dfa63	block: fix spoiling all dirty bitmaps by mirror and migration Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	1085daf941	block/vmdk: Relative backing file for creation When a vmdk image is created with a backing file, it is opened to check whether it is indeed a vmdk file by letting qemu probe it. When doing so, the backing filename is relative to the image's base directory so it should be interpreted accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Max Reitz	9f07429e88	block: JSON filenames and relative backing files When using a relative backing file name, qemu needs to know the directory of the top image file. For JSON filenames, such a directory cannot be easily determined (e.g. how do you determine the directory of a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow relative filenames for the backing file of BDSs only having a JSON filename. Furthermore, BDS::exact_filename should be used whenever possible. If BDS::filename is not equal to BDS::exact_filename, the former will always be a JSON object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-01-13 11:47:56 +00:00
Peter Wu	debfb917a4	block/iscsi: fix uninitialized variable 'ret' was never initialized in the success path. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-03 09:22:13 +01:00
Paolo Bonzini	82595da8de	linux-aio: simplify removal of completed iocbs from the list There is no need to do another O(n) pass on the list; the iocb to split the list at is already available through the array we passed to io_submit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-6-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Paolo Bonzini	de35464461	linux-aio: drop return code from laio_io_unplug and ioq_submit These are unused. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-5-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Paolo Bonzini	8455ce053a	linux-aio: rename LaioQueue idx field to "n" It does not identify an index in an array anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-4-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Paolo Bonzini	43f2376e09	linux-aio: track whether the queue is blocked Avoid that unplug submits requests when io_submit reported that it couldn't accept more; at the same time, try more io_submit calls if it could handle the whole set of requests that were passed, so that the "blocked" flag is reset as soon as possible. After the previous patch, laio_submit already tried to avoid submitting requests to a blocked queue, by comparing s->io_q.idx with "==" instead of the more natural ">=". Switch to the simpler expression now that we have the "blocked" flag. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-3-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Paolo Bonzini	28b240877b	linux-aio: queue requests that cannot be submitted Keep a queue of requests that were not submitted; pass them to the kernel when a completion is reported, unless the queue is plugged. The array of iocbs is rebuilt every time from scratch. This avoids keeping the iocbs array and list synchronized. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1418305950-30924-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:57:55 +00:00
Jeff Cody	85b712c9d5	block: vhdx - set .bdrv_has_zero_init to bdrv_has_zero_init_1 Now that new VHDX images will default to BAT block states of PAYLOAD_BLOCK_ZERO, we can indicate that VHDX has zero init. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 5e582703e36450b9ca939e2e5c9fa3930030f7fe.1418018421.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 16:35:35 +00:00
Jeff Cody	30af51ce7f	block: vhdx - change .vhdx_create default block state to ZERO The VHDX spec specifies that the default new block state is PAYLOAD_BLOCK_NOT_PRESENT for a dynamic VHDX image, and PAYLOAD_BLOCK_FULLY_PRESENT for a fixed VHDX image. However, in order to create space-efficient VHDX images with qemu-img convert, it is desirable to be able to set has_zero_init to true for VHDX. There is currently an option when creating VHDX images, to use block state ZERO for new blocks. However, this currently defaults to 'off'. In order to be able to eventually set has_zero_init to true for VHDX, this needs to default to 'on'. This patch changes the default to 'on', and provides some help information to warn against setting it to 'off' when using qemu-img convert. [Max Reitz pointed out that a full stop was missing at the end of the VHDX_BLOCK_OPT_ZERO option help text. I have added it. --Stefan] Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 85164899eacc86e150c3ceba793cf93b398dedd7.1418018421.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 15:42:49 +00:00
Jeff Cody	a9d1e9daa5	block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec The 0.95 VHDX spec defined PAYLOAD_BLOCK_UNMAPPED to be 5. The 1.00 VHDX spec redefines PAYLOAD_BLOCK_UNMAPPED to be 3 instead. The original value of 5 is now an undefined state in the spec, but it should be safe to treat it the same and return zeros for data read. This way, we can maintain compatibility with any images out in the wild that may have been created in accordance to the 0.95 spec. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 8a4d2da73a8dbc04cde62bea782fc09ff84b1cf1.1418018421.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 15:42:22 +00:00
Jeff Cody	0571df44a1	block: vhdx - remove redundant comments Minor cleanup. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: e8718ae3fd3e40a527e46a00e394973fbaab4d53.1418018421.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 15:42:22 +00:00
Gonglei	9281dbe653	block/rbd: fix memory leak Variable local_err going out of scope leaks the storage it points to. Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Amos Kong <akong@redhat.com> Message-id: 1417674851-6248-1-git-send-email-arei.gonglei@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 13:16:56 +00:00
Max Reitz	5c98415b2a	vmdk: Fix error for JSON descriptor file names If vmdk blindly tries to use path_combine() using bs->file->filename as the base file name, this will result in a bad error message for JSON file names when calling bdrv_open(). It is better to only try bs->file->exact_filename; if that is empty, bs->file->filename will be useless for path_combine() and an error should be emitted (containing bs->file->filename because desc_file_path (which is bs->file->exact_filename) is empty). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-12-12 13:14:10 +00:00
Fam Zheng	d899d2e248	vmdk: Set errp on failures in vmdk_open_vmdk4 Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1417649314-13704-7-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Fam Zheng	9aeecbbc62	vmdk: Remove unnecessary initialization It will be assigned to the return value of vmdk_read_desc. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1417649314-13704-6-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Fam Zheng	03c3359dfc	vmdk: Check descriptor file length when reading it Since a too small file cannot be a valid VMDK image, and also since the buffer's first 4 bytes will be unconditionally examined by vmdk_open_sparse, let's error out the small file case to be clear. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Message-id: 1417649314-13704-5-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Fam Zheng	73b7bcad43	vmdk: Clean up descriptor file reading Zeroing a buffer that will be filled right after is not necessary, and allocating a power of two + 1 is naughty. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1417649314-13704-4-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Fam Zheng	8a3e0bc370	vmdk: Fix comment to match code of extent lines commit `04d542c8b` (vmdk: support vmfs files) added support of VMFS extent type but the comment above the changed code is left out. Update the comment so they are consistent. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Message-id: 1417649314-13704-3-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Fam Zheng	e5dc64b8ff	vmdk: Use g_random_int to generate CID This replaces two "time(NULL)" invocations with "g_random_int()". According to VMDK spec, CID "is a random 32‐bit value updated the first time the content of the virtual disk is modified after the virtual disk is opened". Using "seconds since epoch" is just a "lame way" to generate it, and not completely safe because of the low precision. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Don Koch <dkoch@verizon.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1417649314-13704-2-git-send-email-famz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Jeff Cody	625fa9fe6f	block: remove BLOCK_OPT_NOCOW from vpc_create_opts In commit `fef6070`, the need for NOCOW was removed from the vpc driver, as we removed the the posix calls. However, the BLOCK_OPT_NOCOW was not removed from vpc_create_opts. This was a mistake - remove the opt from there as well. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Message-id: 8ba076fa725fed681cde7d8afc4fb239ae06a9c6.1417620301.git.jcody@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:21 +01:00
Jeff Cody	0d0d7f47b4	block: remove BLOCK_OPT_NOCOW from vdi_create_opts In commit `7074786`, the need for NOCOW was removed from the vdi driver, as we removed the the posix calls. However, the BLOCK_OPT_NOCOW was not removed from vdi_create_opts. This was a mistake - remove the opt from there as well. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Message-id: e189364de11929d8fa04722f5d845de0a9834d44.1417620301.git.jcody@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	01212d4ed6	block/raw-posix: Fix ret in raw_open_common() The return value must be negative on error; there is one place in raw_open_common() where errp is set, but ret remains 0. Fix it. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	6a69b9620a	qcow2: Respect bdrv_truncate() error bdrv_truncate() may fail and qcow2_write_compressed() should return the error code in that case. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	3b5e14c76a	qcow2: Flushing the caches in qcow2_close may fail qcow2_cache_flush() may fail; if one of the caches failed to be flushed successfully to disk in qcow2_close() the image should not be marked clean, and we should emit a warning. This breaks the (qcow2-specific) iotests 026, 071 and 089; change their output accordingly. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	11c89769dc	qcow2: Prevent numerical overflow In qcow2_alloc_cluster_offset(), num is limited to INT_MAX >> BDRV_SECTOR_BITS by all callers. However, since remaining is of type uint64_t, we might as well cast num to that type before performing the shift. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:20 +01:00
Max Reitz	fd752801ae	block/nfs: Add create_opts The nfs protocol driver is capable of creating images, but did not specify any creation options. Fix it. A way to test this issue is the following: $ qemu-img create -f nfs nfs://127.0.0.1/foo.qcow2 64M Without this patch, it segfaults. With this patch, it does not. However, this is not something that should really work; qemu-img should check whether the parameter for the -f option (and -O for convert) is indeed a format, and error out if it is not. Therefore, I am not making it an iotest. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Max Reitz	1bcb15cf77	block/vvfat: qcow driver may not be found Although virtually impossible right now, bdrv_find_format("qcow") may fail. The vvfat block driver should heed that case. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Max Reitz	ef8104378c	block: Omit bdrv_find_format for essential drivers We can always assume raw, file and qcow2 being available; so do not use bdrv_find_format() to locate their BlockDriver objects but statically reference the respective objects. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Max Reitz	5f535a941e	block: Make essential BlockDriver objects public There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:19 +01:00
Paolo Bonzini	a56ebc6ba4	block: do not use get_clock() Use the external qemu-timer API instead. No one else should be calling cpu_get_clock(), get_clock() and get_clock_realtime() directly; they are internal functions and they should be confined to qemu-timer.c and cpus.c (where the icount implementation resides). All accesses should go through qemu_clock_get_ns. Cc: kwolf@redhat.com Cc: stefanha@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1417010463-3527-2-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	2ebafc854d	qcow2: Fix header extension size check After reading the extension header, offset is incremented, but not checked against end_offset any more. This way an integer overflow could happen when checking whether the extension end is within the allowed range, effectively disabling the check. This patch adds the missing check and a test case for it. Cc: qemu-stable@nongnu.org Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416935562-7760-2-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Kevin Wolf	38f3ef574b	raw: Prohibit dangerous writes for probed images If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit `1e72d3b` (April 2008) provided -drive parameter format to let users disable probing. Commit `f965509` (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit `79368c8` (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit `8b33d9e` (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:13 +01:00
Max Reitz	2c28b21f7c	block: Add blk_add_close_notifier() for BB Adding something like a "delete notifier" to a BlockBackend would not make much sense, because whoever is interested in registering there will probably hold a reference to that BlockBackend; therefore, the notifier will never be called (or only when the notifiee already relinquished its reference and thus most probably is no longer interested in that notification). Therefore, this patch just passes through the close notifier interface of the root BDS. This will be called when the device is ejected, for instance, and therefore does make sense. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1416309679-333-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Max Reitz	2019ba0a01	block: Add AioContextNotifier functions to BB Because all BlockDriverStates behind a single BlockBackend reside in a single AioContext, it is fine to just pass these functions (blk_add_aio_context_notifier() and blk_remove_aio_context_notifier()) through to the root BlockDriverState. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1416309679-333-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Max Reitz	2bb0dce762	block: Lift more functions into BlockBackend There are already some blk_aio_* functions, so we might as well have blk_co_* functions (as far as we need them). This patch adds blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which is not a blk_co_* function but is needed nonetheless). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1416309679-333-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:12 +01:00
Max Reitz	8779441b1b	blkdebug: Simplify and improve filename generation Instead of actually recreating the options from scratch, just reuse the options given for creating the BDS, which are the configuration file name and additional options. In case there are no additional options we can thus create a plain filename. This obviously results in a different output for qemu-iotest 099 which exactly tests this filename generation. Fix it up as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1415697825-26678-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:31:11 +01:00
Kevin Wolf	9e193c5a65	block/qapi: Add cache information to query-block Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2014-12-10 10:31:09 +01:00
Fam Zheng	f71eaa74c0	qmp: Add optional switch "query-nodes" in query-blockstats This bool option will allow query all the node names. It iterates all the BDSes that are assigned a name, also in this case don't query up the backing chain. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Fam Zheng	4875a77950	block: Include "node-name" if present in query-blockstats Node name is a better identifier of BDS. We will want to query statistics of a BDS node buried in the BDS graph, so reporting the node's name if there is one will do the trick. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-12-10 10:25:29 +01:00
Kevin Wolf	24bf10dac3	Revert "qemu-img info: show nocow info" This reverts commit `000c4dfff4`. The main reason for reverting this commit before the 2.2 release is that it adds a QAPI interface that we don't want to keep: The 'nocow' flag doesn't generally make sense for block nodes, but only for the raw-posix driver. It should therefore be part of ImageInfoSpecific rather than ImageInfo. The commit contains more problems, but unlike the API stability issue they wouldn't justify reverting it. Conflicts: block/qapi.c Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-24 13:52:10 +01:00
Max Reitz	098ffa6674	block/raw-posix: Catch fsync() errors fsync() may fail, and that case should be handled. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	731de38052	block/raw-posix: Only sync after successful preallocation The loop which filled the file with zeroes may have been left early due to an error. In that case, the fsync() should be skipped. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:09:00 +01:00
Max Reitz	39411cf3c3	block/raw-posix: Fix preallocating write() loop write() may write less bytes than requested; in this case, the number of bytes written is returned. This is the byte count we should be subtracting from the number of bytes still to be written, and not the byte count we requested to write. Reported-by: László Érsek <lersek@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-11-18 12:08:59 +01:00
Markus Armbruster	d1f06fe665	raw-posix: The SEEK_HOLE code is flawed, rewrite it On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris, but not Linux), try_seek_hole() reports trailing data instead. Additionally, unlikely lseek() failures are treated badly: * When SEEK_HOLE fails, try_seek_hole() reports trailing data. For -ENXIO, there's in fact a trailing hole. Can happen only when something truncated the file since we opened it. * When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds, then try_seek_hole() reports a trailing hole. This is okay only when SEEK_DATA failed with -ENXIO (which means the non-trailing hole found by SEEK_HOLE has since become trailing somehow). For other failures (unlikely), it's wrong. * When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely), then try_seek_hole() reports bogus data [-1,start), which its caller raw_co_get_block_status() turns into zero sectors of data. Could theoretically lead to infinite loops in code that attempts to scan data vs. hole forward. Rewrite from scratch, with very careful comments. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:45:48 +01:00
Markus Armbruster	c4875e5b22	raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Commit `5500316` (May 2012) implemented raw_co_is_allocated() as follows: 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek() 3. Else pretend there are no holes Later on, raw_co_is_allocated() was generalized to raw_co_get_block_status(). Commit `4f11aa8` (May 2014) changed it to try the three methods in order until success, because "there may be implementations which support [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice versa." Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC. Commit `38c4d0a` (Sep 2014) added it. Because that's a significant speed hit, the next commit `7c159037` put SEEK_HOLE/SEEK_DATA first. As you see, the obvious use of FIEMAP is wrong, and the correct use is slow. I guess this puts it somewhere between -7 "The obvious use is wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard to Misuse scale[]. "Fortunately", the FIEMAP code is used only when SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is Uncommon. SEEK_HOLE had no XFS implementation between 2011 (when it was introduced for ext4 and btrfs) and 2012. * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails Unlikely. Thus, the FIEMAP code executes rarely. Makes it a nice hidey-hole for bugs. Worse, bugs hiding there can theoretically bite even on a host that has SEEK_HOLE/SEEK_DATA. I don't want to worry about this crap, not even theoretically. Get rid of it. [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:45:35 +01:00
Markus Armbruster	be2ebc6dad	raw-posix: Fix comment for raw_co_get_block_status() Missed in commit `705be72`. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-11-18 09:44:02 +01:00
Fam Zheng	5f58330790	vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info When extent types don't match, we return -ENOTSUP. In this case, be polite to the caller and don't modify bdi. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1415938161-16217-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-14 09:20:45 +00:00
Max Reitz	d20418ee51	block/vdi: Limit maximum size even futher The block layer read and write functions do not like requests which are bigger than INT_MAX bytes. Since the VDI bmap is read and written in a single operation, its size is therefore limited accordingly. This reduces the maximum VDI image size supported by QEMU to half of what it currently is (down to approximately 512 TB). The VDI test 084 has to be adapted accordingly. Actually, one could clearly see that it was broken from the "Could not open 'TEST_DIR/t.IMGFMT': Invalid argument" line for an image which was supposed to work just fine. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de>	2014-11-09 23:39:50 +01:00
Peter Maydell	9a33c0c851	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUV2wdAAoJEJykq7OBq3PIjcAH/29rl938ETw1wjXxYe3uH+R6 K2yFEiPh9/cOJSH0mJ+gD8DZIN+iyR4eoQGP2s5ALFPcX3bkYxRLlUeYK0BCp883 esc7gO6XPhLvTVqP0xgACRCdUwH2I0VTToDlHjXXZogyI/DuDX3gzWJufE3x1DGs WNTMOp5n/uYkWH3rI3DkInmbSddEz3pgX65a8BuYtw0V/RSeSRnHKDYHMygvJBRL EVfWRNeOIrZ730CyJry0t8ITjsZxiBDKXR5glNSwaIfQUfGkTSWi9YNSurNYkUDr aMS2rgvOVlrOUDKTHUj9oS3jgoGWcDtlk9E1MeSoyIptbRoMhdFVl1AUJZsrMJU= =Mfbu -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Mon 03 Nov 2014 11:50:53 GMT using RSA key ID 81AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" * remotes/stefanha/tags/block-pull-request: (53 commits) block: declare blockjobs and dataplane friends! block: let commit blockjob run in BDS AioContext block: let mirror blockjob run in BDS AioContext block: let stream blockjob run in BDS AioContext block: let backup blockjob run in BDS AioContext block: add bdrv_drain() blockjob: add block_job_defer_to_main_loop() blockdev: add note that block_job_cb() must be thread-safe blockdev: acquire AioContext in blockdev_mark_auto_del() blockdev: acquire AioContext in do_qmp_query_block_jobs_one() block: acquire AioContext in generic blockjob QMP commands iotests: Expand test 061 block/qcow2: Simplify shared L2 handling in amend block/qcow2: Make get_refcount() global block/qcow2: Implement status CB for amend qemu-img: Fix insignificant memleak qemu-img: Add progress output for amend block: Add status callback to bdrv_amend_options() block: qemu-iotest 107 supports NFS iotests: Add test for qcow2's bdrv_make_empty ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-11-03 18:34:09 +00:00
Peter Maydell	7135781f65	trivial patches for 2014-11-02 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJUVhuDAAoJEL7lnXSkw9fbOKMIAIE3XZMhar4Vmokb/K0DFbnh gy2z7iCe7vumLKiRSJX1LGmkFO3dwykw82JZQ1SVo0RdgguJ5dx1Abx1qDM1rojL jJT0pJ9zWPl4fTv38wCEfaysQHPdgwoH4826ga+MXnVS9XHRHHxuQ4vI01AK3oyQ 4t6/wto9H8kF3n6ny7tz5WNZClsq7qbiIqw5nNCILQfSh/VBPwxQNBiWf/nYVMuY Ubk5noztZwH+hbiAQL5lAPz/HolcRwg1tzbR0dfmt8/aqO28rJhasG58JgtziI2y JSg4BwldqUQEgiHonArLfQDixjLtEEyL+fQSzZm02ixwcBpc/ADSyGDy2R1zpH8= =j1ga -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-02' into staging trivial patches for 2014-11-02 # gpg: Signature made Sun 02 Nov 2014 11:54:43 GMT using RSA key ID A4C3D7DB # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" * remotes/mjt/tags/pull-trivial-patches-2014-11-02: (23 commits) vdi: wrapped uuid_unparse() in #ifdef tap: fix possible fd leak in net_init_tap tap: do not close(fd) in net_init_tap_one target-i386: Remove unused model_features_t struct tap_int.h: remove repeating NETWORK_SCRIPT defines os-posix: reorder parent notification for -daemonize pidfile: stop making pidfile error a special case os-posix: replace goto again with a proper loop os-posix: use global daemon_pipe instead of cryptic fds[1] dump: Fix dump-guest-memory termination and use-after-close virtio-9p-proxy: improve error messages in connect_namedsocket() virtio-9p-proxy: fix error return in proxy_init() virtio-9p-proxy: Fix sockfd leak target-tricore: check return value before using it net/slirp: specify logbase for smbd Revert "os-posix: report error message when lock file failed" util: Improve os_mem_prealloc error message sparse: fix build target-arm: A64: remove redundant store target-xtensa: mark XtensaConfig structs as unused ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2014-11-03 14:55:17 +00:00
Stefan Hajnoczi	9e85cd5ce0	block: let commit blockjob run in BDS AioContext The commit block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. One detail here is that the bdrv_drain_all() must be moved inside the aio_context_acquire() region so requests cannot sneak in between the drain and acquire. The completion code in block/commit.c must perform backing chain manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-11-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	5a7e7a0bad	block: let mirror blockjob run in BDS AioContext The mirror block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. Note that to_replace is treated separately from other BlockDriverStates in that it does not need to be in the same AioContext. Explicitly acquire/release to_replace's AioContext when accessing it. The completion code in block/mirror.c must perform BDS graph manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. The bdrv_drain_all() call is not allowed outside the main loop since it could lead to lock ordering problems. Use bdrv_drain(bs) instead because we have acquired the AioContext so nothing else can sneak in I/O. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	f3e69beb94	block: let stream blockjob run in BDS AioContext The stream block job must run in the BlockDriverState AioContext so that it works with dataplane. The basics of acquiring the AioContext are easy in blockdev.c. The tricky part is the completion code which drops part of the backing file chain. This must be done in the main loop where bdrv_unref() and bdrv_close() are safe to call. Use block_job_defer_to_main_loop() to achieve that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-9-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Stefan Hajnoczi	761731b180	block: let backup blockjob run in BDS AioContext The backup block job must run in the BlockDriverState AioContext so that it works with dataplane. The basics of acquiring the AioContext are easy in blockdev.c. The completion code in block/backup.c must call bdrv_unref() from the main loop. Use block_job_defer_to_main_loop() to achieve that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-8-git-send-email-stefanha@redhat.com	2014-11-03 11:41:49 +00:00
Max Reitz	ecf58777c5	block/qcow2: Simplify shared L2 handling in amend Currently, we have a bitmap for keeping track of which clusters have been created during the zero cluster expansion process. This was necessary because we need to properly increase the refcount for shared L2 tables. However, now we can simply take the L2 refcount and use it for the cluster allocated for expansion. This will be the correct refcount and therefore we don't have to remember that cluster having been allocated any more. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Message-id: 1414404776-4919-7-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:49 +00:00
Max Reitz	44751917db	block/qcow2: Make get_refcount() global Reading the refcount of a cluster is an operation which can be useful in all of the qcow2 code, so make that function globally available. While touching this function, amend the comment describing the "addend" parameter: It is (no longer, if it ever was) necessary to have it set to -1 or 1; any value is fine. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Message-id: 1414404776-4919-6-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:49 +00:00
Max Reitz	4057a2b24a	block/qcow2: Implement status CB for amend The only really time-consuming operation potentially performed by qcow2_amend_options() is zero cluster expansion when downgrading qcow2 images from compat=1.1 to compat=0.10, so report status of that operation and that operation only through the status CB. For this, approximate the progress as the number of L1 entries visited during the operation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Message-id: 1414404776-4919-5-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:49 +00:00
Max Reitz	7748543420	block: Add status callback to bdrv_amend_options() Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	d4a3238af5	qemu-img: Implement commit like QMP qemu-img should use QMP commands whenever possible in order to ensure feature completeness of both online and offline image operations. As qemu-img itself has no access to QMP (since this would basically require just everything being linked into qemu-img), imitate QMP's implementation of block-commit by using commit_active_start() and then waiting for the block job to finish. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1414159063-25977-9-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	b21c76529d	block/mirror: Improve progress report Instead of taking the total length of the block device as the block job's length, use the number of dirty sectors. The progress is now the number of sectors mirrored to the target block device. Note that this may result in the job's length increasing during operation, which is however in fact desirable. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-8-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	94054183da	qcow2: Optimize bdrv_make_empty() bdrv_make_empty() is currently only called if the current image represents an external snapshot that has been committed to its base image; it is therefore unlikely to have internal snapshots. In this case, bdrv_make_empty() can be greatly sped up by emptying the L1 and refcount table (while having the dirty flag set, which only works for compat=1.1) and creating a trivial refcount structure. If there are snapshots or for compat=0.10, fall back to the simple implementation (discard all clusters). [Applied s/clusters/cluster/ typo fix suggested by Eric Blake --Stefan] Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	491d27e2af	qcow2: Implement bdrv_make_empty() Implement this function by making all clusters in the image file fall through to the backing file (by using the recently extended discard). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:48 +00:00
Max Reitz	808c4b6f30	qcow2: Allow "full" discard Normally, discarded sectors should read back as zero. However, there are cases in which a sector (or rather cluster) should be discarded as if they were never written in the first place, that is, reading them should fall through to the backing file again. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Max Reitz	d7f62751a1	raw-posix: raw_co_get_block_status() return value Instead of generating the full return value thrice in try_fiemap(), try_seek_hole() and as a fall-back in raw_co_get_block_status() itself, generate the value only in raw_co_get_block_status(). While at it, also remove the pnum parameter from try_fiemap() and try_seek_hole(). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414148280-17949-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Max Reitz	e6d7ec32dd	raw-posix: Fix raw_co_get_block_status() after EOF As its comment states, raw_co_get_block_status() should unconditionally return 0 and set pnum to 0 for after EOF. An assertion after lseek(..., SEEK_HOLE) tried to catch this case by asserting that errno != -ENXIO (which would indicate a position after the EOF); but it should be errno != ENXIO instead. Regardless of that, there should be no such assertion at all. If bdrv_getlength() returned an outdated value and the image has been resized outside of qemu, lseek() will return with errno == ENXIO. Just return that value as an error then. Setting pnum to 0 and returning 0 should not be done here, as in that case we should update the device length as well. So, from qemu's perspective, the file has not been resized; it's just that there was an error querying sectors beyond a certain point (the actual file size). Additionally, nb_sectors should be clamped against the image end. This was probably not an issue if FIEMAP or SEEK_HOLE/SEEK_DATA worked, but the fallback did not take this case into account. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414148280-17949-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Richard W.M. Jones	f76faeda4b	block/curl: Improve type safety of s->timeout. qemu_opt_get_number returns a uint64_t, and curl_easy_setopt expects a long (not an int). There is no warning about the latter type error because curl_easy_setopt uses a varargs argument. Store the timeout (which is a positive number of seconds) as a uint64_t. Check that the number given by the user is reasonable. Zero is permissible (meaning no timeout is enforced by cURL). Cast it to long before calling curl_easy_setopt to fix the type error. Example error message after this change has been applied: $ ./qemu-img create -f qcow2 /tmp/test.qcow2 \ -b 'json: { "file.driver":"https", "file.url":"https://foo/bar", "file.timeout":-1 }' qemu-img: /tmp/test.qcow2: Could not open 'json: { "file.driver":"https", "file.url":"https://foo/bar", "file.timeout":-1 }': timeout parameter is too large or negative: Invalid argument Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 11:41:47 +00:00
Zhang Haoyu	3432a1929e	snapshot: add bdrv_drain_all() to bdrv_snapshot_delete() to avoid concurrency problem If there are still pending i/o while deleting snapshot, because deleting snapshot is done in non-coroutine context, and the pending i/o read/write (bdrv_co_do_rw) is done in coroutine context, so it's possible to cause concurrency problem between above two operations. Add bdrv_drain_all() to bdrv_snapshot_delete() to avoid this problem. Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 201410211637596311287@sangfor.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:42 +00:00
Adam Crume	be21788495	rbd: Add support for bdrv_invalidate_cache This fixes Ceph issue 2467: ttp://tracker.ceph.com/issues/2467 [Dropped return r in void function as suggested by Josh Durgin <josh.durgin@inktank.com>. --Stefan] Signed-off-by: Adam Crume <adamcrume@gmail.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1412880272-3154-1-git-send-email-adamcrume@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Denis V. Lunev	da725d0b0e	block/parallels: fix access to not initialized memory in catalog_bitmap found by valgrind. Command: ./qemu-img convert -f parallels -O qcow2 1.hds 1.img Invalid read of size 4 at 0x17D0EF: parallels_co_read (parallels.c:357) by 0x11FEE4: bdrv_aio_rw_vector (block.c:4640) by 0x11FFBF: bdrv_aio_readv_em (block.c:4652) by 0x11F55F: bdrv_co_readv_em (block.c:4862) by 0x123428: bdrv_aligned_preadv (block.c:3056) by 0x1239FA: bdrv_co_do_preadv (block.c:3162) by 0x125424: bdrv_rw_co_entry (block.c:2706) by 0x155DD9: coroutine_trampoline (coroutine-ucontext.c:118) by 0x6975B6F: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so) The problem is that s->catalog_bitmap is allocated/filled as gmalloc(s->catalog_size) thus index validity check must be inclusive, i.e. index >= s->catalog_size is invalid. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1412759610-2257-4-git-send-email-den@openvz.org CC: Jeff Cody <jcody@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	dc9e716369	block/iscsi: check for oversized requests Cancel oversized requests early. They would generate an iSCSI protocol error anyway; after having transferred possibly a lot of data over the wire. Suggested-By: Max Reitz <mreitz@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	3dab155154	block/iscsi: use sector_limits_lun2qemu throughout iscsi_refresh_limits As Max pointed out there is a hidden cast from int64_t to int for all limits. So use the newly introduced sector_limits_lun2qemu for all limits received from the target. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
Peter Lieven	52f6fa1430	block/iscsi: set max_transfer_length Copy the max_xfer_len from the BlockLimits VPD or use the maximum value fitting in the CDB. The helper function sector_limits_lun2qemu is introduced to convert and cap the limits from the VPD to the maximum power of two fitting in an integer; integer is the range for nb_sectors throughout the block layer. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-11-03 09:48:41 +00:00
SeokYeon Hwang	f18a768efb	vdi: wrapped uuid_unparse() in #ifdef Wrapped uuid_unparse() in #ifdef to avoid "-Wunused-function" on clang 3.4 or later. Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2014-11-02 10:05:35 +03:00
Fam Zheng	c1d4096b0f	iscsi: Refuse to open as writable if the LUN is write protected Before, when a write protected iSCSI target is attached as scsi-disk with BDRV_O_RDWR, we report it as writable, while in fact all writes will fail. One way to improve this is to report write protect flag as true to guest, but a even better way is to refuse using a write protected LUN to guest. Target write protect flag is checked with a mode sense query. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-31 11:29:02 +01:00
Roger Pau Monne	3cad83075c	block: char devices on FreeBSD are not behind a pager Introduce a new flag to mark devices that require requests to be aligned and replace the usage of BDRV_O_NOCACHE and O_DIRECT with this flag when appropriate. If a character device is used as a backend on a FreeBSD host set this flag unconditionally. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2014-10-23 16:56:53 +02:00
Max Reitz	a1391444fe	qcow2: Do not overflow when writing an L1 sector While writing an L1 table sector, qcow2_write_l1_entry() copies the respective range from s->l1_table to the local "buf" array. The size of s->l1_table does not have to be a multiple of L1_ENTRIES_PER_SECTOR; thus, limit the index which is used for copying all entries to the L1 size. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:02 +02:00
Max Reitz	17bd5f4727	qcow2: Drop REFCOUNT_SHIFT With BDRVQcowState.refcount_block_bits, we don't need REFCOUNT_SHIFT anymore. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:02 +02:00
Max Reitz	791230d8bb	qcow2: Clean up after refcount rebuild Because the old refcount structure will be leaked after having rebuilt it, we need to recalculate the refcounts and run a leak-fixing operation afterwards (if leaks should be fixed at all). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	c7c0681bc8	qcow2: Rebuild refcount structure during check The previous commit introduced the "rebuild" variable to qcow2's implementation of the image consistency check. Now make use of this by adding a function which creates a completely new refcount structure based solely on the in-memory information gathered before. The old refcount structure will be leaked, however. This leak will be dealt with in a follow-up commit. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	f307b2558f	qcow2: Do not perform potentially damaging repairs If a referenced cluster has a refcount of 0, increasing its refcount may result in clusters being allocated for the refcount structures. This may overwrite the referenced cluster, therefore we cannot simply increase the refcount then. In such cases, we can either try to replicate all the refcount operations solely for the check operation, basing the allocations on the in-memory refcount table; or we can simply rebuild the whole refcount structure based on the in-memory refcount table. Since the latter will be much easier, do that. To prepare for this, introduce a "rebuild" boolean which should be set to true whenever a fix is rather dangerous or too complicated using the current refcount structures. Another example for this is refcount blocks being referenced more than once. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	001c158def	qcow2: Fix refcount blocks beyond image end If the qcow2 check function detects a refcount block located beyond the image end, grow the image appropriately. This cannot break anything and is the logical fix for such a case. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	9696df219a	qcow2: Reuse refcount table in calculate_refcounts() We will later call calculate_refcounts multiple times, so reuse the refcount table if possible. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	641bb63cd6	qcow2: Let inc_refcounts() resize the reftable Now that the refcount table can be passed around by reference, do that for inc_refcounts() (and subsequently check_refcounts_l1() and check_refcounts_l2()) and use it for resizing it when a cluster after the image end is encountered. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	fef4d3d564	qcow2: Let inc_refcounts() return -errno As of a future patch, inc_refcounts() will have to throw errors which are generally signaled by returning -errno. Therefore, let it return an integer which is either 0 for success or -errno and handle the -errno case in all callers. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	ad27390c85	qcow2: Split fail code in L1 and L2 checks Instead of printing out an error message, incrementing check_errors and returning a fixed -errno, just do cleanups and return -ret, with ret set by the code which threw the exception (jumped to the fail label). Also, increment check_errors on error in check_refcounts_l2(). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	713d9675e0	qcow2: Use int64_t for in-memory reftable size Use int64_t for the entry count of the in-memory refcount table throughout the check functions. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	057a3fe57e	qcow2: Pull check_refblocks() up Pull check_refblocks() before calculate_refcounts() so we can drop its static declaration. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	78fb328e85	qcow2: Use sizeof(refcount_table) When implementing variable refcounts, we want to be able to easily find all the places in qemu which are tied to a certain refcount order. Replace sizeof(uint16_t) in the check code by sizeof(refcount_table) so we can later find it more easily. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	6ca56bf5e9	qcow2: Split qcow2_check_refcounts() Put the code for calculating the reference counts and comparing them during qemu-img check into own functions. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	5b84106bd9	qcow2: Fix leaks in dirty images When opening dirty images, qcow2's repair function should not only repair errors but leaks as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	1d13d65466	qcow2: Calculate refcount block entry count The size of a refblock entry is (in theory) variable; calculate therefore the number of entries per refblock and the according bit shift (1 << x == entry count) when opening an image. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Max Reitz	e9082e4736	block/vdi: Use {DIV_,}ROUND_UP There are macros for these operations, so make use of them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-23 15:34:01 +02:00
Markus Armbruster	84ebe3755f	block: Make device model's references to BlockBackend strong Doesn't make a difference just yet, but it's the right thing to do. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:51 +02:00
Markus Armbruster	a7f53e26a6	block: Lift device model API into BlockBackend Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	d829a2115f	block/qapi: Convert qmp_query_block() to BlockBackend Much more command code needs conversion. I start with this one because it's using bdrv_dev_* functions, which I'm about to lift into BlockBackend. While there, give bdrv_query_info() internal linkage. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	26f8b3a847	blockdev: Fix blockdev-add not to create DriveInfo blockdev_init() always creates a DriveInfo, but only drive_new() fills it in. qmp_blockdev_add() leaves it blank. This results in a drive with type = IF_IDE, bus = 0, unit = 0. Screwed up in commit `ee13ed1c`. Board initialization code looking for IDE drive (0,0) can pick up one of these bogus drives. The QMP command has to execute really early to be visible. Not sure how likely that is in practice. Fix by creating DriveInfo in drive_new(). Block backends created by blockdev-add don't get one. Breaks the test for "has been created by qmp_blockdev_add()" in blockdev_mark_auto_del() and do_drive_del(), because it changes the value of dinfo && !dinfo->enable_auto_del from true to false. Simply test !dinfo instead. Leaves DriveInfo member enable_auto_del unused. Drop it. A few places assume a block backend always has a DriveInfo. Fix them up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	d3aeb1b7da	blockdev: Drop superfluous DriveInfo member id Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:03:50 +02:00
Markus Armbruster	4be746345f	hw: Convert from BlockDriverState to BlockBackend, mostly Device models should access their block backends only through the block-backend.h API. Convert them, and drop direct includes of inappropriate headers. Just four uses of BlockDriverState are left: * The Xen paravirtual block device backend (xen_disk.c) opens images itself when set up via xenbus, bypassing blockdev.c. I figure it should go through qmp_blockdev_add() instead. * Device model "usb-storage" prompts for keys. No other device model does, and this one probably shouldn't do it, either. * ide_issue_trim_cb() uses bdrv_aio_discard() instead of blk_aio_discard() because it fishes its backend out of a BlockAIOCB, which has only the BlockDriverState. * PC87312State has an unused BlockDriverState[] member. The next two commits take care of the latter two. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 14:02:25 +02:00
Markus Armbruster	097310b53e	block: Rename BlockDriverCompletionFunc to BlockCompletionFunc I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7c84b1b831	block: Rename BlockDriverAIOCB* to BlockAIOCB* I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:27 +02:00
Markus Armbruster	7f06d47eff	block: Merge BlockBackend and BlockDriverState name spaces BlockBackend's name space is separate only to keep the initial patches simple. Time to merge the two. Retain bdrv_find() and bdrv_get_device_name() for now, to keep this series manageable. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	bfb197e0d9	block: Eliminate BlockDriverState member device_name[] device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	9ba10c95a4	block: Make BlockBackend own its BlockDriverState On BlockBackend destruction, unref its BlockDriverState. Replaces the callers' unrefs. This turns the pointer from BlockBackend to BlockDriverState into a strong reference, managed with bdrv_ref() / bdrv_unref(). The back-pointer remains weak. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	8fb3c76c94	block: Code motion to get rid of stubs/blockdev.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	18e46a033d	block: Connect BlockBackend and DriveInfo Make the BlockBackend own the DriveInfo. Change blockdev_init() to return the BlockBackend instead of the DriveInfo. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	7e7d56d9e0	block: Connect BlockBackend to BlockDriverState Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	26f54e9a3c	block: New BlockBackend A block device consists of a frontend device model and a backend. A block backend has a tree of block drivers doing the actual work. The tree is managed by the block layer. We currently use a single abstraction BlockDriverState both for tree nodes and the backend as a whole. Drawbacks: * Its API includes both stuff that makes sense only at the block backend level (root of the tree) and stuff that's only for use within the block layer. This makes the API bigger and more complex than necessary. Moreover, it's not obvious which interfaces are meant for device models, and which really aren't. * Since device models keep a reference to their backend, the backend object can't just be destroyed. But for media change, we need to replace the tree. Our solution is to make the BlockDriverState generic, with actual driver state in a separate object, pointed to by member opaque. That lets us replace the tree by deinitializing and reinitializing its root. This special need of the root makes the data structure awkward everywhere in the tree. The general plan is to separate the APIs into "block backend", for use by device models, monitor and whatever other code dealing with block backends, and "block driver", for use by the block layer and whatever other code (if any) dealing with trees and tree nodes. Code dealing with block backends, device models in particular, should become completely oblivious of BlockDriverState. This should let us clean up both APIs, and the tree data structures. This commit is a first step. It creates a minimal "block backend" API: type BlockBackend and functions to create, destroy and find them. BlockBackend objects are created and destroyed exactly when root BlockDriverState objects are created and destroyed. "Root" in the sense of "in bdrv_states". They're not yet used for anything; that'll come shortly. A root BlockDriverState is created with bdrv_new_root(), so where to create a BlockBackend is obvious. Where these roots get destroyed isn't always as obvious. It is obvious in qemu-img.c, qemu-io.c and qemu-nbd.c, and in error paths of blockdev_init(), blk_connect(). That leaves destruction of objects successfully created by blockdev_init() and blk_connect(). blockdev_init() is used only by drive_new() and qmp_blockdev_add(). Objects created by the latter are currently indestructible (see commit `48f364d` "blockdev: Refuse to drive_del something added with blockdev-add" and commit `2d246f0` "blockdev: Introduce DriveInfo.enable_auto_del"). Objects created by the former get destroyed by drive_del(). Objects created by blk_connect() get destroyed by blk_disconnect(). BlockBackend is reference-counted. Its reference count never exceeds one so far, but that's going to change. In drive_del(), the BB's reference count is surely one now. The BDS's reference count is greater than one when something else is holding a reference, such as a block job. In this case, the BB is destroyed right away, but the BDS lives on until all extra references get dropped. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Markus Armbruster	e4e9986b1c	block: Split bdrv_new_root() off bdrv_new() Creating an anonymous BDS can't fail. Make that obvious. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Max Reitz	ec0de76874	nbd: Fix filename generation Export names may be used with nbd+unix, too, fix nbd_refresh_filename() accordingly. Also, for nbd+tcp, the documented path schema is "nbd://host[:port]/export", so use it. Furthermore, as can be seen from that schema, the port is optional. That makes six single cases for how the filename can be formatted; it is not easy to generalize these cases without the resulting statement being completely unreadable, thus there is simply one snprintf() per case. Finally, taking the options from BDRVNBDState::socket_opts is wrong, because those will not contain the export name. Just use BlockDriverState::options instead. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Tony Breeds	7c15903789	block/raw-posix: use seek_hole ahead of fiemap try_fiemap() uses FIEMAP_FLAG_SYNC which has a significant performance impact. Prefer seek_hole() over fiemap() to avoid this impact where possible. seek_hole is more widely used and, arguably, has potential to be optimised in the kernel. Reported-By: Michael Steffens <michael_steffens@posteo.de> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Pádraig Brady <pbrady@redhat.com> Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Tony Breeds	38c4d0aea3	block/raw-posix: Fix disk corruption in try_fiemap Using fiemap without FIEMAP_FLAG_SYNC is a known corrupter. Add the FIEMAP_FLAG_SYNC flag to the FS_IOC_FIEMAP ioctl. This has the downside of significantly reducing performance. Reported-By: Michael Steffens <michael_steffens@posteo.de> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Pádraig Brady <pbrady@redhat.com> Cc: Eric Blake <eblake@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00
Zhang Haoyu	d8bb71b622	qcow2: fix leak of Qcow2DiscardRegion in update_refcount_discard When the Qcow2DiscardRegion is adjacent to another one referenced by "d", free this Qcow2DiscardRegion metadata referenced by "p" after it was removed from s->discards queue. Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-10-20 13:41:26 +02:00

1 2 3 4 5 ...

1935 Commits