mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Fam Zheng	0eb1e89112	block: Use common req handling in copy offloading This brings the request handling logic inline with write and discard, fixing write_gen, resize_cb, dirty bitmaps and image size refreshing. The last of these issues broke iotest case 222, which is now fixed. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:46:22 +02:00
Fam Zheng	00695c27a0	block: Use common req handling for discard Reuse the new bdrv_co_write_req_prepare/finish helpers. The variation here is that discard requests don't affect bs->wr_highest_offset, and it cannot extend the image. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:46:16 +02:00
Fam Zheng	7f8f03ef6d	block: Fix handling of image enlarging write Two problems exist when a write request that enlarges the image (i.e. write beyond EOF) finishes: 1) parent is not notified about size change; 2) dirty bitmap is not resized although we try to set the dirty bits; Fix them just like how bdrv_co_truncate works. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Fam Zheng	85fe24796d	block: Extract common write req handling As a mechanical refactoring patch, this is the first step towards unified and more correct write code paths. This is helpful because multiple BlockDriverState fields need to be updated after modifying image data, and it's hard to maintain in multiple places such as copy offload, discard and truncate. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Fam Zheng	22931a1533	block: Use uint64_t for BdrvTrackedRequest byte fields This matches the types used for bytes in the rest parts of block layer. In the case of bdrv_co_truncate, new_bytes can be the image size which probably doesn't fit in a 32 bit int. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Fam Zheng	0b9fd3f467	block: Use BdrvChild to discard Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Fam Zheng	ecc983a507	block: Add copy offloading trace points A few trace points that can help reveal what is happening in a copy offloading I/O path. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:52 +02:00
Fam Zheng	f8a30874ca	block: Prefix file driver trace points with "file_" With in one module, trace points usually have a common prefix named after the module name. paio_submit and paio_submit_co are the only two trace points so far in the two file protocol drivers. As we are adding more, having a common prefix here is better so that trace points can be enabled with a glob. Rename them. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 16:01:51 +02:00
Cornelia Huck	44e8b4689c	Revert "block: Remove deprecated -drive option serial" This reverts commit `b008326744`. Hold off removing this for one more QEMU release (current libvirt release still uses it.) Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 14:36:11 +02:00
Ari Sundholm	ba814c82bb	block/blklogwrites: Make sure the log sector size is not too small The sector size needs to be large enough to accommodate the data structures for the log super block and log write entries. This was previously not properly checked, which made it possible to cause QEMU to badly misbehave. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:17:48 +02:00
Vladimir Sementsov-Ogievskiy	f8d59dfb40	block/backup: fix fleecing scheme: use serialized writes Fleecing scheme works as follows: we want a kind of temporary snapshot of active drive A. We create temporary image B, with B->backing = A. Then we start backup(sync=none) from A to B. From this point, B reads as point-in-time snapshot of A (A continues to be active drive, accepting guest IO). This scheme needs some additional synchronization between reads from B and backup COW operations, otherwise, the following situation is theoretically possible: (assume B is qcow2, client is NBD client, reading from B) 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and goes up to l2 table loading (assume cache miss) 2) guest write => backup COW => qcow2 write => try to take qcow2 mutex => waiting 3. l2 table loaded, we see that cluster is UNALLOCATED, go to "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before bdrv_co_preadv(bs->backing, ...) 4) aha, mutex unlocked, backup COW continues, and we finally finish guest write and change cluster in our active disk A 5. actually, do bdrv_co_preadv(bs->backing, ...) and read _new updated_ data. To avoid this, let's make backup writes serializing, to not intersect with reads from B. Note: we expand range of handled cases from (sync=none and B->backing = A) to just (A in backing chain of B), to finally allow safe reading from B during backup for all cases when A in backing chain of B, i.e. B formally looks like point-in-time snapshot of A. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:10:29 +02:00
Vladimir Sementsov-Ogievskiy	09d2f94846	block: add BDRV_REQ_SERIALISING flag Serialized writes should be used in copy-on-write of backup(sync=none) for image fleecing scheme. We need to change an assert in bdrv_aligned_pwritev, added in `28de2dcd88`. The assert may fail now, because call to wait_serialising_requests here may become first call to it for this request with serializing flag set. It occurs if the request is aligned (otherwise, we should already set serializing flag before calling bdrv_aligned_pwritev and correspondingly waited for all intersecting requests). However, for aligned requests, we should not care about outdating of previously read data, as there no such data. Therefore, let's just update an assert to not care about aligned requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:10:25 +02:00
Vladimir Sementsov-Ogievskiy	67b51fb998	block: split flags in copy_range Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:04:25 +02:00
Vladimir Sementsov-Ogievskiy	999658a05e	block/io: fix copy_range Here two things are fixed: 1. Architecture On each recursion step, we go to the child of src or dst, only for one of them. So, it's wrong to create tracked requests for both on each step. It leads to tracked requests duplication. 2. Wait for serializing requests on write path independently of BDRV_REQ_NO_SERIALISING Before commit `9ded4a0114` "backup: Use copy offloading", BDRV_REQ_NO_SERIALISING was used for only one case: read in copy-on-write operation during backup. Also, the flag was handled only on read path (in bdrv_co_preadv and bdrv_aligned_preadv). After `9ded4a0114`, flag is used for not waiting serializing operations on backup target (in same case of copy-on-write operation). This behavior change is unsubstantiated and potentially dangerous, let's drop it and add additional asserts and documentation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 13:04:22 +02:00
Kevin Wolf	b0ddcbbb36	block: Fix copy-on-read crash with partial final cluster If the virtual disk size isn't aligned to full clusters, bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full cluster completed, which will let it run into an assertion failure: qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed. Check for EOF, assert that we read at least as much as the read request originally wanted to have (which is true at EOF because otherwise bdrv_check_byte_request() would already have returned an error) and return success early even though we couldn't copy the full cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 10:36:15 +02:00
Kevin Wolf	4be6a6d118	block: Poll after drain on attaching a node Commit `dcf94a23b1` ('block: Don't poll in parent drain callbacks') removed polling in bdrv_child_cb_drained_begin() on the grounds that the original bdrv_drain() already will poll and BdrvChildRole.drained_begin calls must not cause graph changes (and therefore must not call aio_poll() or the recursion through the graph will break. This reasoning is correct for calls through bdrv_do_drained_begin(). However, BdrvChildRole.drained_begin is also called when a node that is already in a drained section (i.e. bdrv_do_drained_begin() has already returned and therefore can't poll any more) is attached to a new parent. In this case, we must explicitly poll to have all requests completed before the drained new child can be attached to the parent. In bdrv_replace_child_noperm(), we know that we're not inside the recursion of bdrv_do_drained_begin() because graph changes are not allowed there, and bdrv_replace_child_noperm() is a graph change. The call of BdrvChildRole.drained_begin() must therefore be followed by a BDRV_POLL_WHILE() that waits for the completion of requests. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-10 10:36:15 +02:00
Vladimir Sementsov-Ogievskiy	0e4e4318ea	qcow2: add overlap check for bitmap directory Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180705151515.779173-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Max Reitz	439e89fc09	vmdk: Fix possible segfault with non-VMDK backing VMDK performs a probing check in vmdk_co_create_opts() to prevent the user from assigning non-VMDK files as a backing file, because it only supports VMDK backing files. However, with the @backing runtime option, it is possible to assign arbitrary nodes as backing nodes, regardless of what the image header says. Therefore, VMDK may not just access backing nodes assuming they are VMDK nodes -- which it does, because it needs to compare the backing file's CID with the overlay's parentCID value, and naturally the backing file only has a CID when it's a VMDK file. Instead, it should report the CID of non-VMDK backing files not to match the overlay because clearly a non-present CID does not match. Without this change, vmdk_read_cid() reads from the backing file's bs->file, which may be NULL (in which case we get a segfault). Also, it interprets bs->opaque as a BDRVVmdkState and then reads from the .desc_offset field, which usually will just return some arbitrary value which then results in either garbage to be read, or bdrv_pread() to return an error, both of which result in a non-matching CID to be reported. (In a very unlikely case, we could read something that looks like a VMDK descriptor, and then get a CID which might actually match. But that is highly unlikely, and the only result would be that VMDK accepts the backing file which is not too bad (albeit unintentional).) ((And in theory, the seek to .desc_offset might leak data from another block driver's opaque object. But then again, the user should realize very quickly that a non-VMDK backing file does not work (because the read will very likely fail, due to the reasons given above), so this should not be exploitable.)) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180702210721.4847-2-mreitz@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	6d6bcc46b5	raw: Drop superfluous semicolon Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-5-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	65f1899e9b	qcow2: Drop unreachable break Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-4-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	9f850f67ad	file-posix: Fix fd_open check in raw_co_copy_range_to One of them is a typo. But update both to be more readable. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-3-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Fam Zheng	148546c822	qcow2: Drop unused cluster_data Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-2-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-07-09 19:43:24 +02:00
Peter Maydell	1daf14ec9e	Block layer patches: - qcow2: Use worker threads for compression to improve performance of 'qemu-img convert -W' and compressed backup jobs - blklogwrites: New filter driver to log write requests to an image in the dm-log-writes format - file-posix: Fix image locking during image creation - crypto: Fix memory leak in error path - Error out instead of silently truncating node names -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbPfHhAAoJEH8JsnLIjy/Wxp8P/jeEJ/jJ0N2doix5KJSvtkdP dYx4cJxHg9tMSnmBfA49qc1ilgbzZqjbF/wH3Ko3zPsmFP0xsxx+9zE4c3iL7HA6 6E4/r3szznO8DIxdgzAIeHjZc7YPpsY3alrthT55eR/FnDyc8zofi/iUMgvTKPYA 8TnWqgi2fdaVwNy+lIRJFdhzzQNtPxOEO+0DFHaOZvQ9vlc5DGPC+Hj3Qc11GK8M u7orkeYqQPknxy0hJKPxtWHvzCQUJMcSSs6PbuslnzOerYiQLmx+RVIwMhXfVZjV lXe2SppAszBujtcIENhZlj1cECs7MrWTmFDcWvBA+Mh/JhFEWwykmlQHYrjCqdvw QyWhVLgP/6jQDaBls6LADSZDxX7i0sV27DzVGN4gYUX/KcyVPEDROm90MyE8nxN0 Y5hhujTkzu94Zvd/OlKZRs4tPxmbRrd2SWy0id8Kj5/gO/+UWECOZCrPrlB5uzec bsxHoeXLVe2/56JkCIiOw3hLILMY6gPLJgeaQjz6hu1oZJqYuoof8grVFjUCSH0C BBlz8O4upHdPIKNRl4rmdgwasg5YIiJ7SFl4h7KMCD0elkKmo3SyTKosagvqrpVN JvW1YuosCedcfYIKNPnZdEGWDKuTaZtcnZnM9IRY6hr8ySMw1LloFsgXsJ3tqTwe wb1Wjm4Qlz5BaoB7o/Os =z5AC -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qcow2: Use worker threads for compression to improve performance of 'qemu-img convert -W' and compressed backup jobs - blklogwrites: New filter driver to log write requests to an image in the dm-log-writes format - file-posix: Fix image locking during image creation - crypto: Fix memory leak in error path - Error out instead of silently truncating node names # gpg: Signature made Thu 05 Jul 2018 11:24:33 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: file-posix: Unlock FD after creation file-posix: Fix creation locking block/blklogwrites: Add an option for the update interval of the log superblock block/blklogwrites: Add an option for appending to an old log block/blklogwrites: Change log_sector_size from int64_t to uint64_t block/crypto: Fix memory leak in create error path block: Don't silently truncate node names block: Add blklogwrites block: Move two block permission constants to the relevant enum qcow2: add compress threads qcow2: refactor data compression qemu-img: allow compressed not-in-order writes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-05 15:53:04 +01:00
Max Reitz	7c20c808a5	file-posix: Unlock FD after creation Closing the FD does not necessarily mean that it is unlocked. Fix this by relinquishing all permission locks before qemu_close(). Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 11:07:58 +02:00
Max Reitz	d815efcaf0	file-posix: Fix creation locking raw_apply_lock_bytes() takes a bit mask of "permissions that are NOT shared". Also, make the "perm" and "shared" variables uint64_t, because I do not particularly like using ~ on signed integers (and other permission masks are usually uint64_t, too). Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 11:07:58 +02:00
Ari Sundholm	1dce698ea8	block/blklogwrites: Add an option for the update interval of the log superblock This is a way to ensure that the log superblock is periodically updated. Before, this was only done on flush requests, which may not be enough if the VM exits abnormally, omitting the final flush. The default interval is 4096 write requests. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:50:21 +02:00
Ari Sundholm	0878b3c113	block/blklogwrites: Add an option for appending to an old log Suggested by Kevin Wolf. May be useful when testing multiple batches of writes or doing long-term testing involving restarts of the VM. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:50:20 +02:00
Ari Sundholm	2dacaf7c82	block/blklogwrites: Change log_sector_size from int64_t to uint64_t This was a simple oversight when working on intermediate versions of the original patch which introduced blklogwrites. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:50:20 +02:00
Kevin Wolf	0b68589d17	block/crypto: Fix memory leak in create error path Fixes: Coverity CID 1393782 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-07-05 10:29:19 +02:00
Aapo Vienamo	bfcc224e3c	block: Add blklogwrites Implements a block device write logging system, similar to Linux kernel device mapper dm-log-writes. The write operations that are performed on a block device are logged to a file or another block device. The write log format is identical to the dm-log-writes format. Currently, log markers are not supported. This functionality can be used for crash consistency and fs consistency testing. By implementing it in qemu, tests utilizing write logs can be be used to test non-Linux drivers and older kernels. The driver accepts an optional parameter to set the sector size used for logging. This makes the driver require all requests to be aligned to this sector size and also makes offsets and sizes of writes in the log metadata to be expressed in terms of this value (the log format has a granularity of one sector for offsets and sizes). This allows accurate logging of writes to guest block devices that have unusual sector sizes. The implementation is based on the blkverify and blkdebug block drivers. Signed-off-by: Aapo Vienamo <aapo@tuxera.com> Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy	ceb029cd6f	qcow2: add compress threads Do data compression in separate threads. This significantly improve performance for qemu-img convert with -W (allow async writes) and -c (compressed) options. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:29:19 +02:00
Vladimir Sementsov-Ogievskiy	2714f13d69	qcow2: refactor data compression Make a separate function for compression to be parallelized later. - use .avail_out field instead of .next_out to calculate size of compressed data. It looks more natural and it allows to keep dest to be void pointer - set avail_out to be at least one byte less than input, to be sure avoid inefficient compression earlier Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-07-05 10:29:00 +02:00
Vladimir Sementsov-Ogievskiy	58f72b965e	dirty-bitmap: fix double lock on bitmap enabling Bitmap lock/unlock were added to bdrv_enable_dirty_bitmap in `8b1402ce80`, but some places were not updated correspondingly, which leads to trying to take this lock twice, which is dead-lock. Fix this. Actually, iotest 199 (about dirty bitmap postcopy migration) is broken now, and this fixes it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-07-04 02:12:49 -04:00
Vladimir Sementsov-Ogievskiy	92bcea40d3	block/dirty-bitmap: add bdrv_enable_dirty_bitmap_locked Add _locked version of bdrv_enable_dirty_bitmap, to fix dirty bitmap migration in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-07-04 02:12:49 -04:00
Peter Maydell	a395717cbd	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbOvCTAAoJEL2+eyfA3jBX354P/RFVLsozhcb3DeFj5Ocq2kfS sRt/82Ke/f/w/8lNd4wNbOsdG/eg2M4RmLWF4ONWWoeO7Z0KIatUOTtw5HxjBxBX XfhQy7RZ65luuCHnLDU6c4IdnDvXBVG/kErydhDZjEyeY8qlxrurBB8331cTRFwu hisweIwogPOFDA+/Bty0W0EyVQWFAobL3ExYFlOYFuHwsqJfMPQbytw2zDzC4kjn 8Ecppyt7rfLsEcyf/4OAoHfbbYOiQl7PkXE7/uXDFyL8zPdRpIlDFSZtmy1Zb213 mcYhPmehUkFHV/BDF/LdnzjlraK8oMaNu0IDld5cX/1xUU4VtbW2YjAt6OdCn7Ll 7YbNNKYU/mM1QUPshX4qJkbUaCu7JoTDKPiBbJei/MV7zMJBLpNVG/AuJE2gbweI 2levV76QzS2+fQVKv/9LUliqOYEp5T0/aybb+35Vzhf5WNpSO7s1oaCDAvSgUhS+ qU1MIAROQQPCmdM8PwqzG9b2TGp/tYcWOju5bqt488Twmo0BbTGjYCFl6StJHibC mN5fASP5nQiz1fc3FrBp0h/PCQlGtd2ZgeyeC+lkPVcFovclA2vo/ib1k2LK/0nU TzkKIFRJZH58yYjppuBrB6c/aFfVkutE4Hz25i+3nZ91ZEyQKbv1mDxCyRNXNWjt Gteul6gUo/AjzMOWFFvH =AKR7 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 03 Jul 2018 04:42:11 BST # gpg: using RSA key BDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: backup: Use copy offloading block: Honour BDRV_REQ_NO_SERIALISING in copy range block: Fix parameter checking in bdrv_co_copy_range_internal Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-03 11:49:51 +01:00
Peter Maydell	9b75dcb15f	nbd patches for 2018-07-02 Bug fixes and iotest exposure of fleecing via NBD (serving a read-only point-in-time view via blockdev-backup sync:none, as well as serving dirty bitmaps over NBD), including a new x-dirty-bitmap parameter when opening NBD clients as the counterpart to x-nbd-server-add-bitmap. Also a random fix for iscsi block_status spotted by Coverity that missed other miscellaneous trees. - Eric Blake: nbd/server: Fix dirty bitmap logic regression - Eric Blake: iscsi: Avoid potential for get_status overflow - John Snow/Vladimir Sementsov-Ogievskiy: 0/2 block: formalize and test fleecing - Eric Blake: 0/2 test NBD bitmap export -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJbOtJPAAoJEKeha0olJ0NqEvwH/3FwWnlBdBvdYGgPjzGE1Atm ofCKcyxE/2VJtxeWlZQHzs1VqSq81s7am5SdzOrIWnQekvHFcLu6/71RABiauzMd neCvVOrXOVdktj1i1Z2Gg4BgjDmqbTDlo5ssVh/oXP0Zebi6OZFfQrB7y3cGBvui 4XI7lW9qJxt6F1FlKloXnofWRDENyo5vgdz6QjQXfauthw2T5045RIPTfiz03FCp fbs+6K0+bKxfPdNLrqxxOZo/loYnEXbDYv6VBAIWBqztXVnMHxCqnh0YN05jwsfF TRW0/YT8lpWarOZ1soIC6a/OGXQZbxgRhZ+Zr+Wa2jw0YNHJanU9isxi37aUqQo= =Xivx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-07-02' into staging nbd patches for 2018-07-02 Bug fixes and iotest exposure of fleecing via NBD (serving a read-only point-in-time view via blockdev-backup sync:none, as well as serving dirty bitmaps over NBD), including a new x-dirty-bitmap parameter when opening NBD clients as the counterpart to x-nbd-server-add-bitmap. Also a random fix for iscsi block_status spotted by Coverity that missed other miscellaneous trees. - Eric Blake: nbd/server: Fix dirty bitmap logic regression - Eric Blake: iscsi: Avoid potential for get_status overflow - John Snow/Vladimir Sementsov-Ogievskiy: 0/2 block: formalize and test fleecing - Eric Blake: 0/2 test NBD bitmap export # gpg: Signature made Tue 03 Jul 2018 02:33:03 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2018-07-02: iotests: New test 223 for exporting dirty bitmap over NBD nbd/client: Add x-dirty-bitmap to query bitmap from server iotests: add 222 to test basic fleecing blockdev: enable non-root nodes for backup source iscsi: Avoid potential for get_status overflow nbd/server: Fix dirty bitmap logic regression Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-03 10:47:02 +01:00
Fam Zheng	9ded4a0114	backup: Use copy offloading The implementation is similar to the 'qemu-img convert'. In the beginning of the job, offloaded copy is attempted. If it fails, further I/O will go through the existing bounce buffer code path. Then, as Kevin pointed out, both this and qemu-img convert can benefit from a local check if one request fails because of, for example, the offset is beyond EOF, but another may well be accepted by the protocol layer. This will be implemented separately. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180703023758.14422-4-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-07-02 23:23:45 -04:00
Fam Zheng	dee12de893	block: Honour BDRV_REQ_NO_SERIALISING in copy range This semantics is needed by drive-backup so implement it before using this API there. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180703023758.14422-3-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-07-02 23:23:45 -04:00
Fam Zheng	d4d3e5a0d5	block: Fix parameter checking in bdrv_co_copy_range_internal src may be NULL if BDRV_REQ_ZERO_WRITE flag is set, in this case only check dst and dst->bs. This bug was introduced when moving in the request tracking code from bdrv_co_copy_range, in `37aec7d75e`. This especially fixes the possible segfault when initializing src_bs with a NULL src. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180703023758.14422-2-famz@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-07-02 23:23:45 -04:00
Eric Blake	216ee3657e	nbd/client: Add x-dirty-bitmap to query bitmap from server In order to test that the NBD server is properly advertising dirty bitmaps, we need a bare minimum client that can request and read the context. Since feature freeze for 3.0 is imminent, this is the smallest workable patch, which replaces the qemu block status report with the results of the NBD server's dirty bitmap (making it very easy to use 'qemu-img map --output=json' to learn where the dirty portions are). Note that the NBD protocol defines a dirty section with the same bit but opposite sense that normal "base:allocation" uses to report an allocated section; so in qemu-img map output, "data":true corresponds to clean, "data":false corresponds to dirty. A more complete solution that allows dirty bitmaps to be queried at the same time as normal block status will be required before this addition can lose the x- prefix. Until then, the fact that this replaces normal status with dirty status means actions like 'qemu-img convert' will likely misbehave due to treating dirty regions of the file as if they are unallocated. The next patch adds an iotest to exercise this new code. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180702191458.28741-2-eblake@redhat.com>	2018-07-02 15:27:38 -05:00
Eric Blake	8ee1cef459	iscsi: Avoid potential for get_status overflow Detected by Coverity: Multiplying two 32-bit int and assigning the result to a 64-bit number is a risk of overflow. Prior to the conversion to byte-based interfaces, the block layer took care of ensuring that a status request never exceeded 2G in the driver; but after that conversion, the block layer expects drivers to deal with any size request (the driver can always truncate the request size back down, as long as it makes progress). So, in the off-chance that someone makes a large request, we are at the mercy of whether iscsi_get_lba_status_task() will cap things to at most INT_MAX / iscsilun->block_size when it populates lbasd->num_blocks; since I could not easily audit that, it's better to be safe than sorry by just forcing a 64-bit multiply. Fixes: `92809c36` CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180508212718.1482663-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2018-07-02 14:28:26 -05:00
Philippe Mathieu-Daudé	f043568f54	vdi: Use definitions from "qemu/units.h" Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Message-Id: <20180625124238.25339-3-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-07-02 14:45:23 +02:00
Eric Blake	583c99d393	block: Remove unused sector-based vectored I/O We are gradually moving away from sector-based interfaces, towards byte-based. Now that all callers of vectored I/O have been converted to use our preferred byte-based bdrv_co_p{read,write}v(), we can delete the unused bdrv_co_{read,write}v(). Furthermore, this gets rid of the signature difference between the public bdrv_co_writev() and the callback .bdrv_co_writev (the latter still exists, because some drivers still need more work before they are fully byte-based). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	3a7404b31e	vhdx: Switch to byte-based calls We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based calls into the block layer from the vhdx driver. Ideally, the vhdx driver should switch to doing everything byte-based, but that's a more invasive change that requires a bit more auditing. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	04a11d87d1	replication: Switch to byte-based calls We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based calls into the block layer from the replication driver. Ideally, the replication driver should switch to doing everything byte-based, but that's a more invasive change that requires a bit more auditing. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	609841a3f8	qcow: Switch to a byte-based driver We are gradually moving away from sector-based interfaces, towards byte-based. The qcow driver is now ready to fully utilize the byte-based callback interface, as long as we override the default alignment to still be 512 (needed at least for asserts present because of encryption, but easier to do everywhere than to audit which sub-sector requests are handled correctly, especially since we no longer recommend qcow for new disk images). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	d1326a786d	qcow: Switch qcow_co_writev to byte-based calls We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the internals of the qcow driver write function, by iterating over offset/bytes instead of sector_num/nb_sectors, and with a rename of index_in_cluster and repurposing of n to track bytes instead of sectors. A later patch will then switch the qcow driver as a whole over to byte-based operation. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	a15312b017	qcow: Switch qcow_co_readv to byte-based calls We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the internals of the qcow driver read function, by iterating over offset/bytes instead of sector_num/nb_sectors, and with a rename of index_in_cluster and repurposing of n to track bytes instead of sectors. A later patch will then switch the qcow driver as a whole over to byte-based operation. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	787993a543	qcow: Switch get_cluster_offset to be byte-based We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the internal helper function get_cluster_offset(), by changing n_start and n_end to be byte offsets rather than sector indices within the cluster being allocated. However, assert that these values are still sector-aligned (at least qcrypto_block_encrypt() still wants that). For now we get that alignment for free because we still use sector-based driver callbacks. A later patch will then switch the qcow driver as a whole over to byte-based operation; but will still leave things at sector alignments as it is not worth auditing the qcow image format to worry about sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Eric Blake	d08c2a245f	parallels: Switch to byte-based calls We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based calls into the block layer from the parallels driver. Ideally, the parallels driver should switch to doing everything byte-based, but that's a more invasive change that requires a bit more auditing. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	c436e3d014	file-posix: Fix EINTR handling EINTR should be checked against errno, not ret. While fixing the bug, collect the branches with a switch block. Also, change the return value from -ENOSTUP to -ENOSPC when the actual issue is request range passes EOF, which should be distinguishable from the case of error == ENOSYS by the caller, so that it could still retry with other byte ranges, whereas it shouldn't retry anymore upon ENOSYS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	1439b9c110	iscsi: Don't blindly use designator length in response for memcpy Per SCSI definition the designator_length we receive from INQUIRY is 8, 12 or at most 16, but we should be careful because the remote iscsi target may misbehave, otherwise we could have a buffer overflow. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	e06f4639d8	qcow2: Fix src_offset in copy offloading Not updating src_offset will result in wrong data being written to dst image. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	33d70fb6fa	file-posix: Implement co versions of discard/flush This simplifies file-posix by implementing the coroutine variants of the discard and flush BlockDriver callbacks. These were the last remaining users of paio_submit(), which can be removed now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	8b24cd1415	qcow2: Free allocated clusters on write error If we managed to allocate the clusters, but then failed to write the data, there's a good chance that we'll still be able to free the clusters again in order to avoid cluster leaks (the refcounts are cached, so even if we can't write them out right now, we may be able to do so when the VM is resumed after a werror=stop/enospc pause). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>	2018-06-29 14:20:56 +02:00
Markus Armbruster	796d323945	block/crypto: Simplify block_crypto_{open,create}_opts_init() block_crypto_open_opts_init() and block_crypto_create_opts_init() contain a virtual visit of QCryptoBlockOptions and QCryptoBlockCreateOptions less member "format", respectively. Change their callers to put member "format" in the QDict, so they can use the generated visitors for these types instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	37aec7d75e	block: Move request tracking to children in copy offloading in_flight and tracked requests need to be tracked in every layer during recursion. For now the only user is qemu-img convert where overlapping requests and IOThreads don't exist, therefore this change doesn't make much difference form user point of view, but it is incorrect as part of the API. Fix it. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Fam Zheng	354d930dc6	qcow2: Remove dead check on !ret In the beginning of the function, we initialize the local variable to 0, and in the body of the function, we check the assigned values and exit the loop immediately. So here it can never be non-zero. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	93f4e2ff4b	file-posix: Make .bdrv_co_truncate asynchronous This moves the code to resize an image file to the thread pool to avoid blocking. Creating large images with preallocation with blockdev-create is now actually a background job instead of blocking the monitor (and most other things) until the preallocation has completed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	1bc5f09f2e	block: Use tracked request for truncate When growing an image, block drivers (especially protocol drivers) may initialise the newly added area. I/O requests to the same area need to wait for this initialisation to be completed so that data writes don't get overwritten and reads don't read uninitialised data. To avoid overhead in the fast I/O path by adding new locking in the protocol drivers and to restrict the impact to requests that actually touch the new area, reuse the existing tracked request infrastructure in block/io.c and mark all discard requests as serialising. With this change, it is safe for protocol drivers to make .bdrv_co_truncate actually asynchronous. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	3d9f2d2af6	block: Move bdrv_truncate() implementation to io.c This moves the bdrv_truncate() implementation from block.c to block/io.c so it can have access to the tracked requests infrastructure. This involves making refresh_total_sectors() public (in block_int.h). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	47e86b868d	qcow2: Remove coroutine trampoline for preallocate_co() All callers are coroutine_fns now, so we can just directly call preallocate_co(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	061ca8a368	block: Convert .bdrv_truncate callback to coroutine_fn bdrv_truncate() is an operation that can block (even for a quite long time, depending on the PreallocMode) in I/O paths that shouldn't block. Convert it to a coroutine_fn so that we have the infrastructure for drivers to make their .bdrv_co_truncate implementation asynchronous. This change could potentially introduce new race conditions because bdrv_truncate() isn't necessarily executed atomically any more. Whether this is a problem needs to be evaluated for each block driver that supports truncate: * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The protocol drivers are trivially safe because they don't actually yield yet, so there is no change in behaviour. * copy-on-read, crypto, raw-format: Essentially just filter drivers that pass the request to a child node, no problem. * qcow2: The implementation modifies metadata, so it needs to hold s->lock to be safe with concurrent I/O requests. In order to avoid double locking, this requires pulling the locking out into preallocate_co() and using qcow2_write_caches() instead of bdrv_flush(). * qed: Does a single header update, this is fine without locking. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Kevin Wolf	ae5475e82f	qcow2: Fix qcow2_truncate() error return value If qcow2_alloc_clusters_at() returns an error, we do need to negate it to get back the positive errno code for error_setg_errno(), but we still need to return the negative error code. Fixes: `772d1f973f` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-29 14:20:56 +02:00
Markus Armbruster	e6af90f3fb	block/crypto: Pacify Coverity after commit `f853465aac` Coverity can't see that qobject_input_visitor_new_flat_confused() returns non-null when it doesn't set @local_err. Check the return value instead, like all the other callers do. Fixes: CID 1393615 Fixes: CID 1393616 Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-29 14:20:56 +02:00
Nishanth Aravamudan	ed6e216171	linux-aio: properly bubble up errors from initialization laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-27 13:06:34 +01:00
Anton Nefedov	29cd0403f1	qapi: remove empty flat union branches and types Flat unions may now have uncovered branches, so it is possible to get rid of empty types defined for that purpose only. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-06-22 16:33:46 +02:00
Max Reitz	481debaa32	block/mirror: Add copy mode QAPI interface This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:05:16 +02:00
Max Reitz	d06107ade0	block/mirror: Add active mirroring This patch implements active synchronous mirroring. In active mode, the passive mechanism will still be in place and is used to copy all initially dirty clusters off the source disk; but every write request will write data both to the source and the target disk, so the source cannot be dirtied faster than data is mirrored to the target. Also, once the block job has converged (BLOCK_JOB_READY sent), source and target are guaranteed to stay in sync (unless an error occurs). Active mode is completely optional and currently disabled at runtime. A later patch will add a way for users to enable it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:05:15 +02:00
Max Reitz	429076e88d	block/mirror: Add MirrorBDSOpaque This will allow us to access the block job data when the mirror block driver becomes more complex. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:59 +02:00
Max Reitz	72d10a9421	block/dirty-bitmap: Add bdrv_dirty_iter_next_area This new function allows to look for a consecutively dirty area in a dirty bitmap. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:57 +02:00
Max Reitz	a33fbb4f8b	hbitmap: Add @advance param to hbitmap_iter_next() This new parameter allows the caller to just query the next dirty position without moving the iterator. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:55 +02:00
Max Reitz	138f9fffb8	block/mirror: Use source as a BdrvChild With this, the mirror_top_bs is no longer just a technically required node in the BDS graph but actually represents the block job operation. Also, drop MirrorBlockJob.source, as we can reach it through mirror_top_bs->backing. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:54 +02:00
Max Reitz	1181e19a6d	block/mirror: Wait for in-flight op conflicts This patch makes the mirror code differentiate between simply waiting for any operation to complete (mirror_wait_for_free_in_flight_slot()) and specifically waiting for all operations touching a certain range of the virtual disk to complete (mirror_wait_on_conflicts()). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:53 +02:00
Max Reitz	12aa40822d	block/mirror: Use CoQueue to wait on in-flight ops Attach a CoQueue to each in-flight operation so if we need to wait for any we can use it to wait instead of just blindly yielding and hoping for some operation to wake us. A later patch will use this infrastructure to allow requests accessing the same area of the virtual disk to specifically wait for each other. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:52 +02:00
Max Reitz	2e1990b26e	block/mirror: Convert to coroutines In order to talk to the source BDS (and maybe in the future to the target BDS as well) directly, we need to convert our existing AIO requests into coroutine I/O requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:48 +02:00
Max Reitz	4295c5fc61	block/mirror: Pull out mirror_perform() When converting mirror's I/O to coroutines, we are going to need a point where these coroutines are created. mirror_perform() is going to be that point. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-18 17:04:43 +02:00
Greg Kurz	f45280cbf6	block: fix QEMU crash with scsi-hd and drive_del Removing a drive with drive_del while it is being used to run an I/O intensive workload can cause QEMU to crash. An AIO flush can yield at some point: blk_aio_flush_entry() blk_co_flush(blk) bdrv_co_flush(blk->root->bs) ... qemu_coroutine_yield() and let the HMP command to run, free blk->root and give control back to the AIO flush: hmp_drive_del() blk_remove_bs() bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) bdrv_replace_child(blk->root, NULL) blk->root->bs = NULL g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) bdrv_delete(child_bs) bdrv_close() bdrv_drained_begin() bdrv_do_drained_begin() bdrv_drain_recurse() aio_poll() ... qemu_coroutine_switch() and the AIO flush completion ends up dereferencing blk->root: blk_aio_complete() scsi_aio_complete() blk_get_aio_context(blk) bs = blk_bs(blk) ie, bs = blk->root ? blk->root->bs : NULL ^^^^^ stale The problem is that we should avoid making block driver graph changes while we have in-flight requests. Let's drain all I/O for this BB before calling bdrv_root_unref_child(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	0f12264e7a	block: Allow graph changes in bdrv_drain_all_begin/end sections bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	6cd5c9d7b2	block: ignore_bds_parents parameter for drain functions In the future, bdrv_drained_all_begin/end() will drain all invidiual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	c8ca33d06d	block: Move bdrv_drain_all_begin() out of coroutine context Before we can introduce a single polling loop for all nodes in bdrv_drain_all_begin(), we must make sure to run it outside of coroutine context like we already do for bdrv_do_drained_begin(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	0109e7e6f8	block: Defer .bdrv_drain_begin callback to polling phase We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're done with propagating the drain through the graph and are doing the single final BDRV_POLL_WHILE(). Just schedule the coroutine with the callback and increase bs->in_flight to make sure that the polling phase will wait for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	dcf94a23b1	block: Don't poll in parent drain callbacks bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	fe4f0614ef	block: Drain recursively with a single BDRV_POLL_WHILE() Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	d30b8e64b7	block: Remove bdrv_drain_recurse() For bdrv_drain(), recursively waiting for child node requests is pointless because we didn't quiesce their parents, so new requests could come in anyway. Letting the function work only on a single node makes it more consistent. For subtree drains and drain_all, we already have the recursion in bdrv_do_drained_begin(), so the extra recursion doesn't add anything either. Remove the useless code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	89bd030533	block: Really pause block jobs on drain We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	1cc8e54ada	block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Commit `91af091f92` added an additional aio_poll() to BDRV_POLL_WHILE() in order to make sure that all pending BHs are executed on drain. This was the wrong place to make the fix, as it is useless overhead for all other users of the macro and unnecessarily complicates the mechanism. This patch effectively reverts said commit (the context has changed a bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the loop condition for drain. The effect is probably hard to measure in any real-world use case because actual I/O will dominate, but if I run only the initialisation part of 'qemu-img convert' where it calls bdrv_block_status() for the whole image to find out how much data there is copy, this phase actually needs only roughly half the time after this patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	c13ad59f01	block: Don't manually poll in bdrv_drain_all() All involved nodes are already idle, we called bdrv_do_drain_begin() on them. The comment in the code suggested that this was not correct because the completion of a request on one node could spawn a new request on a different node (which might have been drained before, so we wouldn't drain the new request). In reality, new requests to different nodes aren't spawned out of nothing, but only in the context of a parent request, and they aren't submitted to random nodes, but only to child nodes. As long as we still poll for the completion of the parent request (which we do), draining each root node separately is good enough. Remove the additional polling code from bdrv_drain_all_begin() and replace it with an assertion that all nodes are already idle after we drained them separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	7d40d9ef9d	block: Remove 'recursive' parameter from bdrv_drain_invoke() All callers pass false for the 'recursive' parameter now. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	79ab8b21dc	block: Use bdrv_do_drain_begin/end in bdrv_drain_all() bdrv_do_drain_begin/end() implement already everything that bdrv_drain_all_begin/end() need and currently still do manually: Disable external events, call parent drain callbacks, call block driver callbacks. It also does two more things: The first is incrementing bs->quiesce_counter. bdrv_drain_all() already stood out in the test case by behaving different from the other drain variants. Adding this is not only safe, but in fact a bug fix. The second is calling bdrv_drain_recurse(). We already do that later in the same function in a loop, so basically doing an early first iteration doesn't hurt. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	bb67568954	test-bdrv-drain: bdrv_drain() works with cross-AioContext events As long as nobody keeps the other I/O thread from working, there is no reason why bdrv_drain() wouldn't work with cross-AioContext events. The key is that the root request we're waiting for is in the AioContext we're polling (which it always is for bdrv_drain()) so that aio_poll() is woken up in the end. Add a test case that shows that it works. Remove the comment in bdrv_drain() that claims otherwise. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-18 15:03:25 +02:00
Kevin Wolf	b008326744	block: Remove deprecated -drive option serial The -drive option serial was deprecated in QEMU 2.10. It's time to remove it. Tests need to be updated to set the serial number with -global instead of using the -drive option. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	d083f954a9	rbd: New parameter key-secret Legacy -drive supports "password-secret" parameter that isn't available with -blockdev / blockdev-add. That's because we backed out our first try to provide it there due to interface design doubts, in commit `577d8c9a81`, v2.9.0. This is the second try. It brings back the parameter, except it's named "key-secret" now. Let's review our reasons for backing out the first try, as stated in the commit message: * BlockdevOptionsRbd member @password-secret isn't actually a password, it's a key generated by Ceph. Addressed by the rename. * We're not sure where member @password-secret belongs (see the previous commit). See previous commit. * How @password-secret interacts with settings from a configuration file specified with @conf is undocumented. Not actually true, the documentation for @conf says "Values in the configuration file will be overridden by options specified via QAPI", and we've tested this. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	a3699de4dd	rbd: New parameter auth-client-required Parameter auth-client-required lets you configure authentication methods. We tried to provide that in v2.9.0, but backed out due to interface design doubts (commit `464444fcc1`). This commit is similar to what we backed out, but simpler: we use a list of enumeration values instead of a list of objects with a member of enumeration type. Let's review our reasons for backing out the first try, as stated in the commit message: * The implementation uses deprecated rados_conf_set() key "auth_supported". No biggie. Fixed: we use "auth-client-required". * The implementation makes -drive silently ignore invalid parameters "auth" and "auth-supported..X" where X isn't "auth". Fixable (in fact I'm going to fix similar bugs around parameter server), so again no biggie. That fix is commit `2836284db6`. This commit doesn't bring the bugs back. BlockdevOptionsRbd member @password-secret applies only to authentication method cephx. Should it be a variant member of RbdAuthMethod? We've had time to ponder, and we decided to stick to the way Ceph configuration works: the key configured separately, and silently ignored if the authentication method doesn't use it. * BlockdevOptionsRbd member @user could apply to both methods cephx and none, but I'm not sure it's actually used with none. If it isn't, should it be a variant member of RbdAuthMethod? Likewise. * The client offers a set of authentication methods, not a list. Should the methods be optional members of BlockdevOptionsRbd instead of members of list @auth-supported? The latter begs the question what multiple entries for the same method mean. Trivial question now that RbdAuthMethod contains nothing but @type, but less so when RbdAuthMethod acquires other members, such the ones discussed above. Again, we decided to stick to the way Ceph configuration works, except we make auth-client-required a list of enumeration values instead of a string containing keywords separated by delimiters. * How BlockdevOptionsRbd member @auth-supported interacts with settings from a configuration file specified with @conf is undocumented. I suspect it's untested, too. Not actually true, the documentation for @conf says "Values in the configuration file will be overridden by options specified via QAPI", and we've tested this. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	f853465aac	block: Make remaining uses of qobject input visitor more robust Remaining uses of qobject_input_visitor_new_keyval() in the block subsystem: * block_crypto_open_opts_init() Currently doesn't visit any non-string scalars, thus safe. It's called from - block_crypto_open_luks() Creates the QDict with qemu_opts_to_qdict_filtered(), which creates only string scalars, but has a TODO asking for other types. - qcow_open() - qcow2_open(), qcow2_co_invalidate_cache(), qcow2_reopen_prepare() * block_crypto_create_opts_init(), called from - block_crypto_co_create_opts_luks() Also creates the QDict with qemu_opts_to_qdict_filtered(). * vdi_co_create_opts() Also creates the QDict with qemu_opts_to_qdict_filtered(). Replace these uses by qobject_input_visitor_new_flat_confused() for robustness. This adds crumpling. Right now, that's a no-op, but if we ever extend these things in non-flat ways, crumpling will be needed. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	af91062ee1	block: Factor out qobject_input_visitor_new_flat_confused() Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	92adf9dbcd	block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts() The following pattern occurs in the .bdrv_co_create_opts() methods of parallels, qcow, qcow2, qed, vhdx and vpc: qobj = qdict_crumple_for_keyval_qiv(qdict, errp); qobject_unref(qdict); qdict = qobject_to(QDict, qobj); if (qdict == NULL) { ret = -EINVAL; goto done; } v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); [...] ret = 0; done: qobject_unref(qdict); [...] return ret; If qobject_to() fails, we return failure without setting errp. That's wrong. As far as I can tell, it cannot fail here. Clean it up anyway, by removing the useless conversion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	374c52467a	block: Fix -drive for certain non-string scalars The previous commit fixed -blockdev breakage due to misuse of the qobject input visitor's keyval flavor in bdrv_file_open(). The commit message explain why using the plain flavor would be just as wrong; it would break -drive. Turns out we break it in three places: nbd_open(), sd_open() and ssh_file_open(). They are even marked FIXME. Example breakage: $ qemu-system-x86 -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off qemu-system-x86: -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off: Invalid parameter type for 'numeric', expected: boolean Fix it the same way: replace qdict_crumple() by qdict_crumple_for_keyval_qiv(), and switch from plain to the keyval flavor. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	e5af0da1dc	block: Fix -blockdev for certain non-string scalars Configuration flows through the block subsystem in a rather peculiar way. Configuration made with -drive enters it as QemuOpts. Configuration made with -blockdev / blockdev-add enters it as QAPI type BlockdevOptions. The block subsystem uses QDict, QemuOpts and QAPI types internally. The precise flow is next to impossible to explain (I tried for this commit message, but gave up after wasting several hours). What I can explain is a flaw in the BlockDriver interface that leads to this bug: $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234 qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid QMP blockdev-add is broken the same way. Here's what happens. The block layer passes configuration represented as flat QDict (with dotted keys) to BlockDriver methods .bdrv_file_open(). The QDict's members are typed according to the QAPI schema. nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with qdict_crumple() and a qobject input visitor. This visitor comes in two flavors. The plain flavor requires scalars to be typed according to the QAPI schema. That's the case here. The keyval flavor requires string scalars. That's not the case here. nfs_file_open() uses the latter, and promptly falls apart for members @user, @group, @tcp-syn-count, @readahead-size, @page-cache-size, @debug. Switching to the plain flavor would fix -blockdev, but break -drive, because there the scalars arrive in nfs_file_open() as strings. The proper fix would be to replace the QDict by QAPI type BlockdevOptions in the BlockDriver interface. Sadly, that's beyond my reach right now. Next best would be to fix the block layer to always pass correctly typed QDicts to the BlockDriver methods. Also beyond my reach. What I can do is throw another hack onto the pile: have nfs_file_open() convert all members to string, so use of the keyval flavor actually works, by replacing qdict_crumple() by new function qdict_crumple_for_keyval_qiv(). The pattern "pass result of qdict_crumple() to qobject_input_visitor_new_keyval()" occurs several times more: * qemu_rbd_open() Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only string members, its only a latent bug. Fix it anyway. * parallels_co_create_opts(), qcow_co_create_opts(), qcow2_co_create_opts(), bdrv_qed_co_create_opts(), sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts() These work, because they create the QDict with qemu_opts_to_qdict_filtered(), which creates only string scalars. The function sports a TODO comment asking for better typing; that's going to be fun. Use qdict_crumple_for_keyval_qiv() to be safe. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Max Reitz	609f45ea95	block: Add block-specific QDict header There are numerous QDict functions that have been introduced for and are used only by the block layer. Move their declarations into an own header file to reflect that. While qdict_extract_subqdict() is in fact used outside of the block layer (in util/qemu-config.c), it is still a function related very closely to how the block layer works with nested QDicts, namely by sometimes flattening them. Therefore, its declaration is put into this header as well and util/qemu-config.c includes it with a comment stating exactly which function it needs. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20180509165530.29561-7-mreitz@redhat.com> [Copyright note tweaked, superfluous includes dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	deadbb8ebb	iscsi: Drop deprecated -drive parameter "filename" Parameter "filename" is deprecated since commit `5c3ad1a6a8`, v2.10.0. Time to get rid of it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Markus Armbruster	bb9f762ff3	rbd: Drop deprecated -drive parameter "filename" Parameter "filename" is deprecated since commit `91589d9e5c`, v2.10.0. Time to get rid of it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-06-15 14:49:44 +02:00
Vladimir Sementsov-Ogievskiy	b598e531f1	qapi: add x-block-dirty-bitmap-merge Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180606182449.1607-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-06-11 14:53:32 -04:00
Vladimir Sementsov-Ogievskiy	8b1402ce80	block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap Add locks and remove comments about BQL accordingly to dirty_bitmap_mutex definition in block_int.h. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180606182449.1607-2-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-06-11 14:53:31 -04:00
Paolo Bonzini	b133c27f5d	block: simplify code around releasing bitmaps QLIST_REMOVE does not require walking the list, and once the "bitmap" argument is removed from bdrv_do_release_matching_dirty_bitmap_locked the code simplifies a lot and it is worth inlining everything in the callers of bdrv_do_release_matching_dirty_bitmap. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180326104037.6894-1-pbonzini@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-06-11 14:53:31 -04:00
Paolo Bonzini	ab41fc4853	block: remove bdrv_dirty_bitmap_make_anon All this function is doing will be repeated by bdrv_do_release_matching_dirty_bitmap_locked, except resetting bm->persistent. But even that does not matter because the bitmap will be freed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180323164254.26487-1-pbonzini@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-06-11 14:53:31 -04:00
Max Reitz	ddf3b47ef4	qcow2: Do not mark inactive images corrupt When signaling a corruption on a read-only image, qcow2 already makes fatal events non-fatal (i.e., they will not result in the image being closed, and the image header's corrupt flag will not be set). This is necessary because we cannot set the corrupt flag on read-only images, and it is possible because further corruption of read-only images is impossible. Inactive images are effectively read-only, too, so we should do the same for them. bdrv_is_writable() can tell us whether an image can actually be written to, so use its result instead of !bs->read_only. (Otherwise, the assert(!(bs->open_flags & BDRV_O_INACTIVE)) in bdrv_co_pwritev() will fail, crashing qemu.) Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180606193702.7113-3-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Alberto Garcia	bc33c047d1	throttle: Fix crash on reopen The throttle block filter can be reopened, and with this it is possible to change the throttle group that the filter belongs to. The way the code does that is the following: - On throttle_reopen_prepare(): create a new ThrottleGroupMember and attach it to the new throttle group. - On throttle_reopen_commit(): detach the old ThrottleGroupMember, delete it and replace it with the new one. The problem with this is that by replacing the ThrottleGroupMember the previous value of io_limits_disabled is lost, causing an assertion failure in throttle_co_drain_end(). This problem can be reproduced by reopening a throttle node: $QEMU -monitor stdio -object throttle-group,id=tg0,x-iops-total=1000 \ -blockdev node-name=hd0,driver=qcow2,file.driver=file,file.filename=hd.qcow2 \ -blockdev node-name=root,driver=throttle,throttle-group=tg0,file=hd0,read-only=on (qemu) block_stream root block/throttle.c:214: throttle_co_drain_end: Assertion `tgm->io_limits_disabled' failed. Since we only want to change the throttle group on reopen there's no need to create a ThrottleGroupMember and discard the old one. It's easier if we simply detach it from its current group and attach it to the new one. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20180608151536.7378-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Vladimir Sementsov-Ogievskiy	7eb24009db	block/qcow2-bitmap: fix free_bitmap_clusters This assert may fail, because bitmap_table is not initialized. Just drop it, as it's obvious, that bitmap_table_load sets bitmap_table parameter only when returning zero. Reported-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180608101225.2575-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Max Reitz	3cce51c919	qcow2: Repair OFLAG_COPIED when fixing leaks Repairing OFLAG_COPIED is usually safe because it is done after the refcounts have been repaired. Therefore, it we did not find anyone else referencing a data or L2 cluster, it makes no sense to not set OFLAG_COPIED -- and the other direction (clearing OFLAG_COPIED) is always safe, anyway, it may just induce leaks. Furthermore, if OFLAG_COPIED is actually consistent with a wrong (leaky) refcount, we will decrement the refcount with -r leaks, but OFLAG_COPIED will then be wrong. qemu-img check should not produce images that are more corrupted afterwards then they were before. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1527085 Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20180509200059.31125-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Max Reitz	d1402b5026	block: Add Error parameter to bdrv_amend_options Looking at the qcow2 code that is riddled with error_report() calls, this is really how it should have been from the start. Along the way, turn the target_version/current_version comparisons at the beginning of qcow2_downgrade() into assertions (the caller has to make sure these conditions are met), and rephrase the error message on using compat=1.1 to get refcount widths other than 16 bits. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509210023.20283-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Max Reitz	b8cf1913a9	block/file-posix: File locking during creation When creating a file, we should take the WRITE and RESIZE permissions. We do not need either for the creation itself, but we do need them for clearing and resizing it. So we can take the proper permissions by replacing O_TRUNC with an explicit truncation to 0, and by taking the appropriate file locks between those two steps. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509215336.31304-3-mreitz@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Max Reitz	d0a96155de	block/file-posix: Pass FD to locking helpers raw_apply_lock_bytes() and raw_check_lock_bytes() currently take a BDRVRawState *, but they only use the lock_fd field. During image creation, we do not have a BDRVRawState, but we do have an FD; so if we want to reuse the functions there, we should modify them to receive only the FD. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180509215336.31304-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-06-11 16:18:45 +02:00
Paolo Bonzini	68acc99f14	sheepdog: remove huge BSS object block/sheepdog.o has a 4M static variable that is 90% of QEMU's whole .bss section. Replace it with a heap-allocated block, and make it smaller too since only the inode header is actually being used. bss size goes down from 4464280 to 269976. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180523160721.14018-3-pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-06-05 10:15:12 -04:00
Paolo Bonzini	03b036cc0c	sheepdog: cleanup repeated expression The expression "SD_INODE_SIZE - sizeof(inode.data_vdi_id)" already has a macro defined for the same value (though with a nicer definition using offsetof). Replace it. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180523160721.14018-2-pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-06-05 10:15:12 -04:00
Peter Maydell	0d514fa234	Pull request * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2) If the underlying storage supports copy offloading, qemu-img convert will use it instead of performing reads and writes. This avoids data transfers and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY and Linux copy_file_range(2) are used to implement this optimization. * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbFSBoAAoJEJykq7OBq3PISpEIAIcMao4/rzinAWXzS+ncK9LO 6FtRVpgutpHaWX2ayySaz5n2CdR3cNMrpCI7sjY2Kw0lrdkqxPgl5n0SWD+VCl4W 7+JLz/uF0iUV8X+99e7WGAjZbm9LSlxgn5AQKfrrwyPf0ZfzoYQ5nBMcQ6xjEeQP 48j2WqJqN9/u8RBD07o11yn0+CE5g56/f12xVjR5ASVodzsAmcZ2OQRMQbM01isU 1mBekJQkDxJkt5l13Rql8+t+vWz8/9BEW2c/eIDKvoayMqYJpdfKv4DqLloIuHnc 3RkquA0zUuKtl7xEnEkH/We7fi4QPGW/vyBN7ychS/zKzZFQrXmwqrAuFSw3dKU= =vZp+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging Pull request * Copy offloading for qemu-img convert (iSCSI, raw, and qcow2) If the underlying storage supports copy offloading, qemu-img convert will use it instead of performing reads and writes. This avoids data transfers and thus frees up storage bandwidth for other purposes. SCSI EXTENDED COPY and Linux copy_file_range(2) are used to implement this optimization. * Drop spurious "WARNING: I\/O thread spun for 1000 iterations" warning # gpg: Signature made Mon 04 Jun 2018 12:20:08 BST # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: main-loop: drop spin_counter qemu-img: Convert with copy offloading block-backend: Add blk_co_copy_range iscsi: Implement copy offloading iscsi: Create and use iscsi_co_wait_for_task iscsi: Query and save device designator when opening file-posix: Implement bdrv_co_copy_range qcow2: Implement copy offloading raw: Implement copy offloading raw: Check byte range uniformly block: Introduce API for copy offloading Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 18:34:04 +01:00
Peter Maydell	f67c9b693a	acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM= =3+V0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 10:15:16 +01:00
Fam Zheng	b5679fa49c	block-backend: Add blk_co_copy_range It's a BlockBackend wrapper of the BDS interface. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-10-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	604dfaaa32	iscsi: Implement copy offloading Issue EXTENDED COPY (LID1) command to implement the copy_range API. The parameter data construction code is modified from libiscsi's iscsi-dd.c. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-9-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	66e75c03b2	iscsi: Create and use iscsi_co_wait_for_task This loop is repeated a growing number times. Make a helper. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20180601092648.24614-8-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	cc9743c236	iscsi: Query and save device designator when opening The device designator data returned in INQUIRY command will be useful to fill in source/target fields during copy offloading. Do this when connecting to the target and save the data for later use. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-7-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	1efad060d7	file-posix: Implement bdrv_co_copy_range With copy_file_range(2), we can implement the bdrv_co_copy_range semantics. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-6-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	fd9fcd37a8	qcow2: Implement copy offloading The two callbacks are implemented quite similarly to the read/write functions: bdrv_co_copy_range_from maps for read and calls into bs->file or bs->backing depending on the allocation status; bdrv_co_copy_range_to maps for write and calls into bs->file. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-5-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:48 +01:00
Fam Zheng	72d219e2f9	raw: Implement copy offloading Just pass down to ->file. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-4-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:47 +01:00
Fam Zheng	3844553852	raw: Check byte range uniformly We don't verify the request range against s->size in the I/O callbacks except for raw_co_pwritev. This is inconsistent (especially for raw_co_pwrite_zeroes and raw_co_pdiscard), so fix them, in the meanwhile make the helper reusable by the coming new callbacks. Note that in most cases the block layer already verifies the request byte range against our reported image length, before invoking the driver callbacks. The exception is during image creating, after blk_set_allow_write_beyond_eof(blk, true) is called. But in that case, the requests are not directly from the user or guest. So there is no visible behavior change in adding the check code. The int64_t -> uint64_t inconsistency, as shown by the type casting, is pre-existing due to the interface. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-3-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:47 +01:00
Fam Zheng	fcc6767836	block: Introduce API for copy offloading Introduce the bdrv_co_copy_range() API for copy offloading. Block drivers implementing this API support efficient copy operations that avoid reading each block from the source device and writing it to the destination devices. Examples of copy offload primitives are SCSI EXTENDED COPY and Linux copy_file_range(2). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-2-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-01 14:41:47 +01:00
Michael S. Tsirkin	0d8c41dae5	block: use local path for local headers When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2018-05-31 04:16:06 +03:00
Kevin Wolf	3fb588a0f2	block/create: Mark blockdev-create stable We're ready to declare the blockdev-create job stable. This renames the corresponding QMP command from x-blockdev-create to blockdev-create. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-30 13:31:18 +02:00
Kevin Wolf	e5ab4347f9	block/create: Make x-blockdev-create a job This changes the x-blockdev-create QMP command so that it doesn't block the monitor and the main loop any more, but starts a background job that performs the image creation. The basic job as implemented here is all that is necessary to make image creation asynchronous and to provide a QMP interface that can be marked stable, but it still lacks a few features that jobs usually provide: The job will ignore pause commands and it doesn't publish more than very basic progress yet (total-progress is 1 and current-progress advances from 0 to 1 when the driver callbacks returns). These features can be added later without breaking compatibility. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-30 13:31:07 +02:00
Kevin Wolf	1266c9b9f5	job: Add error message for failing jobs So far we relied on job->ret and strerror() to produce an error message for failed jobs. Not surprisingly, this tends to result in completely useless messages. This adds a Job.error field that can contain an error string for a failing job, and a parameter to job_completed() that sets the field. As a default, if NULL is passed, we continue to use strerror(job->ret). All existing callers are changed to pass NULL. They can be improved in separate patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-30 13:31:01 +02:00
Kevin Wolf	4a5f2779ba	vhdx: Fix vhdx_co_create() return value .bdrv_co_create() is supposed to return 0 on success, but vhdx could return a positive value instead. Fix this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-29 22:19:57 +02:00
Kevin Wolf	53618dd838	vdi: Fix vdi_co_do_create() return value .bdrv_co_create() is supposed to return 0 on success, but vdi could return a positive value instead. Fix this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-05-29 22:19:57 +02:00
Alberto Garcia	7af5eea9b3	qcow2: Fix Coverity warning when calculating the refcount cache size MIN_REFCOUNT_CACHE_SIZE is 4 and the cluster size is guaranteed to be at most 2MB, so the minimum refcount cache size (in bytes) is always going to fit in a 32-bit integer. Coverity doesn't know that, and since we're storing the result in a uint64_t (*refcount_cache_size) it thinks that we need the 64 bits and that we probably want to do a 64-bit multiplication to prevent the result from being truncated. This is a false positive in this case, but it's a fair warning. We could do a 64-bit multiplication to get rid of it, but since we know that a 32-bit variable is enough to store this value let's simply reuse min_refcount_cache, make it a normal int and stop doing casts. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-29 20:09:17 +02:00
Kevin Wolf	30a5c887bf	job: Move progress fields to Job BlockJob has fields .offset and .len, which are actually misnomers today because they are no longer tied to block device sizes, but just progress counters. As such they make a lot of sense in generic Jobs. This patch moves the fields to Job and renames them to .progress_current and .progress_total to describe their function better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	2e1795b581	job: Add job_transition_to_ready() The transition to the READY state was still performed in the BlockJob layer, in the same function that sent the BLOCK_JOB_READY QMP event. This patch brings the state transition to the Job layer and implements the QMP event using a notifier called from the Job layer, like we already do for other events related to state transitions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	198c49cc8d	job: Add job_yield() This moves block_job_yield() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	3d70ff53b6	job: Move completion and cancellation to Job This moves the top-level job completion and cancellation functions from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-23 14:30:51 +02:00
Kevin Wolf	62c9e4162a	job: Switch transactions to JobTxn This doesn't actually move any transaction code to Job yet, but it renames the type for transactions from BlockJobTxn to JobTxn and makes them contain Jobs rather than BlockJobs Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	6a74c075ac	job: Move job_finish_sync() to Job block_job_finish_sync() doesn't contain anything block job specific any more, so it can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	3453d97243	job: Move .complete callback to Job This moves the .complete callback that tells a READY job to complete from BlockJobDriver to JobDriver. The wrapper function job_complete() doesn't require anything block job specific any more and can be moved to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b69f777dd9	job: Add job_drain() block_job_drain() contains a blk_drain() call which cannot be moved to Job, so add a new JobDriver callback JobDriver.drain which has a common implementation for all BlockJobs. In addition to this we keep the existing BlockJobDriver.drain callback that is called by the common drain implementation for all block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	004e95df98	job: Convert block_job_cancel_async() to Job block_job_cancel_async() did two things that were still block job specific: * Setting job->force. This field makes sense on the Job level, so we can just move it. While at it, rename it to job->force_cancel to make its purpose more obvious. * Resetting the I/O status. This can't be moved because generic Jobs don't have an I/O status. What the function really implements is a user resume, except without entering the coroutine. Consequently, it makes sense to call the .user_resume driver callback here which already resets the I/O status. The old block_job_cancel_async() has two separate if statements that check job->iostatus != BLOCK_DEVICE_IO_STATUS_OK and job->user_paused. However, the former condition always implies the latter (as is asserted in block_job_iostatus_reset()), so changing the explicit call of block_job_iostatus_reset() on the former condition with the .user_resume callback on the latter condition is equivalent and doesn't need to access any BlockJob specific state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	4ad351819b	job: Move single job finalisation to Job This moves the finalisation of a single job from BlockJob to Job. Some part of this code depends on job transactions, and job transactions call this code, we introduce some temporary calls from Job functions to BlockJob ones. This will be fixed once transactions move to Job, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	bb02b65c7d	job: Move BlockJobCreateFlags to Job This renames the BlockJobCreateFlags constants, moves a few JOB_INTERNAL checks to job_create() and the auto_{finalize,dismiss} fields from BlockJob to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	b15de82867	job: Move pause/resume functions to Job While we already moved the state related to job pausing to Job, the functions to do were still BlockJob only. This commit moves them over to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	5d43e86e11	job: Add job_sleep_ns() There is nothing block layer specific about block_job_sleep_ns(), so move the function to Job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	da01ff7f38	job: Move coroutine and related code to Job This commit moves some core functions for dealing with the job coroutine from BlockJob to Job. This includes primarily entering the coroutine (both for the first and reentering) and yielding explicitly and at pause points. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	1908a5590c	job: Move defer_to_main_loop to Job Move the defer_to_main_loop functionality from BlockJob to Job. The code can be simplified because we can use job->aio_context in job_defer_to_main_loop_bh() now, instead of having to access the BlockDriverState. Probably taking the data->aio_context lock in addition was already unnecessary in the old code because we didn't actually make use of anything protected by the old AioContext except getting the new AioContext, in case it changed between scheduling the BH and running it. But it's certainly unnecessary now that the BDS isn't accessed at all any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:50 +02:00
Kevin Wolf	daa7f2f946	job: Move cancelled to Job We cannot yet move the whole logic around job cancelling to Job because it depends on quite a few other things that are still only in BlockJob, but we can move the cancelled field at least. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	80fa2c756b	job: Add reference counting This moves reference counting from BlockJob to Job. In order to keep calling the BlockJob cleanup code when the job is deleted via job_unref(), introduce a new JobDriver.free callback. Every block job must use block_job_free() for this callback, this is asserted in block_job_create(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	a50c2ab858	job: Move state transitions to Job This moves BlockJob.status and the closely related functions (block_)job_state_transition() and (block_)job_apply_verb to Job. The two QAPI enums are renamed to JobStatus and JobVerb. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	252291eaea	job: Add JobDriver.job_type This moves the job_type field from BlockJobDriver to JobDriver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	8e4c87000f	job: Rename BlockJobType into JobType QAPI types aren't externally visible, so we can rename them without causing problems. Before we add a job type to Job, rename the enum so it can be used for more than just block jobs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Kevin Wolf	33e9e9bd62	job: Create Job, JobDriver and job_create() This is the first step towards creating an infrastructure for generic background jobs that aren't tied to a block device. For now, Job only stores its ID and JobDriver, the rest stays in BlockJob. The following patches will move over more parts of BlockJob to Job if they are meaningful outside the context of a block job. BlockJob.driver is now redundant, but this patch leaves it around to avoid unnecessary churn. The next patches will get rid of almost all of its uses anyway so that it can be removed later with much less churn. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-23 14:30:49 +02:00
Peter Maydell	ae8622ec19	sheepdog: Remove unnecessary NULL check in sd_prealloc() In commit `8b9ad56e9c`, we removed the code that could result in our getting to sd_prealloc()'s out_with_err_set label with a NULL blk pointer. That makes the NULL check in the error-handling path unnecessary, and Coverity gripes about it (CID `1390636`). Delete the redundant check. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-23 13:29:06 +02:00
Laurent Vivier	4a4ff4c58f	Remove unnecessary variables for function return value Re-run Coccinelle script scripts/coccinelle/return_directly.cocci Signed-off-by: Laurent Vivier <lvivier@redhat.com> ppc part Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2018-05-20 08:48:13 +03:00
Kevin Wolf	c82be42cc8	nfs: Remove processed options from QDict Commit `c22a03454` QAPIfied option parsing in the NFS block driver, but forgot to remove all the options we processed. Therefore, we get an error in bdrv_open_inherit(), which thinks the remaining options are invalid. Trying to open an NFS image will result in an error like this: Block protocol 'nfs' doesn't support the option 'server.host' Remove all options from the QDict to make the NFS driver work again. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20180516160816.26259-1-kwolf@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-16 13:37:47 -04:00
Kevin Wolf	54b7af4369	nfs: Fix error path in nfs_options_qdict_to_qapi() Don't throw away local_err, but propagate it to errp. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20180516161034.27440-1-kwolf@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-16 13:37:47 -04:00
Max Reitz	228345bf5d	block: Support BDRV_REQ_WRITE_UNCHANGED in filters Update the rest of the filter drivers to support BDRV_REQ_WRITE_UNCHANGED. They already forward write request flags to their children, so we just have to announce support for it. This patch does not cover the replication driver because that currently does not support flags at all, and because it just grabs the WRITE permission for its children when it can, so we should be fine just submitting the incoming WRITE_UNCHANGED requests as normal writes. It also does not cover format drivers for similar reasons. They all use bdrv_format_default_perms() as their .bdrv_child_perm() implementation so they just always grab the WRITE permission for their file children whenever possible. In addition, it often would be difficult to ascertain whether incoming unchanging writes end up as unchanging writes in their files. So we just leave them as normal potentially changing writes. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-7-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Max Reitz	1b1a920b71	block/quorum: Support BDRV_REQ_WRITE_UNCHANGED We just need to forward it to quorum's children (except in case of a rewrite because of corruption), but for that we first have to support flags in child requests at all. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-6-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Max Reitz	7adcf59fec	block: Set BDRV_REQ_WRITE_UNCHANGED for COR writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-5-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Max Reitz	c6035964f8	block: Add BDRV_REQ_WRITE_UNCHANGED flag This flag signifies that a write request will not change the visible disk content. With this flag set, it is sufficient to have the BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-4-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Max Reitz	6c6f24fd84	block: Add COR filter driver This adds a simple copy-on-read filter driver. It relies on the already existing COR functionality in the central block layer code, which may be moved here once we no longer need it there. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180421132929.21610-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Alberto Garcia	52253998ec	qcow2: Give the refcount cache the minimum possible size by default The L2 and refcount caches have default sizes that can be overridden using the l2-cache-size and refcount-cache-size (an additional parameter named cache-size sets the combined size of both caches). Unless forced by one of the aforementioned parameters, QEMU will set the unspecified sizes so that the L2 cache is 4 times larger than the refcount cache. This is based on the premise that the refcount metadata needs to be only a fourth of the L2 metadata to cover the same amount of disk space. This is incorrect for two reasons: a) The amount of disk covered by an L2 table depends solely on the cluster size, but in the case of a refcount block it depends on the cluster size and the width of each refcount entry. The 4/1 ratio is only valid with 16-bit entries (the default). b) When we talk about disk space and L2 tables we are talking about guest space (L2 tables map guest clusters to host clusters), whereas refcount blocks are used for host clusters (including L1/L2 tables and the refcount blocks themselves). On a fully populated (and uncompressed) qcow2 file, image size > virtual size so there are more refcount entries than L2 entries. Problem (a) could be fixed by adjusting the algorithm to take into account the refcount entry width. Problem (b) could be fixed by increasing a bit the refcount cache size to account for the clusters used for qcow2 metadata. However this patch takes a completely different approach and instead of keeping a ratio between both cache sizes it assigns as much as possible to the L2 cache and the remainder to the refcount cache. The reason is that L2 tables are used for every single I/O request from the guest and the effect of increasing the cache is significant and clearly measurable. Refcount blocks are however only used for cluster allocation and internal snapshots and in practice are accessed sequentially in most cases, so the effect of increasing the cache is negligible (even when doing random writes from the guest). So, make the refcount cache as small as possible unless the user explicitly asks for a larger one. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Alberto Garcia	74c44a5934	Fix error message about compressed clusters with OFLAG_COPIED Compressed clusters are not supposed to have the COPIED bit set. "qemu-img check" detects that and prints an error message reporting the number of the affected host cluster. This doesn't make much sense because compressed clusters are not aligned to host clusters, so it would be better to report the offset instead. Plus, the calculation is wrong and it uses the raw L2 entry as if it was simply an offset. This patch fixes the error message and reports the offset of the compressed cluster. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 0f687957feb72e80c740403191a47e607c2463fe.1523376013.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-05-15 16:15:21 +02:00
Kevin Wolf	bd21935b50	blockjob: Add block_job_driver() The backup block job directly accesses the driver field in BlockJob. Add a wrapper for getting it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	dee81d5111	blockjob: Introduce block_job_ratelimit_get_delay() This gets us rid of more direct accesses to BlockJob fields from the job drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	18bb69287e	blockjob: Implement block_job_set_speed() centrally All block job drivers support .set_speed and all of them duplicate the same code to implement it. Move that code to blockjob.c and remove the now useless callback. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	f05fee508f	blockjob: Move RateLimit to BlockJob Every block job has a RateLimit, and they all do the exact same thing with it, so it should be common infrastructure. Move the struct field for a start. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:50 +02:00
Kevin Wolf	05df8a6a2b	blockjob: Wrappers for progress counter access Block job drivers are not expected to mess with the internals of the BlockJob object, so provide wrapper functions for one of the cases where they still do it: Updating the progress counter. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>	2018-05-15 16:11:49 +02:00
Eric Blake	e18a58b4e3	block: Merge .bdrv_co_writev{,_flags} in drivers We have too many driver callback interfaces; simplify the mess somewhat by merging the flags parameter of .bdrv_co_writev_flags() into .bdrv_co_writev(). Note that as long as a driver doesn't set .supported_write_flags, the flags argument will be 0 and behavior is identical. Also note that the public function bdrv_co_writev() still lacks a flags argument; so the driver signature is thus intentionally slightly different. But that's not the end of the world, nor the first time that the driver interface differs slightly from the public interface. Ideally, we should be rewriting all of these drivers to use modern byte-based interfaces. But that's a more invasive patch to write and audit, compared to the simplification done here. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	edfab6a08b	block: Drop last of the sector-based aio callbacks We are gradually moving away from sector-based interfaces, towards byte-based. Now that all drivers with aio callbacks are using the byte-based interfaces, we can remove the sector-based versions. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	918889b291	vxhs: Switch to byte-based callbacks We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based callbacks in the vxhs driver. Note that the driver was already using byte-based calls for performing actual I/O, so this just gets rid of a round trip of scaling; however, as I don't know if VxHS is tolerant of non-sector AIO operations, I went with the conservative approach of adding .bdrv_refresh_limits to override the block layer defaults back to the pre-patch value of 512. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	e8e16d4baf	rbd: Switch to byte-based callbacks We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based callbacks in the rbd driver. Note that the driver was already using byte-based calls for performing actual I/O, so this just gets rid of a round trip of scaling; however, as I don't know if RBD is tolerant of non-sector AIO operations, I went with the conservate approach of adding .bdrv_refresh_limits to override the block layer defaults back to the pre-patch value of 512. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	b3241e9274	null: Switch to byte-based read/write We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based callbacks in the null-co and null-aio drivers. Note that since the null driver does nothing on writes, it trivially supports the BDRV_REQ_FUA flag (all writes have already landed to the same bit-bucket without needing an extra flush call). Also, since the null driver does just as well with byte-based requests, we can now avoid cycles wasted on read-modify-write by taking advantage of the block layer now defaulting the alignment to 1 instead of 512. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	de7056a3f9	file-win32: Switch to byte-based callbacks We are gradually moving away from sector-based interfaces, towards byte-based. Make the change for the last few sector-based callbacks in the file-win32 driver. Note that the driver was already using byte-based calls for performing actual I/O, so this just gets rid of a round trip of scaling; however, as I don't know if Windows is tolerant of non-sector AIO operations, I went with the conservative approach of modifying .bdrv_refresh_limits to override the block layer defaults back to the pre-patch value of 512. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Eric Blake	e31f6864a6	block: Support byte-based aio callbacks We are gradually moving away from sector-based interfaces, towards byte-based. Add new sector-based aio callbacks for read and write, to match the fact that bdrv_aio_pdiscard is already byte-based. Ideally, drivers should be converted to use coroutine callbacks rather than aio; but that is not quite as trivial (and if we were to do that conversion, the null-aio driver would disappear), so for the short term, converting the signature but keeping things with aio is easier. However, we CAN declare that a driver that uses the byte-based aio interfaces now defaults to byte-based operations, and must explicitly provide a refresh_limits override to stick with larger alignments (making the alignment issues more obvious directly in the drivers touched in the next few patches). Once all drivers are converted, the sector-based aio callbacks will be removed; in the meantime, a FIXME comment is added due to a slight inefficiency that will be touched up as part of that later cleanup. Simplify some instances of 'bs->drv' into 'drv' while touching this, since the local variable already exists to reduce typing. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Daniel Henrique Barboza	7803696d85	block-backend: simplify blk_get_aio_context blk_get_aio_context verifies if BlockDriverState bs is not NULL, return bdrv_get_aio_context(bs) if true or qemu_get_aio_context() otherwise. However, bdrv_get_aio_context from block.c already does this verification itself, also returning qemu_get_aio_context() if bs is NULL: AioContext bdrv_get_aio_context(BlockDriverState bs) { return bs ? bs->aio_context : qemu_get_aio_context(); } This patch simplifies blk_get_aio_context to simply call bdrv_get_aio_context instead of replicating the same logic. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-05-15 16:11:41 +02:00
Stefan Hajnoczi	31be8a2a97	block/file-posix: add x-check-page-cache=on\|off option mincore(2) checks whether pages are resident. Use it to verify that page cache has been dropped. You can trigger a verification failure by mmapping the image file from another process that loads a byte from a page, forcing it to become resident. bdrv_co_invalidate_cache() will fail while that process is alive. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180427162312.18583-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-05-11 16:43:05 +01:00
Stefan Hajnoczi	dd577a26ff	block/file-posix: implement bdrv_co_invalidate_cache() on Linux On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages. Use this to drop page cache on the destination host during shared storage migration. This way the destination host will read the latest copy of the data and will not use stale data from the page cache. The flow is as follows: 1. Source host writes out all dirty pages and inactivates drives. 2. QEMU_VM_EOF is sent on migration stream. 3. Destination host invalidates caches before accessing drives. This patch enables live migration even with -drive cache.direct=off. Terms and conditions may apply, please see patch for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180427162312.18583-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-05-11 16:43:05 +01:00
Kevin Wolf	a2cb9239b7	sheepdog: Fix sd_co_create_opts() memory leaks Both the option string for the 'redundancy' option and the SheepdogRedundancy object that is created accordingly could be leaked in error paths. This fixes the memory leaks. Reported by Coverity (CID 1390614 and 1390641). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20180503153509.22223-1-kwolf@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-08 10:47:27 -04:00
Max Reitz	eb36639f7b	block/mirror: Make cancel always cancel pre-READY Commit `b76e4458b1` made the mirror block job respect block-job-cancel's @force flag: With that flag set, it would now always really cancel, even post-READY. Unfortunately, it had a side effect: Without that flag set, it would now never cancel, not even before READY. Considering that is an incompatible change and not noted anywhere in the commit or the description of block-job-cancel's @force parameter, this seems unintentional and we should revert to the previous behavior, which is to immediately cancel the job when block-job-cancel is called before source and target are in sync (i.e. before the READY event). Cc: qemu-stable@nongnu.org Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1572856 Reported-by: Yanan Fu <yfu@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20180501220509.14152-2-mreitz@redhat.com Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-08 10:47:27 -04:00
Stefan Hajnoczi	ddc4115efd	block/mirror: honor ratelimit again Commit `b76e4458b1` ("block/mirror: change the semantic of 'force' of block-job-cancel") accidentally removed the ratelimit in the mirror job. Reintroduce the ratelimit but keep the block-job-cancel force=true behavior that was added in commit `b76e4458b1`. Note that block_job_sleep_ns() returns immediately when the job is cancelled. Therefore it's safe to unconditionally call block_job_sleep_ns() - a cancelled job does not sleep. This commit fixes the non-deterministic qemu-iotests 185 output. The test relies on the ratelimit to make the job sleep until the 'quit' command is processed. Previously the job could complete before the 'quit' command was received since there was no ratelimit. Cc: Liang Li <liliang.opensource@gmail.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180424123527.19168-1-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-05-08 10:47:27 -04:00
Peter Maydell	c8b7e627b4	nbd patches for 2018-05-04 - Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs - Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE - Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJa7F9jAAoJEKeha0olJ0NqwZIH/3LbaF7Q0CcuB6d+nQo3jYm2 fGIb8pV4pZLgC4D4qrelXP2Ttn0RNMLNdq2UR6F66/MAFj2/sF4gRE/p82exZRK3 be2hDQpzKOTgrF7SQN9ccI1df7nrMhvXgo1Y4rhSFZKtMTBPZirDFdAP/2xklSzI jMHE/Iq9Ng16303FR5KEiVtmAWxAaagapcvEKIrD0SpoiAX9jk6dAT5EwOHLi4Lf 2CCe/5iBFC96zLE2xJ8n+esZ1chJJp/2gubmYON/lwLx5fXqYVowywDVNzv+uc8A sg2VMjb/UySOLJ6IxdxgGdln0w46RB7u55nRnyH6LcU2IdBeladhkI7Oh/reOTI= =wnw9 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-05-04' into staging nbd patches for 2018-05-04 - Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs - Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE - Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply # gpg: Signature made Fri 04 May 2018 14:25:55 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2018-05-04: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits nbd/client: fix nbd_negotiate_simple_meta_context Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-04 14:42:46 +01:00
Eric Blake	acfd8f7a5f	nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply The NBD spec is proposing a relaxation of NBD_CMD_BLOCK_STATUS where a server may have the final extent per context give a length beyond the original request, if it can easily prove that subsequent bytes have the same status, on the grounds that a client can take advantage of this information for fewer block status requests. Since qemu 2.12 as a client always sends NBD_CMD_FLAG_REQ_ONE, and rejects a server that sends extra length, the upstream NBD spec will probably limit this behavior to clients that don't request REQ_ONE semantics; but it doesn't hurt to relax qemu to always be permissive of this server behavior, even if it continues to use REQ_ONE. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180503222626.1303410-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-05-04 08:23:39 -05:00
Marc-André Lureau	f5a74a5a50	qobject: Modify qobject_ref() to return obj For convenience and clarity, make it possible to call qobject_ref() at the time when the reference is associated with a variable, or argument, by making qobject_ref() return the same pointer as given. Use that to simplify the callers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-5-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Useless change to qobject_ref_impl() dropped, commit message improved slightly] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Marc-André Lureau	cb3e7f08ae	qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except I broke a long line, and added a cast in monitor_qmp_cleanup_req_queue_locked(). Unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, thus no need to shout them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, semantic conflict resolved, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Vladimir Sementsov-Ogievskiy	605bc8be42	qcow2: try load bitmaps only once Checking reopen by existence of some bitmaps is wrong, as it may be some other bitmaps, or on the other hand, user may remove bitmaps. This criteria is bad. To simplify things and make behavior more predictable let's just add a flag to remember, that we've already tried to load bitmaps on open and do not want do it again. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180411122606.367301-2-vsementsov@virtuozzo.com [mreitz: Changed comment wording according to Eric Blake's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-04-16 13:35:32 +02:00
Kevin Wolf	2fe4bba19b	commit/stream: Reset delay_ns Streaming and the commit block job only want to apply throttling when they actually copied data instead of skipping it, so they made the calculation of delay_ns conditional. However, delay_ns isn't reset when skipping some sectors, so instead of not waiting, the old delay is applied again. Properly reset delay_ns where needed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-04-10 16:33:47 +02:00
Jeff Cody	bfb15b4bec	block/rbd: remove processed options from qdict Commit `4bfb274` added some QAPIfication of option parsing in qemu_rbd_open(). We need to remove all the options we processed, otherwise in bdrv_open_inherit() we will think the remaining options are invalid. (This needs to go in 2.12 to avoid a regression that prevents rbd from being opened.) Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2018-04-04 12:05:13 -04:00
Peter Maydell	fd69ad866b	Block layer patches -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaw6JlAAoJEH8JsnLIjy/W8JcP/06bxQ+8056Vz/+oRAm3VEZN iFJ1w+A1cFVpaqeE1fl+nfluyT+bU4J8YIyeHQlgckFyzQz13v0cSSSpBJYfJAO9 XSO/g96kILwN+h16NNdKX2K7uewVPZgyDbNf+FzM2UHuSSXdSp0ZbeW1yXvB6cpx hk2exA0Vfxm2dq2Xzz00EzGCWdZAnWWesj9pimASQ1W0426t7PJRCrRaUVawaJfT J+IB4zsF4BJ49CvqmeFqyPfDV9q1GF3c9EkXzFgw7qYY2C4F4bAKv82AxWiKEiRI cm1bHEdVyDFuBxzMuGOhZS3g2YkJDnEbkfJ0zj5cKmmYO2Pc0q8DeAMZTvy3aobL R5UawZTgPv2O71hOlRFl/YZwMjhb80BXDq9mx7TrrhQ5/wCAUOs4e1XYHn+kzL3W ohp5r74Qj1s84pGi8sPaXWm8myxalkTicT1cKFHztb+3+DwVPQIOXhMkMqrJyAK8 3GwZsGmtzI5DCIyF3whLbElN/OHRM9zr/T+uwhRQw8bs/XYPU27f367Y+JJGAaZq xTD7vlF49XSOoSDTYcvJfODlloJODyL+CxFS/vpJKfMYgDWYQZiwvYeIEywmwIYE 2bEdgcN/9BOdYK4NRuJzDY+Vra7iVnDvXhsZ8UClp/SPLspJ54zwCny9f5rSIDJF NSiBcahA5femreYqAfWz =quyk -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Tue 03 Apr 2018 16:48:53 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests: Test abnormally large size in compressed cluster descriptor qemu-iotests: Use ppc64 qemu_arch on ppc64le host iotests: Test preallocated truncate of 2G image block/file-posix: Fix fully preallocated truncate iotests: fix 208 for luks format iotests: Update 186 after commit `ac64273c66` iotests: Update 051 and 186 after commit `1454509726` block: handle invalid lseek returns gracefully gluster: Fix blockdev-add with server.N.type=unix Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-04-04 14:00:07 +01:00
Peter Maydell	e5efa1f5f2	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaw6duAAoJEL2+eyfA3jBXoYUP/07v6FQ0E91eprL2Zg2uowA1 u6e/neVhlzZgMpOevWfnjScBYHA5wjPjM2ZbKckbnFjH60s7gVBuYtbnwcChRMY1 LXJts1LivuQOru5QIf0Fwi2W4vkPzV8rQsWp6nmnIDf/Jzs2iKEe6jy6A2enSLYM FKOrJc1ViNWFpW1y/7mV8gHIAChEC3wEL9L2JUQEvi4VQD7uCwJv0YS5Ci+xjV21 /DPaVfZA4VMNNi+FTRezW+l5m0zOxenxUnDXqpP5yjgPyTpyqfHSCCVK54HLQ2L/ 8uMblucjRRFXl6zsX2Quzwqiq80D9KxidhOKk9OU1Yi+rbDz4vSTlE/fBRHKVVln Rd8DhTK7aPG1bsiQSHCqcd4+a9qsHOn0LY7tS1m78CaVx1O8JhMBe5Y+NJEMR8AM iiWss8QyXggDkagzyiGrMHguCI5yM0EWqYfWtjLBiKOkzTjn9xzSbAbn9JK9TBBa aZ6JwNE1krtsTydJ1K37KFNabX/yyZR1KlYUc90FwRC1b9pfPHCmaMQ2imx/pg/y f3p3uG1UQmBXMMgchstqc1Xf4BqEeOV5pBKlF2Dwodi1ewZoS+WtuvqjURs2aJUr 2YLOGPBtPHboWl+kjmn+csJJ8zSWw9byRyHHKJN7zFEx3iXZ4I9HMauzap+goWJq 2NCKYjzyWkuR7YLR4w3q =vLaJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 03 Apr 2018 17:10:22 BST # gpg: using RSA key BDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: gluster: Fix blockdev-add with server.N.type=unix blockjob: use qapi enum helpers blockjob: leak fix, remove from txn when failing early Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-04-04 12:33:23 +01:00
Max Reitz	82b45e0a0b	block/file-posix: Fix fully preallocated truncate Storing the lseek() result in an int results in it overflowing when the file is at least 2 GB big. Then, we have a 50 % chance of the result being "negative" and thus thinking an error occurred when actually everything went just fine. So we should use the correct type for storing the result: off_t. Reported-by: Daniel P. Berrange <berrange@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549231 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180228131315.30194-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-04-03 17:39:37 +02:00
Kevin Wolf	9dae635afa	gluster: Fix blockdev-add with server.N.type=unix The legacy command line interface gets the socket path from an option called 'socket'. QAPI in contract uses SocketAddress, where the corresponding option is called 'path'. Fix the gluster block driver to accept both 'socket' and 'path', with 'path' being the preferred syntax. https://bugzilla.redhat.com/show_bug.cgi?id=1545155 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20180403110810.25624-1-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2018-04-03 09:57:14 -04:00
Jeff Cody	a03083a017	block: handle invalid lseek returns gracefully In commit `223a23c198`, we implemented a workaround in the gluster driver to handle invalid values returned for SEEK_DATA or SEEK_HOLE. In some instances, these same invalid values can be seen in the posix file handler as well - for example, it has been reported on FUSE gluster mounts. Calling assert() for these invalid values is overly harsh; we can safely return -EIO and allow this case to be treated as a "learned nothing" case (e.g., D4 / H4, as commented in the code). This patch does the same thing that `223a23c198` did for gluster.c, except in file-posix.c Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-04-03 15:25:17 +02:00
Kevin Wolf	3e4d88eabf	gluster: Fix blockdev-add with server.N.type=unix The legacy command line interface gets the socket path from an option called 'socket'. QAPI in contract uses SocketAddress, where the corresponding option is called 'path'. Fix the gluster block driver to accept both 'socket' and 'path', with 'path' being the preferred syntax. https://bugzilla.redhat.com/show_bug.cgi?id=1545155 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-04-03 15:20:36 +02:00
Eric Blake	00d96a4612	nbd: Fix 32-bit compilation on BLOCK_STATUS iotests 123 and 209 fail on 32-bit platforms. The culprit: sizeof(extent) is wrong; we want sizeof(*extent). But since the struct is 8 bytes, it happened to work on 64-bit platforms where the pointer is also 8 bytes (nasty). Fixes: `78a33ab58` Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180327210517.1804242-1-eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2018-04-02 08:45:21 -05:00
Peter Maydell	f55e88f2ab	qapi patches for 2018-03-27, 2.12-rc1 - Marc-André Lureau: qmp-test: fix response leak - Eric Blake: tests: Silence false positive warning on generated test name - Laurent Vivier: 0/4 (partial) coccinelle: re-run scripts from scripst/coccinelle - Peter Xu: 0/8 Monitor: some oob related patches (fixes, new param, tests) - Satheesh Rajendran: hmp.c: Revert hmp_info_cpus output format change -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJaumDMAAoJEKeha0olJ0NqY34IAJ1AJLZeqtb9HIPDrL/iydXQ qrMPxfz/fbcjMbomYvltvCYB3GMQw88YVtTuu2kS5aZsr6vLhpw2ZjGrb+a0llFM lF6oERg/6VmfQNaeqcHEzORIdG6/GEctxoWS07f+NgSrjdxNNHyU+i/LGvvA5K4m x7akPw3DZIF/oBLDlFgtl80KtOf8vQ8QitCIp2nsURQFLHvjVSoXHGUfhoiUUgNA 89rF62Repq9xZXSsFr+Jpp4/bPGxA717sixXSxFUDuTVIZGnffbt4wVxLi4TKSx1 1Ho6psPK3d82rQ9OeIc6CqulElLMPGn7aNwR1VR6ig0sB409K5PQJybJDqQD/C4= =ld/c -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-qapi-2018-03-27-v2' into staging qapi patches for 2018-03-27, 2.12-rc1 - Marc-André Lureau: qmp-test: fix response leak - Eric Blake: tests: Silence false positive warning on generated test name - Laurent Vivier: 0/4 (partial) coccinelle: re-run scripts from scripst/coccinelle - Peter Xu: 0/8 Monitor: some oob related patches (fixes, new param, tests) - Satheesh Rajendran: hmp.c: Revert hmp_info_cpus output format change # gpg: Signature made Tue 27 Mar 2018 16:18:36 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-qapi-2018-03-27-v2: hmp.c: Revert hmp_info_cpus output format change tests: qmp-test: add test for new "x-oob" tests: Add parameter to qtest_init_without_qmp_handshake monitor: new parameter "x-oob" qmp: cleanup qmp queues properly tests: add oob-test for qapi-schema tests: let qapi-schema tests detect oob qapi: restrict allow-oob value to be "true" qmp: fix qmp_capabilities error regression qdict: remove useless cast error: Remove NULL checks on error_propagate() calls error: Strip trailing '\n' from error string arguments (again again) tests: Silence false positive warning on generated test name qmp-test: fix response leak Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-27 19:20:57 +01:00
Peter Maydell	6cf38cbf29	-----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJaulgHAAoJEJykq7OBq3PI0owH/3isnzyO8efBzlv4HlWb1qV0 F7m9urAKZm1QLCkJBlli/G5iNaNUfrXSum4qYK22XAiwV5WAoPAjwrOvtTRRTSfW vFGAucmf6NKkomuZRAiPQftqR+DL9VhBC7mcdRwfwWZ82AGZZnAQsDf8y9e3kal2 YPwogmvQi+bn2Psv6q3zYtrTW9bC1VJWzYFP4xcSpcxGOY/WfLhPX9nrhjQz07mm h/y9mWqXA8PytmopMLj31xRmhTSndIwo+uvScLH95UXRuAOhjyHMvvD4HfI4nkR+ WD94pE6UiUpIdCZ6fh8H/jaCpN8jPnXI4XJg9cQcRHZYkOrHwQ8G1xxoOintjW0= =55b0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging # gpg: Signature made Tue 27 Mar 2018 15:41:11 BST # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: MAINTAINERS: add include/block/aio-wait.h coroutine: add test-aio coroutine queue chaining test case coroutine: avoid co_queue_wakeup recursion queue: add QSIMPLEQ_PREPEND() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-27 17:11:33 +01:00
Laurent Vivier	625eaca9e5	qdict: remove useless cast Re-run Coccinelle script scripts/coccinelle/qobject.cocci Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180323143202.28879-5-lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-27 10:17:32 -05:00
Peter Maydell	bdc408e91b	A fix for dirty bitmap migration through shared storage, and a VMDK patch keeping us from creating too large extents. -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJauVVBAAoJEPQH2wBh1c9AGtYIAJ1ojl6+guKQHjtzv9W8ch50 2tzH9/lFkhq/Tyay8MCcq1dWQr23tKqusxi6fDHVdc8XIx6fDuPFzWxUQwwLvS81 9CA6Qs2NngkiAs89ZZ0Sc9aj+LzFbKvXDXPYd4FFeLVVzD0CA4qghH5THrrH6LRm 4DoRUK1QejrOC0v2zCRQvEN6RlI6WFGCf9YiYKkdrvswnjtLgOobCt7TvC9Nade2 smGyxtDENkV6bAZgQgxMAlf88GCyKslb4Fu6U+sfKMDejWmlYEREbBsQc+/Gp6Af R6ZeiXEJ+KUF34RktPTmhJlUbzRc4Mw0Ij7Abrfhtpre9hifIp62XSrM8sasgLo= =k+ly -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2018-03-26' into staging A fix for dirty bitmap migration through shared storage, and a VMDK patch keeping us from creating too large extents. # gpg: Signature made Mon 26 Mar 2018 21:17:05 BST # gpg: using RSA key F407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2018-03-26: vmdk: return ERROR when cluster sector is larger than vmdk limitation iotests: enable shared migration cases in 169 qcow2: fix bitmaps loading when bitmaps already exist qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-27 14:11:30 +01:00
Stefan Hajnoczi	c40a254570	coroutine: avoid co_queue_wakeup recursion qemu_aio_coroutine_enter() is (indirectly) called recursively when processing co_queue_wakeup. This can lead to stack exhaustion. This patch rewrites co_queue_wakeup in an iterative fashion (instead of recursive) with bounded memory usage to prevent stack exhaustion. qemu_co_queue_run_restart() is inlined into qemu_aio_coroutine_enter() and the qemu_coroutine_enter() call is turned into a loop to avoid recursion. There is one change that is worth mentioning: Previously, when coroutine A queued coroutine B, qemu_co_queue_run_restart() entered coroutine B from coroutine A. If A was terminating then it would still stay alive until B yielded. After this patch B is entered by A's parent so that a A can be deleted immediately if it is terminating. It is safe to make this change since B could never interact with A if it was terminating anyway. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20180322152834.12656-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-03-27 13:05:28 +01:00
yuchenlin	a77672ea3d	vmdk: return ERROR when cluster sector is larger than vmdk limitation VMDK has a hard limitation of extent size, which is due to the size of grain table entry is 32 bits. It means it can only point to a grain located at offset = 2^32. To avoid writing the user data beyond limitation and record a useless offset in grain table. We should return ERROR here. Signed-off-by: yuchenlin <yuchenlin@synology.com> Message-id: 20180322133337.28024-1-yuchenlin@synology.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-26 21:17:24 +02:00
Vladimir Sementsov-Ogievskiy	2d949dfcef	qcow2: fix bitmaps loading when bitmaps already exist On reopen with existing bitmaps, instead of loading bitmaps, lets reopen them if needed. This also fixes bitmaps migration through shared storage. Consider the case. Persistent bitmaps are stored on bdrv_inactivate. Then, on destination process_incoming_migration_bh() calls bdrv_invalidate_cache_all() which leads to qcow2_load_autoloading_dirty_bitmaps() which fails if bitmaps are already loaded on destination start. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180320170521.32152-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-26 21:17:24 +02:00
Vladimir Sementsov-Ogievskiy	b1336cc2ec	qcow2-bitmap: add qcow2_reopen_bitmaps_rw_hint() Add version of qcow2_reopen_bitmaps_rw, which do the same work but also return a hint about was header updated or not. This will be used in the following fix for bitmaps reloading after migration. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20180320170521.32152-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-03-26 21:17:24 +02:00
Kevin Wolf	6f16f7c562	vhdx: Check for 4 GB maximum log size on creation It's unclear what the real maximum is, but we use an uint32_t to store the log size in vhdx_co_create(), so we should check that the given value fits in 32 bits. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-03-26 12:17:43 +02:00
Kevin Wolf	0fcc38e7d0	vhdx: Don't use error_setg_errno() with constant errno error_setg_errno() is meant for cases where we got an errno from the OS that can add useful extra information to an error message. It's pointless if we pass a constant errno, these cases should use plain error_setg(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-03-26 12:17:43 +02:00
Kevin Wolf	b412f49407	vhdx: Require power-of-two block size on create Images with a non-power-of-two block size are invalid and cannot be opened. Reject such block sizes when creating an image. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-03-26 12:17:43 +02:00
Kevin Wolf	2332d82589	parallels: Check maximum cluster size on create It's unclear what the real maximum cluster size is for the Parallels format, but let's at least make sure that we don't get integer overflows in our .bdrv_co_create implementation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-26 12:17:43 +02:00
Kevin Wolf	120bc742c0	luks: Turn another invalid assertion into check Commit `e39e959e` fixed an invalid assertion in the .bdrv_length implementation, but left a similar assertion in place for .bdrv_truncate. Instead of crashing when the user requests a too large image size, fail gracefully. A file size of exactly INT64_MAX caused failure before, but is actually legal. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-26 12:17:40 +02:00
Kevin Wolf	95a14d51b2	vdi: Fix build with CONFIG_VDI_DEBUG Use qemu_uuid_unparse() instead of uuid_unparse() to make vdi.c compile again when CONFIG_VDI_DEBUG is set. In order to prevent future bitrot, replace '#ifdef CONFIG_VDI_DEBUG' by 'if (VDI_DEBUG)' so that the compiler always sees the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-26 12:16:12 +02:00
Kevin Wolf	61fa64871d	vdi: Change 'static' create option to 'preallocation' in QMP What static=on really does is what we call metadata preallocation for other block drivers. While we can still change the QMP interface, make it more consistent by using 'preallocation' for VDI, too. This doesn't implement any new functionality, so the only supported preallocation modes are 'off' and 'metadata' for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-26 12:16:12 +02:00
Alberto Garcia	abf754fe40	qcow2: Reset free_cluster_index when allocating a new refcount block When we try to allocate new clusters we first look for available ones starting from s->free_cluster_index and once we find them we increase their reference counts. Before we get to call update_refcount() to do this last step s->free_cluster_index is already pointing to the next cluster after the ones we are trying to allocate. During update_refcount() it may happen however that we also need to allocate a new refcount block in order to store the refcounts of these new clusters (and to complicate things further that may also require us to grow the refcount table). After all this we don't know if the clusters that we originally tried to allocate are still available, so we return -EAGAIN to ask the caller to restart the search for free clusters. This is what can happen in a common scenario: 1) We want to allocate a new cluster and we see that cluster N is free. 2) We try to increase N's refcount but all refcount blocks are full, so we allocate a new one at N+1 (where s->free_cluster_index was pointing at). 3) Once we're done we return -EAGAIN to look again for a free cluster, but now s->free_cluster_index points at N+2, so that's the one we allocate. Cluster N remains unallocated and we have a hole in the qcow2 file. This can be reproduced easily: qemu-img create -f qcow2 -o cluster_size=512 hd.qcow2 1M qemu-io -c 'write 0 124k' hd.qcow2 After this the image has 132608 bytes (256 clusters), and the refcount block is full. If we write 512 more bytes it should allocate two new clusters: the data cluster itself and a new refcount block. qemu-io -c 'write 124k 512' hd.qcow2 However the image has now three new clusters (259 in total), and the first one of them is empty (and unallocated): dd if=hd.qcow2 bs=512c skip=256 count=1 \| hexdump -C If we write larger amounts of data in the last step instead of the 512 bytes used in this example we can create larger holes in the qcow2 file. What this patch does is reset s->free_cluster_index to its previous value when alloc_refcount_block() returns -EAGAIN. This way the caller will try to allocate again the original clusters if they are still free. The output of iotest 026 also needs to be updated because now that images have no holes some tests fail at a different point and the number of leaked clusters is different. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-26 12:16:00 +02:00
Fabiano Rosas	8140e786f0	block/blkreplay: Remove protocol-related fields The blkreplay driver is not a protocol so it should implement bdrv_open instead of bdrv_file_open and not provide a protocol_name. Attempts to invoke this driver using protocol syntax (i.e. blkreplay:<filename:options:...>) will now fail gracefully: $ qemu-img info blkreplay:foo qemu-img: Could not open 'blkreplay:foo': Unknown protocol 'blkreplay' Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com> Reviewed-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-26 12:16:00 +02:00
Fabiano Rosas	a7328ba55f	block/throttle: Remove protocol-related fields The throttle driver is not a protocol so it should implement bdrv_open instead of bdrv_file_open and not provide a protocol_name. Attempts to invoke this driver using protocol syntax (i.e. throttle:<filename:options:...>) will now fail gracefully: $ qemu-img info throttle:foo qemu-img: Could not open 'throttle:foo': Unknown protocol 'throttle' Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-26 12:16:00 +02:00
Fabiano Rosas	65d2c3e2f6	block/quorum: Remove protocol-related fields The quorum driver is not a protocol so it should implement bdrv_open instead of bdrv_file_open and not provide a protocol_name. Attempts to invoke this driver using protocol syntax (i.e. quorum:<filename:options:...>) will now fail gracefully: $ qemu-img info quorum:foo qemu-img: Could not open 'quorum:foo': Unknown protocol 'quorum' Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-26 12:16:00 +02:00
Fabiano Rosas	cb83d2efe1	block/replication: Remove protocol_name field The protocol_name field is used when selecting a driver via protocol syntax (i.e. <protocol_name>:<filename:options:...>). Drivers that are only selected explicitly (e.g. driver=replication,mode=primary,...) should not have a protocol_name. This patch removes the protocol_name field from the brdv_replication structure so that attempts to invoke this driver using protocol syntax will fail gracefully: $ qemu-img info replication:foo qemu-img: Could not open 'replication:': Unknown protocol 'replication' Buglink: https://bugs.launchpad.net/qemu/+bug/1726733 Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-26 12:16:00 +02:00
Vladimir Sementsov-Ogievskiy	7e5c776d15	qapi: add block latency histogram interface Set (and clear) histograms through new command block-latency-histogram-set and show new statistics in query-blockstats results. For now, the command is marked experimental with prefix 'x-', to gain experience with the interface without being stuck with design decisions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180309165212.97144-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: fix typos, mention x- prefix in commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-19 14:58:38 -05:00
Vladimir Sementsov-Ogievskiy	b741ae7417	block/accounting: introduce latency histogram Introduce latency histogram statics for block devices. For each accounted operation type, the latency region [0, +inf) is divided into subregions by several points. Then, calculate hits for each subregion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180309165212.97144-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-19 14:58:37 -05:00
Max Reitz	7dc847ebba	qapi: Replace qobject_to_X(o) by qobject_to(X, o) This patch was generated using the following Coccinelle script: @@ expression Obj; @@ ( - qobject_to_qnum(Obj) + qobject_to(QNum, Obj) \| - qobject_to_qstring(Obj) + qobject_to(QString, Obj) \| - qobject_to_qdict(Obj) + qobject_to(QDict, Obj) \| - qobject_to_qlist(Obj) + qobject_to(QList, Obj) \| - qobject_to_qbool(Obj) + qobject_to(QBool, Obj) ) and a bit of manual fix-up for overly long lines and three places in tests/check-qjson.c that Coccinelle did not find. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20180224154033.29559-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: swap order from qobject_to(o, X), rebase to master, also a fix to latent false-positive compiler complaint about hw/i386/acpi-build.c] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-19 14:58:36 -05:00
Paolo Bonzini	b16a7cfac5	iscsi: fix iSER compilation This fails in Fedora 28. Reported-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Fam Zheng	4f8e3a1f22	vvfat: Fix inherit_options flags Overriding flags violates the precedence rules of bdrv_reopen_queue_child. Just like the read-only option, no-flush should be put into the options. The same is done in bdrv_temp_snapshot_options. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Liang Li	b76e4458b1	block/mirror: change the semantic of 'force' of block-job-cancel When doing drive mirror to a low speed shared storage, if there was heavy BLK IO write workload in VM after the 'ready' event, drive mirror block job can't be canceled immediately, it would keep running until the heavy BLK IO workload stopped in the VM. Libvirt depends on the current block-job-cancel semantics, which is that when used without a flag after the 'ready' event, the command blocks until data is in sync. However, these semantics are awkward in other situations, for example, people may use drive mirror for realtime backups while still wanting to use block live migration. Libvirt cannot start a block live migration while another drive mirror is in progress, but the user would rather abandon the backup attempt as broken and proceed with the live migration than be stuck waiting for the current drive mirror backup to finish. The drive-mirror command already includes a 'force' flag, which libvirt does not use, although it documented the flag as only being useful to quit a job which is paused. However, since quitting a paused job has the same effect as abandoning a backup in a non-paused job (namely, the destination file is not in sync, and the command completes immediately), we can just improve the documentation to make the force flag obviously useful. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: John Snow <jsnow@redhat.com> Reported-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Liang Li <liliangleo@didichuxing.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	1cfeaf386e	vpc: Require aligned size in .bdrv_co_create Perform the rounding to match a CHS geometry only in the legacy code path in .bdrv_co_create_opts. QMP now requires that the user already passes a CHS aligned image size, unless force-size=true is given. CHS alignment is required to make the image compatible with Virtual PC, but not for use with newer Microsoft hypervisors. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	182c883550	vpc: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to vpc, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	09b68dab5a	vhdx: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to vhdx, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	da23248f27	vdi: Make comments consistent with other drivers This makes the .bdrv_co_create(_opts) implementation of vdi look more like the other recently converted block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	959355a476	qed: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to qed, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	42a3e1ab36	qcow: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to qcow, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	1511b49040	parallels: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to parallels, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2018-03-19 12:01:39 +01:00
Max Reitz	e38105748f	vdi: Implement .bdrv_co_create Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Max Reitz	ec73f06071	vdi: Move file creation to vdi_co_create_opts Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Max Reitz	49858b5098	vdi: Pull option parsing from vdi_co_create In preparation of QAPI-fying VDI image creation, we have to create a BlockdevCreateOptionsVdi type which is received by a (future) vdi_co_create(). vdi_co_create_opts() now converts the QemuOpts object into such a BlockdevCreateOptionsVdi object. The protocol-layer file is still created in vdi_co_do_create() (and BlockdevCreateOptionsVdi.file is set to an empty string), but that will be addressed by a follow-up patch. Note that cluster-size is not part of the QAPI schema because it is not supported by default. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:39 +01:00
Kevin Wolf	3d7ed9c453	luks: Catch integer overflow for huge sizes When you request an image size close to UINT64_MAX, the addition of the crypto header may cause an integer overflow. Catch it instead of silently truncating the image size. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-19 12:01:24 +01:00
Kevin Wolf	e39e959e89	luks: Turn invalid assertion into check The .bdrv_getlength implementation of the crypto block driver asserted that the payload offset isn't after EOF. This is an invalid assertion to make as the image file could be corrupted. Instead, check it and return -EIO if the file is too small for the payload offset. Zero length images are fine, so trigger -EIO only on offset > len, not on offset >= len as the assertion did before. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-19 12:01:24 +01:00
Kevin Wolf	1bedcaf120	luks: Support .bdrv_co_create This adds the .bdrv_co_create driver callback to luks, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-19 12:01:24 +01:00
Kevin Wolf	1ec4f4160a	luks: Create block_crypto_co_create_generic() Everything that refers to the protocol layer or QemuOpts is moved out of block_crypto_create_generic(), so that the remaining function is suitable to be called by a .bdrv_co_create implementation. LUKS is the only driver that actually implements the old interface, and we don't intend to use it in any new drivers, so put the moved out code directly into a LUKS function rather than creating a generic intermediate one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-19 12:01:24 +01:00
Kevin Wolf	3b5a1f6aa4	luks: Separate image file creation from formatting The crypto driver used to create the image file in a callback from the crypto subsystem. If we want to implement .bdrv_co_create, this needs to go away because that callback will get a reference to an already existing block node. Move the image file creation to block_crypto_create_generic(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	11b61fbc0d	blockjobs: add block-job-finalize Instead of automatically transitioning from PENDING to CONCLUDED, gate the .prepare() and .commit() phases behind an explicit acknowledgement provided by the QMP monitor if auto_finalize = false has been requested. This allows us to perform graph changes in prepare and/or commit so that graph changes do not occur autonomously without knowledge of the controlling management layer. Transactions that have reached the "PENDING" state together can all be moved to invoke their finalization methods by issuing block_job_finalize to any one job in the transaction. Jobs in a transaction with mixed job->auto_finalize settings will all remain stuck in the "PENDING" state, as if the entire transaction was specified with auto_finalize = false. Jobs that specified auto_finalize = true, however, will still not emit the PENDING event. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	35d6b368f2	blockjobs: ensure abort is called for cancelled jobs Presently, even if a job is canceled post-completion as a result of a failing peer in a transaction, it will still call .commit because nothing has updated or changed its return code. The reason why this does not cause problems currently is because backup's implementation of .commit checks for cancellation itself. I'd like to simplify this contract: (1) Abort is called if the job/transaction fails (2) Commit is called if the job/transaction succeeds To this end: A job's return code, if 0, will be forcibly set as -ECANCELED if that job has already concluded. Remove the now redundant check in the backup job implementation. We need to check for cancellation in both block_job_completed AND block_job_completed_single, because jobs may be cancelled between those two calls; for instance in transactions. This also necessitates an ABORTING -> ABORTING transition to be allowed. The check in block_job_completed could be removed, but there's no point in starting to attempt to succeed a transaction that we know in advance will fail. This does NOT affect mirror jobs that are "canceled" during their synchronous phase. The mirror job itself forcibly sets the canceled property to false prior to ceding control, so such cases will invoke the "commit" callback. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	75f710599f	blockjobs: add block_job_dismiss For jobs that have reached their CONCLUDED state, prior to having their last reference put down (meaning jobs that have completed successfully, unsuccessfully, or have been canceled), allow the user to dismiss the job's lingering status report via block-job-dismiss. This gives management APIs the chance to conclusively determine if a job failed or succeeded, even if the event broadcast was missed. Note: block_job_do_dismiss and block_job_decommission happen to do exactly the same thing, but they're called from different semantic contexts, so both aliases are kept to improve readability. Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she has a friend coming in a future patch to fill the hole where 0x02 is. Verbs: Dismiss: operates on CONCLUDED jobs only. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	0ec4dfb8d6	blockjobs: add block_job_verb permission table Which commands ("verbs") are appropriate for jobs in which state is also somewhat burdensome to keep track of. As of this commit, it looks rather useless, but begins to look more interesting the more states we add to the STM table. A recurring theme is that no verb will apply to an 'undefined' job. Further, it's not presently possible to restrict the "pause" or "resume" verbs any more than they are in this commit because of the asynchronous nature of how jobs enter the PAUSED state; justifications for some seemingly erroneous applications are given below. ===== Verbs ===== Cancel: Any state except undefined. Pause: Any state except undefined; 'created': Requests that the job pauses as it starts. 'running': Normal usage. (PAUSED) 'paused': The job may be paused for internal reasons, but the user may wish to force an indefinite user-pause, so this is allowed. 'ready': Normal usage. (STANDBY) 'standby': Same logic as above. Resume: Any state except undefined; 'created': Will lift a user's pause-on-start request. 'running': Will lift a pause request before it takes effect. 'paused': Normal usage. 'ready': Will lift a pause request before it takes effect. 'standby': Normal usage. Set-speed: Any state except undefined, though ready may not be meaningful. Complete: Only a 'ready' job may accept a complete request. ======= Changes ======= (1) To facilitate "nice" error checking, all five major block-job verb interfaces in blockjob.c now support an errp parameter: - block_job_user_cancel is added as a new interface. - block_job_user_pause gains an errp paramter - block_job_user_resume gains an errp parameter - block_job_set_speed already had an errp parameter. - block_job_complete already had an errp parameter. (2) block-job-pause and block-job-resume will no longer no-op when trying to pause an already paused job, or trying to resume a job that isn't paused. These functions will now report that they did not perform the action requested because it was not possible. iotests have been adjusted to address this new behavior. (3) block-job-complete doesn't worry about checking !block_job_started, because the permission table guards against this. (4) test-bdrv-drain's job implementation needs to announce that it is 'ready' now, in order to be completed. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	c9de40505f	blockjobs: add state transition table The state transition table has mostly been implied. We're about to make it a bit more complex, so let's make the STM explicit instead. Perform state transitions with a function that for now just asserts the transition is appropriate. Transitions: Undefined -> Created: During job initialization. Created -> Running: Once the job is started. Jobs cannot transition from "Created" to "Paused" directly, but will instead synchronously transition to running to paused immediately. Running -> Paused: Normal workflow for pauses. Running -> Ready: Normal workflow for jobs reaching their sync point. (e.g. mirror) Ready -> Standby: Normal workflow for pausing ready jobs. Paused -> Running: Normal resume. Standby -> Ready: Resume of a Standby job. +---------+ \|UNDEFINED\| +--+------+ \| +--v----+ \|CREATED\| +--+----+ \| +--v----+ +------+ \|RUNNING<----->PAUSED\| +--+----+ +------+ \| +--v--+ +-------+ \|READY<------->STANDBY\| +-----+ +-------+ Notably, there is no state presently defined as of this commit that deals with a job after the "running" or "ready" states, so this table will be adjusted alongside the commits that introduce those states. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
John Snow	75859b9420	blockjobs: model single jobs as transactions model all independent jobs as single job transactions. It's one less case we have to worry about when we add more states to the transition machine. This way, we can just treat all job lifetimes exactly the same. This helps tighten assertions of the STM graph and removes some conditionals that would have been needed in the coming commits adding a more explicit job lifetime management API. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-19 12:01:24 +01:00
Peter Maydell	9cc7d0cf6a	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaqD6PAAoJEH3vgQaq/DkO9FIP/3pAW3xJUDGYsONiebX1IbhA VpoQCcjks3cHD18AUoVHufayJBUVfed1LhYPP8xoDuSRmKs1xU1O9FknxMQaL+Dw kbliBY7GjN8A2EcCjW+ZwyNT/KpjyXXwuZ2PSnOSSiN3JK6wrLCzeZyKyOYewLCS u9fKscnqWkg+awbCfDlVs92AaBAKoOP9loOq6e2J/jVY8HSDGb2owRnsxaWg8gJ8 J9BlnXENQ14jEwickD3sluPfWkhu9xh7cCocH8cfgXL5veGUELz0Ugx4RHcsAF9Q SVDg/EhRRN11cvOkLnlggETaLbGtEE64AL4HhjxzCLraHsnEazPDwFgetB9mOhhF Nqu8HuGcVvRgn89au89mxAvTSWX9KFq4oF8Vi+FZZHkLilRx6NJnMpUpd9zkSJDq yjR2/BV0A9Ep1gvWX/rhpPrN5dALYHcaxoiSB497Yj4SI2ZSyzfrneteYdPv4EEc 3CSJ3l6NCGAE2dNXuVZTVqHyXOSl7mJQQmT53dtsSNipCMEsVr0mOx3DPNY26LIc DUdnX6JOyZPU0wzOj8xjFNV72/gBEkqVZ5p9UJ+lrIYwOsTobpzfDtYquu4asda8 IN44mcbRCZRFIiZZOGEdnwf34vIpQKMiZAtszAaan9KXwTXV9LbipaomBEN88vUD IgI5XsZTfiD2uIjnREWv =ISfR -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging # gpg: Signature made Tue 13 Mar 2018 21:11:43 GMT # gpg: using RSA key 7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: iotests: add dirty bitmap postcopy test iotests: add dirty bitmap migration test migration: add postcopy migration of dirty bitmaps migration: allow qmp command migrate-start-postcopy for any postcopy migration: add is_active_iterate handler migration/qemu-file: add qemu_put_counted_string() migration: include migrate_dirty_bitmaps in migrate_postcopy qapi: add dirty-bitmaps migration capability migration: introduce postcopy-only pending dirty-bitmap: add locked state block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-16 14:15:18 +00:00
Peter Maydell	475fe4576f	nbd patches for 2018-03-13 - Eric Blake: iotests: Fix stuck NBD process on 33 - Vladimir Sementsov-Ogievskiy: 0/5 nbd server fixing and refactoring before BLOCK_STATUS - Eric Blake: nbd/server: Honor FUA request on NBD_CMD_TRIM - Stefan Hajnoczi: 0/2 block: fix nbd-server-stop crash after blockdev-snapshot-sync - Vladimir Sementsov-Ogievskiy: nbd block status base:allocation -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJaqDklAAoJEKeha0olJ0NqRsAH+waQGLA8YPwxlnpRW2kulLfC dZXv/ocl2vGgxRrDLEL46xh1RUpapERHADk/Qun8reQpqLicd6p8VCuoOZFEj6QN Xo98JHrKKL6AZ1rVhWUzD8G6qwgL6FGq6Eb5ty/kanf2/0igwtHmu86nOgGyc9dz zelGPdIxyxIEjCNiLIN49iEFs+gk1hr8qp1TNMbnHlQh8moOYqdCJWNisOQowoEE soCJ4NLnvKBtnmrxDrvtkppQKW8ukDOG/q5BkSTvAIEBH/v0ioohFUNTFkC8vmjO 8YAwlXAz6EpQuKxpEfl7vxaT19edrNIo55JO1/Gwzk50g4/Mt+AH2JM/msFcMKQ= =i+nR -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-03-13-v2' into staging nbd patches for 2018-03-13 - Eric Blake: iotests: Fix stuck NBD process on 33 - Vladimir Sementsov-Ogievskiy: 0/5 nbd server fixing and refactoring before BLOCK_STATUS - Eric Blake: nbd/server: Honor FUA request on NBD_CMD_TRIM - Stefan Hajnoczi: 0/2 block: fix nbd-server-stop crash after blockdev-snapshot-sync - Vladimir Sementsov-Ogievskiy: nbd block status base:allocation # gpg: Signature made Tue 13 Mar 2018 20:48:37 GMT # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2018-03-13-v2: iotests: new test 209 for NBD BLOCK_STATUS iotests: add file_path helper iotests.py: tiny refactor: move system imports up nbd: BLOCK_STATUS for standard get_block_status function: client part block/nbd-client: save first fatal error in nbd_iter_error nbd: BLOCK_STATUS for standard get_block_status function: server part nbd/server: add nbd_read_opt_name helper nbd/server: add nbd_opt_invalid helper iotests: add 208 nbd-server + blockdev-snapshot-sync test case block: let blk_add/remove_aio_context_notifier() tolerate BDS changes nbd/server: Honor FUA request on NBD_CMD_TRIM nbd/server: refactor nbd_trip: split out nbd_handle_request nbd/server: refactor nbd_trip: cmd_read and generic reply nbd/server: fix: check client->closing before sending reply nbd/server: fix sparse read nbd/server: move nbd_co_send_structured_error up iotests: Fix stuck NBD process on 33 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-16 13:14:07 +00:00
Vladimir Sementsov-Ogievskiy	4f43e9535b	dirty-bitmap: add locked state Add special state, when qmp operations on the bitmap are disabled. It is needed during bitmap migration. "Frozen" state is not appropriate here, because it looks like bitmap is unchanged. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180207155837.92351-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-03-13 17:05:00 -04:00
Vladimir Sementsov-Ogievskiy	044ee8e143	block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180207155837.92351-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-03-13 17:04:54 -04:00
Vladimir Sementsov-Ogievskiy	604ab74bb5	block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap Like other setters here these functions should take a lock. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180207155837.92351-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-03-13 17:04:48 -04:00
Vladimir Sementsov-Ogievskiy	78a33ab587	nbd: BLOCK_STATUS for standard get_block_status function: client part Minimal realization: only one extent in server answer is supported. Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1 instead of errno on failure in nbd_negotiate_simple_meta_context, ensure that block status makes progress on success] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-13 15:43:48 -05:00

... 3 4 5 6 7 ...

4126 Commits