mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Stefan Hajnoczi	f8ffcb2bda	block/export: fix vhost-user-blk get_config() information leak Refuse get_config() requests in excess of sizeof(struct virtio_blk_config). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	11f60f7eae	block/export: make vhost-user-blk config space little-endian VIRTIO 1.0 devices have little-endian configuration space. The vhost-user-blk-server.c code already uses little-endian for virtqueue processing but not for the configuration space fields. Fix this so the vhost-user-blk export works on big-endian hosts. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-4-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	bc15e44cb2	configure: introduce --enable-vhost-user-blk-server Make it possible to compile out the vhost-user-blk server. It is enabled by default on Linux. Note that vhost-user-server.c depends on libvhost-user, which requires CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that option controls vhost-user frontends (previously known as "master") and not device backends (previously known as "slave"). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-3-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé	a0546a7b6f	block/nvme: Fix nvme_submit_command() on big-endian host The Completion Queue Command Identifier is a 16-bit value, so nvme_submit_command() is unlikely to work on big-endian hosts, as the relevant bits are truncated. Fix by using the correct byte-swap function. Fixes: `bdd6a90a9e` ("block: Add VFIO based NVMe driver") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-25-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	4b19e9b815	block/nvme: Fix use of write-only doorbells page on Aarch64 arch qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states: 'offset' must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE). In commit `f68453237b` we started to use an offset of 4K which broke this contract on Aarch64 arch. Fix by mapping at offset 0, and and accessing doorbells at offset=4K. Fixes: `f68453237b` ("block/nvme: Map doorbells pages write-only") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-24-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	9e13d59884	block/nvme: Align iov's va and size on host page size Make sure iov's va and size are properly aligned on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-23-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	f8fd3ebac3	block/nvme: Change size and alignment of prp_list_pages In preparation of 64kB host page support, let's change the size and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds with 64kB host page size. We align on the host page size. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-22-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	2387aaced7	block/nvme: Change size and alignment of queue In preparation of 64kB host page support, let's change the size and alignment of the queue so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-21-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	0aecd06049	block/nvme: Change size and alignment of IDENTIFY response buffer In preparation of 64kB host page support, let's change the size and alignment of the IDENTIFY command response buffer so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-20-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	a652a3ec69	block/nvme: Correct minimum device page size While trying to simplify the code using a macro, we forgot the 12-bit shift... Correct that. Fixes: `fad1eb6886` ("block/nvme: Use register definitions from 'block/nvme.h'") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-19-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	c8228ac355	block/nvme: Set request_alignment at initialization Commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") sets the request_alignment in nvme_refresh_limits(). For consistency, also set it during initialization. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-18-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	08d5406798	block/nvme: Simplify nvme_cmd_sync() As all commands use the ADMIN queue, it is pointless to pass it as argument each time. Remove the argument, and rename the function as nvme_admin_cmd_sync() to make this new behavior clearer. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-17-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	52b75ea8ec	block/nvme: Simplify ADMIN queue access We don't need to dereference from BDRVNVMeState each time. Use a NVMeQueuePair pointer on the admin queue. The nvme_init() becomes easier to review, matching the style of nvme_add_io_queue(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-16-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3c363c073e	block/nvme: Correctly initialize Admin Queue Attributes From the specification chapter 3.1.8 "AQA - Admin Queue Attributes" the Admin Submission Queue Size field is a 0’s based value: Admin Submission Queue Size (ASQS): Defines the size of the Admin Submission Queue in entries. Enabling a controller while this field is cleared to 00h produces undefined results. The minimum size of the Admin Submission Queue is two entries. The maximum size of the Admin Submission Queue is 4096 entries. This is a 0’s based value. This bug has never been hit because the device initialization uses a single command synchronously :) Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-15-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	76a24781cc	block/nvme: Use definitions instead of magic values in add_io_queue() Replace magic values by definitions, and simplifiy since the number of queues will never reach 64K. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-14-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	dfa9c6c656	block/nvme: Make nvme_init_queue() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. This simplifies a bit nvme_create_queue_pair(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	7a5f00dde3	block/nvme: Make nvme_identify() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-11-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	1b539bd6db	block/nvme: Use unsigned integer for queue counter/size We can not have negative queue count/size/index, use unsigned type. Rename 'nr_queues' as 'queue_count' to match the spec naming. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-10-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3214b0f094	block/nvme: Move definitions before structure declarations To be able to use some definitions in structure declarations, move them earlier. No logical change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-9-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	6e1e9ff2d3	block/nvme: Trace queue pair creation/deletion Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-8-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	51e98b6d21	block/nvme: Improve nvme_free_req_queue_wait() trace information What we want to trace is the block driver state and the queue index. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	1c914cd120	block/nvme: Trace nvme_poll_queue() per queue As we want to enable multiple queues, report the event in each nvme_poll_queue() call, rather than once in the callback calling nvme_poll_queues(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	15b2260bef	block/nvme: Trace controller capabilities Controllers have different capabilities and report them in the CAP register. We are particularly interested by the page size limits. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-5-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	58ad6ae0cb	block/nvme: Report warning with warn_report() Instead of displaying warning on stderr, use warn_report() which also displays it on the monitor. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-4-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	8526e39e99	block/nvme: Use hex format to display offset in trace events Use the same format used for the hw/vfio/ trace events. Suggested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-3-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
AlexChen	c9eb2f3e38	block/vvfat: Fix bad printf format specifiers We should use printf format specifier "%u" instead of "%d" for argument of type "unsigned int". In addition, fix two error format problems found by checkpatch.pl: ERROR: space required after that ',' (ctx:VxV) + fprintf(stderr,"%s attributes=0x%02x begin=%u size=%d\n", ^ ERROR: line over 90 characters + fprintf(stderr, "%d, %s (%u, %d)\n", i, commit->path ? commit->path : "(null)", commit->param.rename.cluster, commit->action); Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Alex Chen <alex.chen@huawei.com> Message-Id: <5FA12620.6030705@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-03 16:24:56 +01:00
Eric Blake	dbc7b01492	nbd: Add 'qemu-nbd -A' to expose allocation depth Allow the server to expose an additional metacontext to be requested by savvy clients. qemu-nbd adds a new option -A to expose the qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this can also be set via QMP when using block-export-add. qemu as client is hacked into viewing the key aspects of this new context by abusing the already-experimental x-dirty-bitmap option to collapse all depths greater than 2, which results in a tri-state value visible in the output of 'qemu-img map --output=json' (yes, that means x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like renaming it as it would introduce a needless break of back-compat, even though we make no compat guarantees with x- members): unallocated (depth 0) => "zero":false, "data":true local (depth 1) => "zero":false, "data":false backing (depth 2+) => "zero":true, "data":true libnbd as client is probably a nicer way to get at the information without having to decipher such hacks in qemu as client. ;) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-11-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-10-30 15:22:00 -05:00
Eric Blake	a92b1b065e	block: Return depth level during bdrv_is_allocated_above When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-30 15:21:23 -05:00
Greg Kurz	1a6d3bd229	block: End quiescent sections when a BDS is deleted If a BDS gets deleted during blk_drain_all(), it might miss a call to bdrv_do_drained_end(). This means missing a call to aio_enable_external() and the AIO context remains disabled for ever. This can cause a device to become irresponsive and to disrupt the guest execution, ie. hang, loop forever or worse. This scenario is quite easy to encounter with virtio-scsi on POWER when punching multiple blockdev-create QMP commands while the guest is booting and it is still running the SLOF firmware. This happens because SLOF disables/re-enables PCI devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. This naturally generates many dataplane stops and starts, and thus many drain sections that can race with blockdev_create_run(). In the end, SLOF bails out. It is somehow reproducible on x86 but it requires to generate articial dataplane start/stop activity with stop/cont QMP commands. In this case, seabios ends up looping for ever, waiting for the virtio-scsi device to send a response to a command it never received. Add a helper that pairs all previously called bdrv_do_drained_begin() with a bdrv_do_drained_end() and call it from bdrv_close(). While at it, update the "/bdrv-drain/graph-change/drain_all" test in test-bdrv-drain so that it can catch the issue. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	46cd1e8a47	qcow2: Skip copy-on-write when allocating a zero cluster Since commit `c8bb23cbdb` when a write request results in a new allocation QEMU first tries to see if the rest of the cluster outside the written area contains only zeroes. In that case, instead of doing a normal copy-on-write operation and writing explicit zero buffers to disk, the code zeroes the whole cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK. This improves performance very significantly but it only happens when we are writing to an area that was completely unallocated before. Zero clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and are therefore slower to allocate. This happens because the code uses bdrv_is_allocated_above() rather bdrv_block_status_above(). The former is not as accurate for this purpose but it is faster. However in the case of qcow2 the underlying call does already report zero clusters just fine so there is no reason why we cannot use that information. After testing 4KB writes on an image that only contains zero clusters this patch results in almost five times more IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	d40f4a565a	qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() If a BlockDriverState supports backing files but has none then any unallocated area reads back as zeroes. bdrv_co_block_status() is only reporting this is if want_zero is true, but this is an inexpensive test and there is no reason not to do it in all cases. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Peter Maydell	a95e0396c8	* fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg== =931x -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) # gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: machine: move SMP initialization from vl.c machine: move UP defaults to class_base_init machine: remove deprecated -machine enforce-config-section option win32: boot broken when bind & data dir are the same WHPX: Fix WHPX build break configure: move install_blobs from configure to meson configure: remove unused variable from config-host.mak configure: move directory options from config-host.mak to meson configure: allow configuring localedir Makefile: separate meson rerun from the rest of the ninja invocation Remove deprecated -no-kvm option replay: do not build if TCG is not available qtest: unbreak non-TCG builds in bios-tables-test hw/core/qdev-clock: add a reference on aliased clocks do not use colons in test names meson: rewrite curses/iconv test build: fix macOS --enable-modules build Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-26 15:49:11 +00:00
Vladimir Sementsov-Ogievskiy	7e7e510077	block/io: fix bdrv_is_allocated_above bdrv_is_allocated_above wrongly handles short backing files: it reports after-EOF space as UNALLOCATED which is wrong, as on read the data is generated on the level of short backing file (if all overlays have unallocated areas at that place). Reusing bdrv_common_block_status_above fixes the issue and unifies code path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	624f27bbe9	block/io: bdrv_common_block_status_above: support bs == base We are going to reuse bdrv_common_block_status_above in bdrv_is_allocated_above. bdrv_is_allocated_above may be called with include_base == false and still bs == base (for ex. from img_rebase()). So, support this corner case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	3555a43261	block/io: bdrv_common_block_status_above: support include_base In order to reuse bdrv_common_block_status_above in bdrv_is_allocated_above, let's support include_base parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	67c095c8b8	block/io: fix bdrv_co_block_status_above bdrv_co_block_status_above has several design problems with handling short backing files: 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but without BDRV_BLOCK_ALLOCATED flag, when actually short backing file which produces these after-EOF zeros is inside requested backing sequence. 2. With want_zero=false, it may return pnum=0 prior to actual EOF, because of EOF of short backing file. Fix these things, making logic about short backing files clearer. With fixed bdrv_block_status_above we also have to improve is_zero in qcow2 code, otherwise iotest 154 will fail, because with this patch we stop to merge zeros of different types (produced by fully unallocated in the whole backing chain regions vs produced by short backing files). Note also, that this patch leaves for another day the general problem around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated vs go-to-backing. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com [Fix s/comes/come/ as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	d9b495f9c6	block/export: add vhost-user-blk multi-queue support Allow the number of queues to be configured using --export vhost-user-blk,num-queues=N. This setting should match the QEMU --device vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers its own value if the vhost-user-blk backend offers fewer queues than QEMU. The vhost-user-blk-server.c code is already capable of multi-queue. All virtqueue processing runs in the same AioContext. No new locking is needed. Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. Note that the feature bit only announces the presence of the num_queues configuration space field. It does not promise that there is more than 1 virtqueue, so we can set it unconditionally. I tested multi-queue by running a random read fio test with numjobs=4 on an -smp 4 guest. After the benchmark finished the guest /proc/interrupts file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ directory shows that Linux blk-mq has 4 queues configured. An automated test is included in the next commit. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001144604.559733-2-stefanha@redhat.com [Fixed accidental tab characters as suggested by Markus Armbruster --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	f51d23c80a	block/export: add iothread and fixed-iothread options Make it possible to specify the iothread where the export will run. By default the block node can be moved to other AioContexts later and the export will follow. The fixed-iothread option forces strict behavior that prevents changing AioContext while the export is active. See the QAPI docs for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200929125516.186715-5-stefanha@redhat.com [Fix stray '#' character in block-export.json and add missing "(since: 5.2)" as suggested by Eric Blake. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	cbc20bfb8f	block: move block exports to libblockdev Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. They are not used by other programs and are not otherwise needed in libblock. Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. Since bdrv_close_all() (libblock) calls blk_exp_close_all() (libblockdev) a stub function is required.. Make qemu-nbd.c use signal handling utility functions instead of duplicating the code. This helps because os-posix.c is in libblockdev and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. Once we use the signal handling utility functions we also end up providing the necessary symbol. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200929125516.186715-4-stefanha@redhat.com [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	3a213f83d9	util/vhost-user-server: use static library in meson.build Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build the static library once and then reuse it throughout QEMU. Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the vhost-user tools (vhost-user-gpu, etc) do. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-14-stefanha@redhat.com [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	80a06cc52b	util/vhost-user-server: move header to include/ Headers used by other subsystems are located in include/. Also add the vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	90fc91d50b	block/export: convert vhost-user-blk server to block export API Use the new QAPI block exports API instead of defining our own QOM objects. This is a large change because the lifecycle of VuBlockDev needs to follow BlockExportDriver. QOM properties are replaced by QAPI options objects. VuBlockDev is renamed VuBlkExport and contains a BlockExport field. Several fields can be dropped since BlockExport already has equivalents. The file names and meson build integration will be adjusted in a future patch. libvhost-user should probably be built as a static library that is linked into QEMU instead of as a .c file that results in duplicate compilation. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock Note that unix-socket is optional because we may wish to accept chardevs too in the future. Markus noted that supported address families are not explicit in the QAPI schema. It is unlikely that support for more address families will be added since file descriptor passing is required and few address families support it. If a new address family needs to be added, then the QAPI 'features' syntax can be used to advertize them. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200924151549.913737-12-stefanha@redhat.com [Skip test on big-endian host architectures because this device doesn't support them yet (as already mentioned in a code comment). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	0534b1b227	block/export: report flush errors Propagate the flush return value since errors are possible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	7185c85776	util/vhost-user-server: rework vu_client_trip() coroutine lifecycle The vu_client_trip() coroutine is leaked during AioContext switching. It is also unsafe to destroy the vu_dev in panic_cb() since its callers still access it in some cases. Rework the lifecycle to solve these safety issues. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	47ba680466	util/vhost-user-server: drop unused DevicePanicNotifier The device panic notifier callback is not used. Drop it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	df6af7ce77	block/export: consolidate request structs into VuBlockReq Only one struct is needed per request. Drop req_data and the separate VuBlockReq instance. Instead let vu_queue_pop() allocate everything at once. This fixes the req_data memory leak in vu_block_virtio_process_req(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Coiby Xu	3578389bcf	block/export: vhost-user block device backend server By making use of libvhost-user, block device drive can be shared to the connected vhost-user client. Only one client can connect to the server one time. Since vhost-user-server needs a block drive to be created first, delay the creation of this object. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Coiby Xu <coiby.xu@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20200918080912.321299-6-coiby.xu@gmail.com [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the following compiler warning: ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] and fix "Invalid size %ld ..." ssize_t format string arguments for 32-bit hosts. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Philippe Mathieu-Daudé	f25e7ab2b0	block/nvme: Add driver statistics for access alignment and hw errors Keep statistics of some hardware errors, and number of aligned/unaligned I/O accesses. QMP example booting a full RHEL 8.3 aarch64 guest: { "execute": "query-blockstats" } { "return": [ { "device": "", "node-name": "drive0", "stats": { "flush_total_time_ns": 6026948, "wr_highest_offset": 3383991230464, "wr_total_time_ns": 807450995, "failed_wr_operations": 0, "failed_rd_operations": 0, "wr_merged": 3, "wr_bytes": 50133504, "failed_unmap_operations": 0, "failed_flush_operations": 0, "account_invalid": false, "rd_total_time_ns": 1846979900, "flush_operations": 130, "wr_operations": 659, "rd_merged": 1192, "rd_bytes": 218244096, "account_failed": false, "idle_time_ns": 2678641497, "rd_operations": 7406, }, "driver-specific": { "driver": "nvme", "completion-errors": 0, "unaligned-accesses": 2959, "aligned-accesses": 4477 }, "qdev": "/machine/peripheral-anon/device[0]/virtio-backend" } ] } Suggested-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001162939.1567915-1-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Claudio Fontana	9b1c911654	replay: do not build if TCG is not available this fixes non-TCG builds broken recently by replay reverse debugging. Stub the needed functions in stub/, splitting roughly between functions needed only by system emulation, by system emulation and tools, and by everyone. This includes duplicating some code in replay/, and puts the logic for non-replay related events in the replay/ module (+ the stubs), so this should be revisited in the future. Surprisingly, only _one_ qtest was affected by this, ide-test.c, which resulted in a buzz as the bh events were never delivered, and the bh never executed. Many other subsystems _should_ have been affected. This fixes the immediate issue, however a better way to group replay functionality to TCG-only code could be developed in the long term. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20201013192123.22632-4-cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-22 11:53:54 -04:00
Daniel P. Berrangé	e1c4269763	block: deprecate the sheepdog block driver This thread from a little over a year ago: http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html states that sheepdog is no longer actively developed. The only mentioned users are some companies who are said to have it for legacy reasons with plans to replace it by Ceph. There is talk about cutting out existing features to turn it into a simple demo of how to write a distributed block service. There is no evidence of anyone working on that idea: https://github.com/sheepdog/sheepdog/commits/master No real commits to git since Jan 2018, and before then just some minor technical debt cleanup. There is essentially no activity on the mailing list aside from patches to QEMU that get CC'd due to our MAINTAINERS entry. Fedora packages for sheepdog failed to build from upstream source because of the more strict linker that no longer merges duplicate global symbols. Fedora patches it to add the missing "extern" annotations and presumably other distros do to, but upstream source remains broken. There is only basic compile testing, no functional testing of the driver. Since there are no build pre-requisites the sheepdog driver is currently enabled unconditionally. This would result in configure issuing a deprecation warning by default for all users. Thus the configure default is changed to disable it, requiring users to pass --enable-sheepdog to build the driver. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20201002113243.2347710-3-berrange@redhat.com> Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-15 16:06:28 +02:00
Elena Afanasova	5b4c95d0a3	block/blkdebug: fix memory leak Spotted by PVS-Studio Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:46 +02:00
Christian Borntraeger	cd466702f0	vmdk: fix maybe uninitialized warnings Fedora 32 gcc 10 seems to give false positives: Compiling C object libblock.fa.p/block_vmdk.c.o ../block/vmdk.c: In function ‘vmdk_parse_extents’: ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 587 \| g_free(extent->l1_table); \| ^~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:754:17: note: ‘extent’ was declared here 754 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1178 \| extent->flat_start_offset = flat_offset << 9; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~ ../block/vmdk.c: In function ‘vmdk_open_vmdk4’: ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 581 \| extent->l2_cache = \| ~~~~~~~~~~~~~~~~~^ 582 \| g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:872:17: note: ‘extent’ was declared here 872 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c: In function ‘vmdk_open’: ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ cc1: all warnings being treated as errors make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1 fix them by assigning a default value. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Fam Zheng <fam@euphon.net> Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:45 +02:00
Vladimir Sementsov-Ogievskiy	99d72dba1c	block/nbd: nbd_co_reconnect_loop(): don't connect if drained In a recent commit `12c75e20a2` we've improved nbd_co_reconnect_loop() to not make drain wait for additional sleep. Similarly, we shouldn't try to connect, if previous sleep was interrupted by drain begin, otherwise drain_begin will have to wait for the whole connection attempt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	46f56631b5	block/nbd: fix reconnect-delay reconnect-delay has a design flaw: we handle it in the same loop where we do connection attempt. So, reconnect-delay may be exceeded by unpredictable time of connection attempt. Let's instead use separate timer. How to reproduce the bug: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. On node2 start qemu-io: ./build/qemu-io --image-opts \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15 4. Type 'read 0 512' in qemu-io interface to check that connection works Be careful: you should make steps 5-7 in a short time, less than 15 seconds. 5. Kill nbd server on node1 6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd client goes to reconnect loop. 7. On node1 run the following command sudo iptables -A INPUT -p tcp --dport 10809 -j DROP This will make the connect() call of qemu-io at node2 take a long time. And you'll see that read command in qemu-io will hang for a long time, more than 15 seconds specified by reconnect-delay parameter. It's the bug. 8. Don't forget to drop iptables rule on node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Important note: Step [5] is necessary to reproduce _this_ bug. If we miss step [5], the read command (step 6) will hang for a long time and this commit doesn't help, because there will be not long connect() to unreachable host, but long sendmsg() to unreachable host, which should be fixed by enabling and adjusting keep-alive on the socket, which is a thing for further patch set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8a509afd72	block/nbd: correctly use qio_channel_detach_aio_context when needed Don't use nbd_client_detach_aio_context() driver handler where we want to finalize the connection. We should directly use qio_channel_detach_aio_context() in such cases. Driver handler may (and will) contain another things, unrelated to the qio channel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8c517de24a	block/nbd: fix drain dead-lock because of nbd reconnect-delay We pause reconnect process during drained section. So, if we have some requests, waiting for reconnect we should cancel them, otherwise they deadlock the drained section. How to reproduce: 1. Create an image: qemu-img create -f qcow2 xx 100M 2. Start NBD server: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \ -vnc :0 -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, kill the nbd server, and run dd in the guest again: dd if=/dev/sdb of=/dev/null bs=1M count=10 Now Qemu is trying to reconnect, and dd-generated requests are waiting for the connection (they will wait up to 60 seconds (see reconnect-delay option above) and than fail). But suddenly, vm may totally hang in the deadlock. You may need to increase reconnect-delay period to catch the dead-lock. VM doesn't respond because drain dead-lock happens in cpu thread with global mutex taken. That's not good thing by itself and is not fixed by this commit (true way is using iothreads). Still this commit fixes drain dead-lock itself. Note: probably, we can instead continue to reconnect during drained section. To achieve this, we may move negotiation to the connect thread to make it independent of bs aio context. But expanding drained section doesn't seem good anyway. So, let's now fix the bug the simplest way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Peter Maydell	f2687fdb75	* Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl98EB0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOCsQf9G7EUAK1zcEOx20LtDdXFrk4tjsRp S83OGdihWe8SM+XiY9BfqsBbXdByqF+SitePOV3feGK0mOP5vtJIL7/2DLrtFTeF wOeARRA9ePVb7hcL5oXAQeE3bXrX8wq8Qtw9xAoHdw5JAEVmKIEJS6AL5Eu3M2Fh pvdBoV84pOm2/ARS3eRstRyW8gCC8rdLDlNsVDtCbYdNVq+VdkzR0l5Phc8JDx1M Qjdl1KpN6ZkuN8M6tnaQNTb9IUVu5c1tu5jdR6JdLUqAWp1wYZJ6r2jSatZWfLR3 H+gzFsDoLPfCjZ3IhfZyvzF5leSZmdbFfzI0tHS1UJ/ZZYjutDvlPlbyYA== =Jys5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) # gpg: Signature made Tue 06 Oct 2020 07:35:09 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (37 commits) tests/acceptance: add reverse debugging test replay: create temporary snapshot at debugger connection replay: describe reverse debugging in docs/replay.txt gdbstub: add reverse continue support in replay mode gdbstub: add reverse step support in replay mode replay: flush rr queue before loading the vmstate replay: implement replay-seek command replay: introduce breakpoint at the specified step replay: introduce info hmp/qmp command qapi: introduce replay.json for record/replay-related stuff migration: introduce icount field for snapshots qcow2: introduce icount field for snapshots replay: provide an accessor for rr filename replay: don't record interrupt poll configure: don't enable ASLR for --enable-debug Windows builds configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS dtc: Convert Makefile bits to meson bits slirp: Convert Makefile bits to meson bits accel/tcg: use current_machine as it is always set for softmmu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-06 15:04:10 +01:00
Pavel Dovgalyuk	b39847a505	migration: introduce icount field for snapshots Saving icount as a parameters of the snapshot allows navigation between them in the execution replay scenario. This information can be used for finding a specific snapshot for proceeding the recorded execution to the specific moment of the time. E.g., 'reverse step' action (introduced in one of the following patches) needs to load the nearest snapshot which is prior to the current moment of time. This patch also updates snapshot test which verifies qemu monitor output. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v4 changes: - squashed format update with test output update v7 changes: - introduced the spaces between the fields in snapshot info output - updated the test to match new field widths Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk	bbacffc5f7	qcow2: introduce icount field for snapshots This patch introduces the icount field for saving within the snapshot. It is required for navigation between the snapshots in record/replay mode. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v7 changes: - also fix the test which checks qcow2 snapshot extra data Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Vladimir Sementsov-Ogievskiy	b33b354f3a	block/io: refactor save/load vmstate Like for read/write in a previous commit, drop extra indirection layer, generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	fae2681add	block: drop bdrv_prwv Now that we are not maintaining boilerplate code for coroutine wrappers, there is no more sense in keeping the extra indirection layer of bdrv_prwv(). Let's drop it and instead generate pure bdrv_preadv() and bdrv_pwritev(). Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on success, auto generated functions will instead return zero, as their _co_ prototype. Still, it's simple to make the conversion safe: the only external user of bdrv_pwritev() is test-bdrv-drain, and it is comfortable enough with bdrv_co_pwritev() instead. So prototypes are moved to local block/coroutines.h. Next, the only internal use is bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on success. Of course, it would be great to convert bdrv_pread() and bdrv_pwrite() to return 0 on success. But this requires audit (and probably conversion) of all their users, let's leave it for another day refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	9bb4b066cc	block: generate coroutine-wrapper code Use code generation implemented in previous commit to generated coroutine wrappers in block.c and block/io.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	aaaa20b69b	scripts: add block-coroutine-wrapper.py We have a very frequent pattern of creating a coroutine from a function with several arguments: - create a structure to pack parameters - create _entry function to call original function taking parameters from struct - do different magic to handle completion: set ret to NOT_DONE or EINPROGRESS or use separate bool field - fill the struct and create coroutine from _entry function with this struct as a parameter - do coroutine enter and BDRV_POLL_WHILE loop Let's reduce code duplication by generating coroutine wrappers. This patch adds scripts/block-coroutine-wrapper.py together with some friends, which will generate functions with declared prototypes marked by the 'generated_co_wrapper' specifier. The usage of new code generation is as follows: 1. define the coroutine function somewhere int coroutine_fn bdrv_co_NAME(...) {...} 2. declare in some header file int generated_co_wrapper bdrv_NAME(...); with same list of parameters (generated_co_wrapper is defined in "include/block/block.h"). 3. Make sure the block_gen_c declaration in block/meson.build mentions the file with your marker function. Still, no function is now marked, this work is for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com> [Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed typo and grammar issues pointed out by Eric Blake. Removed clang-format dependency that caused build test issues. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-05 10:59:06 +01:00
Vladimir Sementsov-Ogievskiy	21c2283ebc	block: declare some coroutine functions in block/coroutines.h We are going to keep coroutine-wrappers code (structure-packing parameters, BDRV_POLL wrapper functions) in separate auto-generated files. So, we'll need a header with declaration of original _co_ functions, for those which are static now. As well, we'll need declarations for wrapper functions. Do these declarations now, as a preparation step. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Vladimir Sementsov-Ogievskiy	f9e694cb32	block/io: refactor coroutine wrappers Most of our coroutine wrappers already follow this convention: We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as the core function, and a wrapper 'bdrv_<something>(<same argument list>)' which does parameter packing and calls bdrv_run_co(). The only outsiders are the bdrv_prwv_co and bdrv_common_block_status_above wrappers. Let's refactor them to behave as the others, it simplifies further conversion of coroutine wrappers. This patch adds an indirection layer, but it will be compensated by a further commit, which will drop bdrv_co_prwv together with the is_write logic, to keep the read and write paths separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	eefffb0244	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value (missed in similar commit `e4f310fe7f`). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-7-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	fad1eb6886	block/nvme: Use register definitions from 'block/nvme.h' Use the NVMe register definitions from "block/nvme.h" which ease a bit reviewing the code while matching the datasheet. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-6-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	9406e0d97e	block/nvme: Drop NVMeRegs structure, directly use NvmeBar NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly. This triggers a checkpatch.pl error: ERROR: Use of volatile is usually wrong, please add a comment #30: FILE: block/nvme.c:691: + volatile NvmeBar *regs; This is a false positive as in our case we are using I/O registers, so the 'volatile' use is justified. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-5-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	37d7a45abd	block/nvme: Reduce I/O registers scope We only access the I/O register in nvme_init(). Remove the reference in BDRVNVMeState and reduce its scope. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-4-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	f68453237b	block/nvme: Map doorbells pages write-only Per the datasheet sections 3.1.13/3.1.14: "The host should not read the doorbell registers." As we don't need read access, map the doorbells with write-only permission. We keep a reference to this mapped address in the BDRVNVMeState structure. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-3-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	b02c01a513	util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar() Pages are currently mapped READ/WRITE. To be able to use different protections, add a new argument to qemu_vfio_pci_map_bar(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-2-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Alberto Garcia	c508c73dca	qcow2: Use L1E_SIZE in qcow2_write_l1_entry() We overlooked these in `02b1ecfa10` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200928162333.14998-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	30dbc81d31	block/export: Move writable to BlockExportOptions The 'writable' option is a basic option that will probably be applicable to most if not all export types that we will implement. Move it from NBD to the generic BlockExport layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-26-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8cade320c8	block/export: Add query-block-exports This adds a simple QMP command to query the list of block exports. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-25-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	331170e073	block/export: Create BlockBackend in blk_exp_add() Every export type will need a BlockBackend, so creating it centrally in blk_exp_add() instead of the .create driver callback avoids duplication. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-24-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	37a4f70cea	block/export: Move blk to BlockExport Every block export has a BlockBackend representing the disk that is exported. It should live in BlockExport therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-23-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1a9f7a804f	block/export: Add BLOCK_EXPORT_DELETED event Clients may want to know when an export has finally disappeard (block-export-del returns earlier than that in the general case), so add a QAPI event for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200924152717.287415-22-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3c3bc462ad	block/export: Add block-export-del Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-21-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3859ad36f0	block/export: Move strong user reference to block_exports The reference owned by the user/monitor that is created when adding the export and dropped when removing it was tied to the 'exports' list in nbd/server.c. Every block export will have a user reference, so move it to the block export level and tie it to the 'block_exports' list in block/export/export.c instead. This is necessary for introducing a QMP command for removing exports. Note that exports are present in block_exports even after the user has requested shutdown. This is different from NBD's exports where exports are immediately removed on a shutdown request, even if they are still in the process of shutting down. In order to avoid that the user still interacts with an export that is shutting down (and possibly removes it a second time), we need to remember if the user actually still owns it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-20-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	d53be9ce55	block/export: Add 'id' option to block-export-add We'll need an id to identify block exports in monitor commands. This adds one. Note that this is different from the 'name' option in the NBD server, which is the externally visible export name. While block export ids need to be unique in the whole process, export names must be unique only for the same server. Different export types or (potentially in the future) multiple NBD servers can have the same export name externally, but still need different block export ids internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-19-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	bc4ee65b8c	block/export: Add blk_exp_close_all(_type) This adds a function to shut down all block exports, and another one to shut down the block exports of a single type. The latter is used for now when stopping the NBD server. As soon as we implement support for multiple NBD servers, we'll need a per-server list of exports and it will be replaced by a function using that. As a side effect, the BlockExport layer has a list tracking all existing exports now. closed_exports loses its only user and can go away. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-18-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	a6ff798966	block/export: Allocate BlockExport in blk_exp_add() Instead of letting the driver allocate and return the BlockExport object, allocate it already in blk_exp_add() and pass it. This allows us to initialise the generic part before calling into the driver so that the driver can just use these values instead of having to parse the options a second time. For symmetry, move freeing the BlockExport to blk_exp_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-17-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	b6076afcab	block/export: Add node-name to BlockExportOptions Every block export needs a block node to export, so add a 'node-name' option to BlockExportOptions and remove the replaced option 'device' from BlockExportOptionsNbd. To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs to be wrapped by a new type NbdServerAddOptions that adds 'device' back because nbd-server-add doesn't use the BlockExportOptions base type at all (so even without changing it to a 'node-name' option in block-export-add, this compatibility code would be necessary). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-16-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8612c68673	block/export: Move AioContext from NBDExport to BlockExport Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-15-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	c69de1bef5	block/export: Move refcount from NBDExport to BlockExport Having a refcount makes sense for all types of block exports. It is also a prerequisite for keeping a list of all exports at the BlockExport level. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-14-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1c8222b014	nbd: Add max-connections to nbd-server-start This is a QMP equivalent of qemu-nbd's --shared option, limiting the maximum number of clients that can attach at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-9-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	9b562c646b	block/export: Remove magic from block-export-add nbd-server-add tries to be convenient and adds two questionable features that we don't want to share in block-export-add, even for NBD exports: 1. When requesting a writable export of a read-only device, the export is silently downgraded to read-only. This should be an error in the context of block-export-add. 2. When using a BlockBackend name, unplugging the device from the guest will automatically stop the NBD server, too. This may sometimes be what you want, but it could also be very surprising. Let's keep things explicit with block-export-add. If the user wants to stop the export, they should tell us so. Move these things into the nbd-server-add QMP command handler so that they apply only there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-8-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	56ee86261e	block/export: Add BlockExport infrastructure and block-export-add We want to have a common set of commands for all types of block exports. Currently, this is only NBD, but we're going to add more types. This patch adds the basic BlockExport and BlockExportDriver structs and a QMP command block-export-add that creates a new export based on the given BlockExportOptions. qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-5-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	143ea7670c	qapi: Rename BlockExport to BlockExportOptions The name BlockExport will be used for the struct containing the runtime state of block exports, so change the name of export creation options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	5daa6bfd8e	qapi: Create block-export module Move all block export related types and commands from block-core to the new QAPI module block-export. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-3-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	74f2e02766	block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition Use self-explicit NANOSECONDS_PER_SECOND definition instead of magic value. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200921110145.520944-1-philmd@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	f68c01470b	qapi: Restrict query-uuid command to machine code Only qemu-system-FOO and qemu-storage-daemon provide QMP monitors, therefore such declarations and definitions are irrelevant for user-mode emulation. Restricting the query-uuid command to machine.json pulls less QAPI-generated code into user-mode. Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200913195348.1064154-6-philmd@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-09-29 15:41:35 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Daniel P. Berrangé	b18a24a9f8	block/file: switch to use qemu_open/qemu_create for improved errors Currently at startup if using cache=none on a filesystem lacking O_DIRECT such as tmpfs, at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument while at QMP level the hint is missing, so QEMU reports just "error": { "class": "GenericError", "desc": "Could not open '/tmp/foo.img': Invalid argument" } which is close to useless for the end user trying to figure out what they did wrong. With this change at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT while at the QMP level QEMU reports a massively more informative "error": { "class": "GenericError", "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT" } Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Stefano Garzarella	7bae7c805d	block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] Commit `19ae9ae014` ("block/rbd: Add support for ceph namespaces") introduced namespace support for RBD, but we forgot to add the new 'namespace' options to qemu_rbd_strong_runtime_opts[]. The 'namespace' is used to identify the image, so it is a strong option since it can changes the data of a BDS. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528 Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Cc: Florian Florensa <fflorensa@online.net> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200914190553.74871-1-sgarzare@redhat.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	bfd0989acf	qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset() qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. See `388e581615` for a similar change to qcow2_get_cluster_offset(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	8e958260c5	qcow2: Make preallocate_co() resize the image to the correct size This function preallocates metadata structures and then extends the image to its new size, but that new size calculation is wrong because it doesn't take into account that the host_offset variable is always cluster-aligned. This problem can be reproduced with preallocation=metadata when the original size is not cluster-aligned but the new size is. In this case the final image size will be shorter than expected. qemu-img create -f qcow2 img.qcow2 31k qemu-img resize --preallocation=metadata img.qcow2 128k Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Mark compat=0.10 unsupported for iotest 125] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:30:36 +02:00
John Snow	c1dadda02c	block/qcow: remove runtime opts Introduced by `d85f4222b4`, These were seemingly never used at all. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-3-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
John Snow	30b70f070f	block/rbd: remove runtime_opts This saw its last use in `4bfb274165`. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-2-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	580384d637	qcow2: Return the original error code in qcow2_co_pwrite_zeroes() This function checks the current status of a (sub)cluster in order to see if an unaligned 'write zeroes' request can be done efficiently by simply updating the L2 metadata and without having to write actual zeroes to disk. If the situation does not allow using the fast path then the function returns -ENOTSUP and the caller falls back to writing zeroes. If can happen however that the aforementioned check returns an actual error code so in this case we should pass it to the caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200909123739.719-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	3fec237fca	qcow2: Make qcow2_free_any_clusters() free only one cluster This function takes an L2 entry and a number of clusters to free. Although in principle it can free any type of cluster (using the L2 entry to determine its type) in practice the API is broken because compressed clusters have a variable size and there is no way to free more than one without having the L2 entry of each one of them. The good news all callers are passing nb_clusters=1 so we can simply get rid of that parameter. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	1a52b73dba	qcow2: Handle QCowL2Meta on error in preallocate_co() If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail then this function simply returns the error code, potentially leaking the QCowL2Meta structure and leaving stale items in s->cluster_allocs. A second problem is that this function calls qcow2_free_any_clusters() on failure but passing a host cluster offset instead of an L2 entry. Luckily for normal uncompressed clusters a raw offset also works like a valid L2 entry so it works just the same, but we should be using qcow2_free_clusters() instead. This patch fixes both problems by using qcow2_handle_l2meta(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Swapnil Ingle	83a6a90009	block/vhdx: Support vhdx image only with 512 bytes logical sector size block/vhdx uses qemu block layer where sector size is always 512 bytes. This may have issues with 4K logical sector sized vhdx image. For e.g qemu-img convert on such images fails with following assert: $qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <= qiov->size' failed. Aborted This patch adds an check to return ENOTSUP for vhdx images which have logical sector size other than 512 bytes. Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com> Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	2b60c5b996	qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset() The current text corresponds to an earlier, simpler version of this function and it does not explain how it works now. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	f7bd5bba1b	qcow2: Don't check nb_clusters when removing l2meta from the list In the past, when a new cluster was allocated the l2meta structure was a variable in the stack so it was necessary to have a way to tell whether it had been initialized and contained valid data or not. The nb_clusters field was used for this purpose. Since commit `f50f88b9fe` this is no longer the case, l2meta (nowadays a pointer to a list) is only allocated when needed and nb_clusters is guaranteed to be > 0 so this check is unnecessary. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	184581fa4d	qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs When a write request needs to allocate new clusters (or change the L2 bitmap of existing ones) a QCowL2Meta structure is created so the L2 metadata can be later updated and any copy-on-write can be performed if necessary. A write request can span a region consisting of an arbitrary combination of previously unallocated and allocated clusters, and if the unallocated ones can be put contiguous to the existing ones then QEMU will do so in order to minimize the number of write operations. In practice this means that a write request has not just one but a number of QCowL2Meta structures. All of them are added to the cluster_allocs list that is stored in BDRVQcow2State and is used to detect overlapping requests. After the write request finishes all its associated QCowL2Meta are removed from that list. calculate_l2_meta() takes care of creating and putting those structures in the list, and qcow2_handle_l2meta() takes care of removing them. The problem is that the error path in handle_alloc() also tries to remove an item in that list, a remnant from the time when this was handled there (that code would not even be correct anymore because it only removes one struct and not all the ones from the same write request). This can trigger a double removal of the same item from the list, causing a crash. This is not easy to reproduce in practice because it requires that do_alloc_cluster_offset() fails after a successful previous allocation during the same write request, but it can be reproduced with the included test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	02b1ecfa10	qcow2: Use macros for the L1, refcount and bitmap table entry sizes This patch replaces instances of sizeof(uint64_t) in the qcow2 driver with macros that indicate what those sizes are actually referring to. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200828110828.13833-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Lukas Straub	5eb9a3c7b0	block/quorum.c: stable children names If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Peter Maydell	2499453eb1	Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/ Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY yzPmyHaVK84= =gaaW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 14:47:49 +01:00
Thomas Huth	b9be6faed1	block/qcow2-cluster: Add missing "fallthrough" annotation When compiling with -Werror=implicit-fallthrough, the compiler currently complains: ../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’: ../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall through [-Werror=implicit-fallthrough=] if (l2_entry & QCOW_OFLAG_COPIED) { ^ ../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here case QCOW2_CLUSTER_UNALLOCATED: ^~~~ It's quite obvious that the fallthrough is intended here, so let's add a comment to silence the compiler warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200908070028.193298-1-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	e5ff22ba9f	block/nvme: Pair doorbell registers For each queue doorbell registers are paired as: - Submission Queue Tail Doorbell - Completion Queue Head Doorbell Reflect that in the NVMeRegs structure, and adapt nvme_create_queue_pair() accordingly. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-4-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	c7100f0a0b	block/nvme: Use generic NvmeBar structure Commit `f3c507adcd` ("NVMe: Initial commit for new storage interface") introduced the NvmeBar structure. Unfortunately in commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") we duplicated it. Apparently in commit `a3d9a352d4` ("block: Move NVMe constants to a separate header") we tried to unify headers but forgot to remove the structure declared in the block/nvme.c source file. Do it now, and remove the structure size check which is redundant with the header check added in commit `74e18435c0` ("hw/block/nvme: Align I/O BAR to 4 KiB"). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	0ea32f34ce	block/nvme: Group controller registers in NVMeRegs structure We want to use the NvmeBar structure from "block/nvme.h" in the next commit. As a preliminary step, group all the NVMe controller registers in the 'ctrl' field, keeping the doorbells registers out of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Kevin Wolf	3b079ac0ff	file-win32: Fix "locking" option The intended behaviour was that locking=off/auto work and have no effect (to remain compatible with file-posix), whereas locking=on would return an error. Unfortunately, the code forgot to remove "locking" from the options QDict, so any attempt to use the option would fail. Replace the option parsing code for "locking" with something that is part of the raw_runtime_opts QemuOptsList (so it is properly removed from the QDict) and looks more like file-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200907092739.9988-1-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Markus Armbruster	b15e402fc8	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to /signal.c. net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:58 +01:00
Markus Armbruster	6ec9379870	trace-events: Delete unused trace points Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:02 +01:00
Max Reitz	0b877d09df	block: Leave BDS.backing_{file,format} constant Parts of the block layer treat BDS.backing_file as if it were whatever the image header says (i.e., if it is a relative path, it is relative to the overlay), other parts treat it like a cache for bs->backing->bs->filename (relative paths are relative to the CWD). Considering bs->backing->bs->filename exists, let us make it mean the former. Among other things, this now allows the user to specify a base when using qemu-img to commit an image file in a directory that is not the CWD (assuming, everything uses relative filenames). Before this patch: $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2 $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2 $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' After this patch: $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 Image committed. $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 Image committed. With this change, bdrv_find_backing_image() must look at whether the user has overridden a BDS's backing file. If so, it can no longer use bs->backing_file, but must instead compare the given filename against the backing node's filename directly. Note that this changes the QAPI output for a node's backing_file. We had very inconsistent output there (sometimes what the image header said, sometimes the actual filename of the backing image). This inconsistent output was effectively useless, so we have to decide one way or the other. Considering that bs->backing_file usually at runtime contained the path to the image relative to qemu's CWD (or absolute), this patch changes QAPI's backing_file to always report the bs->backing->bs->filename from now on. If you want to receive the image header information, you have to refer to full-backing-filename. This necessitates a change to iotest 228. The interesting information it really wanted is the image header, and it can get that now, but it has to use full-backing-filename instead of backing_file. Because of this patch's changes to bs->backing_file's behavior, we also need some reference output changes. Along with the changes to bs->backing_file, stop updating BDS.backing_format in bdrv_backing_attach() as well. This way, ImageInfo's backing-filename and backing-filename-format fields will represent what the image header says and nothing else. iotest 245 changes in behavior: With the backing node no longer overriding the parent node's backing_file string, you can now omit the @backing option when reopening a node with neither a default nor a current backing file even if it used to have a backing node at some point. 273 also changes: The base image is opened without a format layer, so ImageInfo.backing-filename-format used to report "file" for the base image's overlay after blockdev-snapshot. However, the image header never says "file" anywhere, so it now reports $IMGFMT. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	549ec0d978	block: Inline bdrv_co_block_status_from_*() With bdrv_filter_bs(), we can easily handle this default filter behavior in bdrv_co_block_status(). blkdebug wants to have an additional assertion, so it keeps its own implementation, except bdrv_co_block_status_from_file() needs to be inlined there. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	9a71b9de3f	commit: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission if the base is smaller than the top). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	2b088c60bb	backup: Deal with filters Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f072a7fb7	mirror: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission for active commits where the base is smaller than the top). base_overlay is introduced so we can query bdrv_is_allocated_above() on it - we cannot do that with base itself, because a filter's block_status is the same as its child node, so if there are filters on base, bdrv_is_allocated_above() on base would return information including base. Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to "target_backing_bs", because that is what it really refers to. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c6f6d8462c	block-copy: Use CAF to find sync=top base Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	0a7585dbba	block: Use child access functions for QAPI queries query-block, query-named-block-nodes, and query-blockstats now return any filtered child under "backing", not just bs->backing or COW children. This is so that filters do not interrupt the reported backing chain. This changes the output for iotest 184, as the throttled node now appears as a backing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f26191c73	block: Report data child for query-blockstats It makes no sense to report the block stats of a purely metadata-storing child in query-blockstats. So if the primary child does not have any data, try to find a unique data-storing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	07cd7b659a	block/null: Implement bdrv_get_allocated_file_size It is trivial, so we might as well do it. Remove _filter_actual_image_size from iotest 184, so we get to see the result in its reference output. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c8af87573f	block/snapshot: Fix fallback If the top node's driver does not provide snapshot functionality and we want to fall back to a node down the chain, we need to snapshot all non-COW children. For simplicity's sake, just do not fall back if there is more than one such child. Furthermore, we really only can fall back to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify the child link (notably, set it to NULL). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c4db2e25df	block: Use CAF in bdrv_co_rw_vmstate() If a node whose driver does not provide VM state functions has a metadata child, the VM state should probably go there; if it is a filter, the VM state should probably go there. It follows that we should generally go down to the primary child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	66b129ac5e	block: Iterate over children in refresh_limits Instead of looking at just bs->file and bs->backing, we should look at all children that could end up receiving forwarded requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	fb787f02a6	vmdk: Drop vmdk_co_flush() Before HEAD^, we needed this because bdrv_co_flush() by itself would only flush bs->file. With HEAD^, bdrv_co_flush() will flush all children on which a WRITE or WRITE_UNCHANGED permission has been taken. Thus, vmdk no longer needs to do it itself. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	883833e29c	block: Flush all children in generic code If the driver does not support .bdrv_co_flush() so bdrv_co_flush() itself has to flush the children of the given node, it should not flush just bs->file->bs, but in fact all children that might have been written to (judging from the permissions taken on them). This is a bug fix for qcow2 images with an external data file, as they so far did not flush that data_file node. In any case, the BLKDBG_EVENT() should be emitted on the primary child, because that is where a blkdebug node would be if there is any. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	23b93525a2	block: Use bdrv_cow_child() in bdrv_co_truncate() The condition modified here is not about potentially filtered children, but only about COW sources (i.e. traditional backing files). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	67acfd2188	stream: Deal with filters Because of the (not so recent anymore) changes that make the stream job independent of the base node and instead track the node above it, we have to split that "bottom" node into two cases: The bottom COW node, and the node directly above the base node (which may be an R/W filter or the bottom COW node). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	cb8503159a	block: Use CAFs in block status functions Use the child access functions in the block status inquiry functions as appropriate. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	93393e698c	block: Use bdrv_filter_(bs\|child) where obvious Places that use patterns like if (bs->drv->is_filter && bs->file) { ... something about bs->file->bs ... } should be BlockDriverState *filtered = bdrv_filter_bs(bs); if (filtered) { ... something about @filtered ... } instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	4935e8be22	copy-on-read: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	e7e754aec3	throttle: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	8b8277cdb0	block: Drop bdrv_is_encrypted() The original purpose of bdrv_is_encrypted() was to inquire whether a BDS can be used without the user entering a password or not. It has not been used for that purpose for quite some time. Actually, it is not even fit for that purpose, because to answer that question, it would have recursively query all of the given node's children. So now we have to decide in which direction we want to fix bdrv_is_encrypted(): Recursively query all children, or drop it and just use bs->encrypted to get the current node's status? Nowadays, its only purpose is to report through bdrv_query_image_info() whether the given image is encrypted or not. For this purpose, it is probably more interesting to see whether a given node itself is encrypted or not (otherwise, a management application cannot discern for certain which nodes are really encrypted and which just have encrypted children). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	b111b3fcde	block/nvme: Use an array of EventNotifier In preparation of using multiple IRQ (thus multiple eventfds) make BDRVNVMeState::irq_notifier an array (for now of a single element, the admin queue notifier). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-16-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7a1fb2ef40	block/nvme: Extract nvme_poll_queue() As we want to do per-queue polling, extract the nvme_poll_queue() method which operates on a single queue. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-15-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	0a28b02ef9	block/nvme: Simplify nvme_create_queue_pair() arguments nvme_create_queue_pair() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState and AioContext to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-14-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	073a06978c	block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE BDRV_POLL_WHILE() is defined as: #define BDRV_POLL_WHILE(bs, cond) ({ \ BlockDriverState *bs_ = (bs); \ AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \ cond); }) As we will remove the BlockDriverState use in the next commit, start by using the exploded version of BDRV_POLL_WHILE(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-13-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	3a6d34d066	block/nvme: Simplify nvme_init_queue() arguments nvme_init_queue() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-12-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	38e1f8186f	block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) qemu_try_blockalign() is a generic API that call back to the block driver to return its page alignment. As we call from within the very same driver, we already know to page alignment stored in our state. Remove indirections and use the value from BDRVNVMeState. This change is required to later remove the BlockDriverState argument, to make nvme_init_queue() per hardware, and not per block driver. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-11-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	2ed846930d	block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset In the next commit we'll get rid of qemu_try_blockalign(). To ease review, first replace qemu_try_blockalign0() by explicit calls to qemu_try_blockalign() and memset(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-10-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7d3b214ae4	block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures We allocate an unique chunk of memory then use it for two different structures. By using an union, we make it clear the data is overlapping (and we can remove the casts). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-9-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé	4d98093937	block/nvme: Rename local variable We are going to modify the code in the next commit. Renaming the 'resp' variable to 'id' first makes the next commit easier to review. No logical changes. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-8-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	c8edbfb2cc	block/nvme: Use common error path in nvme_add_io_queue() Rearrange nvme_add_io_queue() by using a common error path. This will be proven useful in few commits where we add IRQ notification to the IO queues. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-7-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	bf6ce5ec6d	block/nvme: Improve error message when IO queue creation failed Do not use the same error message for different failures. Display a different error whether it is the CQ or the SQ. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-6-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	73159e52e6	block/nvme: Define INDEX macros to ease code review Use definitions instead of '0' or '1' indexes. Also this will be useful when using multi-queues later. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	0ea45f76eb	block/nvme: Let nvme_create_queue_pair() fail gracefully As nvme_create_queue_pair() is allowed to fail, replace the alloc() calls by try_alloc() to avoid aborting QEMU. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-4-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e266f52cfb	block/nvme: Avoid further processing if trace event not enabled Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is not enabled. This is an untested intend of performance optimization. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-3-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e4f310fe7f	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-2-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Peter Maydell	df8176274a	nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+ gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM ERk1d4d7VSlyqIy7w43v3+IB8M99ag== =7Cn5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw # gpg: Signature made Wed 02 Sep 2020 22:52:31 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-09-02: nbd: disable signals and forking on Windows builds nbd: skip SIGTERM handler if NBD device support is not built block: add missing socket_init() calls to tools block/nbd: use non-blocking connect: fix vm hang on connect() iotests/259: Fix reference output iotests/059: Fix reference output Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy	1dc4718d84	block/nbd: use non-blocking connect: fix vm hang on connect() This makes nbd's connection_co yield during reconnects, so that reconnect doesn't block the main thread. This is very important in case of an unavailable nbd server host: connect() call may take a long time, blocking the main thread (and due to reconnect, it will hang again and again with small gaps of working time during pauses between connection attempts). Realization notes: - We don't want to implement non-blocking connect() over non-blocking socket, because getaddrinfo() doesn't have portable non-blocking realization anyway, so let's just use a thread for both getaddrinfo() and connect(). - We can't use qio_channel_socket_connect_async (which behaves similarly and starts a thread to execute connect() call), as it's relying on someone iterating main loop (g_main_loop_run() or something like this), which is not always the case. - We can't use thread_pool_submit_co API, as thread pool waits for all threads to finish (but we don't want to wait for blocking reconnect attempt on shutdown. So, we just create the thread by hand. Some additional difficulties are: - We want our connect to avoid blocking drained sections and aio context switches. To achieve this, we make it possible to "cancel" synchronous wait for the connect (which is a coroutine yield actually), still, the thread continues in background, and if successful, its result may be reused on next reconnect attempt. - We don't want to wait for reconnect on shutdown, so there is CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block layer is no longer interested in a result, and thread should close new connected socket on finish and free the state. How to reproduce the bug, fixed with this commit: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \ -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, let's trigger nbd-reconnect loop in Qemu process. For this: 5.1 Kill NBD server on node1 5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest again. The command should fail and a lot of error messages about failing disk may appear as well. Now NBD client driver in Qemu tries to reconnect. Still, VM works well. 6. Make node1 unavailable on NBD port, so connect() from node2 will last for a long time: On node1 (Note, that 10809 is just a default NBD port): sudo iptables -A INPUT -p tcp --dport 10809 -j DROP After some time the guest hangs, and you may check in gdb that Qemu hangs in connect() call, issued from the main thread. This is the BUG. 7. Don't forget to drop iptables rule from your node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: minor wording and formatting tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-09-02 16:47:23 -05:00
Peter Maydell	e4d8b7c1a9	qemu-nvme -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M= =Y23H -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging qemu-nvme # gpg: Signature made Wed 02 Sep 2020 15:39:10 BST # gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0 # gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown] # gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown] # gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0 * remotes/nvme/tags/pull-nvme-20200902: (39 commits) hw/block/nvme: remove explicit qsg/iov parameters hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp hw/block/nvme: consolidate qsg/iov clearing hw/block/nvme: add ns/cmd references in NvmeRequest hw/block/nvme: be consistent about zeros vs zeroes hw/block/nvme: add check for mdts hw/block/nvme: refactor request bounds checking hw/block/nvme: verify validity of prp lists in the cmb hw/block/nvme: add request mapping helper hw/block/nvme: add tracing to nvme_map_prp hw/block/nvme: refactor dma read/write hw/block/nvme: destroy request iov before reuse hw/block/nvme: remove redundant has_sg member hw/block/nvme: replace dma_acct with blk_acct equivalent hw/block/nvme: add mapping helpers hw/block/nvme: memset preallocated requests structures hw/block/nvme: bump supported version to v1.3 hw/block/nvme: provide the mandatory subnqn field hw/block/nvme: enforce valid queue creation sequence hw/block/nvme: reject invalid nsid values in active namespace id list ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-02 21:20:20 +01:00
Klaus Jensen	69265150aa	hw/block/nvme: be consistent about zeros vs zeroes The NVM Express specification generally uses 'zeroes' and not 'zeros', so let us align with it. Cc: Fam Zheng <fam@euphon.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-09-02 08:48:50 +02:00
Klaus Jensen	c26f217370	hw/block/nvme: bump spec data structures to v1.3 Add missing fields in the Identify Controller and Identify Namespace data structures to bring them in line with NVMe v1.3. This also adds data structures and defines for SGL support which requires a couple of trivial changes to the nvme block driver as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Fam Zheng <fam@euphon.net> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-2-its@irrelevant.dk>	2020-09-02 08:48:50 +02:00
Peter Maydell	887adde81d	meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9OcpcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM9+Af+InfjEFtsoubgQA2L7B1sHeksINOI nMAw9plmJSX0Qabp0PJDrMcBiLZlbdFGCm/88heTyDGRtbhIXLUrE2J0dyV4R4nR OSyuTgDna75QLxy6k1dIh6qVAtcj2hg+CxaJrUf2Ix6A1d1PAfWoweNXSF1LBmJf pyQ39eZXStI7/bkmwgTY3qK1gwjEaskvf68fTp1hgTN0VHwUb23/nucPaizbVF+a A0nWprPQaTjWhAFQ2jJesBbhN6FtD3EnOZs56JOSQ9J7W6uDnw8a+tOdEDbSYKe2 DdEUEjcz2cqAMCiWrzOMgxB8T885H+8yE6jgEmPHWaUUORQADMcBDMihNA== =NGLI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options # gpg: Signature made Tue 01 Sep 2020 17:11:03 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (26 commits) Makefile: Fix in-tree clean/distclean Makefile: Add back TAGS/ctags/cscope rules meson: add description to options build: fix recurse-all target meson: use pkg-config method to find dependencies configure: do not include ${prefix} in firmwarepath meson: add pixman dependency to UI modules meson: add pixman dependency to chardev/baum module meson: add NSIS building meson: use meson mandir instead of qemu_mandir meson: pass docdir option meson: use meson datadir instead of qemu_datadir meson: pass qemu_suffix option configure: build docdir like other suffixed directories configure: always /-seperate directory from qemu_suffix configure: rename confsuffix option meson: move zlib detection to meson build-sys: remove install target from Makefile meson: install $localstatedir/run for qga meson: install desktop file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-01 22:50:23 +01:00
Liao Pingfang	f181ab4ba5	block/vmdk: Remove superfluous breaks Remove superfluous breaks, as there is a "return" before them. Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-01 08:37:28 +02:00
Paolo Bonzini	a10c8516ed	block: always link with zlib The qcow2 driver needs the zlib dependency. While emulators provided it through the migration code, this is not true of the tools. Move the dependency from the qcow1 rule directly into block_ss so that it is included unconditionally. Fixes build with --disable-qcow1. Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-01 01:51:51 -04:00
Eduardo Habkost	7c9dcd6cab	throttle-groups: Move ThrottleGroup typedef to header Move typedef closer to the type check macros, to make it easier to convert the code to OBJECT_DEFINE_TYPE() in the future. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Tested-By: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 14:04:54 -04:00
Alberto Garcia	7bbb59202a	qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters This function is only used by qcow2_expand_zero_clusters() to downgrade a qcow2 image to a previous version. This would require transforming all extended L2 entries into normal L2 entries but this is not a simple task and there are no plans to implement this at the moment. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com> [mreitz: Fixed comment style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 10:20:15 +02:00
Alberto Garcia	2118771ddf	qcow2: Allow preallocation and backing files if extended_l2 is set Traditional qcow2 images don't allow preallocation if a backing file is set. This is because once a cluster is allocated there is no way to tell that its data should be read from the backing file. Extended L2 entries have individual allocation bits for each subcluster, and therefore it is perfectly possible to have an allocated cluster with all its subclusters unallocated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:20:04 +02:00
Alberto Garcia	7be2025258	qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com> [mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:19:55 +02:00
Alberto Garcia	40dee94320	qcow2: Add prealloc field to QCowL2Meta This field allows us to indicate that the L2 metadata update does not come from a write request with actual data but from a preallocation request. For traditional images this does not make any difference, but for images with extended L2 entries this means that the clusters are allocated normally in the L2 table but individual subclusters are marked as unallocated. This will allow preallocating images that have a backing file. There is one special case: when we resize an existing image we can also request that the new clusters are preallocated. If the image already had a backing file then we have to hide any possible stale data and zero out the new clusters (see commit `955c7d6687` for more details). In this case the subclusters cannot be left as unallocated so the L2 bitmap must be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	0dd07b298f	qcow2: Add subcluster support to qcow2_measure() Extended L2 entries are bigger than normal L2 entries so this has an impact on the amount of metadata needed for a qcow2 file. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a6841a2de6	qcow2: Add subcluster support to qcow2_co_pwrite_zeroes() This works now at the subcluster level and pwrite_zeroes_alignment is updated accordingly. qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with the following changes: - The request can now be subcluster-aligned. - The cluster-aligned body of the request is still zeroized using zero_in_l2_slice() as before. - The subcluster-aligned head and tail of the request are zeroized with the new zero_l2_subclusters() function. There is just one thing to take into account for a possible future improvement: compressed clusters cannot be partially zeroized so zero_l2_subclusters() on the head or the tail can return -ENOTSUP. This makes the caller repeat the complete request and write actual zeroes to disk. This is sub-optimal because 1) if the head area was compressed we would still be able to use the fast path for the body and possibly the tail. 2) if the tail area was compressed we are writing zeroes to the head and the body areas, which are already zeroized. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	bf4a66eed4	qcow2: Add subcluster support to handle_alloc_space() The bdrv_co_pwrite_zeroes() call here fills complete clusters with zeroes, but it can happen that some subclusters are not part of the write request or the copy-on-write. This patch makes sure that only the affected subclusters are overwritten. A potential improvement would be to also fill with zeroes the other subclusters if we can guarantee that we are not overwriting existing data. However this would waste more disk space, so we should first evaluate if it's really worth doing. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ff4cdec7f6	qcow2: Clear the L2 bitmap when allocating a compressed cluster Compressed clusters always have the bitmap part of the extended L2 entry set to 0. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	aca00cd971	qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. In some cases however a write operation doesn't need change the L2 bitmap (because all affected subclusters were already allocated). This is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2() is never called in those cases. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	fc2e6528d5	qcow2: Add subcluster support to check_refcounts_l2() The offset field of an uncompressed cluster's L2 entry must be aligned to the cluster size, otherwise it is invalid. If the cluster has no data then it means that the offset points to a preallocation, so we can clear the offset field without affecting the guest-visible data. This is what 'qemu-img check' does when run in repair mode. On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO is set, and repairing such entries turns the clusters from ZERO_ALLOC into ZERO_PLAIN. Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO but the idea is the same: if none of the subclusters are allocated then we can clear the offset field and leave the bitmap untouched. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a68cd70326	qcow2: Add subcluster support to discard_in_l2_slice() Two things need to be taken into account here: 1) With full_discard == true the L2 entry must be cleared completely. This also includes the L2 bitmap if the image has extended L2 entries. 2) With full_discard == false we have to make the discarded cluster read back as zeroes. With normal L2 entries this is done with the QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done with the individual 'all zeroes' bits for each subcluster. Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2 images so, if there is a backing file, discard cannot guarantee that the image will read back as zeroes. If this is important for the caller it should forbid it as qcow2_co_pdiscard() does (see `80f5c01183` for more details). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	205fa50750	qcow2: Add subcluster support to zero_in_l2_slice() The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as zeroes is only used in standard L2 entries. Extended L2 entries use individual 'all zeroes' bits for each subcluster. This must be taken into account when updating the L2 entry and also when deciding that an existing entry does not need to be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3f9c6b3b1f	qcow2: Add subcluster support to qcow2_get_host_offset() The logic of this function remains pretty much the same, except that it uses count_contiguous_subclusters(), which combines the logic of count_contiguous_clusters() / count_contiguous_clusters_unallocated() and checks individual subclusters. qcow2_cluster_to_subcluster_type() is not necessary as a separate function anymore so it's inlined into its caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com> [mreitz: Initialize expected_type to anything] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d53ec3d8d8	qcow2: Add subcluster support to calculate_l2_meta() If an image has subclusters then there are more copy-on-write scenarios that we need to consider. Let's say we have a write request from the middle of subcluster #3 until the end of the cluster: 1) If we are writing to a newly allocated cluster then we need copy-on-write. The previous contents of subclusters #0 to #3 must be copied to the new cluster. We can optimize this process by skipping all leading unallocated or zero subclusters (the status of those skipped subclusters will be reflected in the new L2 bitmap). 2) If we are overwriting an existing cluster: 2.1) If subcluster #3 is unallocated or has the all-zeroes bit set then we need copy-on-write (on subcluster #3 only). 2.2) If subcluster #3 was already allocated then there is no need for any copy-on-write. However we still need to update the L2 bitmap to reflect possible changes in the allocation status of subclusters #4 to #31. Because of this, this function checks if all the overwritten subclusters are already allocated and in this case it returns without creating a new QCowL2Meta structure. After all these changes l2meta_cow_start() and l2meta_cow_end() are not necessarily cluster-aligned anymore. We need to update the calculation of old_start and old_end in handle_dependencies() to guarantee that no two requests try to write on the same cluster. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	97490a143e	qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC When dealing with subcluster types there is a new value called QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in QCow2ClusterType. This patch handles that value in all places where subcluster types are processed. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	10dabdc596	qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* In order to support extended L2 entries some functions of the qcow2 driver need to start dealing with subclusters instead of clusters. qcow2_get_host_offset() is modified to return the subcluster type instead of the cluster type, and all callers are updated to replace all values of QCow2ClusterType with their QCow2SubclusterType equivalents. This patch only changes the data types, there are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ca4a0bb81b	qcow2: Add cluster type parameter to qcow2_get_host_offset() This function returns an integer that can be either an error code or a cluster type (a value from the QCow2ClusterType enum). We are going to start using subcluster types instead of cluster types in some functions so it's better to use the exact data types instead of integers for clarity and in order to detect errors more easily. This patch makes qcow2_get_host_offset() return 0 on success and puts the returned cluster type in a separate parameter. There are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c94d037825	qcow2: Add qcow2_cluster_is_allocated() This helper function tells us if a cluster is allocated (that is, there is an associated host offset for it). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	70d1cbae03	qcow2: Add qcow2_get_subcluster_range_type() There are situations in which we want to know how many contiguous subclusters of the same type there are in a given cluster. This can be done by simply iterating over the subclusters and repeatedly calling qcow2_get_subcluster_type() for each one of them. However once we determined the type of a subcluster we can check the rest efficiently by counting the number of adjacent ones (or zeroes) in the bitmap. This is what this function does. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	34905d8eb1	qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() This patch adds QCow2SubclusterType, which is the subcluster-level version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the the same meaning as their QCOW2_CLUSTER_* equivalents (when they exist). See below for details and caveats. In images without extended L2 entries clusters are treated as having exactly one subcluster so it is possible to replace one data type with the other while keeping the exact same semantics. With extended L2 entries there are new possible values, and every subcluster in the same cluster can obviously have a different QCow2SubclusterType so functions need to be adapted to work on the subcluster level. There are several things that have to be taken into account: a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is compressed. We do not support compression at the subcluster level. b) There are two different values for unallocated subclusters: QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC which means that the cluster is allocated but the subcluster is not. The latter can only happen in images with extended L2 entries. c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2 entry has a value that violates the specification. The caller is responsible for handling these situations. To prevent compatibility problems with images that have invalid values but are currently being read by QEMU without causing side effects, QCOW2_SUBCLUSTER_INVALID is only returned for images with extended L2 entries. qcow2_cluster_to_subcluster_type() is added as a separate function from qcow2_get_subcluster_type(), but this is only temporary and both will be merged in a subsequent patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	39a9f0a50e	qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Extended L2 entries are 128-bit wide: 64 bits for the entry itself and 64 bits for the subcluster allocation bitmap. In order to support them correctly get/set_l2_entry() need to be updated so they take the entry width into account in order to calculate the correct offset. This patch also adds the get/set_l2_bitmap() functions that are used to access the bitmaps. For convenience we allow calling get_l2_bitmap() on images without subclusters. In this case the returned value is always 0 and has no meaning. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c8fd8554d9	qcow2: Add l2_entry_size() qcow2 images with subclusters have 128-bit L2 entries. The first 64 bits contain the same information as traditional images and the last 64 bits form a bitmap with the status of each individual subcluster. Because of that we cannot assume that L2 entries are sizeof(uint64_t) anymore. This function returns the proper value for the image. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3e71981592	qcow2: Add offset_into_subcluster() and size_to_subclusters() Like offset_into_cluster() and size_to_clusters(), but for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a53e8b7202	qcow2: Add offset_to_sc_index() For a given offset, return the subcluster number within its cluster (i.e. with 32 subclusters per cluster it returns a number between 0 and 31). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d0346b5591	qcow2: Add subcluster-related fields to BDRVQcow2State This patch adds the following new fields to BDRVQcow2State: - subclusters_per_cluster: Number of subclusters in a cluster - subcluster_size: The size of each subcluster, in bytes - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size Images without subclusters are treated as if they had exactly one subcluster per cluster (i.e. subcluster_size = cluster_size). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a3c7d91625	qcow2: Add dummy has_subclusters() function This function will be used by the qcow2 code to check if an image has subclusters or not. At the moment this simply returns false. Once all patches needed for subcluster support are ready then QEMU will be able to create and read images with subclusters and this function will return the actual value. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	12c6aebedf	qcow2: Add get_l2_entry() and set_l2_entry() The size of an L2 entry is 64 bits, but if we want to have subclusters we need extended L2 entries. This means that we have to access L2 tables and slices differently depending on whether an image has extended L2 entries or not. This patch replaces all l2_slice[] accesses with calls to get_l2_entry() and set_l2_entry(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	57538c864f	qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() When writing to a qcow2 file there are two functions that take a virtual offset and return a host offset, possibly allocating new clusters if necessary: - handle_copied() looks for normal data clusters that are already allocated and have a reference count of 1. In those clusters we can simply write the data and there is no need to perform any copy-on-write. - handle_alloc() looks for clusters that do need copy-on-write, either because they haven't been allocated yet, because their reference count is != 1 or because they are ZERO_ALLOC clusters. The ZERO_ALLOC case is a bit special because those are clusters that are already allocated and they could perfectly be dealt with in handle_copied() (as long as copy-on-write is performed when required). In fact, there is extra code specifically for them in handle_alloc() that tries to reuse the existing allocation if possible and frees them otherwise. This patch changes the handling of ZERO_ALLOC clusters so the semantics of these two functions are now like this: - handle_copied() looks for clusters that are already allocated and which we can overwrite (NORMAL and ZERO_ALLOC clusters with a reference count of 1). - handle_alloc() looks for clusters for which we need a new allocation (all other cases). One important difference after this change is that clusters found in handle_copied() may now require copy-on-write, but this will be necessary anyway once we add support for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c1587d877e	qcow2: Split cluster_needs_cow() out of count_cow_clusters() We are going to need it in other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	8f91d6906c	qcow2: Add calculate_l2_meta() handle_alloc() creates a QCowL2Meta structure in order to update the image metadata and perform the necessary copy-on-write operations. This patch moves that code to a separate function so it can be used from other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	388e581615	qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() qcow2_get_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. There is a pre-existing exception with compressed clusters: in this case the function returns the complete cluster descriptor (containing the offset and size of the compressed data). This does not change with this patch but it is now documented. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <ffae6cdc5ca8950e8280ac0f696dcc376cb07095.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	9c4269d54b	qcow2: Make Qcow2AioTask store the full host offset The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned host offset. In practice this is not very useful because all users() of this structure need the final host offset into the cluster, which they calculate using host_offset = file_cluster_offset + offset_into_cluster(s, offset) There is no reason why Qcow2AioTask cannot store host_offset directly and that is what this patch does. () compressed clusters are the exception: in this case what file_cluster_offset was storing was the full compressed cluster descriptor (offset + size). This does not change with this patch but it is documented now. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Marc-André Lureau	5e5733e599	meson: convert block Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:18 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Stefan Reiter	7661a886a1	block/block-copy: always align copied region to cluster size Since commit `42ac214406` (block/block-copy: refactor task creation) block_copy_task_create calculates the area to be copied via bdrv_dirty_bitmap_next_dirty_area, but that can return an unaligned byte count if the image's last cluster end is not aligned to the bitmap's granularity. Always ALIGN_UP the resulting bytes value to satisfy block_copy_do_copy, which requires the 'bytes' parameter to be aligned to cluster size. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200810095523.15071-1-s.reiter@proxmox.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-10 17:12:46 +02:00
Tuguoyi	348fcc4f7a	qcow2-cluster: Fix integer left shift error in qcow2_alloc_cluster_link_l2() When calculating the offset, the result of left shift operation will be promoted to type int64 automatically because the left operand of + operator is uint64_t. but the result after integer promotion may be produce an error value for us and trigger the following asserting error. For example, consider i=0x2000, cluster_bits=18, the result of left shift operation will be 0x80000000. Cause argument i is of signed integer type, the result is automatically promoted to 0xffffffff80000000 which is not we expected The way to trigger the assertion error: qemu-img create -f qcow2 -o preallocation=full,cluster_size=256k tmpdisk 10G This patch fix it by casting @i to uint64_t before doing left shift operation Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 81ba90fe0c014f269621c283269b42ad@h3c.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-08-05 14:56:11 +01:00
Max Reitz	fe16c7ddf8	qcow2: Release read-only bitmaps when inactivated During migration, we release all bitmaps after storing them on disk, as long as they are (1) stored on disk, (2) not read-only, and (3) consistent. (2) seems arbitrary, though. The reason we do not release them is because we do not write them, as there is no need to; and then we just forget about all bitmaps that we have not written to the file. However, read-only persistent bitmaps are still in the file and in sync with their in-memory representation, so we may as well release them just like any R/W bitmap that we have updated. It leads to actual problems, too: After migration, letting the source continue may result in an error if there were any bitmaps on read-only nodes (such as backing images), because those have not been released by bdrv_inactive_all(), but bdrv_invalidate_cache_all() attempts to reload them (which fails, because they are still present in memory). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200730120234.49288-2-mreitz@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-08-03 08:59:37 -05:00
Peter Maydell	5045be872d	nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8gPV4ACgkQp6FrSiUn Q2ozdQgAiDHaHG2NX4jmduID7677/XhsLoVl1MV7UZnU+y9qQ2p+Mbsw1oMneu8P Dtfgx/mlWVGu68gn31f4xVq74VTZH6p3IGV7PMcYZ50xbESoFs6CYUwUWUp1GeC3 +kPOl0EpLvm1W/V93sKmg8FflGmNiJHNkfl/ddfk0gs6Z3EfjkmGJt7IP/pv1UCs 4icWvCJsqw2z8TnEwtTpMX5HZlWth1x37lUOShlPL5kA5hZqU+zYU/bYB5iKx+16 MebYg7C7CXYCCtH9cDH/swUWhOdQLkywA6yBAwc1zENsKy84aIAJIUls/Ji0q6CY A4s5c0FovLBuMDd9oLr0kJbkJQeVZA== =DD6l -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-07-28' into staging nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect # gpg: Signature made Tue 28 Jul 2020 15:59:42 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-07-28: block/nbd: nbd_co_reconnect_loop(): don't sleep if drained block/nbd: on shutdown terminate connection attempt block/nbd: allow drain during reconnect attempt block/nbd: split nbd_establish_connection out of nbd_client_connect iotests: Test convert to qcow2 compressed to NBD iotests: Add more qemu_img helpers iotests: Make qemu_nbd_popen() a contextmanager block: nbd: Fix convert qcow2 compressed to nbd nbd: Fix large trim/zero requests Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 20:43:03 +01:00

... 2 3 4 5 6 ...

5177 Commits