There is a very low probability of hitting the critical warning case
on physical NVMe disk hardware, which makes it hard to write and test
a monitoring agent service.
For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.
The original version of this change added a fixed property that could
be initialized from the QEMU command line. As suggested by Philippe and
Klaus, it has been reworked into the current version.
Test with this patch:
1. Change the smart_critical_warning property for a running VM:
#virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
"arguments": { "path": "/machine/peripheral-anon/device[0]",
"property": "smart_critical_warning", "value":16 } }'
2. Run smartctl in the guest:
#smartctl -H -l error /dev/nvme0n1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- volatile memory backup device has failed
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
According to NVM Express v1.4, Section 5.14.1.2 ("SMART / Health
Information"), introduce bit 5 for "Persistent Memory Region has become
read-only or unreliable".
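For illustration, the critical warning bits of the SMART / Health
Information log page could be modelled like this (a hedged sketch; the
exact identifiers in the shared nvme.h header may differ):

    typedef enum NvmeSmartWarn {
        NVME_SMART_SPARE                 = 1 << 0,
        NVME_SMART_TEMPERATURE           = 1 << 1,
        NVME_SMART_RELIABILITY           = 1 << 2,
        NVME_SMART_MEDIA_READ_ONLY       = 1 << 3,
        NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4,
        NVME_SMART_PMR_UNRELIABLE        = 1 << 5, /* new: bit 5, v1.4 */
    } NvmeSmartWarn;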
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: minor brush ups in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Align with existing style and use a typedef for header-file enums.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053).
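As one example of the kind of structure this entails, the zone
descriptor from TP 4053 could look roughly like this (a hedged sketch;
the field names are assumptions based on the spec):

    typedef struct QEMU_PACKED NvmeZoneDescr {
        uint8_t  zt;         /* zone type */
        uint8_t  zs;         /* zone state */
        uint8_t  za;         /* zone attributes */
        uint8_t  rsvd3[5];
        uint64_t zcap;       /* zone capacity */
        uint64_t zslba;      /* zone start LBA */
        uint64_t wp;         /* write pointer */
        uint8_t  rsvd32[32];
    } NvmeZoneDescr;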
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated; that is, a namespace is
included regardless of whether it is active (attached) or not.
While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
becomes more complete by actually providing support for these CNS
values.
However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.
The reason for not hooking up this command completely is because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Define the structures and constants required to implement
Namespace Types support.
Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.
The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.
Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
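A minimal sketch of such a helper, assuming the DMA utility and
constant names used elsewhere in the device model:

    static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req)
    {
        uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};

        /* return a zero-filled identify data structure to the host */
        return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req);
    }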
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Implementing this log page becomes necessary to allow checking for
Zone Append command support in the Zoned Namespace Command Set.
This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.
All incoming admin and I/O commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control which commands to support and which not to,
depending on the selected CC.CSS.
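A hedged sketch of that gating on the I/O path (the array and flag
names are assumptions):

    /* Reject the opcode unless the effects log marks it as supported. */
    if (!(nvme_cse_iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
        return NVME_INVALID_OPCODE | NVME_DNR;
    }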
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This adds the NPWG, NPWA, NPDG, NPDA and NOWS family of fields to the
shared nvme.h header for use by later patches.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add support for reporting the Deallocated or Unwritten Logical Block
Error (DULBE).
Rely on the block status flags reported by the block layer and consider
any block with the BDRV_BLOCK_ZERO flag to be deallocated.
Multiple factors affect when a Write Zeroes command results in the
deallocation of blocks:
* the underlying file system block size
* the blockdev format
* the 'discard' and 'logical_block_size' parameters
format | discard | wz (512B) | wz (4KiB) | wz (64KiB)
-------+---------+-----------+-----------+-----------
qcow2  | ignore  |     n     |     n     |     y
qcow2  | unmap   |     n     |     n     |     y
raw    | ignore  |     n     |     y     |     y
raw    | unmap   |     n     |     y     |     y
So, this works best with an image in raw format and 4KiB LBAs, since
holes can then be punched on a per-block basis (this assumes a file
system with a 4KiB block size, YMMV). A qcow2 image uses a cluster size
of 64KiB by default, and blocks will only be marked deallocated if a
full cluster is zeroed or discarded. However, this *is* consistent with the
spec since Write Zeroes "should" deallocate the block if the Deallocate
attribute is set and "may" deallocate if the Deallocate attribute is not
set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is
always set).
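Conceptually, the deallocation check boils down to querying the block
layer for zero status (a hedged sketch; 'blk' stands in for the
namespace's BlockBackend):

    int64_t pnum = 0;
    int ret = bdrv_block_status(blk_bs(blk), offset, bytes, &pnum,
                                NULL, NULL);
    if (ret >= 0 && (ret & BDRV_BLOCK_ZERO) && pnum == bytes) {
        /* the whole range reads as zeroes: treat it as deallocated */
    }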
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Currently bdrv_all_find_snapshot() will return 0 if it finds a
snapshot, and -1 either if an error occurs or if it fails to find a
snapshot. New callers to be added want to distinguish between the
error scenario and failing to find a snapshot.
Rename it to bdrv_all_has_snapshot() and make it return -1 on error,
0 if no snapshot is found, and 1 if a snapshot is found.
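A hedged usage sketch, assuming the renamed helper keeps the original
style of arguments:

    int ret = bdrv_all_has_snapshot(name, errp);
    if (ret < 0) {
        /* an error occurred while checking */
    } else if (ret == 0) {
        /* no snapshot found */
    } else {
        /* snapshot found */
    }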
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-6-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.
Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-5-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
[PMD: Initialize variables]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-2-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
NBD reconnect logic considers the error code from the functions that
read NBD messages to tell if reconnect should be attempted or not: it is
attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT
state (see nbd_channel_error). This error code is propagated from the
primitives like nbd_read.
The problem, however, is that nbd_read itself turns every error into
-1 rather than -EIO. As a result, if the NBD server happens to die
while sending a message, the client in QEMU receives less data than it
expects, considers it a fatal error, and does not attempt to
reestablish the connection.
Fix it by turning every negative return from qio_channel_read_all into
-EIO returned from nbd_read. Apparently that was the original behavior,
but got broken later. Also adjust nbd_readXX to follow.
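A hedged sketch of the fixed helper (the real one lives in the NBD
headers; the error_prepend() detail is an assumption):

    static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
                               const char *desc, Error **errp)
    {
        /* Turn any negative return into -EIO so reconnect can trigger. */
        int ret = qio_channel_read_all(ioc, buffer, size, errp) < 0
                  ? -EIO : 0;

        if (ret < 0 && desc) {
            error_prepend(errp, "Failed to read %s: ", desc);
        }
        return ret;
    }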
Fixes: e6798f06a6 ("nbd: generalize usage of nbd_read")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-4-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both the offset and bytes
parameters on all I/O paths.
The main motivation is the implementation of a 64-bit write_zeroes
operation for fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed)
and to allow for a signed return type (where a negative value means an
error).
So, convert the copy_range parameters, which are already 64-bit, to a
signed type now.
It's safe, as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and we do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both the offset and bytes
parameters on all I/O paths.
The main motivation is the implementation of a 64-bit write_zeroes
operation for fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed)
and to allow for a signed return type (where a negative value means an
error).
Now that bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been
updated, update all their wrappers.
For all of them the type of 'bytes' is widening, so callers are safe.
We have to update request_fn in blkverify.c simultaneously. Still, it's
just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and the
type is widening for callers of the request_fn anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both the offset and bytes
parameters on all I/O paths.
The main motivation is the implementation of a 64-bit write_zeroes
operation for fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed)
and to allow for a signed return type (where a negative value means an
error).
So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.
bdrv_pad_request() is updated at the same time, as a pointer to bytes
is passed to it from both bdrv_co_pwritev_part() and
bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are
updated to pass 64-bit bytes. bdrv_pad_request() is already good for
64-bit requests; add a corresponding assertion.
Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). The type is
widening, so callers are safe. Let's look inside the functions.
In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already-int64_t interfaces (and some obviously safe
calculations), so it's OK.
In bdrv_co_do_zero_pwritev(), aligned_bytes may become large now;
still, it's passed to bdrv_aligned_pwritev(), which supports int64_t
bytes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both the offset and bytes
parameters on all I/O paths.
The main motivation is the implementation of a 64-bit write_zeroes
operation for fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed)
and to allow for a signed return type (where a negative value means an
error).
All requests in block/io must not overflow BDRV_MAX_LENGTH, and all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Add some assertions anyway.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The function is called from 64-bit I/O handlers, and bytes is just
passed to throttle_account(), which is 64-bit too (unsigned, though).
So, let's convert the intermediate argument to 64-bit too.
This patch is the first in the 64-bit-blocklayer series, in which we
are generally moving to int64_t for both the offset and bytes
parameters on all I/O paths. The main motivation is the implementation
of a 64-bit write_zeroes operation for fast zeroing of large disk
chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed)
and to allow for a signed return type (where a negative value means an
error).
Patch-correctness audit by Eric Blake:
Caller has 32-bit; this patch now causes widening, which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'
Caller has 64-bit; this patch fixes a potential bug where the
pre-patch code could narrow, except that it's easy enough to trace
that callers are still capped at 2G actions:
block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'
Implementation in question: block/throttle-groups.c
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
and uses it: argument to util/throttle.c throttle_account(uint64_t)
All safe: it patches a latent bug, and does not introduce any 64-bit
gotchas once throttle_co_p{read,write}v are relaxed, and assuming
throttle_account() is not buggy.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
It's better to pass &error_abort than to just assert that the result
is 0: on a crash, we'll immediately see the reason in the backtrace.
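For illustration (a hedged sketch with a made-up call site):

    /* Before: on failure, the backtrace only shows a failed assertion. */
    ret = blk_insert_bs(blk, bs, NULL);
    assert(ret == 0);

    /* After: error_abort prints the actual error before aborting. */
    blk_insert_bs(blk, bs, &error_abort);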
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix iotest 206 fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Drop unused code.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to use an async block-copy call in backup, so we'll need
to pass the backup speed setting through to the block-copy call.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a function to cancel a running async block-copy call. It will be
used in backup.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to directly use one async block-copy operation for the
backup job, so we need a rate limiter.
We want to maintain the current backup behavior: only background
copying is limited, and copy-before-write operations only participate
in the limit calculation. Therefore we need one rate limiter for the
block-copy state and a boolean flag in the block-copy call state for
the actual limitation.
Note that we can't just account each chunk in the limiter after
successful copying: that would not save us from starting a lot of
async sub-requests which exceed the limit by too much. Instead, let's
use the following scheme on sub-request creation:
1. If at the moment the limit is not exceeded, create the request and
account it immediately.
2. If at the moment the limit is already exceeded, don't create the
sub-request and handle the limit instead (by sleeping).
With this approach we'll never exceed the limit by more than one
sub-request (which pretty much matches current backup behavior).
Note also that if there is an in-flight block-copy async call,
block_copy_kick() should be used after set-speed to apply the new
setup faster. For that, block_copy_kick() is made public in this
patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
They will be used for backup.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We'll need async block-copy invocation to use in backup directly.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Experiments show that copy_range does not always make things faster.
So, to make experimentation simpler, let's add a parameter. Some more
performance parameters will be added soon, so here is a new struct.
For now, add the new backup QMP parameter with an x- prefix for the
following reasons:
- We are going to add more performance parameters, some related to the
whole block-copy process, some only to background copying in backup
(ignored for copy-before-write operations).
- On the other hand, we are going to use the block-copy interface in
other block jobs, which will need performance options as well. And it
should be the same structure, or at least somehow related.
So, there are too many unclear things about the interface, and for now
we need the new options mostly for testing. Let's keep them
experimental for a while.
In do_backup_common() the new x-perf parameter is handled in a way
that makes adding further options simpler.
We add use-copy-range with default=true; we'll change the default in a
further patch, after moving backup to use block-copy.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com>
[mreitz: s/5\.2/6.0/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
The code already doesn't freeze the base node, and we try to make it
prepared for the situation when the base node is changed during the
operation. In other words, block-stream doesn't own the base node.
Let's introduce a new interface which should replace the current one
and which is in better agreement with the code. Specifying the bottom
node instead of base, and requiring it to be a non-filter, gives us
the following benefits:
- drop the difference between above_base and base_overlay, which will
be renamed to just bottom when the old interface is dropped
- a clean way to work with parallel streams/commits on the same
backing chain, which otherwise become a problem when we introduce a
filter for the stream job
- a cleaner interface. Nobody will be surprised by the fact that the
base node may disappear during block-stream when there is no mention
of "base" in the interface.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
If the BDRV_REQ_PREFETCH flag is set, skip the idle read/write
operations in the COR driver. This can be taken into account for
COR-algorithm optimizations. For the moment, that check is made during
the block-stream job.
Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR filter.
block: Modify the comment for the BDRV_REQ_PREFETCH flag, as we are
going to use it alone and pass it to the COR filter driver for
further processing.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: use assert instead of abort]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com>
[mreitz: s/commit/stream/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Provide an API for inserting a node into the backing chain.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a flag to make a serialising request non-waiting: if there are
conflicting requests, just return an error immediately. It will be
used in the upcoming preallocate filter.
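A hedged sketch of the intended semantics (the flag and helper names
are assumptions based on this description):

    if (flags & BDRV_REQ_NO_WAIT) {
        /* Fail fast instead of waiting for conflicting requests. */
        if (bdrv_find_conflicting_request(req)) {
            return -EBUSY;
        }
    }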
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We'll need a separate function which will only "mark" a request
serialising with a specified alignment, but not wait for conflicting
requests. So, it will be like the old bdrv_mark_request_serialising()
was before bdrv_wait_serialising_requests_locked() was merged into it.
To reduce the possible mess, let's do the following:
The public function that does both marking and waiting will be called
bdrv_make_request_serialising(), and the private function which will
only "mark" will be called tracked_request_set_serialising().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
1. BDRV_REQ_NO_SERIALISING no longer exists, so don't mention it.
2. We are going to add one more user of BDRV_REQ_SERIALISING, so the
comment about backup becomes a bit confusing here. The use case in
backup is documented in block/backup.c, so let's just drop the
duplication here.
3. The fact that BDRV_REQ_SERIALISING is only for write requests is
omitted. Add a note.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201021145859.11201-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We are going to modify the block layer to work with 64-bit requests.
The first step is moving to the int64_t type for both the offset and
bytes arguments in all block-request-related functions.
It's mostly safe (when widening signed or unsigned int to int64_t),
but switching from uint64_t is questionable.
So, let's first establish the set of requests we want to work with.
First, signed int64_t should be enough, as off_t is signed anyway.
Then, obviously, offset + bytes should not overflow.
And most interesting: (offset + bytes), aligned up, should not
overflow either. Aligned to what alignment? The first thing that comes
to mind is bs->bl.request_alignment, as we align requests up to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns a request up to some given
alignment, and this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also that bdrv_mark_request_serialising() uses signed int64_t for
its calculations. So, actually, we already depend on some restrictions.
Happily, bdrv_get_cluster_size() returns an int, and
bs->bl.request_alignment has a 32-bit unsigned type but is defined to
be a power of 2 less than INT_MAX. So, we may establish that INT_MAX
is the absolute maximum for any kind of alignment that may occur with
a request.
Note that bdrv_get_cluster_size() is not documented to return a power
of 2; still, bdrv_mark_request_serialising() behaves as if it were.
Also, backup uses bdi.cluster_size and is not prepared for it not
being a power of 2.
So, let's establish that QEMU supports only power-of-2 clusters and
alignments.
Alignment thus can't be greater than 2^30.
Finally, to be safe with calculations and to avoid computing different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as the absolute maximum byte length for QEMU. Actually, it's not much
less than INT64_MAX.
OK, then, let's apply it to block/io.
Let's consider all block/io entry points for offset/bytes:
There are four bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal(), and
bdrv_co_pdiscard(), and we check them all with bdrv_check_request().
We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.
And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:
file-posix.c: adopted by this patch
write-threshold.c: only read fields
test-write-threshold.c: sets obviously small constant values
It would be better to make the structure private and add corresponding
interfaces. Still, it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public, but add corresponding
assertions.
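A hedged sketch of the resulting limit and check (the macro and helper
names follow this description):

    #define BDRV_MAX_ALIGNMENT (1L << 30)
    #define BDRV_MAX_LENGTH (QEMU_ALIGN_DOWN(INT64_MAX, BDRV_MAX_ALIGNMENT))

    int bdrv_check_request(int64_t offset, int64_t bytes)
    {
        if (offset < 0 || bytes < 0) {
            return -EIO;
        }
        /* Neither bytes nor offset + bytes may exceed the maximum. */
        if (bytes > BDRV_MAX_LENGTH || offset > BDRV_MAX_LENGTH - bytes) {
            return -EIO;
        }
        return 0;
    }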
After this patch we'll convert the functions in block/io.c to int64_t
bytes and offset parameters. We can assume that an offset/bytes pair
always satisfies the new restrictions, and add corresponding
assertions where needed. If we reach some offset/bytes point in
block/io.c that is missing bdrv_check_request(), it is considered a
bug. Likewise, if block/io.c modifies an offset/bytes request,
expanding it by more than aligning up to request_alignment, it's a bug
too.
For all I/O requests except discard we keep, for now, the old
restriction of a 32-bit request length.
The iotest 206 output error message changed, as the test disk size is
now larger than the new limit. Add one more test case with the new
maximum disk size to cover the too-big-L1 case.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block-export-add type=fuse allows mounting block graph nodes via FUSE
on some existing regular file. That file then appears like a raw disk
image, and accesses to it result in accesses to the exported BDS.
Right now, we only implement the block export functions necessary to
set it up and shut it down. We do not implement any access functions,
so accessing the mount point only results in errors. This will be
addressed by a follow-up patch.
We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice. This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle. Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function is really an internal helper for bdrv_close(). Update its
doc comment to make this clear and make the function private.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160387245480.131299.13430357162209598411.stgit@bahia>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Rename Submission Queue flags with 'Sq' to differentiate submission
queue flags from completion queue flags, and introduce Completion
Queue flag definitions.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-13-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20201102' into staging
nvme pull 2 Nov 2020
# gpg: Signature made Mon 02 Nov 2020 15:20:30 GMT
# gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0
* remotes/nvme/tags/pull-nvme-20201102: (30 commits)
hw/block/nvme: fix queue identifier validation
hw/block/nvme: fix create IO SQ/CQ status codes
hw/block/nvme: fix prp mapping status codes
hw/block/nvme: report actual LBA data shift in LBAF
hw/block/nvme: add trace event for requests with non-zero status code
hw/block/nvme: add nsid to get/setfeat trace events
hw/block/nvme: reject io commands if only admin command set selected
hw/block/nvme: support for admin-only command set
hw/block/nvme: validate command set selected
hw/block/nvme: support per-namespace smart log
hw/block/nvme: fix log page offset check
hw/block/nvme: remove pointless rw indirection
hw/block/nvme: update nsid when registered
hw/block/nvme: change controller pci id
pci: allocate pci id for nvme
hw/block/nvme: support multiple namespaces
hw/block/nvme: refactor identify active namespace id list
hw/block/nvme: add support for sgl bit bucket descriptor
hw/block/nvme: add support for scatter gather lists
hw/block/nvme: harden cmb access
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain. This is
useful information worth exposing over NBD. There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point the
qemu:dirty-bitmap:NAME metadata context can expose that information
via the creation of a temporary bitmap, but we can shorten the effort
by adding a new qemu:allocation-depth metadata context that does the
same thing without an intermediate bitmap (this patch does not
eliminate the need for that proposal, as it will have other uses as
well).
While documenting things, remember that although the NBD protocol has
NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
'metadata context', which is a more apt description of what is
actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
metadata by passing one or more context names. So I also touched up
some existing wording to prefer the term 'metadata context' where it
makes sense.
Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-10-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
If a BDS gets deleted during blk_drain_all(), it might miss a call to
bdrv_do_drained_end(). This means missing a call to
aio_enable_external(), and the AIO context remains disabled forever.
This can cause a device to become unresponsive and to disrupt the
guest execution, i.e. hang, loop forever, or worse.
This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.
It is somewhat reproducible on x86, but it requires generating
artificial dataplane start/stop activity with stop/cont QMP commands.
In this case, SeaBIOS ends up looping forever, waiting for the
virtio-scsi device to send a response to a command it never received.
Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.
BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since commit c8bb23cbdb, when a write request results in a new
allocation, QEMU first tries to see if the rest of the cluster outside
the written area contains only zeroes.
In that case, instead of doing a normal copy-on-write operation and
writing explicit zero buffers to disk, the code zeroes the whole
cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.
This improves performance very significantly, but it only happens when
we are writing to an area that was completely unallocated before. Zero
clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
are therefore slower to allocate.
This happens because the code uses bdrv_is_allocated_above() rather
than bdrv_block_status_above(). The former is not as accurate for this
purpose, but it is faster. However, in the case of qcow2 the
underlying call already reports zero clusters just fine, so there is
no reason why we cannot use that information.
After testing 4KB writes on an image that only contains zero clusters
this patch results in almost five times more IOPS.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Address 0 is not an invalid address. Remove those invalid checks.
Unaligned PRP2 and PRP list entries should result in the Invalid PRP
Offset status code, not Invalid Field. Fix that.
See NVM Express v1.3d, Section 4.3 ("Physical Region Page Entry and
List").
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
If the host sets CC.CSS to 111b, all commands submitted to I/O queues
should be completed with status Invalid Command Opcode.
Note that this is technically a v1.4 feature, but it does not hurt to
implement it before we finally bump the reported implemented version.
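A hedged sketch of the check on the I/O command path (the constant
names are assumptions):

    if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
        return NVME_INVALID_OPCODE | NVME_DNR;
    }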
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Fail to start the controller if the user requests a command set that the
controller does not support.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Let the user specify a namespace if they want to get access stats for
that specific namespace.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
For now, support the Data Block, Segment and Last Segment descriptor
types.
See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
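The descriptor type codes from the spec, as they might be encoded (a
hedged sketch):

    typedef enum NvmeSglDescriptorType {
        NVME_SGL_DESCR_TYPE_DATA_BLOCK       = 0x0,
        NVME_SGL_DESCR_TYPE_BIT_BUCKET       = 0x1,
        NVME_SGL_DESCR_TYPE_SEGMENT          = 0x2,
        NVME_SGL_DESCR_TYPE_LAST_SEGMENT     = 0x3,
        NVME_SGL_DESCR_TYPE_KEYED_DATA_BLOCK = 0x4,
    } NvmeSglDescriptorType;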
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Inside of coroutine context, we can't directly use aio_context_acquire()
for the AioContext of a block node because we already own the lock of
the current AioContext and we need to avoid double locking to prevent
deadlocks.
This provides helper functions to lock the AioContext of a node only if
it's not the same as the current AioContext.
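A hedged sketch of such a helper (assuming the obvious name):

    void bdrv_co_lock(BlockDriverState *bs)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);

        /* In coroutine context we already hold the lock of the current
         * AioContext; only acquire if the node lives in another one. */
        assert(qemu_in_coroutine());
        if (ctx != qemu_get_current_aio_context()) {
            aio_context_acquire(ctx);
        }
    }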
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a pair of functions to temporarily move the current coroutine to the
AioContext of a given BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201005155855.256490-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add a function that can be used to move the currently running coroutine
to a different AioContext (and therefore potentially a different
thread).
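A hedged usage sketch (new_ctx is hypothetical):

    AioContext *old_ctx = qemu_get_current_aio_context();

    aio_co_reschedule_self(new_ctx);
    /* ... the coroutine now runs in new_ctx, possibly in another
     * thread ... */
    aio_co_reschedule_self(old_ctx); /* move back when done */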
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201005155855.256490-12-kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Saving icount as a parameter of the snapshot allows navigating between
snapshots in the execution replay scenario.
This information can be used to find a specific snapshot for
proceeding with the recorded execution to a specific moment in time.
E.g., the 'reverse step' action (introduced in one of the following
patches) needs to load the nearest snapshot which is prior to the
current moment in time.
This patch also updates the snapshot test which verifies QEMU monitor
output.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
--
v4 changes:
- squashed format update with test output update
v7 changes:
- introduced the spaces between the fields in snapshot info output
- updated the test to match new field widths
Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is the only non-ASCII character in the file, and it isn't really
needed here. Let's use the normal "'" symbol for consistency with the
other 11 occurrences of "'" in the file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Like for read/write in a previous commit, drop the extra indirection
layer and generate bdrv_readv_vmstate() and bdrv_writev_vmstate()
directly.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>
Now that we are not maintaining boilerplate code for coroutine
wrappers, there is no longer any sense in keeping the extra
indirection layer of bdrv_prwv(). Let's drop it and instead generate
pure bdrv_preadv() and bdrv_pwritev().
Currently, bdrv_pwritev() and bdrv_preadv() return the number of bytes
on success; the auto-generated functions will instead return zero,
like their _co_ prototypes. Still, it's simple to make the conversion
safe: the only external user of bdrv_pwritev() is test-bdrv-drain, and
it is comfortable enough with bdrv_co_pwritev() instead. So the
prototypes are moved to the local block/coroutines.h. Next, the only
internal uses are bdrv_pread() and bdrv_pwrite(), which are modified
to return bytes on success.
Of course, it would be great to convert bdrv_pread() and bdrv_pwrite()
to return 0 on success. But this requires an audit (and probably
conversion) of all their users; let's leave that for another day's
refactoring.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>
Use the code generation implemented in the previous commit to generate
coroutine wrappers in block.c and block/io.c.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
We have a very frequent pattern of creating a coroutine from a function
with several arguments:
- create a structure to pack parameters
- create _entry function to call original function taking parameters
from struct
- do different magic to handle completion: set ret to NOT_DONE or
EINPROGRESS or use separate bool field
- fill the struct and create coroutine from _entry function with this
struct as a parameter
- do coroutine enter and BDRV_POLL_WHILE loop
Let's reduce code duplication by generating coroutine wrappers.
This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.
The usage of new code generation is as follows:
1. define the coroutine function somewhere
int coroutine_fn bdrv_co_NAME(...) {...}
2. declare in some header file
int generated_co_wrapper bdrv_NAME(...);
with same list of parameters (generated_co_wrapper is
defined in "include/block/block.h").
3. Make sure the block_gen_c declaration in block/meson.build
mentions the file with your marker function.
For now, no function is marked; that work is done in the following
commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com>
[Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed
typo and grammar issues pointed out by Eric Blake. Removed clang-format
dependency that caused build test issues.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.
Also, bdrv_invalidate_cache() is a void function, returning errors
only through the **errp parameter, which is considered bad practice,
as it forces callers to define and propagate a local_err variable, so
the conversion is good anyway.
This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-2-vsementsov@virtuozzo.com>
There is no real reason any more why nbd_export_new() and
nbd_export_create() should be separate functions. The latter only
performs a few checks before it calls the former.
What makes the current state stand out is that it's the only function in
BlockExportDriver that is not a static function inside nbd/server.c, but
a small wrapper in blockdev-nbd.c that then calls back into nbd/server.c
for the real functionality.
Move all the checks to nbd/server.c and make the resulting function
static to improve readability.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-27-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Every export type will need a BlockBackend, so creating it centrally in
blk_exp_add() instead of the .create driver callback avoids duplication.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-24-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Every block export has a BlockBackend representing the disk that is
exported. It should live in BlockExport therefore.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-23-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The reference owned by the user/monitor that is created when adding the
export and dropped when removing it was tied to the 'exports' list in
nbd/server.c. Every block export will have a user reference, so move it
to the block export level and tie it to the 'block_exports' list in
block/export/export.c instead. This is necessary for introducing a QMP
command for removing exports.
Note that exports are present in block_exports even after the user has
requested shutdown. This is different from NBD's exports where exports
are immediately removed on a shutdown request, even if they are still in
the process of shutting down. To prevent the user from still
interacting with an export that is shutting down (and possibly
removing it a second time), we need to remember whether the user
actually still owns it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-20-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We'll need an id to identify block exports in monitor commands. This
adds one.
Note that this is different from the 'name' option in the NBD server,
which is the externally visible export name. While block export ids need
to be unique in the whole process, export names must be unique only for
the same server. Different export types or (potentially in the future)
multiple NBD servers can have the same export name externally, but still
need different block export ids internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-19-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.
As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of letting the driver allocate and return the BlockExport
object, allocate it already in blk_exp_add() and pass it. This allows us
to initialise the generic part before calling into the driver so that
the driver can just use these values instead of having to parse the
options a second time.
For symmetry, move freeing the BlockExport to blk_exp_unref().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-17-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-15-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Having a refcount makes sense for all types of block exports. It is also
a prerequisite for keeping a list of all exports at the BlockExport
level.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-14-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With this change, NBD exports are now only created through the
BlockExport interface. This allows us finally to move things from the
NBD layer to the BlockExport layer if they make sense for other export
types, too.
blk_exp_add() returns only a weak reference, so the explicit
nbd_export_put() goes away.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-12-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The export close callback is unused by the built-in NBD server. qemu-nbd
uses it only during shutdown to wait for the unrefed export to actually
go away. It can just use nbd_export_close_all() instead and do without
the callback.
This removes the close callback from nbd_export_new() and makes both
callers of it more similar.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-11-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a QMP equivalent of qemu-nbd's --shared option, limiting the
maximum number of clients that can attach at the same time.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-9-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nbd-server-add tries to be convenient and adds two questionable
features that we don't want to share in block-export-add, even for NBD
exports:
1. When requesting a writable export of a read-only device, the export
is silently downgraded to read-only. This should be an error in the
context of block-export-add.
2. When using a BlockBackend name, unplugging the device from the guest
will automatically stop the NBD server, too. This may sometimes be
what you want, but it could also be very surprising. Let's keep
things explicit with block-export-add. If the user wants to stop the
export, they should tell us so.
Move these things into the nbd-server-add QMP command handler so that
they apply only there.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of implementing qemu-nbd --offset in the NBD code, just put a
raw block node with the requested offset on top of the user image and
rely on that doing the job.
Not only does this simplify the nbd_export_new() interface and bring
it closer to the set of options that the nbd-server-add QMP command
offers, but in fact it also eliminates a potential source of bugs in
the NBD code, which previously had to add the offset manually in all
relevant places.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-7-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.
This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.
qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move all block export related types and commands from block-core to the
new QAPI module block-export.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-3-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-2-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc.), and this causes
a compiler error when QEMU code calls these functions in a source file
that also includes <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
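A simple way to verify that no occurrences of the old names remain
(a sketch; expects no output):
$ for identifier in $(</tmp/changed_identifiers); do
git grep -I -l "\<$identifier\>"
done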
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typedef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay); other parts treat it like a cache for
bs->backing->bs->filename (where relative paths are relative to the CWD).
Considering that bs->backing->bs->filename exists anyway, let us make
bs->backing_file mean the former.
Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming everything uses relative filenames).
Before this patch:
$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
After this patch:
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.
With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file. If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.
Note that this changes the QAPI output for a node's backing_file. We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image). This
inconsistent output was effectively useless, so we have to decide one
way or the other. Considering that, at runtime, bs->backing_file usually
contained the path to the image relative to qemu's CWD (or an absolute
path), this patch changes QAPI's backing_file to always report
bs->backing->bs->filename from now on. If you want the image header
information, you have to refer to full-backing-filename instead.
This necessitates a change to iotest 228. The interesting information
it really wanted is what the image header says, and it can still get
that, but it has to use full-backing-filename instead of backing_file.
Because of this patch's changes to bs->backing_file's behavior, we also
need some reference output changes.
Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well. This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.
iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.
273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot. However, the image header
never says "file" anywhere, so it now reports $IMGFMT.
Signed-off-by: Max Reitz <mreitz@redhat.com>
With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().
blkdebug wants to have an additional assertion, so it keeps its own
implementation, except that bdrv_co_block_status_from_file() needs to be
inlined there.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
We want to make it explicit where bs->backing is used, and we have done
so. The old role of backing_bs() is now effectively taken by
bdrv_cow_bs().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Filters cannot compress data themselves, but they still have to
implement .bdrv_co_pwritev_compressed() (or they could not forward
compressed writes). Therefore, checking whether
bs->drv->bdrv_co_pwritev_compressed is non-NULL is not sufficient to
know whether the node can actually handle compressed writes. This
function looks down the filter chain to see whether there is a
non-filter that can actually convert the compressed writes into
compressed data (and thus normal writes).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
can be used without the user entering a password or not. It has not
been used for that purpose for quite some time.
Actually, it is not even fit for that purpose, because to answer that
question, it would have to recursively query all of the given node's
children.
So now we have to decide in which direction we want to fix
bdrv_is_encrypted(): Recursively query all children, or drop it and just
use bs->encrypted to get the current node's status?
Nowadays, its only purpose is to report through bdrv_query_image_info()
whether the given image is encrypted or not. For this purpose, it is
probably more interesting to see whether a given node itself is
encrypted or not (otherwise, a management application cannot discern for
certain which nodes are really encrypted and which just have encrypted
children).
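For example (illustrative output), a management application querying an
encrypted node would now see:
$ qemu-img info secret.qcow2 | grep encrypted
encrypted: yes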
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
There are BDS children that the general block layer code can access,
namely bs->file and bs->backing. Since the introduction of filters and
external data files, their meaning is not quite clear. bs->backing can
be a COW source, or it can be a filtered child; bs->file can be a
filtered child, it can be data and metadata storage, or it can be just
metadata storage.
This overloading really is not helpful. This patch adds functions that
retrieve the correct child for each exact purpose. Later patches in
this series will make use of them. Doing so will allow us to handle
filter nodes in a meaningful way.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
The NVM Express specification generally uses 'zeroes' and not 'zeros',
so let us align with it.
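A rename like this can be done mechanically, e.g. (a sketch; the
identifier shown is hypothetical):
$ git grep -l '\<NVME_CMD_WRITE_ZEROS\>' | \
xargs sed -i 's/\<NVME_CMD_WRITE_ZEROS\>/NVME_CMD_WRITE_ZEROES/g'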
Cc: Fam Zheng <fam@euphon.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Since the device does not have any persistent state storage, no
features are "saveable" and setting the Save (SV) field in any Set
Features command will result in a Feature Identifier Not Saveable status
code.
Similarly, if the Select (SEL) field is set to request saved values, the
device will (as it should) return the default values instead.
Since this also introduces "Supported Capabilities", the nsid field is
now also checked for validity with respect to the feature being get or
set.
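From a guest, attempting to save a feature should therefore fail (a
sketch using nvme-cli; the feature id and value are arbitrary examples):
$ nvme set-feature /dev/nvme0 --feature-id=0x7 --value=0x00070007 --save
and return a Feature Identifier Not Saveable status.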
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-13-its@irrelevant.dk>
The NvmeFeatureVal does not belong with the spec-related data structures
in include/block/nvme.h that is shared between the block-level nvme
driver and the emulated nvme device.
Move it into the nvme device specific header file, as the emulated
device is the only user of the structure. Also, remove the unused
members.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-10-its@irrelevant.dk>
Add support for the Asynchronous Event Request command. Required for
compliance with NVMe revision 1.3d. See NVM Express 1.3d, Section 5.2
("Asynchronous Event Request command").
Mostly imported from Keith's qemu-nvme tree. Modified with a maximum
number of queued events (controllable with the aer_max_queued device
parameter). The spec states that the controller *should* retain
events, so we make a best effort here.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-9-its@irrelevant.dk>
Add support for the Get Log Page command and basic implementations of
the mandatory Error Information, SMART / Health Information and Firmware
Slot Information log pages.
In violation of the specification, the SMART / Health Information log
page does not persist information over the lifetime of the controller
because the device has no place to store such persistent state.
Note that the LPA field in the Identify Controller data structure
intentionally has bit 0 cleared because there is no namespace specific
information in the SMART / Health information log page.
Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d,
Section 5.14 ("Get Log Page command").
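With this in place, a guest can retrieve the mandatory log pages using
nvme-cli (a sketch; the device name is assumed):
$ nvme smart-log /dev/nvme0
$ nvme error-log /dev/nvme0
$ nvme fw-log /dev/nvme0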
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-8-its@irrelevant.dk>
Mark firmware slot 1 as read-only and only support that slot.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-7-its@irrelevant.dk>
It might seem weird to implement this feature for an emulated device,
but it is mandatory to support and the feature is useful for testing
asynchronous event request support, which will be added in a later
patch.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-6-its@irrelevant.dk>