mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Peter Maydell	d689ecad07	Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging hw/block/nvme updates * NVMe subsystem support (`-device nvme-subsys`) (Minwoo Im) * Namespace (De\|At)tachment support (Minwoo Im) * Simple Copy command support (Klaus Jensen) * Flush broadcast support (Gollu Appalanaidu) * QEMUIOVector/QEMUSGList duality refactoring (Klaus Jensen) plus various fixes from Minwoo, Gollu, Dmitry and me. v2: - add `nqn` nvme-subsys device parameter instead of using `id`. (Paolo) # gpg: Signature made Tue 09 Mar 2021 11:44:17 GMT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: (38 commits) hw/block/nvme: support Identify NS Attached Controller List hw/block/nvme: support changed namespace asynchronous event hw/block/nvme: support namespace attachment command hw/block/nvme: refactor nvme_select_ns_iocs hw/block/nvme: support allocated namespace type hw/block/nvme: fix allocated namespace list to 256 hw/block/nvme: fix namespaces array to 1-based hw/block/nvme: support namespace detach hw/block/nvme: refactor nvme_dma hw/block/nvme: remove the req dependency in map functions hw/block/nvme: try to deal with the iov/qsg duality hw/block/nvme: fix strerror printing hw/block/nvme: remove block accounting for write zeroes hw/block/nvme: remove redundant len member in compare context hw/block/nvme: report non-mdts command size limit for dsm hw/block/nvme: add trace event for zone read check hw/block/nvme: fix potential compilation error hw/block/nvme: add identify trace event hw/block/nvme: remove unnecessary endian conversion hw/block/nvme: align zoned.zasl with mdts ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-10 20:11:33 +00:00
Minwoo Im	23fb7dfeca	hw/block/nvme: support Identify NS Attached Controller List Support the Identify command for the Namespace Attached Controller List. The command handler traverses the controller instances in the given subsystem to determine whether the specified nsid is attached to each controller. The 4096-byte Identify data is returned with the first entry (16 bits) indicating the number of controller ID entries, so the data can hold up to 2047 controller IDs. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: rebased for dma refactor] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:58 +01:00
Minwoo Im	f432fdfa12	hw/block/nvme: support changed namespace asynchronous event If the namespace inventory changes (e.g., through namespace attachment or detachment), the controller can send an event notification to the host so that it can manage its namespaces. This patch sends the AEN to the host after namespaces are attached to or detached from controllers. To allow the event to be cleared, this patch also implements the Get Log Page command for the Changed Namespace List log type. To return the namespace ID list through that command, the ID is added to the per-controller list (changed_ns_list) whenever the namespace inventory is updated. To indicate support for this async event, this patch sets OAES (Optional Asynchronous Events Supported) in the Identify Controller data structure. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:58 +01:00
Minwoo Im	645ce1a70c	hw/block/nvme: support namespace attachment command This patch supports Namespace Attachment command for the pre-defined nvme-ns device nodes. Of course, attach/detach namespace should only be supported in case 'subsys' is given. This is because if we detach a namespace from a controller, somebody needs to manage the detached, but allocated namespace in the NVMe subsystem. As command effect for the namespace attachment command is registered, the host will be notified that namespace inventory is changed so that host will rescan the namespace inventory after this command. For example, kernel driver manages this command effect via passthru IOCTL. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: rebased for dma refactor] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:58 +01:00
Gollu Appalanaidu	67ce28a1fd	hw/block/nvme: report non-mdts command size limit for dsm Dataset Management is not subject to MDTS, but exceeding a certain size per range causes internal looping. Report this limit (DMRSL) in the NVM command set specific identify controller data structure. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-03-09 11:00:57 +01:00
Gollu Appalanaidu	c94973288c	hw/block/nvme: add broadcast nsid support flush command Add support for using the broadcast nsid to issue a flush on all namespaces through a single command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:57 +01:00
Klaus Jensen	3862efff31	nvme: updated shared header for copy command Add new data structures and types for the Simple Copy command. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-03-09 11:00:57 +01:00
Minwoo Im	adc36b8d21	hw/block/nvme: add NMIC enum value for Identify Namespace Added the Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC) field to support namespaces shared between controllers. This field is at byte offset 30 of the Identify Namespace data structure. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:57 +01:00
Minwoo Im	66b7e9bed0	hw/block/nvme: add CMIC enum value for Identify Controller Added the Controller Multi-path I/O and Namespace Sharing Capabilities (CMIC) field to support multiple controllers in the following patches. This field is at byte offset 76 of the Identify Controller data structure. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-03-09 11:00:57 +01:00
Vladimir Sementsov-Ogievskiy	35f428ba39	qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public Rename bytes_covered_by_bitmap_cluster() to bdrv_dirty_bitmap_serialization_coverage() and make it public. It is needed as we are going to share it with bitmap loading in parallels format. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20210224104707.88430-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Maxim Levitsky	a890f08e58	block: add bdrv_co_delete_file_noerr This function wraps bdrv_co_delete_file for the common case of removing a file, which was just created by the format driver, on an error condition. It hides the -ENOTSUPP error and reports all other errors. Use it in the luks driver. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Peter Maydell	392b9a74b9	Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' into staging bitmaps patches for 2021-02-12 - add 'transform' member to manipulate bitmaps across migration - work towards better error handling during bdrv_open # gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2021-02-12: block: use return status of bdrv_append() block: return status from bdrv_append and friends qemu-iotests: 300: Add test case for modifying persistence of bitmap migration: dirty-bitmap: Allow control of bitmap persistence migration: dirty-bitmap: Use struct for alias map inner members Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-13 21:26:00 +00:00
Vladimir Sementsov-Ogievskiy	a1e708fcda	block: return status from bdrv_append and friends The recommended use of the qemu error api assumes returning a status together with setting errp, and avoiding void functions with an errp parameter. Let's improve bdrv_append and some friends to reduce error-propagation overhead in further patches. Choose an int return status, because bdrv_replace_node_common() calls bdrv_check_update_perm(), which reports an int status that seems correct to propagate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210202124956.63146-2-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 15:36:41 -06:00
Vladimir Sementsov-Ogievskiy	bd54669a4a	block: add new BlockDriver handler: bdrv_cancel_in_flight It will be used to stop retrying NBD requests on mirror cancel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Peter Maydell	1214d55d1c	Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging Emulated NVMe device updates * deallocate or unwritten logical block error feature (me) * dataset management command (me) * compare command (Gollu Appalanaidu) * namespace types (Niklas Cassel) * zoned namespaces (Dmitry Fomichev) * smart critical warning toggle (Zhenwei Pi) * allow cmb and pmr to coexist (me) * pmr rds/wds support (Naveen Nagar) * cmb v1.4 logic (Padmakar Kalghatgi) And a lot of smaller fixes from Gollu Appalanaidu and Minwoo Im. # gpg: Signature made Tue 09 Feb 2021 07:25:18 GMT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: (56 commits) hw/block/nvme: refactor the logic for zone write checks hw/block/nvme: fix zone boundary check for append hw/block/nvme: fix wrong parameter name 'cross_read' hw/block/nvme: align with existing style hw/block/nvme: fix set feature save field check hw/block/nvme: fix set feature for error recovery hw/block/nvme: error if drive less than a zone size hw/block/nvme: lift cmb restrictions hw/block/nvme: bump to v1.4 hw/block/nvme: move cmb logic to v1.4 hw/block/nvme: add PMR RDS/WDS support hw/block/nvme: disable PMR at boot up hw/block/nvme: remove redundant zeroing of PMR registers hw/block/nvme: rename PMR/CMB shift/mask fields hw/block/nvme: allow cmb and pmr to coexist hw/block/nvme: move msix table and pba to BAR 0 hw/block/nvme: indicate CMB support through controller capabilities register hw/block/nvme: fix 64 bit register hi/lo split writes hw/block/nvme: add size to mmio read/write trace events hw/block/nvme: trigger async event during injecting smart warning ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-09 13:24:37 +00:00
Klaus Jensen	c2a3640de8	hw/block/nvme: bump to v1.4 With the new CMB logic in place, bump the implemented specification version to v1.4 by default. This requires setting the CNTRLTYPE field and modifying the VWC field since 0x00 is no longer a valid value for bits 2:1. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Padmakar Kalghatgi	f4319477b4	hw/block/nvme: move cmb logic to v1.4 Implement v1.4 logic for configuring the Controller Memory Buffer. By default, the v1.4 scheme will be used (CMB must be explicitly enabled by the host), so drivers that only support v1.3 will not be able to use the CMB anymore. To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device parameter. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Klaus Jensen	8e9e8b4821	hw/block/nvme: rename PMR/CMB shift/mask fields Use the correct field names. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Andrzej Jakowski	c705063129	hw/block/nvme: indicate CMB support through controller capabilities register This patch sets the CMBS bit in the controller capabilities register when the user configures the NVMe driver with CMB support, so capabilities are correctly reported to the guest OS. Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
zhenwei pi	c62720f137	hw/block/nvme: trigger async event during injecting smart warning When a smart critical warning is injected by setting the property through a QMP command, also try to trigger an asynchronous event. As suggested by Keith, if an event has already been raised, there is no need to enqueue a duplicate. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> [k.jensen: fix typo in commit message] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
zhenwei pi	4714791b66	hw/block/nvme: add smart_critical_warning property A physical NVMe disk very rarely hits a hardware critical warning, which makes it hard to write and test a monitoring agent service. For debugging purposes, add a new 'smart_critical_warning' property to emulate this situation. The original version of this change added a fixed property that could be initialized from the QEMU command line. As suggested by Philippe & Klaus, it was reworked into the current version. Test with this patch: 1, change smart_critical_warning property for a running VM: #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set", "arguments": { "path": "/machine/peripheral-anon/device[0]", "property": "smart_critical_warning", "value":16 } }' 2, run smartctl in guest #smartctl -H -l error /dev/nvme0n1 === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: FAILED! - volatile memory backup device has failed Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
zhenwei pi	c6d1b5c13b	nvme: introduce bit 5 for critical warning According to NVM Express v1.4, Section 5.14.1.2 ("SMART / Health Information"), introduce bit 5 for "Persistent Memory Region has become read-only or unreliable". Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> [k.jensen: minor brush ups in commit message] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	b05fde2881	hw/block/nvme: enum style fix Align with existing style and use a typedef for header-file enums. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Dmitry Fomichev	e9ba46eeaf	nvme: Make ZNS-related definitions Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053). Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:34 +01:00
Niklas Cassel	922e6f4ebd	hw/block/nvme: Support allocated CNS command variants Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated; that is, a namespace is included regardless of whether it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. The reason for not hooking up this command completely is that the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:34 +01:00
Niklas Cassel	141354d55b	hw/block/nvme: Add support for Namespace Types Define the structures and constants required to implement Namespace Types support. Namespace Types introduce a new command set, "I/O Command Sets", that allows the host to retrieve the command sets associated with a namespace. Introduce support for the command set and enable detection for the NVM Command Set. The new workflows for identify commands rely heavily on zero-filled identify structs. E.g., certain CNS commands are defined to return a zero-filled identify struct when an inactive namespace NSID is supplied. Add a helper function in order to avoid code duplication when reporting zero-filled identify structures. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:34 +01:00
Dmitry Fomichev	62e8faa468	hw/block/nvme: Add Commands Supported and Effects log This log page becomes necessary to implement to allow checking for Zone Append command support in Zoned Namespace Command Set. This commit adds the code to report this log page for NVM Command Set only. The parts that are specific to zoned operation will be added later in the series. All incoming admin and i/o commands are now only processed if their corresponding support bits are set in this log. This provides an easy way to control what commands to support and what not to depending on set CC.CSS. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:32 +01:00
Klaus Jensen	6fd704a59a	nvme: add namespace I/O optimization fields to shared header This adds the NPWG, NPWA, NPDG, NPDA and NOWS family of fields to the shared nvme.h header for use by later patches. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <fam@euphon.net> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>	2021-02-08 18:55:48 +01:00
Klaus Jensen	54064e51d1	hw/block/nvme: add dulbe support Add support for reporting the Deallocated or Unwritten Logical Block Error (DULBE). Rely on the block status flags reported by the block layer and consider any block with the BDRV_BLOCK_ZERO flag to be deallocated. Multiple factors affect when a Write Zeroes command results in deallocation of blocks. * the underlying file system block size * the blockdev format * the 'discard' and 'logical_block_size' parameters format \| discard \| wz (512B) wz (4KiB) wz (64KiB) ----------------------------------------------------- qcow2 ignore n n y qcow2 unmap n n y raw ignore n y y raw unmap n y y So, this works best with an image in raw format and 4KiB LBAs, since holes can then be punched on a per-block basis (this assumes a file system with a 4KiB block size, YMMV). A qcow2 image uses a cluster size of 64KiB by default and blocks will only be marked deallocated if a full cluster is zeroed or discarded. However, this is consistent with the spec since Write Zeroes "should" deallocate the block if the Deallocate attribute is set and "may" deallocate if the Deallocate attribute is not set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is always set). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-02-08 18:55:48 +01:00
Daniel P. Berrangé	3d3e9b1f66	block: rename and alter bdrv_all_find_snapshot semantics Currently bdrv_all_find_snapshot() will return 0 if it finds a snapshot, -1 if an error occurs, or if it fails to find a snapshot. New callers to be added want to distinguish between the error scenario and failing to find a snapshot. Rename it to bdrv_all_has_snapshot and make it return -1 on error, 0 if no snapshot is found and 1 if snapshot is found. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-7-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	c22d644ca7	block: allow specifying name of block device for vmstate storage Currently the vmstate will be stored in the first block device that supports snapshots. Historically this would have usually been the root device, but with UEFI it might be the variable store. There needs to be a way to override the choice of block device to store the state in. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-6-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	cf3a74c94f	block: add ability to specify list of blockdevs during snapshot When running snapshot operations, there are various rules for which blockdevs are included/excluded. While this provides reasonable default behaviour, there are scenarios that are not well handled by the default logic. Some of the conditions do not have a single correct answer. Thus there needs to be a way for the mgmt app to provide an explicit list of blockdevs to perform snapshots across. This can be achieved by passing a list of node names that should be used. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-5-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	e26f98e209	block: push error reporting into bdrv_all_*_snapshot functions The bdrv_all_*_snapshot functions return a BlockDriverState pointer for the invalid backend, which the callers then use to report an error message. In some cases multiple callers are reporting the same error message, but with slightly different text. In the future there will be more error scenarios for some of these methods, which will benefit from fine grained error message reporting. So it is helpful to push error reporting down a level. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Initialize variables] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Roman Kagan	5082fc82a6	nbd: make nbd_read* return -EIO on error NBD reconnect logic considers the error code from the functions that read NBD messages to tell if reconnect should be attempted or not: it is attempted on -EIO, otherwise the client transitions to NBD_CLIENT_QUIT state (see nbd_channel_error). This error code is propagated from the primitives like nbd_read. The problem, however, is that nbd_read itself turns every error into -1 rather than -EIO. As a result, if the NBD server happens to die while sending the message, the client in QEMU receives less data than it expects, considers it as a fatal error, and wouldn't attempt reestablishing the connection. Fix it by turning every negative return from qio_channel_read_all into -EIO returned from nbd_read. Apparently that was the original behavior, but got broken later. Also adjust nbd_readXX to follow. Fixes: `e6798f06a6` ("nbd: generalize usage of nbd_read") Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210129073859.683063-4-rvkagan@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	a5215b8fdf	block/io: use int64_t bytes in copy_range We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert now copy_range parameters which are already 64bit to signed type. It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH (which is less than INT64_MAX), and do check the requests in bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls bdrv_check_request()). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	e9e52efdc5	block/io: support int64_t bytes in read/write wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been updated, update all their wrappers. For all of them type of 'bytes' is widening, so callers are safe. We have to update request_fn in blkverify.c simultaneously. Still it's just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is widening for callers of the request_fn anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	37e9403ea8	block/io: support int64_t bytes in bdrv_co_p{read,write}v_part() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their remaining dependencies now. bdrv_pad_request() is updated simultaneously, as pointer to bytes passed to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are updated to pass 64bit bytes. bdrv_pad_request() is already good for 64bit requests, add corresponding assertion. Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). Type is widening, so callers are safe. Let's look inside the functions. In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes to other already int64_t interfaces (and some obviously safe calculations), it's OK. In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still it's passed to bdrv_aligned_pwritev which supports int64_t bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Eric Blake	8024726459	block: use int64_t as bytes type in tracked requests We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). All requests in block/io must not overflow BDRV_MAX_LENGTH, all external users of BdrvTrackedRequest already have corresponding assertions, so we are safe. Add some assertions still. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy	801625e69d	block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes The function is called from 64bit io handlers, and bytes is just passed to throttle_account() which is 64bit too (unsigned though). So, let's convert intermediate argument to 64bit too. This patch is a first in the 64-bit-blocklayer series, so we are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Patch-correctness audit by Eric Blake: Caller has 32-bit, this patch now causes widening which is safe: block/block-backend.c: blk_do_preadv() passes 'unsigned int' block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int' block/throttle.c: throttle_co_pwrite_zeroes() passes 'int' block/throttle.c: throttle_co_pdiscard() passes 'int' Caller has 64-bit, this patch fixes potential bug where pre-patch could narrow, except it's easy enough to trace that callers are still capped at 2G actions: block/throttle.c: throttle_co_preadv() passes 'uint64_t' block/throttle.c: throttle_co_pwritev() passes 'uint64_t' Implementation in question: block/throttle-groups.c throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and uses it: argument to util/throttle.c throttle_account(uint64_t) All safe: it patches a latent bug, and does not introduce any 64-bit gotchas once throttle_co_p{read,write}v are relaxed, and assuming throttle_account() is not buggy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
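The audit in the commit above separates callers whose byte count is widened (safe) from callers whose 64-bit count could be narrowed by a 32-bit parameter. The following is a minimal sketch of that difference, not QEMU code: `account_narrow` and `account_wide` are hypothetical names, and `uint32_t` stands in for `unsigned int` on the assumption that it is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical pre-patch shape: the intermediate function takes a 32-bit
 * byte count, so a 64-bit argument is silently reduced modulo 2^32 before
 * it reaches the 64-bit accounting value returned here.
 */
static uint64_t account_narrow(uint32_t bytes)
{
    return bytes;
}

/*
 * Hypothetical post-patch shape: the byte count is int64_t end to end.
 * A negative value would indicate an error, so it is rejected here.
 */
static uint64_t account_wide(int64_t bytes)
{
    assert(bytes >= 0);
    return (uint64_t)bytes;
}
```

With a 5 GiB request, `account_narrow` would see only 1 GiB (5 * 2^30 modulo 2^32), while `account_wide` sees the full count. Passing a 32-bit value to `account_wide` only widens it, which is why the audit calls those callers safe.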
Vladimir Sementsov-Ogievskiy	69b55e03f7	block: refactor bdrv_check_request: add errp It's better to pass &error_abort than just assert that result is 0: on crash, we'll immediately see the reason in the backtrace. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix iotest 206 fallout] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	143a6384f5	block/block-copy: drop unused argument of block_copy() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	5b49c2bdc1	block/block-copy: drop unused block_copy_set_progress_callback() Drop unused code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	e0323a045f	blockjob: add set_speed to BlockJobDriver We are going to use async block-copy call in backup, so we'll need to passthrough setting backup speed to block-copy call. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	a6d23d56df	block/block-copy: add block_copy_cancel Add function to cancel running async block-copy call. It will be used in backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7e032df0ea	block/block-copy: add ratelimit to block-copy We are going to directly use one async block-copy operation for backup job, so we need rate limiter. We want to maintain current backup behavior: only background copying is limited and copy-before-write operations only participate in limit calculation. Therefore we need one rate limiter for block-copy state and boolean flag for block-copy call state for actual limitation. Note, that we can't just calculate each chunk in limiter after successful copying: it will not save us from starting a lot of async sub-requests which will exceed limit too much. Instead let's use the following scheme on sub-request creation: 1. If at the moment limit is not exceeded, create the request and account it immediately. 2. If at the moment limit is already exceeded, don't create the sub-request and handle limit instead (by sleep). With this approach we'll never exceed the limit more than by one sub-request (which pretty much matches current backup behavior). Note also, that if there is in-flight block-copy async call, block_copy_kick() should be used after set-speed to apply new setup faster. For that, block_copy_kick() is published in this patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	26be9d62dd	block/block-copy: add max_chunk and max_workers parameters They will be used for backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	de4641b46b	block/block-copy: implement block_copy_async We'll need async block-copy invocation to use in backup directly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	86c6a3b690	qapi: backup: add perf.use-copy-range parameter Experiments show that copy_range is not always making things faster. So, to make experimentation simpler, let's add a parameter. Some more perf parameters will be added soon, so here is a new struct. For now, add new backup qmp parameter with x- prefix for the following reasons: - We are going to add more performance parameters, some will be related to the whole block-copy process, some only to background copying in backup (ignored for copy-before-write operations). - On the other hand, we are going to use block-copy interface in other block jobs, which will need performance options as well. And it should be the same structure or at least somehow related. So, there are too many unclear things about the interface, and for now we need the new options mostly for testing. Let's keep them experimental for a while. In do_backup_common() new x-perf parameter is handled in a way to make further options addition simpler. We add use-copy-range with default=true, and we'll change the default in further patch, after moving backup to use block-copy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com> [mreitz: s/5\.2/6.0/] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7f4a396d76	qapi: block-stream: add "bottom" argument The code already doesn't freeze the base node and we try to make it prepared for the situation when the base node is changed during the operation. In other words, block-stream doesn't own the base node. Let's introduce a new interface which should replace the current one, and which will be in better relations with the code. Specifying bottom node instead of base, and requiring it to be non-filter gives us the following benefits: - drop difference between above_base and base_overlay, which will be renamed to just bottom, when old interface is dropped - clean way to work with parallel streams/commits on the same backing chain, which otherwise become a problem when we introduce a filter for stream job - cleaner interface. Nobody will be surprised by the fact that base node may disappear during block-stream, when there is no word about "base" in the interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	e275458b29	copy-on-read: skip non-guest reads if no copy needed If the flag BDRV_REQ_PREFETCH was set, skip idling read/write operations in COR-driver. It can be taken into account for the COR-algorithms optimization. At the moment, that check is made during the block stream job. Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the COR-filter. block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00


1204 Commits