mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Klaus Jensen	fa5db2aa16	hw/nvme: fix missing cq eventidx update Prior to reading the shadow doorbell cq head, we have to update the eventidx. Otherwise, we risk that the driver will skip an mmio doorbell write. This happens on riscv64, as reported by Guenter. Adding the missing update to the cq eventidx fixes the issue. Fixes: `3f7fe8de3d` ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Cc: qemu-riscv@nongnu.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2023-01-09 08:48:46 +01:00
Klaus Jensen	2fda0726e5	hw/nvme: fix missing endian conversions for doorbell buffers The eventidx and doorbell value are not handling endianness correctly. Fix this. Fixes: `3f7fe8de3d` ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-stable@nongnu.org Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2023-01-09 08:48:46 +01:00
Klaus Jensen	47cd3539e1	hw/nvme: rename shadow doorbell related trace events Rename the trace events related to writing the event index and reading the doorbell value to make it more clear that the event is associated with an actual update (write or read respectively). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2023-01-09 08:48:46 +01:00
Klaus Jensen	48b32c28d5	hw/nvme: use QOM accessors Replace various ->parent_obj use with the equivalent QOM accessors. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2023-01-09 08:48:46 +01:00
Markus Armbruster	edf5ca5dbe	include/hw/pci: Split pci_device.h off pci.h PCIDeviceClass and PCIDevice are defined in pci.h. Many users of the header don't actually need them. Similar structs live in their own headers: PCIBusClass and PCIBus in pci_bus.h, PCIBridge in pci_bridge.h, PCIHostBridgeClass and PCIHostState in pci_host.h, PCIExpressHost in pcie_host.h, and PCIERootPortClass, PCIEPort, and PCIESlot in pcie_port.h. Move PCIDeviceClass and PCIDeviceClass to new pci_device.h, along with the code that needs them. Adjust include directives. This also enables the next commit. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221222100330.380143-6-armbru@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-01-08 01:54:22 -05:00
Markus Armbruster	3d558330ad	Drop more useless casts from void * to pointer Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221123133811.1398562-1-armbru@redhat.com>	2022-12-14 16:19:35 +01:00
Klaus Jensen	83f56ac321	hw/nvme: remove copy bh scheduling Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: `796d20681d` ("hw/nvme: reimplement the copy command to allow aio cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-12-01 08:45:03 +01:00
Klaus Jensen	818b9b8f5e	hw/nvme: fix aio cancel in dsm When the DSM operation is cancelled asynchronously, we set iocb->ret to -ECANCELED. However, the callback function only checks the return value of the completed aio, which may have completed succesfully prior to the cancellation and thus the callback ends up continuing the dsm operation instead of bailing out. Fix this. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: `d7d1474fd8` ("hw/nvme: reimplement dsm to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-12-01 08:45:00 +01:00
Klaus Jensen	36a251c346	hw/nvme: fix aio cancel in zone reset If the zone reset operation is cancelled but the block unmap operation completes normally, the callback will continue resetting the next zone since it neglects to check iocb->ret which will have been set to -ECANCELED. Make sure that this is checked and bail out if an error is present. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: `63d96e4ffd` ("hw/nvme: reimplement zone reset to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-12-01 08:44:56 +01:00
Klaus Jensen	3dbc1708ea	hw/nvme: fix aio cancel in flush Make sure that iocb->aiocb is NULL'ed when cancelling. Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: `38f4ac65ac` ("hw/nvme: reimplement flush to allow cancellation") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-12-01 08:44:52 +01:00
Klaus Jensen	433c71e494	hw/nvme: fix aio cancel in format There are several bugs in the async cancel code for the Format command. Firstly, cancelling a format operation neglects to set iocb->ret as well as clearing the iocb->aiocb after cancelling the underlying aiocb which causes the aio callback to ignore the cancellation. Trivial fix. Secondly, and worse, because the request is queued up for posting to the CQ in a bottom half, if the cancellation is due to the submission queue being deleted (which calls blk_aio_cancel), the req structure is deallocated in nvme_del_sq prior to the bottom half being schedulued. Fix this by simply removing the bottom half, there is no reason to defer it anyway. Fixes: `3bcf26d3d6` ("hw/nvme: reimplement format nvm to allow cancellation") Reported-by: Jonathan Derrick <jonathan.derrick@linux.dev> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-12-01 08:44:16 +01:00
Stefan Hajnoczi	f21f1cfeb9	pci,pc,virtio: features, tests, fixes, cleanups lots of acpi rework first version of biosbits infrastructure ASID support in vhost-vdpa core_count2 support in smbios PCIe DOE emulation virtio vq reset HMAT support part of infrastructure for viommu support in vhost-vdpa VTD PASID support fixes, tests all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo= =zTCn -----END PGP SIGNATURE----- Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging pci,pc,virtio: features, tests, fixes, cleanups lots of acpi rework first version of biosbits infrastructure ASID support in vhost-vdpa core_count2 support in smbios PCIe DOE emulation virtio vq reset HMAT support part of infrastructure for viommu support in vhost-vdpa VTD PASID support fixes, tests all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut # uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s # 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X # Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur # 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU # EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo= # =zTCn # -----END PGP SIGNATURE----- # gpg: Signature made Mon 07 Nov 2022 14:27:53 EST # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits) checkpatch: better pattern for inline comments hw/virtio: introduce virtio_device_should_start tests/acpi: update tables for new core count test bios-tables-test: add test for number of cores > 255 tests/acpi: allow changes for core_count2 test bios-tables-test: teach test to use smbios 3.0 tables hw/smbios: add core_count2 to smbios table type 4 vhost-user: Support vhost_dev_start vhost: Change the sequence of device start intel-iommu: PASID support intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function intel-iommu: drop VTDBus intel-iommu: don't warn guest errors when getting rid2pasid entry vfio: move implement of vfio_get_xlat_addr() to memory.c tests: virt: Update expected .acpihmatvirt tables tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators hw/arm/virt: Enable HMAT on arm virt machine tests: Add HMAT AArch64/virt empty table files tests: acpi: q35: update expected blobs .hmat-noinitiators expected HMAT: tests: acpi: q35: add test for hmat nodes without initiators ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-11-07 18:43:56 -05:00
Akihiko Odaki	15377f6e79	msix: Assert that specified vector is in range There were several different ways to deal with the situation where the vector specified for a msix function is out of bound: - early return a function and keep progresssing - propagate the error to the caller - mark msix unusable - assert it is in bound - just ignore An out-of-bound vector should not be specified if the device implementation is correct so let msix functions always assert that the specified vector is in range. An exceptional case is virtio-pci, which allows the guest to configure vectors. For virtio-pci, it is more appropriate to introduce its own checks because it is sometimes too late to check the vector range in msix functions. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20220829083524.143640-1-akihiko.odaki@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com> Signed-off-by: Akihiko Odaki <<a href="mailto:akihiko.odaki@daynix.com" target="_blank">akihiko.odaki@daynix.com</a>><br>	2022-11-07 14:08:17 -05:00
Francis Pravin Antony Michael Raj	632cb6cf07	hw/nvme: Abort copy command when format is one while pif As per the NVMe Command Set specification Section 3.2.2, if i) The namespace is formatted to use 16b Guard Protection Information (i.e., pif = 0) and ii) The Descriptor Format is not cleared to 0h Then the copy command should be aborted with the status code of Invalid Namespace or Format Fixes: `44219b6029` ("hw/nvme: 64-bit pi support") Signed-off-by: Francis Pravin Antony Michael Raj <francis.michael@solidigm.com> Signed-off-by: Jonathan Derrick <jonathan.derrick@solidigm.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-11-02 09:23:05 +01:00
Klaus Jensen	d38cc6fd1c	hw/nvme: reenable cqe batching Commit `2e53b0b450` ("hw/nvme: Use ioeventfd to handle doorbell updates") had the unintended effect of disabling batching of CQEs. This patch changes the sq/cq timers to bottom halfs and instead of calling nvme_post_cqes() immediately (causing an interrupt per cqe), we defer the call. \| iops -----------------+------ baseline \| 138k +cqe batching \| 233k Fixes: `2e53b0b450` ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-11-02 09:23:05 +01:00
Klaus Jensen	e2e137f642	hw/nvme: do not enable ioeventfd by default Do not enable ioeventfd by default. Let the feature mature a bit before we consider enabling it by default. Fixes: `2e53b0b450` ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-08-01 12:01:21 +02:00
Klaus Jensen	04e8da8890	hw/nvme: unregister the event notifier handler on the main loop Make sure the notifier handler is unregistered in the main loop prior to cleaning it up. Fixes: `2e53b0b450` ("hw/nvme: Use ioeventfd to handle doorbell updates") Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-08-01 12:01:21 +02:00
Klaus Jensen	a2da737729	hw/nvme: skip queue processing if notifier is cleared While it is safe to process the queues when they are empty, skip it if the event notifier callback was invoked spuriously. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-08-01 12:01:21 +02:00
Jinhao Fan	2e53b0b450	hw/nvme: Use ioeventfd to handle doorbell updates Add property "ioeventfd" which is enabled by default. When this is enabled, updates on the doorbell registers will cause KVM to signal an event to the QEMU main loop to handle the doorbell updates. Therefore, instead of letting the vcpu thread run both guest VM and IO emulation, we now use the main loop thread to do IO emulation and thus the vcpu thread has more cycles for the guest VM. Since ioeventfd does not tell us the exact value that is written, it is only useful when shadow doorbell buffer is enabled, where we check for the value in the shadow doorbell buffer when we get the doorbell update event. IOPS comparison on Linux 5.19-rc2: (Unit: KIOPS) qd 1 4 16 64 qemu 35 121 176 153 ioeventfd 41 133 258 313 Changes since v3: - Do not deregister ioeventfd when it was not enabled on a SQ/CQ Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Niklas Cassel	dfa82ac201	hw/nvme: force nvme-ns param 'shared' to false if no nvme-subsys node Since commit `916b0f0b52` ("hw/nvme: change nvme-ns 'shared' default") the default value of nvme-ns param 'shared' is set to true, regardless if there is a nvme-subsys node or not. On a system without a nvme-subsys node, a namespace will never be able to be attached to more than one controller, so for this configuration, it is counterintuitive for this parameter to be set by default. Force the nvme-ns param 'shared' to false for configurations where there is no nvme-subsys node, as the namespace will never be able to attach to more than one controller anyway. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Jinhao Fan	387350d5f4	hw/nvme: Add trace events for shadow doorbell buffer When shadow doorbell buffer is enabled, doorbell registers are lazily updated. The actual queue head and tail pointers are stored in Shadow Doorbell buffers. Add trace events for updates on the Shadow Doorbell buffers and EventIdx buffers. Also add trace event for the Doorbell Buffer Config command. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Jinhao Fan	3f7fe8de3d	hw/nvme: Implement shadow doorbell buffer support Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3) and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13 in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config command, the nvme_dbbuf_config function tries to associate each existing SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address. Queues created after the Doorbell Buffer Config command will have the doorbell buffers associated with them when they are initialized. In nvme_process_sq and nvme_post_cqe, proactively check for Shadow Doorbell buffer changes instead of wait for doorbell register changes. This reduces the number of MMIOs. In nvme_process_db(), update the shadow doorbell buffer value with the doorbell register value if it is the admin queue. This is a hack since hosts like Linux NVMe driver and SPDK do not use shadow doorbell buffer for the admin queue. Copying the doorbell register value to the shadow doorbell buffer allows us to support these hosts as well as spec-compliant hosts that use shadow doorbell buffer for the admin queue. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-07-15 10:40:33 +02:00
Dr. David Alan Gilbert	a0984714fb	trivial typos: namesapce 'namespace' is misspelled in a bunch of places. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Message-Id: <20220614104045.85728-3-dgilbert@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2022-06-28 11:06:44 +02:00
Klaus Jensen	98836e8e01	hw/nvme: clear aen mask on reset The internally maintained AEN mask is not cleared on reset. Fix this. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Klaus Jensen	b9147a3aa1	Revert "hw/block/nvme: add support for sgl bit bucket descriptor" This reverts commit `d97eee64fe`. The emulated controller correctly accounts for not including bit buckets in the controller-to-host data transfer, however it doesn't correctly account for the holes for the on-disk data offsets. Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Klaus Jensen	cc9bcee265	hw/nvme: clean up CC register write logic The SRIOV series exposed an issued with how CC register writes are handled and how CSTS is set in response to that. Specifically, after applying the SRIOV series, the controller could end up in a state with CC.EN set to '1' but with CSTS.RDY cleared to '0', causing drivers to expect CSTS.RDY to transition to '1' but timing out. Clean this up. Reviewed-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	b7698b917a	hw/nvme: Update the initalization place for the AER queue This patch updates the initialization place for the AER queue, so it’s initialized once, at controller initialization, and not every time controller is enabled. While the original version works for a non-SR-IOV device, as it’s hard to interact with the controller if it’s not enabled, the multiple reinitialization is not necessarily correct. With the SR/IOV feature enabled a segfault can happen: a VF can have its controller disabled, while a namespace can still be attached to the controller through the parent PF. An event generated in such case ends up on an uninitialized queue. While it’s an interesting question whether a VF should support AER in the first place, I don’t think it must be answered today. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	11871f53ef	hw/nvme: Add support for the Virtualization Management command With the new command one can: - assign flexible resources (queues, interrupts) to primary and secondary controllers, - toggle the online/offline state of given controller. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	746d42b133	hw/nvme: Initialize capability structures for primary/secondary controllers With four new properties: - sriov_v{i,q}_flexible, - sriov_max_v{i,q}_per_vf, one can configure the number of available flexible resources, as well as the limits. The primary and secondary controller capability structures are initialized accordingly. Since the number of available queues (interrupts) now varies between VF/PF, BAR size calculation is also adjusted. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	aa81771337	hw/nvme: Calculate BAR attributes in a function An NVMe device with SR-IOV capability calculates the BAR size differently for PF and VF, so it makes sense to extract the common code to a separate function. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	3bfcc51737	hw/nvme: Remove reg_size variable and update BAR0 size calculation The n->reg_size parameter unnecessarily splits the BAR0 size calculation in two phases; removed to simplify the code. With all the calculations done in one place, it seems the pow2ceil, applied originally to reg_size, is unnecessary. The rounding should happen as the last step, when BAR size includes Nvme registers, queue registers, and MSIX-related space. Finally, the size of the mmio memory region is extended to cover the 1st 4KiB padding (see the map below). Access to this range is handled as interaction with a non-existing queue and generates an error trace, so actually nothing changes, while the reg_size variable is no longer needed. -------------------- \| BAR0 \| -------------------- [Nvme Registers ] [Queues ] [power-of-2 padding] - removed in this patch [4KiB padding (1) ] [MSIX TABLE ] [4KiB padding (2) ] [MSIX PBA ] [power-of-2 padding] Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	decc02614f	hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having them as constants is problematic for SR-IOV support. SR-IOV introduces virtual resources (queues, interrupts) that can be assigned to PF and its dependent VFs. Each device, following a reset, should work with the configured number of queues. A single constant is no longer sufficient to hold the whole state. This patch tries to solve the problem by introducing additional variables in NvmeCtrl’s state. The variables for, e.g., managing queues are therefore organized as: - n->params.max_ioqpairs – no changes, constant set by the user - n->(mutable_state) – (not a part of this patch) user-configurable, specifies number of queues available _after_ reset - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’ n->params.max_ioqpairs; initialized in realize() and updated during reset() to reflect user’s changes to the mutable state Since the number of available i/o queues and interrupts can change in runtime, buffers for sq/cqs and the MSIX-related structures are allocated big enough to handle the limits, to completely avoid the complicated reallocation. A helper function (nvme_update_msixcap_ts) updates the corresponding capability register, to signal configuration changes. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Łukasz Gieryk	1e9c685ec7	hw/nvme: Implement the Function Level Reset This patch implements the Function Level Reset, a feature currently not implemented for the Nvme device, while listed as a mandatory ("shall") in the 1.4 spec. The implementation reuses FLR-related building blocks defined for the pci-bridge module, and follows the same logic: - FLR capability is advertised in the PCIE config, - custom pci_write_config callback detects a write to the trigger register and performs the PCI reset, - which, eventually, calls the custom dc->reset handler. Depending on reset type, parts of the state should (or should not) be cleared. To distinguish the type of reset, an additional parameter is passed to the reset function. This patch also enables advertisement of the Power Management PCI capability. The main reason behind it is to announce the no_soft_reset=1 bit, to signal SR-IOV support where each VF can be reset individually. The implementation purposedly ignores writes to the PMCS.PS register, as even such naïve behavior is enough to correctly handle the D3->D0 transition. It’s worth to note, that the power state transition back to to D3, with all the corresponding side effects, wasn't and stil isn't handled properly. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	99f48ae7ae	hw/nvme: Add support for Secondary Controller List Introduce handling for Secondary Controller List (Identify command with CNS value of 15h). Secondary controller ids are unique in the subsystem, hence they are reserved by it upon initialization of the primary controller to the number of sriov_max_vfs. ID reservation requires the addition of an intermediate controller slot state, so the reserved controller has the address 0xFFFF. A secondary controller is in the reserved state when it has no virtual function assigned, but its primary controller is realized. Secondary controller reservations are released to NULL when its primary controller is unregistered. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	5e6f963f01	hw/nvme: Add support for Primary Controller Capabilities Implementation of Primary Controller Capabilities data structure (Identify command with CNS value of 14h). Currently, the command returns only ID of a primary controller. Handling of remaining fields are added in subsequent patches implementing virtualization enhancements. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	44c2c09488	hw/nvme: Add support for SR-IOV This patch implements initial support for Single Root I/O Virtualization on an NVMe device. Essentially, it allows to define the maximum number of virtual functions supported by the NVMe controller via sriov_max_vfs parameter. Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV capability by a physical controller and ARI capability by both the physical and virtual function devices. NVMe controllers created via virtual functions mirror functionally the physical controller, which may not entirely be the case, thus consideration would be needed on the way to limit the capabilities of the VF. NVMe subsystem is required for the use of SR-IOV. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Dmitry Tikhov	d7fe639cab	hw/nvme: add new command abort case NVMe command set specification for end-to-end data protection formatted namespace states: o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and the namespace is formatted for Type 3 protection, then the controller: ▪ should not compare the protection Information Reference Tag field to the computed reference tag; and ▪ may ignore the ILBRT and EILBRT fields. If a command is aborted as a result of the Reference Tag Check bit of the PRCHK field being set to ‘1’, then that command should be aborted with a status code of Invalid Protection Information, but may be aborted with a status code of Invalid Field in Command. Currently qemu compares reftag in the nvme_dif_prchk function whenever Reference Tag Check bit is set in the command. For type 3 namespaces however, caller of nvme_dif_prchk - nvme_dif_check does not increment reftag for each subsequent logical block. That way commands incorporating more than one logical block for type 3 formatted namespaces with reftag check bit set, always fail with End-to-end Reference Tag Check Error. Comply with spec by handling case of set Reference Tag Check bit in the type 3 formatted namespace. Fixes: `146f720c55` ("hw/block/nvme: end-to-end data protection") Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	fbba243bc7	hw/nvme: bump firmware revision The Linux kernel quirks the QEMU NVMe controller pretty heavily because of the namespace identifier mess. Since this is now fixed, bump the firmware revision number to allow the quirk to be disabled for this revision. As of now, bump the firmware revision number to be equal to the QEMU release version number. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	9f2e1acf83	hw/nvme: do not report null uuid Do not report the "null uuid" (all zeros) in the namespace identification descriptors. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	bd9f371c6f	hw/nvme: do not auto-generate uuid Do not default to generate an UUID for namespaces if it is not explicitly specified. This is a technically a breaking change in behavior. However, since the UUID changes on every VM launch, it is not spec compliant and is of little use since the UUID cannot be used reliably anyway and the behavior prior to this patch must be considered buggy. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	36d83272d5	hw/nvme: do not auto-generate eui64 We cannot provide auto-generated unique or persistent namespace identifiers (EUI64, NGUID, UUID) easily. Since 6.1, namespaces have been assigned a generated EUI64 of the form "52:54:00:<namespace counter>". This is will be unique within a QEMU instance, but not globally. Revert that this is assigned automatically and immediately deprecate the compatibility parameter. Users can opt-in to this with the `eui64-default=on` device parameter or set it explicitly with `eui64=UINT64`. Cc: libvir-list@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	a859eb9f8f	hw/nvme: enforce common serial per subsystem The Identify Controller Serial Number (SN) is the serial number for the NVM subsystem and must be the same across all controller in the NVM subsystem. Enforce this. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Klaus Jensen	9235a72a5d	hw/nvme: fix smart aen Pass the right constant to nvme_smart_event(). The NVME_AER* values hold the bit position in the SMART byte, not the shifted value that we expect it to be in nvme_smart_event(). Fixes: `c62720f137` ("hw/block/nvme: trigger async event during injecting smart warning") Acked-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Dmitry Tikhov	2e8f952ae7	hw/nvme: fix copy cmd for pi enabled namespaces Current implementation have problem in the read part of copy command. Because there is no metadata mangling before nvme_dif_check invocation, reftag error could be thrown for blocks of namespace that have not been previously written to. Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Dmitry Tikhov	51c4532663	hw/nvme: add missing return statement Since there is no return after nvme_dsm_cb invocation, metadata associated with non-zero block range is currently zeroed. Also this behaviour leads to segfault since we schedule iocb->bh two times. First when entering nvme_dsm_cb with iocb->idx == iocb->nr and second because of missing return on call stack unwinding by calling blk_aio_pwrite_zeroes and subsequent nvme_dsm_cb callback. Fixes: `d7d1474fd8` ("hw/nvme: reimplement dsm to allow cancellation") Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Dmitry Tikhov	1e64facc01	hw/nvme: fix narrowing conversion Since nlbas is of type int, it does not work with large namespace size values, e.g., 9 TB size of file backing namespace and 8 byte metadata with 4096 bytes lbasz gives negative nlbas value, which is later promoted to negative int64_t type value and results in negative ns->moff which breaks namespace Signed-off-by: Dmitry Tikhov <ddtikhov@gmail.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-03 21:48:24 +02:00
Markus Armbruster	52581c718c	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]	2022-05-11 16:49:06 +02:00
Markus Armbruster	b21e238037	Use g_new() & friends where that makes obvious sense g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>	2022-03-21 15:44:44 +01:00
Naveen Nagar	44219b6029	hw/nvme: 64-bit pi support This adds support for one possible new protection information format introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard and 48-bit reference tag. This version does not support storage tags. Like the CRC16 support already present, this uses a software implementation of CRC64 (so it is naturally pretty slow). But its good enough for verification purposes. This may go nicely hand-in-hand with the support that Keith submitted for the Linux kernel[1]. [1]: https://lore.kernel.org/linux-nvme/20220126165214.GA1782352@dhcp-10-100-145-180.wdc.com/T/ Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:30:21 +01:00
Klaus Jensen	ac0b34c58d	hw/nvme: add pi tuple size helper A subsequent patch will introduce a new tuple size; so add a helper and use that instead of sizeof() and magic numbers. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:49 +01:00
Naveen Nagar	763c05dfb0	hw/nvme: add support for the lbafee hbs feature Add support for up to 64 LBA formats through the LBAFEE field of the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:49 +01:00
Klaus Jensen	a6de6ed509	hw/nvme: move format parameter parsing There is no need to extract the format command parameters for each namespace. Move it to the entry point. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:49 +01:00
Naveen Nagar	d0c0697b9e	hw/nvme: add host behavior support feature Add support for getting and setting the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:48 +01:00
Klaus Jensen	05f7ae45c8	hw/nvme: move dif/pi prototypes into dif.h Move dif/pi data structures and inlines to dif.h. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-03-03 09:28:48 +01:00
Klaus Jensen	e321b4cdc2	hw/nvme: add support for zoned random write area Add support for TP 4076 ("Zoned Random Write Area"), v2021.08.23 ("Ratified"). This adds three new namespace parameters: "zoned.numzrwa" (number of zrwa resources, i.e. number of zones that can have a zrwa), "zoned.zrwas" (zrwa size in LBAs), "zoned.zrwafg" (granularity in LBAs for flushes). Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Klaus Jensen	25872031e1	hw/nvme: add ozcs enum Add enumeration for OZCS values. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Klaus Jensen	6190d92ff7	hw/nvme: add struct for zone management send Add struct for Zone Management Send in preparation for more zone send flags. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Philippe Mathieu-Daudé	8d3a17be6f	hw/nvme/ctrl: Pass buffers as 'void ' types These buffers can be anything, not an array of chars, so use the 'void ' type for them. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Philippe Mathieu-Daudé	e080ce8676	hw/nvme/ctrl: Have nvme_addr_write() take const buffer The 'buf' argument is not modified, so better pass it as const type. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Klaus Jensen	736b01642d	hw/nvme: fix CVE-2021-3929 This fixes CVE-2021-3929 "locally" by denying DMA to the iomem of the device itself. This still allows DMA to MMIO regions of other devices (e.g. doing P2P DMA to the controller memory buffer of another NVMe device). Fixes: CVE-2021-3929 Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-02-14 08:58:29 +01:00
Philippe Mathieu-Daudé	f02b664aad	hw/dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult Since commit `292e13142d`, dma_buf_rw() returns a MemTxResult type. Do not discard it, return it to the caller. Pass the previously returned value (the QEMUSGList residual size, which was rarely used) as an optional argument. With this new API, SCSIRequest::residual might now be accessed via a pointer. Since the size_t type does not have the same size on 32 and 64-bit host architectures, convert it to a uint64_t, which is big enough to hold the residual size, and the type is constant on both 32/64-bit hosts. Update the few dma_buf_read() / dma_buf_write() callers to the new API. Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20220117125130.131828-1-f4bug@amsat.org>	2022-01-18 12:56:29 +01:00
Philippe Mathieu-Daudé	bfa30f3903	hw/dma: Use dma_addr_t type definition when relevant Update the obvious places where dma_addr_t should be used (instead of uint64_t, hwaddr, size_t, int32_t types). This allows to have &dma_addr_t type portable on 32/64-bit hosts. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220111184309.28637-11-f4bug@amsat.org>	2022-01-18 12:56:29 +01:00
Philippe Mathieu-Daudé	1e5a3f8b2a	dma: Let dma_buf_read() take MemTxAttrs argument Let devices specify transaction attributes when calling dma_buf_read(). Keep the default MEMTXATTRS_UNSPECIFIED in the few callers. Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211223115554.3155328-13-philmd@redhat.com>	2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé	392e48af34	dma: Let dma_buf_write() take MemTxAttrs argument Let devices specify transaction attributes when calling dma_buf_write(). Keep the default MEMTXATTRS_UNSPECIFIED in the few callers. Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211223115554.3155328-12-philmd@redhat.com>	2021-12-31 01:05:27 +01:00
Klaus Jensen	e2c57529c9	hw/nvme: fix buffer overrun in nvme_changed_nslist (CVE-2021-3947) Fix missing offset verification. Cc: qemu-stable@nongnu.org Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com> Fixes: `f432fdfa12` ("support changed namespace asynchronous event") Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-11-19 07:32:19 +01:00
Klaus Jensen	916b0f0b52	hw/nvme: change nvme-ns 'shared' default Change namespaces to be shared namespaces by default (parameter shared=on). Keep shared=off for older machine types. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-11-19 07:31:56 +01:00
Hannes Reinecke	9fc6e86e8b	hw/nvme: reattach subsystem namespaces on hotplug With commit `5ffbaeed16` ("hw/nvme: fix controller hot unplugging") namespaces get moved from the controller to the subsystem if one is specified. That keeps the namespaces alive after a controller hot-unplug, but after a controller hotplug we have to reconnect the namespaces from the subsystem to the controller. Fixes: `5ffbaeed16` ("hw/nvme: fix controller hot unplugging") Cc: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de> [k.jensen: only attach to shared and non-detached namespaces] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-11-19 07:31:34 +01:00
Peter Maydell	d637e1dc6d	qbus: Rename qbus_create_inplace() to qbus_init() Rename qbus_create_inplace() to qbus_init(); this is more in line with our usual naming convention for functions that in-place initialize objects. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20210923121153.23754-5-peter.maydell@linaro.org	2021-09-30 13:42:10 +01:00
Pankaj Raghav	c53a9a9102	hw/nvme: Return error for fused operations Currently, FUSED operations are not supported by QEMU. As per the 1.4 SPEC, controller should abort the command that requested a fused operation with an INVALID FIELD error code if they are not supported. Changes from v1: Added FUSE flag check also to the admin cmd processing as the FUSED operations are mentioned in the general SQE section in the SPEC. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-09-24 08:43:58 +02:00
Naveen Nagar	07a3dfa7c4	hw/nvme: fix verification of select field in namespace attachment Fix is added to check for reserved value in select field for namespace attachment CC: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-09-24 08:43:52 +02:00
Klaus Jensen	fd761337ac	hw/nvme: fix validation of ASQ and ACQ Address 0x0 is a valid address. Fix the admin submission and completion queue address validation to not error out on this. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-09-24 08:31:35 +02:00
Klaus Jensen	5f4884c441	hw/nvme: fix missing variable initializers Coverity found that 'uuid', 'csi' and 'eui64' are uninitialized. While we set most of the fields, we do not explicitly set the rsvd2 field in the NvmeIdNsDescr header. Fix this by explicitly zero-initializing the variables. Reported-by: Coverity (CID 1458835, 1459295 and 1459580) Fixes: `6870cfb814` ("hw/nvme: namespace parameter for EUI-64") Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2021-08-09 12:52:16 +02:00
Klaus Jensen	49e03457f1	hw/nvme: fix mmio read The new PMR test unearthed a long-standing issue with MMIO reads on big-endian hosts. Fix this by unconditionally storing all controller registers in little endian. Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-26 21:09:39 +02:00
Klaus Jensen	5029de44b5	hw/nvme: fix out-of-bounds reads Peter noticed that mmio access may read into the NvmeParams member in the NvmeCtrl struct. Fix the bounds check. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-26 21:09:39 +02:00
Klaus Jensen	a316aa50e6	hw/nvme: use symbolic names for registers Add the NvmeBarRegs enum and use these instead of explicit register offsets. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-07-26 21:09:38 +02:00
Klaus Jensen	5d45edbeac	hw/nvme: split pmrmsc register into upper and lower The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to make up the 64 bit logical PMRMSC register. Make it so. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-07-26 21:09:38 +02:00
Klaus Jensen	5ffbaeed16	hw/nvme: fix controller hot unplugging Prior to this patch the nvme-ns devices are always children of the NvmeBus owned by the NvmeCtrl. This causes the namespaces to be unrealized when the parent device is removed. However, when subsystems are involved, this is not what we want since the namespaces may be attached to other controllers as well. This patch adds an additional NvmeBus on the subsystem device. When nvme-ns devices are realized, if the parent controller device is linked to a subsystem, the parent bus is set to the subsystem one instead. This makes sure that namespaces are kept alive and not unrealized. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-07-26 21:09:38 +02:00
Padmakar Kalghatgi	234214734f	hw/nvme: error handling for too many mappings If the number of PRP/SGL mappings exceed 1024, reads and writes will fail because of an internal QEMU limitation of max 1024 vectors. Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: changed the error message to be more generic] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-07-26 21:09:38 +02:00
Klaus Jensen	b0fde9e861	hw/nvme: unregister controller with subsystem at exit Make sure the controller is unregistered from the subsystem when device is removed. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-07-26 21:09:38 +02:00
Klaus Jensen	cc6fb6bc50	hw/nvme: mark nvme-subsys non-hotpluggable We currently lack the infrastructure to handle subsystem hotplugging, so disable it. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-07-26 21:09:38 +02:00
Klaus Jensen	5e4f6bcc29	hw/nvme: remove NvmeCtrl parameter from ns setup/check functions The nvme_ns_setup and nvme_ns_check_constraints should not depend on the controller state. Refactor and remove it. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-07-26 21:09:38 +02:00
Klaus Jensen	83d7ed5c57	hw/nvme: fix pin-based interrupt behavior (again) Jakub noticed[1] that, when using pin-based interrupts, the device will unconditionally deasssert when any CQEs are acknowledged. However, the pin should not be deasserted if other completion queues still holds unacknowledged CQEs. The bug is an artifact of commit `ca247d3509` ("hw/block/nvme: fix pin-based interrupt behavior") which fixed one bug but introduced another. This is the third time someone tries to fix pin-based interrupts (see commit `5e9aa92eb1` ("hw/block: Fix pin-based interrupt behaviour of NVMe"))... Third time's the charm, so fix it, again, by keeping track of how many CQs have unacknowledged CQEs and only deassert when all are cleared. [1]: <20210610114624.304681-1-jakub.jermar@kernkonzept.com> Cc: qemu-stable@nongnu.org Fixes: `ca247d3509` ("hw/block/nvme: fix pin-based interrupt behavior") Reported-by: Jakub Jermář <jakub.jermar@kernkonzept.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:18:10 +02:00
Klaus Jensen	2b02aabc9d	hw/nvme: fix missing check for PMR capability Qiang Liu reported that an access on an unknown address is triggered in memory_region_set_enabled because a check on CAP.PMRS is missing for the PMRCTL register write when no PMR is configured. Cc: qemu-stable@nongnu.org Fixes: `75c3c9de96` ("hw/block/nvme: disable PMR at boot up") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/362 Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Gollu Appalanaidu	eeef43290d	hw/nvme: documentation fix In the documentation of the '-detached' param "be" and "not" has been used side by side, fix that. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Gollu Appalanaidu	5f4eb94dbb	hw/nvme: fix endianess conversion and add controller list Add the controller identifiers list CNS 0x13, available list of ctrls in NVM Subsystem that may or may not be attached to namespaces. In Identify Ctrl List of the CNS 0x12 and 0x13 no endian conversion for the nsid field. These two CNS values shows affect when there exists a Subsystem. Added condition if there is no Subsystem return invalid field in command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Klaus Jensen	e76fb260ca	Partially revert "hw/block/nvme: drain namespaces on sq deletion" This partially reverts commit `98f84f5a4e`. Since all "multi aio" commands are now reimplemented to properly track the nested aiocbs, we can revert the "hack" that was introduced to make sure all requests we're properly drained upon sq deletion. The revert is partial since we keep the assert that no outstanding requests remain on the submission queue after the explicit cancellation. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	3bcf26d3d6	hw/nvme: reimplement format nvm to allow cancellation Prior to this patch, the aios associated with broadcast format are submitted anonymously (no aiocb reference saved from the blk_aio call). Fix this by formatting the namespaces one after another, saving a reference to the aiocb for each. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	63d96e4ffd	hw/nvme: reimplement zone reset to allow cancellation Prior to this patch, the aios associated with zone reset are submitted anonymously (no reference saved to the aiocb from the blk_aio call). Fix this by resetting the zones one after another, saving a reference to the aiocb for each reset. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	796d20681d	hw/nvme: reimplement the copy command to allow aio cancellation Before this patch the code would issue several aios simultaneously without saving a reference to the aiocb. Without the aiocb reference the individual copies cannot be canceled. Fix this by issuing copies of the ranges one after another. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	f1c97407c5	hw/nvme: add dw0/1 to the req completion trace event Some commands report additional useful information in dw0 and dw1 of the completion queue entry. Add them to the trace. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	2a132309e4	hw/nvme: use prinfo directly in nvme_check_prinfo and nvme_dif_check The nvme_check_prinfo() and nvme_dif_check() functions operate on the 16 bit "control" member of the NvmeCmd. These functions do not otherwise operate on an NvmeCmd or an NvmeRequest, so change them to expect the actual 4 bit PRINFO field and add constants that work on this field as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	189a8bf7f6	hw/nvme: remove assert from nvme_get_zone_by_slba Make nvme_get_zone_by_slba() return NULL if the slba is out of range. This allows the function to be used without guarding the call with a call to nvme_check_bounds(), in preparation for the next patch. Add asserts after calling nvme_get_zone_by_slba() instead. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	0ca5c3ccac	hw/nvme: save reftag when generating pi Prepare nvme_dif_pract_generate_dif() and nvme_dif_check() to be callable in smaller increments by making the reftag a pointer parameter updated by the function. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	d7d1474fd8	hw/nvme: reimplement dsm to allow cancellation Prior to this patch, a loop was used to issue multiple "fire and forget" aios for each range in the command. Without a reference to the aiocb returned from the blk_aio_pdiscard calls, the aios cannot be canceled. Fix this by processing the ranges one after another. As a bonus, this fixes how metadata is cleared (i.e. we only zero it out if the data was succesfully discarded). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	ff0ac2c8b8	hw/nvme: add nvme_block_status_all helper Pull the gist of nvme_check_dulbe() into a helper function. This is in preparation for dsm refactoring. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Klaus Jensen	38f4ac65ac	hw/nvme: reimplement flush to allow cancellation Prior to this patch, a broadcast flush would result in submitting multiple "fire and forget" aios (no reference saved to the aiocbs returned from the blk_aio_flush calls). Fix this by issuing the flushes one after another. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-06-29 07:16:25 +02:00
Heinrich Schuchardt	3276dde4f2	hw/nvme: default for namespace EUI-64 On machines with version > 6.0 replace a missing EUI-64 by a generated value. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Heinrich Schuchardt	6870cfb814	hw/nvme: namespace parameter for EUI-64 The EUI-64 field is the only identifier for NVMe namespaces in UEFI device paths. Add a new namespace property "eui64", that provides the user the option to specify the EUI-64. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Acked-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Gollu Appalanaidu	3553c48fcb	hw/nvme: fix csi field for cns 0x00 and 0x11 As per the TP 4056d Namespace types CNS 0x00 and CNS 0x11 CSI field shouldn't use but it is being used for these two Identify command CNS values, fix that. Remove 'nvme_csi_has_nvm_support()' helper as suggested by Klaus we can safely assume NVM command set support for all namespaces. Suggested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00
Niklas Cassel	cccc2651f4	hw/nvme: add param to control auto zone transitioning to zone state closed In the Zoned Namespace Command Set Specification, chapter 2.5.1 Managing resources "The controller may transition zones in the ZSIO:Implicitly Opened state to the ZSC:Closed state for resource management purposes." The word may in this sentence means that automatically transitioning an implicitly opened zone to closed is completely optional. Add a new parameter so that the user can control if this automatic transitioning should be performed or not. Being able to control this can help with verifying that e.g. a user-space program behaves properly even without this optional ZNS feature. The default value is set to true, in order to not change the existing behavior. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> [k.jensen: moved parameter to controller] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-06-29 07:16:25 +02:00

1 2 3 4

154 Commits