Commit Graph

125 Commits

Author SHA1 Message Date
Marc-André Lureau
8cdb368d19 hw/nvme: fix -Werror=maybe-uninitialized
../hw/nvme/ctrl.c:6081:21: error: ‘result’ may be used uninitialized [-Werror=maybe-uninitialized]

It's not obvious that 'result' is set in all code paths. When &result is
a returned argument, it's even less clear.

Looking at various assignments, 0 seems to be a suitable default value.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Message-ID: <20240328102052.3499331-18-marcandre.lureau@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-04-02 16:15:07 +02:00
Peter Maydell
6fc6931231 virtio,pc,pci: features, cleanups, fixes
more memslots support in libvhost-user
 support PCIe Gen5/Gen6 link speeds in pcie
 more traces in vdpa
 network simulation devices support in vdpa
 SMBIOS type 9 descriptor implementation
 Bump max_cpus to 4096 vcpus in q35
 aw-bits and granule options in VIRTIO-IOMMU
 Support report NUMA nodes for device memory using GI in acpi
 Beginning of shutdown event support in pvpanic
 
 fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/
 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt
 +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ
 uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr
 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G
 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg=
 =C0UU
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pc,pci: features, cleanups, fixes

more memslots support in libvhost-user
support PCIe Gen5/Gen6 link speeds in pcie
more traces in vdpa
network simulation devices support in vdpa
SMBIOS type 9 descriptor implementation
Bump max_cpus to 4096 vcpus in q35
aw-bits and granule options in VIRTIO-IOMMU
Support report NUMA nodes for device memory using GI in acpi
Beginning of shutdown event support in pvpanic

fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/
# 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt
# +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ
# uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr
# 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G
# 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg=
# =C0UU
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 22:03:31 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (68 commits)
  docs/specs/pvpanic: document shutdown event
  hw/cxl: Fix missing reserved data in CXL Device DVSEC
  hmat acpi: Fix out of bounds access due to missing use of indirection
  hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory.
  qemu-options.hx: Document the virtio-iommu-pci aw-bits option
  hw/arm/virt: Set virtio-iommu aw-bits default value to 48
  hw/i386/q35: Set virtio-iommu aw-bits default value to 39
  virtio-iommu: Add an option to define the input range width
  virtio-iommu: Trace domain range limits as unsigned int
  qemu-options.hx: Document the virtio-iommu-pci granule option
  virtio-iommu: Change the default granule to the host page size
  virtio-iommu: Add a granule property
  hw/i386/acpi-build: Add support for SRAT Generic Initiator structures
  hw/acpi: Implement the SRAT GI affinity structure
  qom: new object to associate device to NUMA node
  hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it
  hw/i386/pc: Set "normal" boot device order in pc_basic_device_init()
  hw/i386/pc: Avoid one use of the current_machine global
  hw/i386/pc: Remove "rtc_state" link again
  Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	hw/core/machine.c
2024-03-13 15:11:53 +00:00
Akihiko Odaki
1a909e3dd8 hw/pci: Always call pcie_sriov_pf_reset()
Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do
for msi_reset() and msix_reset() to prevent duplicating code for each
SR-IOV PF.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-5-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12 17:56:55 -04:00
Akihiko Odaki
c8bc4db403 pcie_sriov: Reset SR-IOV extended capability
pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only
disables VFs and does not reset SR-IOV extended capability, leaking the
state and making the VF Enable register inconsistent with the actual
state.

Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which
does not only disable VFs but also resets the capability.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-3-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12 17:56:55 -04:00
Akihiko Odaki
91bb64a8d2 hw/nvme: Use pcie_sriov_num_vfs()
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.

It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.

It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26328
Fixes: 11871f53ef ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-1-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 17:56:55 -04:00
Klaus Jensen
fa905f65c5 hw/nvme: add machine compatibility parameter to enable msix exclusive bar
Commit 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
moved the MSI-X table and PBA to BAR 0 to make room for enabling CMR and
PMR at the same time. As reported by Julien Grall in #2184, this breaks
migration through system hibernation.

Add a machine compatibility parameter and set it on machines pre 6.0 to
enable the old behavior automatically, restoring the hibernation
migration support.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2184
Fixes: 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
Reported-by: Julien Grall julien@xen.org
Tested-by: Julien Grall julien@xen.org
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-03-12 16:05:53 +01:00
Klaus Jensen
ee7bda4d38 hw/nvme: generalize the mbar size helper
Generalize the mbar size helper such that it can handle cases where the
MSI-X table and PBA are expected to be in an exclusive bar.

Cc: qemu-stable@nongnu.org
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-03-12 15:48:56 +01:00
Roque Arcudia Hernandez
bdc31646c5 hw/nvme: Add NVMe NGUID property
This patch adds a way to specify an NGUID for a given NVMe Namespace using a
string of hexadecimal digits with an optional '-' separator to group bytes. For
instance:

-device nvme-ns,nguid="e9accd3b83904e13167cf0593437f57d"

If provided, the NGUID will be part of the Namespace Identification Descriptor
list and the Identify Namespace data.

Signed-off-by: Roque Arcudia Hernandez <roqueh@google.com>
Signed-off-by: Nabih Estefan <nabihestefan@google.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-03-12 15:48:56 +01:00
Klaus Jensen
8c78015a55 hw/nvme: fix invalid check on mcl
The number of logical blocks within a source range is converted into a
1s based number at the time of parsing. However, when verifying the copy
length we add one again, causing the check against MCL to fail in error.

Cc: qemu-stable@nongnu.org
Fixes: 381ab99d85 ("hw/nvme: check maximum copy length (MCL) for COPY")
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-03-12 15:48:56 +01:00
Minwoo Im
4f0a4a3d58 hw/nvme: separate 'serial' property for VFs
Currently, when a VF is created, it uses the 'params' object of the PF
as it is. In other words, the 'params.serial' string memory area is also
shared. In this situation, if the VF is removed from the system, the
PF's 'params.serial' object is released with object_finalize() followed
by object_property_del_all() which release the memory for 'serial'
property. If that happens, the next VF created will inherit a serial
from a corrupted memory area.

If this happens, an error will occur when comparing subsys->serial and
n->params.serial in the nvme_subsys_register_ctrl() function.

Cc: qemu-stable@nongnu.org
Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-03-12 15:48:56 +01:00
Klaus Jensen
d2b5bb860e hw/nvme: fix invalid endian conversion
numcntl is one byte and so is max_vfs. Using cpu_to_le16 on big endian
hosts results in numcntl being set to 0.

Fix by dropping the endian conversion.

Fixes: 99f48ae7ae ("hw/nvme: Add support for Secondary Controller List")
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Message-ID: <20240222-fix-sriov-numcntl-v1-1-d60bea5e72d0@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-27 09:37:30 +01:00
Stefan Hajnoczi
b55e4b9c05 trivial patches for 2023-09-21
-----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmUL/84PHG1qdEB0bHMu
 bXNrLnJ1AAoJEHAbT2saaT5Zlz4H/iI7Rhmsw6E46WhQPz1oly8p5I3m6Tcxs5B3
 nagfaJC0EYjKyMZC1bsATJwRj8robCb5SDhZeUfudt1ytZYFfH3ulvlUrGYrMQRW
 YEfBFIDLexqrLpsykc6ovl2NB5BXQsK3n6NNbnYE1OxQt8Cy4kNQi1bStrZ8JzDE
 lIxvWZdwoQJ2K0VRDGRLrL6XG80qeONSXEoppXxJlfhk1Ar3Ruhijn3REzfQybvV
 1zIa1/h80fSLuwOGSPuOLqVCt6JzTuOOrfYc9F+sjcmIQWHLECy6CwTHEbb921Tw
 9HD6ah4rvkxoN2NWSPo/kM6tNW/pyOiYwYldx5rfWcQ5mhScuO8=
 =u6P0
 -----END PGP SIGNATURE-----

Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging

trivial patches for 2023-09-21

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmUL/84PHG1qdEB0bHMu
# bXNrLnJ1AAoJEHAbT2saaT5Zlz4H/iI7Rhmsw6E46WhQPz1oly8p5I3m6Tcxs5B3
# nagfaJC0EYjKyMZC1bsATJwRj8robCb5SDhZeUfudt1ytZYFfH3ulvlUrGYrMQRW
# YEfBFIDLexqrLpsykc6ovl2NB5BXQsK3n6NNbnYE1OxQt8Cy4kNQi1bStrZ8JzDE
# lIxvWZdwoQJ2K0VRDGRLrL6XG80qeONSXEoppXxJlfhk1Ar3Ruhijn3REzfQybvV
# 1zIa1/h80fSLuwOGSPuOLqVCt6JzTuOOrfYc9F+sjcmIQWHLECy6CwTHEbb921Tw
# 9HD6ah4rvkxoN2NWSPo/kM6tNW/pyOiYwYldx5rfWcQ5mhScuO8=
# =u6P0
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 21 Sep 2023 04:33:18 EDT
# gpg:                using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
# gpg:                issuer "mjt@tls.msk.ru"
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full]
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>" [full]
# gpg:                 aka "Michael Tokarev <mjt@debian.org>" [full]
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu:
  docs/devel/reset.rst: Correct function names
  docs/cxl: Cleanout some more aarch64 examples.
  hw/mem/cxl_type3: Add missing copyright and license notice
  hw/cxl: Fix out of bound array access
  docs/cxl: Change to lowercase as others
  hw/cxl/cxl_device: Replace magic number in CXLError definition
  hw/pci-bridge/cxl_upstream: Fix bandwidth entry base unit for SSLBIS
  hw/cxl: Fix CFMW config memory leak
  hw/i386/pc: fix code comment on cumulative flash size
  subprojects: Use the correct .git suffix in the repository URLs
  hw/other: spelling fixes
  hw/tpm: spelling fixes
  hw/pci: spelling fixes
  hw/net: spelling fixes
  i386: spelling fixes
  bsd-user: spelling fixes
  ppc: spelling fixes

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-21 09:32:47 -04:00
Michael Tokarev
9b4b4e510b hw/other: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-21 11:31:16 +03:00
Stefan Hajnoczi
652b0dd808 block: remove AIOCBInfo->get_aio_context()
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20 17:46:01 +02:00
Peter Maydell
b3c8246750 hw/nvme: Avoid dynamic stack allocation
Instead of using a variable-length array in nvme_map_prp(),
allocate on the stack with a g_autofree pointer.

The codebase has very few VLAs, and if we can get rid of them all we
can make the compiler error on new additions.  This is a defensive
measure against security bugs where an on-stack dynamic allocation
isn't correctly size-checked (e.g.  CVE-2021-3527).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-09-12 16:17:05 +02:00
Philippe Mathieu-Daudé
b02c2a85a6 hw/nvme: Use #define to avoid variable length array
In nvme_map_sgl() we create an array segment[] whose size is the
'const int SEG_CHUNK_SIZE'.  Since this is C, rather than C++, a
"const int foo" is not a true constant, it's merely a variable with a
constant value, and so semantically segment[] is a variable-length
array.  Switch SEG_CHUNK_SIZE to a #define so that we can make the
segment[] array truly fixed-size, in the sense that it doesn't
trigger the -Wvla warning.

The codebase has very few VLAs, and if we can get rid of them all we
can make the compiler error on new additions.  This is a defensive
measure against security bugs where an on-stack dynamic allocation
isn't correctly size-checked (e.g.  CVE-2021-3527).

[PMM: rebased (function has moved file), expand commit message
 based on discussion from previous version of patch]

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-09-12 16:17:05 +02:00
Klaus Jensen
3439ba9c5d hw/nvme: fix null pointer access in ruh update
The Reclaim Unit Update operation in I/O Management Receive does not
verify the presence of a configured endurance group prior to accessing
it.

Fix this.

Cc: qemu-stable@nongnu.org
Fixes: 73064edfb8 ("hw/nvme: flexible data placement emulation")
Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-09 15:32:32 +02:00
Klaus Jensen
6c8f8456cb hw/nvme: fix null pointer access in directive receive
nvme_directive_receive() does not check if an endurance group has been
configured (set) prior to testing if flexible data placement is enabled
or not.

Fix this.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1815
Fixes: 73064edfb8 ("hw/nvme: flexible data placement emulation")
Reviewed-by: Jesper Wendel Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-09 15:32:32 +02:00
Klaus Jensen
6a33f2e920 hw/nvme: fix compliance issue wrt. iosqes/iocqes
As of prior to this patch, the controller checks the value of CC.IOCQES
and CC.IOSQES prior to enabling the controller. As reported by Ben in
GitLab issue #1691, this is not spec compliant. The controller should
only check these values when queues are created.

This patch moves these checks to nvme_create_cq(). We do not need to
check it in nvme_create_sq() since that will error out if the completion
queue is not already created.

Also, since the controller exclusively supports SQEs of size 64 bytes
and CQEs of size 16 bytes, hard code that.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1691
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-07 12:27:24 +02:00
Klaus Jensen
ecb1b7b082 hw/nvme: fix oob memory read in fdp events log
As reported by Trend Micro's Zero Day Initiative, an oob memory read
vulnerability exists in nvme_fdp_events(). The host-provided offset is
not verified.

Fix this.

This is only exploitable when Flexible Data Placement mode (fdp=on) is
enabled.

Fixes: CVE-2023-4135
Fixes: 73064edfb8 ("hw/nvme: flexible data placement emulation")
Reported-by: Trend Micro's Zero Day Initiative
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-08-07 08:51:37 +02:00
Klaus Jensen
c1e244b655 hw/nvme: use stl/ldl pci dma api
Use the stl/ldl pci dma api for writing/reading doorbells. This removes
the explicit endian conversions.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-07-30 20:09:54 +02:00
Klaus Jensen
ea3c76f149 hw/nvme: fix endianness issue for shadow doorbells
In commit 2fda0726e5 ("hw/nvme: fix missing endian conversions for
doorbell buffers"), we fixed shadow doorbells for big-endian guests
running on little endian hosts. But I did not fix little-endian guests
on big-endian hosts. Fix this.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765
Fixes: 3f7fe8de3d ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-07-19 09:33:54 +02:00
Akihiko Odaki
445416e301 pcie: Use common ARI next function number
Currently the only implementers of ARI is SR-IOV devices, and they
behave similar. Share the ARI next function number.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230710153838.33917-2-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Minwoo Im
381ab99d85 hw/nvme: check maximum copy length (MCL) for COPY
MCL(Maximum Copy Length) in the Identify Namespace data structure limits
the number of LBAs to be copied inside of the controller.  We've not
checked it at all, so added the check with returning the proper error
status.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-06-28 11:13:42 +02:00
Minwoo Im
cab1da59c2 hw/nvme: consider COPY command in nvme_aio_err
If we don't have NVME_CMD_COPY consideration in the switch statement in
nvme_aio_err(), it will go to have NVME_INTERNAL_DEV_ERROR and
`req->status` will be ovewritten to it.  During the aio context, it
might set the NVMe status field like NVME_CMD_SIZE_LIMIT, but it's
overwritten in the nvme_aio_err().

Add consideration for the NVME_CMD_COPY not to overwrite the status at
the end of the function.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-06-28 11:13:42 +02:00
Minwoo Im
7491e0e409 hw/nvme: add comment for nvme-ns properties
Add more comments of existing properties for nvme-ns device.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-06-28 11:13:42 +02:00
Alexander Bulekov
f63192b054 hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
This protects devices from bh->mmio reentrancy issues.

Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-5-alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-04-28 11:31:54 +02:00
Paolo Bonzini
3488fc3262 nvme: remove constant argument to tracepoint
The last argument to -pci_nvme_err_startfail_virt_state is always "OFFLINE"
due to the enclosing "if" condition requiring !sctrl->scs.  Reported by
Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-20 11:17:35 +02:00
Klaus Jensen
4b32319cda hw/nvme: fix memory leak in nvme_dsm
The iocb (and the allocated memory to hold LBA ranges) leaks if reading
the LBA ranges fails.

Fix this by adding a free and an unref of the iocb.

Reported-by: Coverity (CID 1508281)
Fixes: d7d1474fd8 ("hw/nvme: reimplement dsm to allow cancellation")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-04-12 12:03:09 +02:00
Klaus Jensen
ca2a091802 hw/nvme: fix missing DNR on compare failure
Even if the host is somehow using compare to do compare-and-write, the
host should be notified immediately about the compare failure and not
have to wait for the driver to potentially retry the command.

Fixes: 0a384f923f ("hw/block/nvme: add compare command")
Reported-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-27 19:05:23 +02:00
Mateusz Kozlowski
9b4f01812f hw/nvme: Change alignment in dma functions for nvme_blk_*
Since the nvme_blk_read/write are used by both the data and metadata
portions of the IO, it can't have the 512B alignment requirement.
Without this change any metadata transfer, which length isn't a multiple
of 512B and which is bigger than 512B, will result in only a partial
transfer.

Signed-off-by: Mateusz Kozlowski <kozlowski.mateuszpl@gmail.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-27 17:48:08 +02:00
Jesper Devantier
73064edfb8 hw/nvme: flexible data placement emulation
Add emulation of TP4146 ("Flexible Data Placement").

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jesper Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 15:28:02 +01:00
Gollu Appalanaidu
e181d3da39 hw/nvme: basic directives support
Add support for the Directive Send and Recv commands and the Identify
directive.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 15:28:02 +01:00
Klaus Jensen
771dbc3ac4 hw/nvme: add basic endurance group support
Add the mandatory Endurance Group identify data structures and log
pages.

For now, all namespaces in a subsystem belongs to a single Endurance
Group.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 15:28:02 +01:00
Joel Granados
a555af1707 hw/nvme: move adjustment of data_units{read,written}
Move the rounding of bytes read/written into nvme_smart_log which
reports in units of 512 bytes, rounded up in thousands. This is in
preparation for adding the Endurance Group Information log page which
reports in units of billions, rounded up.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-03-06 15:28:02 +01:00
Klaus Jensen
973f76cf77 hw/nvme: cleanup error reporting in nvme_init_pci()
Replace the local Error variable with errp and ERRP_GUARD() and change
the return value to bool.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-11 08:41:19 +01:00
Klaus Jensen
784fd35387 hw/nvme: clean up confusing use of errp/local_err
Remove an unnecessary local Error value in nvme_realize(). In the
process, change nvme_check_constraints() to return a bool.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-11 08:41:14 +01:00
Klaus Jensen
fa5db2aa16 hw/nvme: fix missing cq eventidx update
Prior to reading the shadow doorbell cq head, we have to update the
eventidx. Otherwise, we risk that the driver will skip an mmio doorbell
write. This happens on riscv64, as reported by Guenter.

Adding the missing update to the cq eventidx fixes the issue.

Fixes: 3f7fe8de3d ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-stable@nongnu.org
Cc: qemu-riscv@nongnu.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 08:48:46 +01:00
Klaus Jensen
2fda0726e5 hw/nvme: fix missing endian conversions for doorbell buffers
The eventidx and doorbell value are not handling endianness correctly.
Fix this.

Fixes: 3f7fe8de3d ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-stable@nongnu.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 08:48:46 +01:00
Klaus Jensen
47cd3539e1 hw/nvme: rename shadow doorbell related trace events
Rename the trace events related to writing the event index and reading
the doorbell value to make it more clear that the event is associated
with an actual update (write or read respectively).

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 08:48:46 +01:00
Klaus Jensen
48b32c28d5 hw/nvme: use QOM accessors
Replace various ->parent_obj use with the equivalent QOM accessors.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2023-01-09 08:48:46 +01:00
Markus Armbruster
3d558330ad Drop more useless casts from void * to pointer
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221123133811.1398562-1-armbru@redhat.com>
2022-12-14 16:19:35 +01:00
Klaus Jensen
83f56ac321 hw/nvme: remove copy bh scheduling
Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.

Fixes: 796d20681d ("hw/nvme: reimplement the copy command to allow aio cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 08:45:03 +01:00
Klaus Jensen
818b9b8f5e hw/nvme: fix aio cancel in dsm
When the DSM operation is cancelled asynchronously, we set iocb->ret to
-ECANCELED. However, the callback function only checks the return value
of the completed aio, which may have completed succesfully prior to the
cancellation and thus the callback ends up continuing the dsm operation
instead of bailing out. Fix this.

Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.

Fixes: d7d1474fd8 ("hw/nvme: reimplement dsm to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 08:45:00 +01:00
Klaus Jensen
36a251c346 hw/nvme: fix aio cancel in zone reset
If the zone reset operation is cancelled but the block unmap operation
completes normally, the callback will continue resetting the next zone
since it neglects to check iocb->ret which will have been set to
-ECANCELED. Make sure that this is checked and bail out if an error is
present.

Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.

Fixes: 63d96e4ffd ("hw/nvme: reimplement zone reset to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 08:44:56 +01:00
Klaus Jensen
3dbc1708ea hw/nvme: fix aio cancel in flush
Make sure that iocb->aiocb is NULL'ed when cancelling.

Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.

Fixes: 38f4ac65ac ("hw/nvme: reimplement flush to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 08:44:52 +01:00
Klaus Jensen
433c71e494 hw/nvme: fix aio cancel in format
There are several bugs in the async cancel code for the Format command.

Firstly, cancelling a format operation neglects to set iocb->ret as well
as clearing the iocb->aiocb after cancelling the underlying aiocb which
causes the aio callback to ignore the cancellation. Trivial fix.

Secondly, and worse, because the request is queued up for posting to the
CQ in a bottom half, if the cancellation is due to the submission queue
being deleted (which calls blk_aio_cancel), the req structure is
deallocated in nvme_del_sq prior to the bottom half being schedulued.

Fix this by simply removing the bottom half, there is no reason to defer
it anyway.

Fixes: 3bcf26d3d6 ("hw/nvme: reimplement format nvm to allow cancellation")
Reported-by: Jonathan Derrick <jonathan.derrick@linux.dev>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-12-01 08:44:16 +01:00
Stefan Hajnoczi
f21f1cfeb9 pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
 first version of biosbits infrastructure
 ASID support in vhost-vdpa
 core_count2 support in smbios
 PCIe DOE emulation
 virtio vq reset
 HMAT support
 part of infrastructure for viommu support in vhost-vdpa
 VTD PASID support
 fixes, tests all over the place
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
 uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
 Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
 EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
 =zTCn
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

pci,pc,virtio: features, tests, fixes, cleanups

lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
# uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
# 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
# Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
# 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
# EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
# =zTCn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
  checkpatch: better pattern for inline comments
  hw/virtio: introduce virtio_device_should_start
  tests/acpi: update tables for new core count test
  bios-tables-test: add test for number of cores > 255
  tests/acpi: allow changes for core_count2 test
  bios-tables-test: teach test to use smbios 3.0 tables
  hw/smbios: add core_count2 to smbios table type 4
  vhost-user: Support vhost_dev_start
  vhost: Change the sequence of device start
  intel-iommu: PASID support
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: drop VTDBus
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  vfio: move implement of vfio_get_xlat_addr() to memory.c
  tests: virt: Update expected *.acpihmatvirt tables
  tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
  hw/arm/virt: Enable HMAT on arm virt machine
  tests: Add HMAT AArch64/virt empty table files
  tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
  tests: acpi: q35: add test for hmat nodes without initiators
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-07 18:43:56 -05:00
Akihiko Odaki
15377f6e79 msix: Assert that specified vector is in range
There were several different ways to deal with the situation where the
vector specified for a msix function is out of bound:
- early return a function and keep progresssing
- propagate the error to the caller
- mark msix unusable
- assert it is in bound
- just ignore

An out-of-bound vector should not be specified if the device
implementation is correct so let msix functions always assert that the
specified vector is in range.

An exceptional case is virtio-pci, which allows the guest to configure
vectors. For virtio-pci, it is more appropriate to introduce its own
checks because it is sometimes too late to check the vector range in
msix functions.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20220829083524.143640-1-akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Signed-off-by: Akihiko Odaki &lt;<a href="mailto:akihiko.odaki@daynix.com" target="_blank">akihiko.odaki@daynix.com</a>&gt;<br>
2022-11-07 14:08:17 -05:00
Francis Pravin Antony Michael Raj
632cb6cf07 hw/nvme: Abort copy command when format is one while pif
As per the NVMe Command Set specification Section 3.2.2, if

  i)  The namespace is formatted to use 16b Guard Protection
      Information (i.e., pif = 0) and
  ii) The Descriptor Format is not cleared to 0h

Then the copy command should be aborted with the status code of Invalid
Namespace or Format

Fixes: 44219b6029 ("hw/nvme: 64-bit pi support")
Signed-off-by: Francis Pravin Antony Michael Raj <francis.michael@solidigm.com>
Signed-off-by: Jonathan Derrick <jonathan.derrick@solidigm.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-11-02 09:23:05 +01:00