NVMe command set specification for end-to-end data protection formatted
namespace states:
o If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and
the namespace is formatted for Type 3 protection, then the
controller:
▪ should not compare the protection Information Reference Tag
field to the computed reference tag; and
▪ may ignore the ILBRT and EILBRT fields. If a command is
aborted as a result of the Reference Tag Check bit of the
PRCHK field being set to ‘1’, then that command should be
aborted with a status code of Invalid Protection Information,
but may be aborted with a status code of Invalid Field in
Command.
Currently qemu compares reftag in the nvme_dif_prchk function whenever
Reference Tag Check bit is set in the command. For type 3 namespaces
however, caller of nvme_dif_prchk - nvme_dif_check does not increment
reftag for each subsequent logical block. That way commands incorporating
more than one logical block for type 3 formatted namespaces with reftag
check bit set, always fail with End-to-end Reference Tag Check Error.
Comply with spec by handling case of set Reference Tag Check
bit in the type 3 formatted namespace.
Fixes: 146f720c55 ("hw/block/nvme: end-to-end data protection")
Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The Linux kernel quirks the QEMU NVMe controller pretty heavily because
of the namespace identifier mess. Since this is now fixed, bump the
firmware revision number to allow the quirk to be disabled for this
revision.
As of now, bump the firmware revision number to be equal to the QEMU
release version number.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Do not report the "null uuid" (all zeros) in the namespace
identification descriptors.
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Do not default to generate an UUID for namespaces if it is not
explicitly specified.
This is a technically a breaking change in behavior. However, since the
UUID changes on every VM launch, it is not spec compliant and is of
little use since the UUID cannot be used reliably anyway and the
behavior prior to this patch must be considered buggy.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
We cannot provide auto-generated unique or persistent namespace
identifiers (EUI64, NGUID, UUID) easily. Since 6.1, namespaces have been
assigned a generated EUI64 of the form "52:54:00:<namespace counter>".
This is will be unique within a QEMU instance, but not globally.
Revert that this is assigned automatically and immediately deprecate the
compatibility parameter. Users can opt-in to this with the
`eui64-default=on` device parameter or set it explicitly with
`eui64=UINT64`.
Cc: libvir-list@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The Identify Controller Serial Number (SN) is the serial number for the
NVM subsystem and must be the same across all controller in the NVM
subsystem.
Enforce this.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Pass the right constant to nvme_smart_event(). The NVME_AER* values hold
the bit position in the SMART byte, not the shifted value that we expect
it to be in nvme_smart_event().
Fixes: c62720f137 ("hw/block/nvme: trigger async event during injecting smart warning")
Acked-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Current implementation have problem in the read part of copy command.
Because there is no metadata mangling before nvme_dif_check invocation,
reftag error could be thrown for blocks of namespace that have not been
previously written to.
Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Since there is no return after nvme_dsm_cb invocation, metadata
associated with non-zero block range is currently zeroed. Also this
behaviour leads to segfault since we schedule iocb->bh two times.
First when entering nvme_dsm_cb with iocb->idx == iocb->nr and
second because of missing return on call stack unwinding by calling
blk_aio_pwrite_zeroes and subsequent nvme_dsm_cb callback.
Fixes: d7d1474fd8 ("hw/nvme: reimplement dsm to allow cancellation")
Signed-off-by: Dmitry Tikhov <d.tihov@yadro.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Since nlbas is of type int, it does not work with large namespace size
values, e.g., 9 TB size of file backing namespace and 8 byte metadata
with 4096 bytes lbasz gives negative nlbas value, which is later
promoted to negative int64_t type value and results in negative
ns->moff which breaks namespace
Signed-off-by: Dmitry Tikhov <ddtikhov@gmail.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Header guard symbols should match their file name to make guard
collisions less likely.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
This adds support for one possible new protection information format
introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard
and 48-bit reference tag. This version does not support storage tags.
Like the CRC16 support already present, this uses a software
implementation of CRC64 (so it is naturally pretty slow). But its good
enough for verification purposes.
This may go nicely hand-in-hand with the support that Keith submitted
for the Linux kernel[1].
[1]: https://lore.kernel.org/linux-nvme/20220126165214.GA1782352@dhcp-10-100-145-180.wdc.com/T/
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
A subsequent patch will introduce a new tuple size; so add a helper and
use that instead of sizeof() and magic numbers.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for up to 64 LBA formats through the LBAFEE field of the
Host Behavior Support feature.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There is no need to extract the format command parameters for each
namespace. Move it to the entry point.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for getting and setting the Host Behavior Support feature.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for TP 4076 ("Zoned Random Write Area"), v2021.08.23
("Ratified").
This adds three new namespace parameters: "zoned.numzrwa" (number of
zrwa resources, i.e. number of zones that can have a zrwa),
"zoned.zrwas" (zrwa size in LBAs), "zoned.zrwafg" (granularity in LBAs
for flushes).
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add struct for Zone Management Send in preparation for more zone send
flags.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
These buffers can be anything, not an array of chars,
so use the 'void *' type for them.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The 'buf' argument is not modified, so better pass it as const type.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This fixes CVE-2021-3929 "locally" by denying DMA to the iomem of the
device itself. This still allows DMA to MMIO regions of other devices
(e.g. doing P2P DMA to the controller memory buffer of another NVMe
device).
Fixes: CVE-2021-3929
Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Since commit 292e13142d, dma_buf_rw() returns a MemTxResult type.
Do not discard it, return it to the caller. Pass the previously
returned value (the QEMUSGList residual size, which was rarely used)
as an optional argument.
With this new API, SCSIRequest::residual might now be accessed via
a pointer. Since the size_t type does not have the same size on
32 and 64-bit host architectures, convert it to a uint64_t, which
is big enough to hold the residual size, and the type is constant
on both 32/64-bit hosts.
Update the few dma_buf_read() / dma_buf_write() callers to the new
API.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220117125130.131828-1-f4bug@amsat.org>
Update the obvious places where dma_addr_t should be used
(instead of uint64_t, hwaddr, size_t, int32_t types).
This allows to have &dma_addr_t type portable on 32/64-bit
hosts.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220111184309.28637-11-f4bug@amsat.org>
Let devices specify transaction attributes when calling
dma_buf_read().
Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-13-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_buf_write().
Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-12-philmd@redhat.com>
Change namespaces to be shared namespaces by default (parameter
shared=on). Keep shared=off for older machine types.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
namespaces get moved from the controller to the subsystem if one
is specified.
That keeps the namespaces alive after a controller hot-unplug, but
after a controller hotplug we have to reconnect the namespaces
from the subsystem to the controller.
Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
Cc: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
[k.jensen: only attach to shared and non-detached namespaces]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Rename qbus_create_inplace() to qbus_init(); this is more in line
with our usual naming convention for functions that in-place
initialize objects.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
Currently, FUSED operations are not supported by QEMU. As per the 1.4 SPEC,
controller should abort the command that requested a fused operation with
an INVALID FIELD error code if they are not supported.
Changes from v1:
Added FUSE flag check also to the admin cmd processing as the FUSED
operations are mentioned in the general SQE section in the SPEC.
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Fix is added to check for reserved value in select field for
namespace attachment
CC: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Address 0x0 is a valid address. Fix the admin submission and completion
queue address validation to not error out on this.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Coverity found that 'uuid', 'csi' and 'eui64' are uninitialized. While
we set most of the fields, we do not explicitly set the rsvd2 field in
the NvmeIdNsDescr header.
Fix this by explicitly zero-initializing the variables.
Reported-by: Coverity (CID 1458835, 1459295 and 1459580)
Fixes: 6870cfb814 ("hw/nvme: namespace parameter for EUI-64")
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The new PMR test unearthed a long-standing issue with MMIO reads on
big-endian hosts.
Fix this by unconditionally storing all controller registers in little
endian.
Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Peter noticed that mmio access may read into the NvmeParams member in
the NvmeCtrl struct.
Fix the bounds check.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Add the NvmeBarRegs enum and use these instead of explicit register
offsets.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to
make up the 64 bit logical PMRMSC register.
Make it so.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Prior to this patch the nvme-ns devices are always children of the
NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
unrealized when the parent device is removed. However, when subsystems
are involved, this is not what we want since the namespaces may be
attached to other controllers as well.
This patch adds an additional NvmeBus on the subsystem device. When
nvme-ns devices are realized, if the parent controller device is linked
to a subsystem, the parent bus is set to the subsystem one instead. This
makes sure that namespaces are kept alive and not unrealized.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If the number of PRP/SGL mappings exceed 1024, reads and writes will
fail because of an internal QEMU limitation of max 1024 vectors.
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
[k.jensen: changed the error message to be more generic]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Make sure the controller is unregistered from the subsystem when device
is removed.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
We currently lack the infrastructure to handle subsystem hotplugging, so
disable it.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The nvme_ns_setup and nvme_ns_check_constraints should not depend on the
controller state. Refactor and remove it.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Jakub noticed[1] that, when using pin-based interrupts, the device will
unconditionally deasssert when any CQEs are acknowledged. However, the
pin should not be deasserted if other completion queues still holds
unacknowledged CQEs.
The bug is an artifact of commit ca247d3509 ("hw/block/nvme: fix
pin-based interrupt behavior") which fixed one bug but introduced
another. This is the third time someone tries to fix pin-based
interrupts (see commit 5e9aa92eb1 ("hw/block: Fix pin-based interrupt
behaviour of NVMe"))...
Third time's the charm, so fix it, again, by keeping track of how many
CQs have unacknowledged CQEs and only deassert when all are cleared.
[1]: <20210610114624.304681-1-jakub.jermar@kernkonzept.com>
Cc: qemu-stable@nongnu.org
Fixes: ca247d3509 ("hw/block/nvme: fix pin-based interrupt behavior")
Reported-by: Jakub Jermář <jakub.jermar@kernkonzept.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Qiang Liu reported that an access on an unknown address is triggered in
memory_region_set_enabled because a check on CAP.PMRS is missing for the
PMRCTL register write when no PMR is configured.
Cc: qemu-stable@nongnu.org
Fixes: 75c3c9de96 ("hw/block/nvme: disable PMR at boot up")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/362
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
In the documentation of the '-detached' param "be" and "not" has been
used side by side, fix that.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add the controller identifiers list CNS 0x13, available list of ctrls
in NVM Subsystem that may or may not be attached to namespaces.
In Identify Ctrl List of the CNS 0x12 and 0x13 no endian conversion
for the nsid field.
These two CNS values shows affect when there exists a Subsystem.
Added condition if there is no Subsystem return invalid field in
command.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This partially reverts commit 98f84f5a4e.
Since all "multi aio" commands are now reimplemented to properly track
the nested aiocbs, we can revert the "hack" that was introduced to make
sure all requests we're properly drained upon sq deletion.
The revert is partial since we keep the assert that no outstanding
requests remain on the submission queue after the explicit cancellation.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>