Validation of the max_active and max_open zoned parameters are
independent of any other state, so move them to the early
nvme_ns_check_constraints parameter checks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
It is not an error to report more active/open zones supported than the
number of zones in the namespace.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The NvmeCtrl num_namespaces member is just an indirection for the
NVME_MAX_NAMESPACES constant.
Remove the indirection.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Streamline namespace array indexing such that both the subsystem and
controller namespaces arrays are 1-indexed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
There is no need to look up the lba size and metadata size in the LBA
Format structure everytime we want to use it. And we use it a lot.
Cache the values in the NvmeNamespace and update them if the namespace
is formatted.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The inline nvme_ns_status() helper only has a single call site. Remove
it from the header file and inline it for real.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
In preparation for moving the nvme device into its own subtree, merge
the header files into one.
Also add missing copyright notice and add list of authors with
substantial contributions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Get rid of the (reserved) double underscore use.
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Get rid of the (reserved) double underscore use.
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Get rid of the (reserved) double underscore use. Rename the "generic"
zone open function to nvme_zrm_open_flags() and add a generic `int
flags` argument instead which allows more flags to be easily added in
the future. There is at least one TP under standardization that would
add an additional flag.
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
While QEMU coding style prefers lowercase hexadecimals in constants, the
NVMe subsystem uses the format from the NVMe specifications in comments,
i.e. 'h' suffix instead of '0x' prefix.
Fix this up across the code base.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: updated message; added conversion in a couple of missing comments]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_map_addr_pmr function arguments not aligned, fix that.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently IO Command Set Profile feature is supported, but the feature
support flag not set. Further, this feature is changable. Fix that.
Additionally, remove filling default value of the CQE result with zero,
since it will fall back to the default case anyway.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: fix up commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently in compare command metadata aio read blk_aio_preadv return
value ignored. Consider it and complete the block accounting.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Fixes: 0a384f923f ("hw/block/nvme: add compare command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently pci_nvme_err_invalid_lba_range trace is called individually at
each nvme_check_bounds() call site.
Move the trace event to nvme_check_bounds() and remove the redundant
events.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[k.jensen: commit message fixup]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Fixes all over the place. Faster boot for virtio. ioeventfd support for
mmio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmCeiMEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpqsIH/A49Av5Bv8huL75lf9GzCx3E1a/z2W9Fphik
OcQ1ahR+7CRDARub+vTG40MBmZBVefIWjLAj3BwBWzFGPX0DZq0zeI102VzlEVKY
OeUx8ixuiKOSLcS+QxE7ZXIBL2Pn7l+MFUi4nLMYKti7c/kola7zlB57qsmXh+VD
AOQ7Utj6NWoi6QocWJsMSCyHCh3Fk9QzcStLlr6/MkSJa1zqv8l22+8oWH07Fk2M
wZfhrm9k094on28iSejsFYL5e4ROeXUajbOdfyMIxWvAB7boC9Jxk/e0oAbuSB4y
2f71Gfk3mU6irS7PvrxcKbk6BVD2zxM2WumOchZJgxFAujDO6yg=
=fvkT
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc,pci,virtio: bugfixes, improvements
Fixes all over the place. Faster boot for virtio. ioeventfd support for
mmio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 14 May 2021 15:27:13 BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
Fix build with 64 bits time_t
vhost-vdpa: Make vhost_vdpa_get_device_id() static
hw/virtio: enable ioeventfd configuring for mmio
hw/smbios: support for type 41 (onboard devices extended information)
checkpatch: Fix use of uninitialized value
virtio-scsi: Configure all host notifiers in a single MR transaction
virtio-scsi: Set host notifiers and callbacks separately
virtio-blk: Configure all host notifiers in a single MR transaction
virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
pc-dimm: remove unnecessary get_vmstate_memory_region() method
amd_iommu: fix wrong MMIO operations
virtio-net: Constify VirtIOFeature feature_sizes[]
virtio-blk: Constify VirtIOFeature feature_sizes[]
hw/virtio: Pass virtio_feature_get_config_size() a const argument
x86: acpi: use offset instead of pointer when using build_header()
amd_iommu: Fix pte_override_page_mask()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/arm/virt.c
This allows the virtio-blk-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
pseries machine with 384 vCPUs.
Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210407143501.244343-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When dataplane multiqueue support was added in QEMU 2.7, the path
that would rollback guest notifiers assignment in case of error
simply got dropped.
Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it ommits to rollback both
host and guest notifiers.
It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.
Fixes: 51b04ac5c6 ("virtio-blk: dataplane multiqueue support")
Cc: stefanha@redhat.com
Fixes: 97896a4887 ("block: Add Error to blk_set_aio_context()")
Cc: kwolf@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210407143501.244343-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When no mapping is requested, it is pointless to create
alias regions.
Only create them when multiple mappings are requested to
simplify the memory layout. The flatview is not changed.
For example using 'qemu-system-sh4 -M r2d -S -monitor stdio',
* before:
(qemu) info mtree
address-space: memory
0000000000000000-ffffffffffffffff (prio 0, i/o): system
0000000000000000-0000000000ffffff (prio 0, i/o): pflash
0000000000000000-0000000000ffffff (prio 0, romd): alias pflash-alias @r2d.flash 0000000000000000-0000000000ffffff
0000000004000000-000000000400003f (prio 0, i/o): r2d-fpga
000000000c000000-000000000fffffff (prio 0, ram): r2d.sdram
(qemu) info mtree -f
FlatView #0
AS "memory", root: system
AS "cpu-memory-0", root: system
Root memory region: system
0000000000000000-0000000000ffffff (prio 0, romd): r2d.flash
0000000004000000-000000000400003f (prio 0, i/o): r2d-fpga
000000000c000000-000000000fffffff (prio 0, ram): r2d.sdram
* after:
(qemu) info mtree
address-space: memory
0000000000000000-ffffffffffffffff (prio 0, i/o): system
0000000000000000-0000000000ffffff (prio 0, romd): r2d.flash
0000000004000000-000000000400003f (prio 0, i/o): r2d-fpga
000000000c000000-000000000fffffff (prio 0, ram): r2d.sdram
(qemu) info mtree -f
FlatView #0
AS "memory", root: system
AS "cpu-memory-0", root: system
Root memory region: system
0000000000000000-0000000000ffffff (prio 0, romd): r2d.flash
0000000004000000-000000000400003f (prio 0, i/o): r2d-fpga
000000000c000000-000000000fffffff (prio 0, ram): r2d.sdram
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210325120921.858993-3-f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The ROMD mode isn't related to mapping setup.
Ideally we'd set this mode when the state machine resets,
but for now simply move it to pflash_cfi02_realize() to
not introduce logical change.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210325120921.858993-2-f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
... when a xen-block backend instance is created via xenstore.
Following 8d17adf34f ("block: remove support for using "file" driver
with block/char devices"), using the "file" blockdev driver for
everything doesn't work anymore, we need to use the "host_device"
driver when the disk image is a block device and "file" driver when it
is a regular file.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20210430163432.468894-1-anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmCPtbASHGxhdXJlbnRA
dml2aWVyLmV1AAoJEPMMOL0/L748I3wP/Al7yi77BMpts1t3lGMm7EBjKgkppnpr
wZYEM68bJonvvGiEKQjexn1CUfnDcq7f5SZkzcUNLI4oP57pyywb4/gshN0k/Zz8
uCDveMfnhbio2sqlXiMsH9TOhcv/4wtXAek/ghP7EOjkBvyXrAFIQ7eEPEB9cp+X
xxs9DxqfWmrGB6vt7Er78zjfUETSMa+UrheVLwbRMhJcc0Bg8hT2DCn9Lw6IjfOy
usWdrLTGc6qg1zdZzi8QR7jZ+bNx0h+aJLlm8M4cVitXq9v2wb3+6KdsOAeYioAE
AsnClw0m8j/xtMh3g4/hB4oCxMj0jRdZ9GIGs8Didw5ZwkXTRvFM1GK1PHxqX4pF
8xMW6Qq0bSUr4II6bPOukBUMUAnPYdkh+iHXsYSZG0I3u6VZLgMK3AXmKRukAYqe
kQ1lcRe3Lwsp2h+jMBBsbCWhwYdA3THFO4YO31cUaZ191A7z57905QMbqJG/H3HB
7IUBYBNbrhgysPsNBvY6Lr7yUJIocMgcfP36UHYcBPsDdZgjNCQZneJlkaRlQb8+
CtUSF8D614EguzGsWaIn3uBSm9THKKLd1rSXCyTSgrXDI285mXlKmEWZvm236ew0
OEmIz/Ach/R4268j76enYGa1aubsxnrphUfC3aePu0Wzd3QW4RxnCSq7wc4ARPw7
WTL7J00P578h
=aCeG
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.1-pull-request' into staging
Trivial patches pull request 20210503
# gpg: Signature made Mon 03 May 2021 09:34:56 BST
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/trivial-branch-for-6.1-pull-request: (23 commits)
hw/rx/rx-gdbsim: Do not accept invalid memory size
docs: More precisely describe memory-backend-*::id's user
scripts: fix generation update-binfmts templates
docs/system: Document the removal of "compat" property for POWER CPUs
mc146818rtc: put it into the 'misc' category
Do not include exec/address-spaces.h if it's not really necessary
Do not include cpu.h if it's not really necessary
Do not include hw/boards.h if it's not really necessary
Do not include sysemu/sysemu.h if it's not really necessary
hw: Do not include qemu/log.h if it is not necessary
hw: Do not include hw/irq.h if it is not necessary
hw: Do not include hw/sysbus.h if it is not necessary
hw: Remove superfluous includes of hw/hw.h
ui: Fix memory leak in qemu_xkeymap_mapping_table()
hw/usb: Constify VMStateDescription
hw/display/qxl: Constify VMStateDescription
hw/arm: Constify VMStateDescription
vmstate: Constify some VMStateDescriptions
Fix typo in CFI build documentation
hw/pcmcia: Do not register PCMCIA type if not required
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Fixes for the DMA space
* New model for ASPEED's Hash and Crypto Engine (Joel and Klaus)
* Acceptance tests (Joel)
* A fix for the XDMA model
* Some extra features for the SMC controller.
* Two new boards : rainier-bmc and quanta-q7l1-bmc (Patrick)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmCPiNgACgkQUaNDx8/7
7KGqBhAAviQHW0A4UPGi91uGq6wN1V4skbdMJIGnvOVnkOH1aRySPfnwiRRYimpc
/3re+dLzu/zf/ehwdJd7nk3zLG2HR3A+Lw0fdBR2gGvuQwyUz/D+34yR43eJ8ju4
HcuOVfo9ZeSIJZPZTHfHD/0/AhNxKCUv7PiV2T3XukGcaiuQKbQIlfY73LDjIIkS
O5FT5IxknCXNWJ4eS8C04EsLzdkdxdZ1QsnaNyhLIywzdO5wThWQ6YE1AK1VPVES
yGiJMRXcXHDicmwru9jZIDG3jiiEO01FUG6hBTB2qA/OaXVark/uw55+qsEwRuEv
NYznDwEVwmN1CB5oGP+MbRlwyyJoirLlJ35FB3KC3OciZCRbrzHA1OtxsqlDf9eJ
K4j3M51CuhU5D9AJ+77BxZewHN2RugIvvlSyQ8FP+mbbvDIBbiiY3mkks7pLpgRh
U33HxOGmFNuSIYavlYD12OQcnimMv6Zqrf3WUikfredpXiY8UNAfxazQPpaCzNFq
DcjNKt6DcdXXSHthQiRhMbWLPl+Lw8dih8Y+cs/xRnjqySHl8eLLb0tFL7Dlkl0z
7yTLyt+A5UN8AKqYZTvGfsofa4RdaIoBq+CG5unQwzulpU5ndOpaUJcc9QhNV+rN
EtxvFEfiq9mDefg1kb2JW/W2ew22sr8fzhRJHnoIXGBJ2RtV+hc=
=N5Us
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/legoater/tags/pull-aspeed-20210503' into staging
Aspeed patches :
* Fixes for the DMA space
* New model for ASPEED's Hash and Crypto Engine (Joel and Klaus)
* Acceptance tests (Joel)
* A fix for the XDMA model
* Some extra features for the SMC controller.
* Two new boards : rainier-bmc and quanta-q7l1-bmc (Patrick)
# gpg: Signature made Mon 03 May 2021 06:23:36 BST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* remotes/legoater/tags/pull-aspeed-20210503:
aspeed: Add support for the quanta-q7l1-bmc board
hw/block: m25p80: Add support for mt25ql02g and mt25qu02g
aspeed: Add support for the rainier-bmc board
aspeed: Deprecate the swift-bmc machine
tests/qtest: Rename m25p80 test in aspeed_smc test
aspeed/smc: Add extra controls to request DMA
aspeed/smc: Add a 'features' attribute to the object class
hw/misc/aspeed_xdma: Add AST2600 support
tests/acceptance: Test ast2600 machine
tests/acceptance: Test ast2400 and ast2500 machines
tests/qtest: Add test for Aspeed HACE
aspeed: Integrate HACE
hw: Model ASPEED's Hash and Crypto Engine
hw/arm/aspeed: Do not sysbus-map mmio flash region directly, use alias
aspeed/i2c: Rename DMA address space
aspeed/i2c: Fix DMA address mask
aspeed/smc: Remove unused "sdram-base" property
aspeed/smc: Use the RAM memory region for DMAs
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The Micron mt25ql02g is a 3V 2Gb serial NOR flash memory supporting
dual I/O and quad I/O, 4KB, 32KB, 64KB sector erase. It also supports
4B opcodes. The mt25qu02g operates at 1.8V.
https://4donline.ihs.com/images/VipMasterIC/IC/MICT/MICT-S-A0008500026/MICT-S-A0008511423-1.pdf?hkey=52A5661711E402568146F3353EA87419
Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: Francisco Iglesias <francisco.iglesias@xilinx.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
virtio_add_queue() aborts when queue_size > VIRTQUEUE_MAX_SIZE, so
vhost_user_blk_device_realize() should check this before calling it.
Simple reproducer:
qemu-system-x86_64 \
-chardev null,id=foo \
-device vhost-user-blk-pci,queue-size=4096,chardev=foo
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935014
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210413165654.50810-1-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 1901b4967c changed the nvme device from using a bar exclusive
for MSI-x to sharing it on bar0.
Unfortunately, the msix_uninit_exclusive_bar() call remains in
nvme_exit() which causes havoc when the device is removed with, say,
device_del. Fix this.
Additionally, a subregion is added but it is not removed on exit which
causes a reference to linger and the drive to never be unlocked.
Fixes: 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For most commands, when issuing an AIO, the BlockAIOCB is stored in the
NvmeRequest aiocb pointer when the AIO is issued. The main use of this
is cancelling AIOs when deleting submission queues (it is currently not
used for Abort).
However, some commands like Dataset Management Zone Management Send
(zone reset) may involve more than one AIO and here the AIOs are issued
without saving a reference to the BlockAIOCB. This is a problem since
nvme_del_sq() will attempt to cancel outstanding AIOs, potentially with
an invalid BlockAIOCB since the aiocb pointer is not NULL'ed when the
request structure is recycled.
Fix this by
1. making sure the aiocb pointer is NULL'ed when requests are recycled
2. only attempt to cancel the AIO if the aiocb is non-NULL
3. if any AIOs could not be cancelled, drain all aio as a last resort.
Fixes: dc04d25e2f ("hw/block/nvme: add support for the format nvm command")
Fixes: c94973288c ("hw/block/nvme: add broadcast nsid support flush command")
Fixes: e4e430b3d6 ("hw/block/nvme: add simple copy command")
Fixes: 5f5dc4c6a9 ("hw/block/nvme: zero out zones on reset")
Fixes: 2605257a26 ("hw/block/nvme: add the dataset management command")
Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Cc: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_compare() fails to store the aiocb from the blk_aio_preadv() call.
Fix this.
Fixes: 0a384f923f ("hw/block/nvme: add compare command")
Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, need to ensure the number of
entries is within the MAX number of PRP entries for a page.
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Setting the 'fallback' property corrupts the QOM instance state
(FDCtrlSysBus) because it accesses an incorrect offset (it uses
the offset of the FDCtrlISABus state).
Cc: qemu-stable@nongnu.org
Fixes: a73275dd6f ("fdc: Add fallback option")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210407133742.1680424-1-f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_subsys_ctrl() is used in contexts where the given controller
identifier is from an untrusted source. Like its friends nvme_ns() and
nvme_subsys_ns(), nvme_subsys_ctrl() should just return NULL if an
invalid identifier is given.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_subsys_ns() is used in contexts where the namespace identifier is
taken from an untrusted source. Commit 3921756dee ("hw/block/nvme:
assert namespaces array indices") tried to guard against this by
introducing an assert on the namespace identifier.
This is wrong since it is perfectly valid to call the function with an
invalid namespace identifier and like nvme_ns(), nvme_subsys_ns() should
simply return NULL.
Fixes: 3921756dee ("hw/block/nvme: assert namespaces array indices")
Fixes: 94d8d6d167 ("hw/block/nvme: support allocated namespace type")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_ns_attachment() does not verify the contents of the host-supplied
16 bit "Number of Identifiers" field in the command payload.
Make sure the value is capped at 2047 and fix the out-of-bounds read.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add missing license/copyright headers to the nvme-dif.{c,h} files.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".
Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".
Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.
Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parameter) or implicitly
attaching to.
Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Remove the unused BlockConf from the controller structure and remove the
noop constraint checking.
Device works just fine with both legacy drive parameter namespace and
nvme-ns namespace definitions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
The `nvme_nsid()` function returns '-1' (FFFFFFFFh) when the given
namespace is NULL. Since FFFFFFFFh is actually a valid namespace
identifier (the "broadcast" value), change this to be '0' since that
actually *is* the invalid value.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the missing nvme_adm_opc_str entry for the Namespace Attachment
command.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Protection Information can only be enabled if there is at least 8 bytes
of metadata.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The check for `n->namespace.blkconf.blk` always fails because
this is in the initialization function.
Signed-off-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The description was originally removed in commit 578d914b26
("hw/block/nvme: align zoned.zasl with mdts") together with the removal
of the zoned.append_size_limit parameter itself.
However, it was (most likely accidentally), re-added in commit
f7dcd31885 ("hw/block/nvme: add non-mdts command size limit for verify").
Remove the description again, since the parameter it describes,
zoned.append_size_limit, no longer exists.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Qemu crashes on shutdown if the chardev used by vhost-user-blk has been
finalized before the vhost-user-blk.
This happens with char-socket chardev operating in the listening mode (server).
The char-socket chardev emits "close" event at the end of finalizing when
its internal data is destroyed. This calls vhost-user-blk event handler
which in turn tries to manipulate with destroyed chardev by setting an empty
event handler for vhost-user-blk cleanup postponing.
This patch separates the shutdown case from the cleanup postponing removing
the need to set an event handler.
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Message-Id: <20210325151217.262793-4-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 4bcad76f4c ("vhost-user-blk: delay vhost_user_blk_disconnect")
introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
because of connection problems with vhost-blk daemon.
However, it introdues a new problem. Now, any communication errors
during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
lead to qemu abort on assert in vhost_dev_get_config().
This happens because vhost_user_blk_disconnect() is postponed but
it should have dropped s->connected flag by the time
vhost_user_blk_device_realize() performs a new connection opening.
On the connection opening, vhost_dev initialization in
vhost_user_blk_connect() relies on s->connection flag and
if it's not dropped, it skips vhost_dev initialization and returns
with success. Then, vhost_user_blk_device_realize()'s execution flow
goes to vhost_dev_get_config() where it's aborted on the assert.
To fix the problem this patch adds immediate cleanup on device
initialization(in vhost_user_blk_device_realize()) using different
event handlers for initialization and operation introduced in the
previous patch.
On initialization (in vhost_user_blk_device_realize()) we fully
control the initialization process. At that point, nobody can use the
device since it isn't initialized and we don't need to postpone any
cleanups, so we can do cleaup right away when there is a communication
problem with the vhost-blk daemon.
On operation we leave it as is, since the disconnect may happen when
the device is in use, so the device users may want to use vhost_dev's data
to do rollback before vhost_dev is re-initialized (e.g. in vhost_dev_set_log()).
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210325151217.262793-3-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is useful to use different connect/disconnect event handlers
on device initialization and operation as seen from the further
commit fixing a bug on device initialization.
This patch refactors the code to make use of them: we don't rely any
more on the VM state for choosing how to cleanup the device, instead
we explicitly use the proper event handler depending on whether
the device has been initialized.
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210325151217.262793-2-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Max noticed that since blk_aio_pwrite_zeroes() may invoke the callback
before returning, the callbacks will never see *count == 0 and thus
never free the count variable or decrement num_formats causing a CQE to
never be posted.
Coverity (CID 1451082) also picked up on the fact that count would not
be free'ed if the namespace was of zero size.
Fix both of these issues by explicitly checking *count and finalize for
the given namespace if --(*count) is zero. Enqueing a CQE if there are
no AIOs outstanding after this case is already handled by nvme_format()
by inspecting *num_formats.
Reported-by: Max Reitz <mreitz@redhat.com>
Reported-by: Coverity (CID 1451082)
Fixes: dc04d25e2f ("hw/block/nvme: add support for the format nvm command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
If nvme_map_dptr() fails, nvme_dif_rw() will leak the bounce context.
Fix this by using the same error handling as everywhere else in the
function.
Reported-by: Coverity (CID 1451080)
Fixes: 146f720c55 ("hw/block/nvme: end-to-end data protection")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Whenever a Xen block device is detach via xenstore, the image
associated with it remained open by the backend QEMU and an error is
logged:
qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use
This happened since object_unparent() doesn't immediately frees the
object and thus keep a reference to the node we are trying to free.
The reference is hold by the "drive" property and the call
xen_block_drive_destroy() fails.
In order to fix that, we call drain_call_rcu() to run the callback
setup by bus_remove_child() via object_unparent().
Fixes: 2d24a64661 ("device-core: use RCU for list of children of a bus")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20210308143232.83388-1-anthony.perard@citrix.com>
Per SST25VF016B datasheet [1], SST flash requires a dummy byte after
the address bytes. Note only SPI mode is supported by SST flashes.
[1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210306060152.7250-1-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Several QOM type names contain ',':
ARM,bitband-memory
etraxfs,pic
etraxfs,serial
etraxfs,timer
fsl,imx25
fsl,imx31
fsl,imx6
fsl,imx6ul
fsl,imx7
grlib,ahbpnp
grlib,apbpnp
grlib,apbuart
grlib,gptimer
grlib,irqmp
qemu,register
SUNW,bpp
SUNW,CS4231
SUNW,DBRI
SUNW,DBRI.prom
SUNW,fdtwo
SUNW,sx
SUNW,tcx
xilinx,zynq_slcr
xlnx,zynqmp
xlnx,zynqmp-pmu-soc
xlnx,zynq-xadc
These are all device types. They can't be plugged with -device /
device_add, except for xlnx,zynqmp-pmu-soc, and I doubt that one
actually works.
They *can* be used with -device / device_add to request help.
Usability is poor, though: you have to double the comma, like this:
$ qemu-system-x86_64 -device SUNW,,fdtwo,help
Trap for the unwary. The fact that this was broken in
device-introspect-test for more than six years until commit e27bd49876
fixed it demonstrates that "the unwary" includes seasoned developers.
One QOM type name contains ' ': "ICH9 SMB". Because having to
remember just one way to quote would be too easy.
Rename the "SUNW,FOO types to "sun-FOO". Summarily replace ',' and '
' by '-' in the other type names.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210304140229.575481-2-armbru@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
The previous commit rendered the name fdctrl_connect_drives() somewhat
misleading. Get rid of it by inlining the (now pretty simple)
function into its only caller.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20210309161214.1402527-4-armbru@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Drop the crap deprecated in commit 4a27a638e7 "fdc: Deprecate
configuring floppies with -global isa-fdc" (v5.1.0).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20210309161214.1402527-3-armbru@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
- QAPIfy object-add and --object
- stream: Fail gracefully if permission is denied
- storage-daemon: Fix crash on quit when job is still running
- curl: Fix use after free
- char: Deprecate backend aliases, fix QMP query-chardev-backends
- Fix image creation option defaults that exist in both the format and
the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2
default was incorrectly applied to the rbd layer)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmBUbF4RHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9YX2Q//Ve6++hRulIuJVuh8QDxlmGWERqey/ClX
mUqGDOkSSXfftPTDPCYSUFE7QD6HD25oJmUTix2B2P89AIyDcvqvthMDU/j8clor
X3Kx03ky3NLJilZNdYZ2GOMyljgNP3JSrDHBjc/tZx+1e7C5tPNVxXOUW946wIC9
no6xTAarAANl/GS23ZI+vJ3PBEggzAbu6t/hwT//WAB0WB9wFhkCCzWPXIkdBXwP
QpG8chTwuwFAW1c52F0OeQV5FpM0bcMtxYASuNq0HPL6B8qUdKOusTRgTB/fjLLV
tYMhc6tzPLUlin1mGD4m0P+9tRMBFtF/flZVwbd4S+avcAbV2L5S6Xq0QsiNTbx2
oQUk6N2/IWBOMC6D8aBTBwZ7CCasgEg0imtLUdJ8gKp6T44C7cqg1oZwT2dOyYuI
jS+3T+DcZZn3mHmp61nowL/2/2LDAVaLmOfbsvmvlbuX5j8QHj/Lvt6udRjqpelJ
n0jV9Ay0myu7dSK5ng7WNQUlSrba5I/W3CAjuH0CDp90ADCymWSp2jktvv5rSO/R
bQpz58kRY72y4dEwOy0zkWc/EClqh3p4abq5HDBCIkO+EO8CjJhnEnT+oOrFF5C/
LU93bFPyp6ZJoXzsKnjEjSzMzgDT6XuGTAgrh6upZy52ssjkG8zACbNOmXZTYmAg
hg3OlpdEUvM=
=zZCm
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches and object-add QAPIfication
- QAPIfy object-add and --object
- stream: Fail gracefully if permission is denied
- storage-daemon: Fix crash on quit when job is still running
- curl: Fix use after free
- char: Deprecate backend aliases, fix QMP query-chardev-backends
- Fix image creation option defaults that exist in both the format and
the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2
default was incorrectly applied to the rbd layer)
# gpg: Signature made Fri 19 Mar 2021 09:18:22 GMT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (42 commits)
vl: allow passing JSON to -object
qom: move user_creatable_add_opts logic to vl.c and QAPIfy it
tests: convert check-qom-proplist to keyval
qom: Support JSON in HMP object_add and tools --object
char: Simplify chardev_name_foreach()
char: Deprecate backend aliases 'tty' and 'parport'
char: Skip CLI aliases in query-chardev-backends
qom: Add user_creatable_parse_str()
hmp: QAPIfy object_add
qemu-img: Use user_creatable_process_cmdline() for --object
qom: Add user_creatable_add_from_str()
qemu-nbd: Use user_creatable_process_cmdline() for --object
qemu-io: Use user_creatable_process_cmdline() for --object
qom: Factor out user_creatable_process_cmdline()
qom: Remove user_creatable_add_dict()
qemu-storage-daemon: Implement --object with qmp_object_add()
qom: Make "object" QemuOptsList optional
qapi/qom: QAPIfy object-add
qapi/qom: Add ObjectOptions for x-remote-object
qapi/qom: Add ObjectOptions for input-*
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This converts object-add from 'gen': false to the ObjectOptions QAPI
type. As an immediate benefit, clients can now use QAPI schema
introspection for user creatable QOM objects.
It is also the first step towards making the QAPI schema the only
external interface for the creation of user creatable objects. Once all
other places (HMP and command lines of the system emulator and all
tools) go through QAPI, too, some object implementations can be
simplified because some checks (e.g. that mandatory options are set) are
already performed by QAPI, and in another step, QOM boilerplate code
could be generated from the schema.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Format NVM admin command can make a namespace or namespaces to be
with different LBA size and metadata size with protection information
types.
This patch introduces Format NVM command with LBA format, Metadata, and
Protection Information for the device. The secure erase operation things
and support for formatting zoned namespaces are yet to be added.
The parameter checks inside of this patch has been referred from
Keith's old branch.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[anaidu.gollu: rebased on e2e]
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: rebased for reworked aio tracking]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Pull lba format initialization code into separate function in
preparation for Format NVM support.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
In preparation for Format NVM support, use runtime helpers instead of
the constant device parameters when getting lba size information etc.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
This patch introduces multiple LBA formats supported with the typical
logical block sizes of 512 bytes and 4096 bytes as well as metadata
sizes of 0, 8, 16 and 64 bytes. The format will be chosed based on the
lbads and ms parameters of the nvme-ns device.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[k.jensen: resurrected and rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Verify is not subject to MDTS, so a single Verify command may result in
excessive amounts of allocated memory. Impose a limit on the data size
by adding support for TP 4040 ("Non-MDTS Command Size Limits").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for namespaces formatted with protection information. The
type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
selected with the `pi` nvme-ns device parameter. If the number of
metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
be used to control the location of the 8-byte DIF tuple. The default
`pil` value of '0', causes the DIF tuple to be transferred as the last
8 bytes of the metadata. Set to 1 to store this in the first eight bytes
instead.
Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for metadata in the form of extended logical blocks as well
as a separate buffer of data. The new `ms` nvme-ns device parameter
specifies the size of metadata per logical block in bytes. The `mset`
nvme-ns device parameter controls whether metadata is transfered as part
of an extended lba (set to '1') or in a separate buffer (set to '0',
the default).
Regardsless of the scheme chosen with `mset`, metadata is stored at the
end of the namespace backing block device. This requires the user
provided PRP/SGLs to be walked and "split" into data and metadata
scatter/gather lists if the extended logical block scheme is used, but
has the advantage of not breaking the deallocated blocks support.
Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
nvme_zone_mgmt_recv uses nvme_ns_nlbas() to get the number of LBAs in
the namespace and then calculates the number of zones to report by
incrementing slba with ZSZE until exceeding the number of LBAs as
returned by nvme_ns_nlbas().
This is bad because the namespace might be of such as size that some
LBAs are valid, but are not part of any zone, causing zone management
receive to report one additional (but non-existing) zone.
Fix this with a conventional loop on i < ns->num_zones instead.
Fixes: a479335bfa ("hw/block/nvme: Support Zoned Namespace Command Set")
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Coverity complains about a possible memory corruption in the
nvme_ns_attach and _detach functions. While we should not (famous last
words) be able to reach this function without nsid having previously
been validated, this is still an open door for future misuse.
Make Coverity and maintainers happy by asserting that the index into the
array is valid. Also, while not detected by Coverity (yet), add an
assert in nvme_subsys_ns and nvme_subsys_register_ns as well since a
similar issue is exists there.
Fixes: 037953b5b2 ("hw/block/nvme: support namespace detach")
Fixes: CID 1450757
Fixes: CID 1450758
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
page_size is a uint32_t, and zasl is a uint8_t, so the expression
`page_size << zasl` is done using 32-bit arithmetic and might overflow.
Since we then compare this against a 64 bit data_size value, Coverity
complains that we might overflow unintentionally. An MDTS/ZASL value in
excess of 4GiB is probably impractical, but it is not entirely
unrealistic, so add a cast such that we handle that case properly.
Fixes: 578d914b26 ("hw/block/nvme: align zoned.zasl with mdts")
Fixes: CID 1450756
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Rather than having a device specific debug implementation in
pflash_cfi01.c and pflash_cfi02.c, use the standard tracing facility.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210216142721.1985543-2-david.edmondson@oracle.com>
[PMD: Rebased, fixed pflash_write_block_erase trace event format]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
PFlashCFI01.ro is a bool, declare it as such.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210216142721.1985543-3-david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Use the 'mode_read_array' event when we set the device in such
mode, and use the 'reset' event in DeviceReset handler.
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-10-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210310170528.1184868-9-philmd@redhat.com>
There is multiple places resetting the internal state machine.
Factor the code out in a new pflash_reset_state_machine() method.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210310170528.1184868-8-philmd@redhat.com>
The same pattern is used when setting the flash in READ_ARRAY mode:
- Set the state machine command to READ_ARRAY
- Reset the write_cycle counter
- Reset the memory region in ROMD
Refactor the current code by extracting this pattern.
It is used three times:
- When the timer expires and not in bypass mode
- On a read access (on invalid command).
- When the device is initialized. Here the ROMD mode is hidden
by the memory_region_init_rom_device() call.
pflash_register_memory(rom_mode=true) already sets the ROM device
in "read array" mode (from I/O device to ROM one). Explicit that
by renaming the function as pflash_mode_read_array(), adding
a trace event and resetting wcycle.
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-7-philmd@redhat.com>
There is only one call to pflash_register_memory() with
rom_mode == false. As we want to modify pflash_register_memory()
in the next patch, open-code this trivial function in place for
the 'rom_mode == false' case.
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210310170528.1184868-6-philmd@redhat.com>
There is only one call to pflash_setup_mappings(). Convert 'rom_mode'
to boolean and set it to true directly within pflash_setup_mappings().
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210310170528.1184868-5-philmd@redhat.com>
Fill the CFI table in out of DeviceRealize() in a new function:
pflash_cfi02_fill_cfi_table().
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-4-philmd@redhat.com>
Fill the CFI table in out of DeviceRealize() in a new function:
pflash_cfi01_fill_cfi_table().
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-3-philmd@redhat.com>
We are going to move this code, fix its style first.
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-2-philmd@redhat.com>
Report the configured granularity for discard operation to the
guest. If this is not set use the block size.
Since until now we have ignored the configured discard granularity
and always reported the block size, let's add
'report-discard-granularity' property and disable it for older
machine types to avoid migration issues.
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210225001239.47046-1-akihiko.odaki@gmail.com>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Support Identify command for Namespace attached controller list. This
command handler will traverse the controller instances in the given
subsystem to figure out whether the specified nsid is attached to the
controllers or not.
The 4096bytes Identify data will return with the first entry (16bits)
indicating the number of the controller id entries. So, the data can
hold up to 2047 entries for the controller ids.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
[k.jensen: rebased for dma refactor]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If namespace inventory is changed due to some reasons (e.g., namespace
attachment/detachment), controller can send out event notifier to the
host to manage namespaces.
This patch sends out the AEN to the host after either attach or detach
namespaces from controllers. To support clear of the event from the
controller, this patch also implemented Get Log Page command for Changed
Namespace List log type. To return namespace id list through the
command, when namespace inventory is updated, id is added to the
per-controller list (changed_ns_list).
To indicate the support of this async event, this patch set
OAES(Optional Asynchronous Events Supported) in Identify Controller data
structure.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch supports Namespace Attachment command for the pre-defined
nvme-ns device nodes. Of course, attach/detach namespace should only be
supported in case 'subsys' is given. This is because if we detach a
namespace from a controller, somebody needs to manage the detached, but
allocated namespace in the NVMe subsystem.
As command effect for the namespace attachment command is registered,
the host will be notified that namespace inventory is changed so that
host will rescan the namespace inventory after this command. For
example, kernel driver manages this command effect via passthru IOCTL.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
[k.jensen: rebased for dma refactor]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch has no functional changes. This patch just refactored
nvme_select_ns_iocs() to iterate the attached namespaces of the
controlller and make it invoke __nvme_select_ns_iocs().
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
From NVMe spec 1.4b "6.1.5. NSID and Namespace Relationships" defines
valid namespace types:
- Unallocated: Not exists in the NVMe subsystem
- Allocated: Exists in the NVMe subsystem
- Inactive: Not attached to the controller
- Active: Attached to the controller
This patch added support for allocated, but not attached namespace type:
!nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)
nvme_ns() returns attached namespace instance of the given controller
and nvme_subsys_ns() returns allocated namespace instance in the
subsystem.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Expand allocated namespace list (subsys->namespaces) to have 256 entries
which is a value lager than at least NVME_MAX_NAMESPACES which is for
attached namespace list in a controller.
Allocated namespace list should at least larger than attached namespace
list.
n->num_namespaces = NVME_MAX_NAMESPACES;
The above line will set the NN field by id->nn so that the subsystem
should also prepare at least this number of namespace list entries.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES.
But subsys->namespaces are being accessed with 1-based namespace id
which means the very first array entry will always be empty(NULL).
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Given that now we have nvme-subsys device supported, we can manage
namespace allocated, but not attached: detached. This patch introduced
a parameter for nvme-ns device named 'detached'. This parameter
indicates whether the given namespace device is detached from
a entire NVMe subsystem('subsys' given case, shared namespace) or a
controller('bus' given case, private namespace).
- Allocated namespace
1) Shared ns in the subsystem 'subsys0':
-device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true
2) Private ns for the controller 'nvme0' of the subsystem 'subsys0':
-device nvme-subsys,id=subsys0
-device nvme,serial=foo,id=nvme0,subsys=subsys0
-device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true
3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns:
-device nvme,serial=foo,id=nvme0
-device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers;
it also handles QEMUIOVector copies.
Introduce the NvmeTxDirection enum and rename to nvme_tx. Remove mapping
of PRPs/SGLs from nvme_tx and instead assert that they have been mapped
previously. This allows more fine-grained use in subsequent patches.
Add new (better named) helpers, nvme_{c2h,h2c}, that does both PRP/SGL
mapping and transfer.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The PRP and SGL mapping functions does not have any particular need for
the entire NvmeRequest as a parameter. Clean it up.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
haunts all the memory-related functions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
A Write Zeroes commands should not be counted in either the 'Data Units
Written' or in 'Host Write Commands' SMART/Health Information Log page.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The 'len' member of the nvme_compare_ctx struct is redundant since the
same information is available in the 'iov' member.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Dataset Management is not subject to MDTS, but exceeded a certain size
per range causes internal looping. Report this limit (DMRSL) in the NVM
command set specific identify controller data structure.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add a trace event for the offline zone condition when checking zone
read.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: split commit]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
assert may be compiled to a noop and we could end up returning an
uninitialized status.
Fix this by always returning Internal Device Error as a fallback.
Note that, as pointed out by Philippe, per commit 262a69f428 ("osdep.h:
Prohibit disabling assert() in supported builds") this shouldn't be
possible. But clean it up so we don't worry about it again.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: split commit]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add a trace event for the Identify command.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Remove an unnecessary le_to_cpu conversion in Identify.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.
The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:
1. It is just plain confusing wrt. the definition of mdts.
2. There is a lot of complexity involved in validating the value; it
must be a power of two, it should be larger than 4k, if it is zero
we set it internally to mdts, but still report it as zero.
3. While "hw/block/nvme: improve invalid zasl value reporting"
slightly improved the handling of the parameter, the validation is
still wrong; it does not depend on CC.MPS, it depends on
CAP.MPSMIN. And we are not even checking that it is actually less
than or equal to MDTS, which is kinda the *one* condition it must
satisfy.
Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for using the broadcast nsid to issue a flush on all
namespaces through a single command.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 6eb7a07129 ("hw/block/nvme: change controller pci id") changed
the controller to use a Red Hat assigned PCI Device and Vendor ID, but
did not change the IEEE OUI away from the Intel IEEE OUI.
Fix that and use the locally assigned QEMU IEEE OUI instead if the
`use-intel-id` parameter is not explicitly set. Also reverse the Intel
IEEE OUI bytes.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The Zone Append Size Limit (ZASL) must be at least 4096 bytes, so
improve the user experience by adding an early parameter check in
nvme_check_constraints.
When ZASL is still too small due to the host configuring the device for
an even larger page size, convert the trace point in nvme_start_ctrl to
an NVME_GUEST_ERR such that this is logged by QEMU instead of only
traced.
Reported-by: Corne <info@dantalion.nl>
Cc: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Firstly, if zoned.max_active is non-zero, zoned.max_open must be less
than or equal to zoned.max_active.
Secondly, if only zones.max_active is set, we have to explicitly set
zones.max_open or we end up with an invalid MAR/MOR configuration. This
is an artifact of the parameters not being zeroes-based like in the
spec.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reported-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Implicitly and Explicitly Open zones can be closed by Close Zone
management function. This got broken by a recent commit ("hw/block/nvme:
refactor zone resource management") and now such commands fail with
Invalid Zone State Transition status.
Modify nvm_zrm_close() function to make Close Zone work correctly.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for TP 4065a ("Simple Copy Command"), v2020.05.04
("Ratified").
The implementation uses a bounce buffer to first read in the source
logical blocks, then issue a write of that bounce buffer. The default
maximum number of source logical blocks is 128, translating to 512 KiB
for 4k logical blocks which aligns with the default value of MDTS.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
In preparation for Simple Copy, pull write pointer advancement into a
separate function that is independent off an NvmeRequest.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Zone transition handling and resource management is open coded (and
semi-duplicated in the case of open, close and finish).
In preparation for Simple Copy command support (which also needs to open
zones for writing), consolidate into a set of 'nvme_zrm' functions and
in the process fix a bug with the controller not closing an open zone to
allow another zone to be explicitly opened.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Remove the unused NvmeCtrl parameter in nvme_check_zone_write.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.
This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.
If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.
Usage:
-device nvme-subsys,id=subsys0
-device nvme,serial=foo,id=nvme0,subsys=subsys0
-device nvme,serial=bar,id=nvme1,subsys=subsys0
-device nvme,serial=baz,id=nvme2,subsys=subsys0
-device nvme-ns,id=ns1,drive=<drv>,nsid=1,subsys=subsys0 # Shared
-device nvme-ns,id=ns2,drive=<drv>,nsid=2,bus=nvme2 # Non-shared
In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
the same subsystem. On the other hand, 'ns2' will be attached to the
'nvme2' only as a private namespace in that subsystem.
All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
We have nvme-subsys and nvme devices mapped together. To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed. Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.
This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller. If nvme-subsys is not given to the
controller, then it will always be 0 as it was.
Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32). This patch didn't take
list for the controllers to make it seamless with nvme-ns device.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.
To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:
-device nvme-subsys,id=subsys0
-device nvme,serial=foo,id=nvme0,subsys=subsys0
If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.
This patch introduced a simple nvme-subsys device model. The subsystem
will be prepared with subsystem NQN with <subsys_id> provided in
nvme-subsys device:
ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: added 'nqn' device parameter per request]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Treat the num_queues field as virtio-endian. On big-endian hosts the
vhost-user-blk num_queues field was in the wrong endianness.
Move the blkcfg.num_queues store operation from realize to
vhost_user_blk_update_config() so feature negotiation has finished and
we know the endianness of the device. VIRTIO 1.0 devices are
little-endian, but in case someone wants to use legacy VIRTIO we support
all endianness cases.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210223144653.811468-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add more fine-grained selection by adding a CONFIG_TC58128
selector for the TC58128 eeprom.
As this device is only used by the Shix machine, add an entry
to the proper section in MAINTAINERS.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210222141514.2646278-7-f4bug@amsat.org>
This code was introduced in commit 27c7ca7e77,
("SHIX board emulation (Samuel Tardieu)"). Use
the same license.
Cc: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210222141514.2646278-2-f4bug@amsat.org>
This updates the flash information table to include various ISSI
flashes that are supported by upstream U-Boot and Linux kernel.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210126060007.12904-3-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This adds the ISSI SPI flash support. The number of dummy cycles in
fast read, fast read dual output and fast read quad output commands
is currently using the default 8. Likewise, the same default value
is used for fast read dual/quad I/O command. Per the datasheet [1],
the number of dummy cycles is configurable, but this is not modeled
at present.
For flash whose size is larger than 16 MiB, the sequence of 3-byte
address along with EXTADD bit in the bank address register (BAR) is
not supported. We assume that guest software always uses op codes
with 4-byte address sequence. Fortunately, this is the case for both
U-Boot and Linux spi-nor drivers.
QPI (Quad Peripheral Interface) that supports 2-cycle instruction
has different default values for dummy cycles of fast read family
commands, and is unsupported at the time being.
[1] http://www.issi.com/WW/pdf/25LP-WP256.pdf
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210126060007.12904-2-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This was only required for the pc-1.0 and earlier machine types.
Now that these have been removed, we can also drop the corresponding
code from the FDC device.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20210203171832.483176-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Linux blkfront expects both "discard-granularity" and
"discard-alignment" present on xenbus in order to properly enable the
feature, not exposing "discard-alignment" left some Linux blkfront
versions with a broken discard setup. This has also been addressed in
Linux with:
https://lore.kernel.org/lkml/20210118151528.81668-1-roger.pau@citrix.com/T/#u
Fix QEMU to report a "discard-alignment" of 0, in order for it to work
with older Linux frontends.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Message-Id: <20210118153330.82324-1-roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_ns_realize passes errp to nvme_register_namespaces, but then try to
prepend errp with local_err.
Just remove the local_err and use errp directly.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Current QEMU HEAD nvme.c does not compile with the default GCC 5.4
on a Ubuntu 16.04 host:
hw/block/nvme.c:3242:9: error: ‘result’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
^
hw/block/nvme.c:3150:14: note: ‘result’ was declared here
uint32_t result;
^
Explicitly initialize the result to fix it.
Fixes: aa5e55e3b0 ("hw/block/nvme: open code for volatile write cache")
Fixes: Coverity CID 1446371
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Moving namespace registration to the nvme-ns realization function had
the unintended side-effect of breaking legacy namespace registration.
Fix this.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Reported-by: Alexander Graf <agraf@csgraf.de>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Refactor the zone write check logic such that the most "meaningful"
error is returned first. That is, first, if the zone is not writable,
return an appropriate status code for that. Then, make sure we are
actually writing at the write pointer and finally check that we do not
cross the zone write boundary. This aligns with the "priority" of status
codes for zone read checks.
Also add a couple of additional descriptive trace events and remove an
always true assert.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
When a zone append is processed the controller checks that validity of
the write before assigning the LBA to the append command. This causes
the boundary check to be wrong.
Fix this by checking the write *after* assigning the LBA. Remove the
append special case from the nvme_check_zone_write and open code it in
nvme_do_write, assigning the slba when basic sanity checks have been
performed. Then check the validity of the resulting write like any other
write command.
In the process, also fix a missing endianness conversion for the zone
append ALBA.
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The actual parameter name is 'cross_read' rather than 'cross_zone_read'.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Change status checks to align with the existing style and remove the
explicit check against NVME_SUCCESS.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently, no features are saveable, so the current check is not wrong,
but add a check against the feature capabilities to make sure this will
not regress if saveable features are added later.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Only enable DULBE if the namespace supports it.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If a user assigns a backing device with less capacity than the size of a
single zone, the namespace capacity will be reported as zero and the
kernel will silently fail to allocate the namespace.
This patch errors out in case that the backing device cannot accomodate
at least a single zone.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: small fixup in the error and commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller now implements v1.4 and we can lift the restrictions on
CMB Data Pointer and Command Independent Locations Support (CDPCILS) and
CMB Data Pointer Mixed Locations Support (CDPMLS) since the device
really does not care about mixed host/cmb pointers in those cases.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With the new CMB logic in place, bump the implemented specification
version to v1.4 by default.
This requires adding the setting the CNTRLTYPE field and modifying the
VWC field since 0x00 is no longer a valid value for bits 2:1.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Implement v1.4 logic for configuring the Controller Memory Buffer. By
default, the v1.4 scheme will be used (CMB must be explicitly enabled by
the host), so drivers that only support v1.3 will not be able to use the
CMB anymore.
To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device
parameter.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows
adding RDS/WDS support for PMR as well.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The PMR should not be enabled at boot up. Disable the PMR MemoryRegion
initially and implement MMIO for PMRCTL, allowing the host to enable the
PMR explicitly.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller registers are initially zero. Remove the redundant
zeroing.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Use the correct field names.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With BAR 4 now free to use, allow PMR and CMB to be enabled
simultaneously.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In the interest of supporting both CMB and PMR to be enabled on the same
device, move the MSI-X table and pending bit array out of BAR 4 and into
BAR 0.
This is a simplified version of the patch contributed by Andrzej
Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for
changes to CMB address mapping code.
[1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch sets CMBS bit in controller capabilities register when user
configures NVMe driver with CMB support, so capabilites are correctly
reported to guest OS.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32
bit write combination as well as a plain 64 bit write. The spec does not
define ordering on the hi/lo split, but the code currently assumes that
the low order bits are written first. Additionally, the code does not
consider that another address might already have been written into the
register, causing the OR'ing to result in a bad address.
Fix this by explicitly overwriting only the low or high order bits for
32 bit writes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the size of the mmio read/write to the trace event.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
During smart critical warning injection by setting property from QMP
command, also try to trigger asynchronous event.
Suggested by Keith, if a event has already been raised, there is no
need to enqueue the duplicate event any more.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: fix typo in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There is a very low probability that hitting physical NVMe disk
hardware critical warning case, it's hard to write & test a monitor
agent service.
For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.
The orignal version of this change is implemented by adding a fixed
property which could be initialized by QEMU command line. Suggested
by Philippe & Klaus, rework like current version.
Test with this patch:
1, change smart_critical_warning property for a running VM:
#virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
"arguments": { "path": "/machine/peripheral-anon/device[0]",
"property": "smart_critical_warning", "value":16 } }'
2, run smartctl in guest
#smartctl -H -l error /dev/nvme0n1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- volatile memory backup device has failed
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_setup() finally does not have nothing to do with NvmeCtrl
instance.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe, namespace is being attached to process I/O. We register NVMe
namespace to a controller via nvme_register_namespace() during
nvme_ns_setup(). This is main reason of receiving NvmeCtrl object
instance to this function to map the namespace to a controller.
To make namespace instance more independent, it should be split into two
parts: setup and register. This patch split them into two differnt
parts, and finally nvme_ns_setup() does not have nothing to do with
NvmeCtrl instance at all.
This patch is a former patch to introduce NVMe subsystem scheme to the
existing design especially for multi-path. In that case, it should be
split into two to make namespace independent from a controller.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Removed no longer used aregument NvmeCtrl object in nvme_ns_init_blk().
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the
initial time. This feature is related to block device backed, but this
feature is controlled in controller level via Set/Get Features command.
This patch removed dependency between nvme and nvme-ns to manage the VWC
flag value. Also, it open coded the Get Features for VWC to check all
namespaces attached to the controller, and if false detected, return
directly false.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: report write cache preset if present on ANY namespace]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_init_zoned() has no use for given NvmeCtrl object.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
TP 4053 says (in section 2.3.1.1) -
... if a Zone Append command specifies a ZSLBA that is not the lowest
logical block address in that zone, then the controller shall abort
that command with a status code of Invalid Field In Command.
In the code, Zone Invalid Write is returned instead, fix this.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_io_cmd already checks if the namespace supports the Zone Append
command, so the removed check is dead code.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Add missing string representations for a couple of new commands.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The zoned command set specification states that "All logical blocks in a
zone *shall* be marked as deallocated when [the zone is reset]". Since
the device guarantees 0x00 to be read from deallocated blocks we have to
issue a pwrite_zeroes since we cannot be sure that a discard will do
anything. But typically, this will be achieved with an efficient
unmap/discard operation.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Align with existing style and use a typedef for header-file enums.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Implicitly and explicitly opended zones are always bulk processed
together, so merge the two processing masks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
A shutdown is only about flushing stuff. It is the host that should
delete any queues, so do not perform a reset here.
Also, on shutdown, make sure that the PMR is flushed if in use.
Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated"
status of logical blocks. Since the zoned namespaces command set
specification defines that logical blocks SHALL be marked as deallocated
when the zone is in the Empty or Offline states, DULBE can only be
supported if the zone size is a multiple of the calculated deallocation
granularity (reported in NPDG) which depends on the underlying block
device cluster size (if applicable) or the configured
discard_granularity.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar()
return value") had the unintended effect of breaking support on
several platforms not supporting MSI-X.
Still check for errors, but only report that MSI-X is unsupported
instead of bailing out.
Fixes: 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar() return value")
Fixes: fbf2e5375e ("hw/block/nvme: Verify msix_vector_use() returned value")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.
This patch is for documentation only, no functionality change.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.
This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes.
If this value is non-zero, it becomes possible to assign extensions
of that size to any Empty zones. The default value for this property
is 0, therefore setting extensions is disabled by default.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add two module properties, "zoned.max_active" and "zoned.max_open"
to control the maximum number of zones that can be active or open.
Once these variables are set to non-default values, these limits are
checked during I/O and Too Many Active or Too Many Open command status
is returned if they are exceeded.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.
In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.
Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.
Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.
Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.
Subsequent commits in this series add ZDE support and checks for
active and open zone limits.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated, that is a namespace is
included regardless if it is active (attached) or not.
While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.
However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.
The reason for not hooking up this command completely is because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Define the structures and constants required to implement
Namespace Types support.
Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.
The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.
Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This log page becomes necessary to implement to allow checking for
Zone Append command support in Zoned Namespace Command Set.
This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.
All incoming admin and i/o commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control what commands to support and what not to
depending on set CC.CSS.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Move write processing to nvme_do_write() that now handles both WRITE
and WRITE ZEROES. Both nvme_write() and nvme_write_zeroes() become
inline helper functions.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The majority of code in nvme_rw() is becoming read- or write-specific.
Move these parts to two separate handlers, nvme_read() and nvme_write()
to make the code more readable and to remove multiple is_write checks
that has been present in the i/o path.
This is a refactoring patch, no change in functionality.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe 1.4, a namespace must report an ID descriptor of UUID type
if it doesn't support EUI64 or NGUID. Add a new namespace property,
"uuid", that provides the user the option to either specify the UUID
explicitly or have a UUID generated automatically every time a
namespace is initialized.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Controller reset ans subsystem shutdown are handled very much the same
in the current code, but some of the steps should be different in these
two cases.
Introduce two new functions, nvme_reset_ctrl() and nvme_shutdown_ctrl(),
to separate some portions of the code from nvme_clear_ctrl(). The steps
that are made different between reset and shutdown are that BAR.CC is not
reset to zero upon the shutdown and namespace data is flushed to
backing storage as a part of shutdown handling, but not upon reset.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 37712e00b1 ("hw/block/nvme: factor out pmr setup") changed the
control flow such that the CAP register is erronously cleared after
nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure
is zero-filled initially, there is no need for the explicit clearing, so
just remove it.
Fixes: 37712e00b1 ("hw/block/nvme: factor out pmr setup")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add the Compare command.
This implementation uses a bounce buffer to read in the data from
storage and then compare with the host supplied buffer.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for the Dataset Management command and the Deallocate
attribute. Deallocation results in discards being sent to the underlying
block device. Whether of not the blocks are actually deallocated is
affected by the same factors as Write Zeroes (see previous commit).
format | discard | dsm (512B) dsm (4KiB) dsm (64KiB)
--------------------------------------------------------
qcow2 ignore n n n
qcow2 unmap n n y
raw ignore n n n
raw unmap n y y
Again, a raw format and 4KiB LBAs are preferable.
In order to set the Namespace Preferred Deallocate Granularity and
Alignment fields (NPDG and NPDA), choose a sane minimum discard
granularity of 4KiB. If we are using a passthru device supporting
discard at a 512B granularity, user should set the discard_granularity
property explicitly. NPDG and NPDA will also account for the
cluster_size of the block driver if required (i.e. for QCOW2).
See NVM Express 1.3d, Section 6.7 ("Dataset Management command").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for reporting the Deallocated or Unwritten Logical Block
Error (DULBE).
Rely on the block status flags reported by the block layer and consider
any block with the BDRV_BLOCK_ZERO flag to be deallocated.
Multiple factors affect when a Write Zeroes command result in
deallocation of blocks.
* the underlying file system block size
* the blockdev format
* the 'discard' and 'logical_block_size' parameters
format | discard | wz (512B) wz (4KiB) wz (64KiB)
-----------------------------------------------------
qcow2 ignore n n y
qcow2 unmap n n y
raw ignore n y y
raw unmap n y y
So, this works best with an image in raw format and 4KiB LBAs, since
holes can then be punched on a per-block basis (this assumes a file
system with a 4kb block size, YMMV). A qcow2 image, uses a cluster size
of 64KiB by default and blocks will only be marked deallocated if a full
cluster is zeroed or discarded. However, this *is* consistent with the
spec since Write Zeroes "should" deallocate the block if the Deallocate
attribute is set and "may" deallocate if the Deallocate attribute is not
set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is
always set).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add a new function, nvme_aio_err, to handle errors resulting from AIOs
and use this from the callbacks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_check_bounds has no use of the NvmeCtrl parameter; remove it.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?
This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.
This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.
All calls of blk_is_read_only() are converted to one of the two new
functions.
Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- minor resource leak fixes in qemu-nbd
- ensure proper aio context when nbd server uses iothreads
- iotest refactorings in preparation for rewriting ./check to be more
flexible, and preparing for more nbd server reconnect features
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAI5sMACgkQp6FrSiUn
Q2rmQwf+Jmlsn8s0tdeeOhv6mp8ZSyvr2/x1/daGHkzZqhoL7m/4kJLP4p/u8uTV
XzXyXt7MKvHKd8UvKB6VsN6z75RJvi7y8pKpOQA96t08hjWuAcVtivKnyZd6MTwj
zeKsmrE8LAuMjHvmsrtrmRqCSdaVeFPb3qC6bvJ+WEiXJIMiXybF7lccPvR7WWjR
2FcyraZJgnlKrQv1i8M1++Px5W14jhOacAMUNAdVzNiYpu4tq6PTk9giq1/GULCz
xVYGHqoTFYy7Slj7xKQJuOwGNLMwL+F9x7/6wRFhKxjutc0/Po1lSfbaNe8q147H
p9jtDT9/OuTlQf7qpqqyQnASABDgaA==
=XW+E
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-01-20' into staging
nbd patches for 2021-01-20
- minor resource leak fixes in qemu-nbd
- ensure proper aio context when nbd server uses iothreads
- iotest refactorings in preparation for rewriting ./check to be more
flexible, and preparing for more nbd server reconnect features
# gpg: Signature made Thu 21 Jan 2021 02:28:19 GMT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-01-20:
iotests.py: qemu_io(): reuse qemu_tool_pipe_and_status()
iotests.py: fix qemu_tool_pipe_and_status()
iotests/264: fix style
iotests: define group in each iotest
iotests/294: add shebang line
iotests: make tests executable
iotests: fix some whitespaces in test output files
iotests/303: use dot slash for qcow2.py running
iotests/277: use dot slash for nbd-fault-injector.py running
nbd/server: Quiesce coroutines on context switch
block: Honor blk_set_aio_context() context requirements
qemu-nbd: Fix a memleak in nbd_client_thread()
qemu-nbd: Fix a memleak in qemu_nbd_client_list()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The documentation for bdrv_set_aio_context_ignore() states this:
* The caller must own the AioContext lock for the old AioContext of bs, but it
* must not own the AioContext lock for new_context (unless new_context is the
* same as the current context of bs).
As blk_set_aio_context() makes use of this function, this rule also
applies to it.
Fix all occurrences where this rule wasn't honored.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20201214170519.223781-2-slp@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
As per POSIX specification of limits.h [1], OS libc may define
PAGE_SIZE in limits.h.
To prevent collosion of definition, we rename PAGE_SIZE here.
[1]: https://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210118063808.12471-5-jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Auto Address Increment (AAI) Word-Program is a special command of
SST flashes. AAI-WP allows multiple bytes of data to be programmed
without re-issuing the next sequential address location.
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-id: 1608688825-81519-2-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
When write is disabled, the write to flash should be avoided
in flash_write8().
Fixes: 82a2499011 ("m25p80: Initial implementation of SPI flash device")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-id: 1608688825-81519-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
The function will be moved to common QOM code, as it is not
specific to TYPE_DEVICE anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-31-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Every single qdev property setter function manually checks
dev->realized. We can just check dev->realized inside
qdev_property_set() instead.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-24-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This is the QEMU equivalent of this Linux commit (but 7 years later):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7025a43a9da2
The MTD subsystem has its own small museum of ancient NANDs
in a form of the CONFIG_MTD_NAND_MUSEUM_IDS configuration option.
The museum contains stone age NANDs with 256 bytes pages, as well
as iron age NANDs with 512 bytes per page and up to 8MiB page size.
It is with great sorrow that I inform you that the museum is being
decommissioned. The MTD subsystem is out of budget for Kconfig
options and already has too many of them, and there is a general
kernel trend to simplify the configuration menu.
We remove the stone age exhibits along with closing the museum,
but some of the iron age ones are transferred to the regular NAND
depot. Namely, only those which have unique device IDs are
transferred, and the ones which have conflicting device IDs are
removed.
The machine using this device are:
- axis-dev88
- tosa (via tc6393xb_init)
- spitz based (akita, borzoi, terrier)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201214002620.342384-1-f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* gdbstub: Correct misparsing of vCont C/S requests
* openrisc: Move pic_cpu code into CPU object proper
* nios2: Move IIC code into CPU object proper
* Improve reporting of ROM overlap errors
* xlnx-versal: Add USB support
* hw/misc/zynq_slcr: Avoid #DIV/0! error
* Numonyx: Fix dummy cycles and check for SPI mode on cmds
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAl/YwVIZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3lOpD/9FjasMvZqYeanyNlv+4Swk
4MFeYouIzXKSFu9tj5eDTHzN1TJl5iSwhkIcr9NBqxppuv2eqzxfWWMEfCZ06pxz
BR2HoSlSLUih8cKpu40cQg0TTMEOGEOV9RAHtt8vSGE0FesoiXG2ORUPcxm3NxbN
l9XZ1x3Yb5ZLqVZViFjlZ5gXnTzJ//uPEzbl7N9+pa0mXDKvmvwAl19DLmF6N2Jj
D+gmrLGeEbkJ358RGO/VF7r/1bOkrhwKrb8MzeqFRmjIqaOGbGqs/71+amiSjS8n
hC1HKf6KQOLrklMVaYg1pRxHLbHpQR+haeeX4Xt9jxx8EUrwXojlyaD8p4V9Hcu8
L5haTIBhPrnTkUfHZYL0qYkqRpzbNq97oX2Gmk967FfsZME5fxNa3kS6zM0GkIBx
YKghaZtFInAFODUbG1hHdUc+WbvfQDhj/mBQ6wWw669vYpoab/3nfVq8YVoupVM/
RntcqpBfqtGgPzuJ2dJEEsm6QlK4SZaGlmPkz542OzcHxw3SgeqkbIuDW/CtNI+b
c5PgX0C2S2AnFAAHURnsXdqt6+O01FZqOU7SCLjmwrBrpDG69lum+JLCqXFe9iMW
XgrTrxyPIcz5+Bv63AqKcm6rpcQs5ekwmLLEjT0OJtr+5ef9MeRil0aChj1j4i+2
H/82yKR4JWW1egEvTJhskQ==
=lHZA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201215' into staging
target-arm queue:
* gdbstub: Correct misparsing of vCont C/S requests
* openrisc: Move pic_cpu code into CPU object proper
* nios2: Move IIC code into CPU object proper
* Improve reporting of ROM overlap errors
* xlnx-versal: Add USB support
* hw/misc/zynq_slcr: Avoid #DIV/0! error
* Numonyx: Fix dummy cycles and check for SPI mode on cmds
# gpg: Signature made Tue 15 Dec 2020 13:59:46 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20201215:
hw/block/m25p80: Fix Numonyx fast read dummy cycle count
hw/block/m25p80: Check SPI mode before running some Numonyx commands
hw/block/m25p80: Fix when VCFG XIP bit is set for Numonyx
hw/block/m25p80: Make Numonyx config field names more accurate
hw/misc/zynq_slcr: Avoid #DIV/0! error
arm: xlnx-versal: Connect usb to virt-versal
usb: xlnx-usb-subsystem: Add xilinx usb subsystem
usb: Add DWC3 model
usb: Add versal-usb2-ctrl-regs module
elf_ops.h: Be more verbose with ROM blob names
elf_ops.h: Don't truncate name of the ROM blobs we create
hw/core/loader.c: Improve reporting of ROM overlap errors
hw/core/loader.c: Track last-seen ROM in rom_check_and_register_reset()
target/nios2: Use deposit32() to update ipending register
target/nios2: Move nios2_check_interrupts() into target/nios2
target/nios2: Move IIC code into CPU object proper
target/openrisc: Move pic_cpu code into CPU object proper
hw/openrisc/openrisc_sim: Abstract out "get IRQ x of CPU y"
hw/openrisc/openrisc_sim: Use IRQ splitter when connecting IRQ to multiple CPUs
gdbstub: Correct misparsing of vCont C/S requests
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Make the code more generic and not specific to TYPE_DEVICE.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com> #s390 parts
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-10-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Some Numonyx flash commands cannot be executed in DIO and QIO mode, such as
trying to do DPP or DOR when in QIO mode.
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-4-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
VCFG XIP is set (disabled) when the NVCFG XIP bits are all set (disabled).
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-3-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The previous naming of the configuration registers made it sound like that if
the bits were set the settings would be enabled, while the opposite is true.
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-2-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In order to use inclusive terminology, rename SSI 'slave' as
'peripheral', following the specification resolution:
https://www.oshwa.org/a-resolution-to-redefine-spi-signal-names/
Patch created mechanically using:
$ sed -i s/SSISlave/SSIPeripheral/ $(git grep -l SSISlave)
$ sed -i s/SSI_SLAVE/SSI_PERIPHERAL/ $(git grep -l SSI_SLAVE)
$ sed -i s/ssi-slave/ssi-peripheral/ $(git grep -l ssi-slave)
$ sed -i s/ssi_slave/ssi_peripheral/ $(git grep -l ssi_slave)
$ sed -i s/ssi_create_slave/ssi_create_peripheral/ \
$(git grep -l ssi_create_slave)
Then in VMStateDescription vmstate_ssi_peripheral we restored
the "SSISlave" migration stream name (to avoid breaking migration).
Finally the following files have been manually tweaked:
- hw/ssi/pl022.c
- hw/ssi/xilinx_spips.c
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201012124955.3409127-4-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The category of the nand device is not set, put it into the 'storage'
category.
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201112125824.763182-4-ganqixin@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023123034.19609-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Fixed subject]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Since 7f0f1acedf ("hw/block/nvme: support multiple namespaces"), the
namespaces member of NvmeCtrl is no longer a dynamically allocated
array. Remove the free.
Fixes: 7f0f1acedf ("hw/block/nvme: support multiple namespaces")
Reported-by: Coverity (CID 1436131)
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201104102248.32168-4-its@irrelevant.dk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
nvme_map_sgl_data erroneously uses the sgls member of NvmeIdNs as a
uint16_t.
Reported-by: Coverity (CID 1436129)
Fixes: cba0a8a344 ("hw/block/nvme: add support for scatter gather lists")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201104102248.32168-3-its@irrelevant.dk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.
Signed-off-by: Jin Yu <jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20201103123617.28256-1-jin.yu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit adb29c0273.
The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.
To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:
$ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r -b /var/tmp/foo.img
$ build/qemu-system-x86_64 \
-device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
-object memory-backend-memfd,id=mem,size=1G,share=on \
-M memory-backend=mem,accel=kvm \
-chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
Segmentation fault (core dumped)
Cc: Jin Yu <jin.yu@intel.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102165709.232180-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl+gI74ACgkQYlFWYQpP
bMCBrA/9GXMZZGDfHFenXF+rS6J+ZKxtk29vq9Ly8KZ9YW7CzF9MP8qE/5iyFfmx
d1BknXGQerW2kAzpkOq2/MKDklOc+0BAhaTdUaFR/ao5ZKuv2LQ8uFnKVoTrhTx9
+HVkTVUTnez6ReCZVIrtN4+XVdyQTeQotJg6H2m5Q/BxQKcj6OMOlneuSGDn5vFN
EWgDvEmfFEkzbN8FMXtkT35bg3vA5TGmfQRMk1SMMREOPxF04CaTVTxYscCpS0WC
Cl+62mx4XLjscK7hwXuTNTrxeOLxZ2xLK5dhDd/qxBveio07mIM5X2psdKR0t5qX
HLtm437T9CAYmyo8jgvM4KL8f+rbJnLd579qyVwIMsue28Qisj9nuWCTcaEpjfck
4krhxJwxenRtqQ9wYrnbnQI5yQDIE6iUGf0toXwCNdJIr+FvyIcT7vJtTzZXtRI8
sxwK5wfJ/WSey9uNLZGFbQuv4vjOMV+Nk3mEi1gUV8ujogo+2U6WUAE3NhqFLKn1
YT6AJhDZvqL1f8gFrbiqR8xwvPrYmwK/tK38X1exSDOqiB7UNzR/apAb1oniul0e
rS5xWzIs9APvkdWQssCHvrVDdh6VISXQ5bnT8lkfmvYrCTn2gUGAFXDrxZjXIaL9
scCr8N9STkHmoYpc2ACRKIpfK3E1sDjGA8mAPemkxsLakNwBS4o=
=s4KC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20201102' into staging
nvme pull 2 Nov 2020
# gpg: Signature made Mon 02 Nov 2020 15:20:30 GMT
# gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0
* remotes/nvme/tags/pull-nvme-20201102: (30 commits)
hw/block/nvme: fix queue identifer validation
hw/block/nvme: fix create IO SQ/CQ status codes
hw/block/nvme: fix prp mapping status codes
hw/block/nvme: report actual LBA data shift in LBAF
hw/block/nvme: add trace event for requests with non-zero status code
hw/block/nvme: add nsid to get/setfeat trace events
hw/block/nvme: reject io commands if only admin command set selected
hw/block/nvme: support for admin-only command set
hw/block/nvme: validate command set selected
hw/block/nvme: support per-namespace smart log
hw/block/nvme: fix log page offset check
hw/block/nvme: remove pointless rw indirection
hw/block/nvme: update nsid when registered
hw/block/nvme: change controller pci id
pci: allocate pci id for nvme
hw/block/nvme: support multiple namespaces
hw/block/nvme: refactor identify active namespace id list
hw/block/nvme: add support for sgl bit bucket descriptor
hw/block/nvme: add support for scatter gather lists
hw/block/nvme: harden cmb access
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.
Signed-off-by: Jin Yu <jin.yu@intel.com>
Message-Id: <20200910134851.7817-1-jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The nvme_check_{sq,cq} functions check if the given queue identifer is
valid *and* that the queue exists. Thus, the function return value
cannot simply be inverted to check if the identifer is valid and that
the queue does *not* exist.
Replace the call with an OR'ed version of the checks.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Replace the Invalid Field in Command with the Invalid PRP Offset status
code in the nvme_create_{cq,sq} functions. Also, allow PRP1 to be
address 0x0.
Also replace the Completion Queue Invalid status code returned in
nvme_create_cq when the the queue identifier is invalid with the Invalid
Queue Identifier. The Completion Queue Invalid status code is
exclusively for indicating that the completion queue identifer given
when creating a submission queue is invalid.
See NVM Express v1.3d, Section 5.3 ("Create I/O Completion Queue
command") and 5.4("Create I/O Submission Queue command").
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Address 0 is not an invalid address. Remove those invalikd checks.
Unaligned PRP2 and PRP list entries should result in Invalid PRP Offset
status code and not Invalid Field. Fix that.
See NVMe Express v1.3d, Section 4.3 ("Physical Region Page Entry and
List").
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Calculate the data shift value to report based on the set value of
logical_block_size device property.
In the process, use a local variable to calculate the LBA format
index instead of the hardcoded value 0. This makes the code more
readable and it will make it easier to add support for multiple LBA
formats in the future.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If a command results in a non-zero status code, trace it.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Include the namespace id in the pci_nvme_{get,set}feat trace events.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If the host sets CC.CSS to 111b, all commands submitted to I/O queues
should be completed with status Invalid Command Opcode.
Note that this is technically a v1.4 feature, but it does not hurt to
implement before we finally bump the reported version implemented.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Fail to start the controller if the user requests a command set that the
controller does not support.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Let the user specify a specific namespace if they want to get access
stats for a specific namespace.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Return error if the requested offset starts after the size of the log
being returned. Also, move the check for earlier in the function so
we're not doing unnecessary calculations.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed- by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The code switches on the opcode to invoke a function specific to that
opcode. There's no point in consolidating back to a common function that
just switches on that same opcode without any actual common code.
Restore the opcode specific behavior without going back through another
level of switches.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If the user does not specify an nsid parameter on the nvme-ns device,
nvme_register_namespace will find the first free namespace id and assign
that.
This fix makes sure the assigned id is saved.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
There are two reasons for changing this:
1. The nvme device currently uses an internal Intel device id.
2. Since commits "nvme: fix write zeroes offset and count" and "nvme:
support multiple namespaces" the controller device no longer has
the quirks that the Linux kernel think it has.
As the quirks are applied based on pci vendor and device id, change
them to get rid of the quirks.
To keep backward compatibility, add a new 'use-intel-id' parameter to
the nvme device to force use of the Intel vendor and device id. This is
off by default but add a compat property to set this for 5.1 machines
and older. If a 5.1 machine is booted (or the use-intel-id parameter is
explicitly set to true), the Linux kernel will just apply these
unnecessary quirks:
1. NVME_QUIRK_IDENTIFY_CNS which says that the device does not support
anything else than values 0x0 and 0x1 for CNS (Identify Namespace
and Identify Namespace). With multiple namespace support, this just
means that the kernel will "scan" namespaces instead of using
"Active Namespace ID list" (CNS 0x2).
2. NVME_QUIRK_DISABLE_WRITE_ZEROES. The nvme device started out with a
broken Write Zeroes implementation which has since been fixed in
commit 9d6459d21a ("nvme: fix write zeroes offset and count").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named from the device name
('id'). The nvme-ns devices then connect to this and registers
themselves with the nvme device.
This changes how an nvme device is created. Example with two namespaces:
-drive file=nvme0n1.img,if=none,id=disk1
-drive file=nvme0n2.img,if=none,id=disk2
-device nvme,serial=deadbeef,id=nvme0
-device nvme-ns,drive=disk1,bus=nvme0,nsid=1
-device nvme-ns,drive=disk2,bus=nvme0,nsid=2
The drive property is kept on the nvme device to keep the change
backward compatible, but the property is now optional. Specifying a
drive for the nvme device will always create the namespace with nsid 1.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Prepare to support inactive namespaces.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
This adds support for SGL descriptor type 0x1 (bit bucket descriptor).
See the NVM Express v1.3d specification, Section 4.4 ("Scatter Gather
List (SGL)").
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
For now, support the Data Block, Segment and Last Segment descriptor
types.
See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Since the controller has only supported PRPs so far it has not been
required to check the ending address (addr + len - 1) of the CMB access
for validity since it has been guaranteed to be in range of the CMB.
This changes when the controller adds support for SGLs (next patch), so
add that check.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Make the default request status NVME_SUCCESS so only error status codes
have to be set.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
This pulls block layer aio submission/completion to common functions.
For completions, additionally map an AIO error to the Unrecovered Read
and Write Fault status codes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the symbolic command name to the pci_nvme_{io,admin}_cmd and
pci_nvme_rw trace events.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The raw NLB field is a 16 bit value, so use le16_to_cpu instead of
le32_to_cpu and cast to uint32_t before incrementing the value to not
wrap around.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Add the nvme_l2b helper and use it for converting NLB and SLBA to byte
counts and offsets.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Style fixes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.
With this patch the device sets the Controller Fatal Status bit in the
CSTS register when failing to read from a submission queue or writing to
a completion queue; expecting the host to reset the controller.
If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs), the command is aborted with a Data
Transfer Error status code.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Fix a typo in the sq doorbell trace event.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
As the 'timestamp' variable is declared as a 48-bit bitfield,
we do not need to wrap the sum result.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201002075716.1657849-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
vhost-user devices can get a disconnect in the middle of the VHOST-USER
handshake on the migration start. If disconnect event happened right
before sending next VHOST-USER command, then the vhost_dev_set_log()
call in the vhost_migration_log() function will return error. This error
will lead to the assert() and close the QEMU migration source process.
For the vhost-user devices the disconnect event should not break the
migration process, because:
- the device will be in the stopped state, so it will not be changed
during migration
- if reconnect will be made the migration log will be reinitialized as
part of reconnect/init process:
#0 vhost_log_global_start (listener=0x563989cf7be0)
at hw/virtio/vhost.c:920
#1 0x000056398603d8bc in listener_add_address_space (listener=0x563989cf7be0,
as=0x563986ea4340 <address_space_memory>)
at softmmu/memory.c:2664
#2 0x000056398603dd30 in memory_listener_register (listener=0x563989cf7be0,
as=0x563986ea4340 <address_space_memory>)
at softmmu/memory.c:2740
#3 0x0000563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
busyloop_timeout=0)
at hw/virtio/vhost.c:1385
#4 0x0000563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
at hw/block/vhost-user-blk.c:315
#5 0x0000563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
event=CHR_EVENT_OPENED)
at hw/block/vhost-user-blk.c:379
Update the vhost-user-blk device with the internal started_vu field which
will be used for initialization (vhost_user_blk_start) and clean up
(vhost_user_blk_stop). This additional flag in the VhostUserBlk structure
will be used to track whether the device really needs to be stopped and
cleaned up on a vhost-user level.
The disconnect event will set the overall VHOST device (not vhost-user) to
the stopped state, so it can be used by the general vhost_migration_log
routine.
Such approach could be propogated to the other vhost-user devices, but
better idea is just to make the same connect/disconnect code for all the
vhost-user devices.
This migration issue was slightly discussed earlier:
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <9fbfba06791a87813fcee3e2315f0b904cc6789a.1599813294.git.dimastep@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fuzzing discovered that virtqueue_unmap_sg() is being called on modified
req->in/out_sg iovecs. This means dma_memory_map() and
dma_memory_unmap() calls do not have matching memory addresses.
Fuzzing discovered that non-RAM addresses trigger a bug:
void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
bool is_write, hwaddr access_len)
{
if (buffer != bounce.buffer) {
^^^^^^^^^^^^^^^^^^^^^^^
A modified iov->iov_base is no longer recognized as a bounce buffer and
the wrong branch is taken.
There are more potential bugs: dirty memory is not tracked correctly and
MemoryRegion refcounts can be leaked.
Use the new iov_discard_undo() API to restore elem->in/out_sg before
virtqueue_push() is called.
Fixes: 827805a249 ("virtio-blk: Convert VirtIOBlockReq.out to structrue")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1890360
Message-Id: <20200917094455.822379-3-stefanha@redhat.com>