This allows the virtio-blk-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
pseries machine with 384 vCPUs.
Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210407143501.244343-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When dataplane multiqueue support was added in QEMU 2.7, the path
that would rollback guest notifiers assignment in case of error
simply got dropped.
Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it ommits to rollback both
host and guest notifiers.
It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.
Fixes: 51b04ac5c6 ("virtio-blk: dataplane multiqueue support")
Cc: stefanha@redhat.com
Fixes: 97896a4887 ("block: Add Error to blk_set_aio_context()")
Cc: kwolf@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210407143501.244343-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 1901b4967c changed the nvme device from using a bar exclusive
for MSI-x to sharing it on bar0.
Unfortunately, the msix_uninit_exclusive_bar() call remains in
nvme_exit() which causes havoc when the device is removed with, say,
device_del. Fix this.
Additionally, a subregion is added but it is not removed on exit which
causes a reference to linger and the drive to never be unlocked.
Fixes: 1901b4967c ("hw/block/nvme: move msix table and pba to BAR 0")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For most commands, when issuing an AIO, the BlockAIOCB is stored in the
NvmeRequest aiocb pointer when the AIO is issued. The main use of this
is cancelling AIOs when deleting submission queues (it is currently not
used for Abort).
However, some commands like Dataset Management Zone Management Send
(zone reset) may involve more than one AIO and here the AIOs are issued
without saving a reference to the BlockAIOCB. This is a problem since
nvme_del_sq() will attempt to cancel outstanding AIOs, potentially with
an invalid BlockAIOCB since the aiocb pointer is not NULL'ed when the
request structure is recycled.
Fix this by
1. making sure the aiocb pointer is NULL'ed when requests are recycled
2. only attempt to cancel the AIO if the aiocb is non-NULL
3. if any AIOs could not be cancelled, drain all aio as a last resort.
Fixes: dc04d25e2f ("hw/block/nvme: add support for the format nvm command")
Fixes: c94973288c ("hw/block/nvme: add broadcast nsid support flush command")
Fixes: e4e430b3d6 ("hw/block/nvme: add simple copy command")
Fixes: 5f5dc4c6a9 ("hw/block/nvme: zero out zones on reset")
Fixes: 2605257a26 ("hw/block/nvme: add the dataset management command")
Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Cc: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_compare() fails to store the aiocb from the blk_aio_preadv() call.
Fix this.
Fixes: 0a384f923f ("hw/block/nvme: add compare command")
Cc: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, need to ensure the number of
entries is within the MAX number of PRP entries for a page.
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Setting the 'fallback' property corrupts the QOM instance state
(FDCtrlSysBus) because it accesses an incorrect offset (it uses
the offset of the FDCtrlISABus state).
Cc: qemu-stable@nongnu.org
Fixes: a73275dd6f ("fdc: Add fallback option")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210407133742.1680424-1-f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_subsys_ctrl() is used in contexts where the given controller
identifier is from an untrusted source. Like its friends nvme_ns() and
nvme_subsys_ns(), nvme_subsys_ctrl() should just return NULL if an
invalid identifier is given.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_subsys_ns() is used in contexts where the namespace identifier is
taken from an untrusted source. Commit 3921756dee ("hw/block/nvme:
assert namespaces array indices") tried to guard against this by
introducing an assert on the namespace identifier.
This is wrong since it is perfectly valid to call the function with an
invalid namespace identifier and like nvme_ns(), nvme_subsys_ns() should
simply return NULL.
Fixes: 3921756dee ("hw/block/nvme: assert namespaces array indices")
Fixes: 94d8d6d167 ("hw/block/nvme: support allocated namespace type")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
nvme_ns_attachment() does not verify the contents of the host-supplied
16 bit "Number of Identifiers" field in the command payload.
Make sure the value is capped at 2047 and fix the out-of-bounds read.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add missing license/copyright headers to the nvme-dif.{c,h} files.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".
Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".
Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.
Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parameter) or implicitly
attaching to.
Fixes: e570768566 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.
Fixes: 645ce1a70c ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Remove the unused BlockConf from the controller structure and remove the
noop constraint checking.
Device works just fine with both legacy drive parameter namespace and
nvme-ns namespace definitions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
The `nvme_nsid()` function returns '-1' (FFFFFFFFh) when the given
namespace is NULL. Since FFFFFFFFh is actually a valid namespace
identifier (the "broadcast" value), change this to be '0' since that
actually *is* the invalid value.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the missing nvme_adm_opc_str entry for the Namespace Attachment
command.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Protection Information can only be enabled if there is at least 8 bytes
of metadata.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The check for `n->namespace.blkconf.blk` always fails because
this is in the initialization function.
Signed-off-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The description was originally removed in commit 578d914b26
("hw/block/nvme: align zoned.zasl with mdts") together with the removal
of the zoned.append_size_limit parameter itself.
However, it was (most likely accidentally), re-added in commit
f7dcd31885 ("hw/block/nvme: add non-mdts command size limit for verify").
Remove the description again, since the parameter it describes,
zoned.append_size_limit, no longer exists.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Qemu crashes on shutdown if the chardev used by vhost-user-blk has been
finalized before the vhost-user-blk.
This happens with char-socket chardev operating in the listening mode (server).
The char-socket chardev emits "close" event at the end of finalizing when
its internal data is destroyed. This calls vhost-user-blk event handler
which in turn tries to manipulate with destroyed chardev by setting an empty
event handler for vhost-user-blk cleanup postponing.
This patch separates the shutdown case from the cleanup postponing removing
the need to set an event handler.
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Message-Id: <20210325151217.262793-4-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 4bcad76f4c ("vhost-user-blk: delay vhost_user_blk_disconnect")
introduced postponing vhost_dev cleanup aiming to eliminate qemu aborts
because of connection problems with vhost-blk daemon.
However, it introdues a new problem. Now, any communication errors
during execution of vhost_dev_init() called by vhost_user_blk_device_realize()
lead to qemu abort on assert in vhost_dev_get_config().
This happens because vhost_user_blk_disconnect() is postponed but
it should have dropped s->connected flag by the time
vhost_user_blk_device_realize() performs a new connection opening.
On the connection opening, vhost_dev initialization in
vhost_user_blk_connect() relies on s->connection flag and
if it's not dropped, it skips vhost_dev initialization and returns
with success. Then, vhost_user_blk_device_realize()'s execution flow
goes to vhost_dev_get_config() where it's aborted on the assert.
To fix the problem this patch adds immediate cleanup on device
initialization(in vhost_user_blk_device_realize()) using different
event handlers for initialization and operation introduced in the
previous patch.
On initialization (in vhost_user_blk_device_realize()) we fully
control the initialization process. At that point, nobody can use the
device since it isn't initialized and we don't need to postpone any
cleanups, so we can do cleaup right away when there is a communication
problem with the vhost-blk daemon.
On operation we leave it as is, since the disconnect may happen when
the device is in use, so the device users may want to use vhost_dev's data
to do rollback before vhost_dev is re-initialized (e.g. in vhost_dev_set_log()).
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210325151217.262793-3-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is useful to use different connect/disconnect event handlers
on device initialization and operation as seen from the further
commit fixing a bug on device initialization.
This patch refactors the code to make use of them: we don't rely any
more on the VM state for choosing how to cleanup the device, instead
we explicitly use the proper event handler depending on whether
the device has been initialized.
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210325151217.262793-2-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Max noticed that since blk_aio_pwrite_zeroes() may invoke the callback
before returning, the callbacks will never see *count == 0 and thus
never free the count variable or decrement num_formats causing a CQE to
never be posted.
Coverity (CID 1451082) also picked up on the fact that count would not
be free'ed if the namespace was of zero size.
Fix both of these issues by explicitly checking *count and finalize for
the given namespace if --(*count) is zero. Enqueing a CQE if there are
no AIOs outstanding after this case is already handled by nvme_format()
by inspecting *num_formats.
Reported-by: Max Reitz <mreitz@redhat.com>
Reported-by: Coverity (CID 1451082)
Fixes: dc04d25e2f ("hw/block/nvme: add support for the format nvm command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
If nvme_map_dptr() fails, nvme_dif_rw() will leak the bounce context.
Fix this by using the same error handling as everywhere else in the
function.
Reported-by: Coverity (CID 1451080)
Fixes: 146f720c55 ("hw/block/nvme: end-to-end data protection")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Whenever a Xen block device is detach via xenstore, the image
associated with it remained open by the backend QEMU and an error is
logged:
qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use
This happened since object_unparent() doesn't immediately frees the
object and thus keep a reference to the node we are trying to free.
The reference is hold by the "drive" property and the call
xen_block_drive_destroy() fails.
In order to fix that, we call drain_call_rcu() to run the callback
setup by bus_remove_child() via object_unparent().
Fixes: 2d24a64661 ("device-core: use RCU for list of children of a bus")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20210308143232.83388-1-anthony.perard@citrix.com>
Per SST25VF016B datasheet [1], SST flash requires a dummy byte after
the address bytes. Note only SPI mode is supported by SST flashes.
[1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210306060152.7250-1-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Several QOM type names contain ',':
ARM,bitband-memory
etraxfs,pic
etraxfs,serial
etraxfs,timer
fsl,imx25
fsl,imx31
fsl,imx6
fsl,imx6ul
fsl,imx7
grlib,ahbpnp
grlib,apbpnp
grlib,apbuart
grlib,gptimer
grlib,irqmp
qemu,register
SUNW,bpp
SUNW,CS4231
SUNW,DBRI
SUNW,DBRI.prom
SUNW,fdtwo
SUNW,sx
SUNW,tcx
xilinx,zynq_slcr
xlnx,zynqmp
xlnx,zynqmp-pmu-soc
xlnx,zynq-xadc
These are all device types. They can't be plugged with -device /
device_add, except for xlnx,zynqmp-pmu-soc, and I doubt that one
actually works.
They *can* be used with -device / device_add to request help.
Usability is poor, though: you have to double the comma, like this:
$ qemu-system-x86_64 -device SUNW,,fdtwo,help
Trap for the unwary. The fact that this was broken in
device-introspect-test for more than six years until commit e27bd49876
fixed it demonstrates that "the unwary" includes seasoned developers.
One QOM type name contains ' ': "ICH9 SMB". Because having to
remember just one way to quote would be too easy.
Rename the "SUNW,FOO types to "sun-FOO". Summarily replace ',' and '
' by '-' in the other type names.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210304140229.575481-2-armbru@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
The previous commit rendered the name fdctrl_connect_drives() somewhat
misleading. Get rid of it by inlining the (now pretty simple)
function into its only caller.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20210309161214.1402527-4-armbru@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Drop the crap deprecated in commit 4a27a638e7 "fdc: Deprecate
configuring floppies with -global isa-fdc" (v5.1.0).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20210309161214.1402527-3-armbru@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
- QAPIfy object-add and --object
- stream: Fail gracefully if permission is denied
- storage-daemon: Fix crash on quit when job is still running
- curl: Fix use after free
- char: Deprecate backend aliases, fix QMP query-chardev-backends
- Fix image creation option defaults that exist in both the format and
the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2
default was incorrectly applied to the rbd layer)
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmBUbF4RHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9YX2Q//Ve6++hRulIuJVuh8QDxlmGWERqey/ClX
mUqGDOkSSXfftPTDPCYSUFE7QD6HD25oJmUTix2B2P89AIyDcvqvthMDU/j8clor
X3Kx03ky3NLJilZNdYZ2GOMyljgNP3JSrDHBjc/tZx+1e7C5tPNVxXOUW946wIC9
no6xTAarAANl/GS23ZI+vJ3PBEggzAbu6t/hwT//WAB0WB9wFhkCCzWPXIkdBXwP
QpG8chTwuwFAW1c52F0OeQV5FpM0bcMtxYASuNq0HPL6B8qUdKOusTRgTB/fjLLV
tYMhc6tzPLUlin1mGD4m0P+9tRMBFtF/flZVwbd4S+avcAbV2L5S6Xq0QsiNTbx2
oQUk6N2/IWBOMC6D8aBTBwZ7CCasgEg0imtLUdJ8gKp6T44C7cqg1oZwT2dOyYuI
jS+3T+DcZZn3mHmp61nowL/2/2LDAVaLmOfbsvmvlbuX5j8QHj/Lvt6udRjqpelJ
n0jV9Ay0myu7dSK5ng7WNQUlSrba5I/W3CAjuH0CDp90ADCymWSp2jktvv5rSO/R
bQpz58kRY72y4dEwOy0zkWc/EClqh3p4abq5HDBCIkO+EO8CjJhnEnT+oOrFF5C/
LU93bFPyp6ZJoXzsKnjEjSzMzgDT6XuGTAgrh6upZy52ssjkG8zACbNOmXZTYmAg
hg3OlpdEUvM=
=zZCm
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches and object-add QAPIfication
- QAPIfy object-add and --object
- stream: Fail gracefully if permission is denied
- storage-daemon: Fix crash on quit when job is still running
- curl: Fix use after free
- char: Deprecate backend aliases, fix QMP query-chardev-backends
- Fix image creation option defaults that exist in both the format and
the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2
default was incorrectly applied to the rbd layer)
# gpg: Signature made Fri 19 Mar 2021 09:18:22 GMT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (42 commits)
vl: allow passing JSON to -object
qom: move user_creatable_add_opts logic to vl.c and QAPIfy it
tests: convert check-qom-proplist to keyval
qom: Support JSON in HMP object_add and tools --object
char: Simplify chardev_name_foreach()
char: Deprecate backend aliases 'tty' and 'parport'
char: Skip CLI aliases in query-chardev-backends
qom: Add user_creatable_parse_str()
hmp: QAPIfy object_add
qemu-img: Use user_creatable_process_cmdline() for --object
qom: Add user_creatable_add_from_str()
qemu-nbd: Use user_creatable_process_cmdline() for --object
qemu-io: Use user_creatable_process_cmdline() for --object
qom: Factor out user_creatable_process_cmdline()
qom: Remove user_creatable_add_dict()
qemu-storage-daemon: Implement --object with qmp_object_add()
qom: Make "object" QemuOptsList optional
qapi/qom: QAPIfy object-add
qapi/qom: Add ObjectOptions for x-remote-object
qapi/qom: Add ObjectOptions for input-*
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This converts object-add from 'gen': false to the ObjectOptions QAPI
type. As an immediate benefit, clients can now use QAPI schema
introspection for user creatable QOM objects.
It is also the first step towards making the QAPI schema the only
external interface for the creation of user creatable objects. Once all
other places (HMP and command lines of the system emulator and all
tools) go through QAPI, too, some object implementations can be
simplified because some checks (e.g. that mandatory options are set) are
already performed by QAPI, and in another step, QOM boilerplate code
could be generated from the schema.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Format NVM admin command can make a namespace or namespaces to be
with different LBA size and metadata size with protection information
types.
This patch introduces Format NVM command with LBA format, Metadata, and
Protection Information for the device. The secure erase operation things
and support for formatting zoned namespaces are yet to be added.
The parameter checks inside of this patch has been referred from
Keith's old branch.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[anaidu.gollu: rebased on e2e]
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: rebased for reworked aio tracking]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Pull lba format initialization code into separate function in
preparation for Format NVM support.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
In preparation for Format NVM support, use runtime helpers instead of
the constant device parameters when getting lba size information etc.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
This patch introduces multiple LBA formats supported with the typical
logical block sizes of 512 bytes and 4096 bytes as well as metadata
sizes of 0, 8, 16 and 64 bytes. The format will be chosed based on the
lbads and ms parameters of the nvme-ns device.
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
[k.jensen: resurrected and rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Verify is not subject to MDTS, so a single Verify command may result in
excessive amounts of allocated memory. Impose a limit on the data size
by adding support for TP 4040 ("Non-MDTS Command Size Limits").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for namespaces formatted with protection information. The
type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
selected with the `pi` nvme-ns device parameter. If the number of
metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
be used to control the location of the 8-byte DIF tuple. The default
`pil` value of '0', causes the DIF tuple to be transferred as the last
8 bytes of the metadata. Set to 1 to store this in the first eight bytes
instead.
Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for metadata in the form of extended logical blocks as well
as a separate buffer of data. The new `ms` nvme-ns device parameter
specifies the size of metadata per logical block in bytes. The `mset`
nvme-ns device parameter controls whether metadata is transfered as part
of an extended lba (set to '1') or in a separate buffer (set to '0',
the default).
Regardsless of the scheme chosen with `mset`, metadata is stored at the
end of the namespace backing block device. This requires the user
provided PRP/SGLs to be walked and "split" into data and metadata
scatter/gather lists if the extended logical block scheme is used, but
has the advantage of not breaking the deallocated blocks support.
Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
nvme_zone_mgmt_recv uses nvme_ns_nlbas() to get the number of LBAs in
the namespace and then calculates the number of zones to report by
incrementing slba with ZSZE until exceeding the number of LBAs as
returned by nvme_ns_nlbas().
This is bad because the namespace might be of such as size that some
LBAs are valid, but are not part of any zone, causing zone management
receive to report one additional (but non-existing) zone.
Fix this with a conventional loop on i < ns->num_zones instead.
Fixes: a479335bfa ("hw/block/nvme: Support Zoned Namespace Command Set")
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Coverity complains about a possible memory corruption in the
nvme_ns_attach and _detach functions. While we should not (famous last
words) be able to reach this function without nsid having previously
been validated, this is still an open door for future misuse.
Make Coverity and maintainers happy by asserting that the index into the
array is valid. Also, while not detected by Coverity (yet), add an
assert in nvme_subsys_ns and nvme_subsys_register_ns as well since a
similar issue is exists there.
Fixes: 037953b5b2 ("hw/block/nvme: support namespace detach")
Fixes: CID 1450757
Fixes: CID 1450758
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
page_size is a uint32_t, and zasl is a uint8_t, so the expression
`page_size << zasl` is done using 32-bit arithmetic and might overflow.
Since we then compare this against a 64 bit data_size value, Coverity
complains that we might overflow unintentionally. An MDTS/ZASL value in
excess of 4GiB is probably impractical, but it is not entirely
unrealistic, so add a cast such that we handle that case properly.
Fixes: 578d914b26 ("hw/block/nvme: align zoned.zasl with mdts")
Fixes: CID 1450756
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Rather than having a device specific debug implementation in
pflash_cfi01.c and pflash_cfi02.c, use the standard tracing facility.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210216142721.1985543-2-david.edmondson@oracle.com>
[PMD: Rebased, fixed pflash_write_block_erase trace event format]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
PFlashCFI01.ro is a bool, declare it as such.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210216142721.1985543-3-david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Use the 'mode_read_array' event when we set the device in such
mode, and use the 'reset' event in DeviceReset handler.
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210310170528.1184868-10-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210310170528.1184868-9-philmd@redhat.com>
There is multiple places resetting the internal state machine.
Factor the code out in a new pflash_reset_state_machine() method.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210310170528.1184868-8-philmd@redhat.com>