Since nvme_map_prp always operate on the request-scoped qsg/iovs, just
pass a single pointer to the NvmeRequest instead of two for each of the
qsg and iov.
Suggested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Since clean up of the request qsg/iov is now always done post-use, there
is no need to use a stack-allocated qsg/iov in nvme_dma_prp.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Always destroy the request qsg/iov at the end of request use.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Instead of passing around the NvmeNamespace and the NvmeCmd, add them as
members in the NvmeRequest structure.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
The NVM Express specification generally uses 'zeroes' and not 'zeros',
so let us align with it.
Cc: Fam Zheng <fam@euphon.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Add 'mdts' device parameter to control the Maximum Data Transfer Size of
the controller and check that it is respected.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Hoist bounds checking into its own function and check for wrap-around.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Before this patch the device already supported PRP lists in the CMB, but
it did not check for the validity of it nor announced the support in the
Identify Controller data structure LISTS field.
If some of the PRPs in a PRP list are in the CMB, then ALL entries must
be there. This patch makes sure that requirement is verified as well as
properly announcing support for PRP lists in the CMB.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Introduce the nvme_map helper to remove some noise in the main nvme_rw
function.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Refactor the nvme_dma_{read,write}_prp functions into a common function
taking a DMADirection parameter.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Make sure the request iov is destroyed before reuse; fixing a memory
leak.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Remove the has_sg member from NvmeRequest since it's redundant.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
The QSG isn't always initialized, so accounting could be wrong. Issue a
call to blk_acct_start instead with the size taken from the QSG or IOV
depending on the kind of I/O.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
use them in nvme_map_prp.
This fixes a bug where in the case of a CMB transfer, the device would
map to the buffer with a wrong length.
Fixes: b2b2b67a00 ("nvme: Add support for Read Data and Write Data in CMBs.")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
This is preparatory to subsequent patches that change how QSGs/IOVs are
handled. It is important that the qsg and iov members of the NvmeRequest
are initially zeroed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
The SUBNQN field is mandatory in NVM Express 1.3.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-18-its@irrelevant.dk>
Support returning Command Sequence Error if Set Features on Number of
Queues is called after queues have been created.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-17-its@irrelevant.dk>
Reject the nsid broadcast value (0xffffffff) and 0xfffffffe in the
Active Namespace ID list.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-16-its@irrelevant.dk>
Since we are not providing the NGUID or EUI64 fields, we must support
the Namespace UUID. We do not have any way of storing a persistent
unique identifier, so conjure up a UUID that is just the namespace id.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-15-its@irrelevant.dk>
0xffff is not an allowed value for NCQR and NSQR in Set Features on
Number of Queues.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-14-its@irrelevant.dk>
Since the device does not have any persistent state storage, no
features are "saveable" and setting the Save (SV) field in any Set
Features command will result in a Feature Identifier Not Saveable status
code.
Similarly, if the Select (SEL) field is set to request saved values, the
devices will (as it should) return the default values instead.
Since this also introduces "Supported Capabilities", the nsid field is
now also checked for validity wrt. the feature being get/set'ed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-13-its@irrelevant.dk>
If the write cache is disabled with a Set Features command, flush it if
currently enabled.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-11-its@irrelevant.dk>
Add support for the Asynchronous Event Request command. Required for
compliance with NVMe revision 1.3d. See NVM Express 1.3d, Section 5.2
("Asynchronous Event Request command").
Mostly imported from Keith's qemu-nvme tree. Modified with a max number
of queued events (controllable with the aer_max_queued device
parameter). The spec states that the controller *should* retain
events, so we do best effort here.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-9-its@irrelevant.dk>
Add support for the Get Log Page command and basic implementations of
the mandatory Error Information, SMART / Health Information and Firmware
Slot Information log pages.
In violation of the specification, the SMART / Health Information log
page does not persist information over the lifetime of the controller
because the device has no place to store such persistent state.
Note that the LPA field in the Identify Controller data structure
intentionally has bit 0 cleared because there is no namespace specific
information in the SMART / Health information log page.
Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d,
Section 5.14 ("Get Log Page command").
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-8-its@irrelevant.dk>
Mark firmware slot 1 as read-only and only support that slot.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-7-its@irrelevant.dk>
It might seem weird to implement this feature for an emulated device,
but it is mandatory to support and the feature is useful for testing
asynchronous event request support, which will be added in a later
patch.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-6-its@irrelevant.dk>
Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d,
Section 5.1 ("Abort command").
The Abort command is a best effort command; for now, the device always
fails to abort the given command.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-5-its@irrelevant.dk>
Add various additional tracing and streamline nvme_identify_ns and
nvme_identify_nslist (they do not need to repeat the command, it is
already in the trace name).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-4-its@irrelevant.dk>
Fix a missing cpu_to conversion by moving conversion to just before
returning instead.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-3-its@irrelevant.dk>
Add missing fields in the Identify Controller and Identify Namespace
data structures to bring them in line with NVMe v1.3.
This also adds data structures and defines for SGL support which
requires a couple of trivial changes to the nvme block driver as well.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-2-its@irrelevant.dk>
Simplify the NVMe emulated device by aligning the I/O BAR to 4 KiB.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200630110429.19972-5-philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
At some point the URL changed, update it to avoid other
developers to search for it.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200630110429.19972-2-philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
object_get_canonical_path_component() returns a malloced copy of a
property name on success, null on failure.
19 of its 25 callers immediately free the returned copy.
Change object_get_canonical_path_component() to return the property
name directly. Since modifying the name would be wrong, adjust the
return type to const char *.
Drop the free from the 19 callers become simpler, add the g_strdup()
to the other six.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Several block device properties related to blocksize configuration must
be in certain relationship WRT each other: physical block must be no
smaller than logical block; min_io_size, opt_io_size, and
discard_granularity must be a multiple of a logical block.
To ensure these requirements are met, add corresponding consistency
checks to blkconf_blocksizes, adjusting its signature to communicate
possible error to the caller. Also remove the now redundant consistency
checks from the specific devices.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20200528225516.1676602-3-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass an Error to msix_init_exclusive_bar() and check it.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20200609190333.59390-23-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Decouple the requested maximum number of ioqpairs (param max_ioqpairs)
from the number of MSI-X interrupt vectors by introducing a new
msix_qsize parameter and initialize MSI-X with that. This allows
emulating a device that has fewer vectors than I/O queue pairs and also
allows more than 2048 queue pairs. To keep the device behaving as
previously, use a msix_qsize default of 65 (default max_ioqpairs + 1).
This decoupling was actually suggested by Maxim some time ago in a
slightly different context, so adding a Suggested-by.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20200609190333.59390-22-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
msix_vector_use() returns -EINVAL on error. Assert it won't.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-21-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-20-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-19-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-18-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-17-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-16-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-15-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce some small helpers to make the next patches easier on the eye.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-14-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-13-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-12-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-11-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-10-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The num_queues device paramater has a slightly confusing meaning because
it accounts for the admin queue pair which is not really optional.
Secondly, it is really a maximum value of queues allowed.
Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-9-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, since the device only supports MSI-X or pin-based interrupt, if
MSI-X is not enabled, it should not accept interrupt vectors different
from 0 when creating completion queues.
Secondly, the irq_status NvmeCtrl member is meant to be compared to the
INTMS register, so it should only be 32 bits wide. And it is really only
useful when used with multi-message MSI.
Third, since we do not force a 1-to-1 correspondence between cqid and
interrupt vector, the irq_status register should not have bits set
according to cqid, but according to the associated interrupt vector.
Fix these issues, but keep irq_status available so we can easily support
multi-message MSI down the line.
Fixes: 5e9aa92eb1 ("hw/block: Fix pin-based interrupt behaviour of NVMe")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-8-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pull the controller memory buffer check to its own function. The check
will be used on its own in later patches.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-7-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-6-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move device configuration parameters to separate struct to make it
explicit what is configurable and what is set internally.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200609190333.59390-5-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These break statements was left over when commit 3036a626e9 ("nvme:
add Get/Set Feature Timestamp support") was merged.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-4-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change the prefix of all nvme device related trace events to 'pci_nvme'
to not clash with trace events from the nvme block driver.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-3-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size of the BAR is 0x1000 (main registers) + 8 bytes for each
queue. Currently, the size of the BAR is calculated like so:
n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
Since the 'num_queues' parameter already accounts for the admin queue,
this should in any case not need to be incremented by one. Also, the
size should be initialized to (0x1000).
n->reg_size = pow2ceil(0x1000 + 2 * n->num_queues * 4);
This, with the default value of num_queues (64), we will set aside room
for 1 admin queue and 63 I/O queues (4 bytes per doorbell, 2 doorbells
per queue).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-2-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now than the non-target specific memory_region_msync() function
is available, use it to make this device target-agnostic.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20200508062456.23344-4-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create(). Drop their @errp
parameter.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
This patch introduces support for PMR that has been defined as part of NVMe 1.4
spec. User can now specify a pmrdev option that should point to HostMemoryBackend.
pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe
device. Guest OS can perform mmio read and writes to the PMR region that will stay
persistent across system reboot.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h. Include hw/qdev-core.h there
instead.
hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.
While there, delete a few superfluous inclusions of hw/qdev-core.h.
Touching hw/qdev-properties.h now recompiles some 1200 objects.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).
The previous commits have left only the declaration of hw_error() in
hw/hw.h. This permits dropping most of its inclusions. Touching it
now recompiles less than 200 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
In my "build everything" tree, changing migration/vmstate.h triggers a
recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).
hw/hw.h supposedly includes it for convenience. Several other headers
include it just to get VMStateDescription. The previous commit made
that unnecessary.
Include migration/vmstate.h only where it's still needed. Touching it
now recompiles only some 1600 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-16-armbru@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The device mistakenly reports that the Weighted Round Robin with Urgent
Priority Class arbitration mechanism is supported.
It is not.
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kenneth Heitke <kenneth.heitke@intel.com>
Reviewed-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
`nvme_dma_read_prp` erronously used `qemu_iovec_*to*_buf` instead of
`qemu_iovec_*from*_buf` when the request involved the controller memory
buffer.
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Reviewed-by: Kenneth Heitke <kenneth.heitke@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The implementation used blocks units rather than the expected bytes.
Fixes: c03e7ef12a ("nvme: Implement Write Zeroes")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no need to make another reference.
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190120055558.32984-4-liq3ea@163.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
When it is zero, it causes segv.
Using following command:
"-drive file=//home/test/test1.img,if=none,id=id0
-device nvme,drive=id0,serial=test,num_queues=0"
causes following Backtrack:
Thread 4 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9735700 (LWP 30952)]
0x0000555555a7a77c in nvme_start_ctrl (n=0x5555577473f0) at hw/block/nvme.c:825
825 if (unlikely(n->cq[0])) {
(gdb) bt
0 0x0000555555a7a77c in nvme_start_ctrl (n=0x5555577473f0)
at hw/block/nvme.c:825
1 0x0000555555a7af7f in nvme_write_bar (n=0x5555577473f0, offset=20,
data=4587521, size=4) at hw/block/nvme.c:969
2 0x0000555555a7b81a in nvme_mmio_write (opaque=0x5555577473f0, addr=20,
data=4587521, size=4) at hw/block/nvme.c:1163
3 0x0000555555869236 in memory_region_write_accessor (mr=0x555557747cd0,
addr=20, value=0x7fffe97320f8, size=4, shift=0, mask=4294967295, attrs=...)
at /home/test/qemu1/qemu/memory.c:502
4 0x0000555555869446 in access_with_adjusted_size (addr=20,
value=0x7fffe97320f8, size=4, access_size_min=2, access_size_max=8,
access_fn=0x55555586914d <memory_region_write_accessor>,
mr=0x555557747cd0, attrs=...) at /home/test/qemu1/qemu/memory.c:568
5 0x000055555586c479 in memory_region_dispatch_write (mr=0x555557747cd0,
addr=20, data=4587521, size=4, attrs=...)
at /home/test/qemu1/qemu/memory.c:1499
6 0x00005555558030af in flatview_write_continue (fv=0x7fffe0061130,
addr=4273930260, attrs=..., buf=0x7ffff7ff0028 "\001", len=4, addr1=20,
l=4, mr=0x555557747cd0) at /home/test/qemu1/qemu/exec.c:3234
7 0x00005555558031f9 in flatview_write (fv=0x7fffe0061130, addr=4273930260,
attrs=..., buf=0x7ffff7ff0028 "\001", len=4)
at /home/test/qemu1/qemu/exec.c:3273
8 0x00005555558034ff in address_space_write (
---Type <return> to continue, or q <return> to quit---
as=0x555556758480 <address_space_memory>, addr=4273930260, attrs=...,
buf=0x7ffff7ff0028 "\001", len=4) at /home/test/qemu1/qemu/exec.c:3363
9 0x0000555555803550 in address_space_rw (
as=0x555556758480 <address_space_memory>, addr=4273930260, attrs=...,
buf=0x7ffff7ff0028 "\001", len=4, is_write=true)
at /home/test/qemu1/qemu/exec.c:3374
10 0x00005555558884a1 in kvm_cpu_exec (cpu=0x555556920e40)
at /home/test/qemu1/qemu/accel/kvm/kvm-all.c:2031
11 0x000055555584cd9d in qemu_kvm_cpu_thread_fn (arg=0x555556920e40)
at /home/test/qemu1/qemu/cpus.c:1281
12 0x0000555555dbaf6d in qemu_thread_start (args=0x5555569438a0)
at util/qemu-thread-posix.c:502
13 0x00007ffff5dc86db in start_thread (arg=0x7fffe9735700)
at pthread_create.c:463
14 0x00007ffff5af188f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190120055558.32984-3-liq3ea@163.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190120055558.32984-2-liq3ea@163.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The code had asserted an interrupt every time it was requested to check
for new completion queue entries.This can result in spurious interrupts
seen by the guest OS.
Fix this by asserting an interrupt only if there are un-acknowledged
completion queue entries available.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When the submission and completion queues are being torn down
the IRQ will be asserted for the completion queue when the
submsission queue is deleted. Then when the completion queue
is deleted it stays asserted. Thus, on systems that do
not use MSI, no further interrupts can be triggered on the host.
Linux sees this as a long delay when unbinding the nvme device.
Eventually the interrupt timeout occurs and it continues.
To fix this we ensure we deassert the IRQ for a CQ when it is
deleted.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The CMB is marked as DEVICE_LITTLE_ENDIAN, so the data must be
read/written as if it was little-endian output (in the case of
big endian, we get two swaps, one in the memory core and one
in nvme.c).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Because the CMB BAR has a min_access_size of 2, if you read the last
byte it will try to memcpy *2* bytes from n->cmbuf, causing an off-by-one
error. This is CVE-2018-16847.
Another way to fix this might be to register the CMB as a RAM memory
region, which would also be more efficient. However, that might be a
change for big-endian machines; I didn't think this through and I don't
know how real hardware works. Add a basic testcase for the CMB in case
somebody does this change later on.
Cc: Keith Busch <keith.busch@intel.com>
Cc: qemu-block@nongnu.org
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Tested-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When blk_flush called in NVMe reset path S/C queues are already freed
which means that re-entering AIO handling loop having some IO requests
unfinished will lockup or crash as their SG structures being potentially
reused. Call blk_drain before freeing the queues to avoid this nasty
scenario.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, the nvme_cmb_ops mr doesn't check the addr and size.
This can lead an oob access issue. This is triggerable in the guest.
Add check to avoid this issue.
Fixes CVE-2018-16847.
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This avoid a memory leak in unhotplug nvme device.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, when hotplug/unhotplug nvme device, it will cause an
assert in object.c. Following is the backtrack:
ERROR:qom/object.c:981:object_unref: assertion failed: (obj->ref > 0)
Thread 2 "qemu-system-x86" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffcbd32700 (LWP 18844)]
0x00007fffdb9e4fff in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
/lib/x86_64-linux-gnu/libglib-2.0.so.0
/lib/x86_64-linux-gnu/libglib-2.0.so.0
qom/object.c:981
/home/liqiang02/qemu-upstream/qemu/memory.c:1732
/home/liqiang02/qemu-upstream/qemu/memory.c:285
util/qemu-thread-posix.c:504
/lib/x86_64-linux-gnu/libpthread.so.0
This is caused by memory_region_unref in nvme_exit.
Remove it to make the PCIdevice refcount correct.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reinstates commit b008326744,
which was temporarily reverted for the 3.0 release so that libvirt gets
some extra time to update their command lines.
The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.
Tests need to be updated to set the serial number with -global instead
of using the -drive option.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
This reverts commit b008326744.
Hold off removing this for one more QEMU release (current libvirt
release still uses it.)
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It eases code review, unit is explicit.
Patch generated using:
$ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/
and modified manually.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180625124238.25339-14-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an optional paramter num_queues for device, and set it
to 64 by default.
Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.
Tests need to be updated to set the serial number with -global instead
of using the -drive option.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20180528232719.4721-16-f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
according to Eduardo Habkost's commit fd3b02c889 all PCIEs now implement
INTERFACE_PCIE_DEVICE so we don't need is_express field anymore.
Devices that implements only INTERFACE_PCIE_DEVICE (is_express == 1)
or
devices that implements only INTERFACE_CONVENTIONAL_PCI_DEVICE (is_express == 0)
where not affected by the change.
The only devices that were affected are those that are hybrid and also
had (is_express == 1) - therefor only:
- hw/vfio/pci.c
- hw/usb/hcd-xhci.c
- hw/xen/xen_pt.c
For those 3 I made sure that QEMU_PCI_CAP_EXPRESS is on in instance_init()
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Yoni Bettan <ybettan@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pin-based interrupt of NVMe controller did not work properly
because using an obsolated function pci_irq_pulse().
To fix this, change to use pci_irq_assert() / pci_irq_deassert()
instead of pci_irq_pulse().
Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add trace output for commands, errors, and undefined behavior.
Add guest error log output for undefined behavior.
Report invalid undefined accesses to MMIO.
Annotate unlikely error checks with unlikely.
Signed-off-by: Doug Gale <doug16k@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
[Drop virtio_blk_data_plane_create() change that misinterprets return
value when the virtio transport does not support dataplane.
--Stefan]
Cc: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: e77848d3735ba590f23ffbf8094379c646c33d79.1511317952.git.maozy.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Convert nvme_init() to realize and rename it to nvme_realize().
Cc: John Snow <jsnow@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com>
Message-id: 2882e72d795e04cbe2120f569d551aef2467ac60.1511317952.git.maozy.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Change all devices that set is_express=1 to implement
INTERFACE_PCIE_DEVICE.
Cc: Keith Busch <keith.busch@intel.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Dmitry Fleytman <dmitry@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: qemu-block@nongnu.org
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The number of queues that should be return by the admin command should:
1) Only mention the number of non-admin queues.
2) It is zero-based, meaning that '0 == one non-admin queue',
'1 == two non-admin queues', and so forth.
Because our `num_queues` means the number of queues _plus_ the admin
queue, then the right calculation for the number returned from the admin
command is `num_queues - 2`, combining the two requirements mentioned.
The issue was discovered by reducing num_queues from 64 to 8 and running
a Linux VM with an SMP parameter larger than that (e.g. 22). It tries to
utilize all queues, and therefore fails with an invalid queue number
when trying to queue I/Os on the last queue.
Signed-off-by: Dan Aloni <dan@kernelim.com>
CC: Alex Friedman <alex@e8storage.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add the ability for the NVMe model to support both the RDS and WDS
modes in the Controller Memory Buffer.
Although not currently supported in the upstreamed Linux kernel a fork
with support exists [1] and user-space test programs that build on
this also exist [2].
Useful for testing CMB functionality in preperation for real CMB
enabled NVMe devices (coming soon).
[1] https://github.com/sbates130272/linux-p2pmem
[2] https://github.com/sbates130272/p2pmem-test
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implement NVMe Controller Memory Buffers (CMBs) which were added in
version 1.2 of the NVMe Specification. This patch adds an optional
argument (cmb_size_mb) which indicates the size of the CMB (in
MB). Currently only the Submission Queue Support (SQS) is enabled
which aligns with the current Linux driver for NVMe.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: ported over from qemu-nvme.git to mainline]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes all device emulations with a qdev drive property request
permissions on their BlockBackend. The only thing we block at this point
is resizing images for some devices that can't support it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
msix_init() reports errors with error_report(), which is wrong when
it's used in realize(). The same issue was fixed for msi_init() in
commit 1108b2f. In order to make the API change as small as possible,
leave the return value check to later patch.
For some devices(like e1000e, vmxnet3, nvme) who won't fail because of
msix_init's failure, suppress the error report by passing NULL error
object.
Bonus: add comment for msix_init.
CC: Jiri Pirko <jiri@resnulli.us>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Dmitry Fleytman <dmitry@daynix.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The condition '!A || (A && B)' is equivalent to '!A || B'.
Buglink: https://bugs.launchpad.net/qemu/+bug/1464611
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The hard-coded default alignment is BDRV_SECTOR_SIZE, however this is not
necessarily the case for all platforms. Use this as the default alignment for
all current callers.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-id: 1476445266-27503-2-git-send-email-mark.cave-ayland@ilande.co.uk
Signed-off-by: John Snow <jsnow@redhat.com>
The broken Identify implementation in earlier Qemu versions means we
need to blacklist it from issueing the NVMe 1.1 Identify Namespace List
command. As we want to be able to use it in newer Qemu versions we need
a way to identify those. Bump the PCI revision as a guest visible
indicator of this bug fix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
NVMe 1.1 requires devices to implement a Namespace List subcommand of
the identify command. Qemu not only not implements this features, but
also misinterprets it as an Identify Controller request. Due to this
any OS trying to use the Namespace List will fail the probe.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As cache.writeback is a BlockBackend property and as such more related
to the guest device than the BlockDriverState, we already removed it
from the blockdev-add interface. This patch adds the new way to set it,
as a qdev property of the corresponding guest device.
For example: -drive if=none,file=test.img,node-name=img
-device ide-hd,drive=img,write-cache=off
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Tracked down with an ugly, brittle and probably buggy Perl script.
Also move includes converted to <...> up so they get included before
ours where that's obviously okay.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Background on QEMU boot indices
-------------------------------
Normally, the "bootindex" property is configured for bootable devices
with:
DEVICE_instance_init()
device_add_bootindex_property(..., "bootindex", ...)
object_property_add(..., device_get_bootindex,
device_set_bootindex, ...)
and when the bootindex is set on the QEMU command line, with
-device DEVICE,...,bootindex=N
the setter that was configured above is invoked:
device_set_bootindex()
/* parse boot index */
visit_type_int32()
/* verify unicity */
check_boot_index()
/* store parsed boot index */
...
/* insert device path to boot order */
add_boot_device_path()
In the last step, add_boot_device_path() ensures that an OpenFirmware
device path will show up in the "bootorder" fw_cfg file, at a position
corresponding to the device's boot index. Thus guest firmware (SeaBIOS and
OVMF) can try to boot off the device with the right priority.
NVMe boot index
---------------
In QEMU commit 33739c7129,
nvma: ide: add bootindex to qom property
the following generic setters / getters:
- device_set_bootindex()
- device_get_bootindex()
were open-coded for NVMe, under the names
- nvme_set_bootindex()
- nvme_get_bootindex()
Plus nvme_instance_init() was added to configure the "bootindex" property
manually, designating the open-coded getter & setter, rather than calling
device_add_bootindex_property().
Crucially, nvme_set_bootindex() avoided the final add_boot_device_path()
call. This fact is spelled out in the message of commit 33739c7129, and
it was presumably the entire reason for all of the code duplication.
Now, Vladislav filed an RFE for OVMF
<https://github.com/tianocore/edk2/issues/48>; OVMF should boot off NVMe
devices. It is simple to build edk2's existent NvmExpressDxe driver into
OVMF, but the boot order matching logic in OVMF can only handle NVMe if
the "bootorder" fw_cfg file includes such devices.
Therefore this patch converts the NVMe device model to
device_set_bootindex() all the way.
Device paths
------------
device_set_bootindex() accepts an optional parameter called "suffix". When
present, it is expected to take the form of an OpenFirmware device path
node, and it gets appended as last node to the otherwise auto-generated
OFW path.
For NVMe, the auto-generated part is
/pci@i0cf8/pci8086,5845@6[,1]
^ ^ ^ ^
| | PCI slot and (present when nonzero)
| | function of the NVMe controller, both hex
| "driver name" component, built from PCI vendor & device IDs
PCI root at system bus port, PIO
to which here we append the suffix
/namespace@1,0
^ ^
| big endian (MSB at lowest address) numeric interpretation
| of the 64-bit IEEE Extended Unique Identifier, aka EUI-64,
| hex
32-bit NVMe namespace identifier, aka NSID, hex
resulting in the OFW device path
/pci@i0cf8/pci8086,5845@6[,1]/namespace@1,0
The reason for including the NSID and the EUI-64 is that an NVMe device
can in theory produce several different namespaces (distinguished by
NSID). Additionally, each of those may (optionally) have an EUI-64 value.
For now, QEMU only provides namespace 1.
Furthermore, QEMU doesn't even represent the EUI-64 as a standalone field;
it is embedded (and left unused) inside the "NvmeIdNs.res30" array, at the
last eight bytes. (Which is fine, since EUI-64 can be left zero-filled if
unsupported by the device.)
Based on the above, we set the "unit address" part of the last
("namespace") node to fixed "1,0".
OVMF will then map the above OFW device path to the following UEFI device
path fragment, for boot order processing:
PciRoot(0x0)/Pci(0x6,0x1)/NVMe(0x1,00-00-00-00-00-00-00-00)
^ ^ ^ ^ ^ ^
| | | | | octets of the EUI-64 in address order
| | | | NSID
| | | NVMe namespace messaging device path node
| PCI slot and function
PCI root bridge
Cc: Keith Busch <keith.busch@intel.com> (supporter:nvme)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:nvme)
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Vladislav Vovchenko <vladislav.vovchenko@sk.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Tested-by: Vladislav Vovchenko <vladislav.vovchenko@sk.com>
Message-id: 1453850483-27511-1-git-send-email-lersek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use pow2ceil() to round up to the next power of 2, rather
than an inline calculation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1437741192-20955-4-git-send-email-peter.maydell@linaro.org
Implement support in Identify and Get/Set Features to properly report
and allow to change the Volatile Write Cache status reported by the
virtual NVMe device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implement a real flush instead of faking it. This is especially important
as Qemu assume Write back cashing by default and thus requires a working
cache flush operation for data integrity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The SCSI emulation in the Linux NVMe driver really wants to know
if a device has a volatile write cache. Given that qemu has moved
away from a model where we report the backing store WCE bit to
one where the WCE bit is supposed to be part of the migratable
guest-visible state we always return 1 here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Windows seems to send two separate calls to NVMe controller configuration. The
first sends configuration info and the second the enable bit. I couldn't
enable the Windows 8.1 in-box NVMe driver with base Qemu. I made the
following change to store the configuration data and then handle enable and
NVMe driver works on Windows 8.1.
I am not a Windows expert and I'm not entirely sure this is the correct
approach. I'm offering it for anyone who wishes to use NVMe on Windows 8.1
using Qemu.
I have tested this change with Linux and Windows guests with NVMe devices.
Signed-off-by: Daniel Stekloff <dan@wendan.org>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The shift operation on nlb gives a 32 bit result if no type cast is
applied. This bug was reported by Coverity.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1426348844-8793-1-git-send-email-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
geometry: hd_geometry_guess function autodetects the drive geometry.
This patch adds a block backend call, that probes the backing device
geometry. If the inner driver method is implemented and succeeds
(currently only for DASDs), the blkconf_geometry will pass-through
the backing device geometry. Otherwise will fallback to old logic.
blocksize: This patch initializes blocksize properties to 0.
In order to set the property a blkconf_blocksizes was introduced.
If user didn't set physical or logical blocksize, it will
retrieve its value from a driver (only succeeds for DASD), otherwise
it will set default 512 value.
The blkconf_blocksizes call was added to all users of BlkConf.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424087278-49393-6-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
According to NVMe specifications Bits 15:08 represent Minor Version number.
Signed-off-by: Anubhav Rakshit <anubhav.rakshit@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
According to the specification, the low 16 bits should contain the number of
I/O submission queues, and the high 16 bits should contain the number of
I/O completion queues.
Signed-off-by: Alex Friedman <alex@e8storage.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Initialise our maximum page size capability to 64kB and increase
the page_size variable from 16 to 32 bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
((n->bar.aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) > 4095
is always false regardless of the values of its operands.
This occurs as the logical second operand of '||'.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Device models should access their block backends only through the
block-backend.h API. Convert them, and drop direct includes of
inappropriate headers.
Just four uses of BlockDriverState are left:
* The Xen paravirtual block device backend (xen_disk.c) opens images
itself when set up via xenbus, bypassing blockdev.c. I figure it
should go through qmp_blockdev_add() instead.
* Device model "usb-storage" prompts for keys. No other device model
does, and this one probably shouldn't do it, either.
* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
which has only the BlockDriverState.
* PC87312State has an unused BlockDriverState[] member.
The next two commits take care of the latter two.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
At present, nvma cannot boot. However, it provides already
a bootindex property, so change bootindex to qom for nvma
device, but not call add_boot_device_path.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models structures.
Note that bdrv_get_stats was introduced so device models can retrieve the
BlockAcctStats structure of a BlockDriverState without being aware of it's
layout.
This function should go away when BlockAcctStats will be embedded in the device
models structures.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The middle term goal is to move the BlockAcctStats structure in the device models.
(Capturing I/O accounting statistics in the device models is good for billing)
This patch make a small step in this direction by removing a reference to BDRV.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>i
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
One, it catches multiplication overflowing size_t. Two, it returns
T * rather than void *, which lets the compiler catch more type
errors.
Perhaps a conversion to g_malloc_n() would be neater in places, but
that's merely four years old, and we can't use such newfangled stuff.
This commit only touches allocations with size arguments of the form
sizeof(T), plus two that use 4 instead of sizeof(uint32_t). We can
make the others safe by converting to g_malloc_n() when it becomes
available to us in a couple of years.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function is empty after the previous patch, so remove it.
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It may not be sensible for normal use cases, but it allows to use
/dev/null in QTest.
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
pci_set_irq and the other pci irq wrappers use
PCI_INTERRUPT_PIN config register to compute device
INTx pin to assert/deassert.
An irq is allocated using pci_allocate_irq wrapper
only if is needed by non pci devices.
Removed irq related fields from state if not used anymore.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The category will be used to sort the devices displayed in
the command line help.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Message-id: 1375107465-25767-4-git-send-email-marcel.a@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The DMAContext is a simple pointer to an AddressSpace that is now always
already available. Make everyone hold the address space directly,
and clean up the DMA API to use the AddressSpace directly.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Initial commit for emulated Non-Volatile-Memory Express (NVMe) pci
storage device.
NVMe is an open, industry driven storage specification defining
an optimized register and command set designed to deliver the full
capabilities of non-volatile memory on PCIe SSDs. Further information
may be found on the organizations website at:
http://www.nvmexpress.org/
This commit implements the minimum from the specification to work with
existing drivers.
Cc: Keith Busch <keith.busch@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>