Add support for using the broadcast nsid to issue a flush on all
namespaces through a single command.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 6eb7a07129 ("hw/block/nvme: change controller pci id") changed
the controller to use a Red Hat assigned PCI Device and Vendor ID, but
did not change the IEEE OUI away from the Intel IEEE OUI.
Fix that and use the locally assigned QEMU IEEE OUI instead if the
`use-intel-id` parameter is not explicitly set. Also reverse the Intel
IEEE OUI bytes.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The Zone Append Size Limit (ZASL) must be at least 4096 bytes, so
improve the user experience by adding an early parameter check in
nvme_check_constraints.
When ZASL is still too small due to the host configuring the device for
an even larger page size, convert the trace point in nvme_start_ctrl to
an NVME_GUEST_ERR such that this is logged by QEMU instead of only
traced.
Reported-by: Corne <info@dantalion.nl>
Cc: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Firstly, if zoned.max_active is non-zero, zoned.max_open must be less
than or equal to zoned.max_active.
Secondly, if only zones.max_active is set, we have to explicitly set
zones.max_open or we end up with an invalid MAR/MOR configuration. This
is an artifact of the parameters not being zeroes-based like in the
spec.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reported-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Implicitly and Explicitly Open zones can be closed by Close Zone
management function. This got broken by a recent commit ("hw/block/nvme:
refactor zone resource management") and now such commands fail with
Invalid Zone State Transition status.
Modify nvm_zrm_close() function to make Close Zone work correctly.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for TP 4065a ("Simple Copy Command"), v2020.05.04
("Ratified").
The implementation uses a bounce buffer to first read in the source
logical blocks, then issue a write of that bounce buffer. The default
maximum number of source logical blocks is 128, translating to 512 KiB
for 4k logical blocks which aligns with the default value of MDTS.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
In preparation for Simple Copy, pull write pointer advancement into a
separate function that is independent off an NvmeRequest.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Zone transition handling and resource management is open coded (and
semi-duplicated in the case of open, close and finish).
In preparation for Simple Copy command support (which also needs to open
zones for writing), consolidate into a set of 'nvme_zrm' functions and
in the process fix a bug with the controller not closing an open zone to
allow another zone to be explicitly opened.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Remove the unused NvmeCtrl parameter in nvme_check_zone_write.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
nvme-ns device is registered to a nvme controller device during the
initialization in nvme_register_namespace() in case that 'bus' property
is given which means it's mapped to a single controller.
This patch introduced a new property 'subsys' just like the controller
device instance did to map a namespace to a NVMe subsystem.
If 'subsys' property is given to the nvme-ns device, it will belong to
the specified subsystem and will be attached to all controllers in that
subsystem by enabling shared namespace capability in NMIC(Namespace
Multi-path I/O and Namespace Capabilities) in Identify Namespace.
Usage:
-device nvme-subsys,id=subsys0
-device nvme,serial=foo,id=nvme0,subsys=subsys0
-device nvme,serial=bar,id=nvme1,subsys=subsys0
-device nvme,serial=baz,id=nvme2,subsys=subsys0
-device nvme-ns,id=ns1,drive=<drv>,nsid=1,subsys=subsys0 # Shared
-device nvme-ns,id=ns2,drive=<drv>,nsid=2,bus=nvme2 # Non-shared
In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
the same subsystem. On the other hand, 'ns2' will be attached to the
'nvme2' only as a private namespace in that subsystem.
All the namespace with 'subsys' parameter will attach all controllers in
the subsystem to the namespace by default.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
We have nvme-subsys and nvme devices mapped together. To support
multi-controller scheme to this setup, controller identifier(id) has to
be managed. Earlier, cntlid(controller id) used to be always 0 because
we didn't have any subsystem scheme that controller id matters.
This patch introduced 'cntlid' attribute to the nvme controller
instance(NvmeCtrl) and make it allocated by the nvme-subsys device
mapped to the controller. If nvme-subsys is not given to the
controller, then it will always be 0 as it was.
Added 'ctrls' array in the nvme-subsys instance to manage attached
controllers to the subsystem with a limit(32). This patch didn't take
list for the controllers to make it seamless with nvme-ns device.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys).
This patch maps a controller to a subsystem by adding a parameter
'subsys' to the nvme device.
To map a controller to a subsystem, we need to put nvme-subsys first and
then maps the subsystem to the controller:
-device nvme-subsys,id=subsys0
-device nvme,serial=foo,id=nvme0,subsys=subsys0
If 'subsys' property is not given to the nvme controller, then subsystem
NQN will be created with serial (e.g., 'foo' in above example),
Otherwise, it will be based on subsys id (e.g., 'subsys0' in above
example).
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
To support multi-path in QEMU NVMe device model, We need to have NVMe
subsystem hierarchy to map controllers and namespaces to a NVMe
subsystem.
This patch introduced a simple nvme-subsys device model. The subsystem
will be prepared with subsystem NQN with <subsys_id> provided in
nvme-subsys device:
ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: added 'nqn' device parameter per request]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Treat the num_queues field as virtio-endian. On big-endian hosts the
vhost-user-blk num_queues field was in the wrong endianness.
Move the blkcfg.num_queues store operation from realize to
vhost_user_blk_update_config() so feature negotiation has finished and
we know the endianness of the device. VIRTIO 1.0 devices are
little-endian, but in case someone wants to use legacy VIRTIO we support
all endianness cases.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210223144653.811468-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add more fine-grained selection by adding a CONFIG_TC58128
selector for the TC58128 eeprom.
As this device is only used by the Shix machine, add an entry
to the proper section in MAINTAINERS.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210222141514.2646278-7-f4bug@amsat.org>
This code was introduced in commit 27c7ca7e77,
("SHIX board emulation (Samuel Tardieu)"). Use
the same license.
Cc: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210222141514.2646278-2-f4bug@amsat.org>
This updates the flash information table to include various ISSI
flashes that are supported by upstream U-Boot and Linux kernel.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210126060007.12904-3-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This adds the ISSI SPI flash support. The number of dummy cycles in
fast read, fast read dual output and fast read quad output commands
is currently using the default 8. Likewise, the same default value
is used for fast read dual/quad I/O command. Per the datasheet [1],
the number of dummy cycles is configurable, but this is not modeled
at present.
For flash whose size is larger than 16 MiB, the sequence of 3-byte
address along with EXTADD bit in the bank address register (BAR) is
not supported. We assume that guest software always uses op codes
with 4-byte address sequence. Fortunately, this is the case for both
U-Boot and Linux spi-nor drivers.
QPI (Quad Peripheral Interface) that supports 2-cycle instruction
has different default values for dummy cycles of fast read family
commands, and is unsupported at the time being.
[1] http://www.issi.com/WW/pdf/25LP-WP256.pdf
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210126060007.12904-2-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This was only required for the pc-1.0 and earlier machine types.
Now that these have been removed, we can also drop the corresponding
code from the FDC device.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20210203171832.483176-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Linux blkfront expects both "discard-granularity" and
"discard-alignment" present on xenbus in order to properly enable the
feature, not exposing "discard-alignment" left some Linux blkfront
versions with a broken discard setup. This has also been addressed in
Linux with:
https://lore.kernel.org/lkml/20210118151528.81668-1-roger.pau@citrix.com/T/#u
Fix QEMU to report a "discard-alignment" of 0, in order for it to work
with older Linux frontends.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Message-Id: <20210118153330.82324-1-roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
nvme_ns_realize passes errp to nvme_register_namespaces, but then try to
prepend errp with local_err.
Just remove the local_err and use errp directly.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Current QEMU HEAD nvme.c does not compile with the default GCC 5.4
on a Ubuntu 16.04 host:
hw/block/nvme.c:3242:9: error: ‘result’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
trace_pci_nvme_getfeat_vwcache(result ? "enabled" : "disabled");
^
hw/block/nvme.c:3150:14: note: ‘result’ was declared here
uint32_t result;
^
Explicitly initialize the result to fix it.
Fixes: aa5e55e3b0 ("hw/block/nvme: open code for volatile write cache")
Fixes: Coverity CID 1446371
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Moving namespace registration to the nvme-ns realization function had
the unintended side-effect of breaking legacy namespace registration.
Fix this.
Fixes: 15d024d4aa ("hw/block/nvme: split setup and register for namespace")
Reported-by: Alexander Graf <agraf@csgraf.de>
Cc: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Refactor the zone write check logic such that the most "meaningful"
error is returned first. That is, first, if the zone is not writable,
return an appropriate status code for that. Then, make sure we are
actually writing at the write pointer and finally check that we do not
cross the zone write boundary. This aligns with the "priority" of status
codes for zone read checks.
Also add a couple of additional descriptive trace events and remove an
always true assert.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
When a zone append is processed the controller checks that validity of
the write before assigning the LBA to the append command. This causes
the boundary check to be wrong.
Fix this by checking the write *after* assigning the LBA. Remove the
append special case from the nvme_check_zone_write and open code it in
nvme_do_write, assigning the slba when basic sanity checks have been
performed. Then check the validity of the resulting write like any other
write command.
In the process, also fix a missing endianness conversion for the zone
append ALBA.
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The actual parameter name is 'cross_read' rather than 'cross_zone_read'.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Change status checks to align with the existing style and remove the
explicit check against NVME_SUCCESS.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently, no features are saveable, so the current check is not wrong,
but add a check against the feature capabilities to make sure this will
not regress if saveable features are added later.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Only enable DULBE if the namespace supports it.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If a user assigns a backing device with less capacity than the size of a
single zone, the namespace capacity will be reported as zero and the
kernel will silently fail to allocate the namespace.
This patch errors out in case that the backing device cannot accomodate
at least a single zone.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: small fixup in the error and commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller now implements v1.4 and we can lift the restrictions on
CMB Data Pointer and Command Independent Locations Support (CDPCILS) and
CMB Data Pointer Mixed Locations Support (CDPMLS) since the device
really does not care about mixed host/cmb pointers in those cases.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With the new CMB logic in place, bump the implemented specification
version to v1.4 by default.
This requires adding the setting the CNTRLTYPE field and modifying the
VWC field since 0x00 is no longer a valid value for bits 2:1.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Implement v1.4 logic for configuring the Controller Memory Buffer. By
default, the v1.4 scheme will be used (CMB must be explicitly enabled by
the host), so drivers that only support v1.3 will not be able to use the
CMB anymore.
To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device
parameter.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows
adding RDS/WDS support for PMR as well.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The PMR should not be enabled at boot up. Disable the PMR MemoryRegion
initially and implement MMIO for PMRCTL, allowing the host to enable the
PMR explicitly.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The controller registers are initially zero. Remove the redundant
zeroing.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Use the correct field names.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
With BAR 4 now free to use, allow PMR and CMB to be enabled
simultaneously.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In the interest of supporting both CMB and PMR to be enabled on the same
device, move the MSI-X table and pending bit array out of BAR 4 and into
BAR 0.
This is a simplified version of the patch contributed by Andrzej
Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for
changes to CMB address mapping code.
[1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Tested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This patch sets CMBS bit in controller capabilities register when user
configures NVMe driver with CMB support, so capabilites are correctly
reported to guest OS.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32
bit write combination as well as a plain 64 bit write. The spec does not
define ordering on the hi/lo split, but the code currently assumes that
the low order bits are written first. Additionally, the code does not
consider that another address might already have been written into the
register, causing the OR'ing to result in a bad address.
Fix this by explicitly overwriting only the low or high order bits for
32 bit writes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the size of the mmio read/write to the trace event.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
During smart critical warning injection by setting property from QMP
command, also try to trigger asynchronous event.
Suggested by Keith, if a event has already been raised, there is no
need to enqueue the duplicate event any more.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
[k.jensen: fix typo in commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There is a very low probability that hitting physical NVMe disk
hardware critical warning case, it's hard to write & test a monitor
agent service.
For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.
The orignal version of this change is implemented by adding a fixed
property which could be initialized by QEMU command line. Suggested
by Philippe & Klaus, rework like current version.
Test with this patch:
1, change smart_critical_warning property for a running VM:
#virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
"arguments": { "path": "/machine/peripheral-anon/device[0]",
"property": "smart_critical_warning", "value":16 } }'
2, run smartctl in guest
#smartctl -H -l error /dev/nvme0n1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- volatile memory backup device has failed
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_setup() finally does not have nothing to do with NvmeCtrl
instance.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe, namespace is being attached to process I/O. We register NVMe
namespace to a controller via nvme_register_namespace() during
nvme_ns_setup(). This is main reason of receiving NvmeCtrl object
instance to this function to map the namespace to a controller.
To make namespace instance more independent, it should be split into two
parts: setup and register. This patch split them into two differnt
parts, and finally nvme_ns_setup() does not have nothing to do with
NvmeCtrl instance at all.
This patch is a former patch to introduce NVMe subsystem scheme to the
existing design especially for multi-path. In that case, it should be
split into two to make namespace independent from a controller.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Removed no longer used aregument NvmeCtrl object in nvme_ns_init_blk().
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the
initial time. This feature is related to block device backed, but this
feature is controlled in controller level via Set/Get Features command.
This patch removed dependency between nvme and nvme-ns to manage the VWC
flag value. Also, it open coded the Get Features for VWC to check all
namespaces attached to the controller, and if false detected, return
directly false.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[k.jensen: report write cache preset if present on ANY namespace]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_ns_init_zoned() has no use for given NvmeCtrl object.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
TP 4053 says (in section 2.3.1.1) -
... if a Zone Append command specifies a ZSLBA that is not the lowest
logical block address in that zone, then the controller shall abort
that command with a status code of Invalid Field In Command.
In the code, Zone Invalid Write is returned instead, fix this.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_io_cmd already checks if the namespace supports the Zone Append
command, so the removed check is dead code.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Add missing string representations for a couple of new commands.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The zoned command set specification states that "All logical blocks in a
zone *shall* be marked as deallocated when [the zone is reset]". Since
the device guarantees 0x00 to be read from deallocated blocks we have to
issue a pwrite_zeroes since we cannot be sure that a discard will do
anything. But typically, this will be achieved with an efficient
unmap/discard operation.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Align with existing style and use a typedef for header-file enums.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Implicitly and explicitly opended zones are always bulk processed
together, so merge the two processing masks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
A shutdown is only about flushing stuff. It is the host that should
delete any queues, so do not perform a reset here.
Also, on shutdown, make sure that the PMR is flushed if in use.
Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated"
status of logical blocks. Since the zoned namespaces command set
specification defines that logical blocks SHALL be marked as deallocated
when the zone is in the Empty or Offline states, DULBE can only be
supported if the zone size is a multiple of the calculated deallocation
granularity (reported in NPDG) which depends on the underlying block
device cluster size (if applicable) or the configured
discard_granularity.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar()
return value") had the unintended effect of breaking support on
several platforms not supporting MSI-X.
Still check for errors, but only report that MSI-X is unsupported
instead of bailing out.
Fixes: 1c0c2163aa ("hw/block/nvme: verify msix_init_exclusive_bar() return value")
Fixes: fbf2e5375e ("hw/block/nvme: Verify msix_vector_use() returned value")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.
This patch is for documentation only, no functionality change.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.
This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes.
If this value is non-zero, it becomes possible to assign extensions
of that size to any Empty zones. The default value for this property
is 0, therefore setting extensions is disabled by default.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add two module properties, "zoned.max_active" and "zoned.max_open"
to control the maximum number of zones that can be active or open.
Once these variables are set to non-default values, these limits are
checked during I/O and Too Many Active or Too Many Open command status
is returned if they are exceeded.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.
In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.
Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.
Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.
Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.
Subsequent commits in this series add ZDE support and checks for
active and open zone limits.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated, that is a namespace is
included regardless if it is active (attached) or not.
While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.
However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.
The reason for not hooking up this command completely is because the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Define the structures and constants required to implement
Namespace Types support.
Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.
The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.
Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This log page becomes necessary to implement to allow checking for
Zone Append command support in Zoned Namespace Command Set.
This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.
All incoming admin and i/o commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control what commands to support and what not to
depending on set CC.CSS.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Move write processing to nvme_do_write() that now handles both WRITE
and WRITE ZEROES. Both nvme_write() and nvme_write_zeroes() become
inline helper functions.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The majority of code in nvme_rw() is becoming read- or write-specific.
Move these parts to two separate handlers, nvme_read() and nvme_write()
to make the code more readable and to remove multiple is_write checks
that has been present in the i/o path.
This is a refactoring patch, no change in functionality.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
In NVMe 1.4, a namespace must report an ID descriptor of UUID type
if it doesn't support EUI64 or NGUID. Add a new namespace property,
"uuid", that provides the user the option to either specify the UUID
explicitly or have a UUID generated automatically every time a
namespace is initialized.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Controller reset ans subsystem shutdown are handled very much the same
in the current code, but some of the steps should be different in these
two cases.
Introduce two new functions, nvme_reset_ctrl() and nvme_shutdown_ctrl(),
to separate some portions of the code from nvme_clear_ctrl(). The steps
that are made different between reset and shutdown are that BAR.CC is not
reset to zero upon the shutdown and namespace data is flushed to
backing storage as a part of shutdown handling, but not upon reset.
Suggested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Commit 37712e00b1 ("hw/block/nvme: factor out pmr setup") changed the
control flow such that the CAP register is erronously cleared after
nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure
is zero-filled initially, there is no need for the explicit clearing, so
just remove it.
Fixes: 37712e00b1 ("hw/block/nvme: factor out pmr setup")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add the Compare command.
This implementation uses a bounce buffer to read in the data from
storage and then compare with the host supplied buffer.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for the Dataset Management command and the Deallocate
attribute. Deallocation results in discards being sent to the underlying
block device. Whether of not the blocks are actually deallocated is
affected by the same factors as Write Zeroes (see previous commit).
format | discard | dsm (512B) dsm (4KiB) dsm (64KiB)
--------------------------------------------------------
qcow2 ignore n n n
qcow2 unmap n n y
raw ignore n n n
raw unmap n y y
Again, a raw format and 4KiB LBAs are preferable.
In order to set the Namespace Preferred Deallocate Granularity and
Alignment fields (NPDG and NPDA), choose a sane minimum discard
granularity of 4KiB. If we are using a passthru device supporting
discard at a 512B granularity, user should set the discard_granularity
property explicitly. NPDG and NPDA will also account for the
cluster_size of the block driver if required (i.e. for QCOW2).
See NVM Express 1.3d, Section 6.7 ("Dataset Management command").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add support for reporting the Deallocated or Unwritten Logical Block
Error (DULBE).
Rely on the block status flags reported by the block layer and consider
any block with the BDRV_BLOCK_ZERO flag to be deallocated.
Multiple factors affect when a Write Zeroes command result in
deallocation of blocks.
* the underlying file system block size
* the blockdev format
* the 'discard' and 'logical_block_size' parameters
format | discard | wz (512B) wz (4KiB) wz (64KiB)
-----------------------------------------------------
qcow2 ignore n n y
qcow2 unmap n n y
raw ignore n y y
raw unmap n y y
So, this works best with an image in raw format and 4KiB LBAs, since
holes can then be punched on a per-block basis (this assumes a file
system with a 4kb block size, YMMV). A qcow2 image, uses a cluster size
of 64KiB by default and blocks will only be marked deallocated if a full
cluster is zeroed or discarded. However, this *is* consistent with the
spec since Write Zeroes "should" deallocate the block if the Deallocate
attribute is set and "may" deallocate if the Deallocate attribute is not
set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is
always set).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add a new function, nvme_aio_err, to handle errors resulting from AIOs
and use this from the callbacks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
nvme_check_bounds has no use of the NvmeCtrl parameter; remove it.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?
This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.
This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.
All calls of blk_is_read_only() are converted to one of the two new
functions.
Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- minor resource leak fixes in qemu-nbd
- ensure proper aio context when nbd server uses iothreads
- iotest refactorings in preparation for rewriting ./check to be more
flexible, and preparing for more nbd server reconnect features
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAI5sMACgkQp6FrSiUn
Q2rmQwf+Jmlsn8s0tdeeOhv6mp8ZSyvr2/x1/daGHkzZqhoL7m/4kJLP4p/u8uTV
XzXyXt7MKvHKd8UvKB6VsN6z75RJvi7y8pKpOQA96t08hjWuAcVtivKnyZd6MTwj
zeKsmrE8LAuMjHvmsrtrmRqCSdaVeFPb3qC6bvJ+WEiXJIMiXybF7lccPvR7WWjR
2FcyraZJgnlKrQv1i8M1++Px5W14jhOacAMUNAdVzNiYpu4tq6PTk9giq1/GULCz
xVYGHqoTFYy7Slj7xKQJuOwGNLMwL+F9x7/6wRFhKxjutc0/Po1lSfbaNe8q147H
p9jtDT9/OuTlQf7qpqqyQnASABDgaA==
=XW+E
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-01-20' into staging
nbd patches for 2021-01-20
- minor resource leak fixes in qemu-nbd
- ensure proper aio context when nbd server uses iothreads
- iotest refactorings in preparation for rewriting ./check to be more
flexible, and preparing for more nbd server reconnect features
# gpg: Signature made Thu 21 Jan 2021 02:28:19 GMT
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-01-20:
iotests.py: qemu_io(): reuse qemu_tool_pipe_and_status()
iotests.py: fix qemu_tool_pipe_and_status()
iotests/264: fix style
iotests: define group in each iotest
iotests/294: add shebang line
iotests: make tests executable
iotests: fix some whitespaces in test output files
iotests/303: use dot slash for qcow2.py running
iotests/277: use dot slash for nbd-fault-injector.py running
nbd/server: Quiesce coroutines on context switch
block: Honor blk_set_aio_context() context requirements
qemu-nbd: Fix a memleak in nbd_client_thread()
qemu-nbd: Fix a memleak in qemu_nbd_client_list()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The documentation for bdrv_set_aio_context_ignore() states this:
* The caller must own the AioContext lock for the old AioContext of bs, but it
* must not own the AioContext lock for new_context (unless new_context is the
* same as the current context of bs).
As blk_set_aio_context() makes use of this function, this rule also
applies to it.
Fix all occurrences where this rule wasn't honored.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20201214170519.223781-2-slp@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
As per POSIX specification of limits.h [1], OS libc may define
PAGE_SIZE in limits.h.
To prevent collosion of definition, we rename PAGE_SIZE here.
[1]: https://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210118063808.12471-5-jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Auto Address Increment (AAI) Word-Program is a special command of
SST flashes. AAI-WP allows multiple bytes of data to be programmed
without re-issuing the next sequential address location.
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-id: 1608688825-81519-2-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
When write is disabled, the write to flash should be avoided
in flash_write8().
Fixes: 82a2499011 ("m25p80: Initial implementation of SPI flash device")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-id: 1608688825-81519-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
The function will be moved to common QOM code, as it is not
specific to TYPE_DEVICE anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-31-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Every single qdev property setter function manually checks
dev->realized. We can just check dev->realized inside
qdev_property_set() instead.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-24-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Move the property types and property macros implemented in
qdev-properties-system.c to a new qdev-properties-system.h
header.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This is the QEMU equivalent of this Linux commit (but 7 years later):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7025a43a9da2
The MTD subsystem has its own small museum of ancient NANDs
in a form of the CONFIG_MTD_NAND_MUSEUM_IDS configuration option.
The museum contains stone age NANDs with 256 bytes pages, as well
as iron age NANDs with 512 bytes per page and up to 8MiB page size.
It is with great sorrow that I inform you that the museum is being
decommissioned. The MTD subsystem is out of budget for Kconfig
options and already has too many of them, and there is a general
kernel trend to simplify the configuration menu.
We remove the stone age exhibits along with closing the museum,
but some of the iron age ones are transferred to the regular NAND
depot. Namely, only those which have unique device IDs are
transferred, and the ones which have conflicting device IDs are
removed.
The machine using this device are:
- axis-dev88
- tosa (via tc6393xb_init)
- spitz based (akita, borzoi, terrier)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201214002620.342384-1-f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* gdbstub: Correct misparsing of vCont C/S requests
* openrisc: Move pic_cpu code into CPU object proper
* nios2: Move IIC code into CPU object proper
* Improve reporting of ROM overlap errors
* xlnx-versal: Add USB support
* hw/misc/zynq_slcr: Avoid #DIV/0! error
* Numonyx: Fix dummy cycles and check for SPI mode on cmds
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAl/YwVIZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3lOpD/9FjasMvZqYeanyNlv+4Swk
4MFeYouIzXKSFu9tj5eDTHzN1TJl5iSwhkIcr9NBqxppuv2eqzxfWWMEfCZ06pxz
BR2HoSlSLUih8cKpu40cQg0TTMEOGEOV9RAHtt8vSGE0FesoiXG2ORUPcxm3NxbN
l9XZ1x3Yb5ZLqVZViFjlZ5gXnTzJ//uPEzbl7N9+pa0mXDKvmvwAl19DLmF6N2Jj
D+gmrLGeEbkJ358RGO/VF7r/1bOkrhwKrb8MzeqFRmjIqaOGbGqs/71+amiSjS8n
hC1HKf6KQOLrklMVaYg1pRxHLbHpQR+haeeX4Xt9jxx8EUrwXojlyaD8p4V9Hcu8
L5haTIBhPrnTkUfHZYL0qYkqRpzbNq97oX2Gmk967FfsZME5fxNa3kS6zM0GkIBx
YKghaZtFInAFODUbG1hHdUc+WbvfQDhj/mBQ6wWw669vYpoab/3nfVq8YVoupVM/
RntcqpBfqtGgPzuJ2dJEEsm6QlK4SZaGlmPkz542OzcHxw3SgeqkbIuDW/CtNI+b
c5PgX0C2S2AnFAAHURnsXdqt6+O01FZqOU7SCLjmwrBrpDG69lum+JLCqXFe9iMW
XgrTrxyPIcz5+Bv63AqKcm6rpcQs5ekwmLLEjT0OJtr+5ef9MeRil0aChj1j4i+2
H/82yKR4JWW1egEvTJhskQ==
=lHZA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201215' into staging
target-arm queue:
* gdbstub: Correct misparsing of vCont C/S requests
* openrisc: Move pic_cpu code into CPU object proper
* nios2: Move IIC code into CPU object proper
* Improve reporting of ROM overlap errors
* xlnx-versal: Add USB support
* hw/misc/zynq_slcr: Avoid #DIV/0! error
* Numonyx: Fix dummy cycles and check for SPI mode on cmds
# gpg: Signature made Tue 15 Dec 2020 13:59:46 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20201215:
hw/block/m25p80: Fix Numonyx fast read dummy cycle count
hw/block/m25p80: Check SPI mode before running some Numonyx commands
hw/block/m25p80: Fix when VCFG XIP bit is set for Numonyx
hw/block/m25p80: Make Numonyx config field names more accurate
hw/misc/zynq_slcr: Avoid #DIV/0! error
arm: xlnx-versal: Connect usb to virt-versal
usb: xlnx-usb-subsystem: Add xilinx usb subsystem
usb: Add DWC3 model
usb: Add versal-usb2-ctrl-regs module
elf_ops.h: Be more verbose with ROM blob names
elf_ops.h: Don't truncate name of the ROM blobs we create
hw/core/loader.c: Improve reporting of ROM overlap errors
hw/core/loader.c: Track last-seen ROM in rom_check_and_register_reset()
target/nios2: Use deposit32() to update ipending register
target/nios2: Move nios2_check_interrupts() into target/nios2
target/nios2: Move IIC code into CPU object proper
target/openrisc: Move pic_cpu code into CPU object proper
hw/openrisc/openrisc_sim: Abstract out "get IRQ x of CPU y"
hw/openrisc/openrisc_sim: Use IRQ splitter when connecting IRQ to multiple CPUs
gdbstub: Correct misparsing of vCont C/S requests
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Make the code more generic and not specific to TYPE_DEVICE.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com> #s390 parts
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20201211220529.2290218-10-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Some Numonyx flash commands cannot be executed in DIO and QIO mode, such as
trying to do DPP or DOR when in QIO mode.
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-4-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
VCFG XIP is set (disabled) when the NVCFG XIP bits are all set (disabled).
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-3-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The previous naming of the configuration registers made it sound like that if
the bits were set the settings would be enabled, while the opposite is true.
Signed-off-by: Joe Komlodi <komlodi@xilinx.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@xilinx.com>
Message-id: 1605568264-26376-2-git-send-email-komlodi@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In order to use inclusive terminology, rename SSI 'slave' as
'peripheral', following the specification resolution:
https://www.oshwa.org/a-resolution-to-redefine-spi-signal-names/
Patch created mechanically using:
$ sed -i s/SSISlave/SSIPeripheral/ $(git grep -l SSISlave)
$ sed -i s/SSI_SLAVE/SSI_PERIPHERAL/ $(git grep -l SSI_SLAVE)
$ sed -i s/ssi-slave/ssi-peripheral/ $(git grep -l ssi-slave)
$ sed -i s/ssi_slave/ssi_peripheral/ $(git grep -l ssi_slave)
$ sed -i s/ssi_create_slave/ssi_create_peripheral/ \
$(git grep -l ssi_create_slave)
Then in VMStateDescription vmstate_ssi_peripheral we restored
the "SSISlave" migration stream name (to avoid breaking migration).
Finally the following files have been manually tweaked:
- hw/ssi/pl022.c
- hw/ssi/xilinx_spips.c
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201012124955.3409127-4-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The category of the nand device is not set, put it into the 'storage'
category.
Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201112125824.763182-4-ganqixin@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023123034.19609-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Fixed subject]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Since 7f0f1acedf ("hw/block/nvme: support multiple namespaces"), the
namespaces member of NvmeCtrl is no longer a dynamically allocated
array. Remove the free.
Fixes: 7f0f1acedf ("hw/block/nvme: support multiple namespaces")
Reported-by: Coverity (CID 1436131)
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201104102248.32168-4-its@irrelevant.dk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
nvme_map_sgl_data erroneously uses the sgls member of NvmeIdNs as a
uint16_t.
Reported-by: Coverity (CID 1436129)
Fixes: cba0a8a344 ("hw/block/nvme: add support for scatter gather lists")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201104102248.32168-3-its@irrelevant.dk>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.
Signed-off-by: Jin Yu <jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20201103123617.28256-1-jin.yu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit adb29c0273.
The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.
To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:
$ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r -b /var/tmp/foo.img
$ build/qemu-system-x86_64 \
-device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
-object memory-backend-memfd,id=mem,size=1G,share=on \
-M memory-backend=mem,accel=kvm \
-chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
Segmentation fault (core dumped)
Cc: Jin Yu <jin.yu@intel.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102165709.232180-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl+gI74ACgkQYlFWYQpP
bMCBrA/9GXMZZGDfHFenXF+rS6J+ZKxtk29vq9Ly8KZ9YW7CzF9MP8qE/5iyFfmx
d1BknXGQerW2kAzpkOq2/MKDklOc+0BAhaTdUaFR/ao5ZKuv2LQ8uFnKVoTrhTx9
+HVkTVUTnez6ReCZVIrtN4+XVdyQTeQotJg6H2m5Q/BxQKcj6OMOlneuSGDn5vFN
EWgDvEmfFEkzbN8FMXtkT35bg3vA5TGmfQRMk1SMMREOPxF04CaTVTxYscCpS0WC
Cl+62mx4XLjscK7hwXuTNTrxeOLxZ2xLK5dhDd/qxBveio07mIM5X2psdKR0t5qX
HLtm437T9CAYmyo8jgvM4KL8f+rbJnLd579qyVwIMsue28Qisj9nuWCTcaEpjfck
4krhxJwxenRtqQ9wYrnbnQI5yQDIE6iUGf0toXwCNdJIr+FvyIcT7vJtTzZXtRI8
sxwK5wfJ/WSey9uNLZGFbQuv4vjOMV+Nk3mEi1gUV8ujogo+2U6WUAE3NhqFLKn1
YT6AJhDZvqL1f8gFrbiqR8xwvPrYmwK/tK38X1exSDOqiB7UNzR/apAb1oniul0e
rS5xWzIs9APvkdWQssCHvrVDdh6VISXQ5bnT8lkfmvYrCTn2gUGAFXDrxZjXIaL9
scCr8N9STkHmoYpc2ACRKIpfK3E1sDjGA8mAPemkxsLakNwBS4o=
=s4KC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20201102' into staging
nvme pull 2 Nov 2020
# gpg: Signature made Mon 02 Nov 2020 15:20:30 GMT
# gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0
* remotes/nvme/tags/pull-nvme-20201102: (30 commits)
hw/block/nvme: fix queue identifer validation
hw/block/nvme: fix create IO SQ/CQ status codes
hw/block/nvme: fix prp mapping status codes
hw/block/nvme: report actual LBA data shift in LBAF
hw/block/nvme: add trace event for requests with non-zero status code
hw/block/nvme: add nsid to get/setfeat trace events
hw/block/nvme: reject io commands if only admin command set selected
hw/block/nvme: support for admin-only command set
hw/block/nvme: validate command set selected
hw/block/nvme: support per-namespace smart log
hw/block/nvme: fix log page offset check
hw/block/nvme: remove pointless rw indirection
hw/block/nvme: update nsid when registered
hw/block/nvme: change controller pci id
pci: allocate pci id for nvme
hw/block/nvme: support multiple namespaces
hw/block/nvme: refactor identify active namespace id list
hw/block/nvme: add support for sgl bit bucket descriptor
hw/block/nvme: add support for scatter gather lists
hw/block/nvme: harden cmb access
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.
Signed-off-by: Jin Yu <jin.yu@intel.com>
Message-Id: <20200910134851.7817-1-jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The nvme_check_{sq,cq} functions check if the given queue identifer is
valid *and* that the queue exists. Thus, the function return value
cannot simply be inverted to check if the identifer is valid and that
the queue does *not* exist.
Replace the call with an OR'ed version of the checks.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Replace the Invalid Field in Command with the Invalid PRP Offset status
code in the nvme_create_{cq,sq} functions. Also, allow PRP1 to be
address 0x0.
Also replace the Completion Queue Invalid status code returned in
nvme_create_cq when the the queue identifier is invalid with the Invalid
Queue Identifier. The Completion Queue Invalid status code is
exclusively for indicating that the completion queue identifer given
when creating a submission queue is invalid.
See NVM Express v1.3d, Section 5.3 ("Create I/O Completion Queue
command") and 5.4("Create I/O Submission Queue command").
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Address 0 is not an invalid address. Remove those invalikd checks.
Unaligned PRP2 and PRP list entries should result in Invalid PRP Offset
status code and not Invalid Field. Fix that.
See NVMe Express v1.3d, Section 4.3 ("Physical Region Page Entry and
List").
Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Calculate the data shift value to report based on the set value of
logical_block_size device property.
In the process, use a local variable to calculate the LBA format
index instead of the hardcoded value 0. This makes the code more
readable and it will make it easier to add support for multiple LBA
formats in the future.
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If a command results in a non-zero status code, trace it.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Include the namespace id in the pci_nvme_{get,set}feat trace events.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
If the host sets CC.CSS to 111b, all commands submitted to I/O queues
should be completed with status Invalid Command Opcode.
Note that this is technically a v1.4 feature, but it does not hurt to
implement before we finally bump the reported version implemented.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Fail to start the controller if the user requests a command set that the
controller does not support.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Let the user specify a specific namespace if they want to get access
stats for a specific namespace.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Return error if the requested offset starts after the size of the log
being returned. Also, move the check for earlier in the function so
we're not doing unnecessary calculations.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed- by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The code switches on the opcode to invoke a function specific to that
opcode. There's no point in consolidating back to a common function that
just switches on that same opcode without any actual common code.
Restore the opcode specific behavior without going back through another
level of switches.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If the user does not specify an nsid parameter on the nvme-ns device,
nvme_register_namespace will find the first free namespace id and assign
that.
This fix makes sure the assigned id is saved.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
There are two reasons for changing this:
1. The nvme device currently uses an internal Intel device id.
2. Since commits "nvme: fix write zeroes offset and count" and "nvme:
support multiple namespaces" the controller device no longer has
the quirks that the Linux kernel think it has.
As the quirks are applied based on pci vendor and device id, change
them to get rid of the quirks.
To keep backward compatibility, add a new 'use-intel-id' parameter to
the nvme device to force use of the Intel vendor and device id. This is
off by default but add a compat property to set this for 5.1 machines
and older. If a 5.1 machine is booted (or the use-intel-id parameter is
explicitly set to true), the Linux kernel will just apply these
unnecessary quirks:
1. NVME_QUIRK_IDENTIFY_CNS which says that the device does not support
anything else than values 0x0 and 0x1 for CNS (Identify Namespace
and Identify Namespace). With multiple namespace support, this just
means that the kernel will "scan" namespaces instead of using
"Active Namespace ID list" (CNS 0x2).
2. NVME_QUIRK_DISABLE_WRITE_ZEROES. The nvme device started out with a
broken Write Zeroes implementation which has since been fixed in
commit 9d6459d21a ("nvme: fix write zeroes offset and count").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named from the device name
('id'). The nvme-ns devices then connect to this and registers
themselves with the nvme device.
This changes how an nvme device is created. Example with two namespaces:
-drive file=nvme0n1.img,if=none,id=disk1
-drive file=nvme0n2.img,if=none,id=disk2
-device nvme,serial=deadbeef,id=nvme0
-device nvme-ns,drive=disk1,bus=nvme0,nsid=1
-device nvme-ns,drive=disk2,bus=nvme0,nsid=2
The drive property is kept on the nvme device to keep the change
backward compatible, but the property is now optional. Specifying a
drive for the nvme device will always create the namespace with nsid 1.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Prepare to support inactive namespaces.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
This adds support for SGL descriptor type 0x1 (bit bucket descriptor).
See the NVM Express v1.3d specification, Section 4.4 ("Scatter Gather
List (SGL)").
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
For now, support the Data Block, Segment and Last Segment descriptor
types.
See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Since the controller has only supported PRPs so far it has not been
required to check the ending address (addr + len - 1) of the CMB access
for validity since it has been guaranteed to be in range of the CMB.
This changes when the controller adds support for SGLs (next patch), so
add that check.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Make the default request status NVME_SUCCESS so only error status codes
have to be set.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
This pulls block layer aio submission/completion to common functions.
For completions, additionally map an AIO error to the Unrecovered Read
and Write Fault status codes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Add the symbolic command name to the pci_nvme_{io,admin}_cmd and
pci_nvme_rw trace events.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The raw NLB field is a 16 bit value, so use le16_to_cpu instead of
le32_to_cpu and cast to uint32_t before incrementing the value to not
wrap around.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Add the nvme_l2b helper and use it for converting NLB and SLBA to byte
counts and offsets.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Style fixes.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.
With this patch the device sets the Controller Fatal Status bit in the
CSTS register when failing to read from a submission queue or writing to
a completion queue; expecting the host to reset the controller.
If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs), the command is aborted with a Data
Transfer Error status code.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Fix a typo in the sq doorbell trace event.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
As the 'timestamp' variable is declared as a 48-bit bitfield,
we do not need to wrap the sum result.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20201002075716.1657849-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
vhost-user devices can get a disconnect in the middle of the VHOST-USER
handshake on the migration start. If disconnect event happened right
before sending next VHOST-USER command, then the vhost_dev_set_log()
call in the vhost_migration_log() function will return error. This error
will lead to the assert() and close the QEMU migration source process.
For the vhost-user devices the disconnect event should not break the
migration process, because:
- the device will be in the stopped state, so it will not be changed
during migration
- if reconnect will be made the migration log will be reinitialized as
part of reconnect/init process:
#0 vhost_log_global_start (listener=0x563989cf7be0)
at hw/virtio/vhost.c:920
#1 0x000056398603d8bc in listener_add_address_space (listener=0x563989cf7be0,
as=0x563986ea4340 <address_space_memory>)
at softmmu/memory.c:2664
#2 0x000056398603dd30 in memory_listener_register (listener=0x563989cf7be0,
as=0x563986ea4340 <address_space_memory>)
at softmmu/memory.c:2740
#3 0x0000563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
busyloop_timeout=0)
at hw/virtio/vhost.c:1385
#4 0x0000563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
at hw/block/vhost-user-blk.c:315
#5 0x0000563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
event=CHR_EVENT_OPENED)
at hw/block/vhost-user-blk.c:379
Update the vhost-user-blk device with the internal started_vu field which
will be used for initialization (vhost_user_blk_start) and clean up
(vhost_user_blk_stop). This additional flag in the VhostUserBlk structure
will be used to track whether the device really needs to be stopped and
cleaned up on a vhost-user level.
The disconnect event will set the overall VHOST device (not vhost-user) to
the stopped state, so it can be used by the general vhost_migration_log
routine.
Such approach could be propogated to the other vhost-user devices, but
better idea is just to make the same connect/disconnect code for all the
vhost-user devices.
This migration issue was slightly discussed earlier:
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <9fbfba06791a87813fcee3e2315f0b904cc6789a.1599813294.git.dimastep@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fuzzing discovered that virtqueue_unmap_sg() is being called on modified
req->in/out_sg iovecs. This means dma_memory_map() and
dma_memory_unmap() calls do not have matching memory addresses.
Fuzzing discovered that non-RAM addresses trigger a bug:
void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
bool is_write, hwaddr access_len)
{
if (buffer != bounce.buffer) {
^^^^^^^^^^^^^^^^^^^^^^^
A modified iov->iov_base is no longer recognized as a bounce buffer and
the wrong branch is taken.
There are more potential bugs: dirty memory is not tracked correctly and
MemoryRegion refcounts can be leaked.
Use the new iov_discard_undo() API to restore elem->in/out_sg before
virtqueue_push() is called.
Fixes: 827805a249 ("virtio-blk: Convert VirtIOBlockReq.out to structrue")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1890360
Message-Id: <20200917094455.822379-3-stefanha@redhat.com>
Some trace points are attributed to the wrong source file. Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.
Clean up with help of scripts/cleanup-trace-events.pl. Funnies
requiring manual post-processing:
* accel/tcg/cputlb.c trace points are in trace-events.
* block.c and blockdev.c trace points are in block/trace-events.
* hw/block/nvme.c uses the preprocessor to hide its trace point use
from cleanup-trace-events.pl.
* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
guard debug code.
* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.
* linux-user/trace-events abbreviates a tedious list of filenames to
*/signal.c.
* net/colo-compare and net/filter-rewriter.c use pseudo trace points
colo_compare_miscompare and colo_filter_rewriter_debug to guard
debug code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Some typedefs and macros are defined after the type check macros.
This makes it difficult to automatically replace their
definitions with OBJECT_DECLARE_TYPE.
Patch generated using:
$ ./scripts/codeconverter/converter.py -i \
--pattern=QOMStructTypedefSplit $(git grep -l '' -- '*.[ch]')
which will split "typdef struct { ... } TypedefName"
declarations.
Followed by:
$ ./scripts/codeconverter/converter.py -i --pattern=MoveSymbols \
$(git grep -l '' -- '*.[ch]')
which will:
- move the typedefs and #defines above the type check macros
- add missing #include "qom/object.h" lines if necessary
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-9-ehabkost@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20200831210740.126168-10-ehabkost@redhat.com>
Message-Id: <20200831210740.126168-11-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* New Supermicro X11 BMC machine (Erik)
* Fixed valid access size on AST2400 SCU
* Improved robustness of the ftgmac100 model.
* New flash models in m25p80 (Igor)
* Fixed reset sequence of SDHCI/eMMC controllers
* Improved support of the AST2600 SDMC (Joel)
* Couple of SMC cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAl9OQPgACgkQUaNDx8/7
7KHzsRAAmXw6963D3wIuE2Nzb1G5Zvn6nup3AsF5Xs1IZU/cLqNijiz220KslFtQ
y8KrTO/eyBmAsEjrg1f6bWwCTZsouKq/2vWPtmTx3eU4HgeJdPbkln7E1YGmMfBR
T4WJU6mNqkWfFT3WAW3IbB4qCoH3l0DRkgawYPWbdJmTs5CBtXOYCT14TijDVWQ5
p8S4QjTtfRPwG9csHJ1W93t8jadTzderefkN6Zcmf9y6iOCif6SVDFvF769hzg6e
Pzp3xxRV3ewxhSLrGdCK+fQk/IcPaLVUnh+mM3mGLk2rDQoomFXBpaz1N94rw43s
lGuIyLkUGiHbgONmlZMXj03WWQbgGqjYpDWme1rAKJSX6CRJRixucejsRFTG5Evx
odgY1MGNrdg0K8L0O1SQEx7O+URZZO68WrtrMTwLbOHErE7pWAR+h5RqzclwMr3v
0hwQxDeNjhDBj+nUwoPUjXsgfVafzeywFfKuMymnygGog5hFSWiqAFIqyxj+u6YI
HUG8kMHdLqzAgX1NWAomn2cxUEc4Q2wxDlzUgvjcvBwa6HZD+3nrjMRStHTmeVy5
yPKWmRanXH6xIUJoRd2dMEU6SrwGjmjfnKAbG3vgxJ6B5sk4BrfKOFeaCF9M2zP6
ZePWf6XrsPQY7aZgQRTexmXK83jqn73DOkavI2pM9s/6Ts61mdc=
=ZQHA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/legoater/tags/pull-aspeed-20200901' into staging
Various fixes of Aspeed machines :
* New Supermicro X11 BMC machine (Erik)
* Fixed valid access size on AST2400 SCU
* Improved robustness of the ftgmac100 model.
* New flash models in m25p80 (Igor)
* Fixed reset sequence of SDHCI/eMMC controllers
* Improved support of the AST2600 SDMC (Joel)
* Couple of SMC cleanups
# gpg: Signature made Tue 01 Sep 2020 13:39:20 BST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* remotes/legoater/tags/pull-aspeed-20200901:
hw: add a number of SPI-flash's of m25p80 family
arm: aspeed: add strap define `25HZ` of AST2500
aspeed/smc: Open AHB window of the second chip of the AST2600 FMC controller
aspeed/sdmc: Simplify calculation of RAM bits
aspeed/sdmc: Allow writes to unprotected registers
aspeed/sdmc: Perform memory training
ftgmac100: Improve software reset
ftgmac100: Fix integer overflow in ftgmac100_do_tx()
ftgmac100: Check for invalid len and address before doing a DMA transfer
ftgmac100: Change interrupt status when a DMA error occurs
ftgmac100: Fix interrupt status "Packet moved to RX FIFO"
ftgmac100: Fix interrupt status "Packet transmitted on ethernet"
ftgmac100: Fix registers that can be read
aspeed/sdhci: Fix reset sequence
aspeed/smc: Fix max_slaves of the legacy SMC device
aspeed/smc: Fix MemoryRegionOps definition
hw/arm/aspeed: Add board model for Supermicro X11 BMC
aspeed/scu: Fix valid access size on AST2400
m25p80: Add support for n25q512ax3
m25p80: Return the JEDEC ID twice for mx25l25635e
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since nvme_map_prp always operate on the request-scoped qsg/iovs, just
pass a single pointer to the NvmeRequest instead of two for each of the
qsg and iov.
Suggested-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Since clean up of the request qsg/iov is now always done post-use, there
is no need to use a stack-allocated qsg/iov in nvme_dma_prp.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Always destroy the request qsg/iov at the end of request use.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Instead of passing around the NvmeNamespace and the NvmeCmd, add them as
members in the NvmeRequest structure.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
The NVM Express specification generally uses 'zeroes' and not 'zeros',
so let us align with it.
Cc: Fam Zheng <fam@euphon.net>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Add 'mdts' device parameter to control the Maximum Data Transfer Size of
the controller and check that it is respected.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Hoist bounds checking into its own function and check for wrap-around.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Before this patch the device already supported PRP lists in the CMB, but
it did not check for the validity of it nor announced the support in the
Identify Controller data structure LISTS field.
If some of the PRPs in a PRP list are in the CMB, then ALL entries must
be there. This patch makes sure that requirement is verified as well as
properly announcing support for PRP lists in the CMB.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Introduce the nvme_map helper to remove some noise in the main nvme_rw
function.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Refactor the nvme_dma_{read,write}_prp functions into a common function
taking a DMADirection parameter.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Make sure the request iov is destroyed before reuse; fixing a memory
leak.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Remove the has_sg member from NvmeRequest since it's redundant.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
The QSG isn't always initialized, so accounting could be wrong. Issue a
call to blk_acct_start instead with the size taken from the QSG or IOV
depending on the kind of I/O.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Add nvme_map_addr, nvme_map_addr_cmb and nvme_addr_to_cmb helpers and
use them in nvme_map_prp.
This fixes a bug where in the case of a CMB transfer, the device would
map to the buffer with a wrong length.
Fixes: b2b2b67a00 ("nvme: Add support for Read Data and Write Data in CMBs.")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
This is preparatory to subsequent patches that change how QSGs/IOVs are
handled. It is important that the qsg and iov members of the NvmeRequest
are initially zeroed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
The SUBNQN field is mandatory in NVM Express 1.3.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-18-its@irrelevant.dk>
Support returning Command Sequence Error if Set Features on Number of
Queues is called after queues have been created.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-17-its@irrelevant.dk>
Reject the nsid broadcast value (0xffffffff) and 0xfffffffe in the
Active Namespace ID list.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-16-its@irrelevant.dk>
Since we are not providing the NGUID or EUI64 fields, we must support
the Namespace UUID. We do not have any way of storing a persistent
unique identifier, so conjure up a UUID that is just the namespace id.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-15-its@irrelevant.dk>
0xffff is not an allowed value for NCQR and NSQR in Set Features on
Number of Queues.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-14-its@irrelevant.dk>
Since the device does not have any persistent state storage, no
features are "saveable" and setting the Save (SV) field in any Set
Features command will result in a Feature Identifier Not Saveable status
code.
Similarly, if the Select (SEL) field is set to request saved values, the
devices will (as it should) return the default values instead.
Since this also introduces "Supported Capabilities", the nsid field is
now also checked for validity wrt. the feature being get/set'ed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-13-its@irrelevant.dk>
If the write cache is disabled with a Set Features command, flush it if
currently enabled.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-11-its@irrelevant.dk>
The NvmeFeatureVal does not belong with the spec-related data structures
in include/block/nvme.h that is shared between the block-level nvme
driver and the emulated nvme device.
Move it into the nvme device specific header file as it is the only
user of the structure. Also, remove the unused members.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-10-its@irrelevant.dk>
Add support for the Asynchronous Event Request command. Required for
compliance with NVMe revision 1.3d. See NVM Express 1.3d, Section 5.2
("Asynchronous Event Request command").
Mostly imported from Keith's qemu-nvme tree. Modified with a max number
of queued events (controllable with the aer_max_queued device
parameter). The spec states that the controller *should* retain
events, so we do best effort here.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-9-its@irrelevant.dk>
Add support for the Get Log Page command and basic implementations of
the mandatory Error Information, SMART / Health Information and Firmware
Slot Information log pages.
In violation of the specification, the SMART / Health Information log
page does not persist information over the lifetime of the controller
because the device has no place to store such persistent state.
Note that the LPA field in the Identify Controller data structure
intentionally has bit 0 cleared because there is no namespace specific
information in the SMART / Health information log page.
Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d,
Section 5.14 ("Get Log Page command").
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-8-its@irrelevant.dk>
Mark firmware slot 1 as read-only and only support that slot.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-7-its@irrelevant.dk>
It might seem weird to implement this feature for an emulated device,
but it is mandatory to support and the feature is useful for testing
asynchronous event request support, which will be added in a later
patch.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-6-its@irrelevant.dk>
Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d,
Section 5.1 ("Abort command").
The Abort command is a best effort command; for now, the device always
fails to abort the given command.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-5-its@irrelevant.dk>
Add various additional tracing and streamline nvme_identify_ns and
nvme_identify_nslist (they do not need to repeat the command, it is
already in the trace name).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-4-its@irrelevant.dk>
Fix a missing cpu_to conversion by moving conversion to just before
returning instead.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200706061303.246057-3-its@irrelevant.dk>
Add missing fields in the Identify Controller and Identify Namespace
data structures to bring them in line with NVMe v1.3.
This also adds data structures and defines for SGL support which
requires a couple of trivial changes to the nvme block driver as well.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Fam Zheng <fam@euphon.net>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20200706061303.246057-2-its@irrelevant.dk>
Simplify the NVMe emulated device by aligning the I/O BAR to 4 KiB.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200630110429.19972-5-philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
At some point the URL changed, update it to avoid other
developers to search for it.
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200630110429.19972-2-philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Support a following SPI flashes:
* mx66l51235f
* mt25ql512ab
Signed-off-by: Igor Kononenko <i.kononenko@yadro.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200811203724.20699-1-i.kononenko@yadro.com>
Message-Id: <20200819100956.2216690-22-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The mx25l25635e returns the JEDEC ID twice when issuing a RDID command :
[ 2.512027] aspeed-smc 1e630000.spi: reading JEDEC ID C2:20:19:C2:20:19
This can break some firmware testing for this condition on the
supermicrox11-bmc machine.
Reported-by: Erik Smit <erik.lucas.smit@gmail.com>
Message-Id: <20200819100956.2216690-2-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Remove superfluous breaks, as there is a "return" before them.
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1594631126-36631-1-git-send-email-wang.yi59@zte.com.cn>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Currently we have a SWIM typedef and a SWIM type checking macro,
but OBJECT_DECLARE* would transform the SWIM macro into a
function, and the function name would conflict with the SWIM
typedef name.
Rename the struct and typedef to "Swim". This will make future
conversion to OBJECT_DECLARE* easier.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Laurent Vivier <laurent@vivier.eu>
Tested-By: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20200825192110.3528606-50-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Automatically size the number of request virtqueues to match the number
of vCPUs. This ensures that completion interrupts are handled on the
same vCPU that submitted the request. No IPI is necessary to complete
an I/O request and performance is improved. The maximum number of MSI-X
vectors and virtqueues limit are respected.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20200818143348.310613-8-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Automatically size the number of virtio-blk-pci request virtqueues to
match the number of vCPUs. Other transports continue to default to 1
request virtqueue.
A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request. No IPI is
necessary to complete an I/O request and performance is improved. The
maximum number of MSI-X vectors and virtqueues limit are respected.
Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
with NVMe storage).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Message-Id: <20200818143348.310613-7-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Meson doesn't enjoy the same flexibility we have with Make in choosing
the include path. In particular the tracing headers are using
$(build_root)/$(<D).
In order to keep the include directives unchanged,
the simplest solution is to generate headers with patterns like
"trace/trace-audio.h" and place forwarding headers in the source tree
such that for example "audio/trace.h" includes "trace/trace-audio.h".
This patch is too ugly to be applied to the Makefiles now. It's only
a way to separate the changes to the tracing header files from the
Meson rewrite of the tracing logic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
object_get_canonical_path_component() returns a malloced copy of a
property name on success, null on failure.
19 of its 25 callers immediately free the returned copy.
Change object_get_canonical_path_component() to return the property
name directly. Since modifying the name would be wrong, adjust the
return type to const char *.
Drop the free from the 19 callers become simpler, add the g_strdup()
to the other six.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200714160202.3121879-4-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). No such cases are being fixed here.
This commit is generated by command
sed -n '/^X86 Xen CPUs$/,/^$/{s/^F: //p}' MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-9-armbru@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
If we want to check error after errp-function call, we need to
introduce local_err and then propagate it to errp. Instead, use
the ERRP_GUARD() macro, benefits are:
1. No need of explicit error_propagate call
2. No need of explicit local_err variable: use errp directly
3. ERRP_GUARD() leaves errp as is if it's not NULL or
&error_fatal, this means that we don't break error_abort
(we'll abort on error_set, not on error_propagate)
If we want to add some info to errp (by error_prepend() or
error_append_hint()), we must use the ERRP_GUARD() macro.
Otherwise, this info will not be added when errp == &error_fatal
(the program will exit prior to the error_append_hint() or
error_prepend() call). No such cases are being fixed here.
This commit is generated by command
sed -n '/^Parallel NOR Flash devices$/,/^$/{s/^F: //p}' \
MAINTAINERS | \
xargs git ls-files | grep '\.[hc]$' | \
xargs spatch \
--sp-file scripts/coccinelle/errp-guard.cocci \
--macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200707165037.1026246-5-armbru@redhat.com>
[ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and
auto-propagated-errp.cocci to errp-guard.cocci. Commit message
tweaked again.]
Convert
visit_type_FOO(v, ..., &ptr, &err);
...
if (err) {
...
}
to
visit_type_FOO(v, ..., &ptr, errp);
...
if (!ptr) {
...
}
for functions that set @ptr to non-null / null on success / error.
Eliminate error_propagate() that are now unnecessary. Delete @err
that are now unused.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-40-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. The previous two commits did that for sufficiently simple
cases with Coccinelle. Do it for several more manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-37-armbru@redhat.com>
When all we do with an Error we receive into a local variable is
propagating to somewhere else, we can just as well receive it there
right away. Convert
if (!foo(..., &err)) {
...
error_propagate(errp, err);
...
return ...
}
to
if (!foo(..., errp)) {
...
...
return ...
}
where nothing else needs @err. Coccinelle script:
@rule1 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
binary operator op;
constant c1, c2;
symbol false;
@@
if (
(
- fun(args, &err, args2)
+ fun(args, errp, args2)
|
- !fun(args, &err, args2)
+ !fun(args, errp, args2)
|
- fun(args, &err, args2) op c1
+ fun(args, errp, args2) op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
)
}
@rule2 forall@
identifier fun, err, errp, lbl;
expression list args, args2;
expression var;
binary operator op;
constant c1, c2;
symbol false;
@@
- var = fun(args, &err, args2);
+ var = fun(args, errp, args2);
... when != err
if (
(
var
|
!var
|
var op c1
)
)
{
... when != err
when != lbl:
when strict
- error_propagate(errp, err);
... when != err
(
return;
|
return c2;
|
return false;
|
return var;
)
}
@depends on rule1 || rule2@
identifier err;
@@
- Error *err = NULL;
... when != err
Not exactly elegant, I'm afraid.
The "when != lbl:" is necessary to avoid transforming
if (fun(args, &err)) {
goto out
}
...
out:
error_propagate(errp, err);
even though other paths to label out still need the error_propagate().
For an actual example, see sclp_realize().
Without the "when strict", Coccinelle transforms vfio_msix_setup(),
incorrectly. I don't know what exactly "when strict" does, only that
it helps here.
The match of return is narrower than what I want, but I can't figure
out how to express "return where the operand doesn't use @err". For
an example where it's too narrow, see vfio_intx_enable().
Silently fails to convert hw/arm/armsse.c, because Coccinelle gets
confused by ARMSSE being used both as typedef and function-like macro
there. Converted manually.
Line breaks tidied up manually. One nested declaration of @local_err
deleted manually. Preexisting unwanted blank line dropped in
hw/riscv/sifive_e.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-35-armbru@redhat.com>
The previous commit enables conversion of
foo(..., &err);
if (err) {
...
}
to
if (!foo(..., errp)) {
...
}
for QOM functions that now return true / false on success / error.
Coccinelle script:
@@
identifier fun = {
object_apply_global_props, object_initialize_child_with_props,
object_initialize_child_with_propsv, object_property_get,
object_property_get_bool, object_property_parse, object_property_set,
object_property_set_bool, object_property_set_int,
object_property_set_link, object_property_set_qobject,
object_property_set_str, object_property_set_uint, object_set_props,
object_set_propv, user_creatable_add_dict,
user_creatable_complete, user_creatable_del
};
expression list args, args2;
typedef Error;
Error *err;
@@
- fun(args, &err, args2);
- if (err)
+ if (!fun(args, &err, args2))
{
...
}
Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
ARMSSE being used both as typedef and function-like macro there.
Convert manually.
Line breaks tidied up manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-29-armbru@redhat.com>
The object_property_set_FOO() setters take property name and value in
an unusual order:
void object_property_set_FOO(Object *obj, FOO_TYPE value,
const char *name, Error **errp)
Having to pass value before name feels grating. Swap them.
Same for object_property_set(), object_property_get(), and
object_property_parse().
Convert callers with this Coccinelle script:
@@
identifier fun = {
object_property_get, object_property_parse, object_property_set_str,
object_property_set_link, object_property_set_bool,
object_property_set_int, object_property_set_uint, object_property_set,
object_property_set_qobject
};
expression obj, v, name, errp;
@@
- fun(obj, v, name, errp)
+ fun(obj, name, v, errp)
Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error
message "no position information". Convert that one manually.
Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
ARMSSE being used both as typedef and function-like macro there.
Convert manually.
Fails to convert hw/rx/rx-gdbsim.c, because Coccinelle gets confused
by RXCPU being used both as typedef and function-like macro there.
Convert manually. The other files using RXCPU that way don't need
conversion.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-27-armbru@redhat.com>
[Straightforwad conflict with commit 2336172d9b "audio: set default
value for pcspk.iobase property" resolved]
The previous commit enables conversion of
visit_foo(..., &err);
if (err) {
...
}
to
if (!visit_foo(..., errp)) {
...
}
for visitor functions that now return true / false on success / error.
Coccinelle script:
@@
identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*";
expression list args;
typedef Error;
Error *err;
@@
- fun(args, &err);
- if (err)
+ if (!fun(args, &err))
{
...
}
A few line breaks tidied up manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-19-armbru@redhat.com>
Convert
foo(..., &err);
if (err) {
...
}
to
if (!foo(..., &err)) {
...
}
for qdev_realize(), qdev_realize_and_unref(), qbus_realize() and their
wrappers isa_realize_and_unref(), pci_realize_and_unref(),
sysbus_realize(), sysbus_realize_and_unref(), usb_realize_and_unref().
Coccinelle script:
@@
identifier fun = {
isa_realize_and_unref, pci_realize_and_unref, qbus_realize,
qdev_realize, qdev_realize_and_unref, sysbus_realize,
sysbus_realize_and_unref, usb_realize_and_unref
};
expression list args, args2;
typedef Error;
Error *err;
@@
- fun(args, &err, args2);
- if (err)
+ if (!fun(args, &err, args2))
{
...
}
Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error
message "no position information". Nothing to convert there; skipped.
Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
ARMSSE being used both as typedef and function-like macro there.
Converted manually.
A few line breaks tidied up manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200707160613.848843-5-armbru@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20200619091905.21676-6-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
acpi aml generator needs this, but it is in floppy code now
so we can make the function static.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20200619091905.21676-5-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
DSDT change: isa device order changes in case MI1 (ipmi) is present.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200619091905.21676-4-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
qdev_prop_set_drive() can fail. None of the other qdev_prop_set_FOO()
can; they abort on error.
To clean up this inconsistency, rename qdev_prop_set_drive() to
qdev_prop_set_drive_err(), and create a qdev_prop_set_drive() that
aborts on error.
Coccinelle script to update callers:
@ depends on !(file in "hw/core/qdev-properties-system.c")@
expression dev, name, value;
symbol error_abort;
@@
- qdev_prop_set_drive(dev, name, value, &error_abort);
+ qdev_prop_set_drive(dev, name, value);
@@
expression dev, name, value, errp;
@@
- qdev_prop_set_drive(dev, name, value, errp);
+ qdev_prop_set_drive_err(dev, name, value, errp);
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200622094227.1271650-14-armbru@redhat.com>
Deprecate
-global isa-fdc.driveA=...
-global isa-fdc.driveB=...
in favour of
-device floppy,unit=0,drive=...
-device floppy,unit=1,drive=...
Same for the other floppy controller devices.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-Id: <20200622094227.1271650-7-armbru@redhat.com>
Helper function fdctrl_init_isa() is less than helpful: one of three
places creating "isa-fdc" devices use it. Open-code it there, and
drop the function.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200622094227.1271650-6-armbru@redhat.com>
The floppy controller devices desugar their drive properties into
floppy devices (since commit a92bd191a4 "fdc: Move qdev properties to
FloppyDrive", v2.8.0). This involves some bad magic in
fdctrl_connect_drives(), and exists for backward compatibility.
The functions for boards to create floppy controller devices
fdctrl_init_isa(), fdctrl_init_sysbus(), and sun4m_fdctrl_init()
desugar -drive if=floppy to these floppy controller drive properties.
If you use both -drive if=floppy (or its -fda / -fdb sugar) and
-global isa-fdc for the same floppy device, -global silently loses the
conflict, and both backends involved end up with the floppy device
frontend attached, as demonstrated by iotest 172 (see commit before
previous). This is wrong.
Desugar -drive if=floppy straight to floppy devices instead, with
helper fdctrl_init_drives(). The conflict now gets rejected cleanly:
first, fdctrl_connect_drives() creates the floppy for the controller's
property, then fdctrl_init_drives() attempts to create the floppy for
-drive if=floppy, but fails because the unit is already in use.
Output of iotest 172 changes in three ways:
1. The clash gets rejected.
2. In one test case, "info qtree" has the floppy devices swapped, and
"info block" has their QOM paths swapped. This is because the
floppy device for -fda now gets created after the one for -global
isa-fdc.driveB.
3. The error message for -global floppy.drive=floppy0 changes. Before
the patch, we set isa-fdc.driveA to -fda's block backend, then
create the floppy device for it, then move the backend from
isa-fdc.driveA to floppy.drive. Floppy creation fails when
applying -global floppy.drive=floppy0, because floppy0 is still
attached to isa-fdc. After the patch, we create the floppy for
-fda, then set its drive property to floppy0. Now floppy creation
succeeds, but setting the drive property fails, because -global
already set it. Yes, this is exasperatingly complicated.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200622094227.1271650-5-armbru@redhat.com>
Convert all size-related properties in BlockConf to 32bit. This will
accommodate bigger block sizes (in a followup patch). This also allows
to make them all accept size suffixes, either via DEFINE_PROP_BLOCKSIZE
or via DEFINE_PROP_SIZE32.
Also, since min_io_size is exposed to the guest by scsi and virtio-blk
devices as an uint16_t in units of logical blocks, introduce an
additional check in blkconf_blocksizes to prevent its silent truncation.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Message-Id: <20200528225516.1676602-7-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Several block device properties related to blocksize configuration must
be in certain relationship WRT each other: physical block must be no
smaller than logical block; min_io_size, opt_io_size, and
discard_granularity must be a multiple of a logical block.
To ensure these requirements are met, add corresponding consistency
checks to blkconf_blocksizes, adjusting its signature to communicate
possible error to the caller. Also remove the now redundant consistency
checks from the specific devices.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20200528225516.1676602-3-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The width of opt_io_size in virtio_blk_config is 32bit. However, it's
written with virtio_stw_p; this may result in value truncation, and on
big-endian systems with legacy virtio in completely bogus readings in
the guest.
Use the appropriate accessor to store it.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200528225516.1676602-2-rvkagan@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass an Error to msix_init_exclusive_bar() and check it.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20200609190333.59390-23-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Decouple the requested maximum number of ioqpairs (param max_ioqpairs)
from the number of MSI-X interrupt vectors by introducing a new
msix_qsize parameter and initialize MSI-X with that. This allows
emulating a device that has fewer vectors than I/O queue pairs and also
allows more than 2048 queue pairs. To keep the device behaving as
previously, use a msix_qsize default of 65 (default max_ioqpairs + 1).
This decoupling was actually suggested by Maxim some time ago in a
slightly different context, so adding a Suggested-by.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20200609190333.59390-22-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
msix_vector_use() returns -EINVAL on error. Assert it won't.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-21-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-20-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-19-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200609190333.59390-18-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-17-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-16-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-15-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce some small helpers to make the next patches easier on the eye.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-14-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-13-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-12-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-11-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-10-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The num_queues device paramater has a slightly confusing meaning because
it accounts for the admin queue pair which is not really optional.
Secondly, it is really a maximum value of queues allowed.
Add a new max_ioqpairs parameter that only accounts for I/O queue pairs,
but keep num_queues for compatibility.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-9-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
First, since the device only supports MSI-X or pin-based interrupt, if
MSI-X is not enabled, it should not accept interrupt vectors different
from 0 when creating completion queues.
Secondly, the irq_status NvmeCtrl member is meant to be compared to the
INTMS register, so it should only be 32 bits wide. And it is really only
useful when used with multi-message MSI.
Third, since we do not force a 1-to-1 correspondence between cqid and
interrupt vector, the irq_status register should not have bits set
according to cqid, but according to the associated interrupt vector.
Fix these issues, but keep irq_status available so we can easily support
multi-message MSI down the line.
Fixes: 5e9aa92eb1 ("hw/block: Fix pin-based interrupt behaviour of NVMe")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-8-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pull the controller memory buffer check to its own function. The check
will be used on its own in later patches.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-7-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-6-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move device configuration parameters to separate struct to make it
explicit what is configurable and what is set internally.
Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200609190333.59390-5-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These break statements was left over when commit 3036a626e9 ("nvme:
add Get/Set Feature Timestamp support") was merged.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-4-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change the prefix of all nvme device related trace events to 'pci_nvme'
to not clash with trace events from the nvme block driver.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-3-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size of the BAR is 0x1000 (main registers) + 8 bytes for each
queue. Currently, the size of the BAR is calculated like so:
n->reg_size = pow2ceil(0x1004 + 2 * (n->num_queues + 1) * 4);
Since the 'num_queues' parameter already accounts for the admin queue,
this should in any case not need to be incremented by one. Also, the
size should be initialized to (0x1000).
n->reg_size = pow2ceil(0x1000 + 2 * n->num_queues * 4);
This, with the default value of num_queues (64), we will set aside room
for 1 admin queue and 63 I/O queues (4 bytes per doorbell, 2 doorbells
per queue).
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Message-Id: <20200609190333.59390-2-its@irrelevant.dk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On restart, we were scheduling a BH to process queued requests, which
would run before starting up the data plane, leading to those requests
being assigned and started on coroutines on the main context.
This could cause requests to be wrongly processed in parallel from
different threads (the main thread and the iothread managing the data
plane), potentially leading to multiple issues.
For example, stopping and resuming a VM multiple times while the guest
is generating I/O on a virtio_blk device can trigger a crash with a
stack tracing looking like this one:
<------>
Thread 2 (Thread 0x7ff736765700 (LWP 1062503)):
#0 0x00005567a13b99d6 in iov_memset
(iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
at util/iov.c:69
#1 0x00005567a13bab73 in qemu_iovec_memset
(qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
#2 0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
#3 0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
#4 0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
#5 0x00005567a12f43d9 in qemu_laio_process_completions_and_submit (s=0x7ff7182e8420)
at block/linux-aio.c:236
#6 0x00005567a12f44c2 in qemu_laio_poll_cb (opaque=0x7ff7182e8430) at block/linux-aio.c:267
#7 0x00005567a13aed83 in run_poll_handlers_once (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
at util/aio-posix.c:520
#8 0x00005567a13aee9f in run_poll_handlers (ctx=0x5567a2b58c70, max_ns=16000, timeout=0x7ff7367645f8)
at util/aio-posix.c:562
#9 0x00005567a13aefde in try_poll_mode (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
at util/aio-posix.c:597
#10 0x00005567a13af115 in aio_poll (ctx=0x5567a2b58c70, blocking=true) at util/aio-posix.c:639
#11 0x00005567a109acca in iothread_run (opaque=0x5567a2b29760) at iothread.c:75
#12 0x00005567a13b2790 in qemu_thread_start (args=0x5567a2b694c0) at util/qemu-thread-posix.c:519
#13 0x00007ff73eedf2de in start_thread () at /lib64/libpthread.so.0
#14 0x00007ff73ec10e83 in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7ff743986f00 (LWP 1062500)):
#0 0x00005567a13b99d6 in iov_memset
(iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
at util/iov.c:69
#1 0x00005567a13bab73 in qemu_iovec_memset
(qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
#2 0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
#3 0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
#4 0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
#5 0x00005567a12f4a2f in laio_do_submit (fd=19, laiocb=0x7ff5f4ff9ae0, offset=472363008, type=2)
at block/linux-aio.c:375
#6 0x00005567a12f4af2 in laio_co_submit
(bs=0x5567a2b8c460, s=0x7ff7182e8420, fd=19, offset=472363008, qiov=0x7ff5f4ff9ca0, type=2)
at block/linux-aio.c:394
#7 0x00005567a12f1803 in raw_co_prw
(bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, type=2)
at block/file-posix.c:1892
#8 0x00005567a12f1941 in raw_co_pwritev
(bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, flags=0)
at block/file-posix.c:1925
#9 0x00005567a12fe3e1 in bdrv_driver_pwritev
(bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, qiov_offset=0, flags=0)
at block/io.c:1183
#10 0x00005567a1300340 in bdrv_aligned_pwritev
(child=0x5567a2b5b070, req=0x7ff5f4ff9db0, offset=472363008, bytes=20480, align=512, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:1980
#11 0x00005567a1300b29 in bdrv_co_pwritev_part
(child=0x5567a2b5b070, offset=472363008, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0)
at block/io.c:2137
#12 0x00005567a12baba1 in qcow2_co_pwritev_task
(bs=0x5567a2b92740, file_cluster_offset=472317952, offset=487305216, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, l2meta=0x0) at block/qcow2.c:2444
#13 0x00005567a12bacdb in qcow2_co_pwritev_task_entry (task=0x5567a2b48540) at block/qcow2.c:2475
#14 0x00005567a13167d8 in aio_task_co (opaque=0x5567a2b48540) at block/aio_task.c:45
#15 0x00005567a13cf00c in coroutine_trampoline (i0=738245600, i1=32759) at util/coroutine-ucontext.c:115
#16 0x00007ff73eb622e0 in __start_context () at /lib64/libc.so.6
#17 0x00007ff6626f1350 in ()
#18 0x0000000000000000 in ()
<------>
This is also known to cause crashes with this message (assertion
failed):
aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1812765
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20200603093240.40489-3-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the code that processes queued requests from
virtio_blk_dma_restart_bh() to its own, non-static, function. This
will allow us to call it from the virtio_blk_data_plane_start() in a
future patch.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20200603093240.40489-2-slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All remaining conversions to qdev_realize() are for bus-less devices.
Coccinelle script:
// only correct for bus-less @dev!
@@
expression errp;
expression dev;
@@
- qdev_init_nofail(dev);
+ qdev_realize(dev, NULL, &error_fatal);
@ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
expression errp;
expression dev;
symbol true;
@@
- object_property_set_bool(OBJECT(dev), true, "realized", errp);
+ qdev_realize(DEVICE(dev), NULL, errp);
@ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
expression errp;
expression dev;
symbol true;
@@
- object_property_set_bool(dev, true, "realized", errp);
+ qdev_realize(DEVICE(dev), NULL, errp);
Note that Coccinelle chokes on ARMSSE typedef vs. macro in
hw/arm/armsse.c. Worked around by temporarily renaming the macro for
the spatch run.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-57-armbru@redhat.com>
Same transformation as in the previous commit. Manual, because
convincing Coccinelle to transform these cases is not worthwhile.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-21-armbru@redhat.com>
Same transformation as in the previous commit. Manual, because
convincing Coccinelle to transform these cases is somewhere between
not worthwhile and infeasible (at least for me).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-11-armbru@redhat.com>
This is the transformation explained in the commit before previous.
Takes care of just one pattern that needs conversion. More to come in
this series.
Coccinelle script:
@ depends on !(file in "hw/arm/highbank.c")@
expression bus, type_name, dev, expr;
@@
- dev = qdev_create(bus, type_name);
+ dev = qdev_new(type_name);
... when != dev = expr
- qdev_init_nofail(dev);
+ qdev_realize_and_unref(dev, bus, &error_fatal);
@@
expression bus, type_name, dev, expr;
identifier DOWN;
@@
- dev = DOWN(qdev_create(bus, type_name));
+ dev = DOWN(qdev_new(type_name));
... when != dev = expr
- qdev_init_nofail(DEVICE(dev));
+ qdev_realize_and_unref(DEVICE(dev), bus, &error_fatal);
@@
expression bus, type_name, expr;
identifier dev;
@@
- DeviceState *dev = qdev_create(bus, type_name);
+ DeviceState *dev = qdev_new(type_name);
... when != dev = expr
- qdev_init_nofail(dev);
+ qdev_realize_and_unref(dev, bus, &error_fatal);
@@
expression bus, type_name, dev, expr, errp;
symbol true;
@@
- dev = qdev_create(bus, type_name);
+ dev = qdev_new(type_name);
... when != dev = expr
- object_property_set_bool(OBJECT(dev), true, "realized", errp);
+ qdev_realize_and_unref(dev, bus, errp);
@@
expression bus, type_name, expr, errp;
identifier dev;
symbol true;
@@
- DeviceState *dev = qdev_create(bus, type_name);
+ DeviceState *dev = qdev_new(type_name);
... when != dev = expr
- object_property_set_bool(OBJECT(dev), true, "realized", errp);
+ qdev_realize_and_unref(dev, bus, errp);
The first rule exempts hw/arm/highbank.c, because it matches along two
control flow paths there, with different @type_name. Covered by the
next commit's manual conversions.
Missing #include "qapi/error.h" added manually.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-10-armbru@redhat.com>
[Conflicts in hw/misc/empty_slot.c and hw/sparc/leon3.c resolved]
Let's start simple and put qdev_new() to use. Coccinelle script:
@ depends on !(file in "hw/core/qdev.c")@
expression type_name;
@@
- DEVICE(object_new(type_name))
+ qdev_new(type_name)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-6-armbru@redhat.com>
We use the Object type all over the place.
Forward declare it in "qemu/typedefs.h".
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200504115656.6045-2-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A socket write during vhost-user communication may trigger a disconnect
event, calling vhost_user_blk_disconnect() and clearing all the
vhost_dev structures holding data that vhost-user functions expect to
remain valid to roll back initialization correctly. Delay the cleanup to
keep vhost_dev structure valid.
There are two possible states to handle:
1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
the caller routine.
2. RUN_STATE_RUNNING: delay by using bh
BH changes are based on the similar changes for the vhost-user-net
device:
commit e7c83a885f
"vhost-user: delay vhost_user_stop"
Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Message-Id: <69b73b94dcd066065595266c852810e0863a0895.1590396396.git.dimastep@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Now than the non-target specific memory_region_msync() function
is available, use it to make this device target-agnostic.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20200508062456.23344-4-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When updating the PFLASH file contents, we should check for a
possible failure of blk_pwrite(). Similar to commit 3a688294e.
Reported-by: Coverity (CID 1357678 CHECKED_RETURN)
Signed-off-by: Mansour Ahmadi <mansourweb@gmail.com>
Message-Id: <20200408003552.58095-1-mansourweb@gmail.com>
[PMD: Add missing "qemu/error-report.h" include and TODO comment]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Rename the 'reset_flash' as 'mode_read_array' to make explicit we
do not reset the device, we simply set its internal state machine
in the READ_ARRAY mode. We do not reset the status register error
bits, as a device reset would do.
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190716221555.11145-5-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The command 0x00 is used by this model since its origin (commit
05ee37ebf6). In this commit the command is described with a
amusing '/* ??? */' comment, probably meaning 'FIXME'.
switch (cmd) {
case 0x00: /* ??? */
...
This comment survived 12 years because the 0x00 value is indeed
not specified by the CFI open standard (as of this commit).
The 'cmd' field is transfered during migration. To keep the
migration feature working with older QEMU version, we have to
take a lot of care with migrated field. We figured out it is
too late to remove a non-specified value from this model
(this would make migration review very complex). It is however
not too late to improve the documentation.
Add few comments to remember this is a special value related
to QEMU, and we won't find information about it on the CFI
spec.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190716221555.11145-3-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The 'CFI02' NOR flash was introduced in commit 29133e9a0f, with
timing modelled. One year later, the CFI01 model was introduced
(commit 05ee37ebf6) based on the CFI02 model. As noted in the
header, "It does not support timings". 12 years later, we never
had to model the device timings. Time to remove the unused timer,
we can still add it back if required.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
[Laszlo Ersek: Regression tested EDK2 OVMF IA32X64, ArmVirtQemu Aarch64
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04373.html]
Message-Id: <20190716221555.11145-2-philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Use the QEMU_IS_ALIGNED() macro to verify the flash block size
is properly aligned. It is quicker to process when reviewing.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200511205246.24621-1-philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Devices may have component devices and buses.
Device realization may fail. Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).
When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far. If any of these unrealizes
failed, the device would be left in an inconsistent state. Must not
happen.
device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.
Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back? We'd have to
re-realize, which can fail. This design is fundamentally broken.
device_set_realized() does not roll back at all. Instead, it keeps
unrealizing, ignoring further errors.
It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.
bus_set_realized() does not roll back either. Instead, it stops
unrealizing.
Fortunately, no unrealize method can fail, as we'll see below.
To fix the design error, drop parameter @errp from all the unrealize
methods.
Any unrealize method that uses @errp now needs an update. This leads
us to unrealize() methods that can fail. Merely passing it to another
unrealize method cannot cause failure, though. Here are the ones that
do other things with @errp:
* virtio_serial_device_unrealize()
Fails when qbus_set_hotplug_handler() fails, but still does all the
other work. On failure, the device would stay realized with its
resources completely gone. Oops. Can't happen, because
qbus_set_hotplug_handler() can't actually fail here. Pass
&error_abort to qbus_set_hotplug_handler() instead.
* hw/ppc/spapr_drc.c's unrealize()
Fails when object_property_del() fails, but all the other work is
already done. On failure, the device would stay realized with its
vmstate registration gone. Oops. Can't happen, because
object_property_del() can't actually fail here. Pass &error_abort
to object_property_del() instead.
* spapr_phb_unrealize()
Fails and bails out when remove_drcs() fails, but other work is
already done. On failure, the device would stay realized with some
of its resources gone. Oops. remove_drcs() fails only when
chassis_from_bus()'s object_property_get_uint() fails, and it can't
here. Pass &error_abort to remove_drcs() instead.
Therefore, no unrealize method can fail before this patch.
device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool(). Can't drop @errp there, so pass
&error_abort.
We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors. Pass &error_abort instead.
Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.
One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().
Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create(). Drop their @errp
parameter.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
when s->inflight is freed, vhost_dev_free_inflight may try to access
s->inflight->addr, it will retrigger the following issue.
==7309==ERROR: AddressSanitizer: heap-use-after-free on address 0x604001020d18 at pc 0x555555ce948a bp 0x7fffffffb170 sp 0x7fffffffb160
READ of size 8 at 0x604001020d18 thread T0
#0 0x555555ce9489 in vhost_dev_free_inflight /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473
#1 0x555555cd86eb in virtio_reset /root/smartx/qemu-el7/qemu-test/hw/virtio/virtio.c:1214
#2 0x5555560d3eff in virtio_pci_reset hw/virtio/virtio-pci.c:1859
#3 0x555555f2ac53 in device_set_realized hw/core/qdev.c:893
#4 0x5555561d572c in property_set_bool qom/object.c:1925
#5 0x5555561de8de in object_property_set_qobject qom/qom-qobject.c:27
#6 0x5555561d99f4 in object_property_set_bool qom/object.c:1188
#7 0x555555e50ae7 in qdev_device_add /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:626
#8 0x555555e51213 in qmp_device_add /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:806
#9 0x555555e8ff40 in hmp_device_add /root/smartx/qemu-el7/qemu-test/hmp.c:1951
#10 0x555555be889a in handle_hmp_command /root/smartx/qemu-el7/qemu-test/monitor.c:3404
#11 0x555555beac8b in monitor_command_cb /root/smartx/qemu-el7/qemu-test/monitor.c:4296
#12 0x555556433eb7 in readline_handle_byte util/readline.c:393
#13 0x555555be89ec in monitor_read /root/smartx/qemu-el7/qemu-test/monitor.c:4279
#14 0x5555563285cc in tcp_chr_read chardev/char-socket.c:470
#15 0x7ffff670b968 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4a968)
#16 0x55555640727c in glib_pollfds_poll util/main-loop.c:215
#17 0x55555640727c in os_host_main_loop_wait util/main-loop.c:238
#18 0x55555640727c in main_loop_wait util/main-loop.c:497
#19 0x555555b2d0bf in main_loop /root/smartx/qemu-el7/qemu-test/vl.c:2013
#20 0x555555b2d0bf in main /root/smartx/qemu-el7/qemu-test/vl.c:4776
#21 0x7fffdd2eb444 in __libc_start_main (/lib64/libc.so.6+0x22444)
#22 0x555555b3767a (/root/smartx/qemu-el7/qemu-test/x86_64-softmmu/qemu-system-x86_64+0x5e367a)
0x604001020d18 is located 8 bytes inside of 40-byte region [0x604001020d10,0x604001020d38)
freed by thread T0 here:
#0 0x7ffff6f00508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
#1 0x7ffff671107d in g_free (/lib64/libglib-2.0.so.0+0x5007d)
previously allocated by thread T0 here:
#0 0x7ffff6f00a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
#1 0x7ffff6710fc5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x4ffc5)
SUMMARY: AddressSanitizer: heap-use-after-free /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473 in vhost_dev_free_inflight
Shadow bytes around the buggy address:
0x0c08801fc150: fa fa 00 00 00 00 04 fa fa fa fd fd fd fd fd fa
0x0c08801fc160: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 04 fa
0x0c08801fc170: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 04 fa
0x0c08801fc180: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 00 01
0x0c08801fc190: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 04 fa
=>0x0c08801fc1a0: fa fa fd[fd]fd fd fd fa fa fa fd fd fd fd fd fa
0x0c08801fc1b0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
0x0c08801fc1c0: fa fa 00 00 00 00 00 fa fa fa fd fd fd fd fd fd
0x0c08801fc1d0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
0x0c08801fc1e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd
0x0c08801fc1f0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==7309==ABORTING
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20200417101707.14467-1-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
This patch introduces support for PMR that has been defined as part of NVMe 1.4
spec. User can now specify a pmrdev option that should point to HostMemoryBackend.
pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe
device. Guest OS can perform mmio read and writes to the PMR region that will stay
persistent across system reboot.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since 7f5d9b206d ("object-add: don't create return value if
failed"), qmp_object_add() don't write any value in 'ret_data', thus
has random data. Then qobject_unref() fails and abort().
Fix by initialising 'ret_data' properly.
Fixes: 5f07c4d60d ("qapi: Flatten object-add")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200406164207.1446817-1-anthony.perard@citrix.com>
Commit a31ca6801c ("qemu/queue.h: clear linked list pointers on
remove") revealed that a request was removed twice from a list, once
in xen_block_finish_request() and a second time in
xen_block_release_request() when both function are called from
xen_block_complete_aio(). But also, the `requests_inflight' counter is
decreased twice, and thus became negative.
This is a bug that was introduced in bfd0d63660 ("xen-block: improve
response latency"), where a `finished' list was removed.
That commit also introduced a leak of request in xen_block_do_aio().
That function calls xen_block_finish_request() but the request is
never released after that.
To fix both issue, we do two changes:
- we squash finish_request() and release_request() together as we want
to remove a request from 'inflight' list to add it to 'freelist'.
- before releasing a request, we need to let the other end know the
result, thus we should call xen_block_send_response() before
releasing a request.
The first change fixes the double QLIST_REMOVE() as we remove the extra
call. The second change makes the leak go away because if we want to
call finish_request(), we need to call a function that does all of
finish, send response, and release.
Fixes: bfd0d63660 ("xen-block: improve response latency")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20200406140217.1441858-1-anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
[mreitz: Amended commit message as per Paul's suggestions]
Signed-off-by: Max Reitz <mreitz@redhat.com>
the G_IO_HUP is watched in tcp_chr_connect, and the callback
vhost_user_blk_watch is not needed, because tcp_chr_hup is registered as
callback. And it will close the tcp link.
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20200323052924.29286-1-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_vqs forgot to free on the error path in realize(). Fix that.
The asan stack:
Direct leak of 14336 byte(s) in 1 object(s) allocated from:
#0 0x7f58b93fd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f58b858249d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x5562cc627f49 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2413
#3 0x5562cc4b524a in virtio_blk_device_realize /mnt/sdb/qemu/hw/block/virtio-blk.c:1202
#4 0x5562cc613050 in virtio_device_realize /mnt/sdb/qemu/hw/virtio/virtio.c:3615
#5 0x5562ccb7a568 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:891
#6 0x5562cd39cd45 in property_set_bool /mnt/sdb/qemu/qom/object.c:2238
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200328005705.29898-2-pannengyuan@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While working on the Tulip driver i tried to write some Teledisk images to
a floppy image which didn't work. Turned out that Teledisk checks the written
data by issuing a READ command to the FDC but running the DMA controller
in VERIFY mode. As we ignored the DMA request in that case, the DMA transfer
never finished, and Teledisk reported an error.
The i8257 spec says about verify transfers:
3) DMA verify, which does not actually involve the transfer of data. When an
8257 channel is in the DMA verify mode, it will respond the same as described
for transfer operations, except that no memory or I/O read/write control signals
will be generated.
Hervé proposed to remove all the dma_mode_ok stuff from fdc to have a more
clear boundary between DMA and FDC, so this patch also does that.
Suggested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
The given argument for this trace should be cqid, not sqid.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Message-Id: <20200324140646.8274-1-minwoo.im.dev@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While there, tidy up indentation, and add return just for consistency
and robustness.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200313170517.22480-4-armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[The "while there" cleanups squashed in]
Whenever an unsupported command is encountered, the current code
interprets each transferred byte as new command. Most of the time, those
'commands' are interpreted as new unknown commands. However, in rare
cases, it may be that for example address or length information
passed with the original command is by itself a valid command.
If that happens, the state machine may get completely confused and,
worst case, start writing data into the flash or even erase it.
To avoid the problem, transition into STATE_READING_DATA and keep
sending a value of 0 until the chip is deselected after encountering
an unsupported command.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When requesting JEDEC data using the JEDEC_READ command, the Linux kernel
always requests 6 bytes. The current implementation only returns three
bytes, and interprets the remaining three bytes as new commands.
While this does not matter most of the time, it is at the very least
confusing. To avoid the problem, always report up to 6 bytes of JEDEC
data. Fill remaining data with 0.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
While at it, add some trace messages to help debug problems
seen when running the latest Linux kernel.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Mapping object-add to the command line as is doesn't result in nice
syntax because of the nesting introduced with 'props'. This becomes
nicer and more consistent with device_add and netdev_add when we accept
properties for the object on the top level instead.
'props' is still accepted after this patch, but marked as deprecated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fix for xen-block
* fix in exec.c for migration of xen guest
* one cleanup patch
-----BEGIN PGP SIGNATURE-----
iQFOBAABCgA4FiEE+AwAYwjiLP2KkueYDPVXL9f7Va8FAl5XrpgaHGFudGhvbnku
cGVyYXJkQGNpdHJpeC5jb20ACgkQDPVXL9f7Va88wQf/TcU/rOJSIlTzIoIktp+T
uvsb3+TkppdLBeFvAPAfKXFG8JxO7RHxtnn7pZFdlejqNG+AJhARd+LbQMPMO15d
cLo7Da5HE8ni9f+CwtY61SNS3qe1+8qoNRFwxeycA5pfr+XZb5dB8FYW4w5H4mg0
gyf4R0kb/5Y43K4FKEu/09rh3jtV1HqVfbjMrk3u82sex5gp3LT9kg6VJyrGE3rr
D/rmVOM1+rEn8S9e5YG1YqBq1HRSMAbrQ3kvkCJPHE+vLnmkbITyi9faL99vR3Pl
oTtmnwNWUwYzf/FwAA+8/YaaAsEz17KQXOQtFxIC+j9im2KkE5waD15AfEJ5eQgW
EA==
=sKMx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20200227' into staging
Xen queue 2020-02-27
* fix for xen-block
* fix in exec.c for migration of xen guest
* one cleanup patch
# gpg: Signature made Thu 27 Feb 2020 11:57:12 GMT
# gpg: using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF
# gpg: issuer "anthony.perard@citrix.com"
# gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [marginal]
# gpg: aka "Anthony PERARD <anthony.perard@citrix.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5379 2F71 024C 600F 778A 7161 D8D5 7199 DF83 42C8
# Subkey fingerprint: F80C 0063 08E2 2CFD 8A92 E798 0CF5 572F D7FB 55AF
* remotes/aperard/tags/pull-xen-20200227:
Memory: Only call ramblock_ptr when needed in qemu_ram_writeback
xen-bus/block: explicitly assign event channels to an AioContext
hw/xen/xen_pt_load_rom: Remove unused includes
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
It is not safe to close an event channel from the QEMU main thread when
that channel's poller is running in IOThread context.
This patch adds a new xen_device_set_event_channel_context() function
to explicitly assign the channel AioContext, and modifies
xen_device_bind_event_channel() to initially assign the channel's poller
to the QEMU main thread context. The code in xen-block's dataplane is
then modified to assign the channel to IOThread context during
xen_block_dataplane_start() and de-assign it during in
xen_block_dataplane_stop(), such that the channel is always assigned
back to main thread context before it is closed. aio_set_fd_handler()
already deals with all the necessary synchronization when moving an fd
between AioContext-s so no extra code is needed to manage this.
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20191216143451.19024-1-pdurrant@amazon.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
use the new virtio_delete_queue function to cleanup.
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200224041336.30790-3-pannengyuan@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio queues forgot to delete in unrealize, and aslo error path in
realize, this patch fix these memleaks, the leak stack is as follow:
Direct leak of 114688 byte(s) in 16 object(s) allocated from:
#0 0x7f24024fdbf0 in calloc (/lib64/libasan.so.3+0xcabf0)
#1 0x7f2401642015 in g_malloc0 (/lib64/libglib-2.0.so.0+0x50015)
#2 0x55ad175a6447 in virtio_add_queue /mnt/sdb/qemu/hw/virtio/virtio.c:2327
#3 0x55ad17570cf9 in vhost_user_blk_device_realize /mnt/sdb/qemu/hw/block/vhost-user-blk.c:419
#4 0x55ad175a3707 in virtio_device_realize /mnt/sdb/qemu/hw/virtio/virtio.c:3509
#5 0x55ad176ad0d1 in device_set_realized /mnt/sdb/qemu/hw/core/qdev.c:876
#6 0x55ad1781ff9d in property_set_bool /mnt/sdb/qemu/qom/object.c:2080
#7 0x55ad178245ae in object_property_set_qobject /mnt/sdb/qemu/qom/qom-qobject.c:26
#8 0x55ad17821eb4 in object_property_set_bool /mnt/sdb/qemu/qom/object.c:1338
#9 0x55ad177aeed7 in virtio_pci_realize /mnt/sdb/qemu/hw/virtio/virtio-pci.c:1801
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200224041336.30790-2-pannengyuan@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The goal is to reduce the amount of requests issued by a guest on
1M reads/writes. This rises the performance up to 4% on that kind of
disk access pattern.
The maximum chunk size to be used for the guest disk accessing is
limited with seg_max parameter, which represents the max amount of
pices in the scatter-geather list in one guest disk request.
Since seg_max is virqueue_size dependent, increasing the virtqueue
size increases seg_max, which, in turn, increases the maximum size
of data to be read/write from a guest disk.
More details in the original problem statment:
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-id: 20200214074648.958-1-dplotnikov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix warning reported by Clang static code analyzer:
CC hw/block/pflash_cfi02.o
hw/block/pflash_cfi02.c:311:5: warning: Value stored to 'ret' is never read
ret = -1;
^ ~~
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200215161557.4077-4-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
We have many files that apparently do not depend on the target CPU
configuration, i.e. which can be put into common-obj-y instead of
obj-y. This way, the code can be shared for example between
qemu-system-arm and qemu-system-aarch64, or the various big and
little endian variants like qemu-system-sh4 and qemu-system-sh4eb,
so that we do not have to compile the code multiple times anymore.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200130133841.10779-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* Command line parsing fixes (Michal, Peter, Xiaoyao)
* Cooperlake CPU model fixes (Xiaoyao)
* i386 gdb fix (mkdolata)
* IOEventHandler cleanup (Philippe)
* icount fix (Pavel)
* RR support for random number sources (Pavel)
* Kconfig fixes (Philippe)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJeFbG8AAoJEL/70l94x66DCpMIAKBwxBL+VegqI+ySKgmtIBQX
LtU+ardEeZ37VfWfvuWzTFe+zQ0hsFpz/e0LHE7Ae+LVLMNWXixlmMrTIm+Xs762
hJzxBjhUhkdrMioVYTY16Kqap4Nqaxu70gDQ32Ve2sY6xYGxYLSaJooBOU5bXVgb
HPspHFVpeP6ZshBd1n2LXsgURE6v3AjTwqcsPCkL/AESFdkdOsoHeXjyKWJG1oPy
W7btzlUEqVsauZI8/PhhW/8hZUvUsJVHonYLTZTyy8aklU7aOILSyT2uPXFBVUVQ
irkQjLtD4dWlogBKO4i/QHMuwV+Asa57WNPmqv3EcIWPUWmTY84H0g2AxRgcc2M=
=48jx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Compat machines fix (Denis)
* Command line parsing fixes (Michal, Peter, Xiaoyao)
* Cooperlake CPU model fixes (Xiaoyao)
* i386 gdb fix (mkdolata)
* IOEventHandler cleanup (Philippe)
* icount fix (Pavel)
* RR support for random number sources (Pavel)
* Kconfig fixes (Philippe)
# gpg: Signature made Wed 08 Jan 2020 10:41:00 GMT
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (38 commits)
chardev: Use QEMUChrEvent enum in IOEventHandler typedef
chardev: use QEMUChrEvent instead of int
chardev/char: Explicit we ignore some QEMUChrEvent in IOEventHandler
monitor/hmp: Explicit we ignore a QEMUChrEvent in IOEventHandler
monitor/qmp: Explicit we ignore few QEMUChrEvent in IOEventHandler
virtio-console: Explicit we ignore some QEMUChrEvent in IOEventHandler
vhost-user-blk: Explicit we ignore few QEMUChrEvent in IOEventHandler
vhost-user-net: Explicit we ignore few QEMUChrEvent in IOEventHandler
vhost-user-crypto: Explicit we ignore some QEMUChrEvent in IOEventHandler
ccid-card-passthru: Explicit we ignore QEMUChrEvent in IOEventHandler
hw/usb/redirect: Explicit we ignore few QEMUChrEvent in IOEventHandler
hw/usb/dev-serial: Explicit we ignore few QEMUChrEvent in IOEventHandler
hw/char/terminal3270: Explicit ignored QEMUChrEvent in IOEventHandler
hw/ipmi: Explicit we ignore some QEMUChrEvent in IOEventHandler
hw/ipmi: Remove unnecessary declarations
target/i386: Add missed features to Cooperlake CPU model
target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES
target/i386: Fix handling of k_gs_base register in 32-bit mode in gdbstub
hw/rtc/mc146818: Add missing dependency on ISA Bus
hw/nvram/Kconfig: Restrict CHRP NVRAM to machines using OpenBIOS or SLOF
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Chardev events are listed in the QEMUChrEvent enum.
By using the enum in the IOEventHandler typedef we:
- make the IOEventHandler type more explicit (this handler
process out-of-band information, while the IOReadHandler
is in-band),
- help static code analyzers.
This patch was produced with the following spatch script:
@match@
expression backend, opaque, context, set_open;
identifier fd_can_read, fd_read, fd_event, be_change;
@@
qemu_chr_fe_set_handlers(backend, fd_can_read, fd_read, fd_event,
be_change, opaque, context, set_open);
@depends on match@
identifier opaque, event;
identifier match.fd_event;
@@
static
-void fd_event(void *opaque, int event)
+void fd_event(void *opaque, QEMUChrEvent event)
{
...
}
Then the typedef was modified manually in
include/chardev/char-fe.h.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20191218172009.8868-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bugfixes all over the place.
HMAT support.
New flags for vhost-user-blk utility.
Auto-tuning of seg max for virtio storage.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl4TaMEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpvzgH/2LyDAzCa9h93ikSJjmyUk5FUaqve38daEb3
S3JYjwKxQx7u1ydooKhvBQnBCZ2i3S+k62gfYyKB+nBv8xvjs0Eg5D1YJ5E8hciy
lf5OFGWWtX2iPDjZwQwT13kiJe0o3JRGxJJ6XqTEG+1EYOp7cky/FEv4PD030b9m
I2wROZ/Am+onB9YJX8c0Vv1CG+AryuJNXnvwQzTXEjj4U7bEYUyJwVZaCRyAdWQ3
uYXIZN9VwjVX6BFvy9ZAJbEsUVJvOM1/aQaDqcrLz+VlzRT7bRkKHi2G3vakrm1I
r5OpgyLo84132awCncbSykKDH5o8WaxLaJBjGmuBfasMz9wPzAg=
=uL1o
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pci, pc: fixes, features
Bugfixes all over the place.
HMAT support.
New flags for vhost-user-blk utility.
Auto-tuning of seg max for virtio storage.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 06 Jan 2020 17:05:05 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (32 commits)
intel_iommu: add present bit check for pasid table entries
intel_iommu: a fix to vtd_find_as_from_bus_num()
virtio-net: delete also control queue when TX/RX deleted
virtio: reset region cache when on queue deletion
virtio-mmio: update queue size on guest write
tests: add virtio-scsi and virtio-blk seg_max_adjust test
virtio: make seg_max virtqueue size dependent
hw: fix using 4.2 compat in 5.0 machine types for i440fx/q35
vhost-user-scsi: reset the device if supported
vhost-user: add VHOST_USER_RESET_DEVICE to reset devices
hw/pci/pci_host: Let pci_data_[read/write] use unsigned 'size' argument
hw/pci/pci_host: Remove redundant PCI_DPRINTF()
virtio-mmio: Clear v2 transport state on soft reset
ACPI: add expected files for HMAT tests (acpihmat)
tests/bios-tables-test: add test cases for ACPI HMAT
tests/numa: Add case for QMP build HMAT
hmat acpi: Build Memory Side Cache Information Structure(s)
hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
numa: Extend CLI to provide memory side cache information
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Chardev events are listed in the QEMUChrEvent enum. To be
able to use this enum in the IOEventHandler typedef, we need to
explicit all the events ignored by this frontend, to silent the
following GCC warning:
CC s390x-softmmu/hw/block/vhost-user-blk.o
hw/block/vhost-user-blk.c: In function ‘vhost_user_blk_event’:
hw/block/vhost-user-blk.c:370:5: error: enumeration value ‘CHR_EVENT_BREAK’ not handled in switch [-Werror=switch]
370 | switch (event) {
| ^~~~~~
hw/block/vhost-user-blk.c:370:5: error: enumeration value ‘CHR_EVENT_MUX_IN’ not handled in switch [-Werror=switch]
hw/block/vhost-user-blk.c:370:5: error: enumeration value ‘CHR_EVENT_MUX_OUT’ not handled in switch [-Werror=switch]
cc1: all warnings being treated as errors
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20191218172009.8868-10-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before the patch, seg_max parameter was immutable and hardcoded
to 126 (128 - 2) without respect to queue size. This has two negative effects:
1. when queue size is < 128, we have Virtio 1.1 specfication violation:
(2.6.5.3.1 Driver Requirements) seq_max must be <= queue_size.
This violation affects the old Linux guests (ver < 4.14). These guests
crash on these queue_size setups.
2. when queue_size > 128, as was pointed out by Denis Lunev <den@virtuozzo.com>,
seg_max restrics guest's block request length which affects guests'
performance making them issues more block request than needed.
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html
To mitigate this two effects, the patch adds the property adjusting seg_max
to queue size automaticaly. Since seg_max is a guest visible parameter,
the property is machine type managable and allows to choose between
old (seg_max = 126 always) and new (seg_max = queue_size - 2) behaviors.
Not to change the behavior of the older VMs, prevent setting the default
seg_max_adjust value for older machine types.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <20191220140905.1718-2-dplotnikov@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace DeviceState dependency with VMStateIf on vmstate API.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Virtqueue notifications are not necessary during polling, so we disable
them. This allows the guest driver to avoid MMIO vmexits.
Unfortunately the virtio-blk and virtio-scsi handler functions re-enable
notifications, defeating this optimization.
Fix virtio-blk and virtio-scsi emulation so they leave notifications
disabled. The key thing to remember for correctness is that polling
always checks one last time after ending its loop, therefore it's safe
to lose the race when re-enabling notifications at the end of polling.
There is a measurable performance improvement of 5-10% with the null-co
block driver. Real-life storage configurations will see a smaller
improvement because the MMIO vmexit overhead contributes less to
latency.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20191209210957.65087-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the number of a virtio-blk device's virtqueues is larger than
BITS_PER_LONG, the out-of-bounds access to bitmap[ ] will occur.
Fixes: e21737ab15 ("virtio-blk: multiqueue batch notify")
Cc: qemu-stable@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Li Hangjing <lihangjing@baidu.com>
Reviewed-by: Xie Yongji <xieyongji@baidu.com>
Reviewed-by: Chai Wen <chaiwen@baidu.com>
Message-id: 20191216023050.48620-1-lihangjing@baidu.com
Message-Id: <20191216023050.48620-1-lihangjing@baidu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Virtio spec 1.1 (and earlier), 5.2.5.2 Driver Requirements: Device
Initialization:
"Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if
they offer VIRTIO_BLK_F_CONFIG_WCE"
Currently F_CONFIG_WCE and F_WCE are not connected to each other.
Qemu will advertise F_CONFIG_WCE if config-wce argument is
set for virtio-blk device. And F_WCE is advertised only if
underlying block backend actually has it's caching enabled.
Fix this by advertising F_WCE if F_CONFIG_WCE is also advertised.
To preserve backwards compatibility with newer machine types make this
behaviour governed by "x-enable-wce-if-config-wce" virtio-blk-device
property and introduce hw_compat_4_2 with new property being off by
default for all machine types <= 4.2 (but don't introduce 4.3
machine type itself yet).
Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Message-Id: <1572978137-189218-1-git-send-email-wrfsh@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since not all trace backends support dynamic field width in
format (dtrace via stap does not), replace by a static field
width instead.
We previously passed to the trace API 'width << 1' as the number
of hex characters to display (the dynamic field width). We don't
need this anymore. Instead, display the size of bytes accessed.
Fixes: e8aa2d95ea ("pflash: Simplify trace_pflash_io_read/write")
Fixes: c1474acd5d ("pflash: Simplify trace_pflash_data_read/write")
Reported-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Buglink: https://bugs.launchpad.net/qemu/+bug/1844817
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Relevant devices are:
* ide-hd (and ide-cd, ide-drive)
* scsi-hd (and scsi-cd, scsi-disk, scsi-block)
* virtio-blk-pci
We do not call del_boot_device_lchs() for ide-* since we don't need to -
IDE block devices do not support unplugging.
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Signed-off-by: Sam Eiderman <sameid@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
SWIM (Sander-Wozniak Integrated Machine) is the floppy controller of
the 680x0 Macintosh.
This patch introduces only the basic support: it allows to switch from
IWM (Integrated WOZ Machine) mode to the SWIM mode and makes the linux
driver happy.
It cannot read any floppy image.
Co-developed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Message-Id: <20191026164546.30020-10-laurent@vivier.eu>
- iotest patches
- Improve performance of the mirror block job in write-blocking mode
- Limit memory usage for the backup block job
- Add discard and write-zeroes support to the NVMe host block driver
- Fix a bug in the mirror job
- Prevent the qcow2 driver from creating technically non-compliant qcow2
v3 images (where there is not enough extra data for snapshot table
entries)
- Allow callers of bdrv_truncate() (etc.) to determine whether the file
must be resized to the exact given size or whether it is OK for block
devices not to shrink
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl2224ESHG1yZWl0ekBy
ZWRoYXQuY29tAAoJEPQH2wBh1c9AeXMH/RXKEX4BZYMRKCe41P18tJC9Bl2x0T20
YeOsZVvpARlr7o/36BF2kGFF4MnL0OQ+9ELuyROX865rk/VL2rWqnHDE5oQM889a
dFwMs+0zvNbig3iLNcw0H5OkE2mrdM+a1EUdn/lBe/39Z8dPqPxRGqIYHq38Ugdu
emwSy1nWen7o0f71HRJfyVtI3KcrzXx71FrA/FY2yL/eHz+zRYGZj2SpAdFPkXP/
lgaz+m0tWhnSW1QzEOXB0Gh69ULt/DczCinYmv5qUY1noW5TPPtiDNCQTts5O4ba
oJsR3AJv5/l9m65JTmiyQSqnQfPcstrQ5FqOcSnP637cfqUFyWsvdks=
=L7v1
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-28' into staging
Block patches for softfreeze:
- iotest patches
- Improve performance of the mirror block job in write-blocking mode
- Limit memory usage for the backup block job
- Add discard and write-zeroes support to the NVMe host block driver
- Fix a bug in the mirror job
- Prevent the qcow2 driver from creating technically non-compliant qcow2
v3 images (where there is not enough extra data for snapshot table
entries)
- Allow callers of bdrv_truncate() (etc.) to determine whether the file
must be resized to the exact given size or whether it is OK for block
devices not to shrink
# gpg: Signature made Mon 28 Oct 2019 12:13:53 GMT
# gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg: issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2019-10-28: (69 commits)
qemu-iotests: restrict 264 to qcow2 only
Revert "qemu-img: Check post-truncation size"
block: Pass truncate exact=true where reasonable
block: Let format drivers pass @exact
block: Evaluate @exact in protocol drivers
block: Add @exact parameter to bdrv_co_truncate()
block: Do not truncate file node when formatting
block/cor: Drop cor_co_truncate()
block: Handle filter truncation like native impl.
iotests: Test qcow2's snapshot table handling
iotests: Add peek_file* functions
qcow2: Fix v3 snapshot table entry compliancy
qcow2: Repair snapshot table with too many entries
qcow2: Fix overly long snapshot tables
qcow2: Keep track of the snapshot table length
qcow2: Fix broken snapshot table entries
qcow2: Add qcow2_check_fix_snapshot_table()
qcow2: Separate qcow2_check_read_snapshot_table()
qcow2: Write v3-compliant snapshot list on upgrade
qcow2: Put qcow2_upgrade() into its own function
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
libqos update with support for virtio 1.
Packed ring support for virtio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJdsuDvAAoJECgfDbjSjVRpIP8H/3rHSvZ5+MQGCFLI5GU8m3za
JSOaBSmtcj9KwrpibBfptSCJZNrG8EUVHyo+Z+pvGohXqDB8h9RyBfb6vID8jqzC
5wIzlNBP27F668MUBt2t7xSwK0PWO1QOpEKk6S4SJMpl51ea8ePlTH0jnLVfkaAN
hFKU1wqwc2gMyF9rDjOZ6I+OO1iQbMcrsazFrCXECXCkxDcJM0ey7MheKxVntTjt
0sxFHM2I1A+vXtAzlLo6rS3I9vJ0ATfLfOlZLqrq5uSAL5FKrqsbmGh4sAsFTQAA
eerR6zDz3X+YqfQaVgVk2wixPHQz2w8Rv68j6SiGrdZ29/JT6nVWHT8cGtPsX4c=
=iJuG
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio: features, tests
libqos update with support for virtio 1.
Packed ring support for virtio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 25 Oct 2019 12:47:59 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (25 commits)
virtio: drop unused virtio_device_stop_ioeventfd() function
libqos: add VIRTIO PCI 1.0 support
libqos: extract Legacy virtio-pci.c code
libqos: make the virtio-pci BAR index configurable
libqos: expose common virtqueue setup/cleanup functions
libqos: add MSI-X callbacks to QVirtioPCIDevice
libqos: pass full QVirtQueue to set_queue_address()
libqos: add iteration support to qpci_find_capability()
libqos: access VIRTIO 1.0 vring in little-endian
libqos: implement VIRTIO 1.0 FEATURES_OK step
libqos: enforce Device Initialization order
libqos: add missing virtio-9p feature negotiation
tests/virtio-blk-test: set up virtqueue after feature negotiation
virtio-scsi-test: add missing feature negotiation
libqos: extend feature bits to 64-bit
libqos: read QVIRTIO_MMIO_VERSION register
tests/virtio-blk-test: read config space after feature negotiation
virtio: add property to enable packed virtqueue
vhost_net: enable packed ring support
virtio: event suppression support for packed ring
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
endof() is a useful macro, we can make use of it outside of virtio.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
QEMU does not wait for completed I/O requests, assuming that the guest
driver will reset the device before calling unrealize(). This does not
happen on Windows, and QEMU crashes in virtio_notify(), getting the
result of a completed I/O request on hot-unplugged device.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20191018142856.31870-1-jusual@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch implements basic support for the packed virtqueue. Compare
the split virtqueue which has three rings, packed virtqueue only have
one which is supposed to have better cache utilization and more
hardware friendly.
Please refer virtio specification for more information.
Signed-off-by: Wei Xu <wexu@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20191025083527.30803-6-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The function virtio_del_queue was not called at unrealize() callback.
This was detected due to add an allocated element on the vq introduce
in future commits (used_elems) and running address sanitizer memory
leak detector.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20191025083527.30803-4-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20190925143248.10000-20-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
virtio_notify_config() needs to acquire the global mutex, which isn't
allowed from an iothread, and may lead to a deadlock like this:
- main thead
* Has acquired: qemu_global_mutex.
* Is trying the acquire: iothread AioContext lock via
AIO_WAIT_WHILE (after aio_poll).
- iothread
* Has acquired: AioContext lock.
* Is trying to acquire: qemu_global_mutex (via
virtio_notify_config->prepare_mmio_access).
If virtio_blk_resize() is called from an iothread, schedule
virtio_notify_config() to be run in the main context BH.
[Removed unnecessary newline as suggested by Kevin Wolf
<kwolf@redhat.com>.
--Stefan]
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20190916112411.21636-1-slp@redhat.com
Message-Id: <20190916112411.21636-1-slp@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When a frontend gracefully disconnects from an offline backend, it will
set its own state to XenbusStateClosed. The code in xen-block.c correctly
deals with this and sets the backend into XenbusStateClosed. Unfortunately
it is possible for toolstack to actually delete the frontend area
before the state key has been read, leading to an apparent frontend state
of XenbusStateUnknown. This prevents the backend state from transitioning
to XenbusStateClosed and hence leaves it limbo.
This patch simply treats a frontend state of XenbusStateUnknown the same
as XenbusStateClosed, which will unblock the backend in these circumstances.
Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20190918115702.38959-1-paul.durrant@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Same rational as: e6cc11d64f
Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
and event without a physical address. This can cause
vhost_verify_ring_part_mapping to return ENOMEM, causing
the following logs:
qemu-system-x86_64: Unable to map available ring for ring 0
qemu-system-x86_64: Verify ring failure on region 0
This has already been fixed for vhost scsi devices and was
recently vhost-user scsi devices. This commit fixes it for
vhost-user-blk devices.
Suggested-by: Phillippe Mathieu-Daude <philmd@redhat.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <1566498865-55506-1-git-send-email-raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
When 'system_reset' is called, the main loop clear the memory
region cache before the BH has a chance to execute. Later when
the deferred function is called, some assumptions that were
made when scheduling them are no longer true when they actually
execute.
This is what happens using a virtio-blk device (fresh RHEL7.8 install):
$ (sleep 12.3; echo system_reset; sleep 12.3; echo system_reset; sleep 1; echo q) \
| qemu-system-x86_64 -m 4G -smp 8 -boot menu=on \
-device virtio-blk-pci,id=image1,drive=drive_image1 \
-drive file=/var/lib/libvirt/images/rhel78.qcow2,if=none,id=drive_image1,format=qcow2,cache=none \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
-netdev tap,id=net0,script=/bin/true,downscript=/bin/true,vhost=on \
-monitor stdio -serial null -nographic
(qemu) system_reset
(qemu) system_reset
(qemu) qemu-system-x86_64: hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
Aborted
(gdb) bt
Thread 1 (Thread 0x7f109c17b680 (LWP 10939)):
#0 0x00005604083296d1 in vring_get_region_caches (vq=0x56040a24bdd0) at hw/virtio/virtio.c:227
#1 0x000056040832972b in vring_avail_flags (vq=0x56040a24bdd0) at hw/virtio/virtio.c:235
#2 0x000056040832d13d in virtio_should_notify (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1648
#3 0x000056040832d1f8 in virtio_notify_irqfd (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1662
#4 0x00005604082d213d in notify_guest_bh (opaque=0x56040a243ec0) at hw/block/dataplane/virtio-blk.c:75
#5 0x000056040883dc35 in aio_bh_call (bh=0x56040a243f10) at util/async.c:90
#6 0x000056040883dccd in aio_bh_poll (ctx=0x560409161980) at util/async.c:118
#7 0x0000560408842af7 in aio_dispatch (ctx=0x560409161980) at util/aio-posix.c:460
#8 0x000056040883e068 in aio_ctx_dispatch (source=0x560409161980, callback=0x0, user_data=0x0) at util/async.c:261
#9 0x00007f10a8fca06d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#10 0x0000560408841445 in glib_pollfds_poll () at util/main-loop.c:215
#11 0x00005604088414bf in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
#12 0x00005604088415c4 in main_loop_wait (nonblocking=0) at util/main-loop.c:514
#13 0x0000560408416b1e in main_loop () at vl.c:1923
#14 0x000056040841e0e8 in main (argc=20, argv=0x7ffc2c3f9c58, envp=0x7ffc2c3f9d00) at vl.c:4578
Fix this by cancelling the BH when the virtio dataplane is stopped.
[This is version of the patch was modified as discussed with Philippe on
the mailing list thread.
--Stefan]
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: https://bugs.launchpad.net/qemu/+bug/1839428
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190816171503.24761-1-philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdVnduAAoJEH8JsnLIjy/W0IgQAKft/M3aDgt0sbTzQh8vdy6A
yAfTnnSL4Z56+8qAsqhEnplC3rZxvTkg9AGOoNYHOZKl3FgRH9r8g9/Enemh4fWu
MH52hiRf2ytlFVurIQal3aj9O+i0YTnzuvYbysvkH4ID5zbv2QnwdagtEcBxbbYL
NZTMZBynDzp4rKIZ7p6T/kkaklLHh4vZrjW+Mzm3LQx9JJr8TwVNqqetSfc4VKIJ
ByaNbbihDUVjQyIaJ24DXXJdzonGrrtSbSZycturc5FzXymzSRgrXZCeSKCs8X+i
fjwMXH5v4/UfK511ILsXiumeuxBfD2Ck4sAblFxVo06oMPRNmsAKdRLeDByE7IC1
lWep/pB3y/au9CW2/pkWJOiaz5s5iuv2fFYidKUJ0KQ1dD7G8M9rzkQlV3FUmTZO
jBKSxHEffXsYl0ojn0vGmZEd7FAPi3fsZibGGws1dVgxlWI93aUJsjCq0E+lHIRD
hEmQcjqZZa4taKpj0Y3Me05GkL7tH6RYA153jDNb8rPdzriGRCLZSObEISrOJf8H
Mh0gTLi8KJNh6bULd12Ake1tKn7ZeTXpHH+gadz9OU7eIModh1qYTSHPlhy5oAv0
Hm9BikNlS1Hzw+a+EbLcOW7TrsteNeGr7r8T6QKPMq1sfsYcp3svbC2c+zVlQ6Ll
mLoTssksXOkgBevVqSiS
=T7L5
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- file-posix: Fix O_DIRECT alignment detection
- Fixes for concurrent block jobs
- block-backend: Queue requests while drained (fix IDE vs. job crashes)
- qemu-img convert: Deprecate using -n and -o together
- iotests: Migration tests with filter nodes
- iotests: More media change tests
# gpg: Signature made Fri 16 Aug 2019 10:29:18 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
file-posix: Handle undetectable alignment
qemu-img convert: Deprecate using -n and -o together
block-backend: Queue requests while drained
mirror: Keep mirror_top_bs drained after dropping permissions
block: Remove blk_pread_unthrottled()
iotests: Add test for concurrent stream/commit
tests: Test mid-drain bdrv_replace_child_noperm()
tests: Test polling in bdrv_drop_intermediate()
block: Reduce (un)drains when replacing a child
block: Keep subtree drained in drop_intermediate
block: Simplify bdrv_filter_default_perms()
iotests: Test migration with all kinds of filter nodes
iotests: Move migration helpers to iotests.py
iotests/118: Add -blockdev based tests
iotests/118: Create test classes dynamically
iotests/118: Test media change for scsi-cd
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>