mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Maxim Levitsky	e87a09d625	block/nvme: add support for discard Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-3-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:35 +01:00
Maxim Levitsky	e0dd95e373	block/nvme: add support for write zeros Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:30 +01:00
Pavel Dovgalyuk	e4ec5ad464	replay: add BH oneshot event for block layer Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Max Reitz	1120407bdf	nvme: Limit blkshift to 12 (for 4 kB blocks) Linux does not support blocks greater than 4 kB anyway, so we might as well limit blkshift to 12 and thus save us from some potential trouble. Reported-by: Peter Maydell <peter.maydell@linaro.org> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730114812.10493-1-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Coverity: CID 1403771 Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-30 14:49:24 +02:00
Maxim Levitsky	258867d1dc	block/nvme: don't touch the completion entries Completion entries are meant to be only read by the host and written by the device. The driver is supposed to scan the completions from the last point where it left, and until it sees a completion with non flipped phase bit. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-4-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
Maxim Levitsky	118d1b6a81	block/nvme: support larger that 512 bytes sector devices Currently the driver hardcodes the sector size to 512, and doesn't check the underlying device. Fix that. Also fail if underlying nvme device is formatted with metadata as this needs special support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190716163020.13383-3-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
Maxim Levitsky	461bba04bf	block/nvme: fix doorbell stride Fix the math involving non standard doorbell stride Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-22 18:40:32 +02:00
Michal Privoznik	95667c3be0	nvme: Set number of queues later in nvme_init() When creating the admin queue in nvme_init() the variable that holds the number of queues created is modified before actual queue creation. This is a problem because if creating the queue fails then the variable is left in inconsistent state. This was actually observed when I tried to hotplug a nvme disk. The control got to nvme_file_open() which called nvme_init() which failed and thus nvme_close() was called which in turn called nvme_free_queue_pair() with queue being NULL. This lead to an instant crash: #0 0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164 #1 0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729 #2 0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781 #3 0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291 #4 0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551 #5 0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063 #6 0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712 #7 0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011 #8 0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156 #9 0x000055d9507cb635 in blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389 #10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-07-15 15:48:40 +02:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Max Reitz	cc61b0740f	block/nvme: Fix bdrv_refresh_filename() Currently, nvme's bdrv_refresh_filename() is an exact copy of null's implementation. However, for null, "null-co://" and "null-aio://" are indeed valid filenames -- for nvme, they are not, as a device address is still required. The correct implementation should generate a filename of the form "nvme://[PCI address]/[namespace]" (as the comment above nvme_parse_filename() describes). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-27-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	998b3a1e5a	block: Purify .bdrv_refresh_filename() Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Max Reitz	2654267cc1	block: Add strong_runtime_opts to BlockDriver This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-02-25 15:11:27 +01:00
Thomas Huth	83c68e149a	block/nvme: Remove QEMU_PACKED from naturally aligned NVMeRegs struct The QEMU_PACKED is causing a compiler warning/error with GCC 9: CC block/nvme.o block/nvme.c: In function ‘nvme_create_queue_pair’: block/nvme.c:209:22: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member] 209 \| q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale]; All members of the struct are naturally aligned, so there should not be the need for QEMU_PACKED here, and the following QEMU_BUILD_BUG_ON also ensures that there is no padding. Thus simply remove the QEMU_PACKED here. Buglink: https://bugs.launchpad.net/qemu/+bug/1817525 Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-02-25 15:09:48 +01:00
Kevin Wolf	4720cbeea1	block: Fix hangs in synchronous APIs with iothreads In the block layer, synchronous APIs are often implemented by creating a coroutine that calls the asynchronous coroutine-based implementation and then waiting for completion with BDRV_POLL_WHILE(). For this to work with iothreads (more specifically, when the synchronous API is called in a thread that is not the home thread of the block device, so that the coroutine will run in a different thread), we must make sure to call aio_wait_kick() at the end of the operation. Many places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if the condition has long become false. Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is generally not enough for most other operations because they haven't set the return value in the coroutine entry stub yet. To avoid race conditions there, we need to kick after setting the return value. The race window is small enough that the problem doesn't usually surface in the common path. However, it does surface and causes easily reproducible hangs if the operation can return early before even calling bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op success paths). The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is slightly different: These functions even neglected to schedule the coroutine in the home thread of the node. This avoids the hang, but is obviously wrong, too. Fix those to schedule the coroutine in the right AioContext in addition to adding aio_wait_kick() calls. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-01 13:46:44 +01:00
Li Feng	2916405a11	block/nvme: optimize the performance of nvme driver based on vfio-pci When the IO size is larger than 2 pages, we move the the pointer one by one in the pagelist, this is inefficient. This is a simple benchmark result: Before: $ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1 wrote 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec) $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1 read 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec) After: $ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1 wrote 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec) $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1 read 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec) Signed-off-by: Li Feng <lifeng1519@gmail.com> Message-Id: <20181101103807.25862-1-lifeng1519@gmail.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2019-01-09 09:38:34 +08:00
Paolo Bonzini	6388147296	nvme: correct locking around completion nvme_poll_queues is already protected by q->lock, and AIO callbacks are invoked outside the AioContext lock. So remove the acquire/release pair in nvme_handle_event. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180814062739.19640-1-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-10-12 09:46:14 +08:00
Paolo Bonzini	2f0d8947a6	nvme: simplify plug/unplug bdrv_io_plug/bdrv_io_unplug take care of keeping a nesting count, so change s->plugged to just a bool. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180813144320.12382-2-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-08-15 10:12:35 +08:00
Fam Zheng	9582f357bb	nvme: Fix nvme_init error handling It is wrong to leave this field as 1, as nvme_close() called in the error handling code in nvme_file_open() will use it and try to free s->queues again. Another problem is the cleaning ups are duplicated between the fail* labels of nvme_init() and nvme_file_open(), which calls nvme_close(). A third problem is nvme_close() misses g_free() and event_notifier_cleanup(). Fix all of them. Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180712025420.4932-1-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-08-15 10:12:35 +08:00
Marc-André Lureau	f5a74a5a50	qobject: Modify qobject_ref() to return obj For convenience and clarity, make it possible to call qobject_ref() at the time when the reference is associated with a variable, or argument, by making qobject_ref() return the same pointer as given. Use that to simplify the callers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-5-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Useless change to qobject_ref_impl() dropped, commit message improved slightly] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Marc-André Lureau	cb3e7f08ae	qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except I broke a long line, and added a cast in monitor_qmp_cleanup_req_queue_locked(). Unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, thus no need to shout them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, semantic conflict resolved, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Laurent Vivier	625eaca9e5	qdict: remove useless cast Re-run Coccinelle script scripts/coccinelle/qobject.cocci Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180323143202.28879-5-lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-27 10:17:32 -05:00
Eric Blake	e3efee828b	nvme: Drop pointless .bdrv_co_get_block_status() Commit `bdd6a90` has a bug: drivers should never directly set BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed). Instead, drivers should report BDRV_BLOCK_DATA if it knows that data comes from this BDS. But let's look at the bigger picture: semantically, the nvme driver is similar to the nbd, null, and raw drivers (no backing file, all data comes from this BDS). But while two of those other drivers have to supply the callback (null because it can special-case BDRV_BLOCK_ZERO, raw because it can special-case a different offset), in this case the block layer defaults are good enough without the callback at all (similar to nbd). So, fix the bug by deletion ;) Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2018-03-02 18:39:07 +01:00
Paolo Bonzini	78d8c99e29	block/nvme: fix Coverity reports 1) string not null terminated in sysfs_find_group_file 2) NULL pointer dereference and dead local variable in nvme_init. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180213015240.9352-1-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-03-01 15:21:46 +08:00
Markus Armbruster	922a01a013	Move include qemu/option.h from qemu-common.h to actual users qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit `bdd6a90a9e` in block/nvme.c resolved]	2018-02-09 13:52:16 +01:00
Fam Zheng	a3d9a352d4	block: Move NVMe constants to a separate header Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00
Fam Zheng	9ed616129e	block/nvme: Implement .bdrv_(un)register_buf Forward these two calls to the IOVA manager. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-6-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00
Fam Zheng	bdd6a90a9e	block: Add VFIO based NVMe driver This is a new protocol driver that exclusively opens a host NVMe controller through VFIO. It achieves better latency than linux-aio by completely bypassing host kernel vfs/block layer. $rw-$bs-$iodepth linux-aio nvme:// ---------------------------------------- randread-4k-1 10.5k 21.6k randread-512k-1 745 1591 randwrite-4k-1 30.7k 37.0k randwrite-512k-1 1945 1980 (unit: IOPS) The driver also integrates with the polling mechanism of iothread. This patch is co-authored by Paolo and me. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180116060901.17413-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2018-02-08 09:22:03 +08:00

28 Commits