Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.
Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.
This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.
Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The virtio-scsi Host Bus Adapter provides access to devices on a SCSI
bus. Those SCSI devices typically have a BlockBackend. When the
BlockBackend enters a drained section, the SCSI device must temporarily
stop submitting new I/O requests.
Implement this behavior by temporarily stopping virtio-scsi virtqueue
processing when one of the SCSI devices enters a drained section. The
new scsi_device_drained_begin() API allows scsi-disk to message the
virtio-scsi HBA.
scsi_device_drained_begin() uses a drain counter so that multiple SCSI
devices can have overlapping drained sections. The HBA only sees one
pair of .drained_begin/end() calls.
After this commit, virtio-scsi no longer depends on hw/virtio's
ioeventfd aio_set_event_notifier(is_external=true). This commit is a
step towards removing the aio_disable_external() API.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-19-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio_queue_aio_detach_host_notifier() does two things:
1. It removes the fd handler from the event loop.
2. It processes the virtqueue one last time.
The first step can be peformed by any thread and without taking the
AioContext lock.
The second step may need the AioContext lock (depending on the device
implementation) and runs in the thread where request processing takes
place. virtio-blk and virtio-scsi therefore call
virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in
AioContext.
The next patch will introduce a .drained_begin() function that needs to
call virtio_queue_aio_detach_host_notifier(). .drained_begin() functions
cannot call aio_poll() to wait synchronously for the BH. It is possible
for a .drained_poll() callback to asynchronously wait for the BH, but
that is more complex than necessary here.
Move the virtqueue processing out to the callers of
virtio_queue_aio_detach_host_notifier() so that the function can be
called from any thread. This is in preparation for the next patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-17-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.
The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.
Ensure virtio_scsi_hotunplug() is safe as follows:
1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
device_set_realized() calls qatomic_set(&dev->realized, false) so
that future scsi_device_get() calls return NULL because they exclude
SCSIDevices with realized=false.
That means virtio-scsi will reject new I/O requests to this
SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
virtio_scsi_hotunplug() is still executing. We are protected against
new requests!
2. scsi_qdev_unrealize() already contains a call to
scsi_device_purge_requests() so that in-flight requests are cancelled
synchronously. This ensures that no in-flight requests remain once
qdev_simple_device_unplug_cb() returns.
Thanks to these two conditions we don't need aio_disable_external()
anymore.
Cc: Zhengui Li <lizhengui@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().
qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.
scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.
Change virtio_scsi_push_event() to take event information as an argument
instead of the SCSIDevice. This allows virtio_scsi_hotunplug() to emit a
VIRTIO_SCSI_T_TRANSPORT_RESET event after the SCSIDevice has already
been unrealized.
These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a helper function to check whether the device is realized without
requiring the Big QEMU Lock. The next patch adds a second caller. The
goal is to avoid spreading DeviceState field accesses throughout the
code.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We cannot use the generic reentrancy guard in the LSI code, so
we have to manually prevent endless reentrancy here. The problematic
lsi_execute_script() function has already a way to detect whether
too many instructions have been executed - we just have to slightly
change the logic here that it also takes into account if the function
has been called too often in a reentrant way.
The code in fuzz-lsi53c895a-test.c has been taken from an earlier
patch by Mauro Matteo Cascella.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1563
Message-Id: <20230522091011.1082574-1-thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
While trying to use a SCSI disk on the LSI controller with an
older version of Fedora (25), I'm getting:
qemu: warning: Blocked re-entrant IO on MemoryRegion: lsi-mmio at addr: 0x34
and the SCSI controller is not usable. Seems like we have to
disable the reentrancy checker for the MMIO region, too, to
get this working again.
The problem could be reproduced it like this:
./qemu-system-x86_64 -accel kvm -m 2G -machine q35 \
-device lsi53c810,id=lsi1 -device scsi-hd,drive=d0 \
-drive if=none,id=d0,file=.../somedisk.qcow2 \
-cdrom Fedora-Everything-netinst-i386-25-1.3.iso
Where somedisk.qcow2 is an image that contains already some partitions
and file systems.
In the boot menu of Fedora, go to
"Troubleshooting" -> "Rescue a Fedora system" -> "3) Skip to shell"
Then check "dmesg | grep -i 53c" for failure messages, and try to mount
a partition from somedisk.qcow2.
Message-Id: <20230516090556.553813-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Using linux 6.x guest, at boot time, an inquiry on a scsi-generic
device makes qemu crash. This is caused by a buffer overflow when
scsi-generic patches the block limits VPD page.
Do the operations on a temporary on-stack buffer that is guaranteed
to be large enough.
Reported-by: Théo Maillart <tmaillart@freebox.fr>
Analyzed-by: Théo Maillart <tmaillart@freebox.fr>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no need for the AioContext lock in aio_wait_bh_oneshot().
It's easy to remove the lock from existing callers and then switch from
AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot().
Document that the AioContext lock should not be held across
aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause
deadlock so we don't want callers to do that.
This is a step towards getting rid of the AioContext lock.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230404153307.458883-1-stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As the code is designed to use the memory APIs to access the script ram,
disable reentrancy checks for the pseudo-RAM ram_io MemoryRegion.
In the future, ram_io may be converted from an IO to a proper RAM MemoryRegion.
Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-Id: <20230427211013.2994127-6-alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This protects devices from bh->mmio reentrancy issues.
Thanks: Thomas Huth <thuth@redhat.com> for diagnosing OS X test failure.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-5-alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Bring the files in line with the QEMU coding style, with spaces
for indentation.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/378
Signed-off-by: Yeqi Fu <fufuyqqqqqq@gmail.com>
Message-Id: <20230315032649.57568-1-fufuyqqqqqq@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The Free Software Foundation moved to a new address and some
sources in QEMU referred to their old location.
The address should be updated and replaced by a pointer to
<https://www.gnu.org/licenses/>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/379
Signed-off-by: Khadija Kamran <kkamran.bese16seecs@seecs.edu.pk>
Message-Id: <576ee9203fdac99d7251a98faa66b9ce1e7febc5.1675941486.git.kkamran.bese16seecs@seecs.edu.pk>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When an IOThread is configured, the ctrl virtqueue is processed in the
IOThread. TMFs that reset SCSI devices are currently called directly
from the IOThread and trigger an assertion failure in blk_drain() from
the following call stack:
virtio_scsi_handle_ctrl_req -> virtio_scsi_do_tmf -> device_code_reset
-> scsi_disk_reset -> scsi_device_purge_requests -> blk_drain
../block/block-backend.c:1780: void blk_drain(BlockBackend *): Assertion `qemu_in_main_thread()' failed.
The blk_drain() function is not designed to be called from an IOThread
because it needs the Big QEMU Lock (BQL).
This patch defers TMFs that reset SCSI devices to a Bottom Half (BH)
that runs in the main loop thread under the BQL. This way it's safe to
call blk_drain() and the assertion failure is avoided.
Introduce s->tmf_bh_list for tracking TMF requests that have been
deferred to the BH. When the BH runs it will grab the entire list and
process all requests. Care must be taken to clear the list when the
virtio-scsi device is reset or unrealized. Otherwise deferred TMF
requests could execute later and lead to use-after-free or other
undefined behavior.
The s->resetting counter that's used by TMFs that reset SCSI devices is
accessed from multiple threads. This patch makes that explicit by using
atomic accessor functions. With this patch applied the counter is only
modified by the main loop thread under the BQL but can be read by any
thread.
Reported-by: Qing Wang <qinwang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
dma_blk_cb() only takes the AioContext lock around ->io_func(). That
means the rest of dma_blk_cb() is not protected. In particular, the
DMAAIOCB field accesses happen outside the lock.
There is a race when the main loop thread holds the AioContext lock and
invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
field determines how cancellation proceeds. If dma_aio_cancel() sees
dbs->acb == NULL while dma_blk_cb() is still running, the request can be
completed twice (-ECANCELED and the actual return value).
The following assertion can occur with virtio-scsi when an IOThread is
used:
../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.
Fix the race by holding the AioContext across dma_blk_cb(). Now
dma_aio_cancel() under the AioContext lock will not see
inconsistent/intermediate states.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If requests are being processed in the IOThread when a SCSIDevice is
unplugged, scsi_device_purge_requests() -> scsi_req_cancel_async() races
with I/O completion callbacks. Both threads load and store req->aiocb.
This can lead to assert(r->req.aiocb == NULL) failures and undefined
behavior.
Protect r->req.aiocb with the AioContext lock to prevent the race.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriver->bdrv_getlength is categorized as IO callback, and it
currently doesn't run in a coroutine. We should let it take a graph
rdlock since the callback traverses the block nodes graph, which however
is only possible in a coroutine.
Therefore turn it into a co_wrapper to move the actual function into a
coroutine where the lock can be taken.
Because now this function creates a new coroutine and polls, we need to
take the AioContext lock where it is missing, for the only reason that
internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the
AioContext lock.
This is especially messy when a co_wrapper creates a coroutine and polls
in bdrv_open_driver, because this function has so many callers in so
many context that it can easily lead to deadlocks. Therefore the new
rule for bdrv_open_driver is that the caller must always hold the
AioContext lock of the given bs (except if it is a coroutine), because
the function calls bdrv_refresh_total_sectors() which is now a
co_wrapper.
Once the rwlock is ultimated and placed in every place it needs to be,
we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext
lock.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230113204212.359076-7-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is below memleak detected when to quit the qemu-system-x86_64 (with
vhost-scsi-pci).
(qemu) quit
=================================================================
==15568==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f00aec57917 in __interceptor_calloc (/lib64/libasan.so.6+0xb4917)
#1 0x7f00ada0d7b5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x517b5)
#2 0x5648ffd38bac in vhost_scsi_start ../hw/scsi/vhost-scsi.c:92
#3 0x5648ffd38d52 in vhost_scsi_set_status ../hw/scsi/vhost-scsi.c:131
#4 0x5648ffda340e in virtio_set_status ../hw/virtio/virtio.c:2036
#5 0x5648ff8de281 in virtio_ioport_write ../hw/virtio/virtio-pci.c:431
#6 0x5648ff8deb29 in virtio_pci_config_write ../hw/virtio/virtio-pci.c:576
#7 0x5648ffe5c0c2 in memory_region_write_accessor ../softmmu/memory.c:493
#8 0x5648ffe5c424 in access_with_adjusted_size ../softmmu/memory.c:555
#9 0x5648ffe6428f in memory_region_dispatch_write ../softmmu/memory.c:1515
#10 0x5648ffe8613d in flatview_write_continue ../softmmu/physmem.c:2825
#11 0x5648ffe86490 in flatview_write ../softmmu/physmem.c:2867
#12 0x5648ffe86d9f in address_space_write ../softmmu/physmem.c:2963
#13 0x5648ffe86e57 in address_space_rw ../softmmu/physmem.c:2973
#14 0x5648fffbfb3d in kvm_handle_io ../accel/kvm/kvm-all.c:2639
#15 0x5648fffc0e0d in kvm_cpu_exec ../accel/kvm/kvm-all.c:2890
#16 0x5648fffc90a7 in kvm_vcpu_thread_fn ../accel/kvm/kvm-accel-ops.c:51
#17 0x56490042400a in qemu_thread_start ../util/qemu-thread-posix.c:505
#18 0x7f00ac3b6ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
Free the vsc->inflight at the 'stop' path.
Fixes: b82526c7ee ("vhost-scsi: support inflight io track")
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Li Feng <fengli@smartx.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20230104160433.21353-1-dongli.zhang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCIDeviceClass and PCIDevice are defined in pci.h. Many users of the
header don't actually need them. Similar structs live in their own
headers: PCIBusClass and PCIBus in pci_bus.h, PCIBridge in
pci_bridge.h, PCIHostBridgeClass and PCIHostState in pci_host.h,
PCIExpressHost in pcie_host.h, and PCIERootPortClass, PCIEPort, and
PCIESlot in pcie_port.h.
Move PCIDeviceClass and PCIDeviceClass to new pci_device.h, along with
the code that needs them. Adjust include directives.
This also enables the next commit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221222100330.380143-6-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
hw/pci/pci_bridge.h and hw/cxl/cxl.h include each other.
Fortunately, breaking the loop is merely a matter of deleting
unnecessary includes from headers, and adding them back in places
where they are now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221222100330.380143-2-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c
But most didn't do that, so we would leave the vrings disabled and some
backends would not work. We observed this issue with the rust version of
virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate where requests are not processed if vring is
not enabled.
Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly. Same thing
also in vhost_dev_stop() where we disable vrings.
[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-3-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
enabled VIRTIO_F_RING_RESET by default for all virtio devices.
This feature is not currently emulated by QEMU, so for vhost and
vhost-user devices we need to make sure it is supported by the offloaded
device emulation (in-kernel or in another process).
To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
passed to vhost_get_features(). This way it will be masked if the device
does not support it.
This issue was initially discovered with vhost-vsock and vhost-user-vsock,
and then also tested with vhost-user-rng which confirmed the same issue.
They fail when sending features through VHOST_SET_FEATURES ioctl or
VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
by the guest (Linux >= v6.0), but not supported by the device.
Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221121101101.29400-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Those typos are in files which are used to generate the QEMU manual.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20221110190825.879620-1-sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
[thuth: update sentence in can.rst as suggested by Peter]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Currently the vwm_pvscsi controller resets individual SCSI devices
with the device_legacy_reset() function. The only difference between
this and device_cold_reset() is that device_legacy_reset() resets the
device but not any child qbuses it might have.
In this case, no SCSI device has a child qbus, so the functions have
the same behaviour. Switch to device_cold_reset() to move away from
using the deprecated function, and bring this SCSI controller in to
line with what all the others do.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20221013160623.1296109-3-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the SCSI subsystem we currently use the legacy functions
qdev_reset_all() and qbus_reset_all(). These perform a recursive
reset, starting from either a qbus or a qdev. However they do not
permit any of the devices in the tree to use three-phase reset,
because device reset goes through the device_legacy_reset() function
that only calls the single DeviceClass::reset method.
Switch to using the device_cold_reset() and bus_cold_reset()
functions. These also perform a recursive reset, where first the
children are reset and then finally the parent, but they use the new
(...in 2020...) Resettable mechanism, which supports both the old
style single-reset method and also the new 3-phase reset handling.
Since no devices attached to SCSI buses currently try to use 3-phase
reset, this should be a no-behaviour-change commit which just reduces
the use of a deprecated API.
Commit created with:
sed -i -e 's/qdev_reset_all/device_cold_reset/g;s/qbus_reset_all/bus_cold_reset/g' hw/scsi/*.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20221013160623.1296109-2-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Section 5.6.6.3 of VirtIO specification states, "Events will also
be reported via sense codes..." However, no sense data is sent when
VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
are reported (when disk hotplug/hotunplug events occur). SCSI layer
on Solaris depends on this sense data, and hence does not handle disk
hotplug/hotunplug events.
When the disk inventory changes, use the bus unit attention mechanism
to return a CHECK_CONDITION status with sense data of 0x06/0x3F/0x0E
(sense code REPORTED_LUNS_CHANGED). The first device on the bus to
execute a command successfully will report and consume the unit
attention status.
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
Message-Id: <20221006194946.24134-1-venu.busireddy@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SunOS expects CD-ROM devices to have a block size of 512, and will
fail to mount or install using QEMU's default block size of 2048.
When initializing the SCSI device, allow the `physical_block_size'
block device option to override the default block size.
Signed-off-by: John Millikin <john@john-millikin.com>
Message-Id: <20220804122950.1577012-1-john@john-millikin.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The `started` field is manipulated internally within the vhost code
except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
dev state in the vhost_migration_log routine). Mark that as a FIXME
because it introduces a potential race. I think the referenced fix
should be tracking its state locally.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220802095010.3330793-12-alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwittz@nutanix.com>
In scsi_req_parse_cdb(), if the CDB length implied by the command type
exceeds the initialized portion of the command buffer, reject the request.
Rejected requests are recorded by the `scsi_req_parse_bad` trace event.
On example of a bug detected by this check is SunOS's use of interleaved
DMA and non-DMA commands. This guest behavior currently causes QEMU to
parse uninitialized memory as a SCSI command, with unpredictable
outcomes.
With the new check in place:
* QEMU consistently creates a trace event and rejects the request.
* SunOS retries the request(s) and is able to successfully boot from
disk.
Signed-off-by: John Millikin <john@john-millikin.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1127
Message-Id: <20220817053458.698416-2-john@john-millikin.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a SCSI command is received from the guest, the CDB length implied
by the first byte might exceed the number of bytes the guest sent. In
this case scsi_req_new() will read uninitialized data, causing
unpredictable behavior.
Adds the buf_len parameter to scsi_req_new() and plumbs it through the
call stack.
Signed-off-by: John Millikin <john@john-millikin.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1127
Message-Id: <20220817053458.698416-1-john@john-millikin.com>
[Fill in correct length for adapters other than ESP. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Per investigation on the linked ticket, SunOS issues a SCSI bus reset
to the ESP as part of its boot sequence. If this ESP command doesn't
cause devices to assert sense flag UNIT ATTENTION, SunOS will consider
the CD-ROM device to be non-compliant with Common Command Set (CCS).
In this condition, the SunOS installer's early userspace doesn't set
the installation source location to sr0 and the miniroot copy fails.
Signed-off-by: John Millikin <john@john-millikin.com>
Suggested-by: Bill Paul <noisetube@gmail.com>
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1127
Message-Id: <20220817053846.699310-1-john@john-millikin.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits 01ef8185b8 amd 24b36e9813 updated the way that the maximum
transfer length is calculated for patching block limits VPD page in an
INQUIRY response.
The same updates also need to be made for the case where the host device
does not support the block limits VPD page at all and we emulate the
whole page.
Without this fix, on host block devices a maximum transfer length of
(INT_MAX - sector_size) bytes is advertised to the guest, resulting in
I/O errors when a request that exceeds the host limits is made by the
guest. (Prior to commit 24b36e9813, this code path would use the
max_transfer value from the host instead of INT_MAX, but still miss the
fix from 01ef8185b8 where max_transfer is also capped to max_iov
host pages, so it would be less wrong, but still wrong.)
Cc: qemu-stable@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2096251
Fixes: 01ef8185b8
Fixes: 24b36e9813
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220822125320.48257-1-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As soon as virtio_scsi_data_plane_start() attaches host notifiers the
IOThread may start virtqueue processing. There is a race between
IOThread virtqueue processing and virtio_scsi_data_plane_start() because
it only assigns s->dataplane_started after attaching host notifiers.
When a virtqueue handler function in the IOThread calls
virtio_scsi_defer_to_dataplane() it may see !s->dataplane_started and
attempt to start dataplane even though we're already in the IOThread:
#0 0x00007f67b360857c __pthread_kill_implementation (libc.so.6 + 0xa257c)
#1 0x00007f67b35bbd56 raise (libc.so.6 + 0x55d56)
#2 0x00007f67b358e833 abort (libc.so.6 + 0x28833)
#3 0x00007f67b358e75b __assert_fail_base.cold (libc.so.6 + 0x2875b)
#4 0x00007f67b35b4cd6 __assert_fail (libc.so.6 + 0x4ecd6)
#5 0x000055ca87fd411b memory_region_transaction_commit (qemu-kvm + 0x67511b)
#6 0x000055ca87e17811 virtio_pci_ioeventfd_assign (qemu-kvm + 0x4b8811)
#7 0x000055ca87e14836 virtio_bus_set_host_notifier (qemu-kvm + 0x4b5836)
#8 0x000055ca87f8e14e virtio_scsi_set_host_notifier (qemu-kvm + 0x62f14e)
#9 0x000055ca87f8dd62 virtio_scsi_dataplane_start (qemu-kvm + 0x62ed62)
#10 0x000055ca87e14610 virtio_bus_start_ioeventfd (qemu-kvm + 0x4b5610)
#11 0x000055ca87f8c29a virtio_scsi_handle_ctrl (qemu-kvm + 0x62d29a)
#12 0x000055ca87fa5902 virtio_queue_host_notifier_read (qemu-kvm + 0x646902)
#13 0x000055ca882c099e aio_dispatch_handler (qemu-kvm + 0x96199e)
#14 0x000055ca882c1761 aio_poll (qemu-kvm + 0x962761)
#15 0x000055ca880e1052 iothread_run (qemu-kvm + 0x782052)
#16 0x000055ca882c562a qemu_thread_start (qemu-kvm + 0x96662a)
This patch assigns s->dataplane_started before attaching host notifiers
so that virtqueue handler functions that run in the IOThread before
virtio_scsi_data_plane_start() returns correctly identify that dataplane
does not need to be started. This fix is taken from the virtio-blk
dataplane code and it's worth adding a comment in virtio-blk as well to
explain why it works.
Note that s->dataplane_started does not need the AioContext lock because
it is set before attaching host notifiers and cleared after detaching
host notifiers. In other words, the IOThread always sees the value true
and the main loop thread does not modify it while the IOThread is
active.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099541
Reported-by: Qing Wang <qinwang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220808162134.240405-1-stefanha@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The existing code assumes that the block size can be generated from p[1] << 8
in multiple places which ignores the top and bottom 8 bits. If the block size
is allowed to be set to an arbitrary value then this causes a mismatch
between the value written by the guest in the block descriptor and the value
subsequently read back using READ CAPACITY causing the guest to generate
requests that can crash QEMU.
For now restrict block size changes to bits 8-15 and also ignore requests to
set the block size to 0 which causes the SCSI emulation to crash in at least
one place with a divide by zero error.
Fixes: 356c4c441e ("scsi-disk: allow MODE SELECT block descriptor to set the block size")
Closes: https://gitlab.com/qemu-project/qemu/-/issues/1112
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220730122656.253448-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In scsi_disk_emulate_write_same() the number of host sectors to transfer is
calculated as (s->qdev.blocksize / BDRV_SECTOR_SIZE) which is then used to
copy data in block size chunks to the iov buffer.
Since the loop copying the data to the iov buffer uses a fixed increment of
s->qdev.blocksize then using a block size that isn't a multiple of
BDRV_SECTOR_SIZE introduces a rounding error in the iov buffer size calculation
such that the iov buffer copy overflows the space allocated.
Update the iov buffer copy for() loop so that it will use the smallest of either
the current block size or the remaining transfer count to prevent the overflow.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220730122656.253448-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The MODE SELECT command can contain an optional block descriptor that can be used
to set the device block size. If the block descriptor is present then update the
block size on the SCSI device accordingly.
This allows CDROMs to be used with A/UX which requires a CDROM drive which is
capable of switching from a 2048 byte sector size to a 512 byte sector size.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A/UX sends a MODE_PAGE_R_W_ERROR command with the AWRE bit set to 0 when enumerating
CDROM drives. Since the bit is currently hardcoded to 1 then indicate that the AWRE
bit can be changed (even though we don't care about the value) so that
the MODE_PAGE_R_W_ERROR page can be set successfully.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When A/UX configures the CDROM device it sends a truncated MODE SELECT request
for page 1 (MODE_PAGE_R_W_ERROR) which is only 6 bytes in length rather than
10. This seems to be due to bug in Apple's code which calculates the CDB message
length incorrectly.
The work at [1] suggests that this truncated request is accepted on real
hardware whereas in QEMU it generates an INVALID_PARAM_LEN sense code which
causes A/UX to get stuck in a loop retrying the command in an attempt to succeed.
Alter the mode page request length check so that truncated requests are allowed
if the SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk is enabled, whilst also adding a
trace event to enable the condition to be detected.
[1] https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When initialising a drive ready to install MacOS, Apple HD SC Setup first attempts
to format the drive. Add a simple FORMAT UNIT command which simply returns success
to allow the format to succeed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220622105314.802852-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both MacOS and A/UX make use of vendor-specific MODE SELECT commands with PF=0
to identify SCSI devices:
- MacOS sends a MODE SELECT command with PF=0 for the MODE_PAGE_VENDOR_SPECIFIC
(0x0) mode page containing 2 bytes before initialising a disk
- A/UX (installed on disk) sends a MODE SELECT command with PF=0 during SCSI
bus enumeration, and gets stuck in an infinite loop if it fails
Add a new SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk to allow both
PF=0 MODE SELECT commands and implement a MODE_PAGE_VENDOR_SPECIFIC (0x0)
mode page which is compatible with MacOS.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During SCSI bus enumeration A/UX sends a MODE SENSE command to the CDROM with
the DBD bit unset and expects the response to include a block descriptor. As per
the latest SCSI documentation, QEMU currently force-disables the block
descriptor for CDROM devices but the A/UX driver expects the requested block
descriptor to be returned.
If the block descriptor is not returned in the response then A/UX becomes
confused, since the block descriptor returned in the MODE SENSE response is
used to generate a subsequent MODE SELECT command which is then invalid.
Add a new SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk to allow this behaviour
to be enabled as required. Note that an additional workaround is required for
the previous SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk which must never
return a block descriptor even though the DBD bit is left unset.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One of the mechanisms MacOS uses to identify CDROM drives compatible with MacOS
is to send a custom MODE SELECT command for page 0x30 to the drive. The
response to this is a hard-coded manufacturer string which must match in order
for the CDROM to be usable within MacOS.
Add an implementation of the MODE SELECT page 0x30 response guarded by a newly
defined SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk bit so that CDROM drives
attached to non-Apple machines function exactly as before.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since the MacOS SCSI implementation is quite old (and Apple added some firmware
customisations to their drives for m68k Macs) there is need to add a mechanism
to correctly handle Apple-specific quirks.
Add a new quirks bitmap to SCSIDiskState that can be used to enable these
features as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set current_req to NULL, not current_req->req, to prevent reusing a free'd
buffer in case of repeated SCSI cancel requests. Also apply the fix to
CLEAR QUEUE and BUS DEVICE RESET messages as well, since they also cancel
the request.
Thanks to Alexander Bulekov for providing a reproducer.
Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20220711123316.421279-1-mcascell@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set current_req->req to NULL to prevent reusing a free'd buffer in case of
repeated SCSI cancel requests. Thanks to Thomas Huth for suggesting the patch.
Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220705200543.2366809-1-mcascell@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Inspired by Julia Lawall's fixing of Linux
kernel comments, I looked at qemu, although I did it manually.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20220614104045.85728-2-dgilbert@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Commit 1b7fd72955 ("block: rename buffer_alignment to
guest_block_size") noted:
At this point, the field is set by the device emulation, but completely
ignored by the block layer.
The last time the value of buffer_alignment/guest_block_size was
actually used was before commit 339064d506 ("block: Don't use guest
sector size for qemu_blockalign()").
This value has not been used since 2013. Get rid of it.
Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220518130945.2657905-1-stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Faria <afaria@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>