Commit Graph

1598 Commits

Author SHA1 Message Date
Peter Maydell
718780d204 nvme queue
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmb7nokACgkQTeGvMW1P
 Del+2gf/YefiiYSL540C2QeYRwMFd6xFKKWYRRJoaARyLAoqInVdLiBql527Oov8
 rgDQq+D0XXP15CNDvfAZ59a36h1bAW79QCfKEUMSbP8GPeqb5pOSRfvYJSwnG1YX
 SC70vKLOrBhzxYiQYSOhLNKdbUM00OUyf2xibu0zk84UpkXtzSR4h/byFnQIHwEV
 /uUh4+cxY6eQK1Tfk/f66FEJLuJTchFOMVswYolDMezu2vJmToWHju/kpy2ugvaC
 +WEUEti8kL66/B399u1uwAad2OejC1Jf4qMFcFQJ9Cs9RV4HTC9byolceJE+1R0V
 CZt1SxvBNdK/ihs1iTjP7fInPqdYKw==
 =16tX
 -----END PGP SIGNATURE-----

Merge tag 'pull-nvme-20241001' of https://gitlab.com/birkelund/qemu into staging

nvme queue

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmb7nokACgkQTeGvMW1P
# Del+2gf/YefiiYSL540C2QeYRwMFd6xFKKWYRRJoaARyLAoqInVdLiBql527Oov8
# rgDQq+D0XXP15CNDvfAZ59a36h1bAW79QCfKEUMSbP8GPeqb5pOSRfvYJSwnG1YX
# SC70vKLOrBhzxYiQYSOhLNKdbUM00OUyf2xibu0zk84UpkXtzSR4h/byFnQIHwEV
# /uUh4+cxY6eQK1Tfk/f66FEJLuJTchFOMVswYolDMezu2vJmToWHju/kpy2ugvaC
# +WEUEti8kL66/B399u1uwAad2OejC1Jf4qMFcFQJ9Cs9RV4HTC9byolceJE+1R0V
# CZt1SxvBNdK/ihs1iTjP7fInPqdYKw==
# =16tX
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 01 Oct 2024 08:02:33 BST
# gpg:                using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9
# gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [full]
# gpg:                 aka "Klaus Jensen <k.jensen@samsung.com>" [full]
# Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468  4272 63D5 6FC5 E55D A838
#      Subkey fingerprint: 5228 33AA 75E2 DCE6 A247  66C0 4DE1 AF31 6D4F 0DE9

* tag 'pull-nvme-20241001' of https://gitlab.com/birkelund/qemu:
  hw/nvme: add atomic write support
  hw/nvme: add knob for CTRATT.MEM
  hw/nvme: support CTRATT.MEM
  hw/nvme: clear masked events from the aer queue
  hw/nvme: report id controller metadata sgl support

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-10-01 11:34:07 +01:00
Arun Kumar
a1ab67883d hw/nvme: support CTRATT.MEM
Indicate that 'MDTS and Size Limits Exclude Metadata (MEM)' in the
Controller Attributes (CTRATT) I/O Command Set Independent Identify
Controller Data Structure.

Signed-off-by: Arun Kumar <arun.kka@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
[k.jensen: updated commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-09-30 12:45:17 +02:00
Dr. David Alan Gilbert
e84af3eb72 block: Remove unused aio_task_pool_empty
aio_task_pool_empty has been unused since it was added in
  6e9b225f73 ("block: introduce aio task pool")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Message-Id: <20240917002007.330689-1-dave@treblig.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2024-09-30 10:53:18 +03:00
Fiona Ebner
9484ad6c17 copy-before-write: allow specifying minimum cluster size
In the context of backup fleecing, discarding the source will not work
when the fleecing image has a larger granularity than the one used for
block-copy operations (can happen if the backup target has smaller
cluster size), because cbw_co_pdiscard_snapshot() will align down the
discard requests and thus effectively ignore then.

To make @discard-source work in such a scenario, allow specifying the
minimum cluster size used for block-copy operations and thus in
particular also the granularity for discard requests to the source.

The type 'size' (corresponding to uint64_t in C) is used in QAPI to
rule out negative inputs and for consistency with already existing
@cluster-size parameters. Since block_copy_calculate_cluster_size()
uses int64_t for its result, a check that the input is not too large
is added in block_copy_state_new() before calling it. The calculation
in block_copy_calculate_cluster_size() is done in the target int64_t
type.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Markus Armbruster <armbru@redhat.com> (QAPI schema)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240711120915.310243-2-f.ebner@proxmox.com>
[vsementsov: switch version to 9.2 in QAPI doc]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2024-09-30 10:52:41 +03:00
Yoochan Jeong
7c85332a2b hw/ufs: minor bug fixes related to ufs-test
Minor bugs and errors related to ufs-test are resolved. Some
permissions and code implementations that are not synchronized
with the ufs spec are edited.

Signed-off-by: Yoochan Jeong <yc01.jeong@samsung.com>
Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
2024-09-06 18:04:16 +09:00
Eric Blake
c8a76dbd90 nbd/server: CVE-2024-7409: Cap default max-connections to 100
Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.

For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).

But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop.  Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention).  Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.

This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface.  On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.

This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections.  We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).

We considered using a deprecation period (declare that omitting
max-parameters is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage.  But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough.  It also has a drawback that any app (like libvirt) that
is accidentally relying on an unlimited default should seriously
consider their own CVE now, at which point they are going to change to
pass explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-parameters with a larger value.

It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>
2024-08-08 16:02:23 -05:00
Eric Blake
fb1c2aaa98 nbd/server: Plumb in new args to nbd_client_add()
Upcoming patches to fix a CVE need to track an opaque pointer passed
in by the owner of a client object, as well as request for a time
limit on how fast negotiation must complete.  Prepare for that by
changing the signature of nbd_client_new() and adding an accessor to
get at the opaque pointer, although for now the two servers
(qemu-nbd.c and blockdev-nbd.c) do not change behavior even though
they pass in a new default timeout value.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-11-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: s/LIMIT/MAX_SECS/ as suggested by Dan]
Signed-off-by: Eric Blake <eblake@redhat.com>
2024-08-08 15:05:27 -05:00
Kevin Wolf
7e17111646 block/graph-lock: Make WITH_GRAPH_RDLOCK_GUARD() fully checked
Upstream clang 18 (and backports to clang 17 in Fedora and RHEL)
implemented support for __attribute__((cleanup())) in its Thread Safety
Analysis, so we can now actually have a proper implementation of
WITH_GRAPH_RDLOCK_GUARD() that understands when we acquire and when we
release the lock.

-Wthread-safety is now only enabled if the compiler is new enough to
understand this pattern. In theory, we could have used some #ifdefs to
keep the existing basic checks on old compilers, but as long as someone
runs a newer compiler (and our CI does), we will catch locking problems,
so it's probably not worth keeping multiple implementations for this.

The implementation can't use g_autoptr any more because the glib macros
define wrapper functions that don't have the right TSA attributes, so
the compiler would complain about them. Just use the cleanup attribute
directly instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240627181245.281403-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-08-06 20:12:39 +02:00
Arun Kumar
d522aef88d hw/nvme: add cross namespace copy support
Extend copy command to copy user data across different namespaces via
support for specifying a namespace for each source range

Signed-off-by: Arun Kumar <arun.kka@samsung.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22 14:36:15 +02:00
Minwoo Im
6471556500 hw/nvme: add Identify Endurance Group List
Commit 73064edfb8 ("hw/nvme: flexible data placement emulation")
intorudced NVMe FDP feature to nvme-subsys and nvme-ctrl with a
single endurance group #1 supported.  This means that controller should
return proper identify data to host with Identify Endurance Group List
(CNS 19h).  But, yes, only just for the endurance group #1.  This patch
allows host applications to ask for which endurance group is available
and utilize FDP through that endurance group.

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-11 17:05:37 +02:00
Paolo Bonzini
44b424dc4a block: remove separate bdrv_file_open callback
bdrv_file_open and bdrv_open are completely equivalent, they are
never checked except to see which one to invoke.  So merge them
into a single one.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-28 14:44:51 +02:00
Prasad Pandit
24687abf23 linux-aio: add IO_CMD_FDSYNC command support
Libaio defines IO_CMD_FDSYNC command to sync all outstanding
asynchronous I/O operations, by flushing out file data to the
disk storage. Enable linux-aio to submit such aio request.

When using aio=native without fdsync() support, QEMU creates
pthreads, and destroying these pthreads results in TLB flushes.
In a real-time guest environment, TLB flushes cause a latency
spike. This patch helps to avoid such spikes.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Message-ID: <20240425070412.37248-1-ppandit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-06-10 11:05:43 +02:00
Stefan Hajnoczi
e669e800fc aio: warn about iohandler_ctx special casing
The main loop has two AioContexts: qemu_aio_context and iohandler_ctx.
The main loop runs them both, but nested aio_poll() calls on
qemu_aio_context exclude iohandler_ctx.

Which one should qemu_get_current_aio_context() return when called from
the main loop? Document that it's always qemu_aio_context.

This has subtle effects on functions that use
qemu_get_current_aio_context(). For example, aio_co_reschedule_self()
does not work when moving from iohandler_ctx to qemu_aio_context because
qemu_get_current_aio_context() does not differentiate these two
AioContexts.

Document this in order to reduce the chance of future bugs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240506190622.56095-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-06-10 11:05:43 +02:00
Minwoo Im
5c079578d2 hw/ufs: Add support MCQ of UFSHCI 4.0
This patch adds support for MCQ defined in UFSHCI 4.0.  This patch
utilized the legacy I/O codes as much as possible to support MCQ.

MCQ operation & runtime register is placed at 0x1000 offset of UFSHCI
register statically with no spare space among four registers (48B):

	UfsMcqSqReg, UfsMcqSqIntReg, UfsMcqCqReg, UfsMcqCqIntReg

The maxinum number of queue is 32 as per spec, and the default
MAC(Multiple Active Commands) are 32 in the device.

Example:
	-device ufs,serial=foo,id=ufs0,mcq=true,mcq-maxq=8

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com>
Message-Id: <20240528023106.856777-3-minwoo.im@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
2024-06-03 16:20:42 +09:00
Minwoo Im
cdba3b901a hw/ufs: Update MCQ-related fields to block/ufs.h
This patch is a prep patch for the following MCQ support patch for
hw/ufs.  This patch updated minimal mandatory fields to support MCQ
based on UFSHCI 4.0.

Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com>
Message-Id: <20240528023106.856777-2-minwoo.im@samsung.com>
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
2024-06-03 16:20:42 +09:00
Vladimir Sementsov-Ogievskiy
0fd05c8d80 qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful
in "push backup with fleecing" scheme, when source is snapshot-access
format driver node, based on copy-before-write filter snapshot-access
API:

[guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |            |
   | root       | file
   v            v
[copy-before-write]
   |             |
   | file        | target
   v             v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on source in
copy-before-write filter, for discard to work. Still we can't take it
unconditionally, as it will break normal backup from RO source. So, we
have to add a parameter and pass it thorough bdrv_open flags.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20240313152822.626493-5-vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2024-05-28 15:52:15 +03:00
Vladimir Sementsov-Ogievskiy
006e845b5a block/copy-before-write: create block_copy bitmap in filter node
Currently block_copy creates copy_bitmap in source node. But that is in
bad relation with .independent_close=true of copy-before-write filter:
source node may be detached and removed before .bdrv_close() handler
called, which should call block_copy_state_free(), which in turn should
remove copy_bitmap.

That's all not ideal: it would be better if internal bitmap of
block-copy object is not attached to any node. But that is not possible
now.

The simplest solution is just create copy_bitmap in filter node, where
anyway two other bitmaps are created.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240313152822.626493-4-vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2024-05-28 15:52:15 +03:00
Hanna Czenczek
5bdbaebcce virtio: Re-enable notifications after drain
During drain, we do not care about virtqueue notifications, which is why
we remove the handlers on it.  When removing those handlers, whether vq
notifications are enabled or not depends on whether we were in polling
mode or not; if not, they are enabled (by default); if so, they have
been disabled by the io_poll_start callback.

Because we do not care about those notifications after removing the
handlers, this is fine.  However, we have to explicitly ensure they are
enabled when re-attaching the handlers, so we will resume receiving
notifications.  We do this in virtio_queue_aio_attach_host_notifier*().
If such a function is called while we are in a polling section,
attaching the notifiers will then invoke the io_poll_start callback,
re-disabling notifications.

Because we will always miss virtqueue updates in the drained section, we
also need to poll the virtqueue once after attaching the notifiers.

Buglink: https://issues.redhat.com/browse/RHEL-3934
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240202153158.788922-3-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 21:51:03 +01:00
Peter Krempa
72098a3aba stream: Allow users to request only format driver names in backing file format
Introduce a new flag 'backing-mask-protocol' for the block-stream QMP
command which instructs the internals to use 'raw' instead of the
protocol driver in case when a image is used without a dummy 'raw'
wrapper.

The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <bbee9a0a59748a8893289bf8249f568f0d587e62.1701796348.git.pkrempa@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Peter Krempa
4b028cbe75 commit: Allow users to request only format driver names in backing file format
Introduce a new flag 'backing-mask-protocol' for the block-commit QMP
command which instructs the internals to use 'raw' instead of the
protocol driver in case when a image is used without a dummy 'raw'
wrapper.

The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <2cb46e37093ce793ea1604abc8bbb90f4c8e434b.1701796348.git.pkrempa@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-01-26 11:16:58 +01:00
Paolo Bonzini
3cbc17ee92 io_uring: move LuringState typedef to block/aio.h
The LuringState typedef is defined twice, in include/block/raw-aio.h and
block/io_uring.c.  Move it in include/block/aio.h, which is included
everywhere the typedef is needed, since include/block/aio.h already has
to define the forward reference to the struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-18 10:43:14 +01:00
Stefan Hajnoczi
0b2675c473 Rename "QEMU global mutex" to "BQL" in comments and docs
The term "QEMU global mutex" is identical to the more widely used Big
QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Philippe Mathieu-Daudé
897a06c6d7 iothread: Remove unused Error** argument in aio_context_set_aio_params
aio_context_set_aio_params() doesn't use its undocumented
Error** argument. Remove it to simplify.

Note this removes a use of "unchecked Error**" in
iothread_set_aio_context_params().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231120171806.19361-1-philmd@linaro.org>
2024-01-08 10:45:34 -05:00
Stefan Hajnoczi
23c983c8f6 block: remove outdated AioContext locking comments
The AioContext lock no longer exists.

There is one noteworthy change:

  - * More specifically, these functions use BDRV_POLL_WHILE(bs), which
  - * requires the caller to be either in the main thread and hold
  - * the BlockdriverState (bs) AioContext lock, or directly in the
  - * home thread that runs the bs AioContext. Calling them from
  - * another thread in another AioContext would cause deadlocks.
  + * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
  + * the caller to be either in the main thread or directly in the home thread
  + * that runs the bs AioContext. Calling them from another thread in another
  + * AioContext would cause deadlocks.

I am not sure whether deadlocks are still possible. Maybe they have just
moved to the fine-grained locks that have replaced the AioContext. Since
I am not sure if the deadlocks are gone, I have kept the substance
unchanged and just removed mention of the AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
9f8d2fdcce aio: remove aio_context_acquire()/aio_context_release() API
Delete these functions because nothing calls these functions anymore.

I introduced these APIs in commit 98563fc3ec ("aio: add
aio_context_acquire() and aio_context_release()") in 2014. It's with a
sigh of relief that I delete these APIs almost 10 years later.

Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
understanding of where the code needed to go in order to remove the
limitations that the original dataplane and the IOThread/AioContext
approach that followed it.

Emanuele Giuseppe Esposito had the splendid determination to convert
large parts of the codebase so that they no longer needed the AioContext
lock. This was a painstaking process, both in the actual code changes
required and the iterations of code review that Emanuele eked out of
Kevin and me over many months.

Kevin Wolf tackled multitudes of graph locking conversions to protect
in-flight I/O from run-time changes to the block graph as well as the
clang Thread Safety Analysis annotations that allow the compiler to
check whether the graph lock is being used correctly.

And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
block layer :). Thank you to everyone who helped with this effort,
including Eric Blake, code reviewer extraordinaire, and others who I've
forgotten to mention.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-11-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
95bbddf9ad aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()
Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED() are equivalent.

A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-10-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
c43d5bc858 block: remove bdrv_co_lock()
The bdrv_co_lock() and bdrv_co_unlock() functions are already no-ops.
Remove them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231205182011.1976568-8-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
b49f4755c7 block: remove AioContext locking
This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.

There isn't a clean way to split this patch and the reviewers are likely
the same group of people, so I decided to do it in one patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-ID: <20231205182011.1976568-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
6bc30f1949 graph-lock: remove AioContext locking
Stop acquiring/releasing the AioContext lock in
bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
effect.

The distinction between bdrv_graph_wrunlock() and
bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
into one function.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-6-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Kevin Wolf
6bc0bcc89f block: Fix deadlocks in bdrv_graph_wrunlock()
bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that
have a nested event loop. Nested event loops can depend on other
iothreads making progress, so in order to allow them to make progress it
must not hold the AioContext lock of another thread while calling
aio_poll().

This introduces a @bs parameter to bdrv_graph_wrunlock() whose
AioContext is temporarily dropped (which matches bdrv_graph_wrlock()),
and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState
doesn't necessarily exist any more when unlocking.

This also requires a change to bdrv_schedule_unref(), which was relying
on the incorrectly taken lock. It needs to take the lock itself now.
While this is a separate bug, it can't be fixed a separate patch because
otherwise the intermediate state would either deadlock or try to release
a lock that we don't even hold.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231115172012.112727-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[kwolf: Fixed up bdrv_schedule_unref()]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-21 12:45:21 +01:00
Michael Tokarev
a4dbf3fecb include/block/ufs.h: spelling fix: setted
Fixes: bc4e68d362 "hw/ufs: Initial commit for emulated Universal-Flash-Storage"
Reviewed-by: Jeuk Kim <jeuk20.kim@samsung.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-15 12:06:04 +03:00
Kevin Wolf
1f051dcbdf block: Protect bs->file with graph_lock
Almost all functions that access bs->file already take the graph
lock now. Add locking to the remaining users and finally annotate the
struct field itself as protected by the graph lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-25-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-08 17:56:18 +01:00
Kevin Wolf
79a5586648 block: Add missing GRAPH_RDLOCK annotations
This adds GRAPH_RDLOCK to some driver callbacks that are already called
with the graph lock held, and which will need the annotation because
they access bs->file, but don't have it yet.

This also covers a few callbacks that were not marked GRAPH_RDLOCK
before, but where updating BlockDriver is trivially possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-21-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-08 17:56:17 +01:00
Kevin Wolf
e2dd273754 block: Introduce bdrv_co_change_backing_file()
bdrv_change_backing_file() is called both inside and outside coroutine
context. This makes it difficult for it to take the graph lock
internally. It also means that driver implementations need to be able to
run outside of coroutines, too. Switch it to the usual model with a
coroutine based implementation and a co_wrapper instead. The new
function is marked GRAPH_RDLOCK.

As the co_wrapper now runs the function in the AioContext of the node
(as it should always have done), this is not GLOBAL_STATE_CODE() any
more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-20-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-08 17:56:17 +01:00
Kevin Wolf
004915a96a block: Protect bs->backing with graph_lock
Almost all functions that access bs->backing already take the graph
lock now. Add locking to the remaining users and finally annotate the
struct field itself as protected by the graph lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-18-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-08 17:56:17 +01:00
Kevin Wolf
ccd6a37947 block: Mark bdrv_replace_node() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_node(). Its callers may already want
to hold the graph lock and so wouldn't be able to call functions that
take it internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-17-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:20 +01:00
Kevin Wolf
d0f9fd94d9 block: Mark bdrv_set_backing_hd_drained() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_set_backing_hd_drained(). Basically everthing
in the function needs the lock and its callers may already want to hold
the graph lock and so wouldn't be able to call functions that take it
internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-14-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:20 +01:00
Kevin Wolf
78a9c76eef block: Mark bdrv_cow_child() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_cow_child() need to hold a reader lock for the graph because it
accesses bs->backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-13-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
ec82cc41a7 block: Mark bdrv_filter_child() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_filter_child() need to hold a reader lock for the graph because it
accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-12-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
79bb762727 block: Mark bdrv_chain_contains() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_chain_contains() need to hold a reader lock for the graph because
it calls bdrv_filter_or_cow_bs(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-11-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
9275fc72bd block: Mark bdrv_(un)freeze_backing_chain() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_(un)freeze_backing_chain() need to hold a reader lock for the
graph because it calls bdrv_filter_or_cow_child(), which accesses
bs->file/backing.

Use the opportunity to make bdrv_is_backing_chain_frozen() static, it
has no external callers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-10-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
ad74751fc0 block: Mark bdrv_skip_filters() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_skip_filters() need to hold a reader lock for the graph because it
calls bdrv_filter_child(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-9-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
430da832af block: Mark bdrv_skip_implicit_filters() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_skip_implicit_filters() need to hold a reader lock for the graph
because it calls bdrv_filter_child(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-8-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
372b69f503 block: Mark bdrv_filter_or_cow_bs() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_filter_or_cow_bs() need to hold a reader lock for the graph because
it calls bdrv_filter_or_cow_child(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-7-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
f3bbc53dc5 block: Mark block_job_add_bdrv() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling block_job_add_bdrv(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-6-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
03b9eaca54 block: Mark bdrv_root_attach_child() GRAPH_WRLOCK
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_attach_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-5-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
f5a3a270fe block: Mark bdrv_filter_bs() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_filter_bs() need to hold a reader lock for the graph because
it calls bdrv_filter_child(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-4-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
067179868e block: Mark bdrv_has_zero_init() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_has_zero_init() need to hold a reader lock for the graph because
it calls bdrv_filter_bs(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-3-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Kevin Wolf
221caadcc5 block: Mark bdrv_probe_blocksizes() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_probe_blocksizes() need to hold a reader lock for the graph because
it calls bdrv_filter_bs(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00