This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_op_is_blocked() need to hold a reader lock for the graph
because it calls bdrv_get_device_or_node_name(), which accesses the
parents list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-18-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
qcow2_signal_corruption() need to hold a reader lock for the graph
because it calls bdrv_get_node_name(), which accesses the parents list
of a node.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-15-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_amend_options() need to hold a reader lock for the graph. This
removes an assume_graph_lock() call in crypto's implementation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_get_parent_name() need to hold a reader lock for the graph
because it accesses the parents list of a node.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-13-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_primary_child() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-12-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_refresh_filename() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_get_xdbg_block_graph() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-10-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reopen isn't easy with respect to locking because many of its functions
need to iterate the graph, some change it, and then you get some drains
in the middle where you can't hold any locks.
Therefore just documents most of the functions to be unlocked, and take
locks internally before accessing the graph.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-9-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_snapshot_fallback() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-8-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Draining recursively traverses the graph, therefore we need to make sure
that also such accesses to the graph are protected by the graph rdlock.
There are 3 different drain callers to consider:
1. drain in the main loop: no issue at all, rdlock is nop.
2. drain in an iothread: rdlock only works in main loop or coroutines,
so disallow it.
3. drain in a coroutine (regardless of AioContext): the drain mechanism
takes care of scheduling a BH in the bs->aio_context that will
then take care of perform the actual draining. This is wrong,
because as pointed in (2) if bs->aio_context is an iothread then
rdlock won't work. Therefore change bdrv_co_yield_to_drain to
schedule the BH in the main loop.
Caller (2) also implies that we need to modify test-bdrv-drain.c to
disallow draining in the iothreads.
For some places, we know that they will hold the lock, but we don't have
the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock()
with a FIXME comment. These places will be removed once everything is
properly annotated.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_first_blk() and bdrv_is_root_node() need to hold a reader lock
for the graph. These functions are the only functions in block-backend.c
that access the parent list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-5-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new wrapper type for GRAPH_RDLOCK functions that should be called
from coroutine context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230929145157.45443-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230904100306.156197-4-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Include both coroutine and non-coroutine versions, the latter being
co_wrapper_mixed_bdrv_rdlock of the former.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230904100306.156197-3-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The next commit will add support for the optional extension
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts. To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-25-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Peform several minor refactorings of how the list of negotiated meta
contexts is managed, to make upcoming patches easier: Promote the
internal type NBDExportMetaContexts to the public opaque type
NBDMetaContexts, and mark exp const. Use a shorter member name in
NBDClient. Hoist calls to nbd_check_meta_context() earlier in their
callers, as the number of negotiated contexts may impact the flags
exposed in regards to an export, which in turn requires a new
parameter. Drop a redundant parameter to nbd_negotiate_meta_queries.
No semantic change intended on the success path; on the failure path,
dropping context in nbd_check_meta_export even when reporting an error
is safer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230925192229.3186470-24-eblake@redhat.com>
Update the client code to be able to send an extended request, and
parse an extended header from the server. Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union. Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached. Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230925192229.3186470-21-eblake@redhat.com>
[eblake: fix trace format]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Widen the length field of NBDRequest to 64-bits, although we can
assert that all current uses are still under 32 bits: either because
of NBD_MAX_BUFFER_SIZE which is even smaller (and where size_t can
still be appropriate, even on 32-bit platforms), or because nothing
ever puts us into NBD_MODE_EXTENDED yet (and while future patches will
allow larger transactions, the lengths in play here are still capped
at 32-bit). There are no semantic changes, other than a typo fix in a
couple of error messages.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230829175826.377251-23-eblake@redhat.com>
[eblake: fix assertion bug in nbd_co_send_simple_reply]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server, matching recent upstream nbd.git (through commit
e6f3b94a934). This patch does not change any existing behavior, but
merely sets the stage for upcoming patches.
This patch does not change the status quo that neither the client nor
server use a packed-struct representation for the request header.
While most of the patch adds new types, there is also some churn for
renaming the existing NBDExtent to NBDExtent32 to contrast it with
NBDExtent64, which I thought was a nicer name than NBDExtentExt.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230829175826.377251-22-eblake@redhat.com>
Once the 64-bit headers extension is enabled, the data layout we send
over the wire for a client request depends on the mode negotiated with
the server. Rather than adding a parameter to nbd_send_request, we
can add a member to struct NBDRequest, since it already does not
reflect on-wire format. Some callers initialize it directly; many
others rely on a common initialization point during
nbd_co_send_request(). At this point, there is no semantic change.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20230829175826.377251-21-eblake@redhat.com>
The upcoming patches for 64-bit extensions requires various points in
the protocol to make decisions based on what was negotiated. While we
could easily add a 'bool extended_headers' alongside the existing
'bool structured_reply', this does not scale well if more modes are
added in the future. Better is to expose the mode enum added in the
recent commit bfe04d0a7d out to a wider use in the code base.
Where the code previously checked for structured_reply being set or
clear, it now prefers checking for an inequality; this works because
the nodes are in a continuum of increasing abilities, and allows us to
touch fewer places if we ever insert other modes in the middle of the
enum. There should be no semantic change in this patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230829175826.377251-20-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
- Graph locking part 4 (node management)
- qemu-img map: report compressed data blocks
- block-backend: process I/O in the current AioContext
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmULHnURHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9aB5hAAqH8To7WIUtg1rj1PY809ck78ghm18PKg
TNdN7IbrXQghX5foh2VgPwVVl+JaW2CSrJYWQcAO6AbvFduNIi9iKzI6RT0xKXpb
b8oQXS7zntFzwBv8ohOU5NSVJOgVmNP4h5qJIMmXgB9ZcLFG40zggVH2qQT7guUf
9MAc81kI/d5vvSHY0ZjdHjNOgwG4q1j8yytL7OFqWUfB8sXloUCA9lT7w4jIYD8L
v2StUOLWB01Zts2o8SCNaFxuajs6wUee8b/DM1cyPyLy4KtOdXvLKhq2NlXpLo2i
aZFr4PtizTVwrQZIJttA9jqM+QCsDOsiSat3BLNNsKUaCWHZB0rOGLCzMCtisyOo
4PzuL4UI21ik2zieO1qVM+Thqvw16kHtp6dD9pGk4X4ogGreGYEIxzBl79luR+AV
NCRizoeFWTHKymS1tSoKrWT9ZNHcLmwemO6Tt1rMYk9jV3T4uY5e1NwxaUavEfsX
f8dLfQjhNiySOoDknT1OSerBOVdTXURS2ri5H3GZxrxvJ4jOeFkn52C8r3YlZ3Wp
Cr9LCUJZeXgwY+Q1JQ3D4VLY8aZ83txpw6XKEy0eTEv5wxkBj5LWhXx7hNb5F3lg
bqaRYijVJn+P82wVxlftIzMfNeVBFHzFE90taPV5grJjr8lgrGBFmD7Puc97kfDX
oTDBwRxJeew=
=qTNA
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches
- Graph locking part 4 (node management)
- qemu-img map: report compressed data blocks
- block-backend: process I/O in the current AioContext
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmULHnURHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9aB5hAAqH8To7WIUtg1rj1PY809ck78ghm18PKg
# TNdN7IbrXQghX5foh2VgPwVVl+JaW2CSrJYWQcAO6AbvFduNIi9iKzI6RT0xKXpb
# b8oQXS7zntFzwBv8ohOU5NSVJOgVmNP4h5qJIMmXgB9ZcLFG40zggVH2qQT7guUf
# 9MAc81kI/d5vvSHY0ZjdHjNOgwG4q1j8yytL7OFqWUfB8sXloUCA9lT7w4jIYD8L
# v2StUOLWB01Zts2o8SCNaFxuajs6wUee8b/DM1cyPyLy4KtOdXvLKhq2NlXpLo2i
# aZFr4PtizTVwrQZIJttA9jqM+QCsDOsiSat3BLNNsKUaCWHZB0rOGLCzMCtisyOo
# 4PzuL4UI21ik2zieO1qVM+Thqvw16kHtp6dD9pGk4X4ogGreGYEIxzBl79luR+AV
# NCRizoeFWTHKymS1tSoKrWT9ZNHcLmwemO6Tt1rMYk9jV3T4uY5e1NwxaUavEfsX
# f8dLfQjhNiySOoDknT1OSerBOVdTXURS2ri5H3GZxrxvJ4jOeFkn52C8r3YlZ3Wp
# Cr9LCUJZeXgwY+Q1JQ3D4VLY8aZ83txpw6XKEy0eTEv5wxkBj5LWhXx7hNb5F3lg
# bqaRYijVJn+P82wVxlftIzMfNeVBFHzFE90taPV5grJjr8lgrGBFmD7Puc97kfDX
# oTDBwRxJeew=
# =qTNA
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 20 Sep 2023 12:31:49 EDT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (28 commits)
block: mark aio_poll as non-coroutine
block-backend: process zoned requests in the current AioContext
block-backend: process I/O in the current AioContext
test-bdrv-drain: avoid race with BH in IOThread drain test
block: remove AIOCBInfo->get_aio_context()
qemu-img: map: report compressed data blocks
block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
block: Mark bdrv_unref_child() GRAPH_WRLOCK
block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
block: Take graph rdlock in bdrv_change_aio_context()
block: Take graph rdlock in bdrv_drop_intermediate()
block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
block: Mark bdrv_child_perm() GRAPH_RDLOCK
block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
block: Mark bdrv_attach_child() GRAPH_WRLOCK
block: Call transaction callbacks with lock held
block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
- Fix for file-posix's zoning code crashing on I/O errors
- Throttling refactoring
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmTxnMISHGhyZWl0ekBy
ZWRoYXQuY29tAAoJEKH6QNCYAZzfYkUP+gMG9hhzvgjj/tw9rEBQjciihzcQmqQJ
2Mm37RH2jj5bnnTdaTbMkcRRwVhncYSCwK9q5EYVbZmU9C/v4YJmsSEQlcl7wVou
hbPUv6NHaBrJZX9nxNSa2RHui6pZMLKa/D0rJVB7NjYBrrRtiPo7kiLVQYjYXa2g
kcCCfY4t3Z2RxOP31mMXRjYlhJE9bIuZdTEndrKme8KS2JGPZEJ9xjkoW1tj96EX
oc/Cg2vk7AEtsFYA0bcD8fTFkBDJEwyYl3usu7Tk24pvH16jk7wFSqRVSsDMfnER
tG8X3mHLIY0hbSkpzdHJdXINvZ6FWpQb0CGzIKr+pMiuWVdWr1HglBr0m4pVF+Y4
A6AI6VX2JJgtacypoDyCZC9mzs1jIdeiwq9v5dyuikJ6ivTwEEoeoSLnLTN3AjXn
0mtQYzgCg5Gd6+rTo7XjSO9SSlbaVrDl/B2eXle6tmIFT5k+86fh0hc+zTmP8Rkw
Knbc+5Le95wlMrOUNx2GhXrTGwX510hLxKboho/LITxtAzqvXnEJKrYbnkm3WPnw
wfHnR5VQH1NKEpiH/p33og6OV/vu9e7vgp0ZNZV136SnzC90C1zMUwg2simJW701
34EtN0XBX8XBKrxfe7KscV9kRE8wrWWJVbhp+WOcQEomGI8uraxzWqDIk/v7NZXv
m4XBscaB+Iri
=oKgk
-----END PGP SIGNATURE-----
Merge tag 'pull-block-2023-09-01' of https://gitlab.com/hreitz/qemu into staging
Block patches
- Fix for file-posix's zoning code crashing on I/O errors
- Throttling refactoring
# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmTxnMISHGhyZWl0ekBy
# ZWRoYXQuY29tAAoJEKH6QNCYAZzfYkUP+gMG9hhzvgjj/tw9rEBQjciihzcQmqQJ
# 2Mm37RH2jj5bnnTdaTbMkcRRwVhncYSCwK9q5EYVbZmU9C/v4YJmsSEQlcl7wVou
# hbPUv6NHaBrJZX9nxNSa2RHui6pZMLKa/D0rJVB7NjYBrrRtiPo7kiLVQYjYXa2g
# kcCCfY4t3Z2RxOP31mMXRjYlhJE9bIuZdTEndrKme8KS2JGPZEJ9xjkoW1tj96EX
# oc/Cg2vk7AEtsFYA0bcD8fTFkBDJEwyYl3usu7Tk24pvH16jk7wFSqRVSsDMfnER
# tG8X3mHLIY0hbSkpzdHJdXINvZ6FWpQb0CGzIKr+pMiuWVdWr1HglBr0m4pVF+Y4
# A6AI6VX2JJgtacypoDyCZC9mzs1jIdeiwq9v5dyuikJ6ivTwEEoeoSLnLTN3AjXn
# 0mtQYzgCg5Gd6+rTo7XjSO9SSlbaVrDl/B2eXle6tmIFT5k+86fh0hc+zTmP8Rkw
# Knbc+5Le95wlMrOUNx2GhXrTGwX510hLxKboho/LITxtAzqvXnEJKrYbnkm3WPnw
# wfHnR5VQH1NKEpiH/p33og6OV/vu9e7vgp0ZNZV136SnzC90C1zMUwg2simJW701
# 34EtN0XBX8XBKrxfe7KscV9kRE8wrWWJVbhp+WOcQEomGI8uraxzWqDIk/v7NZXv
# m4XBscaB+Iri
# =oKgk
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Sep 2023 04:11:46 EDT
# gpg: using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg: issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00 4D34 A1FA 40D0 9801 9CDF
* tag 'pull-block-2023-09-01' of https://gitlab.com/hreitz/qemu:
tests/file-io-error: New test
file-posix: Simplify raw_co_prw's 'out' zone code
file-posix: Fix zone update in I/O error path
file-posix: Check bs->bl.zoned for zone info
file-posix: Clear bs->bl.zoned on error
block/throttle-groups: Use ThrottleDirection instread of bool is_write
fsdev: Use ThrottleDirection instread of bool is_write
throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code
throttle: use enum ThrottleDirection instead of bool is_write
cryptodev: use NULL throttle timer cb for read direction
test-throttle: test read only and write only
throttle: support read-only and write-only
test-throttle: use enum ThrottleDirection
throttle: introduce enum ThrottleDirection
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is forbidden to block on the event loop during a coroutine, as that
can cause deadlocks due to recursive locking.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230908075458.527013-1-pbonzini@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.
It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.
Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.
The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Functions qcow2_get_host_offset(), get_cluster_offset(),
vmdk_co_block_status() explicitly report compressed cluster types when data
is compressed. However, this information is never passed further. Let's
make use of it by adding new BDRV_BLOCK_COMPRESSED flag for
bdrv_block_status(), so that caller may know that the data range is
compressed. In particular, we're going to use this flag to tweak
"qemu-img map" output.
This new flag is only being utilized by qcow, qcow2 and vmdk formats, as only
those support compression.
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230907210226.953821-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The functions read the parents list in the generic block layer, so we
need to hold the graph lock already there. The BlockDriver
implementations actually modify the graph, so it has to be a writer
lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-22-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_unref_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-21-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_child_perm() need to hold a reader lock for the graph because
some implementations access the children list of a node.
The callers of bdrv_child_perm() conveniently already hold the lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-16-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
This happens to result in BlockDriver.bdrv_set_perm() to be called with
the graph lock held. For consistency, make it the same for all of the
BlockDriver callbacks for updating permissions and annotate the function
pointers with GRAPH_RDLOCK_PTR.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-15-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-14-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-13-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new wrapper type for GRAPH_WRLOCK functions that should be called
from coroutine context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_unref() is called by a lot of places that need to hold the graph
lock (it naturally happens in the context of operations that change the
graph). However, bdrv_unref() takes the graph writer lock internally, so
it can't actually be called while already holding a graph lock without
causing a deadlock.
bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
node before closing it, and draining requires that the graph is
unlocked.
The solution is to defer deleting the node until we don't hold the lock
any more and draining is possible again.
Note that keeping images open for longer than necessary can create
problems, too: You can't open an image again before it is really closed
(if image locking didn't prevent it, it would cause corruption).
Reopening an image immediately happens at least during bdrv_open() and
bdrv_co_create().
In order to solve this problem, make sure to run the deferred unref in
bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
The output of iotest 051 is updated because the additional polling
changes the order of HMP output, resulting in a new "(qemu)" prompt in
the test output that was previously on a separate line and filtered out.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230911094620.45040-6-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The last call site of this function has been removed by commit
c04d0ab026 ("qemu-img: Let info print block graph").
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20230901184605.32260-2-farosas@suse.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
CoMutex has poor performance when lock contention is high. The tracked
requests list is accessed frequently and performance suffers in QEMU
multi-queue block layer scenarios.
It is not necessary to use CoMutex for the requests lock. The lock is
always released across coroutine yield operations. It is held for
relatively short periods of time and it is not beneficial to yield when
the lock is held by another coroutine.
Change the lock type from CoMutex to QemuMutex to improve multi-queue
block layer performance. fio randread bs=4k iodepth=64 with 4 IOThreads
handling a virtio-blk device with 8 virtqueues improves from 254k to
517k IOPS (+203%). Full benchmark results and configuration details are
available here:
980c40845d
In the future we may wish to introduce thread-local tracked requests
lists to avoid lock contention completely. That would be much more
involved though.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230808155852.2745350-3-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
aio_context is always NULL, so drop it.
Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230830224802.493686-2-stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Universal Flash Storage (UFS) is a high-performance mass storage device
with a serial interface. It is primarily used as a high-performance
data storage device for embedded applications.
This commit contains code for UFS device to be recognized
as a UFS PCI device.
Patches to handle UFS logical unit and Transfer Request will follow.
Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 10232660d462ee5cd10cf673f1a9a1205fc8276c.1693980783.git.jeuk20.kim@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When target image is using subclusters, and we align the request during
copy-on-read, it makes sense to align to subcluster_size rather than
cluster_size. Otherwise we end up with unnecessary allocations.
This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters()
and utilizes subcluster_size field of BlockDriverInfo to make necessary
alignments. It affects copy-on-read as well as mirror job (which is
using bdrv_round_to_clusters()).
This change also fixes the following bug with failing assert (covered by
the test in the subsequent commit):
qemu-img create -f qcow2 base.qcow2 64K
qemu-img create -f qcow2 -o extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K
qemu-io -c "write -P 0xaa 0 2K" img.qcow2
qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2
qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230711172553.234055-3-andrey.drobyshev@virtuozzo.com>
This is going to be used in the subsequent commit as requests alignment
(in particular, during copy-on-read). This value only makes sense for
the formats which support subclusters (currently QCOW2 only). If this
field isn't set by driver's own bdrv_get_info() implementation, we
simply set it equal to the cluster size thus treating each cluster as
having a single subcluster.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230711172553.234055-2-andrey.drobyshev@virtuozzo.com>
'bool is_write' style is obsolete from throttle framework, adapt
block throttle groups to the new style:
- use ThrottleDirection instead of 'bool is_write'. Ex,
schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
-> schedule_next_request(ThrottleGroupMember *tgm, ThrottleDirection direction)
- use THROTTLE_MAX instead of hard code. Ex, ThrottleGroupMember *tokens[2]
-> ThrottleGroupMember *tokens[THROTTLE_MAX]
- use ThrottleDirection instead of hard code on iteration. Ex, (i = 0; i < 2; i++)
-> for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++)
Use a simple python script to test the new style:
#!/usr/bin/python3
import subprocess
import random
import time
commands = ['virsh blkdeviotune jammy vda --write-bytes-sec ', \
'virsh blkdeviotune jammy vda --write-iops-sec ', \
'virsh blkdeviotune jammy vda --read-bytes-sec ', \
'virsh blkdeviotune jammy vda --read-iops-sec ']
for loop in range(1, 1000):
time.sleep(random.randrange(3, 5))
command = commands[random.randrange(0, 3)] + str(random.randrange(0, 1000000))
subprocess.run(command, shell=True, check=True)
This works fine.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230728022006.1098509-10-pizhenwei@bytedance.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Deciphering the hard-coded list of integer return values from
nbd_start_negotiate() will only get more confusing when adding support
for 64-bit extended headers. Better is to name things in an enum.
Although the function in question is private to client.c, putting the
enum in a public header and including an enum-to-string conversion
will allow its use in more places in upcoming patches.
The enum is intentionally laid out so that operators like <= can be
used to group multiple modes with similar characteristics, and where
the least powerful mode has value 0, even though this patch does not
exploit that. No semantic change intended.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-9-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Externally, libnbd exposed the 64-bit opaque marker for each client
NBD packet as the "cookie", because it was less confusing when
contrasted with 'struct nbd_handle *' holding all libnbd state. It
also avoids confusion between the noun 'handle' as a way to identify a
packet and the verb 'handle' for reacting to things like signals.
Upstream NBD changed their spec to favor the name "cookie" based on
libnbd's recommendations[1], so we can do likewise.
[1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-6-eblake@redhat.com>
[eblake: typo fix]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests. As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2]. Additionally, where the reply header is currently 16
bytes for simple reply, and 20 bytes for structured reply; with the
extension enabled, there will only be one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.
Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions. Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.
Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.
[1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934
[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction. But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
We had a mix of struct declarations followed by typedefs, and direct
struct definitions as part of a typedef. Pick a single style. Also
float forward declarations of opaque types to the top of the file,
rather than interspersed with function declarations, which will help a
future patch that wants to expose yet another opaque type that will be
referenced in NBDRequest. No semantic impact.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20230608135653.2918540-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[eblake: alter patch per mailing list feedback]
Signed-off-by: Eric Blake <eblake@redhat.com>
bdrv_co_debug_event was recently introduced, with bdrv_debug_event
becoming a wrapper for use in unknown context. Because most of the
time bdrv_debug_event is used on a BdrvChild via the wrapper macro
BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls
bdrv_co_debug_event, and switch whenever possible.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20230601115145.196465-13-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the caller keeps the AioContext lock for a block node in an iothread,
polling in bdrv_graph_wrlock() deadlocks if the condition isn't
fulfilled immediately.
Now that all callers make sure to actually have the AioContext locked
when they call bdrv_replace_child_noperm() like they should, we can
change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext
lock the caller holds (NULL if it doesn't) and unlock it temporarily
while polling.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230605085711.21261-11-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.
Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
way to get per-BlockDriverState fields like dev_max_batch.
Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.
Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().
The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().
Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:
@@
expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
@@
- aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
+ aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)
@@
expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
@@
- aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
+ aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230516190238.8401-21-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The FUSE export calls blk_exp_ref/unref() without the AioContext lock.
Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they
work without the AioContext lock. This way it's less error-prone.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-15-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For simplicity, always run BlockDevOps .drained_begin/end/poll()
callbacks in the main loop thread. This makes it easier to implement the
callbacks and avoids extra locks.
Move the function pointer declarations from the I/O Code section to the
Global State section for BlockDevOps, BdrvChildClass, and BlockDriver.
Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate.
The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This
is now only allowed from coroutine context, so update the test case to
run in a coroutine.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230516190238.8401-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All of the functions that currently take a BlockDriverState, BdrvChild
or BlockBackend as their first parameter expect the associated
AioContext to be locked when they are called. In the case of
no_co_wrappers, they are called from bottom halves directly in the main
loop, so no other caller can be expected to take the lock for them. This
can result in assertion failures because a lock that isn't taken is
released in nested event loops.
Looking at the first parameter is already done by co_wrappers to decide
where the coroutine should run, so doing the same in no_co_wrappers is
only consistent. Take the lock in the generated bottom halves to fix the
problem.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230525124713.401149-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When jobs are sleeping, for example to enforce a given rate limit, they
can be reentered early, in particular in order to get paused, to update
the rate limit or to get cancelled.
Before this patch, they behave in this case as if they had fully
completed their rate limiting delay. This means that requests are sped
up beyond their limit, violating the constraints that the user gave us.
Change the block jobs to sleep in a loop until the necessary delay is
completed, while still allowing cancelling them immediately as well
pausing (handled by the pause point in job_sleep_ns()) and updating the
rate limit.
This change is also motivated by iotests cases being prone to fail
because drain operations pause and unpause them so often that block jobs
complete earlier than they are supposed to. In particular, the next
commit would fail iotests 030 without this change.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-8-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are functions that modify the graph, so they must be able to take
a writer lock. This is impossible if they already hold the reader lock.
If they need a reader lock for some of their operations, they should
take it internally.
Many of them go through blk_*(), which will always take the lock itself.
Direct calls of bdrv_*() need to take the reader lock. Note that while
locking for bdrv_co_*() calls is checked by TSA, this is not the case
for the mixed_coroutine_fns bdrv_*(). Holding the lock is still required
when they are called from coroutine context like here!
This effectively reverts 4ec8df0183, but adds some internal locking
instead.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230510203601.418015-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Taking account of the new zone append write operation for zoned devices,
BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read,
write, flush).
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051916.178322-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508051510.177850-3-faithilikerun@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
if the zone type is conventional zones.
The zones wp can be changed due to the following operations issued:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Message-id: 20230508051510.177850-2-faithilikerun@gmail.com
[Fix errno propagation from handle_aiocb_zone_mgmt()
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-6-faithilikerun@gmail.com
Message-id: 20230324090605.28361-6-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and clarify that the check is about zoned
BlockDrivers.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add zoned device option to host_device BlockDriver. It will be presented only
for zoned host block devices. By adding zone management operations to the
host_block_device BlockDriver, users can use the new block layer APIs
including Report Zone and four zone management operations
(open, close, finish, reset, reset_all).
Qemu-io uses the new APIs to perform zoned storage commands of the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).
For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-4-faithilikerun@gmail.com
Message-id: 20230324090605.28361-4-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org> and remove spurious ret = -errno in
raw_co_zone_mgmt().
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use get_sysfs_str_val() to get the string value of device
zoned model. Then get_sysfs_zoned_model() can convert it to
BlockZoneModel type of QEMU.
Use get_sysfs_long_val() to get the long value of zoned device
information.
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-3-faithilikerun@gmail.com
Message-id: 20230324090605.28361-3-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Sam Li <faithilikerun@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20230508045533.175575-2-faithilikerun@gmail.com
Message-id: 20230324090605.28361-2-faithilikerun@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
<philmd@linaro.org>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_refresh_limits() need to hold a reader lock for the graph because
it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-21-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_recurse_can_replace() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_query_block_graph_info() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-19-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of amend
callbacks in BlockDriver need to hold a reader lock for the graph.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-17-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_debug_event() need to hold a reader lock for the graph.
Unfortunately we cannot use a co_wrapper_bdrv_rdlock (i.e. make the
coroutine wrapper a no_coroutine_fn), because the function is called
(using the BLKDBG_EVENT macro) by mixed functions that run both in
coroutine and non-coroutine context (for example many of the functions
in qcow2-cluster.c and qcow2-refcount.c).
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-16-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_get_info() need to hold a reader lock for the graph.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-15-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_get_allocated_file_size() need to hold a reader lock for the
graph.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230504115750.54437-14-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Drivers were a bit confused about whether .bdrv_open can run in a
coroutine and whether or not it holds a graph lock.
It cannot keep a graph lock from the caller across the whole function
because it both changes the graph (requires a writer lock) and does I/O
(requires a reader lock). Therefore, it should take these locks
internally as needed.
The functions used to be called in coroutine context during image
creation. This was buggy for other reasons, and as of commit 32192301,
all block drivers go through no_co_wrappers. So it is not called in
coroutine context any more.
Fix qcow2 and qed to work with the correct assumptions: The graph lock
needs to be taken internally instead of just assuming it's already
there, and the coroutine path is dead code that can be removed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-9-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
GRAPH_RDLOCK_GUARD() and GRAPH_RDLOCK_GUARD_MAINLOOP() only take a
reader lock for the graph, so the correct annotation for them to use is
TSA_ASSERT_SHARED rather than TSA_ASSERT.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20230504115750.54437-8-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For some functions, it is part of their interface to be called without
holding the graph lock. Add a new macro to document this.
The macro expands to TSA_EXCLUDES(), which is a relatively weak check
because it passes in cases where the compiler just doesn't know if the
lock is held. Function pointers can't be checked at all. Therefore, its
primary purpose is documentation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These functions must not be called in coroutine context, because they
need write access to the graph.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Migration code can call bdrv_activate() in coroutine context, whereas
other callers call it outside of coroutines. As it calls other code that
is not supposed to run in coroutines, standardise on running outside of
coroutines.
This adds a no_co_wrapper to switch to the main loop before calling
bdrv_activate().
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no need for the AioContext lock in aio_wait_bh_oneshot().
It's easy to remove the lock from existing callers and then switch from
AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot().
Document that the AioContext lock should not be held across
aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause
deadlock so we don't want callers to do that.
This is a step towards getting rid of the AioContext lock.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230404153307.458883-1-stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After the recent introduction of many new coroutine callbacks,
a couple calls from non-coroutine_fn to coroutine_fn have sneaked
in; fix them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230406101752.242125-1-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Devices can pass their MemoryReentrancyGuard (from their DeviceState),
when creating new BHes. Then, the async API will toggle the guard
before/after calling the BH call-back. This prevents bh->mmio reentrancy
issues.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-Id: <20230427211013.2994127-3-alxndr@bu.edu>
[thuth: Fix "line over 90 characters" checkpatch.pl error]
Signed-off-by: Thomas Huth <thuth@redhat.com>
thread_pool_submit_aio() is always called on a pool taken from
qemu_get_current_aio_context(), and that is the only intended
use: each pool runs only in the same thread that is submitting
work to it, it can't run anywhere else.
Therefore simplify the thread_pool_submit* API and remove the
ThreadPool function parameter.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230203131731.851116-5-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use qemu_get_current_aio_context() where possible, since we always
submit work to the current thread anyways.
We want to also be sure that the thread submitting the work is
the same as the one processing the pool, to avoid adding
synchronization to the pool list.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230203131731.851116-4-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Remove usage of aio_context_acquire by always submitting asynchronous
AIO to the current thread's LuringState.
In order to prevent mistakes from the caller side, avoid passing LuringState
in luring_io_{plug/unplug} and luring_co_submit, and document the functions
to make clear that they work in the current thread's AioContext.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230203131731.851116-3-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Remove usage of aio_context_acquire by always submitting asynchronous
AIO to the current thread's LinuxAioState.
In order to prevent mistakes from the caller side, avoid passing LinuxAioState
in laio_io_{plug/unplug} and laio_co_submit, and document the functions
to make clear that they work in the current thread's AioContext.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20230203131731.851116-2-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fixup a few minor typos
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Message-Id: <20230313003744.55476-1-wilfred.mallawa@opensource.wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The introduction of the graph lock is causing blk_get_geometry, a hot function
used in the I/O path, to create a coroutine. However, the only part that really
needs to run in coroutine context is the call to bdrv_co_refresh_total_sectors,
which in turn only happens in the rare case of host CD-ROM devices.
So, write by hand the three wrappers on the path from blk_co_get_geometry to
bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only created
if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-9-pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_co_get_geometry is only used in blk_co_get_geometry. Inline it in
there, to reduce the number of wrappers for bs->total_sectors.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-7-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fill in the field in BlockLimits directly for host devices, and
copy it from there for the raw format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-5-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
At the protocol level, has_variable_length only needs to be true in the
very special case of host CD-ROM drives, so that they do not need an
explicit monitor command to read the new size when a disc is loaded
in the tray.
However, at the format level has_variable_length has to be true for all
raw blockdevs and for all filters, even though in practice the length
depends on the underlying file and thus will not change except in the
case of host CD-ROM drives.
As a first step towards computing an accurate value of has_variable_length,
add the value into the BlockLimits structure and initialize the field
from the BlockDriver.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-2-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_get_geometry() eventually calls bdrv_nb_sectors(), which is a
co_wrapper_mixed_bdrv_rdlock. This means that when it is called from
coroutine context, it already assume to have the graph locked.
However, virtio_blk_sect_range_ok() in block/export/virtio-blk-handler.c
(used by vhost-user-blk and VDUSE exports) runs in a coroutine, but
doesn't take the graph lock - blk_*() functions are generally expected
to do that internally. This causes an assertion failure when accessing
an export for the first time if it runs in an iothread.
This is an example of the crash:
$ ./storage-daemon/qemu-storage-daemon --object iothread,id=th0 --blockdev file,filename=/home/kwolf/images/hd.img,node-name=disk --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.sock,node-name=disk,id=exp0,iothread=th0
qemu-storage-daemon: ../block/graph-lock.c:268: void assert_bdrv_graph_readable(void): Assertion `qemu_in_main_thread() || reader_count()' failed.
(gdb) bt
#0 0x00007ffff6eafe5c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007ffff6e5fa76 in raise () from /lib64/libc.so.6
#2 0x00007ffff6e497fc in abort () from /lib64/libc.so.6
#3 0x00007ffff6e4971b in __assert_fail_base.cold () from /lib64/libc.so.6
#4 0x00007ffff6e58656 in __assert_fail () from /lib64/libc.so.6
#5 0x00005555556337a3 in assert_bdrv_graph_readable () at ../block/graph-lock.c:268
#6 0x00005555555fd5a2 in bdrv_co_nb_sectors (bs=0x5555564c5ef0) at ../block.c:5847
#7 0x00005555555ee949 in bdrv_nb_sectors (bs=0x5555564c5ef0) at block/block-gen.c:256
#8 0x00005555555fd6b9 in bdrv_get_geometry (bs=0x5555564c5ef0, nb_sectors_ptr=0x7fffef7fedd0) at ../block.c:5884
#9 0x000055555562ad6d in blk_get_geometry (blk=0x5555564cb200, nb_sectors_ptr=0x7fffef7fedd0) at ../block/block-backend.c:1624
#10 0x00005555555ddb74 in virtio_blk_sect_range_ok (blk=0x5555564cb200, block_size=512, sector=0, size=512) at ../block/export/virtio-blk-handler.c:44
#11 0x00005555555dd80d in virtio_blk_process_req (handler=0x5555564cbb98, in_iov=0x7fffe8003830, out_iov=0x7fffe8003860, in_num=1, out_num=0) at ../block/export/virtio-blk-handler.c:189
#12 0x00005555555dd546 in vu_blk_virtio_process_req (opaque=0x7fffe8003800) at ../block/export/vhost-user-blk-server.c:66
#13 0x00005555557bf4a1 in coroutine_trampoline (i0=-402635264, i1=32767) at ../util/coroutine-ucontext.c:177
#14 0x00007ffff6e75c20 in ?? () from /lib64/libc.so.6
#15 0x00007fffefffa870 in ?? ()
#16 0x0000000000000000 in ?? ()
Fix this by creating a new blk_co_get_geometry() that takes the lock,
and changing blk_get_geometry() to be a co_wrapper_mixed around it.
To make the resulting code cleaner, virtio-blk-handler.c can directly
call the coroutine version now (though that wouldn't be necessary for
fixing the bug, taking the lock in blk_co_get_geometry() is what fixes
it).
Fixes: 8ab8140a04
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230327113959.60071-1-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The barrier comes after an atomic increment, so it is enough to use
smp_mb__after_rmw(); this avoids a double barrier on x86 systems.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add emulation of TP4146 ("Flexible Data Placement").
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jesper Devantier <j.devantier@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add support for the Directive Send and Recv commands and the Identify
directive.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Add the mandatory Endurance Group identify data structures and log
pages.
For now, all namespaces in a subsystem belongs to a single Endurance
Group.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_co_refresh_total_sectors() need to hold a reader lock for the
graph.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230203152202.49054-24-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>