Commit Graph

40 Commits

Author SHA1 Message Date
Eric Blake
c8a76dbd90 nbd/server: CVE-2024-7409: Cap default max-connections to 100
Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.

For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).

But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop.  Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention).  Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.

This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface.  On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.

This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections.  We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).

We considered using a deprecation period (declare that omitting
max-parameters is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage.  But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough.  It also has a drawback that any app (like libvirt) that
is accidentally relying on an unlimited default should seriously
consider their own CVE now, at which point they are going to change to
pass explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-parameters with a larger value.

It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>
2024-08-08 16:02:23 -05:00
Markus Armbruster
01bed0ff14 qapi: Refill doc comments to conform to conventions
Sweep the entire documentation again.  Last done in commit
209e64d9ed (qapi: Refill doc comments to conform to current
conventions).

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the reflown
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240729065220.860163-1-armbru@redhat.com>
[Straightforward conflict with commit 442110bc6f resolved]
2024-08-05 09:31:51 +02:00
Markus Armbruster
2746f060be qapi: Move error documentation to new "Errors" sections
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240227113921.236097-5-armbru@redhat.com>
2024-03-04 07:12:40 +01:00
Markus Armbruster
89a2273b9d qapi: Add missing union tag documentation
Low-hanging fruit, and except for StatsFilter, the only members of
these unions lacking documentation.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240205074709.3613229-16-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-02-12 10:04:32 +01:00
Markus Armbruster
e3240ac580 qapi: Elide "Potential additional modes" from generated docs
Documentation of BlockExportRemoveMode has

    Potential additional modes to be added in the future:

    hide: Just hide export from new clients, leave existing connections
    as is.  Remove export after all clients are disconnected.

    soft: Hide export from new clients, answer with ESHUTDOWN for all
    further requests from existing clients.

I think this is useful only for developers.  Elide it from generated
documentation by turning it into a TODO section.

This effectively reverts my own commit b71fd73cc4 (Revert "qapi:
BlockExportRemoveMode: move comments to TODO").  At the time, I was
about to elide TODO sections from the generated manual, I wasn't sure
about this one, and decided to avoid change.  And now I've made up my
mind.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240129115008.674248-4-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2024-02-03 09:20:01 +01:00
Markus Armbruster
a937b6aa73 qapi: Reformat doc comments to conform to current conventions
Change

    # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
    #        do eiusmod tempor incididunt ut labore et dolore magna aliqua.

to

    # @name: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed
    #     do eiusmod tempor incididunt ut labore et dolore magna aliqua.

See recent commit "qapi: Relax doc string @name: description
indentation rules" for rationale.

Reflow paragraphs to 70 columns width, and consistently use two spaces
to separate sentences.

To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3".  Finds no
differences.  Comparing with diff is not useful, as the reflown
paragraphs are visible there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230428105429.1687850-18-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
[Straightforward conflicts in qapi/audio.json qapi/misc-target.json
qapi/run-state.json resolved]
2023-05-10 10:01:01 +02:00
Markus Armbruster
b71fd73cc4 Revert "qapi: BlockExportRemoveMode: move comments to TODO"
This reverts commit 97cd74f772.

The next commit will hide TODO: sections.  See there for rationale.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230428105429.1687850-6-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
2023-05-09 09:04:20 +02:00
Markus Armbruster
c110102898 qapi: Fix bullet list markup in documentation
Peter Maydell's commit 100cc4fe0f explains:

    rST insists on a blank line before and after a bulleted list [...]
    Add some extra blank lines in the doc comments so they're
    acceptable rST input.

It missed one in qapi/trace.json.

Paolo Bonzini later added another instance in qapi/stats.json,
providing further, if unintended, evidence for his quip that rST is
the Perl of ASCII-based markups.

Both are parsed as ordinary paragraph, resulting in garbled output.

John Snow missed the need for a blank line when converting
docs/devel/qapi-code-gen.txt to rST.

Add the blank lines we need to get the bullet lists recognized as
such.

Kevin Wolf and Lukas Straub added two more, but indented.  Sphinx
recognizes them as (indented) bullet lists.  The indentation looks
slightly off.

Insert a blank line and delete the extra indentation.

Fixes: 100cc4fe0f (qapi: Add blank lines before bulleted lists)
Fixes: 467ef823d8 (qmp: add filtering of statistics by target vCPU)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230425064223.820979-10-armbru@redhat.com>
[Fix of docs/devel/qapi-code-gen.rst squashed, commit message adjusted]
2023-04-28 11:48:34 +02:00
Xie Yongji
779d82e1d3 vduse-blk: Add name option
Currently we use 'id' option as the name of VDUSE device.
It's a bit confusing since we use one value for two different
purposes: the ID to identfy the export within QEMU (must be
distinct from any other exports in the same QEMU process, but
can overlap with names used by other processes), and the VDUSE
name to uniquely identify it on the host (must be distinct from
other VDUSE devices on the same host, but can overlap with other
export types like NBD in the same process). To make it clear,
this patch adds a separate 'name' option to specify the VDUSE
name for the vduse-blk export instead.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220614051532.92-7-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
0862a087fd vduse-blk: Add serial option
Add a 'serial' option to allow user to specify this value
explicitly. And the default value is changed to an empty
string as what we did in "hw/block/virtio-blk.c".

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20220614051532.92-6-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Xie Yongji
2a2359b844 vduse-blk: Implement vduse-blk export
This implements a VDUSE block backends based on
the libvduse library. We can use it to export the BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
    --blockdev file,node-name=drive0,filename=test.img \
    --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After the qemu-storage-daemon started, we need to use
the "vdpa" command to attach the device to vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also the device must be removed via the "vdpa" command
before we stop the qemu-storage-daemon.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220523084611.91-7-xieyongji@bytedance.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24 17:07:06 +02:00
Paolo Bonzini
f55ba8018c block: add more commands to preconfig mode
Of the block device commands, those that are available outside system
emulators do not require a fully constructed machine by definition.
Allow running them before machine initialization has concluded.

Of the ones that are available inside system emulation, allow querying
the PR managers, and setting up accounting and throttling.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14 16:50:30 +02:00
Andrea Bolognani
23e4645258 qapi: Drop unnecessary whitespace in comments
The only instances that get changed are those in which the
additional whitespace was not (or couldn't possibly be) used for
alignment purposes.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Message-Id: <20220503073737.84223-7-abologna@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2022-05-16 07:22:28 +02:00
Eric Blake
58a6fdcc9e nbd/server: Allow MULTI_CONN for shared writable exports
According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.

We always satisfy these conditions in qemu - even when we support
multiple clients, ALL clients go through a single point of reference
into the block layer, with no local caching.  The effect of one client
is instantly visible to the next client.  Even if our backend were a
network device, we argue that any multi-path caching effects that
would cause inconsistencies in back-to-back actions not seeing the
effect of previous actions would be a bug in that backend, and not the
fault of caching in qemu.  As such, it is safe to unconditionally
advertise CAN_MULTI_CONN for any qemu NBD server situation that
supports parallel clients.

Note, however, that we don't want to advertise CAN_MULTI_CONN when we
know that a second client cannot connect (for historical reasons,
qemu-nbd defaults to a single connection while nbd-server-add and QMP
commands default to unlimited connections; but we already have
existing means to let either style of NBD server creation alter those
defaults).  This is visible by no longer advertising MULTI_CONN for
'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.

The harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220512004924.417153-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12 13:10:52 +02:00
Vladimir Sementsov-Ogievskiy
e5fb29d5d0 qapi: nbd-export: allow select bitmaps by node/name pair
Hi all! Current logic of relying on search through backing chain is not
safe neither convenient.

Sometimes it leads to necessity of extra bitmap copying. Also, we are
going to add "snapshot-access" driver, to access some snapshot state
through NBD. And this driver is not formally a filter, and of course
it's not a COW format driver. So, searching through backing chain will
not work. Instead of widening the workaround of bitmap searching, let's
extend the interface so that user can select bitmap precisely.

Note, that checking for bitmap active status is not copied to the new
API, I don't see a reason for it, user should understand the risks. And
anyway, bitmap from other node is unrelated to this export being
read-only or read-write.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-3-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-26 13:15:19 -05:00
Victor Toso
97cd74f772 qapi: BlockExportRemoveMode: move comments to TODO
@hide and @soft are potential additions which fits the TODO section
perfectly.

The main motivation is to avoid this whole block of comment entering
the wrong section in the python parser.

Signed-off-by: Victor Toso <victortoso@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-Id: <20220328140604.41484-2-victortoso@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2022-03-31 12:34:51 +02:00
Philippe Mathieu-Daude
3a8fa0edd1 qapi/block: Cosmetic change in BlockExportType schema
Fix long line introduced in commit bb01ea7311 ("qapi/block:
Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER").

Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220119121439.214821-1-f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2022-01-28 16:50:43 -06:00
Philippe Mathieu-Daudé
bb01ea7311 qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER
When building QEMU with --disable-vhost-user and using introspection,
query-qmp-schema lists vhost-user-blk even though it's not actually
available:

  { "execute": "query-qmp-schema" }
  {
      "return": [
          ...
          {
              "name": "312",
              "members": [
                  {
                      "name": "nbd"
                  },
                  {
                      "name": "vhost-user-blk"
                  }
              ],
              "meta-type": "enum",
              "values": [
                  "nbd",
                  "vhost-user-blk"
              ]
          },

Restrict vhost-user-blk in BlockExportType when
CONFIG_VHOST_USER_BLK_SERVER is disabled, so it
doesn't end listed by query-qmp-schema.

Fixes: 90fc91d50b ("convert vhost-user-blk server to block export API")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220107105420.395011-4-f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-14 12:03:16 +01:00
Marc-André Lureau
8a9f1e1d9c qapi: make 'if' condition strings simple identifiers
Change the 'if' condition strings to be C-agnostic. It will accept
'[A-Z][A-Z0-9_]*' identifiers. This allows to express configuration
conditions in other languages (Rust or Python for ex) or other more
suitable forms.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Message-Id: <20210804083105.97531-11-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased with semantic conflict in redefined-event.json]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-08-26 13:53:56 +02:00
Max Reitz
8fc54f9428 export/fuse: Add allow-other option
Without the allow_other mount option, no user (not even root) but the
one who started qemu/the storage daemon can access the export.  Allow
users to configure the export such that such accesses are possible.

While allow_other is probably what users want, we cannot make it an
unconditional default, because passing it is only possible (for non-root
users) if the global fuse.conf configuration file allows it.  Thus, the
default is an 'auto' mode, in which we first try with allow_other, and
then fall back to without.

FuseExport.allow_other reports whether allow_other was actually used as
a mount option or not.  Currently, this information is not used, but a
future patch will let this field decide whether e.g. an export's UID and
GID can be changed through chmod.

One notable thing about 'auto' mode is that libfuse may print error
messages directly to stderr, and so may fusermount (which it executes).
Our export code cannot really filter or hide them.  Therefore, if 'auto'
fails its first attempt and has to fall back, fusermount will print an
error message that mounting with allow_other failed.

This behavior necessitates a change to iotest 308, namely we need to
filter out this error message (because if the first attempt at mounting
with allow_other succeeds, there will be no such message).

Furthermore, common.rc's _make_test_img should use allow-other=off for
FUSE exports, because iotests generally do not need to access images
from other users, so allow-other=on or allow-other=auto have no
advantage.  OTOH, allow-other=on will not work on systems where
user_allow_other is disabled, and with allow-other=auto, we get said
error message that we would need to filter out again.  Just disabling
allow-other is simplest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
4fba06d594 fuse: Allow growable exports
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
0c9b70d590 fuse: Allow exporting BDSs via FUSE
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Markus Armbruster
9bc6e893b7 qapi: Normalize version references x.y.0 to just x.y
We use x.y most of the time, and x.y.0 sometimes.  Normalize for
consistency.

Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201118064158.3359056-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-12-10 17:16:44 +01:00
Eric Blake
dbc7b01492 nbd: Add 'qemu-nbd -A' to expose allocation depth
Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.

qemu as client is hacked into viewing the key aspects of this new
context by abusing the already-experimental x-dirty-bitmap option to
collapse all depths greater than 2, which results in a tri-state value
visible in the output of 'qemu-img map --output=json' (yes, that means
x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like
renaming it as it would introduce a needless break of back-compat,
even though we make no compat guarantees with x- members):

unallocated (depth 0) => "zero":false, "data":true
local (depth 1)       => "zero":false, "data":false
backing (depth 2+)    => "zero":true,  "data":true

libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-11-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-10-30 15:22:00 -05:00
Eric Blake
cbad81cef8 nbd: Update qapi to support exporting multiple bitmaps
Since 'block-export-add' is new to 5.2, we can still tweak the
interface; there, allowing 'bitmaps':['str'] is nicer than
'bitmap':'str'.  This wires up the qapi and qemu-nbd changes to permit
passing multiple bitmaps as distinct metadata contexts that the NBD
client may request, but the actual support for more than one will
require a further patch to the server.

Note that there are no changes made to the existing deprecated
'nbd-server-add' command; this required splitting the QAPI type
BlockExportOptionsNbd, which fortunately does not affect QMP
introspection.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-5-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
2020-10-30 15:10:15 -05:00
Stefan Hajnoczi
d9b495f9c6 block/export: add vhost-user-blk multi-queue support
Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
directory shows that Linux blk-mq has 4 queues configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
f51d23c80a block/export: add iothread and fixed-iothread options
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
90fc91d50b block/export: convert vhost-user-blk server to block export API
Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
      --blockdev file,node-name=drive0,filename=test.img \
      --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Markus noted that supported address families are not explicit in the
QAPI schema. It is unlikely that support for more address families will
be added since file descriptor passing is required and few address
families support it. If a new address family needs to be added, then the
QAPI 'features' syntax can be used to advertize them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200924151549.913737-12-stefanha@redhat.com
[Skip test on big-endian host architectures because this device doesn't
support them yet (as already mentioned in a code comment).
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Kevin Wolf
443127e81b nbd: Deprecate nbd-server-add/remove
These QMP commands are replaced by block-export-add/del.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200924152717.287415-28-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
30dbc81d31 block/export: Move writable to BlockExportOptions
The 'writable' option is a basic option that will probably be applicable
to most if not all export types that we will implement. Move it from NBD
to the generic BlockExport layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-26-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
8cade320c8 block/export: Add query-block-exports
This adds a simple QMP command to query the list of block exports.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-25-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
1a9f7a804f block/export: Add BLOCK_EXPORT_DELETED event
Clients may want to know when an export has finally disappeard
(block-export-del returns earlier than that in the general case), so add
a QAPI event for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200924152717.287415-22-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
3c3bc462ad block/export: Add block-export-del
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
d53be9ce55 block/export: Add 'id' option to block-export-add
We'll need an id to identify block exports in monitor commands. This
adds one.

Note that this is different from the 'name' option in the NBD server,
which is the externally visible export name. While block export ids need
to be unique in the whole process, export names must be unique only for
the same server. Different export types or (potentially in the future)
multiple NBD servers can have the same export name externally, but still
need different block export ids internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-19-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
b6076afcab block/export: Add node-name to BlockExportOptions
Every block export needs a block node to export, so add a 'node-name'
option to BlockExportOptions and remove the replaced option 'device'
from BlockExportOptionsNbd.

To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
to be wrapped by a new type NbdServerAddOptions that adds 'device' back
because nbd-server-add doesn't use the BlockExportOptions base type at
all (so even without changing it to a 'node-name' option in
block-export-add, this compatibility code would be necessary).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-16-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
fefee85da0 nbd: Add writethrough to block-export-add
qemu-nbd allows use of writethrough cache modes, which mean that write
requests made through NBD will cause a flush before they complete.
Expose the same functionality in block-export-add.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-10-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
1c8222b014 nbd: Add max-connections to nbd-server-start
This is a QMP equivalent of qemu-nbd's --shared option, limiting the
maximum number of clients that can attach at the same time.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-9-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
56ee86261e block/export: Add BlockExport infrastructure and block-export-add
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.

This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.

qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
143ea7670c qapi: Rename BlockExport to BlockExportOptions
The name BlockExport will be used for the struct containing the runtime
state of block exports, so change the name of export creation options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-4-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
5daa6bfd8e qapi: Create block-export module
Move all block export related types and commands from block-core to the
new QAPI module block-export.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-3-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00