Commit Graph

53 Commits

Author SHA1 Message Date
Stefan Hajnoczi
826cc32423 aio-posix: split poll check from ready handler
Adaptive polling measures the execution time of the polling check plus
handlers called when a polled event becomes ready. Handlers can take a
significant amount of time, making it look like polling was running for
a long time when in fact the event handler was running for a long time.

For example, on Linux the io_submit(2) syscall invoked when a virtio-blk
device's virtqueue becomes ready can take 10s of microseconds. This
can exceed the default polling interval (32 microseconds) and cause
adaptive polling to stop polling.

By excluding the handler's execution time from the polling check we make
the adaptive polling calculation more accurate. As a result, the event
loop now stays in polling mode where previously it would have fallen
back to file descriptor monitoring.

The following data was collected with virtio-blk num-queues=2
event_idx=off using an IOThread. Before:

168k IOPS, IOThread syscalls:

  9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0)    = 16
  9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8)                         = 8
  9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8)                         = 8
  9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
  9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512)                        = 8
  9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512)                        = 8
  9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512)                        = 8
  9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0)    = 32

174k IOPS (+3.6%), IOThread syscalls:

  9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0)    = 32
  9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8)                         = 8
  9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8)                         = 8
  9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50)    = 32

Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because
the IOThread stays in polling mode instead of falling back to file
descriptor monitoring.

As usual, polling is not implemented on Windows so this patch ignores
the new io_poll_read() callback in aio-win32.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20211207132336.36627-2-stefanha@redhat.com

[Fixed up aio_set_event_notifier() calls in
tests/unit/test-fdmon-epoll.c added after this series was queued.
--Stefan]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12 17:09:39 +00:00
Fabrice Fontaine
3043320390 block/export/fuse.c: fix musl build
Include linux/falloc.h if CONFIG_FALLOCATE_ZERO_RANGE is defined to fix
50482fda98
and avoid the following build failure on musl:

../block/export/fuse.c: In function 'fuse_fallocate':
../block/export/fuse.c:643:21: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function)
  643 |     else if (mode & FALLOC_FL_ZERO_RANGE) {
      |                     ^~~~~~~~~~~~~~~~~~~~

Fixes:
 - http://autobuild.buildroot.org/results/be24433a429fda681fb66698160132c1c99bc53b

Fixes: 50482fda98 ("block/export/fuse.c: fix musl build")
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Message-Id: <20211022095209.1319671-1-fontaine.fabrice@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 13:02:46 +01:00
Fabrice Fontaine
28031d5c74 block/export/fuse.c: fix fuse-lseek on uclibc or musl
Include linux/fs.h to avoid the following build failure on uclibc or
musl raised since version 6.0.0:

../block/export/fuse.c: In function 'fuse_lseek':
../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this function)
  641 |     if (whence != SEEK_HOLE && whence != SEEK_DATA) {
      |                   ^~~~~~~~~
../block/export/fuse.c:641:19: note: each undeclared identifier is reported only once for each function it appears in
../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'?
  641 |     if (whence != SEEK_HOLE && whence != SEEK_DATA) {
      |                                          ^~~~~~~~~
      |                                          SEEK_SET

Fixes:
 - http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137

Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Message-Id: <20210827220301.272887-1-fontaine.fabrice@gmail.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01 14:38:08 +02:00
Fabrice Fontaine
50482fda98 block/export/fuse.c: fix musl build
Fix the following build failure on musl raised since version 6.0.0 and
4ca37a96a7
because musl does not define FALLOC_FL_ZERO_RANGE:

../block/export/fuse.c: In function 'fuse_fallocate':
../block/export/fuse.c:563:23: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function)
  563 |     } else if (mode & FALLOC_FL_ZERO_RANGE) {
      |                       ^~~~~~~~~~~~~~~~~~~~

Fixes:
 - http://autobuild.buildroot.org/results/b96e3d364fd1f8bbfb18904a742e73327d308f64

Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Message-Id: <20210809095101.1101336-1-fontaine.fabrice@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-09 17:19:27 +02:00
Max Reitz
8573823f3b block/export: Conditionally ignore set-context error
When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

  qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
  `*errp == NULL' failed.

So if fixed-iothread=false, we should ignore the error by passing NULL
to bdrv_try_set_aio_context().

Fixes: f51d23c80a
       ("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210624083825.29224-2-mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20 16:49:31 +02:00
Max Reitz
6aeeaed29c export/fuse: Let permissions be adjustable
Allow changing the file mode, UID, and GID through SETATTR.

Without allow_other, UID and GID are not allowed to be changed, because
it would not make sense.  Also, changing group or others' permissions
is not allowed either.

For read-only exports, +w cannot be set.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
9bad96a8cc export/fuse: Give SET_ATTR_SIZE its own branch
In order to support changing other attributes than the file size in
fuse_setattr(), we have to give each its own independent branch.  This
also applies to the only attribute we do support right now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210625142317.271673-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
8fc54f9428 export/fuse: Add allow-other option
Without the allow_other mount option, no user (not even root) but the
one who started qemu/the storage daemon can access the export.  Allow
users to configure the export such that such accesses are possible.

While allow_other is probably what users want, we cannot make it an
unconditional default, because passing it is only possible (for non-root
users) if the global fuse.conf configuration file allows it.  Thus, the
default is an 'auto' mode, in which we first try with allow_other, and
then fall back to without.

FuseExport.allow_other reports whether allow_other was actually used as
a mount option or not.  Currently, this information is not used, but a
future patch will let this field decide whether e.g. an export's UID and
GID can be changed through chmod.

One notable thing about 'auto' mode is that libfuse may print error
messages directly to stderr, and so may fusermount (which it executes).
Our export code cannot really filter or hide them.  Therefore, if 'auto'
fails its first attempt and has to fall back, fusermount will print an
error message that mounting with allow_other failed.

This behavior necessitates a change to iotest 308, namely we need to
filter out this error message (because if the first attempt at mounting
with allow_other succeeds, there will be no such message).

Furthermore, common.rc's _make_test_img should use allow-other=off for
FUSE exports, because iotests generally do not need to access images
from other users, so allow-other=on or allow-other=auto have no
advantage.  OTOH, allow-other=on will not work on systems where
user_allow_other is disabled, and with allow-other=auto, we get said
error message that we would need to filter out again.  Just disabling
allow-other is simplest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
2c7dd057aa export/fuse: Pass default_permissions for mount
We do not do any permission checks in fuse_open(), so let the kernel do
them.  We already let fuse_getattr() report the proper UNIX permissions,
so this should work the way we want.

This causes a change in 308's reference output, because now opening a
non-writable export with O_RDWR fails already, instead of only actually
attempting to write to it.  (That is an improvement.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Stefan Hajnoczi
1b0b2e6d06 block/export: improve vu_blk_sect_range_ok()
The checks in vu_blk_sect_range_ok() assume VIRTIO_BLK_SECTOR_SIZE is
equal to BDRV_SECTOR_SIZE. This is true, but let's add a
QEMU_BUILD_BUG_ON() to make it explicit.

We might as well check that the request buffer size is a multiple of
VIRTIO_BLK_SECTOR_SIZE while we're at it.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210331142727.391465-1-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-18 11:08:13 +02:00
Stefan Hajnoczi
6f4b1996b4 block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now
The vhost-user in-flight shmfd feature has not been tested with
qemu-storage-daemon's vhost-user-blk server. Disable this optional
feature for now because it requires MFD_ALLOW_SEALING, which is not
available in some CI environments.

If we need this feature in the future it can be re-enabled after
testing.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210309094106.196911-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Stefan Hajnoczi
05ae4e674e block/export: port virtio-blk read/write range check
Check that the sector number and byte count are valid.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-13-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
db4eadf9f1 block/export: port virtio-blk discard/write zeroes input validation
Validate discard/write zeroes the same way we do for virtio-blk. Some of
these checks are mandated by the VIRTIO specification, others are
internal to QEMU.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
e44362ce31 block/export: fix vhost-user-blk export sector number calculation
The driver is supposed to honor the blk_size field but the protocol
still uses 512-byte sector numbers. It is incorrect to multiply
req->sector_num by blk_size.

VIRTIO 1.1 5.2.5 Device Initialization says:

  blk_size can be read to determine the optimal sector size for the
  driver to use. This does not affect the units used in the protocol
  (always 512 bytes), but awareness of the correct value can affect
  performance.

Fixes: 3578389bcf ("block/export: vhost-user block device backend server")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
524bac0744 block/export: use VIRTIO_BLK_SECTOR_BITS
Use VIRTIO_BLK_SECTOR_BITS and VIRTIO_BLK_SECTOR_SIZE when dealing with
virtio-blk sector numbers. Although the values happen to be the same as
BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE, they are conceptually different.
This makes it clearer when we are dealing with virtio-blk sector units.

Use VIRTIO_BLK_SECTOR_BITS in vu_blk_initialize_config(). Later patches
will use it the new constants the virtqueue request processing code
path.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-9-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
a4f1542af5 block/export: fix blk_size double byteswap
The config->blk_size field is little-endian. Use the native-endian
blk_size variable to avoid double byteswapping.

Fixes: 11f60f7eae ("block/export: make vhost-user-blk config space little-endian")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Eric Blake
c3033fd372 qapi: Use QAPI_LIST_APPEND in trivial cases
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list.  While at it, consistently use
the variable name 'tail' for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Max Reitz
df4ea7091b fuse: Implement hole detection through lseek
This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to add a dedicated
check whether it is available before making use of it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4ca37a96a7 fuse: (Partially) implement fallocate()
This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4fba06d594 fuse: Allow growable exports
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
41429e3d79 fuse: Implement standard FUSE operations
This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Max Reitz
0c9b70d590 fuse: Allow exporting BDSs via FUSE
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Stefan Hajnoczi
552c2c4c10 block/export: avoid g_return_val_if() input validation
Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201118091644.199527-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
Marc-André Lureau
0df750e9d3 libvhost-user: make it a meson subproject
By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Stefan Hajnoczi
e5e856c1eb meson: move vhost_user_blk_server to meson.build
The --enable/disable-vhost-user-blk-server options were implemented in
./configure. There has been confusion about them and part of the problem
is that the shell syntax used for setting the default value is not easy
to read. Move the option over to meson where the conditions are easier
to understand:

  have_vhost_user_blk_server = (targetos == 'linux')

  if get_option('vhost_user_blk_server').enabled()
      if targetos != 'linux'
          error('vhost_user_blk_server requires linux')
      endif
  elif get_option('vhost_user_blk_server').disabled() or not have_system
      have_vhost_user_blk_server = false
  endif

This patch does not change behavior.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201110171121.1265142-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-11-12 09:19:40 -05:00
Stefan Hajnoczi
f8ffcb2bda block/export: fix vhost-user-blk get_config() information leak
Refuse get_config() requests in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
11f60f7eae block/export: make vhost-user-blk config space little-endian
VIRTIO 1.0 devices have little-endian configuration space. The
vhost-user-blk-server.c code already uses little-endian for virtqueue
processing but not for the configuration space fields. Fix this so the
vhost-user-blk export works on big-endian hosts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
bc15e44cb2 configure: introduce --enable-vhost-user-blk-server
Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
d9b495f9c6 block/export: add vhost-user-blk multi-queue support
Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
directory shows that Linux blk-mq has 4 queues configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
f51d23c80a block/export: add iothread and fixed-iothread options
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
cbc20bfb8f block: move block exports to libblockdev
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
They are not used by other programs and are not otherwise needed in
libblock.

Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
(libblockdev) a stub function is required..

Make qemu-nbd.c use signal handling utility functions instead of
duplicating the code. This helps because os-posix.c is in libblockdev
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
Once we use the signal handling utility functions we also end up
providing the necessary symbol.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200929125516.186715-4-stefanha@redhat.com
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
3a213f83d9 util/vhost-user-server: use static library in meson.build
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
the static library once and then reuse it throughout QEMU.

Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
vhost-user tools (vhost-user-gpu, etc) do.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-14-stefanha@redhat.com
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
80a06cc52b util/vhost-user-server: move header to include/
Headers used by other subsystems are located in include/. Also add the
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-13-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
90fc91d50b block/export: convert vhost-user-blk server to block export API
Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
      --blockdev file,node-name=drive0,filename=test.img \
      --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Markus noted that supported address families are not explicit in the
QAPI schema. It is unlikely that support for more address families will
be added since file descriptor passing is required and few address
families support it. If a new address family needs to be added, then the
QAPI 'features' syntax can be used to advertize them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200924151549.913737-12-stefanha@redhat.com
[Skip test on big-endian host architectures because this device doesn't
support them yet (as already mentioned in a code comment).
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
0534b1b227 block/export: report flush errors
Propagate the flush return value since errors are possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-11-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
7185c85776 util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
The vu_client_trip() coroutine is leaked during AioContext switching. It
is also unsafe to destroy the vu_dev in panic_cb() since its callers
still access it in some cases.

Rework the lifecycle to solve these safety issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-10-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
47ba680466 util/vhost-user-server: drop unused DevicePanicNotifier
The device panic notifier callback is not used. Drop it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
df6af7ce77 block/export: consolidate request structs into VuBlockReq
Only one struct is needed per request. Drop req_data and the separate
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
once.

This fixes the req_data memory leak in vu_block_virtio_process_req().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Coiby Xu
3578389bcf block/export: vhost-user block device backend server
By making use of libvhost-user, block device drive can be shared to
the connected vhost-user client. Only one client can connect to the
server one time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
following compiler warning:
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
and fix "Invalid size %ld ..." ssize_t format string arguments for
32-bit hosts.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Kevin Wolf
30dbc81d31 block/export: Move writable to BlockExportOptions
The 'writable' option is a basic option that will probably be applicable
to most if not all export types that we will implement. Move it from NBD
to the generic BlockExport layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-26-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
8cade320c8 block/export: Add query-block-exports
This adds a simple QMP command to query the list of block exports.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-25-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
331170e073 block/export: Create BlockBackend in blk_exp_add()
Every export type will need a BlockBackend, so creating it centrally in
blk_exp_add() instead of the .create driver callback avoids duplication.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-24-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
37a4f70cea block/export: Move blk to BlockExport
Every block export has a BlockBackend representing the disk that is
exported. It should live in BlockExport therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-23-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
1a9f7a804f block/export: Add BLOCK_EXPORT_DELETED event
Clients may want to know when an export has finally disappeard
(block-export-del returns earlier than that in the general case), so add
a QAPI event for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200924152717.287415-22-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
3c3bc462ad block/export: Add block-export-del
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
3859ad36f0 block/export: Move strong user reference to block_exports
The reference owned by the user/monitor that is created when adding the
export and dropped when removing it was tied to the 'exports' list in
nbd/server.c. Every block export will have a user reference, so move it
to the block export level and tie it to the 'block_exports' list in
block/export/export.c instead. This is necessary for introducing a QMP
command for removing exports.

Note that exports are present in block_exports even after the user has
requested shutdown. This is different from NBD's exports where exports
are immediately removed on a shutdown request, even if they are still in
the process of shutting down. In order to avoid that the user still
interacts with an export that is shutting down (and possibly removes it
a second time), we need to remember if the user actually still owns it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-20-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
d53be9ce55 block/export: Add 'id' option to block-export-add
We'll need an id to identify block exports in monitor commands. This
adds one.

Note that this is different from the 'name' option in the NBD server,
which is the externally visible export name. While block export ids need
to be unique in the whole process, export names must be unique only for
the same server. Different export types or (potentially in the future)
multiple NBD servers can have the same export name externally, but still
need different block export ids internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-19-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
bc4ee65b8c block/export: Add blk_exp_close_all(_type)
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.

As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
a6ff798966 block/export: Allocate BlockExport in blk_exp_add()
Instead of letting the driver allocate and return the BlockExport
object, allocate it already in blk_exp_add() and pass it. This allows us
to initialise the generic part before calling into the driver so that
the driver can just use these values instead of having to parse the
options a second time.

For symmetry, move freeing the BlockExport to blk_exp_unref().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-17-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
8612c68673 block/export: Move AioContext from NBDExport to BlockExport
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-15-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00