Commit Graph

5572 Commits

Author SHA1 Message Date
Vladimir Sementsov-Ogievskiy
d44dae1a7c block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts
It's possible that requests start to wait each other in
mirror_wait_on_conflicts(). To avoid it let's use same technique as in
block/io.c in bdrv_wait_serialising_requests_locked() /
bdrv_find_conflicting_request(): don't wait on intersecting request if
it is already waiting for some other request.

For details of the dead-lock look at testIntersectingActiveIO()
test-case which we actually fixing now.

Fixes: d06107ade0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20 13:14:45 +02:00
Vladimir Sementsov-Ogievskiy
ead3f1bff9 block/mirror: set .co for active-write MirrorOp objects
This field is unused, but it very helpful for debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20 13:14:45 +02:00
Emanuele Giuseppe Esposito
36109bff17 blkdebug: protect rules and suspended_reqs with a lock
First, categorize the structure fields to identify what needs
to be protected and what doesn't.

We essentially need to protect only .state, and the 3 lists in
BDRVBlkdebugState.

Then, add the lock and mark the functions accordingly.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210614082931.24925-7-eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito
4153b553bd block/blkdebug: remove new_state field and instead use a local variable
There seems to be no benefit in using a field. Replace it with a local
variable, and move the state update before the yields.

The state update has do be done before the yields because now using
a local variable does not allow the new updated state to be visible
by the other yields.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210614082931.24925-6-eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito
2196c341f7 blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
That would be unsafe in case a rule other than the current one
is removed while the coroutine has yielded.
Keep FOREACH_SAFE because suspend_request deletes the current rule.

After this patch, *all* matching rules are deleted before suspending
the coroutine, rather than just one.
This doesn't affect the existing testcases.

Use actions_count to see how many yield to issue.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210614082931.24925-5-eesposit@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito
51a463680d blkdebug: track all actions
Add a counter for each action that a rule can trigger.
This is mainly used to keep track of how many coroutine_yield()
we need to perform after processing all rules in the list.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210614082931.24925-4-eesposit@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito
f48ff5af13 blkdebug: move post-resume handling to resume_req_by_tag
We want to move qemu_coroutine_yield() after the loop on rules,
because QLIST_FOREACH_SAFE is wrong if the rule list is modified
while the coroutine has yielded.  Therefore move the suspended
request to the heap and clean it up from the remove side.
All that is left is for blkdebug_debug_event to handle the
yielding.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210614082931.24925-3-eesposit@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito
69d0690c10 blkdebug: refactor removal of a suspended request
Extract to a separate function.  Do not rely on FOREACH_SAFE, which is
only "safe" if the *current* node is removed---not if another node is
removed.  Instead, just walk the entire list from the beginning when
asked to resume all suspended requests with a given tag.

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210614082931.24925-2-eesposit@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19 17:38:38 +02:00
Lukas Straub
0b9cd6b947 nbd: register yank function earlier
Although unlikely, qemu might hang in nbd_send_request().

Allow recovery in this case by registering the yank function before
calling it.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20210704000730.1befb596@gecko.fritz.box>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-07-12 11:24:00 -05:00
Peter Maydell
d1987c8114 * More SVM fixes (Lara)
* Module annotation database (Gerd)
 * Memory leak fixes (myself)
 * Build fixes (myself)
 * --with-devices-* support (Alex)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDoeBgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtFAgAippmxRt3lt+tcdSrCOZlKmxW6veK
 nUidtzfH5uE8vQsh5Q98WCEq871C/C+St1gK+q2H/MLrJeAqZD39DV+SKTuZ6Tcp
 3jL0iYC+oO0OjkHppDQTUDweF9KrsAW1WEeNz2th1OUDSjBXuXbZ+N497taouX18
 p2UN0gKNsOO2/QFrKL5KO7vSC56eBGoZz6gKtw/7dDtJBtizf1xKBRHW43b+CnQJ
 mHLs7Tj6oMC+vnMHkUKLH/6za3WJF1XHs5fp2isRgqoOSP8m0r6CMg8JnFIvmQf/
 tbLospKSWqcgD5C5PlFm2wSOjdU7zuPKM7wchhKrrEIvdDPhXaKrlpwi5Q==
 =GFX1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* More SVM fixes (Lara)
* Module annotation database (Gerd)
* Memory leak fixes (myself)
* Build fixes (myself)
* --with-devices-* support (Alex)

# gpg: Signature made Fri 09 Jul 2021 17:23:52 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (48 commits)
  meson: Use input/output for entitlements target
  configure: allow the selection of alternate config in the build
  configs: rename default-configs to configs and reorganise
  hw/arm: move CONFIG_V7M out of default-devices
  hw/arm: add dependency on OR_IRQ for XLNX_VERSAL
  meson: Introduce target-specific Kconfig
  meson: switch function tests from compilation to linking
  vl: fix leak of qdict_crumple return value
  target/i386: fix exceptions for MOV to DR
  target/i386: Added DR6 and DR7 consistency checks
  target/i386: Added MSRPM and IOPM size check
  monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically
  usb: build usb-host as module
  monitor/usb: register 'info usbhost' dynamically
  usb: drop usb_host_dev_is_scsi_storage hook
  monitor: allow register hmp commands
  accel: build tcg modular
  accel: add tcg module annotations
  accel: build qtest modular
  accel: add qtest module annotations
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-11 22:20:51 +01:00
Peter Maydell
42e1d798a6 Block layer patches
- Make blockdev-reopen stable
 - Remove deprecated qemu-img backing file without format
 - rbd: Convert to coroutines and add write zeroes support
 - rbd: Updated MAINTAINERS
 - export/fuse: Allow other users access to the export
 - vhost-user: Fix backends without multiqueue support
 - Fix drive-backup transaction endless drained section
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDoRdIRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9bvgQ/+Ogq24n1UOQc8FEKRYfyhajNToQ9ofzWN
 iLiblSGx2QDq+CauD3qdu6z7DLlqEXeoM4NYM462oIPumptQj+9XZt7ftfh6FLWW
 4yJEbjfnVKOba+vFdJ+E0DStwnPaxYdnrPGd53cwHZfbZh4ZmkpTM350mzHHiLTb
 KYKOgWd+UHZbkYeCVNYTGe30SRBiKeAecTpsVZ5HVhe7LstjByuy5stk8dytLpdV
 YqdKOToZfOp77XiHr8YcLLp1HHBGlr5hw73V4SDas0beCp7hqtnAqsTYyXBue4xO
 4zfD4Gujr5JVOCb0crDTyOmOQY5E+y2dqFoOUF00D5AoN2vj4nfQ9ESkbqlE9BVh
 mgJ1izSokYlN2X8rIwGXNR5fbxRmxxfkAA4rScNRytj1KxDHyrDxrp/k8YFemxSQ
 qwgb/FBm0fcr69evPRzovKwZFhcyPremksluHQE4rZZ66qBQ2cGuDJPE7PWVTpPH
 67JCrIVK/O6n5p+4ilFHmQQ3aP3ol0frMFcboYVRchJ2MhIDTsfFL3F/tTK8hy86
 AmrrdQ1BQIAoKNOKnAmOSOUdExM55OcfPmX69+AhEk2GeWP6kgz5Pks4H3qCiKGf
 YoRk8F1V+N4q+C0mFFovB61bNQ6COIlBuzmD9EtmpDD/Ta3Wib+3ZnoGVIdPS+OI
 jyj+qJxd9z4=
 =kH+r
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

- Make blockdev-reopen stable
- Remove deprecated qemu-img backing file without format
- rbd: Convert to coroutines and add write zeroes support
- rbd: Updated MAINTAINERS
- export/fuse: Allow other users access to the export
- vhost-user: Fix backends without multiqueue support
- Fix drive-backup transaction endless drained section

# gpg: Signature made Fri 09 Jul 2021 13:49:22 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (28 commits)
  block: Make blockdev-reopen stable API
  iotests: Test reopening multiple devices at the same time
  block: Support multiple reopening with x-blockdev-reopen
  block: Acquire AioContexts during bdrv_reopen_multiple()
  block: Add bdrv_reopen_queue_free()
  qcow2: Fix dangling pointer after reopen for 'file'
  qemu-img: Improve error for rebase without backing format
  qemu-img: Require -F with -b backing image
  qcow2: Prohibit backing file changes in 'qemu-img amend'
  blockdev: fix drive-backup transaction endless drained section
  vhost-user: Fix backends without multiqueue support
  MAINTAINERS: add block/rbd.c reviewer
  block/rbd: fix type of task->complete
  iotests/fuse-allow-other: Test allow-other
  iotests/308: Test +w on read-only FUSE exports
  export/fuse: Let permissions be adjustable
  export/fuse: Give SET_ATTR_SIZE its own branch
  export/fuse: Add allow-other option
  export/fuse: Pass default_permissions for mount
  util/uri: do not check argument of uri_free()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-10 19:55:21 +01:00
Gerd Hoffmann
f8ade0dc01 modules: add block module annotations
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210624103836.2382472-14-kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09 18:20:27 +02:00
Paolo Bonzini
63a7f85306 meson: fix missing preprocessor symbols
While most libraries do not need a CONFIG_* symbol because the
"when:" clauses are enough, some do.  Add them back or stop
using them if possible.

In the case of libpmem, the statement to add the CONFIG_* symbol
was still in configure, but could not be triggered because it
checked for "no" instead of "disabled" (and it would be wrong anyway
since the test for the library has not been done yet).

Reported-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Fixes: 587d59d6cc ("configure, meson: convert virgl detection to meson", 2021-07-06)
Fixes: 83ef16821a ("configure, meson: convert libdaxctl detection to meson", 2021-07-06)
Fixes: e36e8c70f6 ("configure, meson: convert libpmem detection to meson", 2021-07-06)
Fixes: 53c22b68e3 ("configure, meson: convert liburing detection to meson", 2021-07-06)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09 18:19:00 +02:00
Kevin Wolf
6cf42ca2f9 block: Acquire AioContexts during bdrv_reopen_multiple()
As the BlockReopenQueue can contain nodes in multiple AioContexts, only
one of which may be locked when AIO_WAIT_WHILE() can be called, we can't
let the caller lock the right contexts. Instead, individually lock the
AioContext of a single node when iterating the queue.

Reintroduce bdrv_reopen() as a wrapper for reopening a single node that
drains the node and temporarily drops the AioContext lock for
bdrv_reopen_multiple().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210708114709.206487-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 13:19:11 +02:00
Kevin Wolf
bcfd86d6a6 qcow2: Fix dangling pointer after reopen for 'file'
Without an external data file, s->data_file is a second pointer with the
same value as bs->file. When changing bs->file to a different BdrvChild
and freeing the old BdrvChild, s->data_file must also be updated,
otherwise it points to freed memory and causes crashes.

This problem was caught by iotests case 245.

Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210708114709.206487-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 13:19:11 +02:00
Eric Blake
5a385bf5c5 qcow2: Prohibit backing file changes in 'qemu-img amend'
This was deprecated back in bc5ee6da7 (qcow2: Deprecate use of
qemu-img amend to change backing file), and no one in the meantime has
given any reasons why it should be supported.  Time to make change
attempts a hard error (but for convenience, specifying the _same_
backing chain is not forbidden).  Update a couple of iotests to match.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210503213600.569128-2-eblake@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
64cc845bdb block/rbd: fix type of task->complete
task->complete is a bool not an integer.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20210707180449.32665-1-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
6aeeaed29c export/fuse: Let permissions be adjustable
Allow changing the file mode, UID, and GID through SETATTR.

Without allow_other, UID and GID are not allowed to be changed, because
it would not make sense.  Also, changing group or others' permissions
is not allowed either.

For read-only exports, +w cannot be set.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
9bad96a8cc export/fuse: Give SET_ATTR_SIZE its own branch
In order to support changing other attributes than the file size in
fuse_setattr(), we have to give each its own independent branch.  This
also applies to the only attribute we do support right now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210625142317.271673-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
8fc54f9428 export/fuse: Add allow-other option
Without the allow_other mount option, no user (not even root) but the
one who started qemu/the storage daemon can access the export.  Allow
users to configure the export such that such accesses are possible.

While allow_other is probably what users want, we cannot make it an
unconditional default, because passing it is only possible (for non-root
users) if the global fuse.conf configuration file allows it.  Thus, the
default is an 'auto' mode, in which we first try with allow_other, and
then fall back to without.

FuseExport.allow_other reports whether allow_other was actually used as
a mount option or not.  Currently, this information is not used, but a
future patch will let this field decide whether e.g. an export's UID and
GID can be changed through chmod.

One notable thing about 'auto' mode is that libfuse may print error
messages directly to stderr, and so may fusermount (which it executes).
Our export code cannot really filter or hide them.  Therefore, if 'auto'
fails its first attempt and has to fall back, fusermount will print an
error message that mounting with allow_other failed.

This behavior necessitates a change to iotest 308, namely we need to
filter out this error message (because if the first attempt at mounting
with allow_other succeeds, there will be no such message).

Furthermore, common.rc's _make_test_img should use allow-other=off for
FUSE exports, because iotests generally do not need to access images
from other users, so allow-other=on or allow-other=auto have no
advantage.  OTOH, allow-other=on will not work on systems where
user_allow_other is disabled, and with allow-other=auto, we get said
error message that we would need to filter out again.  Just disabling
allow-other is simplest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Max Reitz
2c7dd057aa export/fuse: Pass default_permissions for mount
We do not do any permission checks in fuse_open(), so let the kernel do
them.  We already let fuse_getattr() report the proper UNIX permissions,
so this should work the way we want.

This causes a change in 308's reference output, because now opening a
non-writable export with O_RDWR fails already, instead of only actually
attempting to write to it.  (That is an improvement.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210625142317.271673-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Heinrich Schuchardt
c2615bdfbd util/uri: do not check argument of uri_free()
uri_free() checks if its argument is NULL in uri_clean() and g_free().
There is no need to check the argument before the call.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Message-Id: <20210629063602.4239-1-xypron.glpk@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
eb06cbab7e block/rbd: drop qemu_rbd_refresh_limits
librbd supports 1 byte alignment for all aio operations.

Currently, there is no API call to query limits from the Ceph
ObjectStore backend.  So drop the bdrv_refresh_limits completely
until there is such an API call.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-7-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
c56ac27d2a block/rbd: add write zeroes support
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
BDRV_REQ_MAY_UNMAP for older librbd versions.

The rationale for this is as follows (citing Ilya Dryomov current RBD
maintainer):

---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
   and as a consequence always unmap if librbd is too old

   It's not clear what qemu's expectation is but in general Write
   Zeroes is allowed to unmap.  The only guarantee is that subsequent
   reads return zeroes, everything else is a hint.  This is how it is
   specified in the kernel and in the NVMe spec.

   In particular, block/nvme.c implements it as follows:

   if (flags & BDRV_REQ_MAY_UNMAP) {
       cdw12 |= (1 << 25);
   }

   This sets the Deallocate bit.  But if it's not set, the device may
   still deallocate:

   """
   If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
   command, and the namespace supports clearing all bytes to 0h in the
   values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
   from a deallocated logical block and its metadata (excluding
   protection information), then for each specified logical block, the
   controller:
   - should deallocate that logical block;

   ...

   If the Deallocate bit is cleared to '0' in a Write Zeroes command,
   and the namespace supports clearing all bytes to 0h in the values
   read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
   a deallocated logical block and its metadata (excluding protection
   information), then, for each specified logical block, the
   controller:
   - may deallocate that logical block;
   """

   https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf

b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags

   Again, it's not clear what qemu expects here, but without it we end
   up in a ridiculous situation where specifying the "don't allow slow
   fallback" switch immediately fails all efficient zeroing requests on
   a device where Write Zeroes is always efficient:

   $ qemu-io -c 'help write' | grep -- '-[zun]'
    -n, -- with -z, don't allow slow fallback
    -u, -- with -z, allow unmapping
    -z, -- write zeroes using blk_co_pwrite_zeroes

   $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
   write failed: Operation not supported
--->8---

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-6-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
c3e5fac534 block/rbd: migrate from aio to coroutines
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-5-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
6d9214189e block/rbd: update s->image_size in qemu_rbd_getlength
While at it just call rbd_get_size and avoid rbd_image_info_t.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-4-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
832a93dcb8 block/rbd: store object_size in BDRVRBDState
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-3-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Peter Lieven
48672ac058 block/rbd: bump librbd requirement to luminous release
Ceph Luminous (version 12.2.z) is almost 4 years old at this point.
Bump the requirement to get rid of the ifdef'ry in the code.
Qemu 6.1 dropped the support for RHEL-7 which was the last supported
OS that required an older librbd.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-2-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Or Ozeri
42e4ac9ef5 block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level encryption.
Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both expect an
open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20210627114635.39326-1-oro@il.ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 12:26:05 +02:00
Akihiko Odaki
9f460c64e1 block/io: Merge discard request alignments
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-06 14:28:55 +01:00
Akihiko Odaki
0dfc7af2b2 block/file-posix: Optimize for macOS
This commit introduces "punch hole" operation and optimizes transfer
block size for macOS.

Thanks to Konstantin Nazarov for detailed analysis of a flaw in an
old version of this change:
https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-06 14:28:55 +01:00
Peter Maydell
9c2647f750 Block layer patches
- Supporting changing 'file' in x-blockdev-reopen
 - ssh: add support for sha256 host key fingerprints
 - vhost-user-blk: Implement reconnection during realize
 - introduce QEMU_AUTO_VFREE
 - Don't require password of encrypted backing file for image creation
 - Code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDclTcRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9bzaw/+PYQ9vPG+ZROWl633TUOQu7IYGZynXCET
 ZHlV2JlXnFH8QoO8A53U72cgVg+GlwDpiMCCtjEGMG1yfMBNe+DXR1wFUMDne1Gs
 qIFX4gIVpPGDi3gPeQvefLCwoN8VXwIxvCJj40YR9BY0cqQR6joI81kjVlqKemqB
 cHG4qJHmnphihSLep/dd2BRInkeHWXWU63jK4d7ctdCwHZPNDf0u0FEraxEqS/d0
 5tfdIl3haghhoDRRamyh5bLZBCW1KlpfTR98RdspcwpSlXq/FgV5N9CFa15XSYfQ
 rinSClSIOpLzG90+tBihROVBXvbugO5qZVQTG0Yg1tt4FGG8Cmiqf9MNXC5yctNg
 WnaQQipx/37deafGA4jqorZBJd1R87JLJTBFTpkB47XAFq/ltqsTDhrrfdS+jail
 Fd+qyqWg0Jx3JjdhSUpHvDKBBsErjoxtoyQIGakSreXGmj2UY6BFmGii7lnLZNLo
 +E81C7exnkCIGKkOHy+y9DkpVY/PEJKCG7uwcyy+F2qOqGUOxKLuZomWcLodo6Vf
 /eJ/UsLJt6HhXhXq/1ZZHmaORn8Lft1yr/9azoGXZ7er+jZcbEkhbcZmET+Y6ykq
 Vox/GmLkhyVkM96MA0lMW5hHPWUbF29m9Jmq3nNfvFWBcILEs4uWSlbd0M2oAmWj
 ung9sKIV/8s=
 =aB0a
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

- Supporting changing 'file' in x-blockdev-reopen
- ssh: add support for sha256 host key fingerprints
- vhost-user-blk: Implement reconnection during realize
- introduce QEMU_AUTO_VFREE
- Don't require password of encrypted backing file for image creation
- Code cleanups

# gpg: Signature made Wed 30 Jun 2021 17:00:55 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  vhost-user-blk: Implement reconnection during realize
  vhost-user-blk: Factor out vhost_user_blk_realize_connect()
  vhost: Distinguish errors in vhost_dev_get_config()
  vhost-user-blk: Add Error parameter to vhost_user_blk_start()
  vhost: Return 0/-errno in vhost_dev_init()
  vhost: Distinguish errors in vhost_backend_init()
  vhost: Add Error parameter to vhost_dev_init()
  block/ssh: add support for sha256 host key fingerprints
  block/commit: use QEMU_AUTO_VFREE
  introduce QEMU_AUTO_VFREE
  iotests: Test replacing files with x-blockdev-reopen
  block: Allow changing bs->file on reopen
  block: BDRVReopenState: drop replace_backing_bs field
  block: move supports_backing check to bdrv_set_file_or_backing_noperm()
  block: bdrv_reopen_parse_backing(): simplify handling implicit filters
  block: bdrv_reopen_parse_backing(): don't check frozen child
  block: bdrv_reopen_parse_backing(): don't check aio context
  block: introduce bdrv_set_file_or_backing_noperm()
  block: introduce bdrv_remove_file_or_backing_child()
  block: comment graph-modifying function not updating permissions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-02 11:46:32 +01:00
Daniel P. Berrangé
bf783261f0 block/ssh: add support for sha256 host key fingerprints
Currently the SSH block driver supports MD5 and SHA1 for host key
fingerprints. This is a cryptographically sensitive operation and
so these hash algorithms are inadequate by modern standards. This
adds support for SHA256 which has been supported in libssh since
the 0.8.1 release.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210622115156.138458-1-berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-30 12:45:32 +02:00
Philippe Mathieu-Daudé
7b3b616838 block/nbd: Use qcrypto_tls_creds_check_endpoint()
Avoid accessing QCryptoTLSCreds internals by using
the qcrypto_tls_creds_check_endpoint() helper.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-29 18:29:47 +01:00
Vladimir Sementsov-Ogievskiy
7170170866 block/commit: use QEMU_AUTO_VFREE
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210628121133.193984-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29 16:51:21 +02:00
Eric Blake
97efa8698e block: Move read-only check during truncation earlier
No need to start a tracked request that will always fail.  The choice
to check read-only after bdrv_inc_in_flight() predates 1bc5f09f2e
(block: Use tracked request for truncate), but waiting for serializing
requests can make the effect more noticeable.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210609163034.997943-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29 16:51:00 +02:00
Peter Maydell
6512fa497c * Some Meson test conversions
* KVM dirty page ring buffer fix
 * KVM TSC scaling support
 * Fixes for SG_IO with /dev/sdX devices
 * (Non)support for host devices on iOS
 * -smp cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV5TIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNySgf9HMnAtLWp36p2ie74o4rrW9x3Ojrm
 fuCq2i3q3nBhEKqqiyp+QQJGubE44mXEZQYtX89tOfSFgg7o6SLIoAcQQskr+In6
 f9I1jjpSVTls0AaGUO+iRn9KiTzeMWeo1l6Wht+2mfBL5XpNLaLLu/T49uPhjlvN
 zFi5blgILxIYMqMCD1joDBnIiqqDozr0p7QzRZD8re25sRhg0NHQxyIh3OxBPpJ9
 3Jhy1Us0cDWrwvPbxz6S5N0zesLu1ojtojVPy6iKjyHSv+6eiE6bHyIbS8duG5+H
 zBC1THOsUV3X1UvPAjuSNlgfNeobGAzmxSJ/evLgWWkpkx1mLtsnL5RARQ==
 =YoOL
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* Some Meson test conversions
* KVM dirty page ring buffer fix
* KVM TSC scaling support
* Fixes for SG_IO with /dev/sdX devices
* (Non)support for host devices on iOS
* -smp cleanups

# gpg: Signature made Fri 25 Jun 2021 15:16:18 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (28 commits)
  machine: reject -smp dies!=1 for non-PC machines
  machine: pass QAPI struct to mc->smp_parse
  machine: add error propagation to mc->smp_parse
  machine: move common smp_parse code to caller
  machine: move dies from X86MachineState to CpuTopology
  file-posix: handle EINTR during ioctl
  block: detect DKIOCGETBLOCKCOUNT/SIZE before use
  block: try BSD disk size ioctls one after another
  block: check for sys/disk.h
  block: feature detection for host block support
  file-posix: try BLKSECTGET on block devices too, do not round to power of 2
  block: add max_hw_transfer to BlockLimits
  block-backend: align max_transfer to request alignment
  osdep: provide ROUND_DOWN macro
  scsi-generic: pass max_segments via max_iov field in BlockLimits
  file-posix: fix max_iov for /dev/sg devices
  KVM: Fix dirty ring mmap incorrect size due to renaming accident
  configure, meson: convert libusbredir detection to meson
  configure, meson: convert libcacard detection to meson
  configure, meson: convert libusb detection to meson
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-28 21:04:22 +01:00
Peter Maydell
9e654e1019 block: Make block-copy API thread-safe
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmDVzrgACgkQVh8kwfGf
 efuoKRAAsHqE46P2xwjjPROtwZC6HP/Ny5xfbF2CglfC4eX4ff89dwWSagjHfBig
 1TQjHemOo5Fs8gyBGec0WPn0HaBVFthq75VGObt+QkHy7+owGDmtVghkXnEO8z2c
 gldPoVeOXAbU4DZXgITQYfN1ljCOdrdKQMrMYmKqXLmw/vMzAfKlBKqsOFQiyVys
 4egn25QxyiOh+T29zyVwmGVABaH2TIuJqkDr+iMh4IypYZf0BlqJRw++kLhGdxGq
 RIpXQiXExy3lRm8htuh+GDAigSXNz93XKU3ZHe8RPBtPfdS0siGWwVTW2nLsH+Rc
 vfUvfkqBSGcFaFjGHg/eNGvoMTYAZKHx8yq72voqrfFIPtz3NAoS2ahz/hIU/NiL
 YLOmOcRZVx+xJ5lxjaxi/SvbTHVtZoBym/Aje/YiT3v4A8rziG6BeBUP9ec/bk+D
 YYxMZNfzW2jVq0Vl4TZyYmV8e/H8Ha3HbLLJip3tiLXsBIjajfNti9iJ/xac5NzD
 jV5pR27yIXilYHPR7GCYaMRp5LJv8uGiW704yAt2dwBizqonm7cgTg5Fcx0HFDyk
 +HyPwu/TGI3cEF7dl+8V+AmwKug+jFj3VFVh5UMtf4bqNeliYNx+QpCUuWPuPMFc
 U1nXWYBtcklD4JNPERUgUW7x2tmAJN5dGDY0LAa2brrOnKVTTwU=
 =JXBO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-06-25' into staging

block: Make block-copy API thread-safe

# gpg: Signature made Fri 25 Jun 2021 13:40:24 BST
# gpg:                using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB
# gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8B9C 26CD B2FD 147C 880E  86A1 561F 24C1 F19F 79FB

* remotes/vsementsov/tags/pull-jobs-2021-06-25:
  block-copy: atomic .cancelled and .finished fields in BlockCopyCallState
  block-copy: add CoMutex lock
  block-copy: move progress_set_remaining in block_copy_task_end
  block-copy: streamline choice of copy_range vs. read/write
  block-copy: small refactor in block_copy_task_entry and block_copy_common
  co-shared-resource: protect with a mutex
  progressmeter: protect with a mutex
  blockjob: let ratelimit handle a speed of 0
  block-copy: let ratelimit handle a speed of 0
  ratelimit: treat zero speed as unlimited

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-28 18:58:19 +01:00
Emanuele Giuseppe Esposito
149009bef4 block-copy: atomic .cancelled and .finished fields in BlockCopyCallState
By adding acquire/release pairs, we ensure that .ret and .error_is_read
fields are written by block_copy_dirty_clusters before .finished is true,
and that they are read by API user after .finished is true.

The atomic here are necessary because the fields are concurrently modified
in coroutines, and read outside.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210624072043.180494-6-eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:33:51 +03:00
Emanuele Giuseppe Esposito
d0c389d2ce block-copy: add CoMutex lock
Group various structures fields, to better understand what we need to
protect with a lock and what doesn't need it.
Then, add a CoMutex to protect concurrent access of block-copy
data structures. This mutex also protects .copy_bitmap, because its thread-safe
API does not prevent it from assigning two tasks to the same
bitmap region.

Exceptions to the lock:
- .sleep_state is handled in the series "coroutine: new sleep/wake API"
and thus here left as TODO.

- .finished, .cancelled and reads to .ret and .error_is_read will be
protected in the following patch, because are used also outside
coroutines.

- .skip_unallocated is atomic. Including it under the mutex would
increase the critical sections and make them also much more complex.
We can have it as atomic since it is only written from outside and
read by block-copy coroutines.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210624072043.180494-5-eesposit@redhat.com>
  [vsementsov: fix typo in comment]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:33:39 +03:00
Emanuele Giuseppe Esposito
e3dd339fee block-copy: move progress_set_remaining in block_copy_task_end
Moving this function in task_end ensures to update the progress
anyways, even if there is an error.

It also helps in next patch, allowing task_end to have only
one critical section.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210624072043.180494-4-eesposit@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:33:35 +03:00
Paolo Bonzini
05d5e12b24 block-copy: streamline choice of copy_range vs. read/write
Put the logic to determine the copy size in a separate function, so
that there is a simple state machine for the possible methods of
copying data from one BlockDriverState to the other.

Use .method instead of .copy_range as in-out argument, and
include also .zeroes as an additional copy method.

While at it, store the common computation of block_copy_max_transfer
into a new field of BlockCopyState, and make sure that we always
obey max_transfer; that's more efficient even for the
COPY_RANGE_READ_WRITE case.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210624072043.180494-3-eesposit@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:33:33 +03:00
Emanuele Giuseppe Esposito
c6a3e3df30 block-copy: small refactor in block_copy_task_entry and block_copy_common
Use a local variable instead of referencing BlockCopyState through a
BlockCopyCallState or BlockCopyTask every time.
This is in preparation for next patches.

No functional change intended.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210624072043.180494-2-eesposit@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:32:09 +03:00
Emanuele Giuseppe Esposito
a7b4f8fc09 progressmeter: protect with a mutex
Progressmeter is protected by the AioContext mutex, which
is taken by the block jobs and their caller (like blockdev).

We would like to remove the dependency of block layer code on the
AioContext mutex, since most drivers and the core I/O code are already
not relying on it.

Create a new C file to implement the ProgressMeter API, but keep the
struct as public, to avoid forcing allocation on the heap.

Also add a mutex to be able to provide an accurate snapshot of the
progress values to the caller.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210614081130.22134-5-eesposit@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:24:24 +03:00
Paolo Bonzini
ca657c99e6 block-copy: let ratelimit handle a speed of 0
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210614081130.22134-3-eesposit@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25 14:24:16 +03:00
Paolo Bonzini
bd80936a4f file-posix: handle EINTR during ioctl
Similar to other handle_aiocb_* functions, handle_aiocb_ioctl needs to cater
for the possibility that ioctl is interrupted by a signal.  Otherwise, the
I/O is incorrectly reported as a failure to the guest.

Reported-by: Gordon Watson <gwatson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Joelle van Dyne
09e20abdda block: detect DKIOCGETBLOCKCOUNT/SIZE before use
iOS hosts do not have these defined so we fallback to the
default behaviour.

Co-authored-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Joelle van Dyne <j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Paolo Bonzini
267cd53f5f block: try BSD disk size ioctls one after another
Try all the possible ioctls for disk size as long as they are
supported, to keep the #if ladder simple.

Extracted and cleaned up from a patch by Joelle van Dyne and
Warner Losh.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Joelle van Dyne
14176c8d05 block: feature detection for host block support
On Darwin (iOS), there are no system level APIs for directly accessing
host block devices. We detect this at configure time.

Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20210315180341.31638-2-j@getutm.app>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Paolo Bonzini
18473467d5 file-posix: try BLKSECTGET on block devices too, do not round to power of 2
bs->sg is only true for character devices, but block devices can also
be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
returns bytes in an int for /dev/sgN devices, and sectors in a short
for block devices, so account for that in the code.

The maximum transfer also need not be a power of 2 (for example I have
seen disks with 1280 KiB maximum transfer) so there's no need to pass
the result through pow2floor.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Paolo Bonzini
24b36e9813 block: add max_hw_transfer to BlockLimits
For block host devices, I/O can happen through either the kernel file
descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
or the SCSI passthrough ioctl SG_IO.

In the latter case, the size of each transfer can be limited by the
HBA, while for file descriptor I/O the kernel is able to split and
merge I/O in smaller pieces as needed.  Applying the HBA limits to
file descriptor I/O results in more system calls and suboptimal
performance, so this patch splits the max_transfer limit in two:
max_transfer remains valid and is used in general, while max_hw_transfer
is limited to the maximum hardware size.  max_hw_transfer can then be
included by the scsi-generic driver in the block limits page, to ensure
that the stricter hardware limit is used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Paolo Bonzini
b99f7fa08a block-backend: align max_transfer to request alignment
Block device requests must be aligned to bs->bl.request_alignment.
It makes sense for drivers to align bs->bl.max_transfer the same
way; however when there is no specified limit, blk_get_max_transfer
just returns INT_MAX.  Since the contract of the function does not
specify that INT_MAX means "no maximum", just align the outcome
of the function (whether INT_MAX or bs->bl.max_transfer) before
returning it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25 10:54:13 +02:00
Paolo Bonzini
01ef8185b8 scsi-generic: pass max_segments via max_iov field in BlockLimits
I/O to a disk via read/write is not limited by the number of segments allowed
by the host adapter; the kernel can split requests if needed, and the limit
imposed by the host adapter can be very low (256k or so) to avoid that SG_IO
returns EINVAL if memory is heavily fragmented.

Since this value is only interesting for SG_IO-based I/O, do not include
it in the max_transfer and only take it into account when patching the
block limits VPD page in the scsi-generic device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2021-06-25 10:54:12 +02:00
Paolo Bonzini
8ad5ab6148 file-posix: fix max_iov for /dev/sg devices
Even though it was only called for devices that have bs->sg set (which
must be character devices), sg_get_max_segments looked at /sys/dev/block
which only works for block devices.

On Linux the sg driver has its own way to provide the maximum number of
iovecs in a scatter/gather list, so add support for it.  The block device
path is kept because it will be reinstated in the next patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2021-06-25 10:54:12 +02:00
Max Reitz
32a9a245d7 block/snapshot: Clarify goto fallback behavior
In the bdrv_snapshot_goto() fallback code, we work with a pointer to
either bs->file or bs->backing.  We detach that child, close the node
(with .bdrv_close()), apply the snapshot on the child node, and then
re-open the node (with .bdrv_open()).

In order for .bdrv_open() to attach the same child node that we had
before, we pass "file={child-node}" or "backing={child-node}" to it.
Therefore, when .bdrv_open() has returned success, we can assume that
bs->file or bs->backing (respectively) points to our original child
again.  This is verified by an assertion.

All of this is not immediately clear from a quick glance at the code,
so add a comment to the assertion what it is for, and why it is valid.
It certainly confused Coverity.

Reported-by: Coverity (CID 1452774)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210503095418.31521-1-mreitz@redhat.com>
[mreitz: s/close/detach/]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-24 09:49:04 +02:00
Vladimir Sementsov-Ogievskiy
bbfb7c2f35 block/nbd: safer transition to receiving request
req->receiving is a flag of request being in one concrete yield point
in nbd_co_do_receive_one_chunk().

Such kind of boolean flag is always better to unset before scheduling
the coroutine, to avoid double scheduling. So, let's be more careful.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-33-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy
91e0998f5a block/nbd: add nbd_client_connected() helper
We already have two similar helpers for other state. Let's add another
one for convenience.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-32-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy
a71d597b98 block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
The only last step we need to reuse the function is coroutine-wrapper.
nbd_open() may be called from non-coroutine context. So, generate the
wrapper and use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-31-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy
97cf89259e nbd/client-connection: add option for non-blocking connection attempt
We'll need a possibility of non-blocking nbd_co_establish_connection(),
so that it returns immediately, and it returns success only if a
connections was previously established in background.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy
51edbf537d block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt
Split out the part that we want to reuse for nbd_open().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-29-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:21:21 -05:00
Vladimir Sementsov-Ogievskiy
43cb34dede nbd/client-connection: return only one io channel
block/nbd doesn't need underlying sioc channel anymore. So, we can
update nbd/client-connection interface to return only one top-most io
channel, which is more straight forward.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com>
[eblake: squash in Vladimir's fixes for uninit usage caught by clang]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 12:20:53 -05:00
Vladimir Sementsov-Ogievskiy
95a078ea3e block/nbd: drop BDRVNBDState::sioc
Currently sioc pointer is used just to pass from socket-connection to
nbd negotiation. Drop the field, and use local variables instead. With
next commit we'll update nbd/client-connection.c to behave
appropriately (return only top-most ioc, not two channels).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-26-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy
c2405af0e4 block/nbd: don't touch s->sioc in nbd_teardown_connection()
Negotiation during reconnect is now done in a thread, and s->sioc is
not available during negotiation. Negotiation in thread will be
cancelled by nbd_client_connection_release() called from
nbd_clear_bdrvstate().  So, we don't need this code chunk anymore.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-25-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy
6d2b0332d3 block/nbd: use negotiation of NBDClientConnection
Now that we can opt in to negotiation as part of the client connection
thread, use that to simplify connection_co.  This is another step on
the way to moving all reconnect code into NBDClientConnection.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-24-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy
e9ba7788b0 block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()
To be reused in the following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-23-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy
130d49baa5 nbd/client-connection: add possibility of negotiation
Add arguments and logic to support nbd negotiation in the same thread
after successful connection.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
5276c87c12 nbd: move connection code from block/nbd to nbd/client-connection
We now have bs-independent connection API, which consists of four
functions:

  nbd_client_connection_new()
  nbd_client_connection_release()
  nbd_co_establish_connection()
  nbd_co_establish_connection_cancel()

Move them to a separate file together with NBDClientConnection
structure which becomes private to the new API.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com>
[eblake: comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
248d470198 block/nbd: introduce nbd_client_connection_release()
This is a last step of creating bs-independent nbd connection
interface. With next commit we can finally move it to separate file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
f68729747d block/nbd: introduce nbd_client_connection_new()
This is a step of creating bs-independent nbd connection interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-16-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
90ddc64fb2 block/nbd: rename NBDConnectThread to NBDClientConnection
We are going to move the connection code to its own file, and want
clear names and APIs first.

The structure is shared between user and (possibly) several runs of
connect-thread. So it's wrong to call it "thread". Let's rename to
something more generic.

Appropriately rename connect_thread and thr variables to conn.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-15-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
c3e7730485 block/nbd: make nbd_co_establish_connection_cancel() bs-independent
nbd_co_establish_connection_cancel() actually needs only pointer to
NBDConnectThread. So, make it clean.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-14-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
d33833d7af block/nbd: bs-independent interface for nbd_co_establish_connection()
We are going to split connection code to a separate file. Now we are
ready to give nbd_co_establish_connection() clean and bs-independent
interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-13-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
b8e8a3d116 block/nbd: drop thr->state
We don't need all these states. The code refactored to use two boolean
variables looks simpler.

While moving the comment in nbd_co_establish_connection() rework it to
give better information. Also, we are going to move the connection code
to separate file and mentioning drained section would be confusing.

Improve also the comment in NBDConnectThread, while dropping removed
state names from it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: comment tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
08ea55d068 block/nbd: simplify waking of nbd_co_establish_connection()
Instead of managing connect_bh, bh_ctx, and wait_connect fields, we
can use a single link to the waiting coroutine with proper mutex
protection.

So new logic is:

nbd_co_establish_connection() sets wait_co under the mutex, releases
the mutex, then yield()s.  Note that wait_co may be scheduled by the
thread immediately after unlocking the mutex.  Still, the main thread
(or iothread) will not reach the code for entering the coroutine until
the yield(), so we are safe.

connect_thread_func() and nbd_co_establish_connection_cancel() do
the following to handle wait_co:

Under the mutex, if thr->wait_co is not NULL, make it NULL and
schedule it. This way, we avoid scheduling the coroutine twice.

Still scheduling is a bit different:

In connect_thread_func() we can just call aio_co_wake under mutex,
after commit
   [async: the main AioContext is only "current" if under the BQL]
we are sure that aio_co_wake() will not try to acquire the aio context
and do qemu_aio_coroutine_enter() but simply schedule the coroutine by
aio_co_schedule().

nbd_co_establish_connection_cancel() will be called from non-coroutine
context in further patch and will be able to go through
qemu_aio_coroutine_enter() path of aio_co_wake(). So keep current
behavior of waking the coroutine after the critical section.

Also, this commit reduces the dependence of
nbd_co_establish_connection() on the internals of bs (we now use a
generic pointer to the coroutine, instead of direct use of
s->connection_co).  This is a step towards splitting the connection
API out of nbd.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-11-vsementsov@virtuozzo.com>
Reviewied-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
2def3edb4b block/nbd: BDRVNBDState: drop unused connect_err and connect_status
These fields are write-only. Drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
2a25def4be block/nbd: nbd_client_handshake(): fix leak of s->ioc
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Message-Id: <20210610100802.5888-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Roman Kagan
e8b35bf5dc block/nbd: ensure ->connection_thread is always valid
Simplify lifetime management of BDRVNBDState->connect_thread by
delaying the possible cleanup of it until the BDRVNBDState itself goes
away.

This also reverts
 0267101af6 "block/nbd: fix possible use after free of s->connect_thread"
as now s->connect_thread can't be cleared until the very end.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
 [vsementsov: rebase, revert 0267101af6 changes]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
 [eblake: tweak comment]
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-8-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
6cc702beac block/nbd: call socket_address_parse_named_fd() in advance
Detecting monitor by current coroutine works bad when we are not in
coroutine context. And that's exactly so in nbd reconnect code, where
qio_channel_socket_connect_sync() is called from thread.

Monitor is needed only to parse named file descriptor. So, let's just
parse it during nbd_open(), so that all further users of s->saddr don't
need to access monitor.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
fb392b548e block/nbd: connect_thread_func(): do qio_channel_set_delay(false)
nbd_open() does it (through nbd_establish_connection()).
Actually we lost that call on reconnect path in 1dc4718d84
"block/nbd: use non-blocking connect: fix vm hang on connect()"
when we have introduced reconnect thread.

Fixes: 1dc4718d84
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy
bbba1c376b block/nbd: fix how state is cleared on nbd_open() failure paths
We have two "return error" paths in nbd_open() after
nbd_process_options(). Actually we should call nbd_clear_bdrvstate()
on these paths. Interesting that nbd_process_options() calls
nbd_clear_bdrvstate() by itself.

Let's fix leaks and refactor things to be more obvious:

- intialize yank at top of nbd_open()
- move yank cleanup to nbd_clear_bdrvstate()
- refactor nbd_open() so that all failure paths except for
  yank-register goes through nbd_clear_bdrvstate()

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:53 -05:00
Roman Kagan
3687ad4903 block/nbd: fix channel object leak
nbd_free_connect_thread leaks the channel object if it hasn't been
stolen.

Unref it and fix the leak.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18 10:59:52 -05:00
Daniel P. Berrangé
39683553f9 block: use GDateTime for formatting timestamp when dumping snapshot info
The GDateTime APIs provided by GLib avoid portability pitfalls, such
as some platforms where 'struct timeval.tv_sec' field is still 'long'
instead of 'time_t'. When combined with automatic cleanup, GDateTime
often results in simpler code too.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14 13:28:50 +01:00
Daniel P. Berrangé
99be1ac366 block: remove duplicate trace.h include
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14 13:28:50 +01:00
Daniel P. Berrangé
60ff2ae2a2 block: add trace point when fdatasync fails
A flush failure is a critical failure scenario for some operations.
For example, it will prevent migration from completing, as it will
make vm_stop() report an error. Thus it is important to have a
trace point present for debugging.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14 13:28:50 +01:00
Daniel P. Berrangé
c7ddc8821d block: preserve errno from fdatasync failures
When fdatasync() fails on a file backend we set a flag that
short-circuits any future attempts to call fdatasync(). The
first failure returns the true errno, but the later short-
circuited calls return a generic EIO. The latter is unhelpful
because fdatasync() can return a variety of errnos, including
EACCESS.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14 13:28:50 +01:00
Paolo Bonzini
7fa1c63553 iscsi: link libm into the module
Depending on the configuration of QEMU, some binaries might not need libm
at all.  In that case libiscsi, which uses exp(), will fail to load.
Link it in the module explicitly.

Reported-by: Yi Sun <yisun@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:07 +02:00
Paolo Bonzini
96acfb1f25 meson: allow optional dependencies for block modules
Right now all dependencies for block modules are passed to
module_ss.add(when: ...), so they are mandatory.  In the next patch we
will need to add a libm dependency to a module, but libm does not exist
on all systems.  So, modify the creation of module_ss and modsrc so that
dependencies can also be passed to module_ss.add(if_true: ...).

While touching the array, remove the useless dependency of the curl
module on glib.  glib is always linked in QEMU and in fact all other
block modules also need it, but they don't have to specify it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04 13:47:07 +02:00
Peter Maydell
8e6dad2028 Block layer patches
- NBD server: Fix crashes related to switching between AioContexts
 - file-posix: Workaround for discard/write_zeroes on buggy filesystems
 - Follow-up fixes for the reopen vs. permission changes
 - quorum: Fix error handling for flush
 - block-copy: Refactor copy_range handling
 - docs: Describe how to use 'null-co' block driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmC3iy8RHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZP0hAAuh07CFWLzHCcRC7PBzekSPfzRYYBLDSW
 EObJ1Ov4mvz8UZoP6BDJ5QVzLPhel6hXkxTd83B1D7t/Dq+yJYR0z8Kv3USpaVJ4
 2U26SsoGQM8BmtVDL1Q8tQ5eDWQ4ykxNx6F2/lKBe1EH1lfaun04Xj1rNh7jpilo
 nNmKMDDI1UOkH0lKDR3tqfEV0XQE7o+ZKfPlIbvYMjXk9ZPKUHfjNPGVdCLQVnqH
 VJI01hF7eEx1ykSMdlC+TzNoVGG+mCBokGuW0JlUvOpX6FcGnAlxXQx8u53c1I8O
 lggZV8b2IbrNBUVwXQHLLrXxjOo+u54Ct9y/gXUTAj8qai+9jVRp60Y1pnCyeIeu
 DzFx10xwy04PGleb7AAZ4dT8du2+PTkuyQ9KmlvQ2U4IUcgW124CrDeO7XYr1aif
 hCTJPeEDrC9YNU6AQ8rLXrYUtkumSm2zUzU5nZ//i5WH41475/vsmgP5A+Jr457A
 Xu0yiI2Gqkr9CNsP9ZzMkNj03oIBhPFuGxiwibLQsy/6UVnaDYS0+rQ8FXYnF5+K
 iEpgXe3vKTWxM097kzJMBTDVRMXRa75NtK7KWXMDgVpHTbcv1t1otsn+6dfv+B55
 ULJM1ETsyYS0T6BqNglvdytsraSt7JgSF+ZLHbYk3KVDshwnq/0ksgSqHNNA14ca
 kYTzHhMgo5w=
 =gSq/
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

- NBD server: Fix crashes related to switching between AioContexts
- file-posix: Workaround for discard/write_zeroes on buggy filesystems
- Follow-up fixes for the reopen vs. permission changes
- quorum: Fix error handling for flush
- block-copy: Refactor copy_range handling
- docs: Describe how to use 'null-co' block driver

# gpg: Signature made Wed 02 Jun 2021 14:44:15 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  docs/secure-coding-practices: Describe how to use 'null-co' block driver
  block-copy: refactor copy_range handling
  block-copy: fix block_copy_task_entry() progress update
  nbd/server: Use drained block ops to quiesce the server
  block-backend: add drained_poll
  block: improve permission conflict error message
  block: simplify bdrv_child_user_desc()
  block/vvfat: inherit child_vvfat_qcow from child_of_bds
  block: improve bdrv_child_get_parent_desc()
  block-backend: improve blk_root_get_parent_desc()
  block: document child argument of bdrv_attach_child_common()
  block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE
  block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
  block: drop BlockBackendRootState::read_only
  block: drop BlockDriverState::read_only
  block: consistently use bdrv_is_read_only()
  block/vvfat: fix vvfat_child_perm crash
  block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash
  qemu-io-cmds: assert that we don't have .perm requested in no-blk case
  block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-02 19:34:03 +01:00
Vladimir Sementsov-Ogievskiy
bed9523471 block-copy: refactor copy_range handling
Currently we update s->use_copy_range and s->copy_size in
block_copy_do_copy().

It's not very good:

1. block_copy_do_copy() is intended to be a simple function, that wraps
bdrv_co_<io> functions for need of block copy. That's why we don't pass
BlockCopyTask into it. So, block_copy_do_copy() is bad place for
manipulation with generic state of block-copy process

2. We are going to make block-copy thread-safe. So, it's good to move
manipulation with state of block-copy to the places where we'll need
critical sections anyway, to not introduce extra synchronization
primitives in block_copy_do_copy().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210528141628.44287-3-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
8146b357d0 block-copy: fix block_copy_task_entry() progress update
Don't report successful progress on failure, when call_state->ret is
set.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210528141628.44287-2-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Sergio Lopez
095cc4d0f6 block-backend: add drained_poll
Allow block backends to poll their devices/users to check if they have
been quiesced when entering a drained section.

This will be used in the next patch to wait for the NBD server to be
completely quiesced.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210602060552.17433-2-slp@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
8081f064e4 block/vvfat: inherit child_vvfat_qcow from child_of_bds
Recently we've fixed a crash by adding .get_parent_aio_context handler
to child_vvfat_qcow. Now we want it to support .get_parent_desc as
well. child_vvfat_qcow wants to implement own .inherit_options, it's
not bad. But omitting all other handlers is a bad idea. Let's inherit
the class from child_of_bds instead, similar to chain_child_class and
detach_by_driver_cb_class in test-bdrv-drain.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210601075218.79249-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
fd240a184b block-backend: improve blk_root_get_parent_desc()
We have different types of parents: block nodes, block backends and
jobs. So, it makes sense to specify type together with name.

While being here also use g_autofree.

iotest 307 output is updated.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210601075218.79249-3-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Thomas Huth
fa95e9fbab block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE
If fallocate(... FALLOC_FL_ZERO_RANGE ...) returns EINVAL, it's likely
an indication that the file system is buggy and does not implement
unaligned accesses right. We still might be lucky with the other
fallback fallocate() calls later in this function, though, so we should
not return immediately and try the others first.
Since FALLOC_FL_ZERO_RANGE could also return EINVAL if the file descriptor
is not a regular file, we ignore this filesystem bug silently, without
printing an error message for the user.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210527172020.847617-3-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Thomas Huth
73ebf29729 block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
A customer reported that running

 qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2

fails for them with the following error message when the images are
stored on a GPFS file system :

 qemu-img: error while writing sector 0: Invalid argument

After analyzing the strace output, it seems like the problem is in
handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
returns EINVAL, which can apparently happen if the file system has
a different idea of the granularity of the operation. It's arguably
a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
according to the man-page of fallocate(), but the file system is out
there in production and so we have to deal with it. In commit 294682cc3a
("block: workaround for unaligned byte range in fallocate()") we also
already applied the a work-around for the same problem to the earlier
fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
PUNCH_HOLE call. But instead of silently catching and returning
-ENOTSUP (which causes the caller to fall back to writing zeroes),
let's rather inform the user once about the buggy file system and
try the other fallback instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210527172020.847617-2-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
260242a833 block: drop BlockBackendRootState::read_only
Instead of keeping additional boolean field, let's store the
information in BDRV_O_RDWR bit of BlockBackendRootState::open_flags.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210527154056.70294-4-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
307261b243 block: consistently use bdrv_is_read_only()
It's better to use accessor function instead of bs->read_only directly.
In some places use bdrv_is_writable() instead of
checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set.

In bdrv_open_common() it's a bit strange to add one more variable, but
we are going to drop bs->read_only in the next patch, so new ro local
variable substitutes it here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
39df2c6d57 block/vvfat: fix vvfat_child_perm crash
It's wrong to rely on s->qcow in vvfat_child_perm, as on permission
update during bdrv_open_child() call this field is not set yet.

Still prior to aa5a04c7db, it didn't
crash, as bdrv_open_child passed NULL as child to bdrv_child_perm(),
and NULL was equal to NULL in assertion (still, it was bad guarantee
for child being s->qcow, not backing :).

Since aa5a04c7db
"add bdrv_attach_child_noperm" bdrv_refresh_perms called on parent node
when attaching child, and new correct child pointer is passed to
.bdrv_child_perm. Still, s->qcow is NULL at the moment. Let's rely only
on role instead.

Without that fix,
./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \
    -drive \
    file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none

crashes:
(gdb) bt
0  raise () at /lib64/libc.so.6
1  abort () at /lib64/libc.so.6
2  _nl_load_domain.cold () at /lib64/libc.so.6
3  annobin_assert.c_end () at /lib64/libc.so.6
4  vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3,
                     reopen_queue=0x0, perm=0, shared=31,
                     nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at
    ../block/vvfat.c:3214
5  bdrv_child_perm (bs=0x559186f3d690, child_bs=0x559186f60190,
                    c=0x559186f1ed20, role=3, reopen_queue=0x0,
                    parent_perm=0, parent_shared=31,
                    nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0)
    at ../block.c:2094
6  bdrv_node_refresh_perm (bs=0x559186f3d690, q=0x0,
                           tran=0x559186f65850, errp=0x7ffe56f28530) at
    ../block.c:2336
7  bdrv_list_refresh_perms (list=0x559186db5b90 = {...}, q=0x0,
                            tran=0x559186f65850, errp=0x7ffe56f28530)
    at ../block.c:2358
8  bdrv_refresh_perms (bs=0x559186f3d690, errp=0x7ffe56f28530) at
    ../block.c:2419
9  bdrv_attach_child
    (parent_bs=0x559186f3d690, child_bs=0x559186f60190,
     child_name=0x559184d83e3d "write-target",
     child_class=0x5591852f3b00 <child_vvfat_qcow>, child_role=3,
     errp=0x7ffe56f28530) at ../block.c:2959
10 bdrv_open_child
    (filename=0x559186f5cb80 "/var/tmp/vl.7WYmFU",
     options=0x559186f66c20, bdref_key=0x559184d83e3d "write-target",
     parent=0x559186f3d690, child_class=0x5591852f3b00
     <child_vvfat_qcow>, child_role=3, allow_none=false,
     errp=0x7ffe56f28530) at ../block.c:3351
11 enable_write_target (bs=0x559186f3d690, errp=0x7ffe56f28530) at
    ../block/vvfat.c:3177
12 vvfat_open (bs=0x559186f3d690, options=0x559186f42db0, flags=155650,
               errp=0x7ffe56f28530) at ../block/vvfat.c:1236
13 bdrv_open_driver (bs=0x559186f3d690, drv=0x5591853d97e0
                     <bdrv_vvfat>, node_name=0x0,
                     options=0x559186f42db0, open_flags=155650,
                     errp=0x7ffe56f28640) at ../block.c:1557
14 bdrv_open_common (bs=0x559186f3d690, file=0x0,
                     options=0x559186f42db0, errp=0x7ffe56f28640) at
    ../block.c:1833
...

(gdb) fr 4
 #4  vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3,
                      reopen_queue=0x0, perm=0, shared=31,
                      nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at
    ../block/vvfat.c:3214
3214        assert(c == s->qcow || (role & BDRV_CHILD_COW));
(gdb) p role
 $1 = 3   # BDRV_CHILD_DATA | BDRV_CHILD_METADATA
(gdb) p *c
 $2 = {bs = 0x559186f60190, name = 0x559186f669d0 "write-target", klass
     = 0x5591852f3b00 <child_vvfat_qcow>, role = 3, opaque =
         0x559186f3d690, perm = 3, shared_perm = 4, frozen = false,
         parent_quiesce_counter = 0, next = {le_next = 0x0, le_prev =
             0x559186f41818}, next_parent = {le_next = 0x0, le_prev =
                 0x559186f64320}}
(gdb) p s->qcow
 $3 = (BdrvChild *) 0x0

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210524101257.119377-3-vsementsov@virtuozzo.com>
Tested-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy
fb62b58896 block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash
Commit 3ca1f32257
"block: BdrvChildClass: add .get_parent_aio_context handler" introduced
new handler and commit 228ca37e12
"block: drop ctx argument from bdrv_root_attach_child" made a generic
use of it. But 3ca1f32257 didn't update
child_vvfat_qcow. Fix that.

Before that fix the command

./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \
  -drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none

crashes:

1  bdrv_child_get_parent_aio_context (c=0x559d62426d20)
    at ../block.c:1440
2  bdrv_attach_child_common
    (child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target",
     child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
     perm=3, shared_perm=4, opaque=0x559d62445690,
     child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60)
    at ../block.c:2795
3  bdrv_attach_child_noperm
    (parent_bs=0x559d62445690, child_bs=0x559d62468190,
     child_name=0x559d606f9e3d "write-target",
     child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
     child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at
    ../block.c:2855
4  bdrv_attach_child
    (parent_bs=0x559d62445690, child_bs=0x559d62468190,
     child_name=0x559d606f9e3d "write-target",
     child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3,
     errp=0x7ffc74c2ae60) at ../block.c:2953
5  bdrv_open_child
    (filename=0x559d62464b80 "/var/tmp/vl.h3TIS4",
     options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target",
     parent=0x559d62445690, child_class=0x559d60c58d20
     <child_vvfat_qcow>, child_role=3, allow_none=false,
     errp=0x7ffc74c2ae60) at ../block.c:3351
6  enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at
   ../block/vvfat.c:3176
7  vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650,
               errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236
8  bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0
                     <bdrv_vvfat>, node_name=0x0,
                     options=0x559d6244adb0, open_flags=155650,
                     errp=0x7ffc74c2af70) at ../block.c:1557
9  bdrv_open_common (bs=0x559d62445690, file=0x0,
                     options=0x559d6244adb0, errp=0x7ffc74c2af70) at
...

(gdb) fr 1
 #1  0x0000559d603ea3bf in bdrv_child_get_parent_aio_context
     (c=0x559d62426d20) at ../block.c:1440
1440        return c->klass->get_parent_aio_context(c);
 (gdb) p c->klass
$1 = (const BdrvChildClass *) 0x559d60c58d20 <child_vvfat_qcow>
 (gdb) p c->klass->get_parent_aio_context
$2 = (AioContext *(*)(BdrvChild *)) 0x0

Fixes: 3ca1f32257
Fixes: 228ca37e12
Reported-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com>
Tested-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Lukas Straub
5529b02da2 block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk
The quorum block driver uses a custom flush callback to handle the
case when some children return io errors. In that case it still
returns success if enough children are healthy.
However, it provides it as the .bdrv_co_flush_to_disk callback, not
as .bdrv_co_flush. This causes the block layer to do it's own
generic flushing for the children instead, which doesn't handle
errors properly.

Fix this by providing .bdrv_co_flush instead of
.bdrv_co_flush_to_disk so the block layer uses the custom flush
callback.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reported-by: Minghao Yuan <meeho@qq.com>
Message-Id: <20210518134214.11ccf05f@gecko.fritz.box>
Tested-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-02 14:23:20 +02:00
Thomas Huth
b4c10fc6fe block/ssh: Bump minimum libssh version to 0.8.7
It has been over two years since RHEL-8 was released, and thus per the
platform build policy, we no longer need to support RHEL-7 as a build
target. So from the RHEL-7 perspective, we do not have to support
libssh v0.7 anymore now.

Let's look at the versions from other distributions and operating
systems - according to repology.org, current shipping versions are:

             RHEL-8: 0.9.4
      Debian Buster: 0.8.7
 openSUSE Leap 15.2: 0.8.7
   Ubuntu LTS 18.04: 0.8.0 *
   Ubuntu LTS 20.04: 0.9.3
            FreeBSD: 0.9.5
          Fedora 33: 0.9.5
          Fedora 34: 0.9.5
            OpenBSD: 0.9.5
     macOS HomeBrew: 0.9.5
         HaikuPorts: 0.9.5

* The version of libssh in Ubuntu 18.04 claims to be 0.8.0 from the
name of the package, but in reality it is a 0.7 patched up as a
Frankenstein monster with patches from the 0.8 development branch.
This gave us some headaches in the past already and so it never worked
with QEMU. All attempts to get it supported have failed in the past,
patches for QEMU have never been merged and a request to Ubuntu to
fix it in their 18.04 distro has been ignored:

 https://bugs.launchpad.net/ubuntu/+source/libssh/+bug/1847514

Thus we really should ignore the libssh in Ubuntu 18.04 in QEMU, too.

Fix it by bumping the minimum libssh version to something that is
greater than 0.8.0 now. Debian Buster and openSUSE Leap have the
oldest version and so 0.8.7 is the new minimum.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20210519155859.344569-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-06-02 06:51:09 +02:00
Stefano Garzarella
d0fb9657a3 docs: fix references to docs/devel/tracing.rst
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.

We still have several references to the old file, so let's fix them
with the following command:

  sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-06-02 06:51:09 +02:00
Paolo Bonzini
b02629550d replication: move include out of root directory
The replication.h file is included from migration/colo.c and tests/unit/test-replication.c,
so it should be in include/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-26 14:49:46 +02:00
Paolo Bonzini
29a6ea24eb coroutine-sleep: replace QemuCoSleepState pointer with struct in the API
Right now, users of qemu_co_sleep_ns_wakeable are simply passing
a pointer to QemuCoSleepState by reference to the function.  But
QemuCoSleepState really is just a Coroutine*; making the
content of the struct public is just as efficient and lets us
skip the user_state_pointer indirection.

Since the usage is changed, take the occasion to rename the
struct to QemuCoSleep.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-05-21 18:22:33 +01:00
Paolo Bonzini
eaee072085 coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing
All callers of qemu_co_sleep_wake are checking whether they are passing
a NULL argument inside the pointer-to-pointer: do the check in
qemu_co_sleep_wake itself.

As a side effect, qemu_co_sleep_wake can be called more than once and
it will only wake the coroutine once; after the first time, the argument
will be set to NULL via *sleep_state->user_state_pointer.  However, this
would not be safe unless co_sleep_cb keeps using the QemuCoSleepState*
directly, so make it go through the pointer-to-pointer instead.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-4-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-05-21 18:22:33 +01:00
Stefan Hajnoczi
1b0b2e6d06 block/export: improve vu_blk_sect_range_ok()
The checks in vu_blk_sect_range_ok() assume VIRTIO_BLK_SECTOR_SIZE is
equal to BDRV_SECTOR_SIZE. This is true, but let's add a
QEMU_BUILD_BUG_ON() to make it explicit.

We might as well check that the request buffer size is a multiple of
VIRTIO_BLK_SECTOR_SIZE while we're at it.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210331142727.391465-1-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy
38b4409647 qcow2: set bdi->is_dirty
Set bdi->is_dirty, so that qemu-img info could show dirty flag.

After this commit the following check will show '"dirty-flag": true':

./build/qemu-img create -f qcow2 -o lazy_refcounts=on x 1M
./build/qemu-io x
qemu-io> write 0 1M

 After "write" command success, kill the qemu-io process:

kill -9 <qemu-io pid>

./build/qemu-img info --output=json x

This will show '"dirty-flag": true' among other things. (before this
commit it shows '"dirty-flag": false')

Note, that qcow2's dirty-bit is not a "dirty bit for the image". It
only protects qcow2 lazy refcounts feature. So, there are a lot of
conditions when qcow2 session may be not closed correctly, but bit is
0. Still, when bit is set, the last session is definitely not finished
correctly and it's better to report it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210504160656.462836-1-vsementsov@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy
c61ebf362d write-threshold: deal with includes
"qemu/typedefs.h" is enough for include/block/write-threshold.h header
with forward declaration of BlockDriverState. Also drop extra includes
from block/write-threshold.c and tests/unit/test-write-threshold.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210506090621.11848-9-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
2e0e9cbd89 block/write-threshold: drop extra APIs
bdrv_write_threshold_exceeded() is unused.

bdrv_write_threshold_is_set() is used only to double check the value of
bs->write_threshold_offset in tests. No real sense in it (both tests do
check real value with help of bdrv_write_threshold_get())

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210506090621.11848-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[mreitz: Adjusted commit message as per Eric's suggestion]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
ad578c56d5 block: drop write notifiers
They are unused now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
94783301b8 block/write-threshold: don't use write notifiers
write-notifiers are used only for write-threshold. New code for such
purpose should create filters.

Let's better special-case write-threshold and drop write notifiers at
all. (Actually, write-threshold is special-cased anyway, as the only
user of write-notifiers)

So, create a new direct interface for bdrv_co_write_req_prepare() and
drop all write-notifier related logic from write-threshold.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[mreitz: Adjusted comment as per Eric's suggestion]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
bcc8584c83 block/copy-on-read: use bdrv_drop_filter() and drop s->active
Now, after huge update of block graph permission update algorithm, we
don't need this workaround with active state of the filter. Drop it and
use new smart bdrv_drop_filter() function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210506194143.394141-1-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
9c785cd714 mirror: stop cancelling in-flight requests on non-force cancel in READY
If mirror is READY than cancel operation is not discarding the whole
result of the operation, but instead it's a documented way get a
point-in-time snapshot of source disk.

So, we should not cancel any requests if mirror is READ and
force=false. Let's fix that case.

Note, that bug that we have before this commit is not critical, as the
only .bdrv_cancel_in_flight implementation is nbd_cancel_in_flight()
and it cancels only requests waiting for reconnection, so it should be
rare case.

Fixes: 521ff8b779
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210421075858.40197-1-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy
78632a3d16 monitor: hmp_qemu_io: acquire aio contex, fix crash
Max reported the following bug:

$ ./qemu-img create -f raw src.img 1G
$ ./qemu-img create -f raw dst.img 1G

$ (echo '
   {"execute":"qmp_capabilities"}
   {"execute":"blockdev-mirror",
    "arguments":{"job-id":"mirror",
                 "device":"source",
                 "target":"target",
                 "sync":"full",
                 "filter-node-name":"mirror-top"}}
'; sleep 3; echo '
   {"execute":"human-monitor-command",
    "arguments":{"command-line":
                 "qemu-io mirror-top \"write 0 1G\""}}') \
| x86_64-softmmu/qemu-system-x86_64 \
   -qmp stdio \
   -blockdev file,node-name=source,filename=src.img \
   -blockdev file,node-name=target,filename=dst.img \
   -object iothread,id=iothr0 \
   -device virtio-blk,drive=source,iothread=iothr0

crashes:

0  raise () at /usr/lib/libc.so.6
1  abort () at /usr/lib/libc.so.6
2  error_exit
   (err=<optimized out>,
   msg=msg@entry=0x55fbb1634790 <__func__.27> "qemu_mutex_unlock_impl")
   at ../util/qemu-thread-posix.c:37
3  qemu_mutex_unlock_impl
   (mutex=mutex@entry=0x55fbb25ab6e0,
   file=file@entry=0x55fbb1636957 "../util/async.c",
   line=line@entry=650)
   at ../util/qemu-thread-posix.c:109
4  aio_context_release (ctx=ctx@entry=0x55fbb25ab680) at ../util/async.c:650
5  bdrv_do_drained_begin
   (bs=bs@entry=0x55fbb3a87000, recursive=recursive@entry=false,
   parent=parent@entry=0x0,
   ignore_bds_parents=ignore_bds_parents@entry=false,
   poll=poll@entry=true) at ../block/io.c:441
6  bdrv_do_drained_begin
   (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false,
   bs=0x55fbb3a87000) at ../block/io.c:448
7  blk_drain (blk=0x55fbb26c5a00) at ../block/block-backend.c:1718
8  blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:498
9  blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:491
10 hmp_qemu_io (mon=0x7fffaf3fc7d0, qdict=<optimized out>)
   at ../block/monitor/block-hmp-cmds.c:628

man pthread_mutex_unlock
...
    EPERM  The  mutex type is PTHREAD_MUTEX_ERRORCHECK or
    PTHREAD_MUTEX_RECURSIVE, or the mutex is a robust mutex, and the
    current thread does not own the mutex.

So, thread doesn't own the mutex. And we have iothread here.

Next, note that AIO_WAIT_WHILE() documents that ctx must be acquired
exactly once by caller. But where is it acquired in the call stack?
Seems nowhere.

qemuio_command do acquire aio context.. But we need context acquired
around blk_unref() as well and actually around blk_insert_bs() too.

Let's refactor qemuio_command so that it doesn't acquire aio context
but callers do that instead. This way we can cleanly acquire aio
context in hmp_qemu_io() around all three calls.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210423134233.51495-1-vsementsov@virtuozzo.com>
[mreitz: Fixed comment]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Connor Kuehl
2b99cfce08 block/rbd: Add an escape-aware strchr helper
Sometimes the parser needs to further split a token it has collected
from the token input stream. Right now, it does a cursory check to see
if the relevant characters appear in the token to determine if it should
break it down further.

However, qemu_rbd_next_tok() will escape characters as it removes tokens
from the token stream and plain strchr() won't. This can make the
initial strchr() check slightly misleading since it implies
qemu_rbd_next_tok() will find the token and split on it, except the
reality is that qemu_rbd_next_tok() will pass over it if it is escaped.

Use a custom strchr to avoid mixing escaped and unescaped string
operations. Furthermore, this code is identical to how
qemu_rbd_next_tok() seeks its next token, so incorporate this custom
strchr into the body of that function to reduce duplication.

Reported-by: Han Han <hhan@redhat.com>
Fixes: https://bugzilla.redhat.com/1873913
Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Message-Id: <20210421212343.85524-3-ckuehl@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-05-14 16:14:10 +02:00
Markus Armbruster
09ec85176e block: Drop the sheepdog block driver
It was deprecated in commit e1c4269763, v5.2.0.  See that commit
message for rationale.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210501075747.3293186-1-armbru@redhat.com>
ACKed-by: Peter Krempa <pkrempa@redhat.com>
2021-05-12 17:42:23 +02:00
Peter Maydell
4cc10cae64 * NetBSD NVMM support
* RateLimit mutex
 * Prepare for Meson 0.57 upgrade
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCROukUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOFXgf/ThwuBCbwC6pwoHpZzFXHdJRXIqHa
 iKTqjCLymz9NQBRTaMeG5CWjXl4o9syHLzEXLQxuQaynHK8AjbyeMSllBVLzBUme
 TU9AY3qwLShRJm3XGXkuUilFE+IR8FXWFgrTOsZXgbT+JQlkCgiuhCRqfAcDEgi/
 F5SNqlMzPNvF6G0FY9DFBBkoKF4YWROx25SgNl3fxgWwC94px/a22BXTVpOxaClZ
 HE/H+kbJH5sD2dOJR5cqbgFg7eBemNdxO3tSbR6WoP9pcvVPx0Dgh5hUJb5+pUXY
 fV5O5zZ+CdyNjWM4yAHg0y8kOlnqrLwv7pH+NdqWFaWiZ9uCSrVFR13ejQ==
 =sKO4
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* NetBSD NVMM support
* RateLimit mutex
* Prepare for Meson 0.57 upgrade

# gpg: Signature made Tue 04 May 2021 13:15:37 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream:
  glib-compat: accept G_TEST_SLOW environment variable
  gitlab-ci: use --meson=internal for CFI jobs
  configure: handle meson options that have changed type
  configure: reindent meson invocation
  slirp: add configure option to disable smbd
  ratelimit: protect with a mutex
  Add NVMM Accelerator: add maintainers for NetBSD/NVMM
  Add NVMM accelerator: acceleration enlightenments
  Add NVMM accelerator: x86 CPU support
  Add NVMM accelerator: configure and build logic
  oslib-win32: do not rely on macro to get redefined function name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-06 18:56:17 +01:00
Paolo Bonzini
4951967d84 ratelimit: protect with a mutex
Right now, rate limiting is protected by the AioContext mutex, which is
taken for example both by the block jobs and by qmp_block_job_set_speed
(via find_block_job).

We would like to remove the dependency of block layer code on the
AioContext mutex, since most drivers and the core I/O code are already
not relying on it.  However, there is no existing lock that can easily
be taken by both ratelimit_set_speed and ratelimit_calculate_delay,
especially because the latter might run in coroutine context (and
therefore under a CoMutex) but the former will not.

Since concurrent calls to ratelimit_calculate_delay are not possible,
one idea could be to use a seqlock to get a snapshot of slice_ns and
slice_quota.  But for now keep it simple, and just add a mutex to the
RateLimit struct; block jobs are generally not performance critical to
the point of optimizing the clock cycles spent in synchronization.

This also requires the introduction of init/destroy functions, so
add them to the two users of ratelimit.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-04 14:15:35 +02:00
Thomas Huth
4c386f8064 Do not include sysemu/sysemu.h if it's not really necessary
Stop including sysemu/sysemu.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:50 +02:00
Kevin Wolf
35b7f4abd5 block: Add BDRV_O_NO_SHARE for blk_new_open()
Normally, blk_new_open() just shares all permissions. This was fine
originally when permissions only protected against uses in the same
process because no other part of the code would actually get to access
the block nodes opened with blk_new_open(). However, since we use it for
file locking now, unsharing permissions becomes desirable.

Add a new BDRV_O_NO_SHARE flag that is used in blk_new_open() to unshare
any permissions that can be unshared.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210422164344.283389-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy
72373e40fb block: bdrv_reopen_multiple: refresh permissions on updated graph
Move bdrv_reopen_multiple to new paradigm of permission update:
first update graph relations, then do refresh the permissions.

We have to modify reopen process in file-posix driver: with new scheme
we don't have prepared permissions in raw_reopen_prepare(), so we
should reconfigure fd in raw_check_perm(). Still this seems more native
and simple anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-31-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy
1e4c797c75 block: make bdrv_refresh_limits() to be a transaction action
To be used in further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy
b75d64b329 block/backup-top: drop .active
We don't need this workaround anymore: bdrv_append is already smart
enough and we can use new bdrv_drop_filter().

This commit efficiently reverts also recent 705dde27c6, which
checked .active on io path. Still it said that the problem should be
theoretical. And the logic of filter removement is changed anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-25-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy
228ca37e12 block: drop ctx argument from bdrv_root_attach_child
Passing parent aio context is redundant, as child_class and parent
opaque pointer are enough to retrieve it. Drop the argument and use new
bdrv_child_get_parent_aio_context() interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-7-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy
3ca1f32257 block: BdrvChildClass: add .get_parent_aio_context handler
Add new handler to get aio context and implement it in all child
classes. Add corresponding public interface to be used soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-6-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy
ae9d441706 block: bdrv_append(): don't consume reference
We have too much comments for this feature. It seems better just don't
do it. Most of real users (tests don't count) have to create additional
reference.

Drop also comment in external_snapshot_prepare:
 - bdrv_append doesn't "remove" old bs in common sense, it sounds
   strange
 - the fact that bdrv_append can fail is obvious from the context
 - the fact that we must rollback all changes in transaction abort is
   known (it's the direct role of abort)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy
0267101af6 block/nbd: fix possible use after free of s->connect_thread
If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.

Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.

Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210406155114.1057355-1-vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-04-13 15:35:12 +02:00
Max Reitz
00769414cd mirror: Do not enter a paused job on completion
Currently, it is impossible to complete jobs on standby (i.e. paused
ready jobs), but actually the only thing in mirror_complete() that does
not work quite well with a paused job is the job_enter() at the end.

If we make it conditional, this function works just fine even if the
mirror job is paused.

So technically this is a no-op, but obviously the intention is to accept
block-job-complete even for jobs on standby, which we need this patch
for first.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210409120422.144040-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09 18:00:29 +02:00
Max Reitz
c41f5b96ee mirror: Move open_backing_file to exit_common
This is a graph change and therefore should be done in job-finalize
(which is what invokes mirror_exit_common()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210409120422.144040-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09 18:00:29 +02:00
Stefano Garzarella
b084b420d9 block/rbd: fix memory leak in qemu_rbd_co_create_opts()
When we allocate 'q_namespace', we forgot to set 'has_q_namespace'
to true. This can cause several issues, including a memory leak,
since qapi_free_BlockdevCreateOptions() does not deallocate that
memory, as reported by valgrind:

  13 bytes in 1 blocks are definitely lost in loss record 7 of 96
     at 0x4839809: malloc (vg_replace_malloc.c:307)
     by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
     by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8)
     by 0x180010: qemu_rbd_co_create_opts (rbd.c:446)
     by 0x1AE72C: bdrv_create_co_entry (block.c:492)
     by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173)
     by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so)
     by 0x1FFEFFFA6F: ???

Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'.

Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210329150129.121182-3-sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09 18:00:29 +02:00
Stefano Garzarella
c1c1f6cf51 block/rbd: fix memory leak in qemu_rbd_connect()
In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host()
using g_strjoinv(), but it's only freed in the error path, leaking
memory in the success path as reported by valgrind:

  80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
     at 0x4839809: malloc (vg_replace_malloc.c:307)
     by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
     by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
     by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
     by 0x87D07E: qemu_rbd_connect (rbd.c:562)
     by 0x87E1CE: qemu_rbd_open (rbd.c:740)
     by 0x840EB1: bdrv_open_driver (block.c:1528)
     by 0x8453A9: bdrv_open_common (block.c:1802)
     by 0x8453A9: bdrv_open_inherit (block.c:3444)
     by 0x8464C2: bdrv_open (block.c:3537)
     by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
     by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
     by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
     by 0x907EA4: aio_bh_poll (async.c:164)

Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly.

Fixes: 0a55679b4a
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210329150129.121182-2-sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09 18:00:29 +02:00
David Edmondson
07ee2ab4fd block/vdi: Don't assume that blocks are larger than VdiHeader
Given that the block size is read from the header of the VDI file, a
wide variety of sizes might be seen. Rather than re-using a block
sized memory region when writing the VDI header, allocate an
appropriately sized buffer.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20210325112941.365238-3-pbonzini@redhat.com
Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-31 10:44:21 +01:00
David Edmondson
574b8304cf block/vdi: When writing new bmap entry fails, don't leak the buffer
If a new bitmap entry is allocated, requiring the entire block to be
written, avoiding leaking the buffer allocated for the block should
the write fail.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20210325112941.365238-2-pbonzini@redhat.com
Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-31 10:44:21 +01:00
Max Reitz
484108293d qcow2: Force preallocation with data-file-raw
Setting the qcow2 data-file-raw bit means that you can ignore the
qcow2 metadata when reading from the external data file.  It does not
mean that you have to ignore it, though.  Therefore, the data read must
be the same regardless of whether you interpret the metadata or whether
you ignore it, and thus the L1/L2 tables must all be present and give a
1:1 mapping.

This patch changes 244's output: First, the qcow2 file is larger right
after creation, because of metadata preallocation.  Second, the qemu-img
map output changes: Everything that was not explicitly discarded or
zeroed is now a data area.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210326145509.163455-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2021-03-30 13:02:10 +02:00
Max Reitz
53431b9086 block/mirror: Fix mirror_top's permissions
mirror_top currently shares all permissions, and takes only the WRITE
permission (if some parent has taken that permission, too).

That is wrong, though; mirror_top is a filter, so it should take
permissions like any other filter does.  For example, if the parent
needs CONSISTENT_READ, we need to take that, too, and if it cannot share
the WRITE permission, we cannot share it either.

The exception is when mirror_top is used for active commit, where we
cannot take CONSISTENT_READ (because it is deliberately unshared above
the base node) and where we must share WRITE (so that it is shared for
all images in the backing chain, so the mirror job can take it for the
target BB).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210211172242.146671-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-03-29 18:09:00 +02:00
Pavel Dovgalyuk
ad0ce64279 qcow2: use external virtual timers
Regular virtual timers are used to emulate timings
related to vCPU and peripheral states. QCOW2 uses timers
to clean the cache. These timers should have external
flag. In the opposite case they affect the execution
and it can't be recorded and replayed.
This patch adds external flag to the timer for qcow2
cache clean.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <161700516327.1141158.8366564693714562536.stgit@pasha-ThinkPad-X280>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-03-29 18:04:29 +02:00
Markus Armbruster
bdabafc683 block: Remove monitor command block_passwd
Command block_passwd always fails since

Commit c01c214b69 "block: remove all encryption handling APIs"
(v2.10.0) turned block_passwd into a stub that always fails, and
hardcoded encryption_key_missing to false in query-named-block-nodes
and query-block.

Commit ad1324e044 "block: remove 'encryption_key_missing' flag from
QAPI" just landed.  Complete the cleanup job: remove block_passwd.

Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210323101951.3686029-1-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-23 22:31:56 +01:00
Stefan Hajnoczi
6f4b1996b4 block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now
The vhost-user in-flight shmfd feature has not been tested with
qemu-storage-daemon's vhost-user-blk server. Disable this optional
feature for now because it requires MFD_ALLOW_SEALING, which is not
available in some CI environments.

If we need this feature in the future it can be re-enabled after
testing.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210309094106.196911-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Max Reitz
0f418a2076 curl: Disconnect sockets from CURLState
When a curl transfer is finished, that does not mean that CURL lets go
of all the sockets it used for it.  We therefore must not free a
CURLSocket object before CURL has invoked curl_sock_cb() to tell us to
remove it.  Otherwise, we may get a use-after-free, as described in this
bug report: https://bugs.launchpad.net/qemu/+bug/1916501

(Reproducer from that report:
  $ qemu-img convert -f qcow2 -O raw \
  https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img \
  out.img
)

(Alternatively, it might seem logical to force-drop all sockets that
have been used for a state when the respective transfer is done, kind of
like it is done now, but including unsetting the AIO handlers.
Unfortunately, doing so makes the driver just hang instead of crashing,
which seems to evidence that CURL still uses those sockets.)

Make the CURLSocket object independent of "its" CURLState by putting all
sockets into a hash table belonging to the BDRVCURLState instead of a
list that belongs to a CURLState.  Do not touch any sockets in
curl_clean_state().

Testing, it seems like all sockets are indeed gone by the time the curl
BDS is closed, so it seems like there really was no point in freeing any
socket just because a transfer is done.  libcurl does invoke
curl_sock_cb() with CURL_POLL_REMOVE for every socket it has.

Buglink: https://bugs.launchpad.net/qemu/+bug/1916501
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210309130541.37540-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Max Reitz
3663dca461 curl: Store BDRVCURLState pointer in CURLSocket
A socket does not really belong to any specific state.  We do not need
to store a pointer to "its" state in it, a pointer to the common
BDRVCURLState is sufficient.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210309130541.37540-2-mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Kevin Wolf
1bf26076d6 stream: Don't crash when node permission is denied
The image streaming block job restricts shared permissions of the nodes
it accesses. This can obviously fail when other users already got these
permissions. &error_abort is therefore wrong and can crash. Handle these
errors gracefully and just fail starting the block job.

Reported-by: Nini Gu <ngu@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210309173451.45152-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Daniel P. Berrangé
8d17adf34f block: remove support for using "file" driver with block/char devices
The 'host_device' and 'host_cdrom' drivers must be used instead.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
e67d8e2928 block: remove 'dirty-bitmaps' field from 'BlockInfo' struct
The same data is available in the 'BlockDeviceInfo' struct.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
81cbfd5088 block: remove dirty bitmaps 'status' field
The same information is available via the 'recording' and 'busy' fields.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
ad1324e044 block: remove 'encryption_key_missing' flag from QAPI
This has been hardcoded to "false" since 2.10.0, since secrets required
to unlock block devices are now always provided up front instead of using
interactive prompts.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Peter Maydell
6f34661b6c Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmBJQHkSHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748EdsP/2U2CGTM95tjDunTs9uZV/7zM6PWt85M
 vAPItNVU2jYPfzmaJN8twrzlj0PEDhvB9Q+OJjE4HEGxEbPcdblLg/R6Zs/EaWuY
 N6oKHPXnOnHb+e80UUJdiAq+Y5RUnJbb5L3ArycnVzBgws+Oj3DtqjB2VDccY4C/
 Gkt23tZ7ikU4958e5VBqW2NUUrr+BQO0mqsW+sbbeE3WPj75NQc6srvS3TWvsg7W
 OYEyVYwm52/q2W/1a3Knfv/YO6UU9NGMpGyDLD2kwQwKbgUWYLW2BiWVwOAUldo9
 De3nfKbKnFezLCZAZro20lfCa/aKwNGCOXWzlrKxqUQCmGYUx7gM1+3ahrSd5N0v
 zUgLdZm7O428ZHL6GujWGLA1UwwzpM9X3P3yo4c0S1J6fHypbI6a9jtewrUFvFgP
 TuQ7dp6cn2DTBYUcsrWilPHbTZMADYQNRD/xUtKqalYBEWy3FX5W75+OYBJKKh+X
 Qip68m6JBzgkszXhCcu6xlLb8ynZJr2VsHvtvIgf4NnLqNOIEgVLcMtoMZT8DPrp
 rIoRc5oUFz8zj5lHnJuLADBUvlCMqoCCoU3h2aqHwH8a7RGb180f+82BW9aBcb2u
 Jk+WgAhBUjWBBC97ReFgrINUD/qZRXVoOq8LthTuQSSyr/i1zq+oLM1F0EDXcMDm
 ssATku2IxL24
 =moUF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.0-pull-request' into staging

Pull request

# gpg: Signature made Wed 10 Mar 2021 21:56:09 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-6.0-pull-request: (22 commits)
  sysemu: Let VMChangeStateHandler take boolean 'running' argument
  sysemu/runstate: Let runstate_is_running() return bool
  hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES
  hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES
  hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards
  qemu-common.h: Update copyright string to 2021
  tests/fp/fp-test: Replace the word 'blacklist'
  qemu-options: Replace the word 'blacklist'
  seccomp: Replace the word 'blacklist'
  scripts/tracetool: Replace the word 'whitelist'
  ui: Replace the word 'whitelist'
  virtio-gpu: Adjust code space style
  exec/memory: Use struct Object typedef
  fuzz-test: remove unneccessary debugging flags
  net: Use id_generate() in the network subsystem, too
  MAINTAINERS: Fix the location of tools manuals
  vhost_user_gpu: Drop dead check for g_malloc() failure
  backends/dbus-vmstate: Fix short read error handling
  target/hexagon/gen_tcg_funcs: Fix a typo
  hw/elf_ops: Fix a typo
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-11 18:55:27 +00:00
Peter Maydell
9abda42bf2 nbd patches for 2021-03-09
- Add Vladimir as NBD co-maintainer
 - Fix reporting of holes in NBD_CMD_BLOCK_STATUS
 - Improve command-line parsing accuracy of large numbers (anything going
 through qemu_strtosz), including the deprecation of hex+suffix
 - Improve some error reporting in the block layer
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmBHlmIACgkQp6FrSiUn
 Q2q2cQgAqJWNb4J/ShjvzocDDPzJ0iBitFbg0huFPfbt4DScubEZo5wBJG7vOhOW
 hIHrWCRzGvRgsn0tcSfrgFaegmHKrLgjkibM7ou8ni9NC1kUBd3R/3FBNIMxhYf7
 Q8Kfspl0LRfMJDKF9jdCnQ4Gxcd6h2OIYZqiWVg8V4Tc8WdCpIVOah7e7wjuW8bT
 vgZvfboUWm5AmIF9j/MxuMn+HFZ4ArSuFVL80ZaXlD00vRra7u3HZ8pUfcOlOujg
 7HeouM1E5j3NNE6aZSN++x/EQ3sg0zmirbWUCcgAyRfdRkAmB15uh2PUzPxEIJKH
 UHUIW5LvNtz2+yzOAz2yK29OE523Yg==
 =blE1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-03-09' into staging

nbd patches for 2021-03-09

- Add Vladimir as NBD co-maintainer
- Fix reporting of holes in NBD_CMD_BLOCK_STATUS
- Improve command-line parsing accuracy of large numbers (anything going
through qemu_strtosz), including the deprecation of hex+suffix
- Improve some error reporting in the block layer

# gpg: Signature made Tue 09 Mar 2021 15:38:10 GMT
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2021-03-09:
  block/qcow2: refactor qcow2_update_options_prepare error paths
  block/qed: bdrv_qed_do_open: deal with errp
  block/qcow2: simplify qcow2_co_invalidate_cache()
  block/qcow2: read_cache_sizes: return status value
  block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
  block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
  block/qcow2: qcow2_get_specific_info(): drop error propagation
  blockjob: return status from block_job_set_speed()
  block/mirror: drop extra error propagation in commit_active_start()
  block: drop extra error propagation for bdrv_set_backing_hd
  blockdev: fix drive_backup_prepare() missed error
  block: check return value of bdrv_open_child and drop error propagation
  utils: Deprecate hex-with-suffix sizes
  utils: Improve qemu_strtosz() to have 64 bits of precision
  utils: Enhance testsuite for do_strtosz()
  nbd: server: Report holes for raw images
  MAINTAINERS: add Vladimir as co-maintainer of NBD

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-11 13:57:08 +00:00
Philippe Mathieu-Daudé
538f049704 sysemu: Let VMChangeStateHandler take boolean 'running' argument
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 23:13:57 +01:00
Peter Maydell
a557b00469 Block layer patches:
- qemu-storage-daemon: add --pidfile option
 - qemu-storage-daemon: CLI error messages include the option name now
 - vhost-user-blk export: Misc fixes
 - docs: Improvements for qemu-storage-daemon documentation
 - parallels: load bitmap extension
 - backup-top: Don't crash on post-finalize accesses
 - Improve error messages related to node-name options
 - iotests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmBGWHURHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZpyxAAk0gRiayMUidSzgvzU/CeUhzBsC4ayEkn
 dLtTZ8hl7cW/w3GjDK1Wri4MANRN/0YHjiLSzO38lfVpK0z8SJr5aU4CwhRlOKVm
 VWgx+OLlV4Azht9fMNF4SwUXgXhl7pUNiFMNnomb++gvqhjMCedDZcWlnVKhbuQ+
 O3TKGO4tToSGaXP85jCM4xukw5HZ//4QMYg6MH0gDk8ahfE2MhyTHz64oDp412os
 qhxvc4bU2S5xGLaBfLGhsc6VPQFKjblG704P/Y73zeoxq12A0L2Ru98WvrNaXw7Z
 m54jJUINiDkJ7ZOl6W04zdeiLvs3BOrNe+7mxawOTmdkBsLOKErrhrTO1gJmHHmX
 kJLWEh9VYWxVbvE7C3KQt9bclR6wt+aPup4X1XE8pHtocPVONVq5bvctrVgxgK0b
 btN06NcK+2jQxcQkG4MnBJ8S41qmxHyIEQlQWKyUWXvKt6zsFU/NuWKMQrAfYZZi
 5J+RPU/fB073LY4lpAgou0OP1/RIvQmi5zWzjWm/Qbp3JpgC+azcYvxn7UU7J71P
 +u8IEQ4+Q9s0gvXQAh/U8AQg2eOqAwEAyFUJl9wpPN56O03dbI8KyCV5ECIRJu49
 CC8uKlJxZkbw9ZBs11SAmm/0J64WcNb2AMWxDPC8Z6oQbVaRRRznoRwRP2H6odUu
 uBolS43+5cI=
 =eAjH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-storage-daemon: add --pidfile option
- qemu-storage-daemon: CLI error messages include the option name now
- vhost-user-blk export: Misc fixes
- docs: Improvements for qemu-storage-daemon documentation
- parallels: load bitmap extension
- backup-top: Don't crash on post-finalize accesses
- Improve error messages related to node-name options
- iotests improvements

# gpg: Signature made Mon 08 Mar 2021 17:01:41 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (30 commits)
  blockdev: Clarify error messages pertaining to 'node-name'
  block: Clarify error messages pertaining to 'node-name'
  docs: qsd: Explain --export nbd,name=... default
  MAINTAINERS: update parallels block driver
  iotests: add parallels-read-bitmap test
  iotests.py: add unarchive_sample_image() helper
  parallels: support bitmap extension for read-only mode
  block/parallels: BDRVParallelsState: add cluster_size field
  parallels.txt: fix bitmap L1 table description
  qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public
  block/export: port virtio-blk read/write range check
  block/export: port virtio-blk discard/write zeroes input validation
  block/export: fix vhost-user-blk export sector number calculation
  block/export: use VIRTIO_BLK_SECTOR_BITS
  block/export: fix blk_size double byteswap
  libqtest: add qtest_remove_abrt_handler()
  libqtest: add qtest_kill_qemu()
  libqtest: add qtest_socket_server()
  vhost-user-blk: fix blkcfg->num_queues endianness
  docs: replace insecure /tmp examples in qsd docs
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-09 21:31:18 +00:00
Vladimir Sementsov-Ogievskiy
1184b41101 block/qcow2: refactor qcow2_update_options_prepare error paths
Keep setting ret close to setting errp and don't merge different error
paths into one. This way it's more obvious that we don't return
error without setting errp.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-15-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:04:46 -06:00
Vladimir Sementsov-Ogievskiy
15ce94a68c block/qed: bdrv_qed_do_open: deal with errp
Always set errp on failure. The generic bdrv_open_driver supports
driver functions which can return a negative value but forget to set
errp.  That's a strange thing. Let's improve bdrv_qed_do_open to not
behave this way. This allows the simplification of code in
bdrv_qed_co_invalidate_cache().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-14-vsementsov@virtuozzo.com>
[eblake: commit message grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:32 -06:00
Vladimir Sementsov-Ogievskiy
e6247c9c9f block/qcow2: simplify qcow2_co_invalidate_cache()
qcow2_do_open correctly sets errp on each failure path. So, we can
simplify code in qcow2_co_invalidate_cache() and drop explicit error
propagation.

Add ERRP_GUARD() as mandated by the documentation in
include/qapi/error.h so that error_prepend() is actually called even if
errp is &error_fatal.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-13-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:27 -06:00
Vladimir Sementsov-Ogievskiy
772c4cad13 block/qcow2: read_cache_sizes: return status value
It's better to return status together with setting errp. It allows to
reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-12-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:23 -06:00
Vladimir Sementsov-Ogievskiy
526e31de99 block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
It's better to return status together with setting errp. It makes
possible to avoid error propagation.

While being here, put ERRP_GUARD() to fix error_prepend(errp, ...)
usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment
above ERRP_GUARD() definition in include/qapi/error.h)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-11-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:21 -06:00
Vladimir Sementsov-Ogievskiy
0c1e9d2a9a block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
It's recommended for bool functions with errp to return true on success
and false on failure. Non-standard interfaces don't help to understand
the code. The change is also needed to reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:01:47 -06:00
Vladimir Sementsov-Ogievskiy
83bad8cbf5 block/qcow2: qcow2_get_specific_info(): drop error propagation
Don't use error propagation in qcow2_get_specific_info(). For this
refactor qcow2_get_bitmap_info_list, its current interface is rather
weird.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210202124956.63146-9-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
[eblake: separate local 'tail' variable from 'info_list' parameter]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:16:11 -06:00
Vladimir Sementsov-Ogievskiy
eb5becc18f block/mirror: drop extra error propagation in commit_active_start()
Let's check return value of mirror_start_job to check for failure
instead of local_err.

Rename ret to job, as ret is usually integer variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:14:14 -06:00
Vladimir Sementsov-Ogievskiy
bc52024959 block: check return value of bdrv_open_child and drop error propagation
This patch is generated by cocci script:

@@
symbol bdrv_open_child, errp, local_err;
expression file;
@@

  file = bdrv_open_child(...,
-                        &local_err
+                        errp
                        );
- if (local_err)
+ if (!file)
  {
      ...
-     error_propagate(errp, local_err);
      ...
  }

with command

spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80 --use-gitgrep block

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com>
[eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only
caller to pass allow_none=true]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:07:09 -06:00
Vladimir Sementsov-Ogievskiy
baefd97700 parallels: support bitmap extension for read-only mode
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210224104707.88430-5-vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:55 +01:00
Vladimir Sementsov-Ogievskiy
e0b5207f54 block/parallels: BDRVParallelsState: add cluster_size field
We are going to use it in more places, calculating
"s->tracks << BDRV_SECTOR_BITS" doesn't look good.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210224104707.88430-4-vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Vladimir Sementsov-Ogievskiy
35f428ba39 qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public
Rename bytes_covered_by_bitmap_cluster() to
bdrv_dirty_bitmap_serialization_coverage() and make it public.
It is needed as we are going to share it with bitmap loading in
parallels format.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20210224104707.88430-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
05ae4e674e block/export: port virtio-blk read/write range check
Check that the sector number and byte count are valid.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-13-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
db4eadf9f1 block/export: port virtio-blk discard/write zeroes input validation
Validate discard/write zeroes the same way we do for virtio-blk. Some of
these checks are mandated by the VIRTIO specification, others are
internal to QEMU.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
e44362ce31 block/export: fix vhost-user-blk export sector number calculation
The driver is supposed to honor the blk_size field but the protocol
still uses 512-byte sector numbers. It is incorrect to multiply
req->sector_num by blk_size.

VIRTIO 1.1 5.2.5 Device Initialization says:

  blk_size can be read to determine the optimal sector size for the
  driver to use. This does not affect the units used in the protocol
  (always 512 bytes), but awareness of the correct value can affect
  performance.

Fixes: 3578389bcf ("block/export: vhost-user block device backend server")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
524bac0744 block/export: use VIRTIO_BLK_SECTOR_BITS
Use VIRTIO_BLK_SECTOR_BITS and VIRTIO_BLK_SECTOR_SIZE when dealing with
virtio-blk sector numbers. Although the values happen to be the same as
BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE, they are conceptually different.
This makes it clearer when we are dealing with virtio-blk sector units.

Use VIRTIO_BLK_SECTOR_BITS in vu_blk_initialize_config(). Later patches
will use it the new constants the virtqueue request processing code
path.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-9-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
a4f1542af5 block/export: fix blk_size double byteswap
The config->blk_size field is little-endian. Use the native-endian
blk_size variable to avoid double byteswapping.

Fixes: 11f60f7eae ("block/export: make vhost-user-blk config space little-endian")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Max Reitz
705dde27c6 backup-top: Refuse I/O in inactive state
When the backup-top node transitions from active to inactive in
bdrv_backup_top_drop(), the BlockCopyState is freed and the filtered
child is removed, so the node effectively becomes unusable.

However, noone told its I/O functions this, so they will happily
continue accessing bs->backing and s->bcs.  Prevent that by aborting
early when s->active is false.

(After the preceding patch, the node should be gone after
bdrv_backup_top_drop(), so this should largely be a theoretical problem.
But still, better to be safe than sorry, and also I think it just makes
sense to check s->active in the I/O functions.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210219153348.41861-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:55:18 +01:00
Max Reitz
bdc4c4c5e3 backup: Remove nodes from job in .clean()
The block job holds a reference to the backup-top node (because it is
passed as the main job BDS to block_job_create()).  Therefore,
bdrv_backup_top_drop() cannot delete the backup-top node (replacing it
by its child does not affect the job parent, because that has
.stay_at_node set).  That is a problem, because all of its I/O functions
assume the BlockCopyState (s->bcs) to be valid and that it has a
filtered child; but after bdrv_backup_top_drop(), neither of those
things are true.

It does not make sense to add new parents to backup-top after
backup_clean(), so we should detach it from the job before
bdrv_backup_top_drop().  Because there is no function to do that for a
single node, just detach all of the job's nodes -- the job does not do
anything past backup_clean() anyway.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210219153348.41861-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:55:18 +01:00
Paolo Bonzini
f7544edcd3 qemu-config: add error propagation to qemu_config_parse
This enables some simplification of vl.c via error_fatal, and improves
error messages.  Before:

  $ ./qemu-system-x86_64 -readconfig .
  qemu-system-x86_64: error reading file
  qemu-system-x86_64: -readconfig .: read config .: Invalid argument
  $ /usr/libexec/qemu-kvm -readconfig foo
  qemu-kvm: -readconfig foo: read config foo: No such file or directory

After:

  $ ./qemu-system-x86_64 -readconfig .
  qemu-system-x86_64: -readconfig .: Cannot read config file: Is a directory
  $ ./qemu-system-x86_64 -readconfig foo
  qemu-system-x86_64: -readconfig foo: Could not open 'foo': No such file or directory

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210226170816.231173-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 11:41:54 +01:00
Maxim Levitsky
6094cbeb72 block: qcow2: remove the created file on initialization error
If the qcow initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201217170904.946013-4-mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Maxim Levitsky
a890f08e58 block: add bdrv_co_delete_file_noerr
This function wraps bdrv_co_delete_file for the common case of removing a file,
which was just created by format driver, on an error condition.

It hides the -ENOTSUPP error, and reports all other errors otherwise.

Use it in luks driver

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Maxim Levitsky
dcb6699512 crypto: luks: Fix tiny memory leak
When the underlying block device doesn't support the
bdrv_co_delete_file interface, an 'Error' object was leaked.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201217170904.946013-2-mlevitsk@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Peter Maydell
392b9a74b9 bitmaps patches for 2021-02-12
- add 'transform' member to manipulate bitmaps across migration
 - work towards better error handling during bdrv_open
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAnDQsACgkQp6FrSiUn
 Q2qc5Qf/SKVdpX4j7OnHF6sBuf/8LVWz4KazSqEU0ohazBJmafgJpH2EA5pXMXR4
 frZDWeanGmhj1MjMkta/++uvEBU/TMpW2z98mZvjErteXdnRQAlII/hOCI+QZJvg
 viQ5t1EyrkyXzUePOjs+AwqA5KHWbCKt6QqyItQ78HvI23sw/fuvHj0G67KbVzXZ
 VcSrVr0J7PXnZV/hWfg+C+Nn9Ro9tsVdn79awLYVQ7/SDro3hzylpcHMQaHMK2oe
 mX4D2kNq7s21E27Zb6vlknUhQPkMdETk0gfEbpn7sTVMEc58GRLC7Tqfx7l0JIFK
 5izVyA5vndKVxDGYPkbDK6VL2uDg4A==
 =+Epy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' into staging

bitmaps patches for 2021-02-12

- add 'transform' member to manipulate bitmaps across migration
- work towards better error handling during bdrv_open

# gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-bitmaps-2021-02-12:
  block: use return status of bdrv_append()
  block: return status from bdrv_append and friends
  qemu-iotests: 300: Add test case for modifying persistence of bitmap
  migration: dirty-bitmap: Allow control of bitmap persistence
  migration: dirty-bitmap: Use struct for alias map inner members

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-13 21:26:00 +00:00
Vladimir Sementsov-Ogievskiy
934aee14d3 block: use return status of bdrv_append()
Now bdrv_append returns status and we can drop all the local_err things
around it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 15:39:44 -06:00
Vladimir Sementsov-Ogievskiy
ff789bf5a9 block/backup: implement .cancel job handler
Cancel in-flight io on target to not waste the time.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 12:17:08 -06:00
Vladimir Sementsov-Ogievskiy
521ff8b779 block/mirror: implement .cancel job handler
Cancel in-flight io on target to not waste the time.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 11:29:40 -06:00
Vladimir Sementsov-Ogievskiy
3fc1ec3725 block/raw-format: implement .bdrv_cancel_in_flight handler
We are going to cancel in-flight requests on mirror nbd target on job
cancel. Still nbd is often used not directly but as raw-format child.
So, add pass-through handler here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy
c4f7f24e1f block/nbd: implement .bdrv_cancel_in_flight
Just stop waiting for connection in existing requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy
bd54669a4a block: add new BlockDriver handler: bdrv_cancel_in_flight
It will be used to stop retrying NBD requests on mirror cancel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Daniel P. Berrangé
3d3e9b1f66 block: rename and alter bdrv_all_find_snapshot semantics
Currently bdrv_all_find_snapshot() will return 0 if it finds
a snapshot, -1 if an error occurs, or if it fails to find a
snapshot. New callers to be added want to distinguish between
the error scenario and failing to find a snapshot.

Rename it to bdrv_all_has_snapshot and make it return -1 on
error, 0 if no snapshot is found and 1 if snapshot is found.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
c22d644ca7 block: allow specifying name of block device for vmstate storage
Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-6-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
cf3a74c94f block: add ability to specify list of blockdevs during snapshot
When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.

Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-5-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
e26f98e209 block: push error reporting into bdrv_all_*_snapshot functions
The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
[PMD: Initialize variables]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-2-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Roman Kagan
ddde5ee769 block/nbd: only enter connection coroutine if it's present
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.

However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine.  As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:

  #0  qio_channel_detach_aio_context (ioc=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
  #1  0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
  #2  bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
      new_context=new_context@entry=0x562a260c9580,
      ignore=ignore@entry=0x7feeadc9b780)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
  #3  0x0000562a24282969 in bdrv_child_try_set_aio_context
      (bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
  #4  0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
      new_context=0x562a260c9580,
      update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
  #5  0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
  #6  0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
      out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #7  0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #8  0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
      running=0, state=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
  #9  0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
  #10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
  #11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
  #12 migration_iteration_run (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
  #13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
  #14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
  #15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
      at pthread_create.c:333
  #16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
      oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
      <__func__.28102>, newlen=0)
      at ../sysdeps/unix/sysv/linux/sysctl.c:30
  #17 0x0000000000000000 in ?? ()

Fix it by checking that the connection coroutine is non-null before
trying to enter it.  If it is null, no entering is needed, as the
connection is probably going down anyway.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Roman Kagan
3b5e4db673 block/nbd: only detach existing iochannel from aio_context
When the reconnect in NBD client is in progress, the iochannel used for
NBD connection doesn't exist.  Therefore an attempt to detach it from
the aio_context of the parent BlockDriverState results in a NULL pointer
dereference.

The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop.  If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:

  #0  qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
  #1  0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
      opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
  #2  0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
  #3  0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
  #4  aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
  #5  0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
      blocking=blocking@entry=true)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
  #6  0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
      cb=<optimized out>, opaque=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
  #7  0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
      bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
  #8  bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
      new_context=new_context@entry=0x5618056c7580,
      ignore=ignore@entry=0x7f60e1e63780)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
  #9  0x000056180345c969 in bdrv_child_try_set_aio_context (
      bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
  #10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
      new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
      errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
  #11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
  #12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
      running=0, state=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
  #15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
  #16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
  #17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
  #18 migration_iteration_run (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
  #19 migration_thread (opaque=opaque@entry=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
  #20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
  #21 0x00007f61085d06ba in start_thread ()
     from /lib/x86_64-linux-gnu/libpthread.so.0
  #22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
  #23 0x0000000000000000 in ?? ()

Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context.  If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
a5215b8fdf block/io: use int64_t bytes in copy_range
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert now copy_range parameters which are already 64bit to signed
type.

It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
e9e52efdc5 block/io: support int64_t bytes in read/write wrappers
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been
updated, update all their wrappers.

For all of them type of 'bytes' is widening, so callers are safe. We
have update request_fn in blkverify.c simultaneously. Still it's just a
pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is
widening for callers of the request_fn anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
37e9403ea8 block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.

bdrv_pad_request() is updated simultaneously, as pointer to bytes passed
to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part().

So, all callers of bdrv_pad_request() are updated to pass 64bit bytes.
bdrv_pad_request() is already good for 64bit requests, add
corresponding assertion.

Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
Type is widening, so callers are safe. Let's look inside the functions.

In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already int64_t interfaces (and some obviously safe
calculations), it's OK.

In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still
it's passed to bdrv_aligned_pwritev which supports int64_t bytes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
8b0c5d7659 block/io: support int64_t bytes in bdrv_aligned_preadv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_preadv() now.

Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is
only used for pass-through to bdrv_aligned_preadv().

All bdrv_aligned_preadv() callers are safe as type is widening. Let's
look inside:

 - add a new-style assertion that request is good.
 - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports
   int64_t bytes
 - conversion of bytes_remaining is OK, as we never have requests
   overflowing BDRV_MAX_LENGTH
 - looping through bytes_remaining is ok, num is updated to int64_t
   - for bdrv_driver_preadv we have same limit of max_transfer
   - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow
     qiov->size anyway (thanks to bdrv_check_qiov_request())

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
9df5afbdd1 block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_copy_on_readv() now.

'bytes' type widening, so callers are safe. Look at the function
itself:

bytes, skip_bytes and progress become int64_t.

bdrv_round_to_clusters() is OK, cluster_bytes now may be large.
trace_bdrv_co_do_copy_on_readv() is OK

looping through cluster_bytes is still OK.

pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we
are going to do COR operation. Therefor calculations in
qemu_iovec_from_buf() and bdrv_driver_preadv() should not change.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
fcfd9ade68 block/io: support int64_t bytes in bdrv_aligned_pwritev()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_pwritev() now and convert the dependencies:
bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed
type bytes.

Conversion of bdrv_co_write_req_prepare() and
bdrv_co_write_req_finish() is definitely safe, as all requests in
block/io must not overflow BDRV_MAX_LENGTH. Still add assertions.

For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are
safe. Let's check usage of the parameter inside the function.

Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish()
is OK.

Passing to qemu_iovec_* is OK after new assertion. All other callees
are already updated to int64_t.

Checking alignment is not changed, offset + bytes and qiov_offset +
bytes calculations are safe (thanks to new assertions).

max_transfer is kept to be int for now. It has a default of INT_MAX
here, and some drivers may rely on it. It's to be refactored later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy
5ae07b1410 block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_pwrite_zeroes() now.

Callers are safe, as converting int to int64_t is safe. Concentrate on
'bytes' usage in the function (thx to Eric Blake):

    compute 'int tail' via % 'int alignment' - safe
    fragmentation loop 'int num' - still fragments with a cap on
      max_transfer

    use of 'num' within the loop
    MIN(bytes, max_transfer) as well as %alignment - still works, so
         calculations in if (head) {} are safe
    clamp size by 'int max_write_zeroes' - safe
    drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
    clamp size by 'int max_transfer' - safe
    buf allocation is still clamped to max_transfer
    qemu_iovec_init_buf(size_t) - safe because of clamping
    bdrv_driver_pwritev(uint64_t) - safe

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy
17abcbeee2 block/io: use int64_t bytes in driver wrappers
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert driver wrappers parameters which are already 64bit to
signed type.

Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less
than INT64_MAX), which makes the conversion to signed 64bit type safe.

Add corresponding assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Eric Blake
8024726459 block: use int64_t as bytes type in tracked requests
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

All requests in block/io must not overflow BDRV_MAX_LENGTH, all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Add some assertions still.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy
63f4ad1186 block/io: improve bdrv_check_request: check qiov too
Operations with qiov add more restrictions on bytes, let's cover it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
801625e69d block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes
The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.

This patch is a first in the 64-bit-blocklayer series, so we are
generally moving to int64_t for both offset and bytes parameters on all
io paths. Main motivation is realization of 64-bit write_zeroes
operation for fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Patch-correctness audit by Eric Blake:

  Caller has 32-bit, this patch now causes widening which is safe:
  block/block-backend.c: blk_do_preadv() passes 'unsigned int'
  block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
  block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
  block/throttle.c: throttle_co_pdiscard() passes 'int'

  Caller has 64-bit, this patch fixes potential bug where pre-patch
  could narrow, except it's easy enough to trace that callers are still
  capped at 2G actions:
  block/throttle.c: throttle_co_preadv() passes 'uint64_t'
  block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

  Implementation in question: block/throttle-groups.c
  throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
  and uses it: argument to util/throttle.c throttle_account(uint64_t)

  All safe: it patches a latent bug, and does not introduce any 64-bit
  gotchas once throttle_co_p{read,write}v are relaxed, and assuming
  throttle_account() is not buggy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
98ca45494f block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure
Make bdrv_pad_request() honest: return error if
qemu_iovec_init_extended() failed.

Update also bdrv_padding_destroy() to clean the structure for safety.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
f0deecff82 block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up
Prepare for the following patch when bdrv_pad_request() will be able to
fail. Update the comments.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy
a56ed80c42 block: fix theoretical overflow in bdrv_init_padding()
Calculation of sum may theoretically overflow, so use 64bit type and
add some good assertions.

Use int64_t constantly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak assertion order]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy
4c002cef0e util/iov: make qemu_iovec_init_extended() honest
Actually, we can't extend the io vector in all cases. Handle possible
MAX_IOV and size_t overflows.

For now add assertion to callers (actually they rely on success anyway)
and fix them in the following patch.

Add also some additional good assertions to qemu_iovec_init_slice()
while being here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy
69b55e03f7 block: refactor bdrv_check_request: add errp
It's better to pass &error_abort than just assert that result is 0: on
crash, we'll immediately see the reason in the backtrace.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix iotest 206 fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Kevin Wolf
26513a0174 block: Fix VM size column width in bdrv_snapshot_dump()
size_to_str() can return a size like "4.24 MiB", with a single digit
integer part and two fractional digits. This is eight characters, but
commit b39847a5 changed the format string to only reserve seven
characters for the column.

This can result in unaligned columns, which in turn changes the output of
iotests case 267 because exceeding the column size defeats the attempt
to filter the size out of the output (observed with the ppc64 emulator).
The resulting change is only a whitespace change, but since commit
f203080b this is enough for iotests to consider the test failed.

Taking a character away from the tag name column and adding it to the VM
size column doesn't change anything in the common case (the tag name is
left justified, the VM size is right justified), but fixes this case.

Fixes: b39847a505
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210202155911.179865-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:23:55 +01:00
Philippe Mathieu-Daudé
fcc8672aca block/nvme: Trace NVMe spec version supported by the controller
NVMe controllers implement different versions of the spec,
and different features of it. It is useful to gather this
information when debugging.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Philippe Mathieu-Daudé
97b709f32e block/nvme: Properly display doorbell stride length in trace event
Commit 15b2260bef ("block/nvme: Trace controller capabilities")
misunderstood the doorbell stride value from the datasheet, use
the correct one. The 'doorbell_scale' variable used few lines
later is correct.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Peter Maydell
7e7eb9f852 QAPI patches patches for 2021-01-28
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmASY10SHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT4M4P+gKN64+WaErotLHHsqtiA0aoTwbTFXin
 OEyR5du0+PjX96qYHHV+ZDn5uxxKI57/SRNooPndjU63sYgAbNApfsu+wUDZC844
 qMSlrmyw2+Lw1EIykoLXK49+pEDU0XpVIciL5+zEdtCgjiJRjrOOJ/JRBcKoQNHn
 UArGNQ8y0D+0i8uXyJjyvQeHdz6KUr9sX1vqwRGMt9axEMDJks0+Si4Zg3z2wlWJ
 Sc3WsXEhikxK1qkF2/6VsopOgNGB0UUvV6q1GO6ngdqag1Hb6mACzSv9mtIShGjh
 a2MISBhxF8h4wfO8U5TiS9vBgYR3elA3kRGsn4FOfD3sSilt/SWLPHWXdlO1aL2E
 TollRPtYBqn2YIYQP1SEp7NIqaWC/QaGkP/mH8Jvv0YlL64RK879lv6KiHKzfvI7
 HBD7WGZBwMQqPczuw308tqDTQPKUsPDYoEJAFRywkLry86wL8DBOlkQ0lWUjF06s
 UQk/i09nhrcNLo0GbmgAOHUVj4m03zLyMW/fYmsQ8xe9/b6GBwJvtm2v5wKwg0HE
 ixxj4oBIk5YV5Xwt7DKLkT0voPAAgNK13a6ywzbyfsigwaJaO9tLtZ0PMuaT9kgs
 b/OBdeeIYpFdIT/DlcMWpIFi53VYe0McX8MmprHcMZb1133wk5Z5gk+FAWLMifrw
 2ltmoUPoB1dC
 =djiE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-01-28' into staging

QAPI patches patches for 2021-01-28

# gpg: Signature made Thu 28 Jan 2021 07:10:21 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2021-01-28:
  qapi: More complex uses of QAPI_LIST_APPEND
  qapi: Use QAPI_LIST_APPEND in trivial cases
  qapi: Introduce QAPI_LIST_APPEND
  qapi: A couple more QAPI_LIST_PREPEND() stragglers
  net: Clarify early exit condition

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-28 22:43:18 +00:00
Eric Blake
95b3a8c8a8 qapi: More complex uses of QAPI_LIST_APPEND
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Eric Blake
c3033fd372 qapi: Use QAPI_LIST_APPEND in trivial cases
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list.  While at it, consistently use
the variable name 'tail' for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Kevin Wolf
86b1cf3227 block: Separate blk_is_writable() and blk_supports_write_perm()
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?

This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.

This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.

All calls of blk_is_read_only() are converted to one of the two new
functions.

Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-01-27 20:45:20 +01:00
David Edmondson
797e3e3805 block: report errno when flock fcntl fails
When a call to fcntl(2) for the purpose of adding file locks fails
with an error other than EAGAIN or EACCES, report the error returned
by fcntl.

EAGAIN or EACCES are elided as they are considered to be common
failures, indicating that a conflicting lock is held by another
process.

No errors are elided when removing file locks.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
143a6384f5 block/block-copy: drop unused argument of block_copy()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
5b49c2bdc1 block/block-copy: drop unused block_copy_set_progress_callback()
Drop unused code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
71eed4cebe backup: move to block-copy
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
511e7d31bf block/backup: drop extra gotos from backup_run()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
d51590fc3e block/block-copy: make progress_bytes_callback optional
We are going to stop use of this callback in the following commit.
Still the callback handling code will be dropped in a separate commit.
So, for now let's make it optional.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
2c59fd833a qapi: backup: add max-chunk and max-workers to x-perf struct
Add new parameters to configure future backup features. The patch
doesn't introduce aio backup requests (so we actually have only one
worker) neither requests larger than one cluster. Still, formally we
satisfy these maximums anyway, so add the parameters now, to facilitate
further patch which will really change backup job behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
a6d23d56df block/block-copy: add block_copy_cancel
Add function to cancel running async block-copy call. It will be used
in backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
7e032df0ea block/block-copy: add ratelimit to block-copy
We are going to directly use one async block-copy operation for backup
job, so we need rate limiter.

We want to maintain current backup behavior: only background copying is
limited and copy-before-write operations only participate in limit
calculation. Therefore we need one rate limiter for block-copy state
and boolean flag for block-copy call state for actual limitation.

Note, that we can't just calculate each chunk in limiter after
successful copying: it will not save us from starting a lot of async
sub-requests which will exceed limit too much. Instead let's use the
following scheme on sub-request creation:
1. If at the moment limit is not exceeded, create the request and
account it immediately.
2. If at the moment limit is already exceeded, drop create sub-request
and handle limit instead (by sleep).
With this approach we'll never exceed the limit more than by one
sub-request (which pretty much matches current backup behavior).

Note also, that if there is in-flight block-copy async call,
block_copy_kick() should be used after set-speed to apply new setup
faster. For that block_copy_kick() published in this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
2e099a9d29 block/block-copy: add list of all call-states
It simplifies debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
26be9d62dd block/block-copy: add max_chunk and max_workers parameters
They will be used for backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
de4641b46b block/block-copy: implement block_copy_async
We'll need async block-copy invocation to use in backup directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
3b8c2329b5 block/block-copy: More explicit call_state
Refactor common path to use BlockCopyCallState pointer as parameter, to
prepare it for use in asynchronous block-copy (at least, we'll need to
run block-copy in a coroutine, passing the whole parameters as one
pointer).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
86c6a3b690 qapi: backup: add perf.use-copy-range parameter
Experiments show, that copy_range is not always making things faster.
So, to make experimentation simpler, let's add a parameter. Some more
perf parameters will be added soon, so here is a new struct.

For now, add new backup qmp parameter with x- prefix for the following
reasons:

 - We are going to add more performance parameters, some will be
   related to the whole block-copy process, some only to background
   copying in backup (ignored for copy-before-write operations).
 - On the other hand, we are going to use block-copy interface in other
   block jobs, which will need performance options as well.. And it
   should be the same structure or at least somehow related.

So, there are too much unclean things about how the interface and now
we need the new options mostly for testing. Let's keep them
experimental for a while.

In do_backup_common() new x-perf parameter handled in a way to
make further options addition simpler.

We add use-copy-range with default=true, and we'll change the default
in further patch, after moving backup to use block-copy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com>
[mreitz: s/5\.2/6.0/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
205736f488 block: apply COR-filter to block-stream jobs
This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).

Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
0f6c94988a block/stream: add s->target_bs
Add a direct link to target bs for convenience and to simplify
following commit which will insert COR filter above target bs.

This is a part of original commit written by Andrey.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
7f4a396d76 qapi: block-stream: add "bottom" argument
The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

 - drop difference between above_base and base_overlay, which will be
   renamed to just bottom, when old interface dropped

 - clean way to work with parallel streams/commits on the same backing
   chain, which otherwise become a problem when we introduce a filter
   for stream job

 - cleaner interface. Nobody will surprised the fact that base node may
   disappear during block-stream, when there is no word about "base" in
   the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
000e5a1cda stream: rework backing-file changing
Stream in stream_prepare calls bdrv_change_backing_file() to change
backing-file in the metadata of bs.

It may use either backing-file parameter given by user or just take
filename of base on job start.

Backing file format is determined by base on job finish.

There are some problems with this design, we solve only two by this
patch:

1. Consider scenario with backing-file unset. Current concept of stream
supports changing of the base during the job (we don't freeze link to
the base). So, we should not save base filename at job start,

  - let's determine name of the base on job finish.

2. Using direct base to determine filename and format is not very good:
base node may be a filter, so its filename may be JSON, and format_name
is not good for storing into qcow2 metadata as backing file format.

  - let's use unfiltered_base

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change commit subject, change logic in stream_prepare]
Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
e275458b29 copy-on-read: skip non-guest reads if no copy needed
If the flag BDRV_REQ_PREFETCH was set, skip idling read/write
operations in COR-driver. It can be taken into account for the
COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.

block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
897dd0ec4f block: include supported_read_flags into BDS structure
Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
 [vsementsov: use assert instead of abort]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
e4c8fddde7 qapi: copy-on-read filter: add 'bottom' option
Add an option to limit copy-on-read operations to specified sub-chain
of backing-chain, to make copy-on-read filter useful for block-stream
job.

Suggested-by: Max Reitz <mreitz@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change subject, modified to freeze the chain,
   do some fixes]
Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
880747a887 qapi: add filter-node-name to block-stream
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com>
[mreitz: s/commit/stream/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
16e09a21af copy-on-read: add filter drop function
Provide API for the COR-filter removal. Also, drop the filter child
permissions for an inactive state when the filter node is being
removed.
To insert the filter, the block generic layer function
bdrv_insert_node() can be used.
The new function bdrv_cor_filter_drop() may be considered as an
intermediate solution before the QEMU permission update system has
overhauled. Then we are able to implement the API function
bdrv_remove_node() on the block generic layer.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
1252e03b8e copy-on-read: support preadv/pwritev_part functions
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Peter Maydell
45240eed4f Yank patches patches for 2021-01-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/+vJoSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTyv8P/3Dqb9sM8p2WIeoUt5KjgkgWto8anCXM
 /vzAvVdOrPcLXgHF1HOEkcYkp5ZDzFb1PP+LNWsbIB82HmGUGfA7CiOpCpXEoDJs
 Z9OYR3K8W5fvSKYTI/m+s7d+9aYqRZajI6ON5M4Eqem0ZwV93/SZMBHOcs1GmvIR
 diXIztaWVgHjU1Q37MlvJTM4lLN3RH1kTWEdfp3dkNMO6HxBet0B1g7xwaPxKrgb
 4y8D/kk9TA1m4wnwrr9s1l3UnfDiZ7mSfsEsXMcTMmQQrAtonD/xiX2YFsHwf6+U
 9cX9BCG2XP65t3ynY5goddcjVX3R6SuP4YWSgYUpJGrSx+GVxXrtF0ZumgiCk+4T
 uv8sOkJdPe9/aRy86UkNzJx7V50nCZnJ2if9neukVjHGW4Hw4txcYV5/7ZuuDqKl
 tF5NdF/LcHEZLKkBt+4g4TpJ7vxEmJc8/ukn9niLT381SkRBFnnP+Bnl9taSlHOY
 xNtQJ3Jrcd/cvBjAPCKgtE4Fx6wzQG3c7Yg4WHxbZzcYZBPp4fifUTLCK5XKHqhb
 rlqCQIO9DzGz5tqOgG7hWmlodSafGiDsHo9tJVpyF5pSSUv3A4KzX2xW4FZZLKJn
 7uBrcV0bLmR4tyw+fr+u2EW0ClYrs/JxeXAnsnTp9JrzkXILf5RjEuK0Sc1ZIuZW
 cmuPa8027ybj
 =fLO1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging

Yank patches patches for 2021-01-13

# gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-yank-2021-01-13:
  tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  migration: Add yank feature
  chardev/char-socket.c: Add yank feature
  block/nbd.c: Add yank feature
  Introduce yank feature

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-13 14:19:24 +00:00
Lukas Straub
fee091cdff block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13 10:21:17 +01:00
Roman Bolshakov
3eacf70bb5 meson: Propagate gnutls dependency
crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but
GNUTLS_CFLAGS, that describe include path, are not propagated
transitively to all users of crypto and build fails if GnuTLS headers
reside in non-standard directory (which is a case for homebrew on Apple
Silicon).

Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Peter Maydell
729cc68373 Remove superfluous timer_del() calls
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-08 15:13:38 +00:00
Paolo Bonzini
9db405a335 libiscsi: convert to meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini
8e4e2b551d curl: remove compatibility code, require 7.29.0
cURL 7.16.0 was released in October 2006.  Just remove code that is
in all likelihood not being used anywhere, and require the oldest version
found in currently supported distros, which is 7.29.0 from CentOS 7.

pkg-config is enough for QEMU, since it does not need extra information
such as the path for certicate authorities.  All supported platforms
today will all have pkg-config for curl, so we can drop curl-config.

Suggested-by: Daniel Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini
2f2a376a42 meson: use dependency to gate block modules
This allows converting the dependencies to meson options one by one.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:36 +01:00
Peter Maydell
1f7c02797f QAPI patches patches for 2020-12-19
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
 F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
 eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
 N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
 Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
 GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
 isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
 vEQDZ+PxubDg
 =vGLy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging

QAPI patches patches for 2020-12-19

# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
  qobject: Make QString immutable
  block: Use GString instead of QString to build filenames
  keyval: Use GString to accumulate value strings
  json: Use GString instead of QString to accumulate strings
  migration: Replace migration's JSON writer by the general one
  qobject: Factor JSON writer out of qobject_to_json()
  qobject: Factor quoted_str() out of to_json()
  qobject: Drop qstring_get_try_str()
  qobject: Drop qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  block: Avoid qobject_get_try_str()
  qmp: Fix tracing of non-string command IDs
  qobject: Move internals to qobject-internal.h
  hw/rdma: Replace QList by GQueue
  Revert "qstring: add qstring_free()"
  qobject: Change qobject_to_json()'s value to GString
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Make qobject_to_json_pretty() take a pretty argument
  monitor: Use GString instead of QString for output buffer
  hmp: Simplify how qmp_human_monitor_command() gets output
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 14:33:03 +00:00
Peter Maydell
26f6b15e26 Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
   end, allocates big chunks of data so that such post-EOF writes will
   occur less frequently)
 - write-zeroes and block-status support for Quorum
 - Implementation of truncate for the nvme block driver similarly to the
   existing implementations for host block devices and iscsi devices
 - Block layer refactoring: Drop the tighten_restrictions concept in the
   block permission functions
 - iotest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl/cwIoSHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AnvMH+gOnZCwEUKWuBxGX3Wjb/kqV1OuhAhcP
 IVrKLRnqdarCYMQ9M4SZL6pedfsujHA7vClTV7NTrenXBsEIradBQ59ztQ0oDirS
 4ipIjVtNqj7m86l+IRZDq5HlwOYwwFnWogmLo2bcmNJGLpPQQfrhL2vRJ1wLgFYk
 WjeAVlkkYcHnTIDvs4ne9WRSlxGVBWJ4X5nSlRdZqeyUcMY9v4wL4P9Wc4ZuORmq
 /5HRcT5JKGaT2bAueaqAGEdtPFGbazEP5uU7MTTK/fueDKIRAXO2d0gqhANtOOJQ
 7hMmKhwOPOOhrrpCVi9nxsVwdCOHfurV0km6cOs+Iprm/Wm2UtuS/A8=
 =z+7k
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into staging

Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
  end, allocates big chunks of data so that such post-EOF writes will
  occur less frequently)
- write-zeroes and block-status support for Quorum
- Implementation of truncate for the nvme block driver similarly to the
  existing implementations for host block devices and iscsi devices
- Block layer refactoring: Drop the tighten_restrictions concept in the
  block permission functions
- iotest fixes

# gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits)
  iotests: Fix _send_qemu_cmd with bash 5.1
  iotests/102: Pass $QEMU_HANDLE to _send_qemu_cmd
  block/nvme: Implement fake truncate() coroutine
  quorum: Implement bdrv_co_pwrite_zeroes()
  quorum: Implement bdrv_co_block_status()
  scripts/simplebench: add bench_prealloc.py
  simplebench/results_to_text: make executable
  simplebench/results_to_text: add difference line to the table
  simplebench/results_to_text: improve view of the table
  simplebench: move results_to_text() into separate file
  simplebench: rename ascii() to results_to_text()
  scripts/simplebench: use standard deviation for +- error
  scripts/simplebench: support iops
  scripts/simplebench: fix grammar: s/successed/succeeded/
  iotests: add 298 to test new preallocate filter driver
  iotests.py: execute_setup_common(): add required_fmts argument
  iotests: qemu_io_silent: support --image-opts
  qemu-io: add preallocate mode parameter for truncate command
  block: introduce preallocate filter
  block: bdrv_check_perm(): process children anyway
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-31 23:26:46 +00:00
Markus Armbruster
eab3a4678b qobject: Change qobject_to_json()'s value to GString
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString.  Just one of the callers actually needs a
QString: qemu_rbd_parse_filename().  A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds().  The remainder just need a string.

Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.

qemu_rbd_parse_filename() now has to convert to QString.  All others
save a QString temporary.  to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
2020-12-19 10:38:43 +01:00
Eric Blake
54aa3de72e qapi: Use QAPI_LIST_PREPEND() where possible
Anywhere we create a list of just one item or by prepending items
(typically because order doesn't matter), we can use
QAPI_LIST_PREPEND().  But places where we must keep the list in order
by appending remain open-coded until later patches.

Note that as a side effect, this also performs a cleanup of two minor
issues in qga/commands-posix.c: the old code was performing
 new = g_malloc0(sizeof(*ret));
which 1) is confusing because you have to verify whether 'new' and
'ret' are variables with the same type, and 2) would conflict with C++
compilation (not an actual problem for this file, but makes
copy-and-paste harder).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[Straightforward conflicts due to commit a8aa94b5f8 "qga: update
schema for guest-get-disks 'dependents' field" and commit a10b453a52
"target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
resolved.  Commit message tweaked.]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19 10:20:14 +01:00
Markus Armbruster
be7c5ddd0d block/vpc: Use sizeof() instead of HEADER_SIZE for footer size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-10-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:30 +01:00
Markus Armbruster
a3d2761719 block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t *
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-9-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:28 +01:00
Markus Armbruster
275734e479 block/vpc: Pad VHDFooter, replace uint8_t[] buffers
Pad VHDFooter as specified in the "Virtual Hard Disk Image Format
Specification" version 1.0[*].  Change footer buffers from
uint8_t[HEADER_SIZE] to VHDFooter.  Their size remains the same.

The VHDFooter * variables pointing to a VHDFooter variable right next
to it are now silly.  Eliminate them, and shorten the remaining
variables' names.

Most variables pointing to s->footer are now also silly.  Eliminate
them, too.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-8-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:26 +01:00
Markus Armbruster
3d6101a3f2 block/vpc: Use sizeof() instead of 1024 for dynamic header size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-7-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:23 +01:00
Markus Armbruster
e326f0783e block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffers
Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image
Format Specification" version 1.0[*].  Change dynamic disk header
buffers from uint8_t[1024] to VHDDynDiskHeader.  Their size remains
the same.

The VHDDynDiskHeader * variables pointing to a VHDDynDiskHeader
variable right next to it are now silly.  Eliminate them.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-6-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:18 +01:00
Markus Armbruster
7550379ded block/vpc: Make vpc_checksum() take void *
Some of the next commits will checksum structs.  Change vpc_checksum()
to take void * instead of uint8_t, to save us pointless casts to
uint8_t *.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-5-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:16 +01:00
Markus Armbruster
a18dc3a14d block/vpc: Don't abuse the footer buffer for dynamic header
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes).

Works, because the caller passes a buffer that is large enough for
both purposes.  I hate that.

Use a separate buffer for the dynamic header, and adjust the caller's
buffer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-4-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:14 +01:00
Markus Armbruster
b0ce8cb0e8 block/vpc: Don't abuse the footer buffer as BAT sector buffer
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes),
then reuses it again to initialize and write out BAT sectors (512).

Works, because the caller passes a buffer that is large enough for all
three purposes.  I hate that.

Use a separate buffer for writing out BAT sectors.  The next commit
will do the same for the dynamic header.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-3-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:06 +01:00
Markus Armbruster
02df95c4a1 block/vpc: Make vpc_open() read the full dynamic header
The dynamic header's size is 1024 bytes.

vpc_open() reads only the 512 bytes of the dynamic header into buf[].
Works, because it doesn't actually access the second half.  However, a
colleague told me that GCC 11 warns:

    ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds]

Clean up to read the full header.

Rename buf[] to dyndisk_header_buf[] while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-2-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:42:34 +01:00
Philippe Mathieu-Daudé
c8807c5edc block/nvme: Implement fake truncate() coroutine
NVMe drive cannot be shrunk.

Since commit c80d8b06cf we can use the @exact parameter (set
to false) to return success if the block device is larger than
the requested offset (even if we can not be shrunk).

Use this parameter to implement the NVMe truncate() coroutine,
similarly how it is done for the iscsi and file-posix drivers
(see commit 82325ae5f2 "Evaluate @exact in protocol drivers").

Reported-by: Xueqiang Wei <xuwei@redhat.com>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201210125202.858656-1-philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia
5cddb2e95f quorum: Implement bdrv_co_pwrite_zeroes()
This simply calls bdrv_co_pwrite_zeroes() in all children.

bs->supported_zero_flags is also set to the flags that are supported
by all children.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia
ef9bba1484 quorum: Implement bdrv_co_block_status()
The quorum driver does not implement bdrv_co_block_status() and
because of that it always reports to contain data even if all its
children are known to be empty.

One consequence of this is that if we for example create a quorum with
a size of 10GB and we mirror it to a new image the operation will
write 10GB of actual zeroes to the destination image wasting a lot of
time and disk space.

Since a quorum has an arbitrary number of children of potentially
different formats there is no way to report all possible allocation
status flags in a way that makes sense, so this implementation only
reports when a given region is known to contain zeroes
(BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).

If all children agree that a region contains zeroes then we can return
BDRV_BLOCK_ZERO using the smallest size reported by the children
(because all agree that a region of at least that size contains
zeroes).

If at least one child disagrees we have to return BDRV_BLOCK_DATA.
In this case we use the largest of the sizes reported by the children
that didn't return BDRV_BLOCK_ZERO (because we know that there won't
be an agreement for at least that size).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Tested-by: Tao Xu <tao3.xu@intel.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
33fa2222eb block: introduce preallocate filter
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
d1a764d126 block: introduce BDRV_REQ_NO_WAIT flag
Add flag to make serialising request no wait: if there are conflicting
requests, just return error immediately. It's will be used in upcoming
preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
8ac5aab255 block: bdrv_mark_request_serialising: split non-waiting function
We'll need a separate function, which will only "mark" request
serialising with specified align but not wait for conflicting
requests. So, it will be like old bdrv_mark_request_serialising(),
before merging bdrv_wait_serialising_requests_locked() into it.

To reduce the possible mess, let's do the following:

Public function that does both marking and waiting will be called
bdrv_make_request_serialising, and private function which will only
"mark" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
ec1c886831 block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
bs is linked in req, so no needs to pass it separately. Most of
tracked-requests API doesn't have bs argument. Actually, after this
patch only tracked_request_begin has it, but it's for purpose.

While being here, also add a comment about what "_locked" is.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
3183937ff9 block/io: split out bdrv_find_conflicting_request
To be reused in separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
2e36da62cf block/io.c: drop assertion on double waiting for request serialisation
The comments states, that on misaligned request we should have already
been waiting. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with larger alignment.

Note, that the only user of BDRV_REQ_SERIALISING is backup which issues
cluster-aligned requests, so seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Peter Lieven
182454dc63 block/nfs: fix int overflow in nfs_client_open_qdict
nfs_client_open returns the file size in sectors. This effectively
makes it impossible to open files larger than 1TB.

Fixes: c22a034545
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20201209121735.16437-1-pl@kamp.de>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 11:48:39 +01:00
Pan Nengyuan
cb8d0851f1 block/file-posix: fix a possible undefined behavior
local_err is not initialized to NULL, it will cause a assert error as below:
qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed.

Fixes: c644751069
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-12-13 23:56:16 +01:00
Kevin Wolf
960d5fb3e8 block: Fix deadlock in bdrv_co_yield_to_drain()
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.

As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)

Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.

This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.

Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
8b1170012b block: introduce BDRV_MAX_LENGTH
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.

It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.

So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.

And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.

Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.

Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.

So, alignment can't be greater than 2^30.

Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.

OK, then, let's apply it to block/io.

Let's consider all block/io entry points of offset/bytes:

4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().

We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.

And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:

 file-posix.c: adopted by this patch
 write-threshold.c: only read fields
 test-write-threshold.c: sets obviously small constant values

Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.

After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.

For all io requests except for discard we keep for now old restriction
of 32bit request length.

iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
f4dad307ef block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()
Move bdrv_is_inserted() calls into callers.

We are going to make bdrv_check_byte_request() a clean thing.
bdrv_is_inserted() is not about checking the request, it's about
checking the bs. So, it should be separate.

With this patch we probably change error path for some failure
scenarios. But depending on the fact that querying too big request on
empty cdrom (or corrupted qcow2 node with no drv) will result in EIO
and not ENOMEDIUM would be very strange. More over, we are going to
move to 64bit requests, so larger requests will be allowed anyway.

More over, keeping in mind that cdrom is the only driver that has
.bdrv_is_inserted() handler it's strange that we should care so much
about it in generic block layer, intuitively we should just do read and
write, and cdrom driver should return correct errors if it is not
inserted. But it's a work for another series.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
33985614bd block/io: bdrv_refresh_limits(): use ERRP_GUARD
This simplifies following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
9b100af30f block/file-posix: fix workaround in raw_do_pwrite_zeroes()
We should not set overlap_bytes:

1. Don't worry: it is calculated by bdrv_mark_request_serialising() and
   will be equal to or greater than bytes anyway.

2. If the request was already aligned up to some greater alignment,
   than we may break things: we reduce overlap_bytes, and further
   bdrv_mark_request_serialising() may not help, as it will not restore
   old bigger alignment.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Li Feng
eb43ea16dc file-posix: check the use_lock before setting the file lock
The scenario is that when accessing a volume on an NFS filesystem
without supporting the file lock,  Qemu will complain "Failed to lock
byte 100", even when setting the file.locking = off.

We should do file lock related operations only when the file.locking is
enabled, otherwise, the syscall of 'fcntl' will return non-zero.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
df4ea7091b fuse: Implement hole detection through lseek
This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to add a dedicated
check whether it is available before making use of it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4ca37a96a7 fuse: (Partially) implement fallocate()
This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4fba06d594 fuse: Allow growable exports
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
41429e3d79 fuse: Implement standard FUSE operations
This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Max Reitz
0c9b70d590 fuse: Allow exporting BDSs via FUSE
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
c208b0ef96 block/iscsi: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-5-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
3af613ebdb block/throttle-groups: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-4-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
f5056b70e6 block/curl: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-3-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
c37c973660 block/accounting: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-2-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Markus Armbruster
6cc0667d9b Tweak a few "Parameter 'NAME' expects THING" error message
Change to "expects a THING" where that's an obvious improvement

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
2020-12-10 17:16:44 +01:00
Stefan Hajnoczi
552c2c4c10 block/export: avoid g_return_val_if() input validation
Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201118091644.199527-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
Marc-André Lureau
0df750e9d3 libvhost-user: make it a meson subproject
By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Maxim Levitsky
c8bf9a9169 qcow2: Fix corruption on write_zeroes with MAY_UNMAP
Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201124092815.39056-1-kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-24 11:29:41 +01:00
Peter Maydell
683685e72d Pull request for 5.2
NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
 improvements. Elena Afanasova's ioeventfd fixes are also included.
 
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl+ixjgACgkQnKSrs4Gr
 c8iZYgf+OB2eAGsdZO97fKh6VUUoRKa+BgWKuh37Cfpp3q+dLuIFMSKfU/UgprLc
 aowt6uTFfwudDV9KltUB2EiXIzpuf7JhMNOiDRkyEvYSj4KHRPsQmFCd35Nrjezy
 VvxSGafe2Z60Qnvcx+iGeMATSFX9YTcTZeHttC07v7dWn/yEK3b1hobcmjCcwWeR
 Ud8pjMyh5E2z/NpW8E669/byJf9iahx3LSQxSWt+9PVTPuftAB0Suu+m6svz1wvk
 sjVfIbtVWCp2BdGf5U6a2rEqF3+kIcFkfHp+MwgE0EdMz1wfjudaPl13a0C4DSun
 PSt9E+Ct5BTrDUvqCHvQDOaFiMZTPg==
 =Poyb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging

Pull request for 5.2

NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
improvements. Elena Afanasova's ioeventfd fixes are also included.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha-gitlab/tags/block-pull-request: (33 commits)
  util/vfio-helpers: Assert offset is aligned to page size
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  block/nvme: Fix nvme_submit_command() on big-endian host
  block/nvme: Fix use of write-only doorbells page on Aarch64 arch
  block/nvme: Align iov's va and size on host page size
  block/nvme: Change size and alignment of prp_list_pages
  block/nvme: Change size and alignment of queue
  block/nvme: Change size and alignment of IDENTIFY response buffer
  block/nvme: Correct minimum device page size
  block/nvme: Set request_alignment at initialization
  block/nvme: Simplify nvme_cmd_sync()
  block/nvme: Simplify ADMIN queue access
  block/nvme: Correctly initialize Admin Queue Attributes
  block/nvme: Use definitions instead of magic values in add_io_queue()
  block/nvme: Introduce Completion Queue definitions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-23 13:03:13 +00:00
Peter Maydell
c8e5c4b246 Patches for 5.2.0-rc2:
- quorum: Fix crash with rewrite-corrupted and without read-write user
 - io_uring: do not use pointer after free
 - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
 - iotests: Fix failure on Python 3.9 due to use of a deprecated function
 - char-stdio: Fix QMP default for 'signal'
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl+zt1URHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZxyxAAu8GIOeAb7atQvc+KpeBTUG4A+tfAXkC+
 iUYdIpFeWWgmGf7myu3nlaAkeTDk6qHalmzkGRHi3yhX4eNIh5Sdff1YwPcZwf+q
 GLIqFFTW0z1Bd36N8G7Mkf04nKX4QTHqp6THHtSt9jNs56h5OP3axPXVA/3v9y8B
 4ZAkOOvwnwO+U94crhy5y5pX/Vwafv/Dz4DH9hEupE+EI9AuzjZLBrS+sgkxjhmu
 gvHpDSqm6NXwWQA5a24J6NzCy3n/Fw/rqmnoOrN8eRz+4DSCMVDnTDDEMFLa/UoK
 Ci7AqWfG/MnQ4GrGsOx80KJhAFLTmI60vfnUizKtEjL/HJyK5PDyM+VxHz+P/Tkq
 4hqQsHEsll4mAQiKCrrKOOXhn+YC4DhY/5O1EzEfhqfUjI+BFE9iC7LuqQevwKPL
 gytup7eoZjIHMtnKwY1B2ApAqHtodswjHkefcjEcvSlhqGi/BvwuWmeYlFXmA3r0
 YO8fvbYJrwHwJy7CzMb5Rgs2461QGERmXoCsBxLAiqXU9rhpOZ6gKXIjjlYojZ8M
 W0kqbaccTRPuhooFdEQ9RTPSkX7AX2bI0nOoPxfz3YD/siw35YwnUkJqvQbckvJd
 vpPkCL5jt3d9sfO0z1xjSH2ey9bevSReYpCsk+kIZl7V2XoDAW0Nbi0Td3pW4j6x
 dEkg/+sjF+o=
 =0pFF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Patches for 5.2.0-rc2:

- quorum: Fix crash with rewrite-corrupted and without read-write user
- io_uring: do not use pointer after free
- file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
- iotests: Fix failure on Python 3.9 due to use of a deprecated function
- char-stdio: Fix QMP default for 'signal'

# gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests/081: Test rewrite-corrupted without WRITE
  iotests/081: Filter image format after testdir
  quorum: Require WRITE perm with rewrite-corrupted
  io_uring: do not use pointer after free
  file-posix: allow -EBUSY errors during write zeros on raw block devices
  iotests: Replace deprecated ConfigParser.readfp()
  char-stdio: Fix QMP default for 'signal'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-17 15:58:51 +00:00
Peter Maydell
1c7ab0930a pc,vhost: fixes
Fixes all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+zlRgPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpmf8H/0BEjxnINJCN12Te+Mot8K9fjwc0zE0SUuYY
 25LogfJMCfVy0SZk0ZQV9z33GEL5XyMlXQjEpLmlX4d3mOBLcbutI6UVLhu8+Ixj
 89+jFphxIQPDOpA7BnPOD4AJ6TlhbewZ41QBR/J/qv946HayFW9QCAUywuj6H80m
 T3lw0FmPkd6/YupUdUm0pPgJjowckGis+cAa9UkTlqp8jpzFur28N02fE0L6QO3Z
 lR6zsk4yEvsVoeXSkEkmSqZGNcwoQCf4BhmDuD7lBLZ0LBvmd37CCoakStpdnQPH
 Swunmf7Q1H6LRtF7s8ZKXBB/ecVnss3kFTFj5KWx3fJH2SJuHG8=
 =v205
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,vhost: fixes

Fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 17 Nov 2020 09:17:12 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost-user-blk/scsi: Fix broken error handling for socket call
  contrib/libvhost-user: Fix bad printf format specifiers
  hw/i386/acpi-build: Fix maybe-uninitialized error when ACPI hotplug off
  configure: mark vhost-user Linux-only
  vhost-user-blk-server: depend on CONFIG_VHOST_USER
  meson: move vhost_user_blk_server to meson.build
  vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	meson.build
2020-11-17 11:50:11 +00:00
Max Reitz
9ca5b0e842 quorum: Require WRITE perm with rewrite-corrupted
Using rewrite-corrupted means quorum may issue writes to its children
just from receiving read requests from its parents.  Thus, it must take
the WRITE permission when rewrite-corrupted is used.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201113211718.261671-2-mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:38:28 +01:00
Paolo Bonzini
bd89f93603 io_uring: do not use pointer after free
Even though only the pointer value is only printed, it is untidy
and Coverity complains.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201113154102.1460459-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Maxim Levitsky
ece4fa9152 file-posix: allow -EBUSY errors during write zeros on raw block devices
On Linux, fallocate(fd, FALLOC_FL_PUNCH_HOLE) when it is used on a block device,
without O_DIRECT can return -EBUSY if it races with another write to the same page.

Since this is rare and discard is not a critical operation, ignore this error

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201111153913.41840-2-mlevitsk@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Chetan Pant
61f3c91a67 nomaintainer: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

This patch contains all the files, whose maintainer I could not get
from ‘get_maintainer.pl’ script.

Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Adapted exec.c and qdev-monitor.c to new location]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-11-15 17:04:40 +01:00
Stefan Hajnoczi
e5e856c1eb meson: move vhost_user_blk_server to meson.build
The --enable/disable-vhost-user-blk-server options were implemented in
./configure. There has been confusion about them and part of the problem
is that the shell syntax used for setting the default value is not easy
to read. Move the option over to meson where the conditions are easier
to understand:

  have_vhost_user_blk_server = (targetos == 'linux')

  if get_option('vhost_user_blk_server').enabled()
      if targetos != 'linux'
          error('vhost_user_blk_server requires linux')
      endif
  elif get_option('vhost_user_blk_server').disabled() or not have_system
      have_vhost_user_blk_server = false
  endif

This patch does not change behavior.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201110171121.1265142-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-11-12 09:19:40 -05:00
shiliyang
5f14f31d2b block: Fix some code style problems, "foo* bar" should be "foo *bar"
There have some code style problems be found when read the block driver code.
So I fixes some problems of this error, ERROR: "foo* bar" should be "foo *bar".

Signed-off-by: Liyang Shi <shiliyang@huawei.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
Message-Id: <3211f389-6d22-46c1-4a16-e6a2ba66f070@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 18:42:47 +01:00
Yonggang Luo
c63b0201ae block: Fixes nfs compiling error on msys2/mingw
These compiling errors are fixed:
../block/nfs.c:27:10: fatal error: poll.h: No such file or directory
   27 | #include <poll.h>
      |          ^~~~~~~~
compilation terminated.

../block/nfs.c:63:5: error: unknown type name 'blkcnt_t'
   63 |     blkcnt_t st_blocks;
      |     ^~~~~~~~
../block/nfs.c: In function 'nfs_client_open':
../block/nfs.c:550:27: error: 'struct _stat64' has no member named 'st_blocks'
  550 |     client->st_blocks = st.st_blocks;
      |                           ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:751:41: error: 'struct _stat64' has no member named 'st_blocks'
  751 |     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
      |                                         ^
../block/nfs.c: In function 'nfs_reopen_prepare':
../block/nfs.c:805:31: error: 'struct _stat64' has no member named 'st_blocks'
  805 |         client->st_blocks = st.st_blocks;
      |                               ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:752:1: error: control reaches end of non-void function [-Werror=return-type]
  752 | }
      | ^

On msys2/mingw, there is no st_blocks in struct _stat64 yet, we disable the usage of it
on msys2/mingw, and create a typedef long long blkcnt_t; for further implementation

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Message-Id: <20201105123116.674-2-luoyonggang@gmail.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Alberto Garcia
3441ad4bc4 qcow2: Document and enforce the QCowL2Meta invariants
The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting of a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region touched by that
part of the write request. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201007161323.4667-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
AlexChen
3d86af858e block: Remove unused include
The "qemu-common.h" include is not used, remove it.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-Id: <5F8FFB94.3030209@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Peter Maydell
85c3ed4417 pc,pci,vhost,virtio: fixes
Lots of fixes all over the place.
 virtio-mem and virtio-iommu patches are kind of fixes but
 it seems better to just make them behave sanely than
 try to educate users about the limitations ...
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+i9YMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpySQH/Ru/sxB9PncR1HsqSf0HC0tt/EMKgyZTXEwQ
 FITcjkCvBDS98a1VUvvZbjzTEDEZNnoUv94MjdLeBoptJ7GtK6nPoI6Ke0p1Zqbe
 mlY2BCb0FpN8FE+mthjAI03mhw6o8Qo/OPtyISQzUxCVVqUHL5TRAVAQdeidoK8n
 RBQ4WogwM/h7wI0d9GGgSxAON8IRQnBYImtzJieBb6zeScwKVFTWI1tqBdOyFN0/
 AhzQiNZuhZ7a1XGJIsxmWB1NK2kcXNJuOF0ANh4coIHR0JzmH3xRy+Jnf5e3dYsw
 LI23DUZPSTJJXAwKPucyTG7RTX8F55N9DVHC9KDRD6Ntq1oreJ4=
 =pcbN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,pci,vhost,virtio: fixes

Lots of fixes all over the place.
virtio-mem and virtio-iommu patches are kind of fixes but
it seems better to just make them behave sanely than
try to educate users about the limitations ...

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 18:40:03 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  contrib/vhost-user-blk: fix get_config() information leak
  block/export: fix vhost-user-blk get_config() information leak
  block/export: make vhost-user-blk config space little-endian
  configure: introduce --enable-vhost-user-blk-server
  libvhost-user: follow QEMU comment style
  vhost-blk: set features before setting inflight feature
  Revert "vhost-blk: set features before setting inflight feature"
  net: Add vhost-vdpa in show_netdevs()
  vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup
  vfio: Don't issue full 2^64 unmap
  virtio-iommu: Set supported page size mask
  vfio: Set IOMMU page size as per host supported page size
  memory: Add interface to set iommu page size mask
  virtio-iommu: Add notify_flag_changed() memory region callback
  virtio-iommu: Add replay() memory region callback
  virtio-iommu: Call memory notifiers in attach/detach
  virtio-iommu: Add memory notifiers for map/unmap
  virtio-iommu: Store memory region in endpoint struct
  virtio-iommu: Fix virtio_iommu_mr()
  hw/smbios: Fix leaked fd in save_opt_one() error path
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-05 15:16:43 +00:00
Stefan Hajnoczi
f8ffcb2bda block/export: fix vhost-user-blk get_config() information leak
Refuse get_config() requests in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
11f60f7eae block/export: make vhost-user-blk config space little-endian
VIRTIO 1.0 devices have little-endian configuration space. The
vhost-user-blk-server.c code already uses little-endian for virtqueue
processing but not for the configuration space fields. Fix this so the
vhost-user-blk export works on big-endian hosts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
bc15e44cb2 configure: introduce --enable-vhost-user-blk-server
Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé
a0546a7b6f block/nvme: Fix nvme_submit_command() on big-endian host
The Completion Queue Command Identifier is a 16-bit value,
so nvme_submit_command() is unlikely to work on big-endian
hosts, as the relevant bits are truncated.
Fix by using the correct byte-swap function.

Fixes: bdd6a90a9e ("block: Add VFIO based NVMe driver")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-25-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé
4b19e9b815 block/nvme: Fix use of write-only doorbells page on Aarch64 arch
qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states:

  'offset' must be a multiple of the page size as returned
   by sysconf(_SC_PAGE_SIZE).

In commit f68453237b we started to use an offset of 4K which
broke this contract on Aarch64 arch.

Fix by mapping at offset 0, and and accessing doorbells at offset=4K.

Fixes: f68453237b ("block/nvme: Map doorbells pages write-only")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-24-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00