Commit Graph

5441 Commits

Author SHA1 Message Date
David Edmondson
07ee2ab4fd block/vdi: Don't assume that blocks are larger than VdiHeader
Given that the block size is read from the header of the VDI file, a
wide variety of sizes might be seen. Rather than re-using a block
sized memory region when writing the VDI header, allocate an
appropriately sized buffer.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20210325112941.365238-3-pbonzini@redhat.com
Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-31 10:44:21 +01:00
David Edmondson
574b8304cf block/vdi: When writing new bmap entry fails, don't leak the buffer
If a new bitmap entry is allocated, requiring the entire block to be
written, avoiding leaking the buffer allocated for the block should
the write fail.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20210325112941.365238-2-pbonzini@redhat.com
Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com>
Acked-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-31 10:44:21 +01:00
Max Reitz
484108293d qcow2: Force preallocation with data-file-raw
Setting the qcow2 data-file-raw bit means that you can ignore the
qcow2 metadata when reading from the external data file.  It does not
mean that you have to ignore it, though.  Therefore, the data read must
be the same regardless of whether you interpret the metadata or whether
you ignore it, and thus the L1/L2 tables must all be present and give a
1:1 mapping.

This patch changes 244's output: First, the qcow2 file is larger right
after creation, because of metadata preallocation.  Second, the qemu-img
map output changes: Everything that was not explicitly discarded or
zeroed is now a data area.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210326145509.163455-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2021-03-30 13:02:10 +02:00
Max Reitz
53431b9086 block/mirror: Fix mirror_top's permissions
mirror_top currently shares all permissions, and takes only the WRITE
permission (if some parent has taken that permission, too).

That is wrong, though; mirror_top is a filter, so it should take
permissions like any other filter does.  For example, if the parent
needs CONSISTENT_READ, we need to take that, too, and if it cannot share
the WRITE permission, we cannot share it either.

The exception is when mirror_top is used for active commit, where we
cannot take CONSISTENT_READ (because it is deliberately unshared above
the base node) and where we must share WRITE (so that it is shared for
all images in the backing chain, so the mirror job can take it for the
target BB).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210211172242.146671-2-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-03-29 18:09:00 +02:00
Pavel Dovgalyuk
ad0ce64279 qcow2: use external virtual timers
Regular virtual timers are used to emulate timings
related to vCPU and peripheral states. QCOW2 uses timers
to clean the cache. These timers should have external
flag. In the opposite case they affect the execution
and it can't be recorded and replayed.
This patch adds external flag to the timer for qcow2
cache clean.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <161700516327.1141158.8366564693714562536.stgit@pasha-ThinkPad-X280>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-03-29 18:04:29 +02:00
Markus Armbruster
bdabafc683 block: Remove monitor command block_passwd
Command block_passwd always fails since

Commit c01c214b69 "block: remove all encryption handling APIs"
(v2.10.0) turned block_passwd into a stub that always fails, and
hardcoded encryption_key_missing to false in query-named-block-nodes
and query-block.

Commit ad1324e044 "block: remove 'encryption_key_missing' flag from
QAPI" just landed.  Complete the cleanup job: remove block_passwd.

Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210323101951.3686029-1-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-23 22:31:56 +01:00
Stefan Hajnoczi
6f4b1996b4 block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now
The vhost-user in-flight shmfd feature has not been tested with
qemu-storage-daemon's vhost-user-blk server. Disable this optional
feature for now because it requires MFD_ALLOW_SEALING, which is not
available in some CI environments.

If we need this feature in the future it can be re-enabled after
testing.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210309094106.196911-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Max Reitz
0f418a2076 curl: Disconnect sockets from CURLState
When a curl transfer is finished, that does not mean that CURL lets go
of all the sockets it used for it.  We therefore must not free a
CURLSocket object before CURL has invoked curl_sock_cb() to tell us to
remove it.  Otherwise, we may get a use-after-free, as described in this
bug report: https://bugs.launchpad.net/qemu/+bug/1916501

(Reproducer from that report:
  $ qemu-img convert -f qcow2 -O raw \
  https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img \
  out.img
)

(Alternatively, it might seem logical to force-drop all sockets that
have been used for a state when the respective transfer is done, kind of
like it is done now, but including unsetting the AIO handlers.
Unfortunately, doing so makes the driver just hang instead of crashing,
which seems to evidence that CURL still uses those sockets.)

Make the CURLSocket object independent of "its" CURLState by putting all
sockets into a hash table belonging to the BDRVCURLState instead of a
list that belongs to a CURLState.  Do not touch any sockets in
curl_clean_state().

Testing, it seems like all sockets are indeed gone by the time the curl
BDS is closed, so it seems like there really was no point in freeing any
socket just because a transfer is done.  libcurl does invoke
curl_sock_cb() with CURL_POLL_REMOVE for every socket it has.

Buglink: https://bugs.launchpad.net/qemu/+bug/1916501
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210309130541.37540-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Max Reitz
3663dca461 curl: Store BDRVCURLState pointer in CURLSocket
A socket does not really belong to any specific state.  We do not need
to store a pointer to "its" state in it, a pointer to the common
BDRVCURLState is sufficient.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210309130541.37540-2-mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Kevin Wolf
1bf26076d6 stream: Don't crash when node permission is denied
The image streaming block job restricts shared permissions of the nodes
it accesses. This can obviously fail when other users already got these
permissions. &error_abort is therefore wrong and can crash. Handle these
errors gracefully and just fail starting the block job.

Reported-by: Nini Gu <ngu@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210309173451.45152-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-19 10:15:06 +01:00
Daniel P. Berrangé
8d17adf34f block: remove support for using "file" driver with block/char devices
The 'host_device' and 'host_cdrom' drivers must be used instead.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
e67d8e2928 block: remove 'dirty-bitmaps' field from 'BlockInfo' struct
The same data is available in the 'BlockDeviceInfo' struct.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
81cbfd5088 block: remove dirty bitmaps 'status' field
The same information is available via the 'recording' and 'busy' fields.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Daniel P. Berrangé
ad1324e044 block: remove 'encryption_key_missing' flag from QAPI
This has been hardcoded to "false" since 2.10.0, since secrets required
to unlock block devices are now always provided up front instead of using
interactive prompts.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-18 09:22:55 +00:00
Peter Maydell
6f34661b6c Pull request
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmBJQHkSHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748EdsP/2U2CGTM95tjDunTs9uZV/7zM6PWt85M
 vAPItNVU2jYPfzmaJN8twrzlj0PEDhvB9Q+OJjE4HEGxEbPcdblLg/R6Zs/EaWuY
 N6oKHPXnOnHb+e80UUJdiAq+Y5RUnJbb5L3ArycnVzBgws+Oj3DtqjB2VDccY4C/
 Gkt23tZ7ikU4958e5VBqW2NUUrr+BQO0mqsW+sbbeE3WPj75NQc6srvS3TWvsg7W
 OYEyVYwm52/q2W/1a3Knfv/YO6UU9NGMpGyDLD2kwQwKbgUWYLW2BiWVwOAUldo9
 De3nfKbKnFezLCZAZro20lfCa/aKwNGCOXWzlrKxqUQCmGYUx7gM1+3ahrSd5N0v
 zUgLdZm7O428ZHL6GujWGLA1UwwzpM9X3P3yo4c0S1J6fHypbI6a9jtewrUFvFgP
 TuQ7dp6cn2DTBYUcsrWilPHbTZMADYQNRD/xUtKqalYBEWy3FX5W75+OYBJKKh+X
 Qip68m6JBzgkszXhCcu6xlLb8ynZJr2VsHvtvIgf4NnLqNOIEgVLcMtoMZT8DPrp
 rIoRc5oUFz8zj5lHnJuLADBUvlCMqoCCoU3h2aqHwH8a7RGb180f+82BW9aBcb2u
 Jk+WgAhBUjWBBC97ReFgrINUD/qZRXVoOq8LthTuQSSyr/i1zq+oLM1F0EDXcMDm
 ssATku2IxL24
 =moUF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.0-pull-request' into staging

Pull request

# gpg: Signature made Wed 10 Mar 2021 21:56:09 GMT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-6.0-pull-request: (22 commits)
  sysemu: Let VMChangeStateHandler take boolean 'running' argument
  sysemu/runstate: Let runstate_is_running() return bool
  hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES
  hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES
  hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards
  qemu-common.h: Update copyright string to 2021
  tests/fp/fp-test: Replace the word 'blacklist'
  qemu-options: Replace the word 'blacklist'
  seccomp: Replace the word 'blacklist'
  scripts/tracetool: Replace the word 'whitelist'
  ui: Replace the word 'whitelist'
  virtio-gpu: Adjust code space style
  exec/memory: Use struct Object typedef
  fuzz-test: remove unneccessary debugging flags
  net: Use id_generate() in the network subsystem, too
  MAINTAINERS: Fix the location of tools manuals
  vhost_user_gpu: Drop dead check for g_malloc() failure
  backends/dbus-vmstate: Fix short read error handling
  target/hexagon/gen_tcg_funcs: Fix a typo
  hw/elf_ops: Fix a typo
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-11 18:55:27 +00:00
Peter Maydell
9abda42bf2 nbd patches for 2021-03-09
- Add Vladimir as NBD co-maintainer
 - Fix reporting of holes in NBD_CMD_BLOCK_STATUS
 - Improve command-line parsing accuracy of large numbers (anything going
 through qemu_strtosz), including the deprecation of hex+suffix
 - Improve some error reporting in the block layer
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmBHlmIACgkQp6FrSiUn
 Q2q2cQgAqJWNb4J/ShjvzocDDPzJ0iBitFbg0huFPfbt4DScubEZo5wBJG7vOhOW
 hIHrWCRzGvRgsn0tcSfrgFaegmHKrLgjkibM7ou8ni9NC1kUBd3R/3FBNIMxhYf7
 Q8Kfspl0LRfMJDKF9jdCnQ4Gxcd6h2OIYZqiWVg8V4Tc8WdCpIVOah7e7wjuW8bT
 vgZvfboUWm5AmIF9j/MxuMn+HFZ4ArSuFVL80ZaXlD00vRra7u3HZ8pUfcOlOujg
 7HeouM1E5j3NNE6aZSN++x/EQ3sg0zmirbWUCcgAyRfdRkAmB15uh2PUzPxEIJKH
 UHUIW5LvNtz2+yzOAz2yK29OE523Yg==
 =blE1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-03-09' into staging

nbd patches for 2021-03-09

- Add Vladimir as NBD co-maintainer
- Fix reporting of holes in NBD_CMD_BLOCK_STATUS
- Improve command-line parsing accuracy of large numbers (anything going
through qemu_strtosz), including the deprecation of hex+suffix
- Improve some error reporting in the block layer

# gpg: Signature made Tue 09 Mar 2021 15:38:10 GMT
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2021-03-09:
  block/qcow2: refactor qcow2_update_options_prepare error paths
  block/qed: bdrv_qed_do_open: deal with errp
  block/qcow2: simplify qcow2_co_invalidate_cache()
  block/qcow2: read_cache_sizes: return status value
  block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
  block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
  block/qcow2: qcow2_get_specific_info(): drop error propagation
  blockjob: return status from block_job_set_speed()
  block/mirror: drop extra error propagation in commit_active_start()
  block: drop extra error propagation for bdrv_set_backing_hd
  blockdev: fix drive_backup_prepare() missed error
  block: check return value of bdrv_open_child and drop error propagation
  utils: Deprecate hex-with-suffix sizes
  utils: Improve qemu_strtosz() to have 64 bits of precision
  utils: Enhance testsuite for do_strtosz()
  nbd: server: Report holes for raw images
  MAINTAINERS: add Vladimir as co-maintainer of NBD

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-11 13:57:08 +00:00
Philippe Mathieu-Daudé
538f049704 sysemu: Let VMChangeStateHandler take boolean 'running' argument
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09 23:13:57 +01:00
Peter Maydell
a557b00469 Block layer patches:
- qemu-storage-daemon: add --pidfile option
 - qemu-storage-daemon: CLI error messages include the option name now
 - vhost-user-blk export: Misc fixes
 - docs: Improvements for qemu-storage-daemon documentation
 - parallels: load bitmap extension
 - backup-top: Don't crash on post-finalize accesses
 - Improve error messages related to node-name options
 - iotests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmBGWHURHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZpyxAAk0gRiayMUidSzgvzU/CeUhzBsC4ayEkn
 dLtTZ8hl7cW/w3GjDK1Wri4MANRN/0YHjiLSzO38lfVpK0z8SJr5aU4CwhRlOKVm
 VWgx+OLlV4Azht9fMNF4SwUXgXhl7pUNiFMNnomb++gvqhjMCedDZcWlnVKhbuQ+
 O3TKGO4tToSGaXP85jCM4xukw5HZ//4QMYg6MH0gDk8ahfE2MhyTHz64oDp412os
 qhxvc4bU2S5xGLaBfLGhsc6VPQFKjblG704P/Y73zeoxq12A0L2Ru98WvrNaXw7Z
 m54jJUINiDkJ7ZOl6W04zdeiLvs3BOrNe+7mxawOTmdkBsLOKErrhrTO1gJmHHmX
 kJLWEh9VYWxVbvE7C3KQt9bclR6wt+aPup4X1XE8pHtocPVONVq5bvctrVgxgK0b
 btN06NcK+2jQxcQkG4MnBJ8S41qmxHyIEQlQWKyUWXvKt6zsFU/NuWKMQrAfYZZi
 5J+RPU/fB073LY4lpAgou0OP1/RIvQmi5zWzjWm/Qbp3JpgC+azcYvxn7UU7J71P
 +u8IEQ4+Q9s0gvXQAh/U8AQg2eOqAwEAyFUJl9wpPN56O03dbI8KyCV5ECIRJu49
 CC8uKlJxZkbw9ZBs11SAmm/0J64WcNb2AMWxDPC8Z6oQbVaRRRznoRwRP2H6odUu
 uBolS43+5cI=
 =eAjH
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-storage-daemon: add --pidfile option
- qemu-storage-daemon: CLI error messages include the option name now
- vhost-user-blk export: Misc fixes
- docs: Improvements for qemu-storage-daemon documentation
- parallels: load bitmap extension
- backup-top: Don't crash on post-finalize accesses
- Improve error messages related to node-name options
- iotests improvements

# gpg: Signature made Mon 08 Mar 2021 17:01:41 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (30 commits)
  blockdev: Clarify error messages pertaining to 'node-name'
  block: Clarify error messages pertaining to 'node-name'
  docs: qsd: Explain --export nbd,name=... default
  MAINTAINERS: update parallels block driver
  iotests: add parallels-read-bitmap test
  iotests.py: add unarchive_sample_image() helper
  parallels: support bitmap extension for read-only mode
  block/parallels: BDRVParallelsState: add cluster_size field
  parallels.txt: fix bitmap L1 table description
  qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public
  block/export: port virtio-blk read/write range check
  block/export: port virtio-blk discard/write zeroes input validation
  block/export: fix vhost-user-blk export sector number calculation
  block/export: use VIRTIO_BLK_SECTOR_BITS
  block/export: fix blk_size double byteswap
  libqtest: add qtest_remove_abrt_handler()
  libqtest: add qtest_kill_qemu()
  libqtest: add qtest_socket_server()
  vhost-user-blk: fix blkcfg->num_queues endianness
  docs: replace insecure /tmp examples in qsd docs
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-09 21:31:18 +00:00
Vladimir Sementsov-Ogievskiy
1184b41101 block/qcow2: refactor qcow2_update_options_prepare error paths
Keep setting ret close to setting errp and don't merge different error
paths into one. This way it's more obvious that we don't return
error without setting errp.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-15-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:04:46 -06:00
Vladimir Sementsov-Ogievskiy
15ce94a68c block/qed: bdrv_qed_do_open: deal with errp
Always set errp on failure. The generic bdrv_open_driver supports
driver functions which can return a negative value but forget to set
errp.  That's a strange thing. Let's improve bdrv_qed_do_open to not
behave this way. This allows the simplification of code in
bdrv_qed_co_invalidate_cache().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-14-vsementsov@virtuozzo.com>
[eblake: commit message grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:32 -06:00
Vladimir Sementsov-Ogievskiy
e6247c9c9f block/qcow2: simplify qcow2_co_invalidate_cache()
qcow2_do_open correctly sets errp on each failure path. So, we can
simplify code in qcow2_co_invalidate_cache() and drop explicit error
propagation.

Add ERRP_GUARD() as mandated by the documentation in
include/qapi/error.h so that error_prepend() is actually called even if
errp is &error_fatal.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-13-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:27 -06:00
Vladimir Sementsov-Ogievskiy
772c4cad13 block/qcow2: read_cache_sizes: return status value
It's better to return status together with setting errp. It allows to
reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-12-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:23 -06:00
Vladimir Sementsov-Ogievskiy
526e31de99 block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
It's better to return status together with setting errp. It makes
possible to avoid error propagation.

While being here, put ERRP_GUARD() to fix error_prepend(errp, ...)
usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment
above ERRP_GUARD() definition in include/qapi/error.h)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-11-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:03:21 -06:00
Vladimir Sementsov-Ogievskiy
0c1e9d2a9a block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
It's recommended for bool functions with errp to return true on success
and false on failure. Non-standard interfaces don't help to understand
the code. The change is also needed to reduce error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210202124956.63146-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 16:01:47 -06:00
Vladimir Sementsov-Ogievskiy
83bad8cbf5 block/qcow2: qcow2_get_specific_info(): drop error propagation
Don't use error propagation in qcow2_get_specific_info(). For this
refactor qcow2_get_bitmap_info_list, its current interface is rather
weird.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210202124956.63146-9-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
[eblake: separate local 'tail' variable from 'info_list' parameter]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:16:11 -06:00
Vladimir Sementsov-Ogievskiy
eb5becc18f block/mirror: drop extra error propagation in commit_active_start()
Let's check return value of mirror_start_job to check for failure
instead of local_err.

Rename ret to job, as ret is usually integer variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:14:14 -06:00
Vladimir Sementsov-Ogievskiy
bc52024959 block: check return value of bdrv_open_child and drop error propagation
This patch is generated by cocci script:

@@
symbol bdrv_open_child, errp, local_err;
expression file;
@@

  file = bdrv_open_child(...,
-                        &local_err
+                        errp
                        );
- if (local_err)
+ if (!file)
  {
      ...
-     error_propagate(errp, local_err);
      ...
  }

with command

spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \
--in-place --no-show-diff --max-width 80 --use-gitgrep block

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com>
[eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only
caller to pass allow_none=true]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-03-08 15:07:09 -06:00
Vladimir Sementsov-Ogievskiy
baefd97700 parallels: support bitmap extension for read-only mode
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210224104707.88430-5-vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:55 +01:00
Vladimir Sementsov-Ogievskiy
e0b5207f54 block/parallels: BDRVParallelsState: add cluster_size field
We are going to use it in more places, calculating
"s->tracks << BDRV_SECTOR_BITS" doesn't look good.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210224104707.88430-4-vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Vladimir Sementsov-Ogievskiy
35f428ba39 qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public
Rename bytes_covered_by_bitmap_cluster() to
bdrv_dirty_bitmap_serialization_coverage() and make it public.
It is needed as we are going to share it with bitmap loading in
parallels format.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20210224104707.88430-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
05ae4e674e block/export: port virtio-blk read/write range check
Check that the sector number and byte count are valid.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-13-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
db4eadf9f1 block/export: port virtio-blk discard/write zeroes input validation
Validate discard/write zeroes the same way we do for virtio-blk. Some of
these checks are mandated by the VIRTIO specification, others are
internal to QEMU.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-11-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
e44362ce31 block/export: fix vhost-user-blk export sector number calculation
The driver is supposed to honor the blk_size field but the protocol
still uses 512-byte sector numbers. It is incorrect to multiply
req->sector_num by blk_size.

VIRTIO 1.1 5.2.5 Device Initialization says:

  blk_size can be read to determine the optimal sector size for the
  driver to use. This does not affect the units used in the protocol
  (always 512 bytes), but awareness of the correct value can affect
  performance.

Fixes: 3578389bcf ("block/export: vhost-user block device backend server")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
524bac0744 block/export: use VIRTIO_BLK_SECTOR_BITS
Use VIRTIO_BLK_SECTOR_BITS and VIRTIO_BLK_SECTOR_SIZE when dealing with
virtio-blk sector numbers. Although the values happen to be the same as
BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE, they are conceptually different.
This makes it clearer when we are dealing with virtio-blk sector units.

Use VIRTIO_BLK_SECTOR_BITS in vu_blk_initialize_config(). Later patches
will use it the new constants the virtqueue request processing code
path.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-9-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Stefan Hajnoczi
a4f1542af5 block/export: fix blk_size double byteswap
The config->blk_size field is little-endian. Use the native-endian
blk_size variable to avoid double byteswapping.

Fixes: 11f60f7eae ("block/export: make vhost-user-blk config space little-endian")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210223144653.811468-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:56:54 +01:00
Max Reitz
705dde27c6 backup-top: Refuse I/O in inactive state
When the backup-top node transitions from active to inactive in
bdrv_backup_top_drop(), the BlockCopyState is freed and the filtered
child is removed, so the node effectively becomes unusable.

However, noone told its I/O functions this, so they will happily
continue accessing bs->backing and s->bcs.  Prevent that by aborting
early when s->active is false.

(After the preceding patch, the node should be gone after
bdrv_backup_top_drop(), so this should largely be a theoretical problem.
But still, better to be safe than sorry, and also I think it just makes
sense to check s->active in the I/O functions.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210219153348.41861-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:55:18 +01:00
Max Reitz
bdc4c4c5e3 backup: Remove nodes from job in .clean()
The block job holds a reference to the backup-top node (because it is
passed as the main job BDS to block_job_create()).  Therefore,
bdrv_backup_top_drop() cannot delete the backup-top node (replacing it
by its child does not affect the job parent, because that has
.stay_at_node set).  That is a problem, because all of its I/O functions
assume the BlockCopyState (s->bcs) to be valid and that it has a
filtered child; but after bdrv_backup_top_drop(), neither of those
things are true.

It does not make sense to add new parents to backup-top after
backup_clean(), so we should detach it from the job before
bdrv_backup_top_drop().  Because there is no function to do that for a
single node, just detach all of the job's nodes -- the job does not do
anything past backup_clean() anyway.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210219153348.41861-2-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08 14:55:18 +01:00
Paolo Bonzini
f7544edcd3 qemu-config: add error propagation to qemu_config_parse
This enables some simplification of vl.c via error_fatal, and improves
error messages.  Before:

  $ ./qemu-system-x86_64 -readconfig .
  qemu-system-x86_64: error reading file
  qemu-system-x86_64: -readconfig .: read config .: Invalid argument
  $ /usr/libexec/qemu-kvm -readconfig foo
  qemu-kvm: -readconfig foo: read config foo: No such file or directory

After:

  $ ./qemu-system-x86_64 -readconfig .
  qemu-system-x86_64: -readconfig .: Cannot read config file: Is a directory
  $ ./qemu-system-x86_64 -readconfig foo
  qemu-system-x86_64: -readconfig foo: Could not open 'foo': No such file or directory

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210226170816.231173-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 11:41:54 +01:00
Maxim Levitsky
6094cbeb72 block: qcow2: remove the created file on initialization error
If the qcow initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201217170904.946013-4-mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Maxim Levitsky
a890f08e58 block: add bdrv_co_delete_file_noerr
This function wraps bdrv_co_delete_file for the common case of removing a file,
which was just created by format driver, on an error condition.

It hides the -ENOTSUPP error, and reports all other errors otherwise.

Use it in luks driver

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Maxim Levitsky
dcb6699512 crypto: luks: Fix tiny memory leak
When the underlying block device doesn't support the
bdrv_co_delete_file interface, an 'Error' object was leaked.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201217170904.946013-2-mlevitsk@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-15 15:10:14 +01:00
Peter Maydell
392b9a74b9 bitmaps patches for 2021-02-12
- add 'transform' member to manipulate bitmaps across migration
 - work towards better error handling during bdrv_open
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAnDQsACgkQp6FrSiUn
 Q2qc5Qf/SKVdpX4j7OnHF6sBuf/8LVWz4KazSqEU0ohazBJmafgJpH2EA5pXMXR4
 frZDWeanGmhj1MjMkta/++uvEBU/TMpW2z98mZvjErteXdnRQAlII/hOCI+QZJvg
 viQ5t1EyrkyXzUePOjs+AwqA5KHWbCKt6QqyItQ78HvI23sw/fuvHj0G67KbVzXZ
 VcSrVr0J7PXnZV/hWfg+C+Nn9Ro9tsVdn79awLYVQ7/SDro3hzylpcHMQaHMK2oe
 mX4D2kNq7s21E27Zb6vlknUhQPkMdETk0gfEbpn7sTVMEc58GRLC7Tqfx7l0JIFK
 5izVyA5vndKVxDGYPkbDK6VL2uDg4A==
 =+Epy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' into staging

bitmaps patches for 2021-02-12

- add 'transform' member to manipulate bitmaps across migration
- work towards better error handling during bdrv_open

# gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-bitmaps-2021-02-12:
  block: use return status of bdrv_append()
  block: return status from bdrv_append and friends
  qemu-iotests: 300: Add test case for modifying persistence of bitmap
  migration: dirty-bitmap: Allow control of bitmap persistence
  migration: dirty-bitmap: Use struct for alias map inner members

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-13 21:26:00 +00:00
Vladimir Sementsov-Ogievskiy
934aee14d3 block: use return status of bdrv_append()
Now bdrv_append returns status and we can drop all the local_err things
around it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 15:39:44 -06:00
Vladimir Sementsov-Ogievskiy
ff789bf5a9 block/backup: implement .cancel job handler
Cancel in-flight io on target to not waste the time.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 12:17:08 -06:00
Vladimir Sementsov-Ogievskiy
521ff8b779 block/mirror: implement .cancel job handler
Cancel in-flight io on target to not waste the time.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 11:29:40 -06:00
Vladimir Sementsov-Ogievskiy
3fc1ec3725 block/raw-format: implement .bdrv_cancel_in_flight handler
We are going to cancel in-flight requests on mirror nbd target on job
cancel. Still nbd is often used not directly but as raw-format child.
So, add pass-through handler here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy
c4f7f24e1f block/nbd: implement .bdrv_cancel_in_flight
Just stop waiting for connection in existing requests.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy
bd54669a4a block: add new BlockDriver handler: bdrv_cancel_in_flight
It will be used to stop retrying NBD requests on mirror cancel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12 09:45:18 -06:00
Daniel P. Berrangé
3d3e9b1f66 block: rename and alter bdrv_all_find_snapshot semantics
Currently bdrv_all_find_snapshot() will return 0 if it finds
a snapshot, -1 if an error occurs, or if it fails to find a
snapshot. New callers to be added want to distinguish between
the error scenario and failing to find a snapshot.

Rename it to bdrv_all_has_snapshot and make it return -1 on
error, 0 if no snapshot is found and 1 if snapshot is found.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
c22d644ca7 block: allow specifying name of block device for vmstate storage
Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-6-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
cf3a74c94f block: add ability to specify list of blockdevs during snapshot
When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.

Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210204124834.774401-5-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Daniel P. Berrangé
e26f98e209 block: push error reporting into bdrv_all_*_snapshot functions
The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
[PMD: Initialize variables]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210204124834.774401-2-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08 11:19:51 +00:00
Roman Kagan
ddde5ee769 block/nbd: only enter connection coroutine if it's present
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.

However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine.  As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:

  #0  qio_channel_detach_aio_context (ioc=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
  #1  0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
  #2  bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
      new_context=new_context@entry=0x562a260c9580,
      ignore=ignore@entry=0x7feeadc9b780)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
  #3  0x0000562a24282969 in bdrv_child_try_set_aio_context
      (bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
  #4  0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
      new_context=0x562a260c9580,
      update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
  #5  0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
  #6  0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
      out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #7  0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #8  0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
      running=0, state=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
  #9  0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
  #10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
  #11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
  #12 migration_iteration_run (s=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
  #13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
      at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
  #14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
  #15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
      at pthread_create.c:333
  #16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
      oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
      <__func__.28102>, newlen=0)
      at ../sysdeps/unix/sysv/linux/sysctl.c:30
  #17 0x0000000000000000 in ?? ()

Fix it by checking that the connection coroutine is non-null before
trying to enter it.  If it is null, no entering is needed, as the
connection is probably going down anyway.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Roman Kagan
3b5e4db673 block/nbd: only detach existing iochannel from aio_context
When the reconnect in NBD client is in progress, the iochannel used for
NBD connection doesn't exist.  Therefore an attempt to detach it from
the aio_context of the parent BlockDriverState results in a NULL pointer
dereference.

The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop.  If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:

  #0  qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
  #1  0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
      opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
  #2  0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
  #3  0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
  #4  aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
  #5  0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
      blocking=blocking@entry=true)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
  #6  0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
      cb=<optimized out>, opaque=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
  #7  0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
      bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
  #8  bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
      new_context=new_context@entry=0x5618056c7580,
      ignore=ignore@entry=0x7f60e1e63780)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
  #9  0x000056180345c969 in bdrv_child_try_set_aio_context (
      bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
      ignore_child=<optimized out>, errp=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
  #10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
      new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
      errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
  #11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
      new_context=<optimized out>, errp=errp@entry=0x0)
      at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
  #12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
  #13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
  #14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
      running=0, state=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
  #15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
      state=state@entry=RUN_STATE_FINISH_MIGRATE)
      at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
  #16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
      send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
  #17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
  #18 migration_iteration_run (s=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
  #19 migration_thread (opaque=opaque@entry=0x5618056e6960)
      at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
  #20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
      at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
  #21 0x00007f61085d06ba in start_thread ()
     from /lib/x86_64-linux-gnu/libpthread.so.0
  #22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
  #23 0x0000000000000000 in ?? ()

Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context.  If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.

Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
a5215b8fdf block/io: use int64_t bytes in copy_range
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert now copy_range parameters which are already 64bit to signed
type.

It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH
(which is less than INT64_MAX), and do check the requests in
bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls
bdrv_check_request()).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
e9e52efdc5 block/io: support int64_t bytes in read/write wrappers
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been
updated, update all their wrappers.

For all of them type of 'bytes' is widening, so callers are safe. We
have update request_fn in blkverify.c simultaneously. Still it's just a
pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is
widening for callers of the request_fn anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy
37e9403ea8 block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.

bdrv_pad_request() is updated simultaneously, as pointer to bytes passed
to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part().

So, all callers of bdrv_pad_request() are updated to pass 64bit bytes.
bdrv_pad_request() is already good for 64bit requests, add
corresponding assertion.

Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part().
Type is widening, so callers are safe. Let's look inside the functions.

In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes
to other already int64_t interfaces (and some obviously safe
calculations), it's OK.

In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still
it's passed to bdrv_aligned_pwritev which supports int64_t bytes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
8b0c5d7659 block/io: support int64_t bytes in bdrv_aligned_preadv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_preadv() now.

Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is
only used for pass-through to bdrv_aligned_preadv().

All bdrv_aligned_preadv() callers are safe as type is widening. Let's
look inside:

 - add a new-style assertion that request is good.
 - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports
   int64_t bytes
 - conversion of bytes_remaining is OK, as we never have requests
   overflowing BDRV_MAX_LENGTH
 - looping through bytes_remaining is ok, num is updated to int64_t
   - for bdrv_driver_preadv we have same limit of max_transfer
   - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow
     qiov->size anyway (thanks to bdrv_check_qiov_request())

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
9df5afbdd1 block/io: support int64_t bytes in bdrv_co_do_copy_on_readv()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_copy_on_readv() now.

'bytes' type widening, so callers are safe. Look at the function
itself:

bytes, skip_bytes and progress become int64_t.

bdrv_round_to_clusters() is OK, cluster_bytes now may be large.
trace_bdrv_co_do_copy_on_readv() is OK

looping through cluster_bytes is still OK.

pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we
are going to do COR operation. Therefor calculations in
qemu_iovec_from_buf() and bdrv_driver_preadv() should not change.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy
fcfd9ade68 block/io: support int64_t bytes in bdrv_aligned_pwritev()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_aligned_pwritev() now and convert the dependencies:
bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed
type bytes.

Conversion of bdrv_co_write_req_prepare() and
bdrv_co_write_req_finish() is definitely safe, as all requests in
block/io must not overflow BDRV_MAX_LENGTH. Still add assertions.

For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are
safe. Let's check usage of the parameter inside the function.

Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish()
is OK.

Passing to qemu_iovec_* is OK after new assertion. All other callees
are already updated to int64_t.

Checking alignment is not changed, offset + bytes and qiov_offset +
bytes calculations are safe (thanks to new assertions).

max_transfer is kept to be int for now. It has a default of INT_MAX
here, and some drivers may rely on it. It's to be refactored later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy
5ae07b1410 block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_pwrite_zeroes() now.

Callers are safe, as converting int to int64_t is safe. Concentrate on
'bytes' usage in the function (thx to Eric Blake):

    compute 'int tail' via % 'int alignment' - safe
    fragmentation loop 'int num' - still fragments with a cap on
      max_transfer

    use of 'num' within the loop
    MIN(bytes, max_transfer) as well as %alignment - still works, so
         calculations in if (head) {} are safe
    clamp size by 'int max_write_zeroes' - safe
    drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
    clamp size by 'int max_transfer' - safe
    buf allocation is still clamped to max_transfer
    qemu_iovec_init_buf(size_t) - safe because of clamping
    bdrv_driver_pwritev(uint64_t) - safe

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy
17abcbeee2 block/io: use int64_t bytes in driver wrappers
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, convert driver wrappers parameters which are already 64bit to
signed type.

Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less
than INT64_MAX), which makes the conversion to signed 64bit type safe.

Add corresponding assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:16:03 -06:00
Eric Blake
8024726459 block: use int64_t as bytes type in tracked requests
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

All requests in block/io must not overflow BDRV_MAX_LENGTH, all
external users of BdrvTrackedRequest already have corresponding
assertions, so we are safe. Add some assertions still.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy
63f4ad1186 block/io: improve bdrv_check_request: check qiov too
Operations with qiov add more restrictions on bytes, let's cover it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
801625e69d block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes
The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.

This patch is a first in the 64-bit-blocklayer series, so we are
generally moving to int64_t for both offset and bytes parameters on all
io paths. Main motivation is realization of 64-bit write_zeroes
operation for fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

Patch-correctness audit by Eric Blake:

  Caller has 32-bit, this patch now causes widening which is safe:
  block/block-backend.c: blk_do_preadv() passes 'unsigned int'
  block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
  block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
  block/throttle.c: throttle_co_pdiscard() passes 'int'

  Caller has 64-bit, this patch fixes potential bug where pre-patch
  could narrow, except it's easy enough to trace that callers are still
  capped at 2G actions:
  block/throttle.c: throttle_co_preadv() passes 'uint64_t'
  block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

  Implementation in question: block/throttle-groups.c
  throttle_group_co_io_limits_intercept() takes 'unsigned int bytes'
  and uses it: argument to util/throttle.c throttle_account(uint64_t)

  All safe: it patches a latent bug, and does not introduce any 64-bit
  gotchas once throttle_co_p{read,write}v are relaxed, and assuming
  throttle_account() is not buggy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
98ca45494f block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure
Make bdrv_pad_request() honest: return error if
qemu_iovec_init_extended() failed.

Update also bdrv_padding_destroy() to clean the structure for safety.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy
f0deecff82 block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up
Prepare for the following patch when bdrv_pad_request() will be able to
fail. Update the comments.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy
a56ed80c42 block: fix theoretical overflow in bdrv_init_padding()
Calculation of sum may theoretically overflow, so use 64bit type and
add some good assertions.

Use int64_t constantly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak assertion order]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy
4c002cef0e util/iov: make qemu_iovec_init_extended() honest
Actually, we can't extend the io vector in all cases. Handle possible
MAX_IOV and size_t overflows.

For now add assertion to callers (actually they rely on success anyway)
and fix them in the following patch.

Add also some additional good assertions to qemu_iovec_init_slice()
while being here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy
69b55e03f7 block: refactor bdrv_check_request: add errp
It's better to pass &error_abort than just assert that result is 0: on
crash, we'll immediately see the reason in the backtrace.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix iotest 206 fallout]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-03 08:00:33 -06:00
Kevin Wolf
26513a0174 block: Fix VM size column width in bdrv_snapshot_dump()
size_to_str() can return a size like "4.24 MiB", with a single digit
integer part and two fractional digits. This is eight characters, but
commit b39847a5 changed the format string to only reserve seven
characters for the column.

This can result in unaligned columns, which in turn changes the output of
iotests case 267 because exceeding the column size defeats the attempt
to filter the size out of the output (observed with the ppc64 emulator).
The resulting change is only a whitespace change, but since commit
f203080b this is enough for iotests to consider the test failed.

Taking a character away from the tag name column and adding it to the VM
size column doesn't change anything in the common case (the tag name is
left justified, the VM size is right justified), but fixes this case.

Fixes: b39847a505
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210202155911.179865-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:23:55 +01:00
Philippe Mathieu-Daudé
fcc8672aca block/nvme: Trace NVMe spec version supported by the controller
NVMe controllers implement different versions of the spec,
and different features of it. It is useful to gather this
information when debugging.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Philippe Mathieu-Daudé
97b709f32e block/nvme: Properly display doorbell stride length in trace event
Commit 15b2260bef ("block/nvme: Trace controller capabilities")
misunderstood the doorbell stride value from the datasheet, use
the correct one. The 'doorbell_scale' variable used few lines
later is correct.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210127212137.3482291-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-02-02 17:05:38 +01:00
Peter Maydell
7e7eb9f852 QAPI patches patches for 2021-01-28
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmASY10SHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT4M4P+gKN64+WaErotLHHsqtiA0aoTwbTFXin
 OEyR5du0+PjX96qYHHV+ZDn5uxxKI57/SRNooPndjU63sYgAbNApfsu+wUDZC844
 qMSlrmyw2+Lw1EIykoLXK49+pEDU0XpVIciL5+zEdtCgjiJRjrOOJ/JRBcKoQNHn
 UArGNQ8y0D+0i8uXyJjyvQeHdz6KUr9sX1vqwRGMt9axEMDJks0+Si4Zg3z2wlWJ
 Sc3WsXEhikxK1qkF2/6VsopOgNGB0UUvV6q1GO6ngdqag1Hb6mACzSv9mtIShGjh
 a2MISBhxF8h4wfO8U5TiS9vBgYR3elA3kRGsn4FOfD3sSilt/SWLPHWXdlO1aL2E
 TollRPtYBqn2YIYQP1SEp7NIqaWC/QaGkP/mH8Jvv0YlL64RK879lv6KiHKzfvI7
 HBD7WGZBwMQqPczuw308tqDTQPKUsPDYoEJAFRywkLry86wL8DBOlkQ0lWUjF06s
 UQk/i09nhrcNLo0GbmgAOHUVj4m03zLyMW/fYmsQ8xe9/b6GBwJvtm2v5wKwg0HE
 ixxj4oBIk5YV5Xwt7DKLkT0voPAAgNK13a6ywzbyfsigwaJaO9tLtZ0PMuaT9kgs
 b/OBdeeIYpFdIT/DlcMWpIFi53VYe0McX8MmprHcMZb1133wk5Z5gk+FAWLMifrw
 2ltmoUPoB1dC
 =djiE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-01-28' into staging

QAPI patches patches for 2021-01-28

# gpg: Signature made Thu 28 Jan 2021 07:10:21 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2021-01-28:
  qapi: More complex uses of QAPI_LIST_APPEND
  qapi: Use QAPI_LIST_APPEND in trivial cases
  qapi: Introduce QAPI_LIST_APPEND
  qapi: A couple more QAPI_LIST_PREPEND() stragglers
  net: Clarify early exit condition

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-28 22:43:18 +00:00
Eric Blake
95b3a8c8a8 qapi: More complex uses of QAPI_LIST_APPEND
These cases require a bit more thought to review; in each case, the
code was appending to a list, but not with a FOOList **tail variable.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210113221013.390592-6-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Flawed change to qmp_guest_network_get_interfaces() dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Eric Blake
c3033fd372 qapi: Use QAPI_LIST_APPEND in trivial cases
The easiest spots to use QAPI_LIST_APPEND are where we already have an
obvious pointer to the tail of a list.  While at it, consistently use
the variable name 'tail' for that purpose.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210113221013.390592-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-28 08:08:45 +01:00
Kevin Wolf
86b1cf3227 block: Separate blk_is_writable() and blk_supports_write_perm()
Currently, blk_is_read_only() tells whether a given BlockBackend can
only be used in read-only mode because its root node is read-only. Some
callers actually try to answer a slightly different question: Is the
BlockBackend configured to be writable, by taking write permissions on
the root node?

This can differ, for example, for CD-ROM devices which don't take write
permissions, but may be backed by a writable image file. scsi-cd allows
write requests to the drive if blk_is_read_only() returns false.
However, the write request will immediately run into an assertion
failure because the write permission is missing.

This patch introduces separate functions for both questions.
blk_supports_write_perm() answers the question whether the block
node/image file can support writable devices, whereas blk_is_writable()
tells whether the BlockBackend is currently configured to be writable.

All calls of blk_is_read_only() are converted to one of the two new
functions.

Fixes: https://bugs.launchpad.net/bugs/1906693
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210118123448.307825-2-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-01-27 20:45:20 +01:00
David Edmondson
797e3e3805 block: report errno when flock fcntl fails
When a call to fcntl(2) for the purpose of adding file locks fails
with an error other than EAGAIN or EACCES, report the error returned
by fcntl.

EAGAIN or EACCES are elided as they are considered to be common
failures, indicating that a conflicting lock is held by another
process.

No errors are elided when removing file locks.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
143a6384f5 block/block-copy: drop unused argument of block_copy()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
5b49c2bdc1 block/block-copy: drop unused block_copy_set_progress_callback()
Drop unused code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
71eed4cebe backup: move to block-copy
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
511e7d31bf block/backup: drop extra gotos from backup_run()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
d51590fc3e block/block-copy: make progress_bytes_callback optional
We are going to stop use of this callback in the following commit.
Still the callback handling code will be dropped in a separate commit.
So, for now let's make it optional.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
2c59fd833a qapi: backup: add max-chunk and max-workers to x-perf struct
Add new parameters to configure future backup features. The patch
doesn't introduce aio backup requests (so we actually have only one
worker) neither requests larger than one cluster. Still, formally we
satisfy these maximums anyway, so add the parameters now, to facilitate
further patch which will really change backup job behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
a6d23d56df block/block-copy: add block_copy_cancel
Add function to cancel running async block-copy call. It will be used
in backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
7e032df0ea block/block-copy: add ratelimit to block-copy
We are going to directly use one async block-copy operation for backup
job, so we need rate limiter.

We want to maintain current backup behavior: only background copying is
limited and copy-before-write operations only participate in limit
calculation. Therefore we need one rate limiter for block-copy state
and boolean flag for block-copy call state for actual limitation.

Note, that we can't just calculate each chunk in limiter after
successful copying: it will not save us from starting a lot of async
sub-requests which will exceed limit too much. Instead let's use the
following scheme on sub-request creation:
1. If at the moment limit is not exceeded, create the request and
account it immediately.
2. If at the moment limit is already exceeded, drop create sub-request
and handle limit instead (by sleep).
With this approach we'll never exceed the limit more than by one
sub-request (which pretty much matches current backup behavior).

Note also, that if there is in-flight block-copy async call,
block_copy_kick() should be used after set-speed to apply new setup
faster. For that block_copy_kick() published in this patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
2e099a9d29 block/block-copy: add list of all call-states
It simplifies debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
26be9d62dd block/block-copy: add max_chunk and max_workers parameters
They will be used for backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
de4641b46b block/block-copy: implement block_copy_async
We'll need async block-copy invocation to use in backup directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
3b8c2329b5 block/block-copy: More explicit call_state
Refactor common path to use BlockCopyCallState pointer as parameter, to
prepare it for use in asynchronous block-copy (at least, we'll need to
run block-copy in a coroutine, passing the whole parameters as one
pointer).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
86c6a3b690 qapi: backup: add perf.use-copy-range parameter
Experiments show, that copy_range is not always making things faster.
So, to make experimentation simpler, let's add a parameter. Some more
perf parameters will be added soon, so here is a new struct.

For now, add new backup qmp parameter with x- prefix for the following
reasons:

 - We are going to add more performance parameters, some will be
   related to the whole block-copy process, some only to background
   copying in backup (ignored for copy-before-write operations).
 - On the other hand, we are going to use block-copy interface in other
   block jobs, which will need performance options as well.. And it
   should be the same structure or at least somehow related.

So, there are too much unclean things about how the interface and now
we need the new options mostly for testing. Let's keep them
experimental for a while.

In do_backup_common() new x-perf parameter handled in a way to
make further options addition simpler.

We add use-copy-range with default=true, and we'll change the default
in further patch, after moving backup to use block-copy.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com>
[mreitz: s/5\.2/6.0/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
205736f488 block: apply COR-filter to block-stream jobs
This patch completes the series with the COR-filter applied to
block-stream operations.

Adding the filter makes it possible in future implement discarding
copied regions in backing files during the block-stream job, to reduce
the disk overuse (we need control on permissions).

Also, the filter now is smart enough to do copy-on-read with specified
base, so we have benefit on guest reads even when doing block-stream of
the part of the backing chain.

Several iotests are slightly modified due to filter insertion.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
0f6c94988a block/stream: add s->target_bs
Add a direct link to target bs for convenience and to simplify
following commit which will insert COR filter above target bs.

This is a part of original commit written by Andrey.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy
7f4a396d76 qapi: block-stream: add "bottom" argument
The code already don't freeze base node and we try to make it prepared
for the situation when base node is changed during the operation. In
other words, block-stream doesn't own base node.

Let's introduce a new interface which should replace the current one,
which will in better relations with the code. Specifying bottom node
instead of base, and requiring it to be non-filter gives us the
following benefits:

 - drop difference between above_base and base_overlay, which will be
   renamed to just bottom, when old interface dropped

 - clean way to work with parallel streams/commits on the same backing
   chain, which otherwise become a problem when we introduce a filter
   for stream job

 - cleaner interface. Nobody will surprised the fact that base node may
   disappear during block-stream, when there is no word about "base" in
   the interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
000e5a1cda stream: rework backing-file changing
Stream in stream_prepare calls bdrv_change_backing_file() to change
backing-file in the metadata of bs.

It may use either backing-file parameter given by user or just take
filename of base on job start.

Backing file format is determined by base on job finish.

There are some problems with this design, we solve only two by this
patch:

1. Consider scenario with backing-file unset. Current concept of stream
supports changing of the base during the job (we don't freeze link to
the base). So, we should not save base filename at job start,

  - let's determine name of the base on job finish.

2. Using direct base to determine filename and format is not very good:
base node may be a filter, so its filename may be JSON, and format_name
is not good for storing into qcow2 metadata as backing file format.

  - let's use unfiltered_base

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change commit subject, change logic in stream_prepare]
Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
e275458b29 copy-on-read: skip non-guest reads if no copy needed
If the flag BDRV_REQ_PREFETCH was set, skip idling read/write
operations in COR-driver. It can be taken into account for the
COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the
COR-filter.

block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are
going to use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
897dd0ec4f block: include supported_read_flags into BDS structure
Add the new member supported_read_flags to the BlockDriverState
structure. It will control the flags set for copy-on-read operations.
Make the block generic layer evaluate supported read flags before they
go to a block driver.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
 [vsementsov: use assert instead of abort]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 14:36:37 +01:00
Andrey Shinkevich
e4c8fddde7 qapi: copy-on-read filter: add 'bottom' option
Add an option to limit copy-on-read operations to specified sub-chain
of backing-chain, to make copy-on-read filter useful for block-stream
job.

Suggested-by: Max Reitz <mreitz@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: change subject, modified to freeze the chain,
   do some fixes]
Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
880747a887 qapi: add filter-node-name to block-stream
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  [vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/]
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com>
[mreitz: s/commit/stream/]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
16e09a21af copy-on-read: add filter drop function
Provide API for the COR-filter removal. Also, drop the filter child
permissions for an inactive state when the filter node is being
removed.
To insert the filter, the block generic layer function
bdrv_insert_node() can be used.
The new function bdrv_cor_filter_drop() may be considered as an
intermediate solution before the QEMU permission update system has
overhauled. Then we are able to implement the API function
bdrv_remove_node() on the block generic layer.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Andrey Shinkevich
1252e03b8e copy-on-read: support preadv/pwritev_part functions
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-01-26 11:26:54 +01:00
Peter Maydell
45240eed4f Yank patches patches for 2021-01-13
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/+vJoSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTyv8P/3Dqb9sM8p2WIeoUt5KjgkgWto8anCXM
 /vzAvVdOrPcLXgHF1HOEkcYkp5ZDzFb1PP+LNWsbIB82HmGUGfA7CiOpCpXEoDJs
 Z9OYR3K8W5fvSKYTI/m+s7d+9aYqRZajI6ON5M4Eqem0ZwV93/SZMBHOcs1GmvIR
 diXIztaWVgHjU1Q37MlvJTM4lLN3RH1kTWEdfp3dkNMO6HxBet0B1g7xwaPxKrgb
 4y8D/kk9TA1m4wnwrr9s1l3UnfDiZ7mSfsEsXMcTMmQQrAtonD/xiX2YFsHwf6+U
 9cX9BCG2XP65t3ynY5goddcjVX3R6SuP4YWSgYUpJGrSx+GVxXrtF0ZumgiCk+4T
 uv8sOkJdPe9/aRy86UkNzJx7V50nCZnJ2if9neukVjHGW4Hw4txcYV5/7ZuuDqKl
 tF5NdF/LcHEZLKkBt+4g4TpJ7vxEmJc8/ukn9niLT381SkRBFnnP+Bnl9taSlHOY
 xNtQJ3Jrcd/cvBjAPCKgtE4Fx6wzQG3c7Yg4WHxbZzcYZBPp4fifUTLCK5XKHqhb
 rlqCQIO9DzGz5tqOgG7hWmlodSafGiDsHo9tJVpyF5pSSUv3A4KzX2xW4FZZLKJn
 7uBrcV0bLmR4tyw+fr+u2EW0ClYrs/JxeXAnsnTp9JrzkXILf5RjEuK0Sc1ZIuZW
 cmuPa8027ybj
 =fLO1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging

Yank patches patches for 2021-01-13

# gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-yank-2021-01-13:
  tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test
  io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  migration: Add yank feature
  chardev/char-socket.c: Add yank feature
  block/nbd.c: Add yank feature
  Introduce yank feature

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-13 14:19:24 +00:00
Lukas Straub
fee091cdff block/nbd.c: Add yank feature
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occured.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13 10:21:17 +01:00
Roman Bolshakov
3eacf70bb5 meson: Propagate gnutls dependency
crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but
GNUTLS_CFLAGS, that describe include path, are not propagated
transitively to all users of crypto and build fails if GnuTLS headers
reside in non-standard directory (which is a case for homebrew on Apple
Silicon).

Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-12 12:38:03 +01:00
Peter Maydell
729cc68373 Remove superfluous timer_del() calls
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-08 15:13:38 +00:00
Paolo Bonzini
9db405a335 libiscsi: convert to meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini
8e4e2b551d curl: remove compatibility code, require 7.29.0
cURL 7.16.0 was released in October 2006.  Just remove code that is
in all likelihood not being used anywhere, and require the oldest version
found in currently supported distros, which is 7.29.0 from CentOS 7.

pkg-config is enough for QEMU, since it does not need extra information
such as the path for certicate authorities.  All supported platforms
today will all have pkg-config for curl, so we can drop curl-config.

Suggested-by: Daniel Berrangé <berrange@redhat.com>
Reviewed-by: Daniel Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:37 +01:00
Paolo Bonzini
2f2a376a42 meson: use dependency to gate block modules
This allows converting the dependencies to meson options one by one.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02 21:03:36 +01:00
Peter Maydell
1f7c02797f QAPI patches patches for 2020-12-19
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG
 F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs
 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3
 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6
 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq
 eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ
 N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw
 Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq
 GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV
 isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8
 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6
 vEQDZ+PxubDg
 =vGLy
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging

QAPI patches patches for 2020-12-19

# gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits)
  qobject: Make QString immutable
  block: Use GString instead of QString to build filenames
  keyval: Use GString to accumulate value strings
  json: Use GString instead of QString to accumulate strings
  migration: Replace migration's JSON writer by the general one
  qobject: Factor JSON writer out of qobject_to_json()
  qobject: Factor quoted_str() out of to_json()
  qobject: Drop qstring_get_try_str()
  qobject: Drop qobject_get_try_str()
  Revert "qobject: let object_property_get_str() use new API"
  block: Avoid qobject_get_try_str()
  qmp: Fix tracing of non-string command IDs
  qobject: Move internals to qobject-internal.h
  hw/rdma: Replace QList by GQueue
  Revert "qstring: add qstring_free()"
  qobject: Change qobject_to_json()'s value to GString
  qobject: Use GString instead of QString to accumulate JSON
  qobject: Make qobject_to_json_pretty() take a pretty argument
  monitor: Use GString instead of QString for output buffer
  hmp: Simplify how qmp_human_monitor_command() gets output
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01 14:33:03 +00:00
Peter Maydell
26f6b15e26 Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
   end, allocates big chunks of data so that such post-EOF writes will
   occur less frequently)
 - write-zeroes and block-status support for Quorum
 - Implementation of truncate for the nvme block driver similarly to the
   existing implementations for host block devices and iscsi devices
 - Block layer refactoring: Drop the tighten_restrictions concept in the
   block permission functions
 - iotest fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl/cwIoSHG1yZWl0ekBy
 ZWRoYXQuY29tAAoJEPQH2wBh1c9AnvMH+gOnZCwEUKWuBxGX3Wjb/kqV1OuhAhcP
 IVrKLRnqdarCYMQ9M4SZL6pedfsujHA7vClTV7NTrenXBsEIradBQ59ztQ0oDirS
 4ipIjVtNqj7m86l+IRZDq5HlwOYwwFnWogmLo2bcmNJGLpPQQfrhL2vRJ1wLgFYk
 WjeAVlkkYcHnTIDvs4ne9WRSlxGVBWJ4X5nSlRdZqeyUcMY9v4wL4P9Wc4ZuORmq
 /5HRcT5JKGaT2bAueaqAGEdtPFGbazEP5uU7MTTK/fueDKIRAXO2d0gqhANtOOJQ
 7hMmKhwOPOOhrrpCVi9nxsVwdCOHfurV0km6cOs+Iprm/Wm2UtuS/A8=
 =z+7k
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into staging

Block patches:
- New block filter: preallocate (which, on writes beyond an image file's
  end, allocates big chunks of data so that such post-EOF writes will
  occur less frequently)
- write-zeroes and block-status support for Quorum
- Implementation of truncate for the nvme block driver similarly to the
  existing implementations for host block devices and iscsi devices
- Block layer refactoring: Drop the tighten_restrictions concept in the
  block permission functions
- iotest fixes

# gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits)
  iotests: Fix _send_qemu_cmd with bash 5.1
  iotests/102: Pass $QEMU_HANDLE to _send_qemu_cmd
  block/nvme: Implement fake truncate() coroutine
  quorum: Implement bdrv_co_pwrite_zeroes()
  quorum: Implement bdrv_co_block_status()
  scripts/simplebench: add bench_prealloc.py
  simplebench/results_to_text: make executable
  simplebench/results_to_text: add difference line to the table
  simplebench/results_to_text: improve view of the table
  simplebench: move results_to_text() into separate file
  simplebench: rename ascii() to results_to_text()
  scripts/simplebench: use standard deviation for +- error
  scripts/simplebench: support iops
  scripts/simplebench: fix grammar: s/successed/succeeded/
  iotests: add 298 to test new preallocate filter driver
  iotests.py: execute_setup_common(): add required_fmts argument
  iotests: qemu_io_silent: support --image-opts
  qemu-io: add preallocate mode parameter for truncate command
  block: introduce preallocate filter
  block: bdrv_check_perm(): process children anyway
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-31 23:26:46 +00:00
Markus Armbruster
eab3a4678b qobject: Change qobject_to_json()'s value to GString
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString.  Just one of the callers actually needs a
QString: qemu_rbd_parse_filename().  A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds().  The remainder just need a string.

Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.

qemu_rbd_parse_filename() now has to convert to QString.  All others
save a QString temporary.  to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
2020-12-19 10:38:43 +01:00
Eric Blake
54aa3de72e qapi: Use QAPI_LIST_PREPEND() where possible
Anywhere we create a list of just one item or by prepending items
(typically because order doesn't matter), we can use
QAPI_LIST_PREPEND().  But places where we must keep the list in order
by appending remain open-coded until later patches.

Note that as a side effect, this also performs a cleanup of two minor
issues in qga/commands-posix.c: the old code was performing
 new = g_malloc0(sizeof(*ret));
which 1) is confusing because you have to verify whether 'new' and
'ret' are variables with the same type, and 2) would conflict with C++
compilation (not an actual problem for this file, but makes
copy-and-paste harder).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201113011340.463563-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
[Straightforward conflicts due to commit a8aa94b5f8 "qga: update
schema for guest-get-disks 'dependents' field" and commit a10b453a52
"target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c"
resolved.  Commit message tweaked.]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19 10:20:14 +01:00
Markus Armbruster
be7c5ddd0d block/vpc: Use sizeof() instead of HEADER_SIZE for footer size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-10-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:30 +01:00
Markus Armbruster
a3d2761719 block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t *
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-9-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:28 +01:00
Markus Armbruster
275734e479 block/vpc: Pad VHDFooter, replace uint8_t[] buffers
Pad VHDFooter as specified in the "Virtual Hard Disk Image Format
Specification" version 1.0[*].  Change footer buffers from
uint8_t[HEADER_SIZE] to VHDFooter.  Their size remains the same.

The VHDFooter * variables pointing to a VHDFooter variable right next
to it are now silly.  Eliminate them, and shorten the remaining
variables' names.

Most variables pointing to s->footer are now also silly.  Eliminate
them, too.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-8-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:26 +01:00
Markus Armbruster
3d6101a3f2 block/vpc: Use sizeof() instead of 1024 for dynamic header size
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-7-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:23 +01:00
Markus Armbruster
e326f0783e block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffers
Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image
Format Specification" version 1.0[*].  Change dynamic disk header
buffers from uint8_t[1024] to VHDDynDiskHeader.  Their size remains
the same.

The VHDDynDiskHeader * variables pointing to a VHDDynDiskHeader
variable right next to it are now silly.  Eliminate them.

[*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-6-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:18 +01:00
Markus Armbruster
7550379ded block/vpc: Make vpc_checksum() take void *
Some of the next commits will checksum structs.  Change vpc_checksum()
to take void * instead of uint8_t, to save us pointless casts to
uint8_t *.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-5-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:16 +01:00
Markus Armbruster
a18dc3a14d block/vpc: Don't abuse the footer buffer for dynamic header
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes).

Works, because the caller passes a buffer that is large enough for
both purposes.  I hate that.

Use a separate buffer for the dynamic header, and adjust the caller's
buffer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-4-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:14 +01:00
Markus Armbruster
b0ce8cb0e8 block/vpc: Don't abuse the footer buffer as BAT sector buffer
create_dynamic_disk() takes a buffer holding the footer as first
argument.  It writes out the footer (512 bytes), then reuses the
buffer to initialize and write out the dynamic header (1024 bytes),
then reuses it again to initialize and write out BAT sectors (512).

Works, because the caller passes a buffer that is large enough for all
three purposes.  I hate that.

Use a separate buffer for writing out BAT sectors.  The next commit
will do the same for the dynamic header.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-3-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:43:06 +01:00
Markus Armbruster
02df95c4a1 block/vpc: Make vpc_open() read the full dynamic header
The dynamic header's size is 1024 bytes.

vpc_open() reads only the 512 bytes of the dynamic header into buf[].
Works, because it doesn't actually access the second half.  However, a
colleague told me that GCC 11 warns:

    ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds]

Clean up to read the full header.

Rename buf[] to dyndisk_header_buf[] while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201217162003.1102738-2-armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 12:42:34 +01:00
Philippe Mathieu-Daudé
c8807c5edc block/nvme: Implement fake truncate() coroutine
NVMe drive cannot be shrunk.

Since commit c80d8b06cf we can use the @exact parameter (set
to false) to return success if the block device is larger than
the requested offset (even if we can not be shrunk).

Use this parameter to implement the NVMe truncate() coroutine,
similarly how it is done for the iscsi and file-posix drivers
(see commit 82325ae5f2 "Evaluate @exact in protocol drivers").

Reported-by: Xueqiang Wei <xuwei@redhat.com>
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201210125202.858656-1-philmd@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia
5cddb2e95f quorum: Implement bdrv_co_pwrite_zeroes()
This simply calls bdrv_co_pwrite_zeroes() in all children.

bs->supported_zero_flags is also set to the flags that are supported
by all children.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Alberto Garcia
ef9bba1484 quorum: Implement bdrv_co_block_status()
The quorum driver does not implement bdrv_co_block_status() and
because of that it always reports to contain data even if all its
children are known to be empty.

One consequence of this is that if we for example create a quorum with
a size of 10GB and we mirror it to a new image the operation will
write 10GB of actual zeroes to the destination image wasting a lot of
time and disk space.

Since a quorum has an arbitrary number of children of potentially
different formats there is no way to report all possible allocation
status flags in a way that makes sense, so this implementation only
reports when a given region is known to contain zeroes
(BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).

If all children agree that a region contains zeroes then we can return
BDRV_BLOCK_ZERO using the smallest size reported by the children
(because all agree that a region of at least that size contains
zeroes).

If at least one child disagrees we have to return BDRV_BLOCK_DATA.
In this case we use the largest of the sizes reported by the children
that didn't return BDRV_BLOCK_ZERO (because we know that there won't
be an agreement for at least that size).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Tested-by: Tao Xu <tao3.xu@intel.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
33fa2222eb block: introduce preallocate filter
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
d1a764d126 block: introduce BDRV_REQ_NO_WAIT flag
Add flag to make serialising request no wait: if there are conflicting
requests, just return error immediately. It's will be used in upcoming
preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
8ac5aab255 block: bdrv_mark_request_serialising: split non-waiting function
We'll need a separate function, which will only "mark" request
serialising with specified align but not wait for conflicting
requests. So, it will be like old bdrv_mark_request_serialising(),
before merging bdrv_wait_serialising_requests_locked() into it.

To reduce the possible mess, let's do the following:

Public function that does both marking and waiting will be called
bdrv_make_request_serialising, and private function which will only
"mark" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
ec1c886831 block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
bs is linked in req, so no needs to pass it separately. Most of
tracked-requests API doesn't have bs argument. Actually, after this
patch only tracked_request_begin has it, but it's for purpose.

While being here, also add a comment about what "_locked" is.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
3183937ff9 block/io: split out bdrv_find_conflicting_request
To be reused in separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy
2e36da62cf block/io.c: drop assertion on double waiting for request serialisation
The comments states, that on misaligned request we should have already
been waiting. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with larger alignment.

Note, that the only user of BDRV_REQ_SERIALISING is backup which issues
cluster-aligned requests, so seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18 12:35:55 +01:00
Peter Lieven
182454dc63 block/nfs: fix int overflow in nfs_client_open_qdict
nfs_client_open returns the file size in sectors. This effectively
makes it impossible to open files larger than 1TB.

Fixes: c22a034545
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20201209121735.16437-1-pl@kamp.de>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18 11:48:39 +01:00
Pan Nengyuan
cb8d0851f1 block/file-posix: fix a possible undefined behavior
local_err is not initialized to NULL, it will cause a assert error as below:
qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed.

Fixes: c644751069
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-12-13 23:56:16 +01:00
Kevin Wolf
960d5fb3e8 block: Fix deadlock in bdrv_co_yield_to_drain()
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.

As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)

Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.

This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.

Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
8b1170012b block: introduce BDRV_MAX_LENGTH
We are going to modify block layer to work with 64bit requests. And
first step is moving to int64_t type for both offset and bytes
arguments in all block request related functions.

It's mostly safe (when widening signed or unsigned int to int64_t), but
switching from uint64_t is questionable.

So, let's first establish the set of requests we want to work with.
First signed int64_t should be enough, as off_t is signed anyway. Then,
obviously offset + bytes should not overflow.

And most interesting: (offset + bytes) being aligned up should not
overflow as well. Aligned to what alignment? First thing that comes in
mind is bs->bl.request_alignment, as we align up request to this
alignment. But there is another thing: look at
bdrv_mark_request_serialising(). It aligns request up to some given
alignment. And this parameter may be bdrv_get_cluster_size(), which is
often a lot greater than bs->bl.request_alignment.
Note also, that bdrv_mark_request_serialising() uses signed int64_t for
calculations. So, actually, we already depend on some restrictions.

Happily, bdrv_get_cluster_size() returns int and
bs->bl.request_alignment has 32bit unsigned type, but defined to be a
power of 2 less than INT_MAX. So, we may establish, that INT_MAX is
absolute maximum for any kind of alignment that may occur with the
request.

Note, that bdrv_get_cluster_size() is not documented to return power
of 2, still bdrv_mark_request_serialising() behaves like it is.
Also, backup uses bdi.cluster_size and is not prepared to it not being
power of 2.
So, let's establish that Qemu supports only power-of-2 clusters and
alignments.

So, alignment can't be greater than 2^30.

Finally to be safe with calculations, to not calculate different
maximums for different nodes (depending on cluster size and
request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30)
as absolute maximum bytes length for Qemu. Actually, it's not much less
than INT64_MAX.

OK, then, let's apply it to block/io.

Let's consider all block/io entry points of offset/bytes:

4 bytes/offset interface functions: bdrv_co_preadv_part(),
bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and
bdrv_co_pdiscard() and we check them all with bdrv_check_request().

We also have one entry point with only offset: bdrv_co_truncate().
Check the offset.

And one public structure: BdrvTrackedRequest. Happily, it has only
three external users:

 file-posix.c: adopted by this patch
 write-threshold.c: only read fields
 test-write-threshold.c: sets obviously small constant values

Better is to make the structure private and add corresponding
interfaces.. Still it's not obvious what kind of interface is needed
for file-posix.c. Let's keep it public but add corresponding
assertions.

After this patch we'll convert functions in block/io.c to int64_t bytes
and offset parameters. We can assume that offset/bytes pair always
satisfy new restrictions, and make
corresponding assertions where needed. If we reach some offset/bytes
point in block/io.c missing bdrv_check_request() it is considered a
bug. As well, if block/io.c modifies a offset/bytes request, expanding
it more then aligning up to request_alignment, it's a bug too.

For all io requests except for discard we keep for now old restriction
of 32bit request length.

iotest 206 output error message changed, as now test disk size is
larger than new limit. Add one more test case with new maximum disk
size to cover too-big-L1 case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
f4dad307ef block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()
Move bdrv_is_inserted() calls into callers.

We are going to make bdrv_check_byte_request() a clean thing.
bdrv_is_inserted() is not about checking the request, it's about
checking the bs. So, it should be separate.

With this patch we probably change error path for some failure
scenarios. But depending on the fact that querying too big request on
empty cdrom (or corrupted qcow2 node with no drv) will result in EIO
and not ENOMEDIUM would be very strange. More over, we are going to
move to 64bit requests, so larger requests will be allowed anyway.

More over, keeping in mind that cdrom is the only driver that has
.bdrv_is_inserted() handler it's strange that we should care so much
about it in generic block layer, intuitively we should just do read and
write, and cdrom driver should return correct errors if it is not
inserted. But it's a work for another series.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
33985614bd block/io: bdrv_refresh_limits(): use ERRP_GUARD
This simplifies following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy
9b100af30f block/file-posix: fix workaround in raw_do_pwrite_zeroes()
We should not set overlap_bytes:

1. Don't worry: it is calculated by bdrv_mark_request_serialising() and
   will be equal to or greater than bytes anyway.

2. If the request was already aligned up to some greater alignment,
   than we may break things: we reduce overlap_bytes, and further
   bdrv_mark_request_serialising() may not help, as it will not restore
   old bigger alignment.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Li Feng
eb43ea16dc file-posix: check the use_lock before setting the file lock
The scenario is that when accessing a volume on an NFS filesystem
without supporting the file lock,  Qemu will complain "Failed to lock
byte 100", even when setting the file.locking = off.

We should do file lock related operations only when the file.locking is
enabled, otherwise, the syscall of 'fcntl' will return non-zero.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
df4ea7091b fuse: Implement hole detection through lseek
This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to add a dedicated
check whether it is available before making use of it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-7-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4ca37a96a7 fuse: (Partially) implement fallocate()
This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-6-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
4fba06d594 fuse: Allow growable exports
These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

As an optimization, keep the RESIZE permission for growable exports so
we do not have to take it for every post-EOF write.  (This permission is
not released when the export is destroyed, because at that point the
BlockBackend is destroyed altogether anyway.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-5-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:40 +01:00
Max Reitz
41429e3d79 fuse: Implement standard FUSE operations
This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-4-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Max Reitz
0c9b70d590 fuse: Allow exporting BDSs via FUSE
block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201027190600.192171-3-mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
c208b0ef96 block/iscsi: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-5-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
3af613ebdb block/throttle-groups: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Message-Id: <20201203075055.127773-4-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
f5056b70e6 block/curl: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-3-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Gan Qixin
c37c973660 block/accounting: Use lock guard macros
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting.

Signed-off-by: Gan Qixin <ganqixin@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201203075055.127773-2-ganqixin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11 17:52:39 +01:00
Markus Armbruster
6cc0667d9b Tweak a few "Parameter 'NAME' expects THING" error message
Change to "expects a THING" where that's an obvious improvement

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
2020-12-10 17:16:44 +01:00
Stefan Hajnoczi
552c2c4c10 block/export: avoid g_return_val_if() input validation
Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201118091644.199527-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-09 13:04:17 -05:00
Marc-André Lureau
0df750e9d3 libvhost-user: make it a meson subproject
By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Maxim Levitsky
c8bf9a9169 qcow2: Fix corruption on write_zeroes with MAY_UNMAP
Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
introduced a subtle change to code in zero_in_l2_slice:

It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can
trigger a cache flush which can mark the L2 table as clean, and
assuming that this was the last write to it, a stale version of it
will remain on the disk.

Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[ kwolf: Fixed to restore the correct original order from before
  205fa50750; added comments like in discard_in_l2_slice(). ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201124092815.39056-1-kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-24 11:29:41 +01:00
Peter Maydell
683685e72d Pull request for 5.2
NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
 improvements. Elena Afanasova's ioeventfd fixes are also included.
 
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl+ixjgACgkQnKSrs4Gr
 c8iZYgf+OB2eAGsdZO97fKh6VUUoRKa+BgWKuh37Cfpp3q+dLuIFMSKfU/UgprLc
 aowt6uTFfwudDV9KltUB2EiXIzpuf7JhMNOiDRkyEvYSj4KHRPsQmFCd35Nrjezy
 VvxSGafe2Z60Qnvcx+iGeMATSFX9YTcTZeHttC07v7dWn/yEK3b1hobcmjCcwWeR
 Ud8pjMyh5E2z/NpW8E669/byJf9iahx3LSQxSWt+9PVTPuftAB0Suu+m6svz1wvk
 sjVfIbtVWCp2BdGf5U6a2rEqF3+kIcFkfHp+MwgE0EdMz1wfjudaPl13a0C4DSun
 PSt9E+Ct5BTrDUvqCHvQDOaFiMZTPg==
 =Poyb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging

Pull request for 5.2

NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing
improvements. Elena Afanasova's ioeventfd fixes are also included.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha-gitlab/tags/block-pull-request: (33 commits)
  util/vfio-helpers: Assert offset is aligned to page size
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  block/nvme: Fix nvme_submit_command() on big-endian host
  block/nvme: Fix use of write-only doorbells page on Aarch64 arch
  block/nvme: Align iov's va and size on host page size
  block/nvme: Change size and alignment of prp_list_pages
  block/nvme: Change size and alignment of queue
  block/nvme: Change size and alignment of IDENTIFY response buffer
  block/nvme: Correct minimum device page size
  block/nvme: Set request_alignment at initialization
  block/nvme: Simplify nvme_cmd_sync()
  block/nvme: Simplify ADMIN queue access
  block/nvme: Correctly initialize Admin Queue Attributes
  block/nvme: Use definitions instead of magic values in add_io_queue()
  block/nvme: Introduce Completion Queue definitions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-23 13:03:13 +00:00
Peter Maydell
c8e5c4b246 Patches for 5.2.0-rc2:
- quorum: Fix crash with rewrite-corrupted and without read-write user
 - io_uring: do not use pointer after free
 - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
 - iotests: Fix failure on Python 3.9 due to use of a deprecated function
 - char-stdio: Fix QMP default for 'signal'
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl+zt1URHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZxyxAAu8GIOeAb7atQvc+KpeBTUG4A+tfAXkC+
 iUYdIpFeWWgmGf7myu3nlaAkeTDk6qHalmzkGRHi3yhX4eNIh5Sdff1YwPcZwf+q
 GLIqFFTW0z1Bd36N8G7Mkf04nKX4QTHqp6THHtSt9jNs56h5OP3axPXVA/3v9y8B
 4ZAkOOvwnwO+U94crhy5y5pX/Vwafv/Dz4DH9hEupE+EI9AuzjZLBrS+sgkxjhmu
 gvHpDSqm6NXwWQA5a24J6NzCy3n/Fw/rqmnoOrN8eRz+4DSCMVDnTDDEMFLa/UoK
 Ci7AqWfG/MnQ4GrGsOx80KJhAFLTmI60vfnUizKtEjL/HJyK5PDyM+VxHz+P/Tkq
 4hqQsHEsll4mAQiKCrrKOOXhn+YC4DhY/5O1EzEfhqfUjI+BFE9iC7LuqQevwKPL
 gytup7eoZjIHMtnKwY1B2ApAqHtodswjHkefcjEcvSlhqGi/BvwuWmeYlFXmA3r0
 YO8fvbYJrwHwJy7CzMb5Rgs2461QGERmXoCsBxLAiqXU9rhpOZ6gKXIjjlYojZ8M
 W0kqbaccTRPuhooFdEQ9RTPSkX7AX2bI0nOoPxfz3YD/siw35YwnUkJqvQbckvJd
 vpPkCL5jt3d9sfO0z1xjSH2ey9bevSReYpCsk+kIZl7V2XoDAW0Nbi0Td3pW4j6x
 dEkg/+sjF+o=
 =0pFF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Patches for 5.2.0-rc2:

- quorum: Fix crash with rewrite-corrupted and without read-write user
- io_uring: do not use pointer after free
- file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE
- iotests: Fix failure on Python 3.9 due to use of a deprecated function
- char-stdio: Fix QMP default for 'signal'

# gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests/081: Test rewrite-corrupted without WRITE
  iotests/081: Filter image format after testdir
  quorum: Require WRITE perm with rewrite-corrupted
  io_uring: do not use pointer after free
  file-posix: allow -EBUSY errors during write zeros on raw block devices
  iotests: Replace deprecated ConfigParser.readfp()
  char-stdio: Fix QMP default for 'signal'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-17 15:58:51 +00:00
Peter Maydell
1c7ab0930a pc,vhost: fixes
Fixes all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+zlRgPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpmf8H/0BEjxnINJCN12Te+Mot8K9fjwc0zE0SUuYY
 25LogfJMCfVy0SZk0ZQV9z33GEL5XyMlXQjEpLmlX4d3mOBLcbutI6UVLhu8+Ixj
 89+jFphxIQPDOpA7BnPOD4AJ6TlhbewZ41QBR/J/qv946HayFW9QCAUywuj6H80m
 T3lw0FmPkd6/YupUdUm0pPgJjowckGis+cAa9UkTlqp8jpzFur28N02fE0L6QO3Z
 lR6zsk4yEvsVoeXSkEkmSqZGNcwoQCf4BhmDuD7lBLZ0LBvmd37CCoakStpdnQPH
 Swunmf7Q1H6LRtF7s8ZKXBB/ecVnss3kFTFj5KWx3fJH2SJuHG8=
 =v205
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,vhost: fixes

Fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 17 Nov 2020 09:17:12 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  vhost-user-blk/scsi: Fix broken error handling for socket call
  contrib/libvhost-user: Fix bad printf format specifiers
  hw/i386/acpi-build: Fix maybe-uninitialized error when ACPI hotplug off
  configure: mark vhost-user Linux-only
  vhost-user-blk-server: depend on CONFIG_VHOST_USER
  meson: move vhost_user_blk_server to meson.build
  vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	meson.build
2020-11-17 11:50:11 +00:00
Max Reitz
9ca5b0e842 quorum: Require WRITE perm with rewrite-corrupted
Using rewrite-corrupted means quorum may issue writes to its children
just from receiving read requests from its parents.  Thus, it must take
the WRITE permission when rewrite-corrupted is used.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201113211718.261671-2-mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:38:28 +01:00
Paolo Bonzini
bd89f93603 io_uring: do not use pointer after free
Even though only the pointer value is only printed, it is untidy
and Coverity complains.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20201113154102.1460459-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Maxim Levitsky
ece4fa9152 file-posix: allow -EBUSY errors during write zeros on raw block devices
On Linux, fallocate(fd, FALLOC_FL_PUNCH_HOLE) when it is used on a block device,
without O_DIRECT can return -EBUSY if it races with another write to the same page.

Since this is rare and discard is not a critical operation, ignore this error

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201111153913.41840-2-mlevitsk@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-17 12:26:48 +01:00
Chetan Pant
61f3c91a67 nomaintainer: Fix Lesser GPL version number
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.

This patch contains all the files, whose maintainer I could not get
from ‘get_maintainer.pl’ script.

Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[thuth: Adapted exec.c and qdev-monitor.c to new location]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-11-15 17:04:40 +01:00
Stefan Hajnoczi
e5e856c1eb meson: move vhost_user_blk_server to meson.build
The --enable/disable-vhost-user-blk-server options were implemented in
./configure. There has been confusion about them and part of the problem
is that the shell syntax used for setting the default value is not easy
to read. Move the option over to meson where the conditions are easier
to understand:

  have_vhost_user_blk_server = (targetos == 'linux')

  if get_option('vhost_user_blk_server').enabled()
      if targetos != 'linux'
          error('vhost_user_blk_server requires linux')
      endif
  elif get_option('vhost_user_blk_server').disabled() or not have_system
      have_vhost_user_blk_server = false
  endif

This patch does not change behavior.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201110171121.1265142-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-11-12 09:19:40 -05:00
shiliyang
5f14f31d2b block: Fix some code style problems, "foo* bar" should be "foo *bar"
There have some code style problems be found when read the block driver code.
So I fixes some problems of this error, ERROR: "foo* bar" should be "foo *bar".

Signed-off-by: Liyang Shi <shiliyang@huawei.com>
Reported-by: Euler Robot <euler.robot@huawei.com>
Message-Id: <3211f389-6d22-46c1-4a16-e6a2ba66f070@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 18:42:47 +01:00
Yonggang Luo
c63b0201ae block: Fixes nfs compiling error on msys2/mingw
These compiling errors are fixed:
../block/nfs.c:27:10: fatal error: poll.h: No such file or directory
   27 | #include <poll.h>
      |          ^~~~~~~~
compilation terminated.

../block/nfs.c:63:5: error: unknown type name 'blkcnt_t'
   63 |     blkcnt_t st_blocks;
      |     ^~~~~~~~
../block/nfs.c: In function 'nfs_client_open':
../block/nfs.c:550:27: error: 'struct _stat64' has no member named 'st_blocks'
  550 |     client->st_blocks = st.st_blocks;
      |                           ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:751:41: error: 'struct _stat64' has no member named 'st_blocks'
  751 |     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
      |                                         ^
../block/nfs.c: In function 'nfs_reopen_prepare':
../block/nfs.c:805:31: error: 'struct _stat64' has no member named 'st_blocks'
  805 |         client->st_blocks = st.st_blocks;
      |                               ^
../block/nfs.c: In function 'nfs_get_allocated_file_size':
../block/nfs.c:752:1: error: control reaches end of non-void function [-Werror=return-type]
  752 | }
      | ^

On msys2/mingw, there is no st_blocks in struct _stat64 yet, we disable the usage of it
on msys2/mingw, and create a typedef long long blkcnt_t; for further implementation

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Message-Id: <20201105123116.674-2-luoyonggang@gmail.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Alberto Garcia
3441ad4bc4 qcow2: Document and enforce the QCowL2Meta invariants
The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting of a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region touched by that
part of the write request. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20201007161323.4667-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
AlexChen
3d86af858e block: Remove unused include
The "qemu-common.h" include is not used, remove it.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-Id: <5F8FFB94.3030209@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-11-09 15:44:21 +01:00
Peter Maydell
85c3ed4417 pc,pci,vhost,virtio: fixes
Lots of fixes all over the place.
 virtio-mem and virtio-iommu patches are kind of fixes but
 it seems better to just make them behave sanely than
 try to educate users about the limitations ...
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+i9YMPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpySQH/Ru/sxB9PncR1HsqSf0HC0tt/EMKgyZTXEwQ
 FITcjkCvBDS98a1VUvvZbjzTEDEZNnoUv94MjdLeBoptJ7GtK6nPoI6Ke0p1Zqbe
 mlY2BCb0FpN8FE+mthjAI03mhw6o8Qo/OPtyISQzUxCVVqUHL5TRAVAQdeidoK8n
 RBQ4WogwM/h7wI0d9GGgSxAON8IRQnBYImtzJieBb6zeScwKVFTWI1tqBdOyFN0/
 AhzQiNZuhZ7a1XGJIsxmWB1NK2kcXNJuOF0ANh4coIHR0JzmH3xRy+Jnf5e3dYsw
 LI23DUZPSTJJXAwKPucyTG7RTX8F55N9DVHC9KDRD6Ntq1oreJ4=
 =pcbN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,pci,vhost,virtio: fixes

Lots of fixes all over the place.
virtio-mem and virtio-iommu patches are kind of fixes but
it seems better to just make them behave sanely than
try to educate users about the limitations ...

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 04 Nov 2020 18:40:03 GMT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (31 commits)
  contrib/vhost-user-blk: fix get_config() information leak
  block/export: fix vhost-user-blk get_config() information leak
  block/export: make vhost-user-blk config space little-endian
  configure: introduce --enable-vhost-user-blk-server
  libvhost-user: follow QEMU comment style
  vhost-blk: set features before setting inflight feature
  Revert "vhost-blk: set features before setting inflight feature"
  net: Add vhost-vdpa in show_netdevs()
  vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup
  vfio: Don't issue full 2^64 unmap
  virtio-iommu: Set supported page size mask
  vfio: Set IOMMU page size as per host supported page size
  memory: Add interface to set iommu page size mask
  virtio-iommu: Add notify_flag_changed() memory region callback
  virtio-iommu: Add replay() memory region callback
  virtio-iommu: Call memory notifiers in attach/detach
  virtio-iommu: Add memory notifiers for map/unmap
  virtio-iommu: Store memory region in endpoint struct
  virtio-iommu: Fix virtio_iommu_mr()
  hw/smbios: Fix leaked fd in save_opt_one() error path
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-05 15:16:43 +00:00
Stefan Hajnoczi
f8ffcb2bda block/export: fix vhost-user-blk get_config() information leak
Refuse get_config() requests in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
11f60f7eae block/export: make vhost-user-blk config space little-endian
VIRTIO 1.0 devices have little-endian configuration space. The
vhost-user-blk-server.c code already uses little-endian for virtqueue
processing but not for the configuration space fields. Fix this so the
vhost-user-blk export works on big-endian hosts.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Stefan Hajnoczi
bc15e44cb2 configure: introduce --enable-vhost-user-blk-server
Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé
a0546a7b6f block/nvme: Fix nvme_submit_command() on big-endian host
The Completion Queue Command Identifier is a 16-bit value,
so nvme_submit_command() is unlikely to work on big-endian
hosts, as the relevant bits are truncated.
Fix by using the correct byte-swap function.

Fixes: bdd6a90a9e ("block: Add VFIO based NVMe driver")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-25-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé
4b19e9b815 block/nvme: Fix use of write-only doorbells page on Aarch64 arch
qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states:

  'offset' must be a multiple of the page size as returned
   by sysconf(_SC_PAGE_SIZE).

In commit f68453237b we started to use an offset of 4K which
broke this contract on Aarch64 arch.

Fix by mapping at offset 0, and and accessing doorbells at offset=4K.

Fixes: f68453237b ("block/nvme: Map doorbells pages write-only")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-24-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger
9e13d59884 block/nvme: Align iov's va and size on host page size
Make sure iov's va and size are properly aligned on the
host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-23-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger
f8fd3ebac3 block/nvme: Change size and alignment of prp_list_pages
In preparation of 64kB host page support, let's change the size
and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds
with 64kB host page size. We align on the host page size.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-22-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger
2387aaced7 block/nvme: Change size and alignment of queue
In preparation of 64kB host page support, let's change the size
and alignment of the queue so that the VFIO DMA MAP succeeds.
We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-21-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Eric Auger
0aecd06049 block/nvme: Change size and alignment of IDENTIFY response buffer
In preparation of 64kB host page support, let's change the size
and alignment of the IDENTIFY command response buffer so that
the VFIO DMA MAP succeeds. We align on the host page size.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-20-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé
a652a3ec69 block/nvme: Correct minimum device page size
While trying to simplify the code using a macro, we forgot
the 12-bit shift... Correct that.

Fixes: fad1eb6886 ("block/nvme: Use register definitions from 'block/nvme.h'")
Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-19-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé
c8228ac355 block/nvme: Set request_alignment at initialization
Commit bdd6a90a9e ("block: Add VFIO based NVMe driver")
sets the request_alignment in nvme_refresh_limits().
For consistency, also set it during initialization.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-18-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
08d5406798 block/nvme: Simplify nvme_cmd_sync()
As all commands use the ADMIN queue, it is pointless to pass
it as argument each time. Remove the argument, and rename the
function as nvme_admin_cmd_sync() to make this new behavior
clearer.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-17-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
52b75ea8ec block/nvme: Simplify ADMIN queue access
We don't need to dereference from BDRVNVMeState each time.
Use a NVMeQueuePair pointer on the admin queue.
The nvme_init() becomes easier to review, matching the style
of nvme_add_io_queue().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-16-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
3c363c073e block/nvme: Correctly initialize Admin Queue Attributes
From the specification chapter 3.1.8 "AQA - Admin Queue Attributes"
the Admin Submission Queue Size field is a 0’s based value:

  Admin Submission Queue Size (ASQS):

    Defines the size of the Admin Submission Queue in entries.
    Enabling a controller while this field is cleared to 00h
    produces undefined results. The minimum size of the Admin
    Submission Queue is two entries. The maximum size of the
    Admin Submission Queue is 4096 entries.
    This is a 0’s based value.

This bug has never been hit because the device initialization
uses a single command synchronously :)

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-15-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
76a24781cc block/nvme: Use definitions instead of magic values in add_io_queue()
Replace magic values by definitions, and simplifiy since the
number of queues will never reach 64K.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-14-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
dfa9c6c656 block/nvme: Make nvme_init_queue() return boolean indicating error
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.
Directly pass errp as the local_err is not requested in our
case. This simplifies a bit nvme_create_queue_pair().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-12-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
7a5f00dde3 block/nvme: Make nvme_identify() return boolean indicating error
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.
Directly pass errp as the local_err is not requested in our
case.

Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201029093306.1063879-11-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
1b539bd6db block/nvme: Use unsigned integer for queue counter/size
We can not have negative queue count/size/index, use unsigned type.
Rename 'nr_queues' as 'queue_count' to match the spec naming.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-10-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
3214b0f094 block/nvme: Move definitions before structure declarations
To be able to use some definitions in structure declarations,
move them earlier. No logical change.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-9-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé
6e1e9ff2d3 block/nvme: Trace queue pair creation/deletion
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-8-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé
51e98b6d21 block/nvme: Improve nvme_free_req_queue_wait() trace information
What we want to trace is the block driver state and the queue index.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-7-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé
1c914cd120 block/nvme: Trace nvme_poll_queue() per queue
As we want to enable multiple queues, report the event
in each nvme_poll_queue() call, rather than once in
the callback calling nvme_poll_queues().

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-6-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé
15b2260bef block/nvme: Trace controller capabilities
Controllers have different capabilities and report them in the
CAP register. We are particularly interested by the page size
limits.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-5-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé
58ad6ae0cb block/nvme: Report warning with warn_report()
Instead of displaying warning on stderr, use warn_report()
which also displays it on the monitor.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-4-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé
8526e39e99 block/nvme: Use hex format to display offset in trace events
Use the same format used for the hw/vfio/ trace events.

Suggested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201029093306.1063879-3-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
2020-11-03 19:06:20 +00:00
AlexChen
c9eb2f3e38 block/vvfat: Fix bad printf format specifiers
We should use printf format specifier "%u" instead of "%d" for
argument of type "unsigned int".
In addition, fix two error format problems found by checkpatch.pl:
ERROR: space required after that ',' (ctx:VxV)
+        fprintf(stderr,"%s attributes=0x%02x begin=%u size=%d\n",
                       ^
ERROR: line over 90 characters
+        fprintf(stderr, "%d, %s (%u, %d)\n", i, commit->path ? commit->path : "(null)", commit->param.rename.cluster, commit->action);

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Message-Id: <5FA12620.6030705@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-11-03 16:24:56 +01:00
Eric Blake
dbc7b01492 nbd: Add 'qemu-nbd -A' to expose allocation depth
Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.

qemu as client is hacked into viewing the key aspects of this new
context by abusing the already-experimental x-dirty-bitmap option to
collapse all depths greater than 2, which results in a tri-state value
visible in the output of 'qemu-img map --output=json' (yes, that means
x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like
renaming it as it would introduce a needless break of back-compat,
even though we make no compat guarantees with x- members):

unallocated (depth 0) => "zero":false, "data":true
local (depth 1)       => "zero":false, "data":false
backing (depth 2+)    => "zero":true,  "data":true

libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-11-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-10-30 15:22:00 -05:00
Eric Blake
a92b1b065e block: Return depth level during bdrv_is_allocated_above
When checking for allocation across a chain, it's already easy to
count the depth within the chain at which the allocation is found.
Instead of throwing that information away, return it to the caller.
Existing callers only cared about allocated/non-allocated, but having
a depth available will be used by NBD in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-9-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-30 15:21:23 -05:00
Greg Kurz
1a6d3bd229 block: End quiescent sections when a BDS is deleted
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled for
ever. This can cause a device to become irresponsive and to
disrupt the guest execution, ie. hang, loop forever or worse.

This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.

It is somehow reproducible on x86 but it requires to generate
articial dataplane start/stop activity with stop/cont QMP
commands. In this case, seabios ends up looping for ever,
waiting for the virtio-scsi device to send a response to
a command it never received.

Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Alberto Garcia
46cd1e8a47 qcow2: Skip copy-on-write when allocating a zero cluster
Since commit c8bb23cbdb when a write
request results in a new allocation QEMU first tries to see if the
rest of the cluster outside the written area contains only zeroes.

In that case, instead of doing a normal copy-on-write operation and
writing explicit zero buffers to disk, the code zeroes the whole
cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.

This improves performance very significantly but it only happens when
we are writing to an area that was completely unallocated before. Zero
clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
are therefore slower to allocate.

This happens because the code uses bdrv_is_allocated_above() rather
bdrv_block_status_above(). The former is not as accurate for this
purpose but it is faster. However in the case of qcow2 the underlying
call does already report zero clusters just fine so there is no reason
why we cannot use that information.

After testing 4KB writes on an image that only contains zero clusters
this patch results in almost five times more IOPS.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Alberto Garcia
d40f4a565a qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status()
If a BlockDriverState supports backing files but has none then any
unallocated area reads back as zeroes.

bdrv_co_block_status() is only reporting this is if want_zero is true,
but this is an inexpensive test and there is no reason not to do it in
all cases.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27 15:26:20 +01:00
Peter Maydell
a95e0396c8 * fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
 * Start preparing for meson 0.56 (myself)
 * Move directory configuration to meson (myself)
 * Start untangling qemu_init (myself)
 * Windows fixes (Sunil)
 * Remove -no-kbm (Thomas)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe
 Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq
 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl
 saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF
 eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv
 t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg==
 =931x
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
* Start preparing for meson 0.56 (myself)
* Move directory configuration to meson (myself)
* Start untangling qemu_init (myself)
* Windows fixes (Sunil)
* Remove -no-kbm (Thomas)

# gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream:
  machine: move SMP initialization from vl.c
  machine: move UP defaults to class_base_init
  machine: remove deprecated -machine enforce-config-section option
  win32: boot broken when bind & data dir are the same
  WHPX: Fix WHPX build break
  configure: move install_blobs from configure to meson
  configure: remove unused variable from config-host.mak
  configure: move directory options from config-host.mak to meson
  configure: allow configuring localedir
  Makefile: separate meson rerun from the rest of the ninja invocation
  Remove deprecated -no-kvm option
  replay: do not build if TCG is not available
  qtest: unbreak non-TCG builds in bios-tables-test
  hw/core/qdev-clock: add a reference on aliased clocks
  do not use colons in test names
  meson: rewrite curses/iconv test
  build: fix macOS --enable-modules build

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-26 15:49:11 +00:00
Vladimir Sementsov-Ogievskiy
7e7e510077 block/io: fix bdrv_is_allocated_above
bdrv_is_allocated_above wrongly handles short backing files: it reports
after-EOF space as UNALLOCATED which is wrong, as on read the data is
generated on the level of short backing file (if all overlays have
unallocated areas at that place).

Reusing bdrv_common_block_status_above fixes the issue and unifies code
path.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com
[Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy
624f27bbe9 block/io: bdrv_common_block_status_above: support bs == base
We are going to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
include_base == false and still bs == base (for ex. from img_rebase()).

So, support this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy
3555a43261 block/io: bdrv_common_block_status_above: support include_base
In order to reuse bdrv_common_block_status_above in
bdrv_is_allocated_above, let's support include_base parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy
67c095c8b8 block/io: fix bdrv_co_block_status_above
bdrv_co_block_status_above has several design problems with handling
short backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequence.

2. With want_zero=false, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.

Fix these things, making logic about short backing files clearer.

With fixed bdrv_block_status_above we also have to improve is_zero in
qcow2 code, otherwise iotest 154 will fail, because with this patch we
stop to merge zeros of different types (produced by fully unallocated
in the whole backing chain regions vs produced by short backing files).

Note also, that this patch leaves for another day the general problem
around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated
vs go-to-backing.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com
[Fix s/comes/come/ as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
d9b495f9c6 block/export: add vhost-user-blk multi-queue support
Allow the number of queues to be configured using --export
vhost-user-blk,num-queues=N. This setting should match the QEMU --device
vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers
its own value if the vhost-user-blk backend offers fewer queues than
QEMU.

The vhost-user-blk-server.c code is already capable of multi-queue. All
virtqueue processing runs in the same AioContext. No new locking is
needed.

Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit.
Note that the feature bit only announces the presence of the num_queues
configuration space field. It does not promise that there is more than 1
virtqueue, so we can set it unconditionally.

I tested multi-queue by running a random read fio test with numjobs=4 on
an -smp 4 guest. After the benchmark finished the guest /proc/interrupts
file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/
directory shows that Linux blk-mq has 4 queues configured.

An automated test is included in the next commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001144604.559733-2-stefanha@redhat.com
[Fixed accidental tab characters as suggested by Markus Armbruster
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
f51d23c80a block/export: add iothread and fixed-iothread options
Make it possible to specify the iothread where the export will run. By
default the block node can be moved to other AioContexts later and the
export will follow. The fixed-iothread option forces strict behavior
that prevents changing AioContext while the export is active. See the
QAPI docs for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200929125516.186715-5-stefanha@redhat.com
[Fix stray '#' character in block-export.json and add missing "(since:
5.2)" as suggested by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
cbc20bfb8f block: move block exports to libblockdev
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd.
They are not used by other programs and are not otherwise needed in
libblock.

Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss.
Since bdrv_close_all() (libblock) calls blk_exp_close_all()
(libblockdev) a stub function is required..

Make qemu-nbd.c use signal handling utility functions instead of
duplicating the code. This helps because os-posix.c is in libblockdev
and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks.
Once we use the signal handling utility functions we also end up
providing the necessary symbol.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20200929125516.186715-4-stefanha@redhat.com
[Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
3a213f83d9 util/vhost-user-server: use static library in meson.build
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build
the static library once and then reuse it throughout QEMU.

Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the
vhost-user tools (vhost-user-gpu, etc) do.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-14-stefanha@redhat.com
[Added CONFIG_LINUX again because libvhost-user doesn't build on macOS.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
80a06cc52b util/vhost-user-server: move header to include/
Headers used by other subsystems are located in include/. Also add the
vhost-user-server and vhost-user-blk-server headers to MAINTAINERS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-13-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
90fc91d50b block/export: convert vhost-user-blk server to block export API
Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
      --blockdev file,node-name=drive0,filename=test.img \
      --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Markus noted that supported address families are not explicit in the
QAPI schema. It is unlikely that support for more address families will
be added since file descriptor passing is required and few address
families support it. If a new address family needs to be added, then the
QAPI 'features' syntax can be used to advertize them.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200924151549.913737-12-stefanha@redhat.com
[Skip test on big-endian host architectures because this device doesn't
support them yet (as already mentioned in a code comment).
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
0534b1b227 block/export: report flush errors
Propagate the flush return value since errors are possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-11-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
7185c85776 util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
The vu_client_trip() coroutine is leaked during AioContext switching. It
is also unsafe to destroy the vu_dev in panic_cb() since its callers
still access it in some cases.

Rework the lifecycle to solve these safety issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-10-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
47ba680466 util/vhost-user-server: drop unused DevicePanicNotifier
The device panic notifier callback is not used. Drop it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-7-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
df6af7ce77 block/export: consolidate request structs into VuBlockReq
Only one struct is needed per request. Drop req_data and the separate
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
once.

This fixes the req_data memory leak in vu_block_virtio_process_req().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200924151549.913737-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Coiby Xu
3578389bcf block/export: vhost-user block device backend server
By making use of libvhost-user, block device drive can be shared to
the connected vhost-user client. Only one client can connect to the
server one time.

Since vhost-user-server needs a block drive to be created first, delay
the creation of this object.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-6-coiby.xu@gmail.com
[Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the
following compiler warning:
../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
and fix "Invalid size %ld ..." ssize_t format string arguments for
32-bit hosts.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Philippe Mathieu-Daudé
f25e7ab2b0 block/nvme: Add driver statistics for access alignment and hw errors
Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406,
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Claudio Fontana
9b1c911654 replay: do not build if TCG is not available
this fixes non-TCG builds broken recently by replay reverse debugging.

Stub the needed functions in stub/, splitting roughly between functions
needed only by system emulation, by system emulation and tools,
and by everyone.  This includes duplicating some code in replay/, and
puts the logic for non-replay related events in the replay/ module (+
the stubs), so this should be revisited in the future.

Surprisingly, only _one_ qtest was affected by this, ide-test.c, which
resulted in a buzz as the bh events were never delivered, and the bh
never executed.

Many other subsystems _should_ have been affected.

This fixes the immediate issue, however a better way to group replay
functionality to TCG-only code could be developed in the long term.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20201013192123.22632-4-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-22 11:53:54 -04:00
Daniel P. Berrangé
e1c4269763 block: deprecate the sheepdog block driver
This thread from a little over a year ago:

  http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

states that sheepdog is no longer actively developed. The only mentioned
users are some companies who are said to have it for legacy reasons with
plans to replace it by Ceph. There is talk about cutting out existing
features to turn it into a simple demo of how to write a distributed
block service. There is no evidence of anyone working on that idea:

  https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018, and before then just some minor
technical debt cleanup.

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry.

Fedora packages for sheepdog failed to build from upstream source
because of the more strict linker that no longer merges duplicate
global symbols. Fedora patches it to add the missing "extern"
annotations and presumably other distros do to, but upstream source
remains broken.

There is only basic compile testing, no functional testing of the
driver.

Since there are no build pre-requisites the sheepdog driver is currently
enabled unconditionally. This would result in configure issuing a
deprecation warning by default for all users. Thus the configure default
is changed to disable it, requiring users to pass --enable-sheepdog to
build the driver.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20201002113243.2347710-3-berrange@redhat.com>
Reviewed-by: Neal Gompa <ngompa13@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-15 16:06:28 +02:00
Elena Afanasova
5b4c95d0a3 block/blkdebug: fix memory leak
Spotted by PVS-Studio

Signed-off-by: Elena Afanasova <eafanasova@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-13 13:33:46 +02:00
Christian Borntraeger
cd466702f0 vmdk: fix maybe uninitialized warnings
Fedora 32 gcc 10 seems to give false positives:

Compiling C object libblock.fa.p/block_vmdk.c.o
../block/vmdk.c: In function ‘vmdk_parse_extents’:
../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  587 |     g_free(extent->l1_table);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:754:17: note: ‘extent’ was declared here
  754 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  620 |     ret = vmdk_init_tables(bs, extent, errp);
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 1178 |             extent->flat_start_offset = flat_offset << 9;
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  581 |     extent->l2_cache =
      |     ~~~~~~~~~~~~~~~~~^
  582 |         g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:872:17: note: ‘extent’ was declared here
  872 |     VmdkExtent *extent;
      |                 ^~~~~~
../block/vmdk.c: In function ‘vmdk_open’:
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  620 |     ret = vmdk_init_tables(bs, extent, errp);
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 |     VmdkExtent *extent;
      |                 ^~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1

fix them by assigning a default value.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-13 13:33:45 +02:00
Vladimir Sementsov-Ogievskiy
99d72dba1c block/nbd: nbd_co_reconnect_loop(): don't connect if drained
In a recent commit 12c75e20a2 we've improved
nbd_co_reconnect_loop() to not make drain wait for additional sleep.
Similarly, we shouldn't try to connect, if previous sleep was
interrupted by drain begin, otherwise drain_begin will have to wait for
the whole connection attempt.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy
46f56631b5 block/nbd: fix reconnect-delay
reconnect-delay has a design flaw: we handle it in the same loop where
we do connection attempt. So, reconnect-delay may be exceeded by
unpredictable time of connection attempt.

Let's instead use separate timer.

How to reproduce the bug:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. On node2 start qemu-io:

./build/qemu-io --image-opts \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15

4. Type 'read 0 512' in qemu-io interface to check that connection
   works

Be careful: you should make steps 5-7 in a short time, less than 15
seconds.

5. Kill nbd server on node1

6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd
client goes to reconnect loop.

7. On node1 run the following command

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

This will make the connect() call of qemu-io at node2 take a long time.

And you'll see that read command in qemu-io will hang for a long time,
more than 15 seconds specified by reconnect-delay parameter. It's the
bug.

8. Don't forget to drop iptables rule on node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Important note: Step [5] is necessary to reproduce _this_ bug. If we
miss step [5], the read command (step 6) will hang for a long time and
this commit doesn't help, because there will be not long connect() to
unreachable host, but long sendmsg() to unreachable host, which should
be fixed by enabling and adjusting keep-alive on the socket, which is a
thing for further patch set.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy
8a509afd72 block/nbd: correctly use qio_channel_detach_aio_context when needed
Don't use nbd_client_detach_aio_context() driver handler where we want
to finalize the connection. We should directly use
qio_channel_detach_aio_context() in such cases. Driver handler may (and
will) contain another things, unrelated to the qio channel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy
8c517de24a block/nbd: fix drain dead-lock because of nbd reconnect-delay
We pause reconnect process during drained section. So, if we have some
requests, waiting for reconnect we should cancel them, otherwise they
deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, kill the nbd server, and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now Qemu is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see
reconnect-delay option above) and than fail). But suddenly, vm may
totally hang in the deadlock. You may need to increase reconnect-delay
period to catch the dead-lock.

VM doesn't respond because drain dead-lock happens in cpu thread with
global mutex taken. That's not good thing by itself and is not fixed
by this commit (true way is using iothreads). Still this commit fixes
drain dead-lock itself.

Note: probably, we can instead continue to reconnect during drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of bs aio context. But expanding drained section
doesn't seem good anyway. So, let's now fix the bug the simplest way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
Peter Maydell
f2687fdb75 * Reverse debugging (Pavel)
* CFLAGS cleanup (Paolo)
 * ASLR fix (Mark)
 * cpus.c refactoring (Claudio)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl98EB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOCsQf9G7EUAK1zcEOx20LtDdXFrk4tjsRp
 S83OGdihWe8SM+XiY9BfqsBbXdByqF+SitePOV3feGK0mOP5vtJIL7/2DLrtFTeF
 wOeARRA9ePVb7hcL5oXAQeE3bXrX8wq8Qtw9xAoHdw5JAEVmKIEJS6AL5Eu3M2Fh
 pvdBoV84pOm2/ARS3eRstRyW8gCC8rdLDlNsVDtCbYdNVq+VdkzR0l5Phc8JDx1M
 Qjdl1KpN6ZkuN8M6tnaQNTb9IUVu5c1tu5jdR6JdLUqAWp1wYZJ6r2jSatZWfLR3
 H+gzFsDoLPfCjZ3IhfZyvzF5leSZmdbFfzI0tHS1UJ/ZZYjutDvlPlbyYA==
 =Jys5
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* Reverse debugging (Pavel)
* CFLAGS cleanup (Paolo)
* ASLR fix (Mark)
* cpus.c refactoring (Claudio)

# gpg: Signature made Tue 06 Oct 2020 07:35:09 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (37 commits)
  tests/acceptance: add reverse debugging test
  replay: create temporary snapshot at debugger connection
  replay: describe reverse debugging in docs/replay.txt
  gdbstub: add reverse continue support in replay mode
  gdbstub: add reverse step support in replay mode
  replay: flush rr queue before loading the vmstate
  replay: implement replay-seek command
  replay: introduce breakpoint at the specified step
  replay: introduce info hmp/qmp command
  qapi: introduce replay.json for record/replay-related stuff
  migration: introduce icount field for snapshots
  qcow2: introduce icount field for snapshots
  replay: provide an accessor for rr filename
  replay: don't record interrupt poll
  configure: don't enable ASLR for --enable-debug Windows builds
  configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson
  configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS
  dtc: Convert Makefile bits to meson bits
  slirp: Convert Makefile bits to meson bits
  accel/tcg: use current_machine as it is always set for softmmu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-06 15:04:10 +01:00
Pavel Dovgalyuk
b39847a505 migration: introduce icount field for snapshots
Saving icount as a parameters of the snapshot allows navigation between
them in the execution replay scenario.
This information can be used for finding a specific snapshot for proceeding
the recorded execution to the specific moment of the time.
E.g., 'reverse step' action (introduced in one of the following patches)
needs to load the nearest snapshot which is prior to the current moment
of time.
This patch also updates snapshot test which verifies qemu monitor output.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>

--

v4 changes:
 - squashed format update with test output update
v7 changes:
 - introduced the spaces between the fields in snapshot info output
 - updated the test to match new field widths
Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk
bbacffc5f7 qcow2: introduce icount field for snapshots
This patch introduces the icount field for saving within the snapshot.
It is required for navigation between the snapshots in record/replay mode.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>

--

v7 changes:
 - also fix the test which checks qcow2 snapshot extra data
Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-06 08:34:49 +02:00
Vladimir Sementsov-Ogievskiy
b33b354f3a block/io: refactor save/load vmstate
Like for read/write in a previous commit, drop extra indirection layer,
generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy
fae2681add block: drop bdrv_prwv
Now that we are not maintaining boilerplate code for coroutine
wrappers, there is no more sense in keeping the extra indirection layer
of bdrv_prwv().  Let's drop it and instead generate pure bdrv_preadv()
and bdrv_pwritev().

Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on
success, auto generated functions will instead return zero, as their
_co_ prototype. Still, it's simple to make the conversion safe: the
only external user of bdrv_pwritev() is test-bdrv-drain, and it is
comfortable enough with bdrv_co_pwritev() instead. So prototypes are
moved to local block/coroutines.h. Next, the only internal use is
bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on
success.

Of course, it would be great to convert bdrv_pread() and bdrv_pwrite()
to return 0 on success. But this requires audit (and probably
conversion) of all their users, let's leave it for another day
refactoring.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy
9bb4b066cc block: generate coroutine-wrapper code
Use code generation implemented in previous commit to generated
coroutine wrappers in block.c and block/io.c

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy
aaaa20b69b scripts: add block-coroutine-wrapper.py
We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack parameters
  - create _entry function to call original function taking parameters
    from struct
  - do different magic to handle completion: set ret to NOT_DONE or
    EINPROGRESS or use separate bool field
  - fill the struct and create coroutine from _entry function with this
    struct as a parameter
  - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

    1. define the coroutine function somewhere

        int coroutine_fn bdrv_co_NAME(...) {...}

    2. declare in some header file

        int generated_co_wrapper bdrv_NAME(...);

       with same list of parameters (generated_co_wrapper is
       defined in "include/block/block.h").

    3. Make sure the block_gen_c declaration in block/meson.build
       mentions the file with your marker function.

Still, no function is now marked, this work is for the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com>
[Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed
typo and grammar issues pointed out by Eric Blake. Removed clang-format
dependency that caused build test issues.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-05 10:59:06 +01:00
Vladimir Sementsov-Ogievskiy
21c2283ebc block: declare some coroutine functions in block/coroutines.h
We are going to keep coroutine-wrappers code (structure-packing
parameters, BDRV_POLL wrapper functions) in separate auto-generated
files. So, we'll need a header with declaration of original _co_
functions, for those which are static now. As well, we'll need
declarations for wrapper functions. Do these declarations now, as a
preparation step.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>
2020-10-05 09:35:52 +01:00
Vladimir Sementsov-Ogievskiy
f9e694cb32 block/io: refactor coroutine wrappers
Most of our coroutine wrappers already follow this convention:

We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as
the core function, and a wrapper 'bdrv_<something>(<same argument
list>)' which does parameter packing and calls bdrv_run_co().

The only outsiders are the bdrv_prwv_co and
bdrv_common_block_status_above wrappers. Let's refactor them to behave
as the others, it simplifies further conversion of coroutine wrappers.

This patch adds an indirection layer, but it will be compensated by
a further commit, which will drop bdrv_co_prwv together with the
is_write logic, to keep the read and write paths separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
eefffb0244 block/nvme: Replace magic value by SCALE_MS definition
Use self-explicit SCALE_MS definition instead of magic value
(missed in similar commit e4f310fe7f).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-7-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
fad1eb6886 block/nvme: Use register definitions from 'block/nvme.h'
Use the NVMe register definitions from "block/nvme.h" which
ease a bit reviewing the code while matching the datasheet.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-6-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
9406e0d97e block/nvme: Drop NVMeRegs structure, directly use NvmeBar
NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar
directly.

This triggers a checkpatch.pl error:

  ERROR: Use of volatile is usually wrong, please add a comment
  #30: FILE: block/nvme.c:691:
  +    volatile NvmeBar *regs;

This is a false positive as in our case we are using I/O registers,
so the 'volatile' use is justified.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-5-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
37d7a45abd block/nvme: Reduce I/O registers scope
We only access the I/O register in nvme_init().
Remove the reference in BDRVNVMeState and reduce its scope.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-4-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
f68453237b block/nvme: Map doorbells pages write-only
Per the datasheet sections 3.1.13/3.1.14:
  "The host should not read the doorbell registers."

As we don't need read access, map the doorbells with write-only
permission. We keep a reference to this mapped address in the
BDRVNVMeState structure.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-3-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé
b02c01a513 util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()
Pages are currently mapped READ/WRITE. To be able to use different
protections, add a new argument to qemu_vfio_pci_map_bar().

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200922083821.578519-2-philmd@redhat.com>
2020-10-05 09:35:52 +01:00
Alberto Garcia
c508c73dca qcow2: Use L1E_SIZE in qcow2_write_l1_entry()
We overlooked these in 02b1ecfa10

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200928162333.14998-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
30dbc81d31 block/export: Move writable to BlockExportOptions
The 'writable' option is a basic option that will probably be applicable
to most if not all export types that we will implement. Move it from NBD
to the generic BlockExport layer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-26-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
8cade320c8 block/export: Add query-block-exports
This adds a simple QMP command to query the list of block exports.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-25-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
331170e073 block/export: Create BlockBackend in blk_exp_add()
Every export type will need a BlockBackend, so creating it centrally in
blk_exp_add() instead of the .create driver callback avoids duplication.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-24-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
37a4f70cea block/export: Move blk to BlockExport
Every block export has a BlockBackend representing the disk that is
exported. It should live in BlockExport therefore.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-23-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
1a9f7a804f block/export: Add BLOCK_EXPORT_DELETED event
Clients may want to know when an export has finally disappeard
(block-export-del returns earlier than that in the general case), so add
a QAPI event for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200924152717.287415-22-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
3c3bc462ad block/export: Add block-export-del
Implement a new QMP command block-export-del and make nbd-server-remove
a wrapper around it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-21-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
3859ad36f0 block/export: Move strong user reference to block_exports
The reference owned by the user/monitor that is created when adding the
export and dropped when removing it was tied to the 'exports' list in
nbd/server.c. Every block export will have a user reference, so move it
to the block export level and tie it to the 'block_exports' list in
block/export/export.c instead. This is necessary for introducing a QMP
command for removing exports.

Note that exports are present in block_exports even after the user has
requested shutdown. This is different from NBD's exports where exports
are immediately removed on a shutdown request, even if they are still in
the process of shutting down. In order to avoid that the user still
interacts with an export that is shutting down (and possibly removes it
a second time), we need to remember if the user actually still owns it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-20-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
d53be9ce55 block/export: Add 'id' option to block-export-add
We'll need an id to identify block exports in monitor commands. This
adds one.

Note that this is different from the 'name' option in the NBD server,
which is the externally visible export name. While block export ids need
to be unique in the whole process, export names must be unique only for
the same server. Different export types or (potentially in the future)
multiple NBD servers can have the same export name externally, but still
need different block export ids internally.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-19-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
bc4ee65b8c block/export: Add blk_exp_close_all(_type)
This adds a function to shut down all block exports, and another one to
shut down the block exports of a single type. The latter is used for now
when stopping the NBD server. As soon as we implement support for
multiple NBD servers, we'll need a per-server list of exports and it
will be replaced by a function using that.

As a side effect, the BlockExport layer has a list tracking all existing
exports now. closed_exports loses its only user and can go away.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-18-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
a6ff798966 block/export: Allocate BlockExport in blk_exp_add()
Instead of letting the driver allocate and return the BlockExport
object, allocate it already in blk_exp_add() and pass it. This allows us
to initialise the generic part before calling into the driver so that
the driver can just use these values instead of having to parse the
options a second time.

For symmetry, move freeing the BlockExport to blk_exp_unref().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-17-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
b6076afcab block/export: Add node-name to BlockExportOptions
Every block export needs a block node to export, so add a 'node-name'
option to BlockExportOptions and remove the replaced option 'device'
from BlockExportOptionsNbd.

To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs
to be wrapped by a new type NbdServerAddOptions that adds 'device' back
because nbd-server-add doesn't use the BlockExportOptions base type at
all (so even without changing it to a 'node-name' option in
block-export-add, this compatibility code would be necessary).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-16-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
8612c68673 block/export: Move AioContext from NBDExport to BlockExport
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-15-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
c69de1bef5 block/export: Move refcount from NBDExport to BlockExport
Having a refcount makes sense for all types of block exports. It is also
a prerequisite for keeping a list of all exports at the BlockExport
level.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-14-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
1c8222b014 nbd: Add max-connections to nbd-server-start
This is a QMP equivalent of qemu-nbd's --shared option, limiting the
maximum number of clients that can attach at the same time.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-9-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
9b562c646b block/export: Remove magic from block-export-add
nbd-server-add tries to be convenient and adds two questionable
features that we don't want to share in block-export-add, even for NBD
exports:

1. When requesting a writable export of a read-only device, the export
   is silently downgraded to read-only. This should be an error in the
   context of block-export-add.

2. When using a BlockBackend name, unplugging the device from the guest
   will automatically stop the NBD server, too. This may sometimes be
   what you want, but it could also be very surprising. Let's keep
   things explicit with block-export-add. If the user wants to stop the
   export, they should tell us so.

Move these things into the nbd-server-add QMP command handler so that
they apply only there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-8-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
56ee86261e block/export: Add BlockExport infrastructure and block-export-add
We want to have a common set of commands for all types of block exports.
Currently, this is only NBD, but we're going to add more types.

This patch adds the basic BlockExport and BlockExportDriver structs and
a QMP command block-export-add that creates a new export based on the
given BlockExportOptions.

qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200924152717.287415-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
143ea7670c qapi: Rename BlockExport to BlockExportOptions
The name BlockExport will be used for the struct containing the runtime
state of block exports, so change the name of export creation options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-4-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Kevin Wolf
5daa6bfd8e qapi: Create block-export module
Move all block export related types and commands from block-core to the
new QAPI module block-export.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200924152717.287415-3-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé
74f2e02766 block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition
Use self-explicit NANOSECONDS_PER_SECOND definition instead
of magic value.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200921110145.520944-1-philmd@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé
f68c01470b qapi: Restrict query-uuid command to machine code
Only qemu-system-FOO and qemu-storage-daemon provide QMP
monitors, therefore such declarations and definitions are
irrelevant for user-mode emulation.

Restricting the query-uuid command to machine.json pulls less
QAPI-generated code into user-mode.

Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200913195348.1064154-6-philmd@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-09-29 15:41:35 +02:00
Stefan Hajnoczi
d73415a315 qemu/atomic.h: rename atomic_ to qatomic_
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

  $ CC=clang CXX=clang++ ./configure ... && make
  ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.

This patch was generated using:

  $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
    sort -u >/tmp/changed_identifiers
  $ for identifier in $(</tmp/changed_identifiers); do
        sed -i "s%\<$identifier\>%q$identifier%g" \
            $(git grep -I -l "\<$identifier\>")
    done

I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-09-23 16:07:44 +01:00
Daniel P. Berrangé
b18a24a9f8 block/file: switch to use qemu_open/qemu_create for improved errors
Currently at startup if using cache=none on a filesystem lacking
O_DIRECT such as tmpfs, at startup QEMU prints

qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT
qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument

while at QMP level the hint is missing, so QEMU reports just

  "error": {
      "class": "GenericError",
      "desc": "Could not open '/tmp/foo.img': Invalid argument"
  }

which is close to useless for the end user trying to figure out what
they did wrong.

With this change at startup QEMU prints

qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT

while at the QMP level QEMU reports a massively more informative

  "error": {
     "class": "GenericError",
     "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT"
  }

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-16 10:33:48 +01:00
Daniel P. Berrangé
448058aa99 util: rename qemu_open() to qemu_open_old()
We want to introduce a new version of qemu_open() that uses an Error
object for reporting problems and make this it the preferred interface.
Rename the existing method to release the namespace for the new impl.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-09-16 10:33:48 +01:00
Stefano Garzarella
7bae7c805d block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[]
Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
introduced namespace support for RBD, but we forgot to add the
new 'namespace' options to qemu_rbd_strong_runtime_opts[].

The 'namespace' is used to identify the image, so it is a strong
option since it can changes the data of a BDS.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Cc: Florian Florensa <fflorensa@online.net>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200914190553.74871-1-sgarzare@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:31:10 +02:00
Alberto Garcia
bfd0989acf qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset()
qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers need to know where the cluster starts
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

See 388e581615 for a similar change to qcow2_get_cluster_offset().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:31:10 +02:00
Alberto Garcia
8e958260c5 qcow2: Make preallocate_co() resize the image to the correct size
This function preallocates metadata structures and then extends the
image to its new size, but that new size calculation is wrong because
it doesn't take into account that the host_offset variable is always
cluster-aligned.

This problem can be reproduced with preallocation=metadata when the
original size is not cluster-aligned but the new size is. In this case
the final image size will be shorter than expected.

   qemu-img create -f qcow2 img.qcow2 31k
   qemu-img resize --preallocation=metadata img.qcow2 128k

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[mreitz: Mark compat=0.10 unsupported for iotest 125]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:30:36 +02:00
John Snow
c1dadda02c block/qcow: remove runtime opts
Introduced by d85f4222b4,
These were seemingly never used at all.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20200806211345.2925343-3-jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
John Snow
30b70f070f block/rbd: remove runtime_opts
This saw its last use in 4bfb274165.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20200806211345.2925343-2-jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
580384d637 qcow2: Return the original error code in qcow2_co_pwrite_zeroes()
This function checks the current status of a (sub)cluster in order to
see if an unaligned 'write zeroes' request can be done efficiently by
simply updating the L2 metadata and without having to write actual
zeroes to disk.

If the situation does not allow using the fast path then the function
returns -ENOTSUP and the caller falls back to writing zeroes.

If can happen however that the aforementioned check returns an actual
error code so in this case we should pass it to the caller.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200909123739.719-1-berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
3fec237fca qcow2: Make qcow2_free_any_clusters() free only one cluster
This function takes an L2 entry and a number of clusters to free.
Although in principle it can free any type of cluster (using the L2
entry to determine its type) in practice the API is broken because
compressed clusters have a variable size and there is no way to free
more than one without having the L2 entry of each one of them.

The good news all callers are passing nb_clusters=1 so we can simply
get rid of that parameter.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
1a52b73dba qcow2: Handle QCowL2Meta on error in preallocate_co()
If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail
then this function simply returns the error code, potentially leaking
the QCowL2Meta structure and leaving stale items in s->cluster_allocs.

A second problem is that this function calls qcow2_free_any_clusters()
on failure but passing a host cluster offset instead of an L2 entry.
Luckily for normal uncompressed clusters a raw offset also works like
a valid L2 entry so it works just the same, but we should be using
qcow2_free_clusters() instead.

This patch fixes both problems by using qcow2_handle_l2meta().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Swapnil Ingle
83a6a90009 block/vhdx: Support vhdx image only with 512 bytes logical sector size
block/vhdx uses qemu block layer where sector size is always 512 bytes.
This may have issues  with 4K logical sector sized vhdx image.

For e.g qemu-img convert on such images fails with following assert:

$qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw
qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <=
qiov->size' failed.
Aborted

This patch adds an check to return ENOTSUP for vhdx images which
have logical sector size other than 512 bytes.

Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
2b60c5b996 qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset()
The current text corresponds to an earlier, simpler version of this
function and it does not explain how it works now.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
f7bd5bba1b qcow2: Don't check nb_clusters when removing l2meta from the list
In the past, when a new cluster was allocated the l2meta structure was
a variable in the stack so it was necessary to have a way to tell
whether it had been initialized and contained valid data or not. The
nb_clusters field was used for this purpose. Since commit f50f88b9fe
this is no longer the case, l2meta (nowadays a pointer to a list) is
only allocated when needed and nb_clusters is guaranteed to be > 0 so
this check is unnecessary.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
184581fa4d qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs
When a write request needs to allocate new clusters (or change the L2
bitmap of existing ones) a QCowL2Meta structure is created so the L2
metadata can be later updated and any copy-on-write can be performed
if necessary.

A write request can span a region consisting of an arbitrary
combination of previously unallocated and allocated clusters, and if
the unallocated ones can be put contiguous to the existing ones then
QEMU will do so in order to minimize the number of write operations.

In practice this means that a write request has not just one but a
number of QCowL2Meta structures. All of them are added to the
cluster_allocs list that is stored in BDRVQcow2State and is used to
detect overlapping requests. After the write request finishes all its
associated QCowL2Meta are removed from that list. calculate_l2_meta()
takes care of creating and putting those structures in the list, and
qcow2_handle_l2meta() takes care of removing them.

The problem is that the error path in handle_alloc() also tries to
remove an item in that list, a remnant from the time when this was
handled there (that code would not even be correct anymore because
it only removes one struct and not all the ones from the same write
request).

This can trigger a double removal of the same item from the list,
causing a crash. This is not easy to reproduce in practice because
it requires that do_alloc_cluster_offset() fails after a successful
previous allocation during the same write request, but it can be
reproduced with the included test case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:13 +02:00
Alberto Garcia
02b1ecfa10 qcow2: Use macros for the L1, refcount and bitmap table entry sizes
This patch replaces instances of sizeof(uint64_t) in the qcow2 driver
with macros that indicate what those sizes are actually referring to.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20200828110828.13833-1-berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Lukas Straub
5eb9a3c7b0 block/quorum.c: stable children names
If we remove the child with the highest index from the quorum,
decrement s->next_child_index. This way we get stable children
names as long as we only remove the last child.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Fixes: https://bugs.launchpad.net/bugs/1881231
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 11:05:12 +02:00
Peter Maydell
2499453eb1 Block layer patches:
- qemu-img create: Fail gracefully when backing file is an empty string
 - Fixes related to filter block nodes ("Deal with filters" series)
 - block/nvme: Various cleanups required to use multiple queues
 - block/nvme: Use NvmeBar structure from "block/nvme.h"
 - file-win32: Fix "locking" option
 - iotests: Allow running from different directory
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl
 ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W
 XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D
 ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL
 ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc
 CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN
 LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT
 YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP
 j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/
 Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf
 j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl
 +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY
 yzPmyHaVK84=
 =gaaW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory

# gpg: Signature made Thu 10 Sep 2020 10:11:19 BST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (65 commits)
  block/qcow2-cluster: Add missing "fallthrough" annotation
  block/nvme: Pair doorbell registers
  block/nvme: Use generic NvmeBar structure
  block/nvme: Group controller registers in NVMeRegs structure
  file-win32: Fix "locking" option
  iotests: Allow running from different directory
  iotests: Test committing to overridden backing
  iotests: Add test for commit in sub directory
  iotests: Add filter mirror test cases
  iotests: Add filter commit test cases
  iotests: Let complete_and_wait() work with commit
  iotests: Test that qcow2's data-file is flushed
  block: Leave BDS.backing_{file,format} constant
  block: Inline bdrv_co_block_status_from_*()
  blockdev: Fix active commit choice
  block: Drop backing_bs()
  qemu-img: Use child access functions
  nbd: Use CAF when looking for dirty bitmap
  commit: Deal with filters
  backup: Deal with filters
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-11 14:47:49 +01:00
Thomas Huth
b9be6faed1 block/qcow2-cluster: Add missing "fallthrough" annotation
When compiling with -Werror=implicit-fallthrough, the compiler currently
complains:

../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’:
../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall
 through [-Werror=implicit-fallthrough=]
         if (l2_entry & QCOW_OFLAG_COPIED) {
            ^
../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here
     case QCOW2_CLUSTER_UNALLOCATED:
     ^~~~

It's quite obvious that the fallthrough is intended here, so let's add
a comment to silence the compiler warning.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20200908070028.193298-1-thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
e5ff22ba9f block/nvme: Pair doorbell registers
For each queue doorbell registers are paired as:
- Submission Queue Tail Doorbell
- Completion Queue Head Doorbell

Reflect that in the NVMeRegs structure, and adapt
nvme_create_queue_pair() accordingly.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-4-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
c7100f0a0b block/nvme: Use generic NvmeBar structure
Commit f3c507adcd ("NVMe: Initial commit for new storage interface")
introduced the NvmeBar structure. Unfortunately in commit bdd6a90a9e
("block: Add VFIO based NVMe driver") we duplicated it.

Apparently in commit a3d9a352d4 ("block: Move NVMe constants to
a separate header") we tried to unify headers but forgot to remove
the structure declared in the block/nvme.c source file.

Do it now, and remove the structure size check which is redundant
with the header check added in commit 74e18435c0 ("hw/block/nvme:
Align I/O BAR to 4 KiB").

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-3-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé
0ea32f34ce block/nvme: Group controller registers in NVMeRegs structure
We want to use the NvmeBar structure from "block/nvme.h" in the
next commit. As a preliminary step, group all the NVMe controller
registers in the 'ctrl' field, keeping the doorbells registers
out of it.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200904124130.583838-2-philmd@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Fam Zheng <fam@euphon.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Kevin Wolf
3b079ac0ff file-win32: Fix "locking" option
The intended behaviour was that locking=off/auto work and have no
effect (to remain compatible with file-posix), whereas locking=on would
return an error. Unfortunately, the code forgot to remove "locking" from
the options QDict, so any attempt to use the option would fail.

Replace the option parsing code for "locking" with something that is
part of the raw_runtime_opts QemuOptsList (so it is properly removed
from the QDict) and looks more like file-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200907092739.9988-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-10 11:11:12 +02:00
Markus Armbruster
b15e402fc8 trace-events: Fix attribution of trace points to source
Some trace points are attributed to the wrong source file.  Happens
when we neglect to update trace-events for code motion, or add events
in the wrong place, or misspell the file name.

Clean up with help of scripts/cleanup-trace-events.pl.  Funnies
requiring manual post-processing:

* accel/tcg/cputlb.c trace points are in trace-events.

* block.c and blockdev.c trace points are in block/trace-events.

* hw/block/nvme.c uses the preprocessor to hide its trace point use
  from cleanup-trace-events.pl.

* hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to
  guard debug code.

* include/hw/xen/xen_common.h trace points are in hw/xen/trace-events.

* linux-user/trace-events abbreviates a tedious list of filenames to
  */signal.c.

* net/colo-compare and net/filter-rewriter.c use pseudo trace points
  colo_compare_miscompare and colo_filter_rewriter_debug to guard
  debug code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200806141334.3646302-5-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:58 +01:00
Markus Armbruster
6ec9379870 trace-events: Delete unused trace points
Tracked down with the help of scripts/cleanup-trace-events.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20200806141334.3646302-4-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-09-09 17:17:02 +01:00
Max Reitz
0b877d09df block: Leave BDS.backing_{file,format} constant
Parts of the block layer treat BDS.backing_file as if it were whatever
the image header says (i.e., if it is a relative path, it is relative to
the overlay), other parts treat it like a cache for
bs->backing->bs->filename (relative paths are relative to the CWD).
Considering bs->backing->bs->filename exists, let us make it mean the
former.

Among other things, this now allows the user to specify a base when
using qemu-img to commit an image file in a directory that is not the
CWD (assuming, everything uses relative filenames).

Before this patch:

$ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
$ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
$ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'

After this patch:

$ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
Image committed.
$ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
$ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
Image committed.

With this change, bdrv_find_backing_image() must look at whether the
user has overridden a BDS's backing file.  If so, it can no longer use
bs->backing_file, but must instead compare the given filename against
the backing node's filename directly.

Note that this changes the QAPI output for a node's backing_file.  We
had very inconsistent output there (sometimes what the image header
said, sometimes the actual filename of the backing image).  This
inconsistent output was effectively useless, so we have to decide one
way or the other.  Considering that bs->backing_file usually at runtime
contained the path to the image relative to qemu's CWD (or absolute),
this patch changes QAPI's backing_file to always report the
bs->backing->bs->filename from now on.  If you want to receive the image
header information, you have to refer to full-backing-filename.

This necessitates a change to iotest 228.  The interesting information
it really wanted is the image header, and it can get that now, but it
has to use full-backing-filename instead of backing_file.  Because of
this patch's changes to bs->backing_file's behavior, we also need some
reference output changes.

Along with the changes to bs->backing_file, stop updating
BDS.backing_format in bdrv_backing_attach() as well.  This way,
ImageInfo's backing-filename and backing-filename-format fields will
represent what the image header says and nothing else.

iotest 245 changes in behavior: With the backing node no longer
overriding the parent node's backing_file string, you can now omit the
@backing option when reopening a node with neither a default nor a
current backing file even if it used to have a backing node at some
point.

273 also changes: The base image is opened without a format layer, so
ImageInfo.backing-filename-format used to report "file" for the base
image's overlay after blockdev-snapshot.  However, the image header
never says "file" anywhere, so it now reports $IMGFMT.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
549ec0d978 block: Inline bdrv_co_block_status_from_*()
With bdrv_filter_bs(), we can easily handle this default filter behavior
in bdrv_co_block_status().

blkdebug wants to have an additional assertion, so it keeps its own
implementation, except bdrv_co_block_status_from_file() needs to be
inlined there.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
9a71b9de3f commit: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission if the base is smaller than the top).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
2b088c60bb backup: Deal with filters
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
3f072a7fb7 mirror: Deal with filters
This includes some permission limiting (for example, we only need to
take the RESIZE permission for active commits where the base is smaller
than the top).

base_overlay is introduced so we can query bdrv_is_allocated_above() on
it - we cannot do that with base itself, because a filter's block_status
is the same as its child node, so if there are filters on base,
bdrv_is_allocated_above() on base would return information including
base.

Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to
"target_backing_bs", because that is what it really refers to.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c6f6d8462c block-copy: Use CAF to find sync=top base
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
0a7585dbba block: Use child access functions for QAPI queries
query-block, query-named-block-nodes, and query-blockstats now return
any filtered child under "backing", not just bs->backing or COW
children.  This is so that filters do not interrupt the reported backing
chain.  This changes the output for iotest 184, as the throttled node
now appears as a backing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
3f26191c73 block: Report data child for query-blockstats
It makes no sense to report the block stats of a purely metadata-storing
child in query-blockstats.  So if the primary child does not have any
data, try to find a unique data-storing child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
07cd7b659a block/null: Implement bdrv_get_allocated_file_size
It is trivial, so we might as well do it.

Remove _filter_actual_image_size from iotest 184, so we get to see the
result in its reference output.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c8af87573f block/snapshot: Fix fallback
If the top node's driver does not provide snapshot functionality and we
want to fall back to a node down the chain, we need to snapshot all
non-COW children.  For simplicity's sake, just do not fall back if there
is more than one such child.  Furthermore, we really only can fall back
to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify
the child link (notably, set it to NULL).

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
c4db2e25df block: Use CAF in bdrv_co_rw_vmstate()
If a node whose driver does not provide VM state functions has a
metadata child, the VM state should probably go there; if it is a
filter, the VM state should probably go there.  It follows that we
should generally go down to the primary child.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
66b129ac5e block: Iterate over children in refresh_limits
Instead of looking at just bs->file and bs->backing, we should look at
all children that could end up receiving forwarded requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
fb787f02a6 vmdk: Drop vmdk_co_flush()
Before HEAD^, we needed this because bdrv_co_flush() by itself would
only flush bs->file.  With HEAD^, bdrv_co_flush() will flush all
children on which a WRITE or WRITE_UNCHANGED permission has been taken.
Thus, vmdk no longer needs to do it itself.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
883833e29c block: Flush all children in generic code
If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
itself has to flush the children of the given node, it should not flush
just bs->file->bs, but in fact all children that might have been written
to (judging from the permissions taken on them).

This is a bug fix for qcow2 images with an external data file, as they
so far did not flush that data_file node.

In any case, the BLKDBG_EVENT() should be emitted on the primary child,
because that is where a blkdebug node would be if there is any.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
23b93525a2 block: Use bdrv_cow_child() in bdrv_co_truncate()
The condition modified here is not about potentially filtered children,
but only about COW sources (i.e. traditional backing files).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
67acfd2188 stream: Deal with filters
Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: The bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
cb8503159a block: Use CAFs in block status functions
Use the child access functions in the block status inquiry functions as
appropriate.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
93393e698c block: Use bdrv_filter_(bs|child) where obvious
Places that use patterns like

    if (bs->drv->is_filter && bs->file) {
        ... something about bs->file->bs ...
    }

should be

    BlockDriverState *filtered = bdrv_filter_bs(bs);
    if (filtered) {
        ... something about @filtered ...
    }

instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00
Max Reitz
4935e8be22 copy-on-read: Support compressed writes
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07 12:31:31 +02:00