Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com>
[mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output]
Signed-off-by: Max Reitz <mreitz@redhat.com>
The various schemas included in QEMU use a JSON-based format which
is, however, strictly speaking not valid JSON.
As a consequence, when vim tries to apply syntax highlight rules
for JSON (as guessed from the file name), the result is an unreadable
mess which mostly consist of red markers pointing out supposed errors
in, well, pretty much everything.
Using Python syntax highlighting produces much better results, and
in fact these files already start with specially-formatted comments
that instruct Emacs to process them as if they were Python files.
This commit adds the equivalent special comments for vim.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Message-Id: <20200729185024.121766-1-abologna@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200730091656.2633334-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[One more line de-indented]
The vxhs code doesn't compile since v2.12.0. There's no point in fixing
and then adding CI for a config that our users have demonstrated that
they do not use; better to just remove it.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200711065926.2204721-1-marcandre.lureau@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Especially when O_DIRECT is used with image files so that the page cache
indirection can't cause a merge of allocating requests, the file will
fragment on the file system layer, with a potentially very small
fragment size (this depends on the requests the guest sent).
On Linux, fragmentation can be reduced by setting an extent size hint
when creating the file (at least on XFS, it can't be set any more after
the first extent has been allocated), basically giving raw files a
"cluster size" for allocation.
This adds a create option to set the extent size hint, and changes the
default from not setting a hint to setting it to 1 MB. The main reason
why qcow2 defaults to smaller cluster sizes is that COW becomes more
expensive, which is not an issue with raw files, so we can choose a
larger size. The tradeoff here is only potentially wasted disk space.
For qcow2 (or other image formats) over file-posix, the advantage should
even be greater because they grow sequentially without leaving holes, so
there won't be wasted space. Setting even larger extent size hints for
such images may make sense. This can be done with the new option, but
let's keep the default conservative for now.
The effect is very visible with a test that intentionally creates a
badly fragmented file with qemu-img bench (the time difference while
creating the file is already remarkable) and then looks at the number of
extents and the time a simple "qemu-img map" takes.
Without an extent size hint:
$ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 25.848 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 19.616 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 2000000 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m1,279s
user 0m0,043s
sys 0m1,226s
With the new default extent size hint of 1 MB:
$ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G
Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 11.833 seconds.
$ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 10.155 seconds.
$ filefrag ~/tmp/test.raw
/home/kwolf/tmp/test.raw: 178 extents found
$ time ./qemu-img map ~/tmp/test.raw
Offset Length Mapped to File
0 0x1e8480000 0 /home/kwolf/tmp/test.raw
real 0m0,061s
user 0m0,040s
sys 0m0,014s
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200707142329.48303-1-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently the implementation only supports amending the encryption
options, unlike the qemu-img version
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-14-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608094030.670121-13-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
blockdev-amend will be used similiar to blockdev-create
to allow on the fly changes of the structure of the format based block devices.
Current plan is to first support encryption keyslot management for luks
based formats (raw and embedded in qcow2)
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data. Report this value as an additional QMP field, present when
measuring an existing image and output format that both support
bitmaps. Update iotest 178 and 190 to updated output, as well as new
coverage in 190 demonstrating non-zero values made possible with the
recently-added qemu-img bitmap command (see 3b51ab4b).
The new 'bitmaps size:' field is displayed automatically as part of
'qemu-img measure' any time it is present in QMP (that is, any time
both the source image being measured and destination format support
bitmaps, even if the measurement is 0 because there are no bitmaps
present). If the field is absent, it means that no bitmaps can be
copied (source, destination, or both lack bitmaps, including when
measuring based on size rather than on a source image). This behavior
is compatible with an upcoming patch adding 'qemu-img convert
--bitmaps': that command will fail in the same situations where this
patch omits the field.
The addition of a new field demonstrates why we should always
zero-initialize qapi C structs; while the qcow2 driver still fully
populates all fields, the raw and crypto drivers had to be tweaked to
avoid uninitialized data.
Consideration was also given towards having a 'qemu-img measure
--bitmaps' which errors out when bitmaps are not possible, and
otherwise sums the bitmaps into the existing allocation totals rather
than displaying as a separate field, as a potential convenience
factor. But this was ultimately decided to be more complexity than
necessary when the QMP interface was sufficient enough with bitmaps
remaining a separate field.
See also: https://bugzilla.redhat.com/1779904
Reported-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200521192137.1120211-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.
The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G
The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.
compress cmd:
time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
src.img [zlib|zstd]_compressed.img
decompress cmd
time ./qemu-img convert -O qcow2
[zlib|zstd]_compressed.img uncompressed.img
compression decompression
zlib zstd zlib zstd
------------------------------------------------------------
real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %)
user 65.0 15.8 5.3 2.5
sys 3.3 0.2 2.0 2.0
Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The patch adds some preparation parts for incompatible compression type
feature to qcow2 allowing the use different compression methods for
image clusters (de)compressing.
It is implied that the compression type is set on the image creation and
can be changed only later by image conversion, thus compression type
defines the only compression algorithm used for the image, and thus,
for all image clusters.
The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.
The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.
Adding of the compression type breaks a number of tests because now the
compression type is reported on image creation and there are some changes
in the qcow2 header in size and offsets.
The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
affected tests: 031, 036, 061, 080
header_size +=8: 1 byte compression type
7 bytes padding
feature_table += 48: incompatible feature compression type
backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206,
242, 255, 274, 280
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add feature 'deprecated' to the deprecated QMP commands, so their
deprecation becomes visible in output of query-qmp-schema. Looks like
this:
{"name": "query-cpus",
"ret-type": "[164]",
"meta-type": "command",
"arg-type": "0",
---> "features": ["deprecated"]}
Management applications could conceivably use this for static
checking.
The deprecated commands are change, cpu-add, migrate-set-cache-size,
migrate_set_downtime, migrate_set_speed, query-cpus, query-events,
query-migrate-cache-size.
The deprecated command arguments are block-commit arguments @base and
@top, and block_set_io_throttle, blockdev-change-medium,
blockdev-close-tray, blockdev-open-tray, eject argument @device.
The deprecated command results are query-cpus-fast result @arch,
query-block result @dirty-bitmaps, query-named-block-nodes result
@encryption_key_missing and result @dirty-bitmaps's member @status.
Same for query-block result @inserted, which mirrors
query-named-block-nodes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200317115459.31821-27-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Anounce that 'blockdev-snapshot' command's permissions allow changing
of the backing file if the 'consistent_read' permission is not required.
This is useful for libvirt to allow late opening of the backing chain
during a blockdev-mirror.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-8-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Starting from ceph Nautilus, RBD has support for namespaces, allowing
for finer grain ACLs on images inside a pool, and tenant isolation.
In the rbd cli tool documentation, the new image-spec and snap-spec are :
- [pool-name/[namespace-name/]]image-name
- [pool-name/[namespace-name/]]image-name@snap-name
When using an non namespace's enabled qemu, it complains about not
finding the image called namespace-name/image-name, thus we only need to
parse the image once again to find if there is a '/' in its name, and if
there is, use what is before it as the name of the namespace to later
pass it to rados_ioctx_set_namespace.
rados_ioctx_set_namespace if called with en empty string or a null
pointer as the namespace parameters pretty much does nothing, as it then
defaults to the default namespace.
The namespace is extracted inside qemu_rbd_parse_filename, stored in the
qdict, and used in qemu_rbd_connect to make it work with both qemu-img,
and qemu itself.
Signed-off-by: Florian Florensa <fflorensa@online.net>
Message-Id: <20200110111513.321728-2-fflorensa@online.net>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a --export option to qemu-storage-daemon to export a block node. For
now, only NBD exports are implemented. Apart from the 'type' option
(which is the implied key), it maps the arguments for nbd-server-add to
the command line. Example:
--export nbd,device=disk,name=test-export,writable=on
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-12-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the arguments of nbd-server-add to a new struct BlockExportNbd and
convert the command to 'boxed': true. This makes it easier to share code
with the storage daemon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-11-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a --nbd-server option to qemu-storage-daemon to start the built-in
NBD server right away. It maps the arguments for nbd-server-start to the
command line, with the exception that it uses SocketAddress instead of
SocketAddressLegacy: New interfaces shouldn't use legacy types, and the
additional nesting would be nasty on the command line.
Example (only with required options):
--nbd-server addr.type=inet,addr.host=localhost,addr.port=10809
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-10-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QMP commands that are related to the system emulator and don't make
sense in the context of tools such as qemu-storage-daemon should live in
qapi/block.json rather than qapi/block-core.json. Move them there.
The associated data types are actually also used in code shared with the
tools, so they stay in block-core.json.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-6-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block-core is for everything that isn't related to the system emulator.
Internal snapshots, the NBD server and quorum events make sense in the
tools, too, so move them to block-core.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200224143008.13362-5-kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When a management application manages node names there's no reason to
recurse into backing images in the output of query-named-block-nodes.
Add a parameter to the command which will return just the top level
structs.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Fixed coding style]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Now that the error handling in the common block job is fixed, we can
expose the on-error option in QMP instead of hard-coding it as 'report'
in qmp_block_commit().
This fulfills the promise that the old comment in that function made,
even if a bit later than expected: "This will be part of the QMP
command, if/when the BlockdevOnError change for blkmirror makes it in".
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-7-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not obvious what 'ignore' actually means for block jobs: It could
be continuing the job and returning success in the end despite the error
(no block job does this). It could also mean continuing and returning
failure in the end (this is what stream does). And it can mean retrying
the failed request later (this is what backup, commit and mirror do).
This (somewhat inconsistent) behaviour was introduced and described for
stream and mirror in commit 32c81a4a6e. backup and commit were
introduced later and use the same model as mirror.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200214200812.28180-2-kwolf@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We would like to switch the doc comments to rST format. rST
insists on a blank line before and after a bulleted list, but our
Texinfo doc generator did not. Add some extra blank lines in the doc
comments so they're acceptable rST input.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200213175647.17628-17-peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
A JSON block comment like this:
Returns: nothing on success
If @node is not a valid block device, DeviceNotFound
If @name is not found, GenericError with an explanation
renders like this:
Returns: nothing on success If node is not a valid block device,
DeviceNotFound If name is not found, GenericError with an explanation
because whitespace is not significant.
Use an actual bulleted list, so that the formatting is correct.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200213175647.17628-14-peter.maydell@linaro.org>
Message-Id: <20200213175647.17628-15-peter.maydell@linaro.org>
Message-Id: <20200213175647.17628-16-peter.maydell@linaro.org>
[Three commits squashed into one]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
There are some stray hardcoded tabs in some of our json files;
remove them.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200213175647.17628-11-peter.maydell@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The current doc generation doesn't care much about indentation levels,
but we would like to switch to an rST format, and rST does care about
indentation.
Make the doc comments more strongly consistent about indentation
for multiline constructs like:
@arg: description line 1
description line 2
Returns: line one
line 2
so that there is always exactly one space after the colon, and
subsequent lines align with the first.
This commit is a purely whitespace change, and it does not alter the
generated .texi files (because the texi generation code strips away
all the extra whitespace). This does mean that we end up with some
over-length lines.
Note that when the documentation for an argument fits on a single
line like this:
@arg: one line only
then stray extra spaces after the ':' don't affect the rST output, so
I have not attempted to methodically fix them, though the preference
is a single space here too.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200213175647.17628-10-peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Some qapi doc comments have forgotten the ':' after the
@argument, like this:
# @filename Filename for the new image file
# @size Size of the virtual disk in bytes
The result is that these are parsed as part of the body
text and appear as a run-on line:
filename Filename for the new image file size Size of the virtual disk in bytes"
followed by
filename: string
Not documented
size: int
Not documented
in the 'Members' section.
Correct the formatting.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200213175647.17628-9-peter.maydell@linaro.org>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The ascii-art graph in the BlockLatencyHistogramInfo documentation
doesn't render correctly, because the whitespace is collapsed.
Use the '|' format that emits a literal 'example' block so the graph
is displayed correctly.
Strictly the Texinfo generated is still wrong because each line
goes into its own @example environment, but it renders better
than what we had before.
Fixing this rendering is a necessary prerequisite for the upcoming rST
generator, which otherwise complains about the inconsistent
indentation in the ascii-art graph.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200213175647.17628-8-peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Since io_uring is the actual name of the Linux API, we use it as enum
value even though the QAPI schema conventions would prefer io-uring.
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200120141858.587874-3-stefanha@redhat.com
Message-Id: <20200120141858.587874-3-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Allow writing all the data compressed through the filter driver.
The written data will be aligned by the cluster size.
Based on the QEMU current implementation, that data can be written to
unallocated clusters only. May be used for a backup job.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1575288906-551879-2-git-send-email-andrey.shinkevich@virtuozzo.com
[mreitz: Replace NULL bdrv_get_format_name() by "(no format)"]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Sometimes it is useful to be able to add a node to the block graph that
takes or unshare a certain set of permissions for debugging purposes.
This patch adds this capability to blkdebug.
(Note that you cannot make blkdebug release or share permissions that it
needs to take or cannot share, because this might result in assertion
failures in the block layer. But if the blkdebug node has no parents,
it will not take any permissions and share everything by default, so you
can then freely choose what permissions to take and share.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191108123455.39445-4-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Mention that this is a PCI device address & give the format it is
expected in. Also mention that it must be first unbound from any
host kernel driver.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We document that for qcow2 persistent bitmaps, the name cannot exceed
1023 bytes. It is inconsistent if transient bitmaps do not have to
abide by the same limit, and it is unlikely that any existing client
even cares about using bitmap names this long. It's time to codify
that ALL bitmaps managed by qemu (whether persistent in qcow2 or not)
have a documented maximum length.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20191114024635.11363-3-eblake@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
This patch adds support for blkreplay driver to the blockdev options.
Now blkreplay can be used with -blockdev command line option
in the following format:
-blockdev driver=blkreplay,image=file-node-name,node-name=replay-node-name
This option makes possible implementation of the better command
line support for record/replay invocations.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This parameter has been deprecated since 2.12.0 and is eligible for
removal. Remove this parameter as it is actually completely ignored;
let's not give false hope.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191002232411.29968-1-jsnow@redhat.com
Drop write notifiers and use filter node instead.
= Changes =
1. Add filter-node-name argument for backup qmp api. We have to do it
in this commit, as 257 needs to be fixed.
2. There are no more write notifiers here, so is_write_notifier
parameter is dropped from block-copy paths.
3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.
4. Block-copy is now using BdrvChildren instead of BlockBackends
5. As backup-top owns these children, we also move block-copy state
into backup-top's ownership.
= Iotest changes =
56: op-blocker doesn't shoot now, as we set it on source, but then
check on filter, when trying to start second backup.
To keep the test we instead can catch another collision: both jobs will
get 'drive0' job-id, as job-id parameter is unspecified. To prevent
interleaving with file-posix locks (as they are dependent on config)
let's use another target for second backup.
Also, it's obvious now that we'd like to drop this op-blocker at all
and add a test-case for two backups from one node (to different
destinations) actually works. But not in these series.
141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk
check inside qmp_blockdev_del. But we've dropped block-copy blk
objects, so no more blk objects on source bs (job blk is on backup-top
filter bs). New message is from op-blocker, which is the next check in
qmp_blockdev_add.
257: The test wants to emulate guest write during backup. They should
go to filter node, not to original source node, of course. Therefore we
need to specify filter node name and use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
A block driver can provide a callback to report driver-specific
statistics.
file-posix driver now reports discard statistics
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make the stat fields definition slightly more readable.
Also reorder total_time_ns stats read-write-flush as done elsewhere.
Cosmetic change only.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190923121737.83281-2-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
When preallocating an encrypted qcow2 image, it just lets the protocol
driver write data and then does not mark the clusters as zero.
Therefore, reading this image will yield effectively random data.
As such, we have not fulfilled the promise of always writing zeroes when
preallocating an image in a while. It seems that nobody has really
cared, so change the documentation to conform to qemu's actual behavior.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190711132935.13070-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Let's add a possibility to query dirty-bitmaps not only on root nodes.
It is useful when dealing both with snapshots and incremental backups.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190717173937.18747-1-jsnow@redhat.com
[Added deprecation information. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
[Fixed spelling --js]
Accept bitmaps and sync policies for the other backup modes.
This allows us to do things like create a bitmap synced to a full backup
without a transaction, or start a resumable backup process.
Some combinations don't make sense, though:
- NEVER policy combined with any non-BITMAP mode doesn't do anything,
because the bitmap isn't used for input or output.
It's harmless, but is almost certainly never what the user wanted.
- sync=NONE is more questionable. It can't use on-success because this
job never completes with success anyway, and the resulting artifact
of 'always' is suspect: because we start with a full bitmap and only
copy out segments that get written to, the final output bitmap will
always be ... a fully set bitmap.
Maybe there's contexts in which bitmaps make sense for sync=none,
but not without more severe changes to the current job, and omitting
it here doesn't prevent us from adding it later.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190716000117.25219-11-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This adds an "always" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, the bitmap is *always* synchronized. This means
that for backups that fail part-way through, the bitmap retains a record of
which sectors need to be copied out to accomplish a new backup using the
old, partial result.
In effect, this allows us to "resume" a failed backup; however the new backup
will be from the new point in time, so it isn't a "resume" as much as it is
an "incremental retry." This can be useful in the case of extremely large
backups that fail considerably through the operation and we'd like to not waste
the work that was already performed.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-13-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This adds a "never" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, we never update the bitmap. This can be used
to perform differential backups, or simply to avoid the job modifying a
bitmap.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-7-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
We don't need or want a new sync mode for simple differences in
semantics. Create a new mode simply named "BITMAP" that is designed to
make use of the new Bitmap Sync Mode field.
Because the only bitmap sync mode is 'on-success', this adds no new
functionality to the backup job (yet). The old incremental backup mode
is maintained as a syntactic sugar for sync=bitmap, mode=on-success.
Add all of the plumbing necessary to support this new instruction.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Depending on what a user is trying to accomplish, there might be a few
bitmap cleanup actions that occur when an operation is finished that
could be useful.
I am proposing three:
- NEVER: The bitmap is never synchronized against what was copied.
- ALWAYS: The bitmap is always synchronized, even on failures.
- ON-SUCCESS: The bitmap is synchronized only on success.
The existing incremental backup modes use 'on-success' semantics,
so add just that one for right now.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190709232550.10724-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
drive-backup and blockdev-backup have an awful lot of things in common
that are the same. Let's fix that.
I don't deduplicate 'target', because the semantics actually did change
between each structure. Leave that one alone so it can be documented
separately.
Where documentation was not identical, use the most up-to-date version.
For "speed", use Blockdev-Backup's version. For "sync", use
Drive-Backup's version.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
[Maintainer edit: modified commit message. --js]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20190709232550.10724-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Reconnect will be implemented in the following commit, so for now,
in semantics below, disconnect itself is a "serious error".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190618114328.55249-5-vsementsov@virtuozzo.com>
[eblake: slipped from 4.1 to 4.2]
Signed-off-by: Eric Blake <eblake@redhat.com>
Commit cd219eb1e5 added the read-zeroes option for the null-co and
null-aio block driver, but forgot to add them to the QAPI schema.
Therefore, this option wasn't available in -blockdev and blockdev-add
until now.
Add the missing option in the schema to make it available there, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Together with @iotypes and @sector, this can be used to trap e.g. the
first read or write access to a certain sector without having to know
what happens internally in the block layer, i.e. which "real" events
happen right before such an access.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new error option allows users of blkdebug to inject errors only on
certain kinds of I/O operations. Users usually want to make a very
specific operation fail, not just any; but right now they simply hope
that the event that triggers the error injection is followed up with
that very operation. That may not be true, however, because the block
layer is changing (including blkdebug, which may increase the number of
types of I/O operations on which to inject errors).
The new option's default has been chosen to keep backwards
compatibility.
Note that similar to the internal representation, we could choose to
expose this option as a list of I/O types. But there is no practical
use for this, because as described above, users usually know exactly
which kind of operation they want to make fail, so there is no need to
specify multiple I/O types at once. In addition, exposing this option
as a list would require non-trivial changes to qemu_opts_absorb_qdict().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190507203508.18026-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
A snapshot is something that reflects the state of something at a
certain point in time. It does not change.
The file our snapshot commands create (or the node they install) is not
a snapshot, as it does change over time. It is an overlay. We cannot
do anything about the parameter names, but we can at least adjust the
descriptions to reflect that fact.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20190603202236.1342-2-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
In commit 23dece19da ('file-posix: Make auto-read-only dynamic') ,
auto-read-only=on changed its behaviour in file-posix for the 4.0
release. This change cannot be detected through the usual mechanisms
like schema introspection. Add a new feature flag to the schema to
allow libvirt to detect the presence of the new behaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20190606153803.5278-7-armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Comment tweaked on Eric Blake's advice]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add default and available values in the documentation block of
each block device or protocol that supports the 'preallocation'
parameter during the image creation.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190524075848.23781-2-sgarzare@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add new optional parameter making possible to merge bitmaps from
different nodes. It is needed to maintain external snapshots during
incremental backup chain history.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190517152111.206494-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
If COW areas of the newly allocated clusters are zeroes on the backing
image, efficient bdrv_write_zeroes(flags=BDRV_REQ_NO_FALLBACK) can be
used on the whole cluster instead of writing explicit zero buffers later
in perform_cow().
iotest 060:
write to the discarded cluster does not trigger COW anymore.
Use a backing image instead.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Message-id: 20190516142749.81019-2-anton.nefedov@virtuozzo.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
There no @device parameter, only the @id one.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Add 'drop-cache=on|off' option to file-posix.c. The default is on.
Disabling the option fixes a QEMU 3.0.0 performance regression when live
migrating on the same host with cache.direct=off.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJciOSEAAoJEJykq7OBq3PIVSUIAI6r2Mgoi+no4nle8Jf2nZ+W
EnQXnNEFyJA0lKRtqQ2UILD9udVdKd/L1PZu5k/Il/Ralto9Yf3+62brekI7rsss
c3Qusu4LUK6jom2RslRjRIaJ9GilQi/jWezKV/O0VlcsMVemgVHX008EIR+ea1U4
H0/u2kfu04PciKQ5MR2+6aacu9bfmyH1yM2no+aMN5dDu/38PV6JEsf0Zl2agowg
opGepJ7YiDQsxH9IBXrbfm38mBrrY0K2vFzAb9BzTHfBPotGMNIZNJNM2FChRfoM
sTjOIpZz3NDwPEUPQPZxp+7YKRFFYfse1oHtpyh4n1rMQksB019SCGlP9TBhrF0=
=CH5G
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Pull request
* Add 'drop-cache=on|off' option to file-posix.c. The default is on.
Disabling the option fixes a QEMU 3.0.0 performance regression when live
migrating on the same host with cache.direct=off.
# gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT
# gpg: using RSA key 9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
file-posix: add drop-cache=on|off option
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit dd577a26ff ("block/file-posix:
implement bdrv_co_invalidate_cache() on Linux") introduced page cache
invalidation so that cache.direct=off live migration is safe on Linux.
The invalidation takes a significant amount of time when the file is
large and present in the page cache. Normally this is not the case for
cross-host live migration but it can happen when migrating between QEMU
processes on the same host.
On same-host migration we don't need to invalidate pages for correctness
anyway, so an option to skip page cache invalidation is useful. I
investigated optimizing invalidation and detecting same-host migration,
but both are hard to achieve so a user-visible option will suffice.
As a bonus this option means that the cache invalidation feature will
now be detectable by libvirt via QMP schema introspection.
Suggested-by: Neil Skrypuch <neil@tembosocial.com>
Tested-by: Neil Skrypuch <neil@tembosocial.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190307164941.3322-1-stefanha@redhat.com
Message-Id: <20190307164941.3322-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This command allows reopening an arbitrary BlockDriverState with a
new set of options. Some options (e.g node-name) cannot be changed
and some block drivers don't allow reopening, but otherwise this
command is modelled after 'blockdev-add' and the state of the reopened
BlockDriverState should generally be the same as if it had just been
added by 'blockdev-add' with the same set of options.
One notable exception is the 'backing' option: 'x-blockdev-reopen'
requires that it is always present unless the BlockDriverState in
question doesn't have a current or default backing file.
This command allows reconfiguring the graph by using the appropriate
options to change the children of a node. At the moment it's possible
to change a backing file by setting the 'backing' option to the name
of the new node, but it should also be possible to add a similar
functionality to other block drivers (e.g. Quorum, blkverify).
Although the API is unlikely to change, this command is marked
experimental for the time being so there's room to see if the
semantics need changes.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Drop x- and x_ prefixes for latency histograms and update version to
4.0
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even though the status field is deprecated, we still have to support
it for a few more releases. Since this is a very new kind of bitmap
state, it makes sense for it to have its own status field.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Add an inconsistent bit to dirty-bitmaps that allows us to report a bitmap as
persistent but potentially inconsistent, i.e. if we find bitmaps on a qcow2
that have been marked as "in use".
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190301191545.8728-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
The current API allows us to report a single status, which we've defined as:
Frozen: has a successor, treated as qmp_locked, may or may not be enabled.
Locked: no successor, qmp_locked. may or may not be enabled.
Disabled: Not frozen or locked, disabled.
Active: Not frozen, locked, or disabled.
The problem is that both "Frozen" and "Locked" mean nearly the same thing,
and that both of them do not intuit whether they are recording guest writes
or not.
This patch deprecates that status field and introduces two orthogonal
properties instead to replace it.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190223000614.13894-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Provide an option to force QEMU to always keep the external data file
consistent as a standalone read-only raw image.
At the moment, this means making sure that write_zeroes requests are
forwarded to the data file instead of just updating the metadata, and
checking that no backing file is used.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Rather than requiring that the external data file node is passed
explicitly when creating the qcow2 node, store the filename in the
designated header extension during .bdrv_create and read it from there
as a default during .bdrv_open.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The meaning of the states has changed subtly over time,
this should bring the understanding more in-line with the
current, actual usages.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190202011048.12343-1-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Since qemu currently doesn't flush persistent bitmaps to disk until
shutdown (which might be MUCH later), it's useful if 'query-block'
at least shows WHICH bitmaps will (eventually) make it to persistent
storage. Update affected iotests.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190204210512.27458-1-eblake@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
In the 'Format specific information' section of the 'qemu-img info'
command output, the supplemental information about existing QCOW2
bitmaps will be shown, such as a bitmap name, flags and granularity:
image: /vz/vmprivate/VM1/harddisk.hdd
file format: qcow2
virtual size: 64G (68719476736 bytes)
disk size: 3.0M
cluster_size: 1048576
Format specific information:
compat: 1.1
lazy refcounts: true
bitmaps:
[0]:
flags:
[0]: in-use
[1]: auto
name: back-up1
granularity: 65536
[1]:
flags:
[0]: in-use
[1]: auto
name: back-up2
granularity: 65536
refcount bits: 16
corrupt: false
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <1549638368-530182-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Clarify that the number of extents provided in BlockdevCreateOptionsVmdk
must match the number of extents that will actually be used. Providing
more extents will result in an error now.
This requires adapting the test case to provide the right number of
extents.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
This makes VMDK support blockdev-create. The implementation reuses the
image creation code in vmdk_co_create_opts which now acceptes a callback
pointer to "retrieve" BlockBackend pointers from the caller. This way we
separate the logic between file/extent acquisition and initialization.
The QAPI command parameters are mostly the same as the old create_opts
except the dropped legacy @compat6 switch, which is redundant with
@hwversion.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new command, returning block nodes (and their users) graph.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The 'x' prefix was added because I was uncertain of the direction we'd
take for the libvirt API. With the general approach solidified, I feel
comfortable committing to this API for 4.0.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181221093529.23855-5-jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Especially outside of transactions, it is helpful to provide
all-or-nothing semantics for bitmap merges. This facilitates
the coalescing of multiple bitmaps into a single target for
the "checkpoint" interpretation when assembling bitmaps that
represent arbitrary points in time from component bitmaps.
This is an incompatible change from the preliminary version
of the API.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20181221093529.23855-4-jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add #if defined(CONFIG_REPLICATION) in generated code, and adjust the
code accordingly.
Made conditional:
* xen-set-replication, query-xen-replication-status,
xen-colo-do-checkpoint
Before the patch, we first register the commands unconditionally in
generated code (requires a stub), then conditionally unregister in
qmp_unregister_commands_hack().
Afterwards, we register only when CONFIG_REPLICATION. The command
fails exactly the same, with CommandNotFound.
Improvement, because now query-qmp-schema is accurate, and we're one
step closer to killing qmp_unregister_commands_hack().
* enum BlockdevDriver value "replication" in command blockdev-add
* BlockdevOptions variant @replication
and related structures.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181213123724.4866-23-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Let's break the line before 'data'. While at it, improve a bit
indentation/spacing. (I removed some alignment which are not helping
much readability and become quickly inconsistent)
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181208111606.8505-26-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.
Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).
A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.
The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.
Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.
Instead of adding a new auto-read-only option, turning the existing
read-only into an enum (with a bool alternate for compatibility) was
considered, but it complicated the implementation to the point that it
didn't seem to be worth it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
It is only an oversight that we don't allow incremental backup with
blockdev-backup. Add the bitmap argument which enables this.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20180830211605.13683-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
- qcow2 cache option default changes (Linux: 32 MB maximum, limited by
whatever cache size can be made use of with the specific image;
default cache-clean-interval of 10 minutes)
- reopen: Allow specifying unchanged child node references, and changing
a few generic options (discard, detect-zeroes)
- Fix werror/rerror defaults for -device drive=<node-name>
- Test case fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbslavAAoJEH8JsnLIjy/Wi0kP/jU18AzfISoIhcJ2GBXYU2aV
/FnUdB/L3mjMZOYkIgjDunw/fgfvelLqNdWb7xlijYeDPAiYKNEmJHX+iznE5ieP
KnpHOxASSe8w5SFlnF8h30rLK05gcy/rg/QcuMX4KkU46E0C8t0rSLBJE5FdYiRU
HN00jraTNfzyixuFxRVpqyadbhbCCEVwlwjDg3GMjGEML/WRk6jmhOOF5tVX72om
gmVrzA1lAlzkFnx32Bloevp72iolWFLkyA86oNgPMwIFG0zj9lnK5B/fvnkVTY2v
MnXGPwEVZUoZnif4nAXA2+bBqKT4Nbo21N8OylJhmNUi8K/rndiZdHH5Kph+yFod
RGkBI4Pb5KxiI+YDiRKJmyQd/7IiWLarjP1nV3UjvPLnpmuTA54jRjDVmA6AW8OH
BFu34+jfA4rll2dorVmQAFES4yvvj/brtTsCZfG5VNl60tigdqeLCZrQkNwR188q
osKGWBEKy7+2SYj5q+s0BSO+caXmU2XLSdcE1gEHFQ51eU0mRZA0OrooNUuUk30E
42n8BZ77P8EGb7UQBmKqYwWL4hXQPWL3m3i7Mnz19+iwk/m8SHvj2nriouDoiVtf
gtUwfr7TKvL9JcPLHrS3/j8boC5S4Rm+wlyyIlta8n2rS4bh1e2bGEZuNxZKyKCg
Y9WO6KxbztbO9X0ZnxFW
=ai81
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches:
- qcow2 cache option default changes (Linux: 32 MB maximum, limited by
whatever cache size can be made use of with the specific image;
default cache-clean-interval of 10 minutes)
- reopen: Allow specifying unchanged child node references, and changing
a few generic options (discard, detect-zeroes)
- Fix werror/rerror defaults for -device drive=<node-name>
- Test case fixes
# gpg: Signature made Mon 01 Oct 2018 18:17:35 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (23 commits)
tests/test-bdrv-drain: Fix too late qemu_event_reset()
test-replication: Lock AioContext around blk_unref()
qcow2: Fix cache-clean-interval documentation
block-backend: Set werror/rerror defaults in blk_new()
qcow2: Explicit number replaced by a constant
qcow2: Set the default cache-clean-interval to 10 minutes
qcow2: Resize the cache upon image resizing
qcow2: Increase the default upper limit on the L2 cache size
qcow2: Assign the L2 cache relatively to the image size
qcow2: Avoid duplication in setting the refcount cache size
qcow2: Make sizes more humanly readable
include: Add a lookup table of sizes
qcow2: Options' documentation fixes
block: Allow changing 'detect-zeroes' on reopen
block: Allow changing 'discard' on reopen
file-posix: Forbid trying to change unsupported options during reopen
block: Forbid trying to change unsupported options during reopen
block: Allow child references on reopen
block: Don't look for child references in append_open_options()
block: Remove child references from bs->{options,explicit_options}
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Fixing cache-clean-interval documentation following the recent change to
a default of 600 seconds on supported plarforms (only Linux currently).
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).
* For non-Linux platforms the default is kept at 0, because
cache-clean-interval is not supported there yet.
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We named these using underscores instead of the preferred dash,
document this nearby so we cannot possibly forget to rectify this
when we remove the 'x-' prefixes when the feature becomes stable.
We do not implement the change ahead of time to avoid more work
for libvirt to do in order to figure out how to use the beta version
of the API needlessly.
Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20180919190934.16284-1-jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: typo fix]
Signed-off-by: Eric Blake <eblake@redhat.com>
The block-commit QMP command required specifying the top and base nodes
of the commit jobs using the file name of that node. While this works
in simple cases (local files with absolute paths), the file names
generated for more complicated setups can be hard to predict.
The block-commit command has more problems than just this, so we want to
replace it altogether in the long run, but libvirt needs a reliable way
to address nodes now. So we don't want to wait for a new, cleaner
command, but just add the minimal thing needed right now.
This adds two new options top-node and base-node to the command, which
allow specifying node names instead. They are mutually exclusive with
the old options.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fix documentation to match the other jobs amended for 3.1.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-16-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-15-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-14-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180906130225.5118-13-jsnow@redhat.com
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Most of the various error classes were removed prior to the 1.2 release.
Remove mentions of the error classes which did not make it.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blockdev-add fails if an invalid node name is given, so we should
document what a valid node name even is.
Reported-by: Cong Li <coli@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cong Li <coli@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Like for query-block, the client needs to identify which BlockBackend
the returned data is for. Anonymous BlockBackends are identified by the
device model they are attached to. Add a 'qdev' field that contains the
qdev ID or QOM path of the attached device model.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This was accidentally omitted. Thanks to Eric Blake for spotting this.
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a way to ensure that the log superblock is periodically
updated. Before, this was only done on flush requests, which may
not be enough if the VM exits abnormally, omitting the final flush.
The default interval is 4096 write requests.
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Suggested by Kevin Wolf. May be useful when testing multiple batches
of writes or doing long-term testing involving restarts of the VM.
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implements a block device write logging system, similar to Linux kernel
device mapper dm-log-writes. The write operations that are performed
on a block device are logged to a file or another block device. The
write log format is identical to the dm-log-writes format. Currently,
log markers are not supported.
This functionality can be used for crash consistency and fs consistency
testing. By implementing it in qemu, tests utilizing write logs can be
be used to test non-Linux drivers and older kernels.
The driver accepts an optional parameter to set the sector size used
for logging. This makes the driver require all requests to be aligned
to this sector size and also makes offsets and sizes of writes in the
log metadata to be expressed in terms of this value (the log format has
a granularity of one sector for offsets and sizes). This allows
accurate logging of writes to guest block devices that have unusual
sector sizes.
The implementation is based on the blkverify and blkdebug block
drivers.
Signed-off-by: Aapo Vienamo <aapo@tuxera.com>
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In order to test that the NBD server is properly advertising
dirty bitmaps, we need a bare minimum client that can request
and read the context. Since feature freeze for 3.0 is imminent,
this is the smallest workable patch, which replaces the qemu
block status report with the results of the NBD server's dirty
bitmap (making it very easy to use 'qemu-img map --output=json'
to learn where the dirty portions are). Note that the NBD
protocol defines a dirty section with the same bit but opposite
sense that normal "base:allocation" uses to report an allocated
section; so in qemu-img map output, "data":true corresponds to
clean, "data":false corresponds to dirty.
A more complete solution that allows dirty bitmaps to be queried
at the same time as normal block status will be required before
this addition can lose the x- prefix. Until then, the fact that
this replaces normal status with dirty status means actions
like 'qemu-img convert' will likely misbehave due to treating
dirty regions of the file as if they are unallocated.
The next patch adds an iotest to exercise this new code.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180702191458.28741-2-eblake@redhat.com>
Flat unions may now have uncovered branches, so it is possible to get rid
of empty types defined for that purpose only.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1529311206-76847-3-git-send-email-anton.nefedov@virtuozzo.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This patch allows the user to specify whether to use active or only
background mode for mirror block jobs. Currently, this setting will
remain constant for the duration of the entire block job.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch implements active synchronous mirroring. In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target. Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).
Active mode is completely optional and currently disabled at runtime. A
later patch will add a way for users to enable it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Legacy -drive supports "password-secret" parameter that isn't
available with -blockdev / blockdev-add. That's because we backed out
our first try to provide it there due to interface design doubts, in
commit 577d8c9a81, v2.9.0.
This is the second try. It brings back the parameter, except it's
named "key-secret" now.
Let's review our reasons for backing out the first try, as stated in
the commit message:
* BlockdevOptionsRbd member @password-secret isn't actually a
password, it's a key generated by Ceph.
Addressed by the rename.
* We're not sure where member @password-secret belongs (see the
previous commit).
See previous commit.
* How @password-secret interacts with settings from a configuration
file specified with @conf is undocumented.
Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Parameter auth-client-required lets you configure authentication
methods. We tried to provide that in v2.9.0, but backed out due to
interface design doubts (commit 464444fcc1).
This commit is similar to what we backed out, but simpler: we use a
list of enumeration values instead of a list of objects with a member
of enumeration type.
Let's review our reasons for backing out the first try, as stated in
the commit message:
* The implementation uses deprecated rados_conf_set() key
"auth_supported". No biggie.
Fixed: we use "auth-client-required".
* The implementation makes -drive silently ignore invalid parameters
"auth" and "auth-supported.*.X" where X isn't "auth". Fixable (in
fact I'm going to fix similar bugs around parameter server), so
again no biggie.
That fix is commit 2836284db6. This commit doesn't bring the bugs
back.
* BlockdevOptionsRbd member @password-secret applies only to
authentication method cephx. Should it be a variant member of
RbdAuthMethod?
We've had time to ponder, and we decided to stick to the way Ceph
configuration works: the key configured separately, and silently
ignored if the authentication method doesn't use it.
* BlockdevOptionsRbd member @user could apply to both methods cephx
and none, but I'm not sure it's actually used with none. If it
isn't, should it be a variant member of RbdAuthMethod?
Likewise.
* The client offers a *set* of authentication methods, not a list.
Should the methods be optional members of BlockdevOptionsRbd instead
of members of list @auth-supported? The latter begs the question
what multiple entries for the same method mean. Trivial question
now that RbdAuthMethod contains nothing but @type, but less so when
RbdAuthMethod acquires other members, such the ones discussed above.
Again, we decided to stick to the way Ceph configuration works, except
we make auth-client-required a list of enumeration values instead of a
string containing keywords separated by delimiters.
* How BlockdevOptionsRbd member @auth-supported interacts with
settings from a configuration file specified with @conf is
undocumented. I suspect it's untested, too.
Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is needed, for example, to create a new bitmap and merge several
disabled bitmaps into a new one. Without this flag we will have to
put block-dirty-bitmap-add and block-dirty-bitmap-disable into one
transaction.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-6-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-5-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Expose the ability to turn bitmaps "on" or "off". This is experimental
and principally for the sake of the Libvirt Checkpoints API, and it may
or may not be committed for 3.0.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20180606182449.1607-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
We're ready to declare the blockdev-create job stable. This renames the
corresponding QMP command from x-blockdev-create to blockdev-create.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
This changes the x-blockdev-create QMP command so that it doesn't block
the monitor and the main loop any more, but starts a background job that
performs the image creation.
The basic job as implemented here is all that is necessary to make image
creation asynchronous and to provide a QMP interface that can be marked
stable, but it still lacks a few features that jobs usually provide: The
job will ignore pause commands and it doesn't publish more than very
basic progress yet (total-progress is 1 and current-progress advances
from 0 to 1 when the driver callbacks returns). These features can be
added later without breaking compatibility.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
We're going to make the next release be 3.0, not 2.13; change
the annotations in our json appropriately.
Changes produced with
sed -i -e 's/2\.13/3.0/g' qapi/*.json
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180522104000.9044-2-peter.maydell@linaro.org
This adds a separate schema file for all job-related definitions that
aren't tied to the block layer.
For a start, move the enums JobType, JobStatus and JobVerb.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This moves BlockJob.status and the closely related functions
(block_)job_state_transition() and (block_)job_apply_verb to Job. The
two QAPI enums are renamed to JobStatus and JobVerb.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
QAPI types aren't externally visible, so we can rename them without
causing problems. Before we add a job type to Job, rename the enum
so it can be used for more than just block jobs.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Clarify that len is just an estimation of the end value of offset, and
that offset increases monotonically while len can change arbitrarily.
While touching the documentation of offset, move it directly after len
to match the order of the declaration below.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Commit 0ec4dfb8d changed block-job_pause/resume so that they return an
error if they don't do anything because the job is already
paused/running. It forgot to update the documentation, so do that now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
This adds a simple copy-on-read filter driver. It relies on the already
existing COR functionality in the central block layer code, which may be
moved here once we no longer need it there.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180421132929.21610-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When we've reached the concluded state, we need to expose the error
state if applicable. Add the new field.
This should be sufficient for determining if a job completed
successfully or not after concluding; if we want to discriminate
based on how it failed more mechanically, we can always add an
explicit return code enumeration later.
I didn't bother to make it only show up if we are in the concluded
state; I don't think it's necessary.
Cc: qemu-stable@nongnu.org
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
mincore(2) checks whether pages are resident. Use it to verify that
page cache has been dropped.
You can trigger a verification failure by mmapping the image file from
another process that loads a byte from a page, forcing it to become
resident. bdrv_co_invalidate_cache() will fail while that process is
alive.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180427162312.18583-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
What static=on really does is what we call metadata preallocation for
other block drivers. While we can still change the QMP interface, make
it more consistent by using 'preallocation' for VDI, too.
This doesn't implement any new functionality, so the only supported
preallocation modes are 'off' and 'metadata' for now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Set (and clear) histograms through new command
block-latency-histogram-set and show new statistics in
query-blockstats results.
For now, the command is marked experimental with prefix 'x-',
to gain experience with the interface without being stuck
with design decisions.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180309165212.97144-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: fix typos, mention x- prefix in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
We have a clear replacement, so let's deprecate it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20180224154033.29559-8-mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When doing drive mirror to a low speed shared storage, if there was heavy
BLK IO write workload in VM after the 'ready' event, drive mirror block job
can't be canceled immediately, it would keep running until the heavy BLK IO
workload stopped in the VM.
Libvirt depends on the current block-job-cancel semantics, which is that
when used without a flag after the 'ready' event, the command blocks
until data is in sync. However, these semantics are awkward in other
situations, for example, people may use drive mirror for realtime
backups while still wanting to use block live migration. Libvirt cannot
start a block live migration while another drive mirror is in progress,
but the user would rather abandon the backup attempt as broken and
proceed with the live migration than be stuck waiting for the current
drive mirror backup to finish.
The drive-mirror command already includes a 'force' flag, which libvirt
does not use, although it documented the flag as only being useful to
quit a job which is paused. However, since quitting a paused job has
the same effect as abandoning a backup in a non-paused job (namely, the
destination file is not in sync, and the command completes immediately),
we can just improve the documentation to make the force flag obviously
useful.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Reported-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the .bdrv_co_create driver callback to vpc, which
enables image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This adds the .bdrv_co_create driver callback to vhdx, which
enables image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This adds the .bdrv_co_create driver callback to qed, which
enables image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This adds the .bdrv_co_create driver callback to qcow, which
enables image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
This adds the .bdrv_co_create driver callback to parallels, which
enables image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
In preparation of QAPI-fying VDI image creation, we have to create a
BlockdevCreateOptionsVdi type which is received by a (future)
vdi_co_create().
vdi_co_create_opts() now converts the QemuOpts object into such a
BlockdevCreateOptionsVdi object. The protocol-layer file is still
created in vdi_co_do_create() (and BlockdevCreateOptionsVdi.file is set
to an empty string), but that will be addressed by a follow-up patch.
Note that cluster-size is not part of the QAPI schema because it is not
supported by default.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds the .bdrv_co_create driver callback to luks, which enables
image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Expose the "manual" property via QAPI for the backup-related jobs.
As of this commit, this allows the management API to request the
"concluded" and "dismiss" semantics for backup jobs.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of automatically transitioning from PENDING to CONCLUDED, gate
the .prepare() and .commit() phases behind an explicit acknowledgement
provided by the QMP monitor if auto_finalize = false has been requested.
This allows us to perform graph changes in prepare and/or commit so that
graph changes do not occur autonomously without knowledge of the
controlling management layer.
Transactions that have reached the "PENDING" state together can all be
moved to invoke their finalization methods by issuing block_job_finalize
to any one job in the transaction.
Jobs in a transaction with mixed job->auto_finalize settings will all
remain stuck in the "PENDING" state, as if the entire transaction was
specified with auto_finalize = false. Jobs that specified
auto_finalize = true, however, will still not emit the PENDING event.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs utilizing the new manual workflow, we intend to prohibit
them from modifying the block graph until the management layer provides
an explicit ACK via block-job-finalize to move the process forward.
To distinguish this runstate from "ready" or "waiting," we add a new
"pending" event and status.
For now, the transition from PENDING to CONCLUDED/ABORTING is automatic,
but a future commit will add the explicit block-job-finalize step.
Transitions:
Waiting -> Pending: Normal transition.
Pending -> Concluded: Normal transition.
Pending -> Aborting: Late transactional failures and cancellations.
Removed Transitions:
Waiting -> Concluded: Jobs must go to PENDING first.
Verbs:
Cancel: Can be applied to a pending job.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+-----------------+
| +--+----+ |
| | |
| +--+----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
| +--v----+ | |
+---------+WAITING<---------------+ |
| +--+----+ |
| | |
| +--v----+ |
+---------+PENDING| |
| +--+----+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED| |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<--------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs that are stuck waiting on others in a transaction, it would
be nice to know that they are no longer "running" in that sense, but
instead are waiting on other jobs in the transaction.
Jobs that are "waiting" in this sense cannot be meaningfully altered
any longer as they have left their running loop. The only meaningful
user verb for jobs in this state is "cancel," which will cancel the
whole transaction, too.
Transitions:
Running -> Waiting: Normal transition.
Ready -> Waiting: Normal transition.
Waiting -> Aborting: Transactional cancellation.
Waiting -> Concluded: Normal transition.
Removed Transitions:
Running -> Concluded: Jobs must go to WAITING first.
Ready -> Concluded: Jobs must go to WAITING first.
Verbs:
Cancel: Can be applied to WAITING jobs.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+-----------------+
| +--+----+ |
| | |
| +--v----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
| +--v----+ | |
+---------+WAITING<---------------+ |
| +--+----+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED| |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<--------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For jobs that have reached their CONCLUDED state, prior to having their
last reference put down (meaning jobs that have completed successfully,
unsuccessfully, or have been canceled), allow the user to dismiss the
job's lingering status report via block-job-dismiss.
This gives management APIs the chance to conclusively determine if a job
failed or succeeded, even if the event broadcast was missed.
Note: block_job_do_dismiss and block_job_decommission happen to do
exactly the same thing, but they're called from different semantic
contexts, so both aliases are kept to improve readability.
Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she
has a friend coming in a future patch to fill the hole where 0x02 is.
Verbs:
Dismiss: operates on CONCLUDED jobs only.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new state that specifically demarcates when we begin to permanently
demolish a job after it has performed all work. This makes the transition
explicit in the STM table and highlights conditions under which a job may
be demolished.
Alongside this state, add a new helper command "block_job_decommission",
which transitions to the NULL state and puts down our implicit reference.
This separates instances in the code for "block_job_unref" which merely
undo a matching "block_job_ref" with instances intended to initiate the
full destruction of the object.
This decommission action also sets a number of fields to make sure that
block internals or external users that are holding a reference to a job
to see when it "finishes" are convinced that the job object is "done."
This is necessary, for instance, to do a block_job_cancel_sync on a
created object which will not make any progress.
Now, all jobs must go through block_job_decommission prior to being
freed, giving us start-to-finish state machine coverage for jobs.
Transitions:
Created -> Null: Early failure event before the job is started
Concluded -> Null: Standard transition.
Verbs:
None. This should not ever be visible to the monitor.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED+------------------+
| +--+----+ |
| | |
| +--v----+ +------+ |
+---------+RUNNING<----->PAUSED| |
| +--+-+--+ +------+ |
| | | |
| | +------------------+ |
| | | |
| +--v--+ +-------+ | |
+---------+READY<------->STANDBY| | |
| +--+--+ +-------+ | |
| | | |
+--v-----+ +--v------+ | |
|ABORTING+--->CONCLUDED<-------------+ |
+--------+ +--+------+ |
| |
+--v-+ |
|NULL<---------------------+
+----+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
add a new state "CONCLUDED" that identifies a job that has ceased all
operations. The wording was chosen to avoid any phrasing that might
imply success, error, or cancellation. The task has simply ceased all
operation and can never again perform any work.
("finished", "done", and "completed" might all imply success.)
Transitions:
Running -> Concluded: normal completion
Ready -> Concluded: normal completion
Aborting -> Concluded: error and cancellations
Verbs:
None as of this commit. (a future commit adds 'dismiss')
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED|
| +--+----+
| |
| +--v----+ +------+
+---------+RUNNING<----->PAUSED|
| +--+-+--+ +------+
| | |
| | +------------------+
| | |
| +--v--+ +-------+ |
+---------+READY<------->STANDBY| |
| +--+--+ +-------+ |
| | |
+--v-----+ +--v------+ |
|ABORTING+--->CONCLUDED<-------------+
+--------+ +---------+
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a new state ABORTING.
This makes transitions from normative states to error states explicit
in the STM, and serves as a disambiguation for which states may complete
normally when normal end-states (CONCLUDED) are added in future commits.
Notably, Paused/Standby jobs do not transition directly to aborting,
as they must wake up first and cooperate in their cancellation.
Transitions:
Created -> Aborting: can be cancelled (by the system)
Running -> Aborting: can be cancelled or encounter an error
Ready -> Aborting: can be cancelled or encounter an error
Verbs:
None. The job must finish cleaning itself up and report its final status.
+---------+
|UNDEFINED|
+--+------+
|
+--v----+
+---------+CREATED|
| +--+----+
| |
| +--v----+ +------+
+---------+RUNNING<----->PAUSED|
| +--+----+ +------+
| |
| +--v--+ +-------+
+---------+READY<------->STANDBY|
| +-----+ +-------+
|
+--v-----+
|ABORTING|
+--------+
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.
As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.
A recurring theme is that no verb will apply to an 'undefined' job.
Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.
=====
Verbs
=====
Cancel: Any state except undefined.
Pause: Any state except undefined;
'created': Requests that the job pauses as it starts.
'running': Normal usage. (PAUSED)
'paused': The job may be paused for internal reasons,
but the user may wish to force an indefinite
user-pause, so this is allowed.
'ready': Normal usage. (STANDBY)
'standby': Same logic as above.
Resume: Any state except undefined;
'created': Will lift a user's pause-on-start request.
'running': Will lift a pause request before it takes effect.
'paused': Normal usage.
'ready': Will lift a pause request before it takes effect.
'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete: Only a 'ready' job may accept a complete request.
=======
Changes
=======
(1)
To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:
- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.
(2)
block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.
iotests have been adjusted to address this new behavior.
(3)
block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.
(4)
test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're about to add several new states, and booleans are becoming
unwieldly and difficult to reason about. It would help to have a
more explicit bookkeeping of the state of blockjobs. To this end,
add a new "status" field and add our existing states in a redundant
manner alongside the bools they are replacing:
UNDEFINED: Placeholder, default state. Not currently visible to QMP
unless changes occur in the future to allow creating jobs
without starting them via QMP.
CREATED: replaces !!job->co && paused && !busy
RUNNING: replaces effectively (!paused && busy)
PAUSED: Nearly redundant with info->paused, which shows pause_count.
This reports the actual status of the job, which almost always
matches the paused request status. It differs in that it is
strictly only true when the job has actually gone dormant.
READY: replaces job->ready.
STANDBY: Paused, but job->ready is true.
New state additions in coming commits will not be quite so redundant:
WAITING: Waiting on transaction. This job has finished all the work
it can until the transaction converges, fails, or is canceled.
PENDING: Pending authorization from user. This job has finished all the
work it can until the job or transaction is finalized via
block_job_finalize. This implies the transaction has converged
and left the WAITING phase.
ABORTING: Job has encountered an error condition and is in the process
of aborting.
CONCLUDED: Job has ceased all operations and has a return code available
for query and may be dismissed via block_job_dismiss.
NULL: Job has been dismissed and (should) be destroyed. Should never
be visible to QMP.
Some of these states appear somewhat superfluous, but it helps define the
expected flow of a job; so some of the states wind up being synchronous
empty transitions. Importantly, jobs can be in only one of these states
at any given time, which helps code and external users alike reason about
the current condition of a job unambiguously.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add special state, when qmp operations on the bitmap are disabled.
It is needed during bitmap migration. "Frozen" state is not
appropriate here, because it looks like bitmap is unchanged.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>
This adds the .bdrv_co_create driver callback to ssh, which enables
image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This makes the host-key-check option available in blockdev-add.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This adds the .bdrv_co_create driver callback to sheepdog, which enables
image creation over QMP.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>