Block jobs used drive_get_ref(drive_get_by_blockdev(bs)) to avoid BDS
being deleted. Now we have BDS reference count, and block jobs don't
care about dinfo, so replace them to get cleaner code. It is also the
safe way when BDS has no drive info.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The max parameter of the leaky bucket throttling algorithm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When user tries to use read-only whitelist format in the command line
option, failure message was "'foo' invalid format". It might be invalid
only for writable, but valid for read-only, so it is confusing. Give the
user easier to understand information.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_flags is set by bdrv_parse_discard_flags(), but later it is reset
to zero.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Message-id: 1376483201-13466-1-git-send-email-mohan@in.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When use -drive file='xxx',format=qcow2,snapshot=on the error
message "Can't use snapshot=on with driver-specific options"
can be show, and fail to start the qemu.
This should not be happened, and there is no file.driver option
in qemu command line.
It is because the commit 74fe54f2a1,
it puts 'driver' option if the command line use 'format' option.
This patch is to solve this bug.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Like already for 1.5, disable it again for the 1.6 release. This commit
is meant to be reverted after the 1.6 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The macro g_assert_not_reached is a better self documenting replacement
for assert(0) or assert(false).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.
FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.
For sync mode TOP are creating a new target image using the same backing
file as the original disk image. Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.
Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.
For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made. This preserves the point in time data by only copying the parts
that are *going to change* to the target image. This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image. This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.
Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case). The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.
Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The old 'cache' option really encodes three different boolean flags into
a cache mode name, without providing all combinations. Make them three
separate options instead and translate the old option to the new ones
for drive_init().
The specific boolean options take precedence if the old cache option is
specified as well, so the following options are equivalent:
-drive file=x,cache=none,cache.no-flush=true
-drive file=x,cache.writeback=true,cache.direct=true,cache.no-flush=true
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In QMP, we want to use dashes instead of underscores in QMP argument
names, and use nested options for throttling.
The new option names affect the command line as well, but for
compatibility drive_init() will convert the old option names before
calling into the code that will be shared between -drive and
blockdev-add.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The drive-backup command is similar to the drive-mirror command, except
no guest data written after the command executes gets copied. Add a
sync mode argument which determines whether the entire disk is copied,
just allocated clusters, or only clusters being written to by the guest.
Currently only sync mode 'full' is supported - it copies the entire disk.
For read-only point-in-time snapshots we may only need sync mode 'none'
since the target can be a qcow2 file using the guest's disk as its
backing file (no need to copy the entire disk). Finally, sync mode
'top' is useful if we wish to preserve the backing chain.
Note that this patch just adds the sync mode argument to drive-backup.
It does not implement sync modes 'top' or 'none'. This patch is
necessary so we can add a drive-backup HMP command that behaves like the
existing drive-mirror HMP command and takes a sync mode.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The Abort action can be used to test QMP 'transaction' failure. Add it
as the last action to exercise the .abort() and .cleanup() code paths
for all previous actions.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds a transactional version of the drive-backup QMP command.
It allows atomic snapshots of multiple drives along with automatic
cleanup if there is a failure to start one of the backup jobs.
Note that QMP events are emitted for block job completion/cancellation
and the block job will be listed by query-block-jobs.
@device: the name of the device whose writes should be mirrored.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some QMP 'transaction' types don't need to do anything on .commit().
Make .commit() optional just like .abort().
The "drive-backup" action will take advantage of this, it only needs to
cancel the block job on .abort(). Other block job actions will probably
follow the same pattern, so allow .commit() to be NULL.
Suggested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QMP 'transaction' command keeps a list of in-flight transactions.
The transaction state structure is called BlkTransactionStates even
though it only deals with a single transaction. The only plural thing
is the linked list of transaction states.
I find it confusing to call the single structure "States". This patch
renames it to "State", just like BlockDriverState is singular.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
@drive-backup
Start a point-in-time copy of a block device to a new destination. The
status of ongoing drive-backup operations can be checked with
query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
The operation can be stopped before it has completed using the
block-job-cancel command.
@device: the name of the device which should be copied.
@target: the target of the new image. If the file exists, or if it
is a device, the existing file/device will be used as the new
destination. If it does not exist, a new file will be created.
@format: #optional the format of the new destination, default is to
probe if @mode is 'existing', else the format of the source
@mode: #optional whether and how QEMU should create a new image, default is
'absolute-paths'.
@speed: #optional the maximum speed, in bytes per second
@on-source-error: #optional the action to take on an error on the source,
default 'report'. 'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
@on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
Note that @on-source-error and @on-target-error only affect background I/O.
If an error occurs during a guest write request, the device's rerror/werror
actions will be used.
Returns: nothing on success
If @device is not a valid block device, DeviceNotFound
Since 1.6
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use bdrv_getlength() for its byte units and error return instead of
bdrv_get_geometry().
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not necessary to check that we can find a protocol block driver
since we create or open the image file. This produces the error that we
need anyway.
Besides, the QERR_INVALID_BLOCK_FORMAT is inappropriate since the
protocol is incorrect rather than the format.
Also drop an empty line between bdrv_open() and checking its return
value. This may be due to copy-pasting from earlier code that performed
other operations before handling errors.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini <pbonzini@redhat.com> suggested the following test case:
1. Launch a guest and wait at the GRUB boot menu:
qemu-system-x86_64 -enable-kvm -m 1024 \
-drive if=none,cache=none,file=test.img,id=foo,werror=stop,rerror=stop
-device virtio-blk-pci,drive=foo,id=virtio0,addr=4
2. Hot unplug the device:
(qemu) drive_del foo
3. Select the first boot menu entry
Without this patch the guest pauses due to ENOMEDIUM. The guest is
stuck in a continuous pause loop since the I/O request is retried and
fails immediately again when the guest is resumed.
With this patch the error is reported to the guest.
Note that this scenario actually happens sometimes during libvirt disk
hot unplug, where device_del is followed by drive_del. I/O may still be
submitted to the drive after drive_del if the guest does not process the
PCI hot unplug notification.
Reported-by: Dafna Ron <dron@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
We may want to include a driver in the whitelist for read only tasks
such as diagnosing or exporting guest data (with libguestfs as a good
example). This patch introduces a readonly whitelist option, and for
backward compatibility, the old configure option --block-drv-whitelist
is now an alias to rw whitelist.
Drivers in readonly list is only permitted to open file readonly, and
returns -ENOTSUP for RW opening.
E.g. To include vmdk readonly, and others read+write:
./configure --target-list=x86_64-softmmu \
--block-drv-rw-whitelist=qcow2,raw,file,qed \
--block-drv-ro-whitelist=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There's no reason to restrict transactions to operations related to
block devices, so rename the type now before schema introspection stops
us from doing so.
Also change the schema documentation of 'transaction' to not refer to
block devices or snapshots any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make it easier to add other operations to qmp_transaction() by using
callbacks, with external snapshots serving as an example implementation
of the callbacks.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is simply moved into a separate function.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code is moved into preparation function, and changed
a bit to tip more clearly what it is doing.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code before really committing is moved into a function. Most
code is simply moved from qmp_transaction(), except that on fail it
just returns now. Other code such as input parsing is not touched,
to make it easier in review.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have an errno value that can be displayed, so we should just do that.
An easy way to reproduce this case is to resize a raw image to a size
that is too large for the host file system.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Disable it for the 1.5 release. This commit is meant to be reverted
after the 1.5 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is not necessary to adjust the slice time at runtime. We already
extend the current slice in order to carry over accounting into the next
slice. Changing the actual slice time value introduces oscillations.
The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
this patch ensures that all pending IOs are completed
before a device is resized. this is especially important
if a device is shrinked as it the bdrv_check_request()
result is invalidated.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Screwed up in commit 666daa68. Thanks to Kevin Wolf for reminding me
to fix this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Any non-default -drive options are now passed down to the block drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Pointing to a QemuOpts element is surprising and can lead to subtle
use-after-free errors when the QemuOpts is freed after all options are
parsed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add support for BDRV_O_UNMAP from the QEMU command-line.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Negative I/O throttling iops and bps values do not make sense so reject
them with an error message.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The do_check_io_limits() function returns false when I/O limits are
invalid but it doesn't set an Error to indicate why. The two
do_check_io_limits() callers duplicate error reporting. Solve this by
passing an Error pointer into do_check_io_limits().
Note that the two callers report slightly different errors: drive_init()
prints a custom error message while qmp_block_set_io_throttle() does
error_set(errp, QERR_INVALID_PARAMETER_COMBINATION).
QERR_INVALID_PARAMETER_COMBINATION is a generic error, see
include/qapi/qmp/qerror.h:
#define QERR_INVALID_PARAMETER_COMBINATION \
ERROR_CLASS_GENERIC_ERROR, "Invalid parameter combination"
Since it is generic we are not obliged to keep this error. Switch to
the custom error message which contains more information.
This patch prepares for adding additional checks with their own error
messages to do_check_io_limits(). The next patch adds a new check.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Paolo Bonzini (14) and others
# Via Kevin Wolf
* kwolf/for-anthony: (24 commits)
ide: Add fall through annotations
block: Create proper size file for disk mirror
ahci: Add migration support
ahci: Change data types in preparation for migration
ahci: Remove unused AHCIDevice fields
hbitmap: add assertion on hbitmap_iter_init
mirror: do nothing on zero-sized disk
block/vdi: Check for bad signature
block/vdi: Improved return values from vdi_open
block/vdi: Improve debug output for signature
block: Use error code EMEDIUMTYPE for wrong format in some block drivers
block: Add special error code for wrong format
mirror: support arbitrarily-sized iterations
mirror: support more than one in-flight AIO operation
mirror: add buf-size argument to drive-mirror
mirror: switch mirror_iteration to AIO
mirror: allow customizing the granularity
block: allow customizing the granularity of the dirty bitmap
block: return count of dirty sectors, not chunks
mirror: perform COW if the cluster size is bigger than the granularity
...