Simplify the blockdev-snapshot-sync code and gain failsafe operation
by turning it into a wrapper around the new transaction command. A new
option is also added matching "mode".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The mode field lets a management application create the snapshot
destination outside QEMU.
Right now, the only modes are "existing" and "absolute-paths". Mirroring
introduces "no-backing-file". In the future "relative-paths" could be
implemented too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We will add other kinds of operation. Prepare for this by adjusting
the schema.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress. This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.
The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.
It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned. The failure case should not interrupt
any operations.
Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields. Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.
This allows the group of disks to remain consistent with each other,
even across snapshot failures.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for streaming data from an intermediate section of the
image chain (see patch and documentation for details).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Unplugging a storage interface like virtio-blk causes the host block
device to be deleted too. Long-running operations like block migration
must take a DriveInfo reference to prevent the BlockDriverState from
being freed. For image streaming we can do the same thing.
Note that it is not possible to acquire/release the drive reference in
block.c where the block job functions live because
drive_get_ref()/drive_put_ref() are blockdev.c functions. Calling them
from block.c would be a layering violation - tools like qemu-img don't
even link against blockdev.c.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add query-block-jobs, which shows the progress of ongoing block device
operations.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add block_job_cancel, which stops an active block streaming operation.
When the operation has been cancelled the new BLOCK_JOB_CANCELLED event
is emitted.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add block_job_set_speed, which sets the maximum speed for a background
block operation.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the block_stream command, which starts copy backing file contents
into the image file. Also add the BLOCK_JOB_COMPLETED QMP event which
is emitted when image streaming completes. Later patches add control
over the background copy speed, cancelation, and querying running
streaming operations.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Long-running block operations like block migration and image streaming
must have continual access to their block device. It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.
This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Let's report specific errors so that management tools and users can
identify the problem.
Two new qerrors are needed:
* QERR_DEVICE_HAS_NO_MEDIUM for ENOMEDIUM
* QERR_DEVICE_IS_READ_ONLY for EACCES
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Also drops the leftover 'mon' argument.
This is a preparation for the next commits which will port the
eject and change commands to the QAPI.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Unfortunately, this conversion required an additional change.
In the old QMP command, the 'snapshot-file' argument is specified as
optional. The idea is to take the snapshot internally if 'snapshot-file'
is not passed. However, internal snapshots are not supported yet so
the command returns a MissingParamater error if 'snapshot-file' is not
passed. Which makes the argument actually required and will cause
compatibility breakage if we change that in the future.
To fix this the QAPI converted blockdev_snapshot_sync command makes the
'snapshot-file' argument required. Again, in practice it's actually required,
so this is not incompatible.
If we do implement internal snapshots someday, we'll need a new argument
for it.
Note that this discussion doesn't affect HMP.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the -drive copy-on-read=on|off command-line option:
copy-on-read=on|off
copy-on-read is "on" or "off" and enables whether to copy read backing
file sectors into the image file. Copy-on-read avoids accessing the
same backing file sectors repeatedly and is useful when the backing
file is over a slow network. By default copy-on-read is off.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted. Add support for these events in the ATAPI
and SCSI cd drive device models.
To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed". This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command. They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout. However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
change fails while the tray is locked by the guest. eject -f forces
it open and removes any media. Unfortunately, the tray closes again
instantly. Since the lock remains as it is, there is no way to insert
another medium unless the guest voluntarily unlocks.
Fix by leaving the tray open after monitor eject.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's a confused mess (see previous commit). No users remain.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable is a confused mess. It is true when
an ide-cd, scsi-cd or floppy qdev is attached, or when the
BlockDriverState was created with -drive if={floppy,sd} or -drive
if={ide,scsi,xen,none},media=cdrom ("created removable"), except when
an ide-hd, scsi-hd, scsi-generic or virtio-blk qdev is attached.
Three users remain:
1. eject_device(), via bdrv_is_removable() uses it to determine
whether a block device can eject media.
2. bdrv_info() is monitor command "info block". QMP documentation
says "true if the device is removable, false otherwise". From the
monitor user's point of view, the only sensible interpretation of
"is removable" is "can eject media with monitor commands eject and
change".
A block device can eject media unless a device is attached that
doesn't support it. Switch the two users over to new
bdrv_dev_has_removable_media() that returns exactly that.
3. bdrv_getlength() uses to suppress its length cache when media can
change (see commit 46a4e4e6). Media change is either monitor
command change (updates the length cache), monitor command eject
(doesn't update the length cache, easily fixable), or physical
media change (invalidates length cache, not so easily fixable).
I'm refraining from improving anything here, because this series is
long enough already. Instead, I simply switch it over to
bdrv_dev_has_removable_media() as well.
This changes the behavior of the length cache and of monitor commands
eject and change in two cases:
a. drive not created removable, no device attached
The commit makes the drive removable, and defeats the length cache.
Example: -drive if=none
b. drive created removable, but the attached drive is non-removable,
and doesn't call bdrv_set_removable(..., 0) (most devices don't)
The commit makes the drive non-removable, and enables the length
cache.
Example: -drive if=xen,media=cdrom -M xenpv
The other non-removable devices that don't call
bdrv_set_removable() can't currently use a drive created removable,
either because they aren't qdevified, or because they lack a drive
property. Won't stay that way.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Requires new BlockDevOps member is_medium_locked(). Implement for IDE
and SCSI CD-ROMs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.
While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode. Previously this was duplicated in blockdev.c and
qemu-img.c.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Ejecting hard disk platters can only end in tears.
If you need to revoke access to an image, use drive_del, not eject -f.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add QMP bits for snapshot_blkdev command. This is the same as
snapshot_blkdev in the human monitor. The command is synchronous.
In the future async commands and or a break down of the functionality
into multiple commands might be added.
Also change the 'snapshot_file' argument to 'snapshot-file' in
the human monitor, so that it matches QMP.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@gmail.com>
The current message doesn't clearly communicate the error cause.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics. All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
No users of bdrv_get_type_hint() left. bdrv_set_type_hint() can make
the media removable by side effect. Make that explicit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. Unlike
DriveInfo, BlockDriverState should be about the host part only.
One of the remaining guest bits there is the "type hint". -drive
option media sets it, and qdevs "ide-drive", "scsi-disk" and non-qdev
IF_XEN devices check it to pick HD vs. CD.
Communicate -drive option media via new DriveInfo member media_cd
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When removing a drive from the host-side via drive_del we currently have
the following path:
drive_del
qemu_aio_flush()
bdrv_close() // zaps bs->drv, which makes any subsequent I/O get
// dropped. Works as designed
drive_uninit()
bdrv_delete() // frees the bs. Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.
The value of bs->drv becomes unpredictable on free. As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.
To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer. Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices. This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().
The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.
We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives. This
also allows for removing Drives with no devices associated either.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have two different virtio buses: pci and s390. The abstraction path
taken in qemu is to have generic aliases for each device type in the
architecture specific qdev devices.
So let's make use of these aliases whenever we can and define them
whenever we can.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In case we cannot open the newly created snapshot image, try to fall
back to the original image file and continue running on that, which
should prevent the guest from aborting.
This is a corner case which can happen if the admin by mistake
specifies the snapshot file on a virtual file system which does not
support O_DIRECT. bdrv_create() does not use O_DIRECT, but the
following open in bdrv_open() does and will then fail.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>