Commit Graph

195 Commits

Author SHA1 Message Date
Fam Zheng
fbe40ff780 block: Introduce op_blockers to BlockDriverState
BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX
elements. Each list is a list of blockers of an operation type
(BlockOpType), that marks this BDS as currently blocked for a certain
type of operation with reason errors stored in the list. The rule of
usage is:

 * BDS user who wants to take an operation should check if there's any
   blocker of the type with bdrv_op_is_blocked().

 * BDS user who wants to block certain types of operation, should call
   bdrv_op_block (or bdrv_op_block_all to block all types of operations,
   which is similar to the existing bdrv_set_in_use()).

 * A blocker is only referenced by op_blockers, so the lifecycle is
   managed by caller, and shouldn't be lost until unblock, so typically
   a caller does these:

   - Allocate a blocker with error_setg or similar, call bdrv_op_block()
     to block some operations.
   - Hold the blocker, do his job.
   - Unblock operations that it blocked, with the same reason pointer
     passed to bdrv_op_unblock().
   - Release the blocker with error_free().

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
8574575f90 block: Add BlockOpType enum
This adds the enum of all the operations that can be taken on a block
device.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Peter Lieven
465bee1da8 block: optimize zero writes with bdrv_write_zeroes
this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.

I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.

a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

QCOW2         [off]     [on]     [unmap]
-----
runtime:       14secs    1.1secs  1.1secs
filesize:      937M      18M      18M

iSCSI         [off]     [on]     [unmap]
----
runtime:       9.3s      0.9s     0.9s

b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct

QCOW2         [off]     [on]     [unmap]
-----
runtime:       246secs   18secs   18secs
filesize:      51G       192K     192K
throughput:    203M/s    2.3G/s   2.3G/s

iSCSI*        [off]     [on]     [unmap]
----
runtime:       8mins     45secs   33secs
throughput:    106M/s    1.2G/s   1.6G/s
allocated:     100%      100%     0%

* The storage was connected via an 1Gbit interface.
  It seems to internally handle writing zeroes
  via WRITESAME16 very fast.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-19 13:42:27 +02:00
Kevin Wolf
e88ae2264d block: Fix bdrv_is_allocated() for short backing files
bdrv_is_allocated() shouldn't return true for sectors that are
unallocated, but after the end of a short backing file, even though
such sectors are (correctly) marked as containing zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-05-19 11:36:48 +02:00
Kevin Wolf
b1e6fc0817 block: Fix open flags with BDRV_O_SNAPSHOT
The immediately visible effect of this patch is that it fixes committing
a temporary snapshot to its backing file. Previously, it would fail with
a "permission denied" error because bdrv_inherited_flags() forced the
backing file to be read-only, ignoring the r/w reopen of bdrv_commit().

The bigger problem this revealed is that the original open flags must
actually only be applied to the temporary snapshot, and the original
image file must be treated as a backing file of the temporary snapshot
and get the right flags for that.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 20:57:31 +02:00
Fam Zheng
85f49cad87 qemu-img: Convert by cluster size if target is compressed
If target block driver forces compression, qemu-img convert needs to
write by cluster size as well as "-c" option.

Particularly, this applies for converting to VMDK streamOptimized
format.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 13:32:16 +02:00
Kevin Wolf
8bfea15dda block: Unlink temporary files in raw-posix/win32
Instead of having unlink() calls in the generic block layer, where we
aren't even guarateed to have a file name, move them to those block
drivers that are actually used and that always have a filename. Gets us
rid of some #ifdefs as well.

The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open
flag so that it is inherited in the protocol layer and the raw-posix and
raw-win32 drivers can unlink the file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
98522f63f4 block: Add errp to bdrv_new()
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.

Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:20 +02:00
Fam Zheng
b8afb520e4 block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
b998875dcf block: Fix snapshot=on for protocol parsed from filename
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.

This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.

As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:51 +02:00
Kevin Wolf
5a8a30db47 block: Add error handling to bdrv_invalidate_cache()
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-19 09:39:41 +01:00
Stefan Hajnoczi
98563fc3ec aio: add aio_context_acquire() and aio_context_release()
It can be useful to run an AioContext from a thread which normally does
not "own" the AioContext.  For example, request draining can be
implemented by acquiring the AioContext and looping aio_poll() until all
requests have been completed.

The following pattern should work:

  /* Event loop thread */
  while (running) {
      aio_context_acquire(ctx);
      aio_poll(ctx, true);
      aio_context_release(ctx);
  }

  /* Another thread */
  aio_context_acquire(ctx);
  bdrv_read(bs, 0x1000, buf, 1);
  aio_context_release(ctx);

This patch implements aio_context_acquire() and aio_context_release().

Note that existing aio_poll() callers do not need to worry about
acquiring and releasing - it is only needed when multiple threads will
call aio_poll() on the same AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:42:24 +01:00
Benoît Canet
b5042a3622 block: Rewrite the snapshot authorization mechanism for block filters.
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.

They will need to do two things:

-Set the is_filter field of their block driver to true.

-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)

[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Paolo Bonzini
537b41f501 nbd: move socket wrappers to qemu-nbd
qemu-nbd is one of the few valid users of qerror_report_err.  Move
the error-reporting socket wrappers there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:23 +01:00
Paolo Bonzini
c06b72781d nbd: inline tcp_socket_incoming_spec into sole caller
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Paolo Bonzini
77e8b9ca64 nbd: correctly propagate errors
Before:
    $ ./qemu-io-old
    qemu-io-old> open -r -o file.driver=nbd
    one of path and host must be specified.
    qemu-io-old: can't open device (null): Could not open image: Invalid argument
    $ ./qemu-io-old
    qemu-io-old> open -r -o file.driver=nbd,file.host=foo,file.path=bar
    path and host may not be used at the same time.
    qemu-io-old: can't open device (null): Could not open image: Invalid argument

After:
    $ ./qemu-io
    qemu-io> open -r -o file.driver=nbd
    qemu-io: can't open device (null): one of path and host must be specified.
    $ ./qemu-io
    qemu-io> open -r -o file.driver=nbd,file.host=foo,file.path=bar
    qemu-io: can't open device (null): path and host may not be used at the same time.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
f7d9fd8c72 block: Remove bdrv_open_image()'s force_raw option
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
2e40134bfd block: Make bdrv_file_open() static
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.

Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
ddf5636dc9 block: Add reference parameter to bdrv_open()
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
f67503e5bd block: Change BDS parameter of bdrv_open() to **
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:21 +01:00
Kevin Wolf
9e1cb96d9a qemu-iotests: Test pwritev RMW logic
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:25 +01:00
Kevin Wolf
8407d5d7e2 block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
6460440f34 block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
7327145f63 block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
2dbafdc012 block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
793ed47a7a block: Switch BdrvTrackedRequest to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
c25f53b06e raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
1b7fd72955 block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
339064d506 block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
355ef4ac95 block: Update BlockLimits when they might have changed
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
d34682cd4a block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Benoît Canet
212a5a8f09 block: Create authorizations mechanism for external snapshot and resize.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
12d3ba821d qmp: Allow to change password on named block driver states.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>

There was two candidate ways to implement named node manipulation:

1)
{ 'command': 'block_passwd', 'data': {'*device': 'str',
                                      '*node-name': 'str', 'password': 'str'}
}

2)

{ 'command': 'block_passwd', 'data': {'device': 'str',
                                      '*device-is-node': 'bool',
                                      'password': 'str'} }

Luiz proposed 1 and says 2 was an abuse of the QMP interface and proposed to
rewrite the QMP block interface for 2.0.

Luiz does not like in 1 the fact that 2 fields are optional but one of them must
be specified leading to an abuse of the QMP semantic.

Kevin argumented that 2 what a clear abuse of the device field and would not be
practical when reading fast some log file because the user would read "device"
and think that a device is manipulated when it's in fact a node name.
Documentation of 1 make it pretty clear what to do for the user.

Kevin argued that all bs are node including devices ones so 2 does not make
sense.

Kevin also argued that rewriting the QMP block interface would not make disapear
the current one.

Kevin pushed the argument that making the QAPI generator compatible with the
semantic of the operation would need a rewrite that no one has done yet.

A vote has been done on the list to elect the version to use and 1 won.

For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
c13163fba1 qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
dc364f4cdc block: Add bs->node_name to hold the name of a bs node of the bs graph.
Add the minimum of code to prepare for the following patches.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 14:33:01 +01:00
Max Reitz
da557aac18 block: Add bdrv_open_image()
Add a common function for opening images to be used for block drivers
specified through BlockdevRefs in an option QDict. The difference from
bdrv_file_open() is that this function may invoke bdrv_open() instead,
allowing auto-detection of the driver to be used; and second, it
automatically extracts the BlockdevRef from the option QDict.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz
72daa72eee block: Allow reference for bdrv_file_open()
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Fam Zheng
03544a6e9e block: Add commit_active_start()
commit_active_start is implemented in block/mirror.c, It will create a
job with "commit" type and designated base in block-commit command. This
will be used for committing active layer of device.

Sync mode is removed from MirrorBlockJob because there's no proper type
for commit. The used information is is_none_mode.

The common part of mirror_start and commit_active_start is moved to
mirror_start_job().

Fix the comment wording for commit_start.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-20 16:26:16 +01:00
Peter Lieven
7337acaf21 block: add opt_transfer_length to BlockLimits
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-05 11:45:24 +01:00
Wenchao Xia
8c116b0e41 qemu-nbd: support internal snapshot export
Now it is possible to directly export an internal snapshot, which
can be used to probe the snapshot's contents without qemu-img
convert.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-04 15:19:00 +01:00
Wenchao Xia
7b4c4781e3 snapshot: distinguish id and name in load_tmp
Since later this function will be used so improve it. The only caller of it
now is qemu-img, and it is not impacted by introduce function
bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp()
twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return
int to let caller know the errno, and errno will be used later.
Also fix a typo in comments of bdrv_snapshot_delete().

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-04 15:19:00 +01:00
Paolo Bonzini
d5ef94d43d block: add bdrv_aio_write_zeroes
This will be used by the SCSI layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
d20d9b7c67 block: add flags to BlockRequest
This lets bdrv_co_do_rw receive flags, so that it can be used for
zero writes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:48 +01:00
Marc-André Lureau
7b6b145dbc coroutine: remove unused CoQueue AioContext
The AioContext ctx field is apparently unused in qemu codebase since
02ffb50448.

Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-02 17:11:49 +01:00
Marc-André Lureau
f287c41381 coroutine: remove qemu_co_queue_wait_insert_head
qemu_co_queue_wait_insert_head() is unused in qemu code base now.

Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-02 17:11:49 +01:00
Fam Zheng
4cc70e9337 blkdebug: add "remove_break" command
This adds "remove_break" command which is the reverse of blkdebug
command "break": it removes all breakpoints with given tag and resumes
all the requests.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29 13:40:37 +01:00
Liu Yuan
b3af018f3b sheepdog: support user-defined redundancy option
Sheepdog support two kinds of redundancy, full replication and erasure coding.

# create a fully replicated vdi with x copies
 -o redundancy=x (1 <= x <= SD_MAX_COPIES)

# create a erasure coded vdi with x data strips and y parity strips
 -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP)

E.g, to convert a vdi into sheepdog vdi 'test' with 8:3 erasure coding scheme

$ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29 13:40:37 +01:00
Fam Zheng
21b5683508 qapi: Change BlockDirtyInfo to list
We have multiple dirty bitmaps in BDS now, switch QAPI to allow query
it (BlockInfo.dirty_bitmaps), and also drop old BlockInfo.dirty.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:36 +01:00
Fam Zheng
e4654d2d94 block: per caller dirty bitmap
Previously a BlockDriverState has only one dirty bitmap, so only one
caller (e.g. a block job) can keep track of writing. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the
lifecycle is managed with these new functions:

    bdrv_create_dirty_bitmap
    bdrv_release_dirty_bitmap

Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap.

In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:

    bdrv_get_dirty
    bdrv_dirty_iter_init
    bdrv_get_dirty_count

bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
internally walk the list of all dirty bitmaps and set them one by one.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:33 +01:00
Peter Lieven
d75cbb5e68 block: introduce bdrv_make_zero
this patch adds a call to completely zero out a block device.
the operation is sped up by checking the block status and
only writing zeroes to the device if they currently do not
return zeroes. optionally the zero writing can be sped up
by setting the flag BDRV_REQ_MAY_UNMAP to emulate the zero
write by unmapping if the driver supports it.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:52 +01:00
Peter Lieven
fe81c2cca6 block: add BlockLimits structure to BlockDriverState
this patch adds BlockLimits which introduces discard and write_zeroes
limits and alignment information to the BlockDriverState.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
4ce786914b block: add wrappers for logical block provisioning information
This adds 2 wrappers to read the unallocated_blocks_are_zero and
can_write_zeroes_with_unmap info from the BDI. The wrappers are
required to check for the existence of a backing_hd and
if the devices are opened with the correct flags.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
e1a5c4bed4 block: add logical block provisioning info to BlockDriverInfo
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
d32f35cbc5 block: introduce BDRV_REQ_MAY_UNMAP request flag
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
aa7bfbfff7 block: add flags to bdrv_*_write_zeroes
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
6faac15fa8 block: make BdrvRequestFlags public
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
MORITA Kazutaka
3ab7bd1917 coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Tested-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-30 12:22:09 +01:00
Kevin Wolf
b94a261057 block: Avoid unecessary drv->bdrv_getlength() calls
The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.

This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.

It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.

This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-29 13:10:26 +01:00
Benoît Canet
f6186f49e2 block: Add BlockDriver.bdrv_check_ext_snapshot.
This field is used by blkverify to disable external snapshots creation.
It will also be used by block filters like quorum to disable external
snapshot creation.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Peter Lieven
92bc50a5ad block/get_block_status: avoid redundant callouts on raw devices
if a raw device like an iscsi target or host device is used
the current implementation makes a second call out to get
the block status of bs->file.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz
a8d8ecb77f block/qapi: Human-readable ImageInfoSpecific dump
Add a function for generically dumping the ImageInfoSpecific information
in a human-readable format to block/qapi.c.

Use this function in bdrv_image_info_dump and qemu-io-cmds.c:info_f to
allow qemu-img info resp. qemu-io -c info to print that format specific
information.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Max Reitz
eae041fe6f block: Add bdrv_get_specific_info
Add a function for retrieving an ImageInfoSpecific object from a block
driver.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Fam Zheng
79e14bf778 qapi: make use of new BlockJobType
Switch the string to enum type BlockJobType in BlockJobDriver.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Fam Zheng
3fc4b10af0 blockjob: rename BlockJobType to BlockJobDriver
We will use BlockJobType as the enum type name of block jobs in QAPI,
rename current BlockJobType to BlockJobDriver, which will eventually
become a set of operations, similar to block drivers.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Benoît Canet
030be32184 block: introduce BlockDriver.bdrv_needs_filename to enable some drivers.
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.

The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.

The first exception to this rule will be the quorum driver.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Max Reitz
cc84d90ff5 block: Error parameter for create functions
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
34b5d2c68e block: Error parameter for open functions
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
d5124c00d8 bdrv: Use "Error" for creating images
Add an Error ** parameter to BlockDriver.bdrv_create to allow more
specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
015a1036a7 bdrv: Use "Error" for opening images
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:47 +02:00
Wenchao Xia
a89d89d3e6 snapshot: distinguish id and name in snapshot delete
Snapshot creation actually already distinguish id and name since it take
a structured parameter *sn, but delete can't. Later an accurate delete
is needed in qmp_transaction abort and blockdev-snapshot-delete-sync,
so change its prototype. Also *errp is added to tip error, but return
value is kepted to let caller check what kind of error happens. Existing
caller for it are savevm, delvm and qemu-img, they are not impacted by
introducing a new function bdrv_snapshot_delete_by_id_or_name(), which
check the return value and do the operation again.

Before this patch:
  For qcow2, it search id first then name to find the one to delete.
  For rbd, it search name.
  For sheepdog, it does nothing.

After this patch:
  For qcow2, logic is the same by call it twice in caller.
  For rbd, it always fails in delete with id, but still search for name
in second try, no change to user.

Some code for *errp is based on Pavel's patch.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:47 +02:00
Wenchao Xia
2ea1dd758c snapshot: new function bdrv_snapshot_find_by_id_and_name()
To make it clear about id and name in searching, add this API
to distinguish them. Caller can choose to search by id or name,
*errp will be set only for exception.

Some code are modified based on Pavel's patch.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:47 +02:00
Max Reitz
6f176b48f9 block: Image file option amendment
This patch adds the "amend" option to qemu-img which allows changing
image options on existing image files. It also adds the generic bdrv
implementation which is basically just a wrapper for the image format
specific function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Paolo Bonzini
4333bb7140 block: define get_block_status return value
Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-8
are left for future extensions.

The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior.  Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
b6b8a33354 block: introduce bdrv_get_block_status API
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.

This also touches all block drivers with a mostly mechanical rename.  The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
4f5786376e block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
bdad13b9de block: make bdrv_co_is_allocated static
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
4f6fd3491c block: make bdrv_delete() static
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.

This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
9fcb025146 block: implement reference count for BlockDriverState
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Benoît Canet
cc0681c454 block: Enable the new throttling code in the block layer.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Max Reitz
afa50193cd qcow2-refcount: Repair shared refcount blocks
If the refcount of a refcount block is greater than one, we can at least
try to repair that problem by duplicating the affected block.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-02 10:06:59 +02:00
Alex Bligh
7483d1e547 aio / timers: convert block_job_sleep_ns and co_sleep_ns to new API
Convert block_job_sleep_ns and co_sleep_ns to use the new timer
API.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:14:24 +02:00
Alex Bligh
4e29e8311a aio / timers: Add aio_timer_init & aio_timer_new wrappers
Add aio_timer_init and aio_timer_new wrapper functions.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:28 +02:00
Alex Bligh
dae21b98b9 aio / timers: Add QEMUTimerListGroup to AioContext
Add a QEMUTimerListGroup each AioContext (meaning a QEMUTimerList
associated with each clock is added) and delete it when the
AioContext is freed.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:27 +02:00
Alex Bligh
6a1751b7aa aio / timers: Untangle include files
include/qemu/timer.h has no need to include main-loop.h and
doing so causes an issue for the next patch. Unfortunately
various files assume including timers.h will pull in main-loop.h.
Untangle this mess.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:10:27 +02:00
Asias He
0d51b4debe block: Introduce bs->zero_beyond_eof
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).

We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.

[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:10:21 +02:00
Stefan Hajnoczi
f2e5dca46b aio: drop io_flush argument
The .io_flush() handler no longer exists and has no users.  Drop the
io_flush argument to aio_set_fd_handler() and related functions.

The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no
longer used and are dropped too.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:52:19 +02:00
Benoît Canet
b681a1c73e block: Repair the throttling code.
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-29 17:07:37 +02:00
Ian Main
fc5d3f8432 Implement sync modes for drive-backup.
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.

FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.

For sync mode TOP are creating a new target image using the same backing
file as the original disk image.  Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.

Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.

For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made.  This preserves the point in time data by only copying the parts
that are *going to change* to the target image.  This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image.  This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.

Since there is no old data to copy out the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case).  The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.

Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-07-26 22:01:31 +02:00
Peter Lieven
4105eaaab9 block: add bdrv_write_zeroes()
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Liu Ping Fan
dcc772e2f2 QEMUBH: make AioContext's bh re-entrant
BH will be used outside big lock, so introduce lock to protect
between the writers, ie, bh's adders and deleter. The lock only
affects the writers and bh's callback does not take this extra lock.
Note that for the same AioContext, aio_bh_poll() can not run in
parallel yet.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Kevin Wolf
f0f0fdfeec block: Add return value for bdrv_flush_all()
bdrv_flush() can fail, and bdrv_flush_all() should return an error as
well if this happens for a block device. It returns the first error
return now, but still at least tries to flush the remaining devices even
in error cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-15 09:51:27 +02:00
Kevin Wolf
98289620e0 block: Don't parse protocol from file.filename
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).

This allows opening relative file names that contain a colon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-07-15 09:49:00 +02:00
Peter Lieven
3ac216270a block: change default of .has_zero_init to 0
.has_zero_init defaults to 1 for all formats and protocols.

this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().

if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.

during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.

vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.

a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 13:52:35 +02:00
Dietmar Maurer
98d2c6f2cd block: add basic backup support to block driver
backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb
 * function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors
 * directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c
   tracked requests.  This means a little code duplication but is much
   simpler than trying to share the tracked requests list and use the
   backup block size.
 * Add on_source_error and on_target_error error handling.
 * Use trace events instead of DPRINTF()

-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:26 +02:00
Stefan Hajnoczi
d616b22474 block: add bdrv_add_before_write_notifier()
The bdrv_add_before_write_notifier() function installs a callback that
is invoked before a write request is processed.  This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.

Note that BdrvTrackedRequest is moved to block_int.h since it is passed
to .notify() functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:26 +02:00
Michael R. Hines
9f05d0c3a4 rdma: export yield_until_fd_readable()
The RDMA event channel can be made non-blocking just like a TCP
socket. Exporting this function allows us to yield so that the
QEMU monitor remains available.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Tested-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2013-06-27 02:38:36 +02:00
Wenchao Xia
553a7e8718 qmp: add ImageInfo in BlockDeviceInfo used by query-block
Now image info will be retrieved as an embbed json object inside
BlockDeviceInfo, backing chain info and all related internal snapshot
info can be got in the enhanced recursive structure of ImageInfo. New
recursive member *backing-image is added to reflect the backing chain
status.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-07 13:37:45 +02:00
Wenchao Xia
43526ec8d1 block: add image info query function bdrv_query_image_info()
This patch adds function bdrv_query_image_info(), which will
retrieve image info in qmp object format. The implementation is
based on the code moved from qemu-img.c, but uses block layer
function to get snapshot info.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-07 13:37:45 +02:00
Wenchao Xia
fb0ed4539c block: add snapshot info query function bdrv_query_snapshot_info_list()
This patch adds function bdrv_query_snapshot_info_list(), which will
retrieve snapshot info of an image in qmp object format. The implementation
is based on the code moved from qemu-img.c with modification to fit more
for qmp based block layer API.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-07 13:37:45 +02:00
Kevin Wolf
bf736fe34c blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-06 11:27:22 +02:00