If the sectors are unallocated and we are past the end of the
backing file, they will read as zero.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Alternatively, this could use a "discard zeroes data" flag returned
by bdrv_get_info.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Define the return value of get_block_status. Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-8
are left for future extensions.
The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior. Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.
This also touches all block drivers with a mostly mechanical rename. The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This helps implementing is_allocated on top of get_block_status.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.
Fix the callers to always look for errors.
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is more robust when the device has removable media.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a BlockDriverState is growable, after every write we need to
check if bs->total_sectors might have changed. With this change,
bdrv_getlength does not need anymore a system call.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The failing condition is checked immediately before the assertion, so
keeping the assertion is kind of redundant.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
While Asias is debugging an issue creating qcow2 images on top of
non-file protocols. It boils down to this example using NBD:
$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
Notice the open -g option to set bs->growable. This means you can
read/write beyond end of file. Reading beyond end of file is supposed
to produce zeroes.
We rely on this behavior in qcow2_create2() during qcow2 image
creation. We create a new file and then write the qcow2 header
structure using bdrv_pwrite(). Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).
Here is the output from the qemu-io NBD example above:
$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
00000000: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
00000010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
00000020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
...
We are not zeroing the buffer! As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.
[Adapted this patch to use bs->zero_beyond_eof.
-- Stefan]
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).
We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.
[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush(). This is used to spin
during synchronous I/O.
Stop relying on .io_flush() and instead check
QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active
requests.
This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O. Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.
Note this patch moves bs->throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it. In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.
We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain. This is necessary since the new bdrv_drain_all() traverses the
backing file chain.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In bdrv_delete() make sure to call bdrv_make_anon() *after* bdrv_close()
so that the device is still seen by bdrv_drain_all() when iterating
bdrv_states.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_flush() can fail, and bdrv_flush_all() should return an error as
well if this happens for a block device. It returns the first error
return now, but still at least tries to flush the remaining devices even
in error cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).
This allows opening relative file names that contain a colon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Since 80ccf93b we flush the block device during close. The
bdrv_drain_all() call should come before bdrv_flush() to ensure guest
write requests have completed. Otherwise we may miss pending writes
when flushing.
Call bdrv_drain_all() again for safety as the final step after
bdrv_flush(). This should not be necessary but we can be paranoid here
in case bdrv_flush() left I/O pending.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
.has_zero_init defaults to 1 for all formats and protocols.
this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().
if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.
during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.
vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.
a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_add_before_write_notifier() function installs a callback that
is invoked before a write request is processed. This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.
Note that BdrvTrackedRequest is moved to block_int.h since it is passed
to .notify() functions.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Turning on discard options in qcow2 doesn't help a lot when the discard
requests that it issues are thrown away by the raw-posix layer. This
patch always enables discard functionality on the protocol level so that
it's the image format's responsibility to send (or not) discard
requests. Requests sent by the guest will be allowed or ignored by the
top level BlockDriverState, which depends on the discard=... option like
before.
In particular, this means that even without specifying options, the
qcow2 default of discarding deleted snapshots actually takes effect now,
both for qemu and qemu-img.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The call to drv->bdrv_reopen_prepare() can fail due to reasons
other than an open failure. Unfortunately, we can't use errno
nor -ret, cause they are not always set.
Stick to a generic error message then.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
This patch is a pure code move patch, except following modification:
1 get_human_readable_size() is changed to static function.
2 dump_human_image_info() is renamed to bdrv_image_info_dump().
3 in qmp_query_block() and qmp_query_blockstats, use bdrv_next(bs)
instead of direct traverse of global array 'bdrv_states'.
4 collect_snapshots() and collect_image_info() are renamed, unused parameter
*fmt in collect_image_info() is removed.
5 code style fix.
To avoid conflict and tip better, macro in header file is BLOCK_QAPI_H
instead of QAPI_H. Now block.h and snapshot.h are at the same level in
include path, block_int.h and qapi.h will both include them.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All snapshot related code, except bdrv_snapshot_dump() and
bdrv_is_snapshot(), is moved to block/snapshot.c. bdrv_snapshot_dump()
will be moved to another file later. bdrv_is_snapshot() is not related
with internal snapshot. It also fixes small code style errors reported
by check script.
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bs_snapshots global variable points to the BlockDriverState which
will be used to save vmstate. This is really a savevm.c concept but was
moved into block.c:bdrv_snapshots() when it became clear that hotplug
could result in a dangling pointer.
While auditing the block layer's global state I came upon bs_snapshots
and realized that a variable is not necessary here. Simply find the
first BlockDriverState capable of internal snapshots each time this is
needed.
The behavior of bdrv_snapshots() is preserved across hotplug because new
drives are always appended to the bdrv_states list. This means that
calling the new find_vmstate_bs() function is idempotent - it returns
the same BlockDriverState unless it was hot-unplugged.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We may want to include a driver in the whitelist for read only tasks
such as diagnosing or exporting guest data (with libguestfs as a good
example). This patch introduces a readonly whitelist option, and for
backward compatibility, the old configure option --block-drv-whitelist
is now an alias to rw whitelist.
Drivers in readonly list is only permitted to open file readonly, and
returns -ENOTSUP for RW opening.
E.g. To include vmdk readonly, and others read+write:
./configure --target-list=x86_64-softmmu \
--block-drv-rw-whitelist=qcow2,raw,file,qed \
--block-drv-ro-whitelist=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The limit of qcow2 files at least depends on the cluster size. If the
image format has a cluster_size option, suggest to increase it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
filename was still uninitialised when it's used as a parameter to a
tracing function, so let's move the initialisation. Also, commit c2ad1b0c
forgot to add a NULL check, which this patch adds while we're at it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1366645720-11384-1-git-send-email-kwolf@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If a filename is passed in the driver-specific options from the command
line, the backing file path from the image is ignored now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This allows using the file.filename option instead of the string that
comes from -drive file=... and is passed around as a separate parameter.
The goal is to get rid of this parameter and use the options QDict more
consistently.
With this option you can access not only the top-level image, but
specify a filename for the backing file (currently only if no backing
file exists, but we'll allow overriding it later)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Options starting in "backing." are passed to the backing file now. If
you don't need to specify the filename for the backing file, you can add
it on the command line instead of in the image file:
$ qemu-nbd -t /tmp/test.img
$ qemu-img create -f qcow2 empty.qcow2 1G
$ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\
backing.file.host=localhost
Note that this doesn't override the backing filename from the image. If
the image has one, this will fail because NBD doesn't want the options
and a filename at the same time.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Specifying the wrong driver could fail an assertion:
$ qemu-system-x86_64 -drive file.driver=qcow2,file=x
qemu-system-x86_64: block.c:721: bdrv_open_common: Assertion `file !=
((void *)0)' failed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Directly pass the QEMUIOVector on instead of linearising it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The wait_time variable is in seconds. Reflect this in a comment and use
NANOSECONDS_PER_SECOND instead of BLOCK_IO_SLICE_TIME * 10 (which
happens to have the right value).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The current slice is extended when an I/O request exceeds the limit.
There is no need to extend the slice every time we check a request.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not necessary to adjust the slice time at runtime. We already
extend the current slice in order to carry over accounting into the next
slice. Changing the actual slice time value introduces oscillations.
The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I/O throttling relies on bdrv_acct_done() which is called when a request
completes. This leaves a blind spot since we only charge for completed
requests, not submitted requests.
For example, if there is 1 operation remaining in this time slice the
guest could submit 3 operations and they will all be submitted
successfully since they don't actually get accounted for until they
complete.
Originally we probably thought this is okay since the requests will be
accounted when the time slice is extended. In practice it causes
fluctuations since the guest can exceed its I/O limit and it will be
punished for this later on.
Account for I/O upon submission so that I/O limits are enforced
properly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open_common() implements direct use of protocols by copying the
pre-opened BlockDriverStates to bs using bdrv_swap(). It did however
first set some fields in bs, which end up in file after the swap. When
bdrv_open() destroys file, it appears to be open, and because it isn't,
qemu could segfault while trying to close it.
Reorder the operations to return immediately in such cases so that file
is correctly detected as closed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.
In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.
Now an NBD connection can be opened like this:
qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_open() uses two different variables called options. Rename one of
them to avoid confusion and to allow the outer one to be accessed
everywhere.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Specify -drive file.option=... on the command line to pass the option to
the protocol instead of the format driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
brdv_truncate() is also called from readv/writev commands on self-
growing file based storage. this will result in requests waiting
for theirselves to complete.
This reverts commit 9a665b2b86.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
realpath(3) is used to get an absolute path to the image file when
creating a -drive snapshot=on temporary qcow2. This does not work for
protocols since their filenames ("proto:foo:...") do not correspond to
file system paths.
Commit 7c96d46ec2 ("Let snapshot work with
protocols") skipped realpath(3) for protocols. Later on the "raw"
format was introduced and broke the check.
Use path_has_protocol(filename) to decide if this image uses a protocol
or a filename.
Reported-by: Richard Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.
Add the stub now so that the block layer can begin accessing its
AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The options are passed down to the block drivers, which are supposed to
remove all options they have processed. Anything that is left over in
the end is an unknown option and results in an error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
During a commit of 'all' using the HMP non-live commit, the operation
is aborted and returns error on the first error enountered. When
non-COW drives are in use (e.g. ejected floppy, cdrom, or drives without
a backing parent), that means a commit all will return an error of either
-ENOMEDIUM or -ENOTSUP. This is not desirable, so for the 'all' commit
case, only attempt the commit if both bs->drv and bs->backing_hd are
present.
More succinctly: 'commit all' now means a commit on all COW drives.
This means an individual commit to a specific non-COW drive will still
return the appropriate error (-ENOMEDIUM if eject / not present, -ENOTSUP
if no backing file).
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is better to present homogeneous hardware independent of the storage
technology that is chosen on the host, hence we make discard a host
parameter; the user can choose whether to pass it down to the image
format and protocol, or to ignore it.
Using DISCARD with filesystems can cause very severe fragmentation, so it
is left default-off for now. This can change later when we implement the
"anchor" operation for efficient management of preallocated files.
There is still one choice to make: whether DISCARD has an effect on the
dirty bitmap or not. I chose yes, though there is a disadvantage: if
the guest is buggy and issues discards for data that is in use, there
will be no way to migrate storage for that guest without downgrading
the machine type to an older one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_truncate() invalidates the bdrv_check_request() result for
in-flight requests, so there should better be none.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There's no synchronous wrapper for bdrv_co_is_allocated_above function
so it's not possible to check for sector allocation in an image with
a backing file.
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In an image chain, if the base image is smaller than the current
image, we need to make sure to use the current images count of
unallocated blocks once we get to the end of the base image. Without
this change the code will return 0 blocks when it gets to the end
of the base image and mirror_run will fail its assertion.
Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is needed in the following patch.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Note that resetting bits in the dirty bitmap is done _before_ actually
processing the request. Writes, instead, set bits after the request
is completed.
This way, when there are concurrent write and discard requests, the
outcome will always be that the blocks are marked dirty. This scenario
should never happen, but it is safer to do it this way.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_io_limits_enable() starts a new slice, but does not set io_base
correctly for that slice.
Here is how io_base is used:
bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write];
bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
if (bytes_base + bytes_res <= bytes_limit) {
/* no wait */
} else {
/* operation needs to be throttled */
}
As a result, any I/O operations that are triggered between now and
bs->slice_end are incorrectly limited. If 10 MB of data has been
written since the VM was started, QEMU thinks that 10 MB of data has
been written in this slice. This leads to a I/O lockup in the guest.
We fix this by delaying the start of a new slice to the next
call of bdrv_exceed_io_limits().
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements. This is needed by
virtio-blk-data-plane so:
1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A blank CD or DVD is visible as a zero-sized disks. Probing such
disks will lead to an EIO and a failure to start the VM. Treating
them as raw is a better solution.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.
[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This makes the blkdebug suspend/resume functionality available in
qemu-io. Use it like this:
$ ./qemu-io blkdebug::/tmp/test.qcow2
qemu-io> break write_aio req_a
qemu-io> aio_write 0 4k
qemu-io> blkdebug: Suspended request 'req_a'
qemu-io> resume req_a
blkdebug: Resuming request 'req_a'
qemu-io> wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit adds an Error ** argument to bdrv_img_create() and set it
appropriately on error.
Callers of bdrv_img_create() pass NULL for the new argument and still
rely on bdrv_img_create()'s return value. Next commits will change
callers to use the Error object instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes problems that are caused by the additional open/close cycle
of the existing format probing, for example related to qemu-nbd without
-t option or file descriptor passing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of waiting for all requests to complete, wait just for the
specific request that should be cancelled.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The local string tmp_filename is passed to function get_tmp_filename
which expects a string with minimum size MAX_PATH for w32 hosts.
MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short.
Commit eba25057b9 introduced this
regression.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Now that AIOPool no longer keeps a freelist, it isn't really a "pool"
anymore. Rename it to AIOCBInfo and make it const since it no longer
needs to be modified.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
AIO control blocks are frequently acquired and released because each aio
request involves at least one AIOCB. Therefore, we pool them to avoid
heap allocation overhead.
The problem with the freelist approach in AIOPool is thread-safety. If
we want BlockDriverStates to associate with AioContexts that execute in
multiple threads, then a global freelist becomes a problem.
This patch drops the freelist and instead uses g_slice_alloc() which is
tuned for per-thread fixed-size object pools. qemu_aio_get() and
qemu_aio_release() are now thread-safe.
Note that the change from g_malloc0() to g_slice_alloc() should be safe
since the freelist reuse case doesn't zero the AIOCB either.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony: (32 commits)
osdep: Less restrictive F_SEFL in qemu_dup_flags()
qemu-iotests: add testcases for mirroring on-source-error/on-target-error
qmp: add pull_event function
mirror: add support for on-source-error/on-target-error
iostatus: forward block_job_iostatus_reset to block job
qemu-iotests: add mirroring test case
mirror: implement completion
qmp: add drive-mirror command
mirror: introduce mirror job
block: introduce BLOCK_JOB_READY event
block: add block-job-complete
block: rename block_job_complete to block_job_completed
block: export dirty bitmap information in query-block
block: introduce new dirty bitmap functionality
block: add bdrv_open_backing_file
block: add bdrv_query_stats
block: add bdrv_query_info
qemu-config: Add new -add-fd command line option
monitor: Prevent removing fd from set during init
monitor: Enable adding an inherited fd to an fd set
...
Conflicts:
vl.c
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Mirroring runs without the backing file so that it can be copied outside
QEMU. However, we need to add it at the time the job is completed and
QEMU switches to the target. Factor out the common bits of opening an
image and completing a mirroring operation.
The new function does not assume that the file is closed immediately after
it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qmp_query_blockstat cannot have errors, remove the Error argument and
create a new public function bdrv_query_stats out of it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, bdrv_find_backing_image compares bs->backing_file with
what is passed in as a backing_file name. Mismatches may occur,
however, when bs->backing_file and backing_file are not both
absolute or relative.
Use path_combine() to make sure any relative backing filenames are
relative to the current image filename being searched, and then use
realpath() to make all comparisons based on absolute filenames.
If either backing_file or bs->backing_file is determine to be a
protocol, then no filename normalization is performed.
This also changes bdrv_find_backing_image to no longer be recursive,
but iterative.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The first user of close notifiers will be the embedded NBD server.
It would be possible to use them to do some of the ad hoc processing
(e.g. for block jobs and I/O limits) that is currently done by
bdrv_close.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no reason in principle to skip job cancellation and draining
of pending I/O when there is no medium in the disk. Do these unconditionally,
which also prepares the code for the next patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>