The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.
This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.
It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.
This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 0ebd24e0a2,
bdrv_open_common will throw an error when trying to open a file
read-only with the BDRV_O_COPY_ON_READ flag set.
Although BDRV_O_RDWR is unset for the backing files,
BDRV_O_COPY_ON_READ is still passed on if copy-on-read was requested
for the drive. Let's unset this flag too before opening the backing
file, or bdrv_open_common will fail.
Signed-off-by: Thibaut LAURENT <thibaut.laurent@gmail.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open_backing_file() tries to copy the backing file name using
pstrcpy directly after calling bdrv_open() to open the backing file
without checking whether that was actually successful. If it was not,
ps->backing_hd->file will probably be NULL and qemu will crash.
Fix this by moving pstrcpy after checking whether bdrv_open() succeeded.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If a read-only device is configured with copy-on-read=on, the old code
only prints a warning and automatically disables copy on read. Make it
a real error for blockdev-add.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The main intent of this patch is to consolidate the whitelist checks to
a single point in the code instead of spreading it everywhere. This adds
a nicer error message for read-only whitelisting, too, in places where
it was still missing.
The patch also contains a bonus bug fix: By finding the format first in
bdrv_open() and then independently checking against the whitelist only
later, we avoid the case that use of a non-whitelisted format results in
probing rather than an error message. Previously, this could happen when
using the driver=... option.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
This field is used by blkverify to disable external snapshots creation.
It will also be used by block filters like quorum to disable external
snapshot creation.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if a raw device like an iscsi target or host device is used
the current implementation makes a second call out to get
the block status of bs->file.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add a function for retrieving an ImageInfoSpecific object from a block
driver.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The content filename point to may be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()
So it's better to use bs->filename.
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The content filename point to will be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()
So it's better to use bs->filename.
Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.
The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.
The first exception to this rule will be the quorum driver.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if the call is invoked through bdrv_is_allocated the caller might
expect *pnum = 0 on error. however, a new implementation of
bdrv_get_block_status might only return a negative exit value on
error while keeping *pnum untouched.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Providing backing.file.filename doesn't override backing file as expected:
$ x86_64-softmmu/qemu-system-x86_64 -drive \
file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2
qemu-system-x86_64: -drive \
file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2: could not
open disk image /tmp/child.qcow2: Can't specify 'file' and 'filename'
options at the same time
With
$ qemu-img info /tmp/child.qcow2
image: /tmp/child.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 196K
cluster_size: 65536
backing file: /tmp/fake.qcow2
This fixes it by calling bdrv_get_full_backing_filename only if
backing.file.filename is not provided. Also save the backing file name
to bs->backing_file so the information is correct with HMP "info block".
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To read the last sector that is not aligned to sector boundary, current
code for growable backends, since commit 893a8f6 "block: Produce zeros
when protocols reading beyond end of file", drops the data and directly
returns zeroes. That is incorrect.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch adds the "amend" option to qemu-img which allows changing
image options on existing image files. It also adds the generic bdrv
implementation which is basically just a wrapper for the image format
specific function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Protocols return raw data, so you can assume the offsets to pass
through unchanged.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the sectors are unallocated and we are past the end of the
backing file, they will read as zero.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Alternatively, this could use a "discard zeroes data" flag returned
by bdrv_get_info.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Define the return value of get_block_status. Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-8
are left for future extensions.
The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior. Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.
This also touches all block drivers with a mostly mechanical rename. The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This helps implementing is_allocated on top of get_block_status.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.
Fix the callers to always look for errors.
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is more robust when the device has removable media.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a BlockDriverState is growable, after every write we need to
check if bs->total_sectors might have changed. With this change,
bdrv_getlength does not need anymore a system call.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The failing condition is checked immediately before the assertion, so
keeping the assertion is kind of redundant.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is an autogenerated patch using scripts/switch-timer-api.
Switch the entire code base to using the new timer API.
Note this patch may introduce some line length issues.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
While Asias is debugging an issue creating qcow2 images on top of
non-file protocols. It boils down to this example using NBD:
$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
Notice the open -g option to set bs->growable. This means you can
read/write beyond end of file. Reading beyond end of file is supposed
to produce zeroes.
We rely on this behavior in qcow2_create2() during qcow2 image
creation. We create a new file and then write the qcow2 header
structure using bdrv_pwrite(). Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).
Here is the output from the qemu-io NBD example above:
$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
00000000: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
00000010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
00000020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................
...
We are not zeroing the buffer! As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.
[Adapted this patch to use bs->zero_beyond_eof.
-- Stefan]
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).
We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.
[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush(). This is used to spin
during synchronous I/O.
Stop relying on .io_flush() and instead check
QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active
requests.
This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O. Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.
Note this patch moves bs->throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it. In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.
We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain. This is necessary since the new bdrv_drain_all() traverses the
backing file chain.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In bdrv_delete() make sure to call bdrv_make_anon() *after* bdrv_close()
so that the device is still seen by bdrv_drain_all() when iterating
bdrv_states.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_flush() can fail, and bdrv_flush_all() should return an error as
well if this happens for a block device. It returns the first error
return now, but still at least tries to flush the remaining devices even
in error cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).
This allows opening relative file names that contain a colon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Since 80ccf93b we flush the block device during close. The
bdrv_drain_all() call should come before bdrv_flush() to ensure guest
write requests have completed. Otherwise we may miss pending writes
when flushing.
Call bdrv_drain_all() again for safety as the final step after
bdrv_flush(). This should not be necessary but we can be paranoid here
in case bdrv_flush() left I/O pending.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>