Although bdrv_getlength() was just called above this, and checked for
error, it is better to just use the value we already get, and use
DIV_ROUND_UP.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if the iscsi driver receives a write zeroes request with
the BDRV_REQ_MAY_UNMAP flag set it fails with -ENOTSUP
if the iscsi target does not support WRITE SAME with
UNMAP. However, the BDRV_REQ_MAY_UNMAP is only a hint
and writing zeroes with WRITE SAME will still be
better than falling back to writing zeroes with WRITE16.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* remotes/qmp-unstable/queue/qmp:
monitor: fix qmp_getfd() fd leak in error case
HMP: support specifying dump format for dump-guest-memory
HMP: fix doc of dump-guest-memory
qmp: object-add: Validate class before creating object
monitor: Add device_add and device_del completion.
monitor: Add command_completion callback to mon_cmd_t.
monitor: Fix drive_del id argument type completion.
error: Remove some unused headers
qerror.h: Replace QERR_NOT_SUPPORTED with QERR_UNSUPPORTED
qerror.h: Remove QERR defines that are only used once
qerror.h: Remove unused error classes
error: Print error_report() to stderr if using qmp
monitor: Remove unused monitor_print_filename
error: Privatize error_print_loc
vnc: Remove default_mon usage
slirp: Remove default_mon usage
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Using error_is_set(errp) that way can sweep programming errors under
the carpet when we get called incorrectly with an error set.
Commit 24d3bd6 added a broken error path to iscsi_do_inquiry(): it
first calls error_setg(), then jumps to the preexisting error label,
where error_setg() gets called again, triggering an assertion failure.
Commit cbee81f fixed this by guarding the second error_setg() with an
error_is_set().
Replace this fix by a simpler and safer one: jump right behind the
second error_setg().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using error_is_set(errp) to check whether a function call failed is
fragile: it breaks when errp is null. Check perfectly suitable return
values instead when possible. errp can't be null there now, but this
is more robust and more obviously correct
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Commit 84d18f0 dumbed
it down to obvious, but a few more have crept in since, and
documentation was overlooked. Dumb these down, too.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Just hardcode them in the callers
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
PRIu32 is the format string specifier for uint32_t, let's use it.
Variables ->block_size, ->n_blocks, and i are all uint32_t.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.
Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This patch converts fprintf() calls to error_setg() in block/qed.c:bdrv_qed_create()
(error_setg() is part of error reporting API in include/qapi/error.h)
Signed-off-by: Aakriti Gupta <aakritty@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The old check was off by a factor of 512 and didn't consider cases where
we don't get an exact division. This could lead to an out-of-bounds
array access in seek_to_sector().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
When qcow2_get_cluster_offset() sees a zero cluster in a version 2
image, it (rightfully) returns an error. But in doing so it shouldn't
leak an L2 table cache reference.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
If lazy refcounts are enabled for a backing file, committing to this
backing file may leave it in a dirty state even if the commit succeeds.
The reason is that the bdrv_flush() call in bdrv_commit() doesn't flush
refcount updates with lazy refcounts enabled, and qcow2_reopen_prepare()
doesn't take care to flush metadata.
In order to fix this, this patch also fixes qcow2_mark_clean(), which
contains another ineffective bdrv_flush() call beause lazy refcounts are
disabled only afterwards. All existing callers of qcow2_mark_clean()
either don't modify refcounts or already flush manually, so that this
fixes only a latent, but not yet actually triggerable bug.
Another instance of the same problem is live snapshots. Again, a real
corruption is prevented by an explicit flush for non-read-only images in
external_snapshot_prepare(), but images using lazy refcounts stay dirty.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This eliminates the possible assertion failure in error_setg().
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max WRITE SAME length is also used when the UNMAP bit is zero, so it
should be queried even if LBPWS=0. Same for the optimal transfer
length.
However, the write_zeroes_alignment only matters for UNMAP=1 so we
still restrict it to LBPWS=1.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Non-block SCSI devices do not support flushing, but we may still send
them requests via bdrv_flush_all. Just ignore them.
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some targets may return "invalid field" as the ASCQ from WRITE SAME
if they support the command only without the UNMAP field. Recognize
that, and return ENOTSUP just like for "invalid operation code".
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
preallocate() only links the first QCowL2Meta's data clusters into the
L2 table and ignores any chained QCowL2Metas in the linked list.
Chains of QCowL2Meta structs are built up when contiguous clusters span
L2 tables. Each QCowL2Meta describes one L2 table update. This is a
rare case in preallocate() but can happen.
This patch fixes preallocate() by iterating over the whole list of
QCowL2Metas. Compare with the qcow2_co_writev() function's
implementation, which is similar but also also handles request
dependencies. preallocate() only performs one allocation at a time so
there can be no dependencies.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids a possible division by zero.
Convert s->tracks to unsigned as well because it feels better than
surviving just because the results of calculations with s->tracks are
converted to unsigned anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The first test case would cause a huge memory allocation, leading to a
qemu abort; the second one to a too small malloc() for the catalog
(smaller than s->catalog_size), which causes a read-only out-of-bounds
array access and on big endian hosts an endianess conversion for an
undefined memory area.
The sample image used here is not an original Parallels image. It was
created using an hexeditor on the basis of the struct that qemu uses.
Good enough for trying to crash the driver, but not for ensuring
compatibility.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Even with a limit of 64k snapshots, each snapshot could have a filename
and an ID with up to 64k, which would still lead to pretty large
allocations, which could potentially lead to qemu aborting. Limit the
total size of the snapshot table to an average of 1k per entry when
the limit of 64k snapshots is fully used. This should be plenty for any
reasonable user.
This also fixes potential integer overflows of s->snapshot_size.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids an unbounded allocation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For the L1 table to loaded for an internal snapshot, the code allocated
only enough memory to hold the currently active L1 table. If the
snapshot's L1 table is actually larger than the current one, this leads
to a buffer overflow.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The qcow2 code assumes that s->snapshots is non-NULL if s->nb_snapshots
!= 0. By having the initialisation of both fields separated in
qcow2_open(), any error occuring in between would cause the error path
to dereference NULL in qcow2_free_snapshots() if the image had any
snapshots.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
bs->total_sectors is not the highest possible sector number that could
be involved in a copy on write operation: VM state is after the end of
the virtual disk. This resulted in wrong values for the number of
sectors to be copied (n).
The code that checks for the end of the image isn't required any more
because the code hasn't been calling the block layer's bdrv_read() for a
long time; instead, it directly calls qcow2_readv(), which doesn't error
out on VM state sector numbers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Both compressed and uncompressed I/O is buffered. dmg_open() calculates
the maximum buffer size needed from the metadata in the image file.
There is currently a buffer overflow since ->lengths[] is accounted
against the maximum compressed buffer size but actually uses the
uncompressed buffer:
switch (s->types[chunk]) {
case 1: /* copy */
ret = bdrv_pread(bs->file, s->offsets[chunk],
s->uncompressed_chunk, s->lengths[chunk]);
We must account against the maximum uncompressed buffer size for type=1
chunks.
This patch fixes the maximum buffer size calculation to take into
account the chunk type. It is critical that we update the correct
maximum since there are two buffers ->compressed_chunk and
->uncompressed_chunk.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The DMG metadata is stored as uint64_t, so use the same type for
sector_num. int was a particularly poor choice since it is only 32-bit
and would truncate large values.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Chunk length and sectorcount are used for decompression buffers as well
as the bdrv_pread() count argument. Ensure that they have reasonable
values so neither memory allocation nor conversion from uint64_t to int
will cause problems.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Use the right types instead of signed int:
size_t new_size;
This is a byte count for g_realloc() that is calculated from uint32_t
and size_t values.
uint32_t chunk_count;
Use the same type as s->n_chunks, which is used together with
chunk_count.
This patch is a cleanup and does not fix bugs.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It is not necessary to check errno for EINTR and the block layer does
not produce short reads. Therefore we can drop the loop that attempts
to read a compressed chunk.
The loop is buggy because it incorrectly adds the transferred bytes
twice:
do {
ret = bdrv_pread(...);
i += ret;
} while (ret >= 0 && ret + i < s->lengths[chunk]);
Luckily we can drop the loop completely and perform a single
bdrv_pread().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When a terminator is reached the base for offsets and sectors is stored.
The following records that are processed will use this base value.
If the first record we encounter is a terminator, then calculating the
base values would result in out-of-bounds array accesses. Don't do
that.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Clean up the mix of tabs and spaces, as well as the coding style
violations in block/dmg.c. There are no semantic changes since this
patch simply reformats the code.
This patch is necessary before we can make meaningful changes to this
file, due to the inconsistent formatting and confusing indentation.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The size in bytes is assigned to an int later, so check that instead of
the number of entries.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In order to avoid integer overflows.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the size becomes larger than what qcow2_open() would accept, fail the
growing operation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This ensures that the checks catch all invalid cluster indexes
instead of returning the refcount of a wrong cluster.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
free_cluster_index is only correct if update_refcount() was called from
an allocation function, and even there it's brittle because it's used to
protect unfinished allocations which still have a refcount of 0 - if it
moves in the wrong place, the unfinished allocation can be corrupted.
So not using it any more seems to be a good idea. Instead, use the
first requested cluster to do the calculations. Return -EAGAIN if
unfinished allocations could become invalid and let the caller restart
its search for some free clusters.
The context of creating a snapsnot is one situation where
update_refcount() is called outside of a cluster allocation. For this
case, the change fixes a buffer overflow if a cluster is referenced in
an L2 table that cannot be represented by an existing refcount block.
(new_table[refcount_table_index] was out of bounds)
[Bump the qemu-iotests 026 refblock_alloc.write leak count from 10 to
11.
--Stefan]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
len could become negative and would pass the check then. Nothing bad
happened because bdrv_pread() happens to return an error for negative
length values, but make variables for sizes unsigned anyway.
This patch also changes the behaviour to error out on invalid lengths
instead of silently truncating it to 1023.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoids an unbounded allocation.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This avoid unbounded memory allocation and fixes a potential buffer
overflow on 32 bit hosts.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The end of the refcount table must not exceed INT64_MAX so that integer
overflows are avoided.
Also check for misaligned refcount table. Such images are invalid and
probably the result of data corruption. Error out to avoid further
corruption.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Limit the in-memory reference count table size to 8 MB, it's enough in
practice. This fixes an unbounded allocation as well as a buffer
overflow in qcow2_refcount_init().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Header, header extension and the backing file name must all be stored in
the first cluster. Setting the backing file to a much higher value
allowed header extensions to become much bigger than we want them to be
(unbounded allocation).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes an unbounded allocation for s->unknown_header_fields.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
curl_read_cb is callback function for libcurl when data arrives. The
data size passed in here is not guaranteed to be within the range of
request we submitted, so we may overflow the guest IO buffer. Check the
real size we have before memcpy to buffer to avoid overflow.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Other variables (e.g. sectors_per_block) are calculated using these
variables, and if not range-checked illegal values could be obtained
causing infinite loops and other potential issues when calculating
BAT entries.
The 1.00 VHDX spec requires BlockSize to be min 1MB, max 256MB.
LogicalSectorSize is required to be either 512 or 4096 bytes.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The maximum blocks_in_image is 0xffffffff / 4, which also limits the
maximum disk_size for a VDI image to 1024TB. Note that this is the maximum
size that QEMU will currently support with this driver, not necessarily the
maximum size allowed by the image format.
This also fixes an incorrect error message, a bug introduced by commit
5b7aa9b56d (Reported by Stefan Weil)
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes some cases of division by zero crashes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds checks to make sure that max_table_entries and block_size
are in sane ranges. Memory is allocated based on max_table_entries,
and block_size is used to calculate indices into that allocated
memory, so if these values are incorrect that can lead to potential
unbounded memory allocation, or invalid memory accesses.
Also, the allocation of the pagetable is changed from g_malloc0()
to qemu_blockalign().
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
32 bit truncation could let us access the wrong offset in the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This fixes two possible division by zero crashes: In bochs_open() and in
seek_to_sector().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It should neither become negative nor allow unbounded memory
allocations. This fixes aborts in g_malloc() and an s->catalog_bitmap
buffer overflow on big endian hosts.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Gets us rid of integer overflows resulting in negative sizes which
aren't correctly checked.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This is an on-disk structure, so offsets must be accurate.
Before this patch, sizeof(bochs) != sizeof(header_v1), which makes the
memcpy() between both invalid. We're lucky enough that the destination
buffer happened to be the larger one, and the memcpy size to be taken
from the smaller one, so we didn't get a buffer overflow in practice.
This patch unifies the both structures, eliminating the need to do a
memcpy in the first place. The common fields are extracted to the top
level of the struct and the actually differing part gets a union of the
two versions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
cloop stores the number of compressed blocks in the n_blocks header
field. The file actually contains n_blocks + 1 offsets, where the extra
offset is the end-of-file offset.
The following line in cloop_read_block() results in an out-of-bounds
offsets[] access:
uint32_t bytes = s->offsets[block_num + 1] - s->offsets[block_num];
This patch allocates and loads the extra offset so that
cloop_read_block() works correctly when the last block is accessed.
Notice that we must free s->offsets[] unconditionally now since there is
always an end-of-file offset.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The offsets[] array allows efficient seeking and tells us the maximum
compressed data size. If the offsets are bogus the maximum compressed
data size will be unrealistic.
This could cause g_malloc() to abort and bogus offsets mean the image is
broken anyway. Therefore we should refuse such images.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Limit offsets_size to 512 MB so that:
1. g_malloc() does not abort due to an unreasonable size argument.
2. offsets_size does not overflow the bdrv_pread() int size argument.
This limit imposes a maximum image size of 16 TB at 256 KB block size.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The following integer overflow in offsets_size can lead to out-of-bounds
memory stores when n_blocks has a huge value:
uint32_t n_blocks, offsets_size;
[...]
ret = bdrv_pread(bs->file, 128 + 4, &s->n_blocks, 4);
[...]
s->n_blocks = be32_to_cpu(s->n_blocks);
/* read offsets */
offsets_size = s->n_blocks * sizeof(uint64_t);
s->offsets = g_malloc(offsets_size);
[...]
for(i=0;i<s->n_blocks;i++) {
s->offsets[i] = be64_to_cpu(s->offsets[i]);
offsets_size can be smaller than n_blocks due to integer overflow.
Therefore s->offsets[] is too small when the for loop byteswaps offsets.
This patch refuses to open files if offsets_size would overflow.
Note that changing the type of offsets_size is not a fix since 32-bit
hosts still only have 32-bit size_t.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Avoid unbounded s->uncompressed_block memory allocation by checking that
the block_size header field has a reasonable value. Also enforce the
assumption that the value is a non-zero multiple of 512.
These constraints conform to cloop 2.639's code so we accept existing
image files.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The mirror blockjob coroutine rate-limits itself by sleeping. The
coroutine also performs I/O asynchronously so it's important that the
aio callback doesn't wake the coroutine early as that breaks
rate-limiting.
Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The throttling delay calculation was using an inaccurate sector count to
calculate the time to sleep. This broke rate-limiting for the block
mirror job.
Move the delay calculation into mirror_iteration() where we know how
many sectors were transferred. This lets us calculate an accurate delay
time.
Reported-by: Joaquim Barrera <jbarrera@ac.upc.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
if an NFS operation fails we should report what libnfs knows
about the failure. It is likely more than just an error code.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If qcow2_alloc_clusters() fails, new_offset and ret will both be
negative after the fail label, thus passing the first if condition and
subsequently resulting in a call of qcow2_free_clusters() with an
invalid (negative) offset parameter. Fix this by introducing a new label
"fail_free_cluster" which is only invoked if new_offset is indeed
pointing to a newly allocated cluster that should be cleaned up by
freeing it.
While we're at it, clean up the whole fail path. qcow2_cache_put()
should (and actually can) never fail, hence the return value can safely
be ignored (aside from asserting that it indeed did not fail).
Furthermore, there is no reason to give QCOW2_DISCARD_ALWAYS to
qcow2_free_clusters(), a mere QCOW2_DISCARD_OTHER will suffice.
Ultimately, rename the "fail" label to "done", as it is invoked both on
failure and success.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Contrary to the comment describing this function's behavior, it does not
return 0 on success, but rather the offset of the newly allocated
cluster. This patch adjusts the comment accordingly to reflect the
actual behavior.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
nbd_receive_reply() is called by the event loop whenever data is
available or the socket has been closed by the remote side.
This patch closes the socket when an error occurs to prevent the
nbd_receive_reply() handler from being called indefinitely after the
connection has failed.
Note that we were already correctly returning EIO for pending requests
but leaving the nbd_receive_reply() handler registered resulted in high
CPU consumption and a flood of error messages.
Reuse nbd_teardown_connection() to close the socket.
Reported-by: Zhifeng Cai <bluewindow@h3c.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
On 32-bit hosts, some compilers will warn on too large integer constants
for constants that are 64-bit in length. Explicitly put a 'ULL' suffix
on those defines.
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_device" protocol driver should strip the "host_device:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The hdev_create() implementation in block/raw-posix.c is used by the
"host_device", "host_cdrom" and "host_floppy" protocol block drivers
together. Thus, any of the associated prefixes may occur and exactly one
should should be stripped, if it does (thus,
"host_device:host_cdrom:/dev/cdrom" is not shortened to "/dev/cdrom").
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_cdrom" protocol drivers should strip the "host_cdrom:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_floppy" protocol driver should strip the "host_floppy:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The "host_device" protocol driver should strip the "host_device:" prefix
from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qcow2_open() causes writes when repairing an image with the dirty flag
set and when clearing autoclear flags. It shouldn't do this when another
qemu instance is still actively working on this image file.
One effect of the bug is that images may have a cleared dirty flag while
the migration source host still has it in use with lazy refcounts
enabled, so refcounts are not accurate and the dirty flag must remain
set.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of manually building a list of all options from BDRVQcowState
values just reuse the options that were used to open the image.
qcow2_open() won't fully use all of the options in the QDict, but that's
okay.
This fixes all of the driver-specific options in qcow2, except for
lazy-refcounts, which was special cased before.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.
They will need to do two things:
-Set the is_filter field of their block driver to true.
-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)
[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
copy_sectors() should check whether that pointer is indeed valid, since
it may have been set to NULL by e.g. a concurrent write triggering the
corruption prevention mechanism.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After migration has completed, we call bdrv_invalidate_cache() so that
drivers which cache some data drop their stale copy of the data and
reread it from the image file to get a new version of data that the
source modified while the migration was running.
Reloading metadata from the image file is useless, though, if the size
of the image file stays stale (this is a value that is cached for all
image formats in block.c). Reads from (meta)data after the old EOF
return only zeroes, causing image corruption.
We need to update bs->total_sectors in all layers that could potentially
have changed their size (i.e. backing files are not a concern - if they
are changed, we're in bigger trouble)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When reading the refcount table entry in get_refcount(), only bits which
are actually significant for the refcount block offset should be taken
into account.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The bdrv_create() implementation of the block/raw-win32 "file" protocol
driver should strip the "file:" prefix from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The "file" protocol driver should strip the "file:" prefix from
filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_create() implementation of the block/raw-posix "file" protocol
driver should strip the "file:" prefix from filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The "file" protocol driver should strip the "file:" prefix from
filenames if present.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Originally, this built up the error message with the backing filename,
so that errp was set as follows:
error_set(errp, QERR_OPEN_FILE_FAILED, backing_filename);
However, we now propagate the local_error from the
bdrv_open_backing_file() call instead, making these 2 lines useless
code.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The current iscsi block driver code makes the rather arbitrary decision
that TYPE_MEDIUM_CHANGER and TYPE_TAPE devices have bs->sg = 1 and all
other device types are disks.
Instead of this, check for TYPE_DISK to expose the disk interface and
make everything else bs->sg = 1. In particular, this includes devices
with TYPE_STORAGE_ARRAY, which is what LUN 0 of an iscsi target is.
(See https://bugzilla.redhat.com/show_bug.cgi?id=1067784 for the exact
scenario.)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the definitions of GLUSTER_FD_WRITE and GLUSTER_FD_READ which are
no longer used. Also sockets.h isn't needed any more.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pipe handling mechanism in gluster driver was based on similar implementation
in RBD driver and hence had GPLv2 and associated copyright information.
After changing gluster driver to coroutine based implementation, the pipe
handling code no longer exists and hence change gluster driver's licence to
GPLv2+ and remove RBD copyrights.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit adccfbcd60 (block: gluster - add
reopen support.) did not supply the qemu_gluster_init() Error **
argument, needed since commit a7451cb850
(gluster: correctly propagate errors).
Pass through qemu_gluster_reopen_prepare()'s errp, as done in
qemu_gluster_open().
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTENUMAAoJEJykq7OBq3PIXoQH/R7Q9CqASrBEfbT1GJDSr3E5
yq0ZZ4nJBhhn0f8tOw9NWGCmTH1UZn0u4PbYEj4jTGJyWY2POkCKLPRvurge4c+Q
L+OwzspW14O35o7WuBC8uWvSxh8ExaiwwrutCsIsz68UAcnoiwyTMSKghIy9+Osk
lJKRFz6IFbgxCocqeNQrilj84vYSv9VzXZY8/nIhDMFsbjaKW7sKwu4jnXB7+v5J
cA0+voafINNICNMeXzkqBbKwWIFtYOkPe287MZQNbpS2vCS2bsZeivDxsBJ26Cmx
Xho8Xz5utvBMBrNtidHCTsd2jrj0VBdGgksYoHPyIwK/gBr1EVTGKpebraKXBEU=
=rgw9
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
Block pull request
# gpg: Signature made Fri 28 Feb 2014 18:27:24 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
block/vmdk: do not report file offset for compressed extents
discard rbd error output when not relevant in qemu-iotests
block: use /var/tmp instead of /tmp for -snapshot
qemu-io-test: Disable Quorum test when not compiled in.
qmp: Make Quorum error events more palatable.
qmp: Fix BlockdevOptionQuorum.
block: gluster - add reopen support.
block: gluster - code movements, state storage changes
qemu-iotests: add more tests to the "quick" group
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/bonzini/scsi-next:
block/iscsi: fix segfault if writesame fails
scsi-disk: Add support for port WWN and index descriptors in VPD page 83h
block/iscsi: query for supported VPD pages
block/iscsi: fix deadlock on scsi check condition
scsi-bus: Fix transfer length for VERIFY with BYTCHK=11b
scsi: report thin provisioning errors with werror=report
scsi: Change scsi sense buf size to 252
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Insert quorum QMP events documentation alphabetically.
Also change the "ret" errno value by an optional "error" being an strerror(-ret)
in the QUORUM_REPORT_BAD qmp event.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Gluster does parse open flags in its .bdrv_open() implementation,
and the .bdrv_reopen_* implementations need to do the same.
A new gluster connection to the image file to be created is established
in the .bdrv_reopen_prepare(), and the image file opened with the new
flags.
If this is successful, then the old image file is closed, and the
old connection torn down. The relevant structure pointers in the gluster
state structure are updated to the new connection.
If it is not successful, then the new file handle and connection is
abandoned (if it exists), while the old connection is not modified at
all.
With reopen supported, block-commit (and offline commit) is now also
supported for image files whose base image uses the native gluster
protocol driver.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In preparation for supporting reopen on gluster, move flag
parsing out to a function. Also, add a NULL check in the
gconf cleanup.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTB8hAAAoJEH8JsnLIjy/WQdgP/jEu5baA1/qKanDsS9l+81u1
/sIYSWpHDEJ0uavqTMBeyMOwkzel7SZRusIwA/d5pMqxbY6/86YJumTTozFWvtqc
IABqHtRKCxjcLdZRPbkuNAOiw6p76vSZa543o2t8OAhK2DIFy530wWXeoQEYvuJX
4pOh0lTradOrF1z6uW4ozgQ1efPppwh/iqwfWWNJVTgfnWxJk6qQaATEgkuSdsUN
Wp78UzOxLGO6JKJB6kP3LfNL0ANTYHpfH2/wkE6cW6TkSUduOm6hIBY+tb9khqYt
INOKxqFADK6EOgjvJBsZuZUtOnHK5oM921LepN/mOPAs6gKcn2j+FfqJrl3I1/5M
AXM3M0FPuijEKPGWw7pCLt7j84KJkD9a/rsKO37yRzw17fOma2Rpr4TrX43BF+5t
CGqQ7PzDJ6Fng4EXjyNDzviwXIK8xmG1tfn92tq/BUd6OuM9MCyzEGvEiGOMBoXv
w4iOV7UC+1P3TjnTBhMlBVGywSfdOJoHr9k4lXGNp0h8fPhM9rfruI3BFysxaas6
GmKbd7yvKwXOTptd3I9SB8BzVUL3CcD3FK24+cWKAl8GgyiDIWRlvBYyMp3p8Z8f
NDzcxYP6aRGsoddvpIWr3Tz89hw5wTW5u3RmNgxJUguz6HYKFbl30dpGT+96q2BN
YIAANTdPxn7BP6r3glQH
=ZaDG
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 21 Feb 2014 21:42:24 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (54 commits)
iotests: Mixed quorum child device specifications
quorum: Simplify quorum_open()
quorum: Add unit test.
quorum: Add quorum_open() and quorum_close().
quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.
quorum: Add quorum_co_flush().
quorum: Add quorum_invalidate_cache().
quorum: Add quorum_getlength().
quorum: Add quorum mechanism.
quorum: Add quorum_aio_readv.
blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
quorum: Add quorum_aio_writev and its dependencies.
quorum: Create BDRVQuorumState and BlkDriver and do init.
quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.
check-qdict: Test termination of qdict_array_split()
check-qdict: Adjust test for qdict_array_split()
qdict: Extract non-QDicts in qdict_array_split()
qemu-config: Sections must consist of keys
qemu-iotests: Check qemu-img command line parsing
qemu-img: Allow -o help with incomplete argument list
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* remotes/bonzini/configure:
build: softmmu targets do not have a "main.o" file
configure: Disable libtool if -fPIE does not work with it (bug #1257099)
block: convert block drivers linked with libs to modules
Makefile: introduce common-obj-m and block-obj-m for DSO
Makefile: install modules with "make install"
module: implement module loading
rules.mak: introduce DSO rules
darwin: do not use -mdynamic-no-pic
block: use per-object cflags and libs
rules.mak: allow per object cflags and libs
rules.mak: fix $(obj) to a real relative path
util: Split out exec_dir from os_find_datadir
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit fa6252b0 introduced a segfault because it tries
to read iTask.task->sense after iTask.task has been
freed.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
this patch ensures that we only query for block provisioning and
block limits vpd pages if they are advertised. It also cleans
up the inquiry code and eliminates some redundant code.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
the retry logic was broken because the complete status
of the task structure was not reset. this resulted in
an infinite loop retrying the command over and over.
CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Although it may not look like it, this patch simplifies quorum_open().
qdict_array_split() is now able to return QLists with different objects
than only QDicts, therefore it will now do all the work and
quorum_open() does not have to handle reference strings by itself.
This allows mixing full option dicts and reference strings for
specifying the child block devices of quorum; furthermore, it improves
handling of malformed specifications.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Example of command line:
-drive if=virtio,driver=quorum,\
children.0.file.filename=1.raw,\
children.0.node-name=1.raw,\
children.0.driver=raw,\
children.1.file.filename=2.raw,\
children.1.node-name=2.raw,\
children.1.driver=raw,\
children.2.file.filename=3.raw,\
children.2.node-name=3.raw,\
children.2.driver=raw,\
vote-threshold=2
blkverify=on with vote-threshold=2 and two files can be passed to
emulate blkverify.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is used to activate quorum snapshot.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Makes a vote to select error if any.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We really want that live migration works with quorum so implement
quorum_invalidate_cache().
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Check that every bs file returns the same length.
Otherwise, return -EIO to disable the quorum and
avoid length discrepancy.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patchset enables the core of the quorum mechanism.
The num_children reads are compared to get the majority version and if this
version exists more than threshold times the guest won't see the error at all.
If a block is corrupted or if an error occurs during an IO or if the quorum
cannot be established QMP events are used to report to the management.
Use gnutls's SHA-256 to compare versions.
--enable-quorum must be used to enable the feature.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add code to do num_children reads in parallel and cleanup the structures
afterwards.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_iovec_compare() will be used to compare IOs vectors in quorum blkverify
mode. The patch extracts these functions in order to factorize the code.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Writes are mirrored num_children times on num_children devices.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Create the structure holding the quorum settings and write the minimal block
driver instanciation boilerplate.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Quorum is a block filter mirroring writes to num_children children.
For reads quorum reads each children and does a vote.
If more than vote_threshold versions are identical the quorum is reached and
this winning version is returned to the guest. So quorum prevents bit corruption.
For high availability purpose minority errors are reported via QMP but the guest
does not see them.
This patch creates the driver C source file and introduces the structures that
will be used in asynchronous reads and writes.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just putting it in debugging output, we can now put the
value in an Error.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Returning "Wrong medium type" for an image that does not have a valid
header is a bit weird. Improve the error by mentioning what format
was trying to open it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we can return the "right" errors, use the Error** parameter
to pass them back instead of just printing them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This prepares for propagating errors from vmdk_open_sparse and
vmdk_open_desc_file up to the caller of vmdk_open.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, we just try reading a VMDK file as both image and descriptor.
This makes it hard to choose which of the two attempts gave the best error.
We'll decide in advance if the file looks like an image or a descriptor,
and this patch is the first step to that end.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before:
$ ./qemu-io-old
qemu-io-old> open -r -o driver=vvfat,fat-type=24,dir=i386-softmmu
Valid FAT types are only 12, 16 and 32
qemu-io-old: can't open device (null): Could not open image: Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o driver=vvfat,fat-type=24,dir=i386-softmmu
qemu-io: can't open device (null): Valid FAT types are only 12, 16 and 32
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, "gluster:///volname/img" and (using file. options)
"file.driver=gluster,file.filename=foo" will segfault. Also,
"//host/volname/img" will be rejected, but it is a valid URL
that should be accepted just fine with "file.driver=gluster".
Accept all of these, by inferring missing transport and host
as TCP and localhost respectively.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before:
$ ./qemu-io-old
qemu-io-old> open -r -o file.driver=iscsi,file.filename=foo
Failed to parse URL : foo
qemu-io-old: can't open device (null): Could not open 'foo': Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o file.driver=iscsi,file.filename=foo
qemu-io: can't open device (null): Failed to parse URL : foo
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before:
$ ./qemu-io-old
qemu-io-old> open -r -o file.driver=nbd
one of path and host must be specified.
qemu-io-old: can't open device (null): Could not open image: Invalid argument
$ ./qemu-io-old
qemu-io-old> open -r -o file.driver=nbd,file.host=foo,file.path=bar
path and host may not be used at the same time.
qemu-io-old: can't open device (null): Could not open image: Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o file.driver=nbd
qemu-io: can't open device (null): one of path and host must be specified.
$ ./qemu-io
qemu-io> open -r -o file.driver=nbd,file.host=foo,file.path=bar
qemu-io: can't open device (null): path and host may not be used at the same time.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before:
$ qemu-io-old
qemu-io-old> open -r -o file.driver=nbd
qemu-io-old: can't open device (null): Could not open image: Invalid argument
$ ./qemu-io-old
qemu-io-old> open -r -o file.driver=nbd,file.host=foo,file.path=bar
path and host may not be used at the same time.
qemu-io-old: can't open device (null): Could not open image: Invalid argument
After:
$ ./qemu-io
qemu-io> open -r -o file.driver=nbd
one of path and host must be specified.
qemu-io: can't open device (null): Could not open image: Invalid argument
$ ./qemu-io
qemu-io> open -r -o file.driver=nbd,file.host=foo,file.path=bar
path and host may not be used at the same time.
qemu-io: can't open device (null): Could not open image: Invalid argument
Next patch will fix the error propagation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.
Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of making the backing file contents visible again after a discard
request, set the zero flag if possible (i.e. on version >= 3).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
No longer adds flags and libs for them to global variables, instead
create config-host.mak variables like FOO_CFLAGS and FOO_LIBS, which is
used as per object cflags and libs.
This removes unwanted dependencies from libcacard.
Signed-off-by: Fam Zheng <famz@redhat.com>
[Split from Fam's patch to enable modules. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Dumb it down to
obvious.
Gets rid of several dozen Coverity false positives.
Note that the obvious form is already used in many places.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
When starting a block job, commit_active_start() relies on whether *errp
is set by mirror_start_job. This allows it to determine if the mirror
job start failed, so that it can clean up any changes to open flags from
the bdrv_reopen(). If errp is NULL, then it will not be able to
determine if mirror_start_job failed or not.
To avoid this, use a local Error variable, and then propagate the error
(if any) to errp.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
There are a handful of places in the block layer where a failure path
has a valid -errno value, yet error_setg() is used. Those instances
should instead use error_setg_errno(), to preserve as much error
information as possible.
This patch replaces those instances with error_setg_errno(), so that
errno is passed up the stack in the error message.
Reported-By: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
iSCSI currently does not need to do any actions to support the
current usage of bdrv_reopen(). However, it is important to note
a couple of things: 1.) A connection will not be re-established to
an iSCSI target, and 2.) If iscsi_open() is changed to parse 'flags',
then iscsi_reopen_prepare() may need to be more than a stub.
In light of the above, this commit adds comments above both of the
functions to bring attention to these facts.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
raw copies over the BlockLimits of bs->file during bdrv_open().
However, since commit d34682cd it is immediately overwritten during
bdrv_refresh_limits(). This caused all fields except for
opt_transfer_length and opt_mem_alignment (which happen to be correctly
inherited in generic code) to be zeroed.
Move the BlockLimit assignment to a .bdrv_refresh_limits() callback to
make it work again for all fields.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
In the case of a metadata preallocation with a large cluster size,
qcow2_alloc_cluster_offset() can allocate nothing and returns a
NULL l2meta. This patch checks for it and link2 l2 with only valid
l2meta.
Replace 9 and 512 with BDRV_SECTOR_BITS, BDRV_SECTOR_SIZE
respectively while at the function.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When cluster size is big enough it can lead to an offset overflow
in qcow2_alloc_clusters_at(). This patch fixes it.
The allocation is stopped each time at L2 table boundary
(see handle_alloc()), so the possible maximum bytes could be
2^(cluster_bits - 3 + cluster_bits)
cluster_bits - 3 is used to compute the number of entry by L2
and the additional cluster_bits is to take into account each
clusters referenced by the L2 entries.
so int is safe for cluster_bits<=17, unsafe otherwise.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
n_start can be actually calculated from offset. The number of
sectors to be allocated(n_end - n_start) can be passed in in
num. By removing n_start and n_end, we can save two parameters.
The side effect is there is a bug in qcow2.c:preallocate() that
passes incorrect n_start to qcow2_alloc_cluster_offset() is
fixed. The bug can be triggerred by a larger cluster size than
the default value(65536), for example:
./qemu-img create -f qcow2 \
-o 'cluster_size=131072,preallocation=metadata' file.img 4G
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
the opt_transfer_length has nothing to do with logical
block provisioning stuff so always copy it from
the block limits VPD page.
Reported-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds native support for accessing images on NFS
shares without the requirement to actually mount the entire
NFS share on the host.
NFS Images can simply be specified by an url of the form:
nfs://<host>/<export>/<filename>[?param=value[¶m2=value2[&...]]]
For example:
qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
You need LibNFS from Ronnie Sahlberg available at:
git://github.com/sahlberg/libnfs.git
for this to work.
During configure it is automatically probed for libnfs and support
is enabled on-the-fly. You can forbid or enforce libnfs support
with --disable-libnfs or --enable-libnfs respectively.
Due to NFS restrictions you might need to execute your binaries
as root, allow them to open priviledged ports (<1024) or specify
insecure option on the NFS server.
For additional information on ROOT vs. non-ROOT operation and URL
format + parameters see:
https://raw.github.com/sahlberg/libnfs/master/README
Supported by qemu are the uid, gid and tcp-syncnt URL parameters.
LibNFS currently support NFS version 3 only.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Errors are inadvertently ignored in a few places. Has always been
broken. Spotted by Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
this adds a basic vmdk corruption check. it should detect severe
table corruptions and file truncation.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The QCOW2 .bdrv_make_empty implementation always returns 0 for success,
but does not actually do anything.
The proper way to not support an optional driver function stub is to
just not implement it, so let's remove the stub.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The QED .bdrv_make_empty() implementation does nothing but return
-ENOTSUP, which causes problems in bdrv_commit(). Since the function
stub exists for QED, it is called, which then always returns an error.
The proper way to not support an optional driver function stub is to
just not implement it, so let's remove the stub.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* bonzini/scsi-next:
scsi: Support TEST UNIT READY in the dummy LUN0
block: add .bdrv_reopen_prepare() stub for iscsi
virtio-scsi: Prevent assertion on missed events
virtio-scsi: Cleanup of I/Os that never started
scsi: Assign cancel_io vector for scsi_disk_emulate_ops
Conflicts:
block/iscsi.c
aliguori: resolve trivial merge conflict in block/iscsi.c
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
The new 'align' option of blkdebug can be used in order to emulate
backends with a required 4k alignment on hosts which only really require
512 byte alignment.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The iSCSI backend already gets the block size from the READ CAPACITY
command it sends. Save it so that the generic block layer gets it
too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.
While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
If the top image to commit is the active layer, and also larger than
the base image, then an I/O error will likely be returned during
block-commit.
For instance, if we have a base image with a virtual size 10G, and a
active layer image of size 20G, then committing the snapshot via
'block-commit' will likely fail.
This will automatically attempt to resize the base image, if the
active layer image to be committed is larger.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
libcurl versions 7.16.0 and later have a timer callback interface which
must be implemented in order for libcurl to make forward progress (it
will sometimes rely on being called back on the timeout if there are
no file descriptors registered). Implement the callback, and use a
QEMU AIO timer to ensure we prod libcurl again when it asks us to.
Based on Peter's original patch plus my fix to add curl_multi_timeout_do.
Should compile just fine even on older versions of libcurl.
I also tried copy-on-read and streaming:
$ ./qemu-img create -f qcow2 -o \
backing_file=http://download.fedoraproject.org/pub/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso \
foo.qcow2 1G
$ x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,file=foo.qcow2,copy-on-read=on,id=cd \
-device ide-cd,drive=cd --enable-kvm -m 1024
Direct http usage is probably too slow, but with copy-on-read ultimately
the image does boot!
After some time, streaming gets canceled by an EIO, which needs further
investigation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently there is no way to query BlockStats of the backing chain. This
adds "backing" field into BlockStats to make it possible.
The comment of "parent" is reworded.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the function mirror_iteration() -> qemu_iovec_init(),
it allocates memory for op->qiov.iov, when the write request calls back,
but in the function mirror_iteration_done(), it only frees the op,
not free the op->qiov.iov, so this causes memory leak.
It should use qemu_iovec_destroy() to free op->qiov.
Signed-off-by: Zhang Min <rudy.zhangmin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It was muted in the previous commit 4bc74be9. Let's revive it since nothing
prevents us to do it.
With this patch, following command will work as other formats:
$ qemu-img map sheepdog:image
Cc: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Accoring to qcow spec, the offset fields in l1e, l2e and ref table entry
start at bit 9. The offset is cluster offset, and the smallest possible
cluster size is 512 bytes.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the filename is not prefixed by "blkverify:" in
blkverify_parse_filename(), the blkverify driver was not selected
through that protocol prefix, but by an explicit command line (or QMP)
option (like driver=blkverify).
If blkverify_parse_filename() has been called, a filename has been
given. If it is not prefixed, it is probably really just a plain
filename. This is no problem, since we can use it as the test image
filename and rely on the user to specify the raw image filename through
the new corresponding option.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce the "test" and "raw" options for specifying images.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce the "image" option as an alternative to specifying the image
through the filename.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use qemu_config_parse_qdict() to parse the command-line options in
addition to the config file.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the check whether there actually is a config file into the
read_config() function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the filename is not prefixed by "blkdebug:" in
blkdebug_parse_filename(), the blkdebug driver was not selected through
that protocol prefix, but by an explicit command line option
(file.driver=blkdebug or something similar). Contrary to the current
reaction, this is not a problem at all; we just need to store the
filename (in the x-image option) and can go on; the user just has to
manually specify the config option.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use an Error variable in the read_config() function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Local variable "n" as int64_t avoids overflow with large sector number
calculation. See test case change for failure case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should pass base_inode->vdi_id to base_vdi_id of SheepdogVdiReq so that sheep
can create a clone instead a fresh volume.
This fixes following command:
qemu-create -b sheepdog:base sheepdog:clone
so users can boot sheepdog:clone as a normal volume.
Cc: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
GlusterFS supports creation of zero-filled file on GlusterFS volume
by means of an API called glfs_zerofill(). Use this API from QEMU to
create an image that is filled with zeroes by using the preallocation
option of qemu-img.
qemu-img create gluster://server/volume/image -o preallocation=full 10G
The allowed values for preallocation are 'full' and 'off'. By default
preallocation is off and image is not zero-filled.
glfs_zerofill() offloads the writing of zeroes to the server and if
the storage supports SCSI WRITESAME, GlusterFS server can issue
BLKZEROOUT ioctl to achieve the zeroing.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Support .bdrv_co_write_zeroes() from gluster driver by using GlusterFS API
glfs_zerofill() that off-loads the writing of zeroes to GlusterFS server.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Convert the read, write, flush and discard implementations from aio-based
ones to coroutine based ones.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
if an async libiscsi call fails directly it can only be due
to an out of memory condition. All other errors are returned
through the callback.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
rbd callbacks are called from non-QEMU threads. Up until now a pipe was
used to signal completion back to the QEMU iothread.
The pipe writer code handles EAGAIN using select(2). The select(2) API
is not scalable since fd_set size is static. FD_SET() can write beyond
the end of fd_set if the file descriptor number is too high. (QEMU's
main loop uses poll(2) to avoid this issue with select(2).)
Since the pipe itself is quite clumsy to use and QEMUBH is now
thread-safe, just schedule a BH from the rbd callback function. This
way we can simplify I/O completion in addition to eliminating the
potential FD_SET() crash when file descriptor numbers become too high.
Crash scenario: QEMU already has 1024 file descriptors open. Hotplug an
rbd drive and get the pipe writer to take the select(2) code path.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Tested-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To suppport reopen(), the .bdrv_reopen_prepare() stub must exist.
iSCSI does not have anything that needs to be done to support reopen,
so we can just implement the _prepare() stub.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* bonzini/scsi-next:
scsi-disk: add UNMAP limits to block limits VPD page
block/iscsi: use a bh to schedule co reentrance
Message-id: 1387720926-11421-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
This is a boiler-plate _nofail variant of qemu_opts_create. Remove and
use error_abort in call sites.
null/0 arguments needs to be added for the id and fail_if_exists fields
in affected callsites due to argument inconsistency between the normal and
no_fail variants.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
We support top == active for commit now, remove the check and add an
assertion here.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If active is top, it will be mirrored to base, (with block/mirror.c
code), then the image is switched when user completes the block job.
QMP documentation is updated.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>