Commit Graph

2073 Commits

Author SHA1 Message Date
Peter Wu
f6e6652d7c block/dmg: validate chunk size to avoid overflow
Previously the chunk size was not checked, allowing for a large memory
allocation. This patch checks whether the chunks size is within the
resource fork length, and whether the resource fork is below the
trailer of the dmg file.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-6-git-send-email-peter@lekensteyn.nl
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Wu
7aee37b93a block/dmg: process a buffer instead of reading ints
As the decoded plist XML is not a pointer in the file,
dmg_read_mish_block must be able to process a buffer instead of a file
pointer. Since the full buffer must be processed, let's change the
return value again to just a success flag.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-5-git-send-email-peter@lekensteyn.nl
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Wu
b0e8dc5d54 block/dmg: extract processing of resource forks
Besides the offset, also read the resource length. This length is now
used in the extracted function to verify the end of the resource fork
against "count" from the resource fork.

Instead of relying on the value of offset to conclude whether the
resource fork is available or not (info_begin==0), check the
rsrc_fork_length instead. This would allow a dmg file to begin with a
resource fork. This seemingly unnecessary restriction was found while
trying to craft a DMG file by hand.

Other changes:

 - Do not require resource data offset to be 0x100 (but check that it
   is within bounds though).
 - Further improve boundary checking (resource data must be within
   the resource fork).
 - Use correct value for resource data length (spotted by John Snow)
 - Consider the resource data offset when determining info_end.
   This fixes an EINVAL on the tuxpaint dmg example.

The resource fork format is documented at
https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-4-git-send-email-peter@lekensteyn.nl
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Wu
65a1c7c96a block/dmg: extract mish block decoding functionality
Extract the mish block decoder such that this can be used for other
formats in the future. A new DmgHeaderState struct is introduced to
share state while decoding.

The code is kept unchanged as much as possible, a "fail" label is added
for example where a simple return would probably do. In dmg_open, the
variable "tmp" is renamed to "rsrc_data_offset" for clarity and comments
have been added explaining various data.

Note that this patch has one subtle difference with the previous
version which should not affect functionality. In the previous code,
the end of a resource was inferred from the mish block (the offsets
would be increased by the fields). In this patch, the resource length
is used instead to avoid the need to rely on the previous offsets.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1420566495-13284-3-git-send-email-peter@lekensteyn.nl
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Wu
fa8354bd22 block/dmg: properly detect the UDIF trailer
DMG files have a variable length with a UDIF trailer at the end of a
file. This UDIF trailer is essential as it describes the contents of
the image. At the moment however, the start of this trailer is almost
always incorrect as bdrv_getlength() returns a multiple of the block
size (rounded up). This results in a failure to recognize DMG files,
resulting in Invalid argument (EINVAL) errors.

As there is no API to retrieve the real file size, look for the magic
header in the last two sectors to find the start of this 512-byte UDIF
trailer (the "koly" block).

The resource fork offset ("info_begin") has its offset adjusted as the
initial value of offset does not mean "end of file" anymore, but "begin
of UDIF trailer".

[Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with
error_setg(errp, ...) as discussed with Peter.
--Stefan]

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1420566495-13284-2-git-send-email-peter@lekensteyn.nl
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Francesco Romani
e2462113b2 block: add event when disk usage exceeds threshold
Managing applications, like oVirt (http://www.ovirt.org), make extensive
use of thin-provisioned disk images.
To let the guest run smoothly and be not unnecessarily paused, oVirt sets
a disk usage threshold (so called 'high water mark') based on the occupation
of the device,  and automatically extends the image once the threshold
is reached or exceeded.

In order to detect the crossing of the threshold, oVirt has no choice but
aggressively polling the QEMU monitor using the query-blockstats command.
This lead to unnecessary system load, and is made even worse under scale:
deployments with hundreds of VMs are no longer rare.

To fix this, this patch adds:
* A new monitor command `block-set-write-threshold', to set a mark for
  a given block device.
* A new event `BLOCK_WRITE_THRESHOLD', to report if a block device
  usage exceeds the threshold.
* A new `write_threshold' field into the `BlockDeviceInfo' structure,
  to report the configured threshold.

This will allow the managing application to use smarter and more
efficient monitoring, greatly reducing the need of polling.

[Updated qemu-iotests 067 output to add the new 'write_threshold'
property. --Stefan]
[Changed g_assert_false() to !g_assert() to fix the build on older glib
versions. --Kevin]

Signed-off-by: Francesco Romani <fromani@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Lieven
454057b7d9 block-backend: expose bs->bl.max_transfer_length
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Lieven
f4564d53c6 block: add accounting for merged requests
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Fam Zheng
35f5a49374 qed: Really remove unused field QEDAIOCB.finished
The commit 533ffb17a that removed qed_aiocb_info.cancel said to remove
this but didn't do it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Denis V. Lunev
1cdc3239f1 block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes
This sequence works efficiently if FALLOC_FL_ZERO_RANGE is not supported.
Unfortunately, FALLOC_FL_ZERO_RANGE is supported on really modern systems
and only for a couple of filesystems. FALLOC_FL_PUNCH_HOLE is much more
mature.

The sequence of 2 operations FALLOC_FL_PUNCH_HOLE and 0 is necessary due
to the following reasons:
- FALLOC_FL_PUNCH_HOLE creates a hole in the file, the file becomes
  sparse. In order to retain original functionality we must allocate
  disk space afterwards. This is done using fallocate(0) call
- fallocate(0) without preceeding FALLOC_FL_PUNCH_HOLE will do nothing
  if called above already allocated areas of the file, i.e. the content
  will not be zeroed

This should increase the performance a bit for not-so-modern kernels.

CC: Max Reitz <mreitz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Denis V. Lunev
d50d822219 block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes
There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply call
fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

CC: Max Reitz <mreitz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Denis V. Lunev
b953f07500 block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes
This efficiently writes zeroes on Linux if the kernel is capable enough.
FALLOC_FL_ZERO_RANGE correctly handles all cases, including and not
including file expansion.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Denis V. Lunev
37cc9f7f68 block/raw-posix: refactor handle_aiocb_write_zeroes a bit
move code dealing with a block device to a separate function. This will
allow to implement additional processing for ordinary files.

Please note, that xfs_code has been moved before checking for
s->has_write_zeroes as xfs_write_zeroes does not touch this flag inside.
This makes code a bit more consistent.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Denis V. Lunev
0b99171230 block/raw-posix: create do_fallocate helper
The pattern
    do {
        if (fallocate(s->fd, mode, offset, len) == 0) {
            return 0;
        }
    } while (errno == EINTR);
    ret = translate_err(-errno);
will be commonly useful in next patches. Create helper for it.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Denis V. Lunev
1486df0e31 block/raw-posix: create translate_err helper to merge errno values
actually the code
    if (ret == -ENODEV || ret == -ENOSYS || ret == -EOPNOTSUPP ||
        ret == -ENOTTY) {
        ret = -ENOTSUP;
    }
is present twice and will be added a couple more times. Create helper
for this.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:20 +01:00
Jeff Cody
cdf9634bdf block: vhdx - force FileOffsetMB field to '0' for certain block states
The v1.0.0 spec calls out PAYLOAD_BLOCK_ZERO FileOffsetMB field as being
'reserved'.  In practice, this means that Hyper-V will fail to read a
disk image with PAYLOAD_BLOCK_ZERO block states with a FileOffsetMB
value other than 0.

The other states that indicate a block that is not there
(PAYLOAD_BLOCK_UNDEFINED, PAYLOAD_BLOCK_NOT_PRESENT,
 PAYLOAD_BLOCK_UNMAPPED) have multiple options for what FileOffsetMB may
be set to, and '0' is explicitly called out as an option.

For all the above states, we will also just set the FileOffsetMB value
to 0.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: a9fe92f53f07e6ab1693811e4312c0d1e958500b.1421787566.git.jcody@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-23 12:41:32 -05:00
Jeff Cody
9a29e18f7d block: update string sizes for filename,backing_file,exact_filename
The string field entries 'filename', 'backing_file', and
'exact_filename' in the BlockDriverState struct are defined as 1024
bytes.

However, many places that use these values accept a maximum of PATH_MAX
bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations.
This patch makes the BlockDriverStruct field string sizes match usage.

This patch also does a few fixes related to the size that needs to
happen now:

    * the block qapi driver is updated to use PATH_MAX bytes
    * the qcow and qcow2 drivers have an additional safety check
    * the block vvfat driver is updated to use PATH_MAX bytes
      for the size of backing_file, for systems where PATH_MAX is < 1024
      bytes.
    * qemu-img uses PATH_MAX rather than 1024.  These instances were not
      changed to be dynamically allocated, however, as the extra
      temporary 3K in stack usage for qemu-img does not seem worrisome.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Jeff Cody
1d33936ea8 block: mirror - change string allocation to 2-bytes
The backing_filename string in mirror_run() is only used to check
for a NULL string, so we don't need to allocate 1024 bytes (or, later,
PATH_MAX bytes), when we only need to copy the first 2 characters.

We technically only need 1 byte, as we are just checking for NULL, but
since backing_filename[] is populated by bdrv_get_backing_filename(), a
string size of 1 will always only return '\0';

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Jeff Cody
564d64bdde block: qapi - move string allocation from stack to the heap
Rather than declaring 'backing_filename2' on the stack in
bdrv_query_image_info(), dynamically allocate it on the heap.

Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Jeff Cody
fe2065629a block: vmdk - move string allocations from stack to the heap
Functions 'vmdk_parse_extents' and 'vmdk_create' allocate several
PATH_MAX sized arrays on the stack.  Make these dynamically allocated.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:05 +01:00
Jeff Cody
395a22fae0 block: vmdk - make ret variable usage clear
Keep the variable 'ret' something that is returned by the function it is
defined in.  For the return value of 'sscanf', use a more meaningful
variable name.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:05 +01:00
Max Reitz
8dd93d9339 qcow2: Add two more unalignment checks
This adds checks for unaligned L2 table offsets and unaligned data
cluster offsets (actually the preallocated offsets for zero clusters) to
the zero cluster expansion function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:05 +01:00
Paolo Bonzini
66552b894b coroutine: drop qemu_coroutine_adjust_pool_size
This is not needed anymore.  The new TLS-based algorithm is adaptive.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Fam Zheng
c29c1dd312 qmp: Add command 'blockdev-backup'
Similar to drive-backup, but this command uses a device id as target
instead of creating/opening an image file.

Also add blocker on target bs, since the target is also a named device
now.

Add check and report error for bs == target which became possible but is
an illegal case with introduction of blockdev-backup.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1418899027-8445-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-13 11:47:56 +00:00
Vladimir Sementsov-Ogievskiy
c4237dfa63 block: fix spoiling all dirty bitmaps by mirror and migration
Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
such mistakes in future, old functions bdrv_(set,reset)_dirty are made
static, for internal block usage.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
CC: John Snow <jsnow@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
1085daf941 block/vmdk: Relative backing file for creation
When a vmdk image is created with a backing file, it is opened to check
whether it is indeed a vmdk file by letting qemu probe it. When doing
so, the backing filename is relative to the image's base directory so it
should be interpreted accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
9f07429e88 block: JSON filenames and relative backing files
When using a relative backing file name, qemu needs to know the
directory of the top image file. For JSON filenames, such a directory
cannot be easily determined (e.g. how do you determine the directory of
a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow
relative filenames for the backing file of BDSs only having a JSON
filename.

Furthermore, BDS::exact_filename should be used whenever possible. If
BDS::filename is not equal to BDS::exact_filename, the former will
always be a JSON object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Peter Wu
debfb917a4 block/iscsi: fix uninitialized variable
'ret' was never initialized in the success path.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-03 09:22:13 +01:00
Paolo Bonzini
82595da8de linux-aio: simplify removal of completed iocbs from the list
There is no need to do another O(n) pass on the list; the iocb to
split the list at is already available through the array we passed to
io_submit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1418305950-30924-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:57:55 +00:00
Paolo Bonzini
de35464461 linux-aio: drop return code from laio_io_unplug and ioq_submit
These are unused.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1418305950-30924-5-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:57:55 +00:00
Paolo Bonzini
8455ce053a linux-aio: rename LaioQueue idx field to "n"
It does not identify an index in an array anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1418305950-30924-4-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:57:55 +00:00
Paolo Bonzini
43f2376e09 linux-aio: track whether the queue is blocked
Avoid that unplug submits requests when io_submit reported that it
couldn't accept more; at the same time, try more io_submit calls if it
could handle the whole set of requests that were passed, so that the
"blocked" flag is reset as soon as possible.

After the previous patch, laio_submit already tried to avoid submitting
requests to a blocked queue, by comparing s->io_q.idx with "==" instead
of the more natural ">=".  Switch to the simpler expression now that we
have the "blocked" flag.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1418305950-30924-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:57:55 +00:00
Paolo Bonzini
28b240877b linux-aio: queue requests that cannot be submitted
Keep a queue of requests that were not submitted; pass them to
the kernel when a completion is reported, unless the queue is
plugged.

The array of iocbs is rebuilt every time from scratch.  This
avoids keeping the iocbs array and list synchronized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1418305950-30924-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:57:55 +00:00
Jeff Cody
85b712c9d5 block: vhdx - set .bdrv_has_zero_init to bdrv_has_zero_init_1
Now that new VHDX images will default to BAT block states of
PAYLOAD_BLOCK_ZERO, we can indicate that VHDX has zero init.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 5e582703e36450b9ca939e2e5c9fa3930030f7fe.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:35:35 +00:00
Jeff Cody
30af51ce7f block: vhdx - change .vhdx_create default block state to ZERO
The VHDX spec specifies that the default new block state is
PAYLOAD_BLOCK_NOT_PRESENT for a dynamic VHDX image, and
PAYLOAD_BLOCK_FULLY_PRESENT for a fixed VHDX image.

However, in order to create space-efficient VHDX images with qemu-img
convert, it is desirable to be able to set has_zero_init to true for
VHDX.

There is currently an option when creating VHDX images, to use block
state ZERO for new blocks.  However, this currently defaults to 'off'.
In order to be able to eventually set has_zero_init to true for VHDX,
this needs to default to 'on'.

This patch changes the default to 'on', and provides some help
information to warn against setting it to 'off' when using qemu-img
convert.

[Max Reitz pointed out that a full stop was missing at the end of the
VHDX_BLOCK_OPT_ZERO option help text.  I have added it.
--Stefan]

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 85164899eacc86e150c3ceba793cf93b398dedd7.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:49 +00:00
Jeff Cody
a9d1e9daa5 block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec
The 0.95 VHDX spec defined PAYLOAD_BLOCK_UNMAPPED to be 5.  The 1.00
VHDX spec redefines PAYLOAD_BLOCK_UNMAPPED to be 3 instead.

The original value of 5 is now an undefined state in the spec, but it
should be safe to treat it the same and return zeros for data read.
This way, we can maintain compatibility with any images out in the wild
that may have been created in accordance to the 0.95 spec.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8a4d2da73a8dbc04cde62bea782fc09ff84b1cf1.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:22 +00:00
Jeff Cody
0571df44a1 block: vhdx - remove redundant comments
Minor cleanup.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: e8718ae3fd3e40a527e46a00e394973fbaab4d53.1418018421.git.jcody@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 15:42:22 +00:00
Gonglei
9281dbe653 block/rbd: fix memory leak
Variable local_err going out of scope
leaks the storage it points to.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Message-id: 1417674851-6248-1-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 13:16:56 +00:00
Max Reitz
5c98415b2a vmdk: Fix error for JSON descriptor file names
If vmdk blindly tries to use path_combine() using bs->file->filename as
the base file name, this will result in a bad error message for JSON
file names when calling bdrv_open(). It is better to only try
bs->file->exact_filename; if that is empty, bs->file->filename will be
useless for path_combine() and an error should be emitted (containing
bs->file->filename because desc_file_path (which is
bs->file->exact_filename) is empty).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 13:14:10 +00:00
Fam Zheng
d899d2e248 vmdk: Set errp on failures in vmdk_open_vmdk4
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1417649314-13704-7-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Fam Zheng
9aeecbbc62 vmdk: Remove unnecessary initialization
It will be assigned to the return value of vmdk_read_desc.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1417649314-13704-6-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Fam Zheng
03c3359dfc vmdk: Check descriptor file length when reading it
Since a too small file cannot be a valid VMDK image, and also since the
buffer's first 4 bytes will be unconditionally examined by
vmdk_open_sparse, let's error out the small file case to be clear.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Message-id: 1417649314-13704-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Fam Zheng
73b7bcad43 vmdk: Clean up descriptor file reading
Zeroing a buffer that will be filled right after is not necessary, and
allocating a power of two + 1 is naughty.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1417649314-13704-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Fam Zheng
8a3e0bc370 vmdk: Fix comment to match code of extent lines
commit 04d542c8b (vmdk: support vmfs files) added support of VMFS extent
type but the comment above the changed code is left out. Update the
comment so they are consistent.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Message-id: 1417649314-13704-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Fam Zheng
e5dc64b8ff vmdk: Use g_random_int to generate CID
This replaces two "time(NULL)" invocations with "g_random_int()".
According to VMDK spec, CID "is a random 32‐bit value updated the first
time the content of the virtual disk is modified after the virtual disk
is opened". Using "seconds since epoch" is just a "lame way" to generate
it, and not completely safe because of the low precision.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Don Koch <dkoch@verizon.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1417649314-13704-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Jeff Cody
625fa9fe6f block: remove BLOCK_OPT_NOCOW from vpc_create_opts
In commit fef6070, the need for NOCOW was removed from the vpc driver,
as we removed the the posix calls.  However, the BLOCK_OPT_NOCOW was not
removed from vpc_create_opts.  This was a mistake - remove the opt from
there as well.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-id: 8ba076fa725fed681cde7d8afc4fb239ae06a9c6.1417620301.git.jcody@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:21 +01:00
Jeff Cody
0d0d7f47b4 block: remove BLOCK_OPT_NOCOW from vdi_create_opts
In commit 7074786, the need for NOCOW was removed from the vdi driver,
as we removed the the posix calls.  However, the BLOCK_OPT_NOCOW was not
removed from vdi_create_opts.  This was a mistake - remove the opt from
there as well.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-id: e189364de11929d8fa04722f5d845de0a9834d44.1417620301.git.jcody@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
01212d4ed6 block/raw-posix: Fix ret in raw_open_common()
The return value must be negative on error; there is one place in
raw_open_common() where errp is set, but ret remains 0. Fix it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
6a69b9620a qcow2: Respect bdrv_truncate() error
bdrv_truncate() may fail and qcow2_write_compressed() should return the
error code in that case.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
3b5e14c76a qcow2: Flushing the caches in qcow2_close may fail
qcow2_cache_flush() may fail; if one of the caches failed to be flushed
successfully to disk in qcow2_close() the image should not be marked
clean, and we should emit a warning.

This breaks the (qcow2-specific) iotests 026, 071 and 089; change their
output accordingly.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
11c89769dc qcow2: Prevent numerical overflow
In qcow2_alloc_cluster_offset(), *num is limited to
INT_MAX >> BDRV_SECTOR_BITS by all callers. However, since remaining is
of type uint64_t, we might as well cast *num to that type before
performing the shift.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
fd752801ae block/nfs: Add create_opts
The nfs protocol driver is capable of creating images, but did not
specify any creation options. Fix it.

A way to test this issue is the following:

$ qemu-img create -f nfs nfs://127.0.0.1/foo.qcow2 64M

Without this patch, it segfaults. With this patch, it does not. However,
this is not something that should really work; qemu-img should check
whether the parameter for the -f option (and -O for convert) is indeed a
format, and error out if it is not. Therefore, I am not making it an
iotest.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Max Reitz
1bcb15cf77 block/vvfat: qcow driver may not be found
Although virtually impossible right now, bdrv_find_format("qcow") may
fail. The vvfat block driver should heed that case.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Max Reitz
ef8104378c block: Omit bdrv_find_format for essential drivers
We can always assume raw, file and qcow2 being available; so do not use
bdrv_find_format() to locate their BlockDriver objects but statically
reference the respective objects.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Max Reitz
5f535a941e block: Make essential BlockDriver objects public
There are some block drivers which are essential to QEMU and may not be
removed: These are raw, file and qcow2 (as the default non-raw format).
Make their BlockDriver objects public so they can be directly referenced
throughout the block layer without needing to call bdrv_find_format()
and having to deal with an error at runtime, while the real problem
occurred during linking (where raw, file or qcow2 were not linked into
qemu).

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Paolo Bonzini
a56ebc6ba4 block: do not use get_clock()
Use the external qemu-timer API instead.

No one else should be calling cpu_get_clock(), get_clock() and
get_clock_realtime() directly; they are internal functions and they
should be confined to qemu-timer.c and cpus.c (where the icount
implementation resides).  All accesses should go through
qemu_clock_get_ns.

Cc: kwolf@redhat.com
Cc: stefanha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1417010463-3527-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
2ebafc854d qcow2: Fix header extension size check
After reading the extension header, offset is incremented, but not
checked against end_offset any more. This way an integer overflow could
happen when checking whether the extension end is within the allowed
range, effectively disabling the check.

This patch adds the missing check and a test case for it.

Cc: qemu-stable@nongnu.org
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416935562-7760-2-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
38f3ef574b raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.

Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest.  A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.

Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing.  Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing.  QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.

All of these additions that allow to avoid format probing have to be
specified explicitly. The default still allows the attack.

In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.

Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.

The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).

Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.

Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Max Reitz
2c28b21f7c block: Add blk_add_close_notifier() for BB
Adding something like a "delete notifier" to a BlockBackend would not
make much sense, because whoever is interested in registering there will
probably hold a reference to that BlockBackend; therefore, the notifier
will never be called (or only when the notifiee already relinquished its
reference and thus most probably is no longer interested in that
notification).

Therefore, this patch just passes through the close notifier interface
of the root BDS. This will be called when the device is ejected, for
instance, and therefore does make sense.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Max Reitz
2019ba0a01 block: Add AioContextNotifier functions to BB
Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Max Reitz
2bb0dce762 block: Lift more functions into BlockBackend
There are already some blk_aio_* functions, so we might as well have
blk_co_* functions (as far as we need them). This patch adds
blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which
is not a blk_co_* function but is needed nonetheless).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Max Reitz
8779441b1b blkdebug: Simplify and improve filename generation
Instead of actually recreating the options from scratch, just reuse the
options given for creating the BDS, which are the configuration file
name and additional options. In case there are no additional options we
can thus create a plain filename.

This obviously results in a different output for qemu-iotest 099 which
exactly tests this filename generation. Fix it up as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1415697825-26678-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:11 +01:00
Kevin Wolf
9e193c5a65 block/qapi: Add cache information to query-block
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-12-10 10:31:09 +01:00
Fam Zheng
f71eaa74c0 qmp: Add optional switch "query-nodes" in query-blockstats
This bool option will allow query all the node names. It iterates all
the BDSes that are assigned a name, also in this case don't query up the
backing chain.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng
4875a77950 block: Include "node-name" if present in query-blockstats
Node name is a better identifier of BDS.

We will want to query statistics of a BDS node buried in the BDS graph,
so reporting the node's name if there is one will do the trick.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Kevin Wolf
24bf10dac3 Revert "qemu-img info: show nocow info"
This reverts commit 000c4dfff4.

The main reason for reverting this commit before the 2.2 release is that
it adds a QAPI interface that we don't want to keep: The 'nocow' flag
doesn't generally make sense for block nodes, but only for the raw-posix
driver. It should therefore be part of ImageInfoSpecific rather than
ImageInfo.

The commit contains more problems, but unlike the API stability issue
they wouldn't justify reverting it.

Conflicts:
	block/qapi.c

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-24 13:52:10 +01:00
Max Reitz
098ffa6674 block/raw-posix: Catch fsync() errors
fsync() may fail, and that case should be handled.

Reported-by: László Érsek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-18 12:09:00 +01:00
Max Reitz
731de38052 block/raw-posix: Only sync after successful preallocation
The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-18 12:09:00 +01:00
Max Reitz
39411cf3c3 block/raw-posix: Fix preallocating write() loop
write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-18 12:08:59 +01:00
Markus Armbruster
d1f06fe665 raw-posix: The SEEK_HOLE code is flawed, rewrite it
On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2014-11-18 09:45:48 +01:00
Markus Armbruster
c4875e5b22 raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2014-11-18 09:45:35 +01:00
Markus Armbruster
be2ebc6dad raw-posix: Fix comment for raw_co_get_block_status()
Missed in commit 705be72.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2014-11-18 09:44:02 +01:00
Fam Zheng
5f58330790 vmdk: Leave bdi intact if -ENOTSUP in vmdk_get_info
When extent types don't match, we return -ENOTSUP. In this case, be
polite to the caller and don't modify bdi.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1415938161-16217-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-14 09:20:45 +00:00
Max Reitz
d20418ee51 block/vdi: Limit maximum size even futher
The block layer read and write functions do not like requests which are
bigger than INT_MAX bytes. Since the VDI bmap is read and written in a
single operation, its size is therefore limited accordingly. This
reduces the maximum VDI image size supported by QEMU to half of what it
currently is (down to approximately 512 TB).

The VDI test 084 has to be adapted accordingly. Actually, one could
clearly see that it was broken from the "Could not open
'TEST_DIR/t.IMGFMT': Invalid argument" line for an image which was
supposed to work just fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
2014-11-09 23:39:50 +01:00
Peter Maydell
9a33c0c851 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJUV2wdAAoJEJykq7OBq3PIjcAH/29rl938ETw1wjXxYe3uH+R6
 K2yFEiPh9/cOJSH0mJ+gD8DZIN+iyR4eoQGP2s5ALFPcX3bkYxRLlUeYK0BCp883
 esc7gO6XPhLvTVqP0xgACRCdUwH2I0VTToDlHjXXZogyI/DuDX3gzWJufE3x1DGs
 WNTMOp5n/uYkWH3rI3DkInmbSddEz3pgX65a8BuYtw0V/RSeSRnHKDYHMygvJBRL
 EVfWRNeOIrZ730CyJry0t8ITjsZxiBDKXR5glNSwaIfQUfGkTSWi9YNSurNYkUDr
 aMS2rgvOVlrOUDKTHUj9oS3jgoGWcDtlk9E1MeSoyIptbRoMhdFVl1AUJZsrMJU=
 =Mfbu
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Mon 03 Nov 2014 11:50:53 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request: (53 commits)
  block: declare blockjobs and dataplane friends!
  block: let commit blockjob run in BDS AioContext
  block: let mirror blockjob run in BDS AioContext
  block: let stream blockjob run in BDS AioContext
  block: let backup blockjob run in BDS AioContext
  block: add bdrv_drain()
  blockjob: add block_job_defer_to_main_loop()
  blockdev: add note that block_job_cb() must be thread-safe
  blockdev: acquire AioContext in blockdev_mark_auto_del()
  blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
  block: acquire AioContext in generic blockjob QMP commands
  iotests: Expand test 061
  block/qcow2: Simplify shared L2 handling in amend
  block/qcow2: Make get_refcount() global
  block/qcow2: Implement status CB for amend
  qemu-img: Fix insignificant memleak
  qemu-img: Add progress output for amend
  block: Add status callback to bdrv_amend_options()
  block: qemu-iotest 107 supports NFS
  iotests: Add test for qcow2's bdrv_make_empty
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-11-03 18:34:09 +00:00
Peter Maydell
7135781f65 trivial patches for 2014-11-02
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUVhuDAAoJEL7lnXSkw9fbOKMIAIE3XZMhar4Vmokb/K0DFbnh
 gy2z7iCe7vumLKiRSJX1LGmkFO3dwykw82JZQ1SVo0RdgguJ5dx1Abx1qDM1rojL
 jJT0pJ9zWPl4fTv38wCEfaysQHPdgwoH4826ga+MXnVS9XHRHHxuQ4vI01AK3oyQ
 4t6/wto9H8kF3n6ny7tz5WNZClsq7qbiIqw5nNCILQfSh/VBPwxQNBiWf/nYVMuY
 Ubk5noztZwH+hbiAQL5lAPz/HolcRwg1tzbR0dfmt8/aqO28rJhasG58JgtziI2y
 JSg4BwldqUQEgiHonArLfQDixjLtEEyL+fQSzZm02ixwcBpc/ADSyGDy2R1zpH8=
 =j1ga
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-02' into staging

trivial patches for 2014-11-02

# gpg: Signature made Sun 02 Nov 2014 11:54:43 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2014-11-02: (23 commits)
  vdi: wrapped uuid_unparse() in #ifdef
  tap: fix possible fd leak in net_init_tap
  tap: do not close(fd) in net_init_tap_one
  target-i386: Remove unused model_features_t struct
  tap_int.h: remove repeating NETWORK_SCRIPT defines
  os-posix: reorder parent notification for -daemonize
  pidfile: stop making pidfile error a special case
  os-posix: replace goto again with a proper loop
  os-posix: use global daemon_pipe instead of cryptic fds[1]
  dump: Fix dump-guest-memory termination and use-after-close
  virtio-9p-proxy: improve error messages in connect_namedsocket()
  virtio-9p-proxy: fix error return in proxy_init()
  virtio-9p-proxy: Fix sockfd leak
  target-tricore: check return value before using it
  net/slirp: specify logbase for smbd
  Revert "os-posix: report error message when lock file failed"
  util: Improve os_mem_prealloc error message
  sparse: fix build
  target-arm: A64: remove redundant store
  target-xtensa: mark XtensaConfig structs as unused
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-11-03 14:55:17 +00:00
Stefan Hajnoczi
9e85cd5ce0 block: let commit blockjob run in BDS AioContext
The commit block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.
One detail here is that the bdrv_drain_all() must be moved inside the
aio_context_acquire() region so requests cannot sneak in between the
drain and acquire.

The completion code in block/commit.c must perform backing chain
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-11-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi
5a7e7a0bad block: let mirror blockjob run in BDS AioContext
The mirror block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.

Note that to_replace is treated separately from other BlockDriverStates
in that it does not need to be in the same AioContext.  Explicitly
acquire/release to_replace's AioContext when accessing it.

The completion code in block/mirror.c must perform BDS graph
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

The bdrv_drain_all() call is not allowed outside the main loop since it
could lead to lock ordering problems.  Use bdrv_drain(bs) instead
because we have acquired the AioContext so nothing else can sneak in
I/O.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi
f3e69beb94 block: let stream blockjob run in BDS AioContext
The stream block job must run in the BlockDriverState AioContext so that
it works with dataplane.

The basics of acquiring the AioContext are easy in blockdev.c.

The tricky part is the completion code which drops part of the backing
file chain.  This must be done in the main loop where bdrv_unref() and
bdrv_close() are safe to call.  Use block_job_defer_to_main_loop() to
achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-9-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi
761731b180 block: let backup blockjob run in BDS AioContext
The backup block job must run in the BlockDriverState AioContext so that
it works with dataplane.

The basics of acquiring the AioContext are easy in blockdev.c.

The completion code in block/backup.c must call bdrv_unref() from the
main loop.  Use block_job_defer_to_main_loop() to achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-8-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Max Reitz
ecf58777c5 block/qcow2: Simplify shared L2 handling in amend
Currently, we have a bitmap for keeping track of which clusters have
been created during the zero cluster expansion process. This was
necessary because we need to properly increase the refcount for shared
L2 tables.

However, now we can simply take the L2 refcount and use it for the
cluster allocated for expansion. This will be the correct refcount and
therefore we don't have to remember that cluster having been allocated
any more.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-7-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:49 +00:00
Max Reitz
44751917db block/qcow2: Make get_refcount() global
Reading the refcount of a cluster is an operation which can be useful in
all of the qcow2 code, so make that function globally available.

While touching this function, amend the comment describing the "addend"
parameter: It is (no longer, if it ever was) necessary to have it set to
-1 or 1; any value is fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:49 +00:00
Max Reitz
4057a2b24a block/qcow2: Implement status CB for amend
The only really time-consuming operation potentially performed by
qcow2_amend_options() is zero cluster expansion when downgrading qcow2
images from compat=1.1 to compat=0.10, so report status of that
operation and that operation only through the status CB.

For this, approximate the progress as the number of L1 entries visited
during the operation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Message-id: 1414404776-4919-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:49 +00:00
Max Reitz
7748543420 block: Add status callback to bdrv_amend_options()
Depending on the changed options and the image format,
bdrv_amend_options() may take a significant amount of time. In these
cases, a way to be informed about the operation's status is desirable.

Since the operation is rather complex and may fundamentally change the
image, implementing it as AIO or a coroutine does not seem feasible. On
the other hand, implementing it as a block job would be significantly
more difficult than a simple callback and would not add benefits other
than progress report to the amending operation, because it should not
actually be run as a block job at all.

A callback may not be very pretty, but it's very easy to implement and
perfectly fits its purpose here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz
d4a3238af5 qemu-img: Implement commit like QMP
qemu-img should use QMP commands whenever possible in order to ensure
feature completeness of both online and offline image operations. As
qemu-img itself has no access to QMP (since this would basically require
just everything being linked into qemu-img), imitate QMP's
implementation of block-commit by using commit_active_start() and then
waiting for the block job to finish.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-9-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz
b21c76529d block/mirror: Improve progress report
Instead of taking the total length of the block device as the block
job's length, use the number of dirty sectors. The progress is now the
number of sectors mirrored to the target block device. Note that this
may result in the job's length increasing during operation, which is
however in fact desirable.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-8-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz
94054183da qcow2: Optimize bdrv_make_empty()
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by emptying the L1 and
refcount table (while having the dirty flag set, which only works for
compat=1.1) and creating a trivial refcount structure.

If there are snapshots or for compat=0.10, fall back to the simple
implementation (discard all clusters).

[Applied s/clusters/cluster/ typo fix suggested by Eric Blake
--Stefan]

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz
491d27e2af qcow2: Implement bdrv_make_empty()
Implement this function by making all clusters in the image file fall
through to the backing file (by using the recently extended discard).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz
808c4b6f30 qcow2: Allow "full" discard
Normally, discarded sectors should read back as zero. However, there are
cases in which a sector (or rather cluster) should be discarded as if
they were never written in the first place, that is, reading them should
fall through to the backing file again.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:47 +00:00
Max Reitz
d7f62751a1 raw-posix: raw_co_get_block_status() return value
Instead of generating the full return value thrice in try_fiemap(),
try_seek_hole() and as a fall-back in raw_co_get_block_status() itself,
generate the value only in raw_co_get_block_status().

While at it, also remove the pnum parameter from try_fiemap() and
try_seek_hole().

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414148280-17949-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:47 +00:00
Max Reitz
e6d7ec32dd raw-posix: Fix raw_co_get_block_status() after EOF
As its comment states, raw_co_get_block_status() should unconditionally
return 0 and set *pnum to 0 for after EOF.

An assertion after lseek(..., SEEK_HOLE) tried to catch this case by
asserting that errno != -ENXIO (which would indicate a position after
the EOF); but it should be errno != ENXIO instead. Regardless of that,
there should be no such assertion at all. If bdrv_getlength() returned
an outdated value and the image has been resized outside of qemu,
lseek() will return with errno == ENXIO. Just return that value as an
error then.

Setting *pnum to 0 and returning 0 should not be done here, as in that
case we should update the device length as well. So, from qemu's
perspective, the file has not been resized; it's just that there was an
error querying sectors beyond a certain point (the actual file size).

Additionally, nb_sectors should be clamped against the image end. This
was probably not an issue if FIEMAP or SEEK_HOLE/SEEK_DATA worked, but
the fallback did not take this case into account.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414148280-17949-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:47 +00:00
Richard W.M. Jones
f76faeda4b block/curl: Improve type safety of s->timeout.
qemu_opt_get_number returns a uint64_t, and curl_easy_setopt expects a
long (not an int).  There is no warning about the latter type error
because curl_easy_setopt uses a varargs argument.

Store the timeout (which is a positive number of seconds) as a
uint64_t.  Check that the number given by the user is reasonable.
Zero is permissible (meaning no timeout is enforced by cURL).

Cast it to long before calling curl_easy_setopt to fix the type error.

Example error message after this change has been applied:

$ ./qemu-img create -f qcow2 /tmp/test.qcow2 \
    -b 'json: { "file.driver":"https",
                "file.url":"https://foo/bar",
                "file.timeout":-1 }'
qemu-img: /tmp/test.qcow2: Could not open 'json: { "file.driver":"https", "file.url":"https://foo/bar", "file.timeout":-1 }': timeout parameter is too large or negative: Invalid argument

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:47 +00:00
Zhang Haoyu
3432a1929e snapshot: add bdrv_drain_all() to bdrv_snapshot_delete() to avoid concurrency problem
If there are still pending i/o while deleting snapshot,
because deleting snapshot is done in non-coroutine context, and
the pending i/o read/write (bdrv_co_do_rw) is done in coroutine context,
so it's possible to cause concurrency problem between above two operations.
Add bdrv_drain_all() to bdrv_snapshot_delete() to avoid this problem.

Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 201410211637596311287@sangfor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:42 +00:00
Adam Crume
be21788495 rbd: Add support for bdrv_invalidate_cache
This fixes Ceph issue 2467: ttp://tracker.ceph.com/issues/2467

[Dropped return r in void function as suggested by Josh Durgin
<josh.durgin@inktank.com>.
--Stefan]

Signed-off-by: Adam Crume <adamcrume@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1412880272-3154-1-git-send-email-adamcrume@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Denis V. Lunev
da725d0b0e block/parallels: fix access to not initialized memory in catalog_bitmap
found by valgrind.

Command: ./qemu-img convert -f parallels -O qcow2 1.hds 1.img
Invalid read of size 4
   at 0x17D0EF: parallels_co_read (parallels.c:357)
   by 0x11FEE4: bdrv_aio_rw_vector (block.c:4640)
   by 0x11FFBF: bdrv_aio_readv_em (block.c:4652)
   by 0x11F55F: bdrv_co_readv_em (block.c:4862)
   by 0x123428: bdrv_aligned_preadv (block.c:3056)
   by 0x1239FA: bdrv_co_do_preadv (block.c:3162)
   by 0x125424: bdrv_rw_co_entry (block.c:2706)
   by 0x155DD9: coroutine_trampoline (coroutine-ucontext.c:118)
   by 0x6975B6F: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)

The problem is that s->catalog_bitmap is allocated/filled as
gmalloc(s->catalog_size) thus index validity check must be
inclusive, i.e. index >= s->catalog_size is invalid.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1412759610-2257-4-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
dc9e716369 block/iscsi: check for oversized requests
Cancel oversized requests early. They would generate
an iSCSI protocol error anyway; after having transferred
possibly a lot of data over the wire.

Suggested-By: Max Reitz <mreitz@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
3dab155154 block/iscsi: use sector_limits_lun2qemu throughout iscsi_refresh_limits
As Max pointed out there is a hidden cast from int64_t to int for all
limits. So use the newly introduced sector_limits_lun2qemu for all
limits received from the target.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
52f6fa1430 block/iscsi: set max_transfer_length
Copy the max_xfer_len from the BlockLimits VPD or use the
maximum value fitting in the CDB.

The helper function sector_limits_lun2qemu is introduced to convert
and cap the limits from the VPD to the maximum power of two fitting
in an integer; integer is the range for nb_sectors throughout
the block layer.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
SeokYeon Hwang
f18a768efb vdi: wrapped uuid_unparse() in #ifdef
Wrapped uuid_unparse() in #ifdef to avoid "-Wunused-function"
on clang 3.4 or later.

Signed-off-by: SeokYeon Hwang <syeon.hwang@samsung.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-11-02 10:05:35 +03:00
Fam Zheng
c1d4096b0f iscsi: Refuse to open as writable if the LUN is write protected
Before, when a write protected iSCSI target is attached as scsi-disk
with BDRV_O_RDWR, we report it as writable, while in fact all writes
will fail.

One way to improve this is to report write protect flag as true to
guest, but a even better way is to refuse using a write protected LUN to
guest.

Target write protect flag is checked with a mode sense query.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-10-31 11:29:02 +01:00
Roger Pau Monne
3cad83075c block: char devices on FreeBSD are not behind a pager
Introduce a new flag to mark devices that require requests to be aligned and
replace the usage of BDRV_O_NOCACHE and O_DIRECT with this flag when
appropriate.

If a character device is used as a backend on a FreeBSD host set this flag
unconditionally.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2014-10-23 16:56:53 +02:00
Max Reitz
a1391444fe qcow2: Do not overflow when writing an L1 sector
While writing an L1 table sector, qcow2_write_l1_entry() copies the
respective range from s->l1_table to the local "buf" array. The size of
s->l1_table does not have to be a multiple of L1_ENTRIES_PER_SECTOR;
thus, limit the index which is used for copying all entries to the L1
size.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:02 +02:00
Max Reitz
17bd5f4727 qcow2: Drop REFCOUNT_SHIFT
With BDRVQcowState.refcount_block_bits, we don't need REFCOUNT_SHIFT
anymore.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:02 +02:00
Max Reitz
791230d8bb qcow2: Clean up after refcount rebuild
Because the old refcount structure will be leaked after having rebuilt
it, we need to recalculate the refcounts and run a leak-fixing operation
afterwards (if leaks should be fixed at all).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
c7c0681bc8 qcow2: Rebuild refcount structure during check
The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.

The old refcount structure will be leaked, however. This leak will be
dealt with in a follow-up commit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
f307b2558f qcow2: Do not perform potentially damaging repairs
If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.

In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.

To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
001c158def qcow2: Fix refcount blocks beyond image end
If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
9696df219a qcow2: Reuse refcount table in calculate_refcounts()
We will later call calculate_refcounts multiple times, so reuse the
refcount table if possible.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
641bb63cd6 qcow2: Let inc_refcounts() resize the reftable
Now that the refcount table can be passed around by reference, do that
for inc_refcounts() (and subsequently check_refcounts_l1() and
check_refcounts_l2()) and use it for resizing it when a cluster after
the image end is encountered.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
fef4d3d564 qcow2: Let inc_refcounts() return -errno
As of a future patch, inc_refcounts() will have to throw errors which
are generally signaled by returning -errno. Therefore, let it return an
integer which is either 0 for success or -errno and handle the -errno
case in all callers.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
ad27390c85 qcow2: Split fail code in L1 and L2 checks
Instead of printing out an error message, incrementing check_errors and
returning a fixed -errno, just do cleanups and return -ret, with ret set
by the code which threw the exception (jumped to the fail label).

Also, increment check_errors on error in check_refcounts_l2().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
713d9675e0 qcow2: Use int64_t for in-memory reftable size
Use int64_t for the entry count of the in-memory refcount table
throughout the check functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
057a3fe57e qcow2: Pull check_refblocks() up
Pull check_refblocks() before calculate_refcounts() so we can drop its
static declaration.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
78fb328e85 qcow2: Use sizeof(**refcount_table)
When implementing variable refcounts, we want to be able to easily find
all the places in qemu which are tied to a certain refcount order.
Replace sizeof(uint16_t) in the check code by sizeof(**refcount_table)
so we can later find it more easily.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
6ca56bf5e9 qcow2: Split qcow2_check_refcounts()
Put the code for calculating the reference counts and comparing them
during qemu-img check into own functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
5b84106bd9 qcow2: Fix leaks in dirty images
When opening dirty images, qcow2's repair function should not only
repair errors but leaks as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
1d13d65466 qcow2: Calculate refcount block entry count
The size of a refblock entry is (in theory) variable; calculate
therefore the number of entries per refblock and the according bit shift
(1 << x == entry count) when opening an image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Max Reitz
e9082e4736 block/vdi: Use {DIV_,}ROUND_UP
There are macros for these operations, so make use of them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Markus Armbruster
84ebe3755f block: Make device model's references to BlockBackend strong
Doesn't make a difference just yet, but it's the right thing to do.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:51 +02:00
Markus Armbruster
a7f53e26a6 block: Lift device model API into BlockBackend
Move device model attachment / detachment and the BlockDevOps device
model callbacks and their wrappers from BlockDriverState to
BlockBackend.

Wrapper calls in block.c change from

    bdrv_dev_FOO_cb(bs, ...)

to

    if (bs->blk) {
        bdrv_dev_FOO_cb(bs->blk, ...);
    }

No change, because both bdrv_dev_change_media_cb() and
bdrv_dev_resize_cb() do nothing when no device model is attached, and
a device model can be attached only when bs->blk.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
d829a2115f block/qapi: Convert qmp_query_block() to BlockBackend
Much more command code needs conversion.  I start with this one
because it's using bdrv_dev_* functions, which I'm about to lift into
BlockBackend.

While there, give bdrv_query_info() internal linkage.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
26f8b3a847 blockdev: Fix blockdev-add not to create DriveInfo
blockdev_init() always creates a DriveInfo, but only drive_new() fills
it in.  qmp_blockdev_add() leaves it blank.  This results in a drive
with type = IF_IDE, bus = 0, unit = 0.  Screwed up in commit ee13ed1c.

Board initialization code looking for IDE drive (0,0) can pick up one
of these bogus drives.  The QMP command has to execute really early to
be visible.  Not sure how likely that is in practice.

Fix by creating DriveInfo in drive_new().  Block backends created by
blockdev-add don't get one.

Breaks the test for "has been created by qmp_blockdev_add()" in
blockdev_mark_auto_del() and do_drive_del(), because it changes the
value of dinfo && !dinfo->enable_auto_del from true to false.  Simply
test !dinfo instead.

Leaves DriveInfo member enable_auto_del unused.  Drop it.

A few places assume a block backend always has a DriveInfo.  Fix them
up.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
d3aeb1b7da blockdev: Drop superfluous DriveInfo member id
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
4be746345f hw: Convert from BlockDriverState to BlockBackend, mostly
Device models should access their block backends only through the
block-backend.h API.  Convert them, and drop direct includes of
inappropriate headers.

Just four uses of BlockDriverState are left:

* The Xen paravirtual block device backend (xen_disk.c) opens images
  itself when set up via xenbus, bypassing blockdev.c.  I figure it
  should go through qmp_blockdev_add() instead.

* Device model "usb-storage" prompts for keys.  No other device model
  does, and this one probably shouldn't do it, either.

* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
  blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
  which has only the BlockDriverState.

* PC87312State has an unused BlockDriverState[] member.

The next two commits take care of the latter two.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:02:25 +02:00
Markus Armbruster
097310b53e block: Rename BlockDriverCompletionFunc to BlockCompletionFunc
I'll use it with block backends shortly, and the name is going to fit
badly there.  It's a block layer thing anyway, not just a block driver
thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7c84b1b831 block: Rename BlockDriverAIOCB* to BlockAIOCB*
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there.  It's a block layer thing anyway, not just a
block driver thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7f06d47eff block: Merge BlockBackend and BlockDriverState name spaces
BlockBackend's name space is separate only to keep the initial patches
simple.  Time to merge the two.

Retain bdrv_find() and bdrv_get_device_name() for now, to keep this
series manageable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
bfb197e0d9 block: Eliminate BlockDriverState member device_name[]
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields().  The latter is used only to undo damage
done by bdrv_swap().  The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa.  Furthermore,
blk_new_with_bs() keeps the two names equal.

Therefore, device_name[] is redundant.  Eliminate it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
9ba10c95a4 block: Make BlockBackend own its BlockDriverState
On BlockBackend destruction, unref its BlockDriverState.  Replaces the
callers' unrefs.

This turns the pointer from BlockBackend to BlockDriverState into a
strong reference, managed with bdrv_ref() / bdrv_unref().  The
back-pointer remains weak.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
8fb3c76c94 block: Code motion to get rid of stubs/blockdev.c
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
18e46a033d block: Connect BlockBackend and DriveInfo
Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
return the BlockBackend instead of the DriveInfo.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
7e7d56d9e0 block: Connect BlockBackend to BlockDriverState
Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState.  Callers have to unref both.  The commit after next
will relieve them of the need to unref the BlockDriverState.

Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon().  To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del().  Unfortunately, hiding turns the
BlockBackend's name into the empty string.  Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
26f54e9a3c block: New BlockBackend
A block device consists of a frontend device model and a backend.

A block backend has a tree of block drivers doing the actual work.
The tree is managed by the block layer.

We currently use a single abstraction BlockDriverState both for tree
nodes and the backend as a whole.  Drawbacks:

* Its API includes both stuff that makes sense only at the block
  backend level (root of the tree) and stuff that's only for use
  within the block layer.  This makes the API bigger and more complex
  than necessary.  Moreover, it's not obvious which interfaces are
  meant for device models, and which really aren't.

* Since device models keep a reference to their backend, the backend
  object can't just be destroyed.  But for media change, we need to
  replace the tree.  Our solution is to make the BlockDriverState
  generic, with actual driver state in a separate object, pointed to
  by member opaque.  That lets us replace the tree by deinitializing
  and reinitializing its root.  This special need of the root makes
  the data structure awkward everywhere in the tree.

The general plan is to separate the APIs into "block backend", for use
by device models, monitor and whatever other code dealing with block
backends, and "block driver", for use by the block layer and whatever
other code (if any) dealing with trees and tree nodes.

Code dealing with block backends, device models in particular, should
become completely oblivious of BlockDriverState.  This should let us
clean up both APIs, and the tree data structures.

This commit is a first step.  It creates a minimal "block backend"
API: type BlockBackend and functions to create, destroy and find them.

BlockBackend objects are created and destroyed exactly when root
BlockDriverState objects are created and destroyed.  "Root" in the
sense of "in bdrv_states".  They're not yet used for anything; that'll
come shortly.

A root BlockDriverState is created with bdrv_new_root(), so where to
create a BlockBackend is obvious.  Where these roots get destroyed
isn't always as obvious.

It is obvious in qemu-img.c, qemu-io.c and qemu-nbd.c, and in error
paths of blockdev_init(), blk_connect().  That leaves destruction of
objects successfully created by blockdev_init() and blk_connect().

blockdev_init() is used only by drive_new() and qmp_blockdev_add().
Objects created by the latter are currently indestructible (see commit
48f364d "blockdev: Refuse to drive_del something added with
blockdev-add" and commit 2d246f0 "blockdev: Introduce
DriveInfo.enable_auto_del").  Objects created by the former get
destroyed by drive_del().

Objects created by blk_connect() get destroyed by blk_disconnect().

BlockBackend is reference-counted.  Its reference count never exceeds
one so far, but that's going to change.

In drive_del(), the BB's reference count is surely one now.  The BDS's
reference count is greater than one when something else is holding a
reference, such as a block job.  In this case, the BB is destroyed
right away, but the BDS lives on until all extra references get
dropped.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
e4e9986b1c block: Split bdrv_new_root() off bdrv_new()
Creating an anonymous BDS can't fail.  Make that obvious.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Max Reitz
ec0de76874 nbd: Fix filename generation
Export names may be used with nbd+unix, too, fix nbd_refresh_filename()
accordingly. Also, for nbd+tcp, the documented path schema is
"nbd://host[:port]/export", so use it. Furthermore, as can be seen from
that schema, the port is optional.

That makes six single cases for how the filename can be formatted; it is
not easy to generalize these cases without the resulting statement being
completely unreadable, thus there is simply one snprintf() per case.

Finally, taking the options from BDRVNBDState::socket_opts is wrong,
because those will not contain the export name. Just use
BlockDriverState::options instead.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Tony Breeds
7c15903789 block/raw-posix: use seek_hole ahead of fiemap
try_fiemap() uses FIEMAP_FLAG_SYNC which has a significant performance
impact.

Prefer seek_hole() over fiemap() to avoid this impact where possible.
seek_hole is more widely used and, arguably, has potential to be
optimised in the kernel.

Reported-By: Michael Steffens <michael_steffens@posteo.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Pádraig Brady <pbrady@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Tony Breeds
38c4d0aea3 block/raw-posix: Fix disk corruption in try_fiemap
Using fiemap without FIEMAP_FLAG_SYNC is a known corrupter.

Add the FIEMAP_FLAG_SYNC flag to the FS_IOC_FIEMAP ioctl.  This has
the downside of significantly reducing performance.

Reported-By: Michael Steffens <michael_steffens@posteo.de>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Pádraig Brady <pbrady@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Zhang Haoyu
d8bb71b622 qcow2: fix leak of Qcow2DiscardRegion in update_refcount_discard
When the Qcow2DiscardRegion is adjacent to another one referenced by "d",
free this Qcow2DiscardRegion metadata referenced by "p" after
it was removed from s->discards queue.

Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Max Reitz
9009b1963c qapi: Add corrupt field to ImageInfoSpecificQCow2
Just like lazy-refcounts, this field will be present iff the qcow2
compat level is 1.1 (or probably any future revision).

As expected, this breaks some tests due to the new field present in
qemu-img info output; so fix their output accordingly.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1412105489-7681-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-04 19:18:17 +01:00
Fam Zheng
d1319b077a vmdk: Fix integer overflow in offset calculation
This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster
allocation).

$ ~/build/master/qemu-io /stor/vm/arch.vmdk -c 'write 2G 1k'
write failed: Invalid argument

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1411437381-11234-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-03 10:30:33 +01:00
Richard W.M. Jones
18fe46d79a ssh: Don't crash if either host or path is not specified.
$ ./qemu-img create -f qcow2 overlay \
    -b 'json: { "file.driver":"ssh",
                "file.host":"localhost",
                "file.host_key_check":"no" }'
qemu-img: qobject/qdict.c:193: qdict_get_obj: Assertion `obj != ((void *)0)' failed.
Aborted

A similar crash also happens if the file.host field is omitted.

https://bugzilla.redhat.com/show_bug.cgi?id=1147343

Bug found and reported by Jun Li.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-03 10:30:33 +01:00
Peter Maydell
1831e15060 This update brings dataplane to virtio-scsi (NOT
yet 100% thread-safe, though, which makes it really, really
 experimental.  It also brings asynchronous cancellation to
 the SCSI subsystem and implements it in virtio-scsi.  This
 is a pretty important feature.  Almost all the work here
 was done by Fam Zheng.
 
 I also included the virtio refcount fixes from Gonglei,
 because they had a small conflict with virtio-scsi dataplane.
 
 This pull request is using the new subkey 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUKpR2AAoJEBRUblpOawnXNLAH/RBeF66ZqWc29dl78JKEbv0+
 C5pL61GhlI5vFIjSbPU3/iaZQifw3E4NLvX3SCN5ImsLzBw4r3qerapP2Ut96K/j
 5CYdWTF1oqE32oCefvlWhJulHmE1vxGN53BvOz3HHxoehdF1/tJ0wUoZyfztGTOF
 tiW85VMewi6CKm47/ns5tSNfGMVzWHqnUg67z/mwN6ZmPFU1dXBlgmiIv8Znahrn
 B1AOAeMjWaKvOS+tiYNVG6k0GENWGoiypxiTR3ZXLQKxOYdkh/X0ARULqLMonASX
 YsT772nzO9KZDIsdLj9QZZmM7vxs7UhW0MgQlvcSWP9vfZa5SeuRSgoXorPDj3Q=
 =54T3
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

This update brings dataplane to virtio-scsi (NOT
yet 100% thread-safe, though, which makes it really, really
experimental.  It also brings asynchronous cancellation to
the SCSI subsystem and implements it in virtio-scsi.  This
is a pretty important feature.  Almost all the work here
was done by Fam Zheng.

I also included the virtio refcount fixes from Gonglei,
because they had a small conflict with virtio-scsi dataplane.

This pull request is using the new subkey 4E6B09D7.

# gpg: Signature made Tue 30 Sep 2014 12:31:02 BST using RSA key ID 4E6B09D7
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg:                 aka "Paolo Bonzini <bonzini@gnu.org>"

* remotes/bonzini/tags/for-upstream: (39 commits)
  block/iscsi: handle failure on malloc of the allocationmap
  util: introduce bitmap_try_new
  virtio-scsi: Handle TMF request cancellation asynchronously
  scsi: Introduce scsi_req_cancel_async
  scsi: Introduce scsi_req_cancel_complete
  scsi: Drop SCSIReqOps.cancel_io
  scsi: Unify request unref in scsi_req_cancel
  scsi-generic: Handle canceled request in scsi_command_complete
  scsi: Drop scsi_req_abort
  virtio-scsi: Process ".iothread" property
  virtio-scsi: Call bdrv_io_plug/bdrv_io_unplug in cmd request handling
  virtio-scsi: Batched prepare for cmd reqs
  virtio-scsi: Two stages processing of cmd request
  virtio-scsi: Add migration state notifier for dataplane code
  virtio-scsi: Hook up with dataplane
  virtio-scsi-dataplane: Code to run virtio-scsi on iothread
  virtio-scsi: Add VirtIOSCSIVring in VirtIOSCSIReq
  virtio-scsi: Add 'iothread' property to virtio-scsi
  virtio: add a wrapper for virtio-backend initialization
  virtio-9p: fix virtio-9p child refcount in transports
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-30 16:45:35 +01:00
Peter Lieven
a9fe4c957b block/iscsi: handle failure on malloc of the allocationmap
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-30 13:30:51 +02:00
Kevin Wolf
ed9114356b raw-posix: Fix build without posix_fallocate()
Check for the presence of posix_fallocate() in configure and only
compile in support for PREALLOC_MODE_FALLOC when it's there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-29 16:28:24 +01:00
Peter Maydell
0ebcc56453 Block patches
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUJbcwAAoJEH8JsnLIjy/WmckP/RBMK3gkr2aArp0k/sbdFcnW
 D7GusGkoj1jXRVFKuG3d+T/nozJTX0mkfnaRo8QCg/5VVJtqeHfsfY6JHgppQEai
 C5Z8gtzIFxsQNE0/8d2D6zJMnqsIeXAVtDONWgWcDOZstVXn+KEKOw5E3KCZeGxF
 yLA+h4h8sncT3ysPE0/jnQqeIwGJpLusFXmICiRY2lXRcCgY/qniymUq25FZO69m
 Isw8iVHrUr05u1kfthdQeSAmdPQqI5tK5siR4CERtO2XdlyiODmdo5DRZmATfGiL
 CGUIJd30mvrKpqRpIk8gbuGEtz92RuTnyoi9xkLcKi52eWyflGH4gjN3ojYe5Pro
 hRY06AcwI9lAVjCy7lKPO6dJn8RWOi/wkgfffK0CFujlo8r1BtwfCW0Y9koYuhdq
 mM/K0IEbjNgOvacHLIMOTQuKz3DLv7QB8wZQ+cTkdmvC6kNGnqD9yrA4cAQP4kjR
 wjm/WfDIRKPif6cMwgaOiQeNaL0VF2LIn4GRWgaiIqEYfcKzy65uBFoPqOlAoXr8
 NtTzCIXDIBbTSzz3VLBKD22FOsP0pfPrMeXkf+GwJwStKMcp6PGY4iUd3+M3bfOP
 iUsxTJ4Uw5zMg0MgOaaWnUaDdbRtKGtH1y2UA+4X3gO+3W8Vz0e8B5cPMBqN/mpy
 lUS4mBA08mTvaLE520rp
 =ManP
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches

# gpg: Signature made Fri 26 Sep 2014 19:57:52 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  qemu-iotests: Fail test if explicit test case number is unknown
  block: Validate node-name
  vpc: fix beX_to_cpu() and cpu_to_beX() confusion
  docs: add blkdebug block driver documentation
  block: Catch simultaneous usage of options and their aliases
  block: Specify -drive legacy option aliases in array
  block: Improve message for device name clashing with node name
  qemu-nbd: Destroy the BlockDriverState properly
  block: Keep DriveInfo alive until BlockDriverState dies
  blockdev: Disentangle BlockDriverState and DriveInfo creation
  blkdebug: show an error for invalid event names

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-29 12:26:15 +01:00
Stefan Hajnoczi
44e7ebb8bb trace-events: drop orphan iscsi trace events
iscsi_aio_write16_cb, iscsi_aio_writev, iscsi_aio_read16_cb, and
iscsi_aio_readv have not not been in use since commit
063c3378a9 ("block/iscsi: introduce
bdrv_co_{readv, writev, flush_to_disk}").

These were the only trace events in block/iscsi.c so drop the the
trace.h include.

Cc: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1411394595-15300-4-git-send-email-stefanha@redhat.com
2014-09-26 09:34:38 +01:00
Stefan Hajnoczi
a4127c428e vpc: fix beX_to_cpu() and cpu_to_beX() confusion
The beX_to_cpu() and cpu_to_beX() functions perform the same operation -
they do a byteswap if the host CPU endianness is little-endian or a
nothing otherwise.

The point of two names for the same operation is that it documents which
direction the data is being converted.  This makes it clear whether the
data is suitable for CPU processing or in its external representation.

This patch fixes incorrect beX_to_cpu()/cpu_to_beX() usage.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Stefan Hajnoczi
d4362d642e blkdebug: show an error for invalid event names
It is easy to typo a blkdebug configuration and waste a lot of time
figuring out why no rules are matching.

Push the Error** down into add_rule() so we can report an error when the
event name is invalid.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Peter Maydell
380f649e02 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJUIAsHAAoJEJykq7OBq3PIyR4H/ikrs35Boxv08mR8rTyfpSnQ
 ERQhMnKWIe3cy5pzNG5TKiPliljF0FnkNC3KmBLU5TsoqXeW76WJF//Db5hNnTzG
 FAIeJu2RUSqhjqoz5K6TNYOJGH2XQ+/EZbMyrIeLBwYFn0gFMvZJOVYgpBWP0QTQ
 7sImrlxihUalwwL/6twfE6s5aA12DXN8hlC57u+9nvf+5ocaDyOJ7jUBU2EEhSNr
 TzDTO3gTCSEmnDriwKi3m3mIW/y7kLTXWyGolprZ0UpRyYRSmjcfgBHu8l0X8NCv
 lKkrYuE4V3QIzJk4BWbZQSPjoDlLbH9gnq3H+VwMxMZDxDKtAqAJRw/3H4yWhZ8=
 =JJEF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Mon 22 Sep 2014 12:41:59 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request: (59 commits)
  block: Always compile virtio-blk dataplane
  vring: Better error handling if num is too large
  virtio: Import virtio_vring.h
  async: aio_context_new(): Handle event_notifier_init failure
  block: vhdx - fix reading beyond pointer during image creation
  block: delete cow block driver
  block/archipelago: Fix typo in qemu_archipelago_truncate()
  ahci: Add test_identify case to ahci-test.
  ahci: Add test_hba_enable to ahci-test.
  ahci: Add test_hba_spec to ahci-test.
  ahci: properly shadow the TFD register
  ahci: add test_pci_enable to ahci-test.
  ahci: Add test_pci_spec to ahci-test.
  ahci: MSI capability should be at 0x80, not 0x50.
  ahci: Adding basic functionality qtest.
  layout: Add generators for refcount table and blocks
  fuzz: Add fuzzing functions for entries of refcount table and blocks
  docs: List all image elements currently supported by the fuzzer
  qapi/block-core: Add "new" qcow2 options
  qcow2: Add overlap-check.template option
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-23 12:08:55 +01:00
Jeff Cody
e91a8b2fef block: vhdx - fix reading beyond pointer during image creation
In vhdx_create_metadata(), we allocate 40 bytes to entry_buffer for
the various metadata table entries.  However, we write out 64kB from
that buffer into the new file.  Only write out the correct 40 bytes.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:46 +01:00
Stefan Hajnoczi
550830f935 block: delete cow block driver
This patch removes support for the cow file format.

Normally we do not break backwards compatibility but in this case there
is no impact and it is the most logical option.  Extraordinary claims
require extraordinary evidence so I will show why removing the cow block
driver is the right thing to do.

The cow file format is the disk image format for Usermode Linux, a way
of running a Linux system in userspace.  The performance of UML was
never great and it was hacky, but it enjoyed some popularity before
hardware virtualization support became mainstream.

QEMU's block/cow.c is supposed to read this image file format.
Unfortunately the file format was underspecified:

1. Earlier Linux versions used the MAXPATHLEN constant for the backing
   filename field.  The value of MAXPATHLEN can change, so Linux
   switched to a 4096 literal but QEMU has a 1024 literal.

2. Padding was not used on the header struct (both in the Linux kernel
   and in QEMU) so the struct layout varied across architectures.  In
   particular, i386 and x86_64 were different due to int64_t alignment
   differences.  Linux now uses __attribute__((packed)), QEMU does not.

Therefore:

1. QEMU cow images do not conform to the Linux cow image file format.

2. cow images cannot be shared between different host architectures.

This means QEMU cow images are useless and QEMU has not had bug reports
from users actually hitting these issues.

Let's get rid of this thing, it serves no purpose and no one will be
affected.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1410877464-20481-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:45 +01:00
Chrysostomos Nanakos
3e7e6f186f block/archipelago: Fix typo in qemu_archipelago_truncate()
Fix a typo introduced by 94c80a438c

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:44 +01:00
Max Reitz
ee42b5ce0b qcow2: Add overlap-check.template option
Being able to set the overlap-check option to a string and then refine
it via the overlap-check.* options is a nice idea for the command line
but does not work so well for non-flattened dicts. In that case, one can
only specify either but not both, so add a field to overlap-check.*
which does the same as directly specifying overlap-check but can be used
in conjunction with the other fields in non-flattened dicts.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1408557576-14574-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:34 +01:00
Max Reitz
7b17ce60cc qcow2: Fix leak of QemuOpts in qcow2_open()
Currently, the QemuOpts object opts is leaked if anything fails from its
creation up to and including the image repair block. Fix this by freeing
that object in the fail path.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1408557576-14574-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:32 +01:00
Max Reitz
a97c67ee6c qcow2: Check L1/L2/reftable entries for alignment
Offsets taken from the L1, L2 and refcount tables are generally assumed
to be correctly aligned. However, this cannot be guaranteed if the image
has been written to by something different than qemu, thus check all
offsets taken from these tables for correct cluster alignment.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:28 +01:00
Max Reitz
adb435522b qcow2: Use qcow2_signal_corruption() for overlaps
Use the new function in case of a failed overlap check.

This changes output in case of corruption, so adapt iotest 060's
reference output accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:28 +01:00
Max Reitz
85186ebdac qcow2: Add qcow2_signal_corruption()
Add a helper function for easily marking an image corrupt (on fatal
corruptions) while outputting an informative message to stderr and via
QAPI.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Message-id: 1409926039-29044-3-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:27 +01:00
Max Reitz
9bf040b962 qapi/block: Add "fatal" to BLOCK_IMAGE_CORRUPTED
Not every BLOCK_IMAGE_CORRUPTED event must be fatal; for example, when
reading from an image, they should generally not be. Nonetheless, even
an image only read from may of course be corrupted and this can be
detected during normal operation. In this case, a non-fatal event should
be emitted, but the image should not be marked corrupt (in accordance to
"fatal" set to false).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1409926039-29044-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:26 +01:00
Fam Zheng
e819ab2277 block: Introduce "null" drivers
This is an analogue to Linux null_blk. It can be used for testing or
benchmarking block device emulation and general block layer
functionalities such as coroutines and throttling, where disk IO is not
necessary or wanted.

Use null-aio:// for AIO version, and null-co:// for coroutine version.

[Resolved conflict with Fam's async bdrv_aio_cancel() series:
1. Drop .bdrv_aio_cancel() since it is now done by block.c
2. Rename qemu_aio_release() to qemu_aio_unref()
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1410415798-20673-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:23 +01:00
Fam Zheng
8007429a99 block: Rename qemu_aio_release -> qemu_aio_unref
Suggested-by: Benoît Canet <benoit.canet@irqsave.net>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:17 +01:00
Fam Zheng
5da91e4ef4 win32-aio: Drop win32_aiocb_info.cancel
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:15 +01:00
Fam Zheng
6d24b4df38 sheepdog: Convert sd_aiocb_info.cancel to .cancel_async
Also drop the now unused SheepdogAIOCB.finished field. Note that this
aio is internal to sheepdog driver and has NULL cb and opaque, and
should be unused at all.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:14 +01:00
Fam Zheng
7691e24dbe rbd: Drop rbd_aiocb_info.cancel
And also drop the now unused "cancelled" field.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:13 +01:00
Fam Zheng
7940e5056b quorum: Convert quorum_aiocb_info.cancel to .cancel_async
Before, we cancel all the child requests with bdrv_aio_cancel, then free
the acb..

Now we just kick off asynchronous cancellation of child requests and
return, we know quorum_aio_cb will be called later, so in the end
quorum_aio_finalize will take care of calling the caller's cb.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:13 +01:00
Liu Yuan
997dd8df3e quorum: fix quorum_aio_cancel()
For a fifo read pattern, we only have one running aio (possible other cases that
has less number than num_children in the future), so we need to check if
.acb is NULL against bdrv_aio_cancel() to avoid segfault.

Cc: Eric Blake <eblake@redhat.com>
Cc: Benoit Canet <benoit@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:12 +01:00
Fam Zheng
533ffb17a5 qed: Drop qed_aiocb_info.cancel
Also drop the now unused ->finished field.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:11 +01:00
Fam Zheng
facb5539d6 curl: Drop curl_aiocb_info.cancel
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:10 +01:00
Fam Zheng
9784e70f3d blkverify: Drop blkverify_aiocb_info.cancel
Also the finished pointer is not used any more.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:09 +01:00
Fam Zheng
4c781717c5 blkdebug: Drop blkdebug_aiocb_info.cancel
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:08 +01:00
Fam Zheng
4743b42a1b archipelago: Drop archipelago_aiocb_info.cancel
The cancelled flag is no longer useful. Later the request will complete
as before, and cb will be called.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:07 +01:00
Fam Zheng
722d93335f iscsi: Convert iscsi_aiocb_info.cancel to .cancel_async
Also drop the unused field "canceled".

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:06 +01:00
Fam Zheng
771b64daf9 linux-aio: Convert laio_aiocb_info.cancel to .cancel_async
Just call io_cancel (2), if it fails, it means the request is not
canceled, so the event loop will eventually call
qemu_laio_process_completion.

In qemu_laio_process_completion, change to call the cb unconditionally.
It is required by bdrv_aio_cancel_async.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:04 +01:00
Peter Maydell
c2ebb05e25 block/vhdx.c: Mark parent_vhdx_guid variable as unused
The parent_vhdx_guid variable is defined but never used, which provokes
complaints from newer versions of clang. Since the variable definition
is here acting as documentation of the image format, mark it with the
'unused' attribute to keep the compiler happy rather than simply
deleting it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:35:02 +01:00
Adelina Tuvenie
a011898d25 block: allow creation of fixed vhdx images
When trying to create a fixed vhd image qemu-img will return the
following error:

 qemu-img: test.vhdx: Could not create image: Cannot allocate memory

This happens because of a incorrect check in vhdx.c. Specifficaly,
in vhdx_create_bat(), after allocating memory for the BAT entry,
there is a check to determine if the allocation was unsuccsessful.
The error comes from the fact that it checks if s->bat isn't NULL,
which is true in case of succsessful allocation,  and exits with
error ENOMEM.

Signed-off-by: Adelina Tuvenie <atuvenie@cloudbasesolutions.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-22 12:09:31 +04:00
Hu Tao
0e4271b711 qcow2: Add falloc and full preallocation option
preallocation=falloc allocates disk space by posix_fallocate(),
preallocation=full allocates disk space by writing zeros to disk.
Both modes imply preallocation=metadata.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Hu Tao
06247428be raw-posix: Add falloc and full preallocation option
This patch adds a new option preallocation for raw format, and implements
falloc and full preallocation.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Hu Tao
ffeaac9b4e qapi: introduce PreallocMode and new PreallocModes full and falloc.
This patch prepares for the subsequent patches.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Hu Tao
180e95265e block: don't convert file size to sector size
and avoid converting it back later.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Hu Tao
c2eb918e32 block: round up file size to nearest sector
Currently the file size requested by user is rounded down to nearest
sector, causing the actual file size could be a bit less than the size
user requested. Since some formats (like qcow2) record virtual disk
size in bytes, this can make the last few bytes cannot be accessed.

This patch fixes it by rounding up file size to nearest sector so that
the actual file size is no less than the requested file size.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-12 15:43:06 +02:00
Chrysostomos Nanakos
94c80a438c block/archipelago: Implement bdrv_truncate()
Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
5366d0c8bc block: Make the block accounting functions operate on BlockAcctStats
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models structures.

Note that bdrv_get_stats was introduced so device models can retrieve the
BlockAcctStats structure of a BlockDriverState without being aware of it's
layout.
This function should go away when BlockAcctStats will be embedded in the device
models structures.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
28298fd3d9 block: rename BlockAcctType members to start with BLOCK_ instead of BDRV_
The middle term goal is to move the BlockAcctStats structure in the device models.
(Capturing I/O accounting statistics in the device models is good for billing)
This patch make a small step in this direction by removing a reference to BDRV.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>i

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
5e5a94b605 block: Extract the block accounting code
The plan is to add new accounting metrics (latency, invalid requests, failed
requests, queue depth) and block.c is overpopulated so it will be better to work
in a separate module.

Moreover the long term plan is to have statistics in each of the BDS of the graph
for metrology purpose; this means that the device model statistics must move from
the topmost BDS to the device model.

So we need to decouple the statistic code from BlockDriverState.

This is another argument for the extraction of the code in a separate module.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Benoit Canet <benoit@irqsave.net>
CC: Fam Zheng <famz@redhat.com>
CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
CC: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
0ddd0ad96a block: Extract the BlockAcctStats structure
Extract the block accounting statistics into a structure so the block device
models can hold them in the future.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Xiaodong Gong
0d4cc3e715 Fix improper usage of cpu_to_be32 in vpc
cpu_to_be32() is wrong since vhd_type is an enum constant
(just a regular CPU-endian integer).

Signed-off-by: Xiaodong Gong <gordongong0350@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Stefan Hajnoczi
b6b1d31f09 vmdk: fix buf leak in vmdk_parse_extents()
vmdk_open_sparse() does not take ownership of buf so the caller always
needs to free it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2014-09-08 11:12:44 +01:00
Stefan Hajnoczi
ff74f33c31 vmdk: fix vmdk_parse_extents() extent_file leaks
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2014-09-08 11:12:44 +01:00
Chrysostomos Nanakos
072f9ac44a block/archipelago: Use QEMU atomic builtins
Replace __sync builtins with ones provided by QEMU
for atomic operations.

Special thanks goes to Paolo Bonzini for his refactoring
suggestion in order to use the already existing atomic builtins
interface.

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-08 11:12:43 +01:00
Richard W.M. Jones
41c2346716 curl: The macro that you have to uncomment to get debugging is DEBUG_CURL.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-09-02 22:38:16 +04:00
Fam Zheng
8df3abfcee quorum: Fix leak of opts in quorum_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Fam Zheng
3158593126 blkverify: Fix leak of opts in blkverify_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Fam Zheng
810f4f86b7 nfs: Fix leak of opts in nfs_file_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 17:10:18 +01:00
Richard W.M. Jones
a2f468e48f curl: Don't deref NULL pointer in call to aio_poll.
In commit 63f0f45f2e the following
mechanical change was made:

         if (!state) {
-            qemu_aio_wait();
+            aio_poll(state->s->aio_context, true);
         }

The new code now checks if state is NULL and then dereferences it
('state->s') which is obviously incorrect.

This commit replaces state->s->aio_context with
bdrv_get_aio_context(bs), fixing this problem.  The two other hunks
are concerned with getting the BlockDriverState pointer bs to where it
is needed.

The original bug causes a segfault when using libguestfs to access a
VMware vCenter Server and doing any kind of complex read-heavy
operations.  With this commit the segfault goes away.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:19:01 +01:00
Richard W.M. Jones
a94f83d94f curl: Allow a cookie or cookies to be sent with http/https requests.
In order to access VMware ESX efficiently, we need to send a session
cookie.  This patch is very simple and just allows you to send that
session cookie.  It punts on the question of how you get the session
cookie in the first place, but in practice you can just run a `curl'
command against the server and extract the cookie that way.

To use it, add file.cookie to the curl URL.  For example:

$ qemu-img info 'json: {
    "file.driver":"https",
    "file.url":"https://vcenter/folder/Windows%202003/Windows%202003-flat.vmdk?dcPath=Datacenter&dsName=datastore1",
    "file.sslverify":"off",
    "file.cookie":"vmware_soap_session=\"52a01262-bf93-ccce-d379-8dabb3e55560\""}'
image: [...]
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: unavailable

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 16:11:14 +01:00
Stefan Hajnoczi
2cdff7f620 linux-aio: avoid deadlock in nested aio_poll() calls
If two Linux AIO request completions are fetched in the same
io_getevents() call, QEMU will deadlock if request A's callback waits
for request B to complete using an aio_poll() loop.  This was reported
to happen with the mirror blockjob.

This patch moves completion processing into a BH and makes it resumable.
Nested event loops can resume completion processing so that request B
will complete and the deadlock will not occur.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Marcin Gibuła <m.gibula@beyond.pl>
Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
2014-08-29 15:59:17 +01:00
Liu Yuan
a780dea045 sheepdog: fix a core dump while do auto-reconnecting
We should reinit local_err as NULL inside the while loop or g_free() will report
corrupption and abort the QEMU when sheepdog driver tries reconnecting.

This was broken in commit 356b4ca.

qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: (null)
[xcb] Unknown sequence number while awaiting reply
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
qemu-system-x86_64: ../../src/xcb_io.c:298: poll_for_response: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Aborted (core dumped)

Cc: qemu-devel@nongnu.org
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Paolo Bonzini
b493317d34 aio-win32: add support for sockets
Uses the same select/WSAEventSelect scheme as main-loop.c.
WSAEventSelect() is edge-triggered, so it cannot be used
directly, but it is still used as a way to exit from a
blocking g_poll().

Before g_poll() is called, we poll sockets with a non-blocking
select() to achieve the level-triggered semantics we require:
if a socket is ready, the g_poll() is made non-blocking too.

Based on a patch from Or Goshen.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Liu Yuan
a9db86b223 block/quorum: add simple read pattern support
This patch adds single read pattern to quorum driver and quorum vote is default
pattern.

For now we do a quorum vote on all the reads, it is designed for unreliable
underlying storage such as non-redundant NFS to make sure data integrity at the
cost of the read performance.

For some use cases as following:

        VM
  --------------
  |            |
  v            v
  A            B

Both A and B has hardware raid storage to justify the data integrity on its own.
So it would help performance if we do a single read instead of on all the nodes.
Further, if we run VM on either of the storage node, we can make a local read
request for better performance.

This patch generalize the above 2 nodes case in the N nodes. That is,

vm -> write to all the N nodes, read just one of them. If single read fails, we
try to read next node in FIFO order specified by the startup command.

The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
functionality in the single device/node failure for now. But compared with DRBD
we still have some advantages over it:

- Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
storage. And if A crashes, we need to restart all the VMs on node B. But for
practice case, we can't because B might not have enough resources to setup 20 VMs
at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
images over the data center, we can very likely restart 20 VMs without any
resource problem.

After all, I think we can build a more powerful replicated image functionality
on quorum and block jobs(block mirror) to meet various High Availibility needs.

E.g, Enable single read pattern on 2 children,

-drive driver=quorum,children.0.file.filename=0.qcow2,\
children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1

[1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device

[Dropped \n from an error_setg() error message
--Stefan]

Cc: Benoit Canet <benoit@irqsave.net>
Cc: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:58 +01:00
Hitoshi Mitake
38890b246d sheepdog: improve error handling for a case of failed lock
Recently, sheepdog revived its VDI locking functionality. This patch
updates sheepdog driver of QEMU for this feature. It changes an error
code for a case of failed locking. -EBUSY is a suitable one.

Reported-by: Valerio Pachera <sirio81@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Hitoshi Mitake
1dbfafed7f sheepdog: adopting protocol update for VDI locking
The update is required for supporting iSCSI multipath. It doesn't
affect behavior of QEMU driver but adding a new field to vdi request
struct is required.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Liu Yuan <namei.unix@gmail.com>
Cc: MORITA Kazutaka <morita.kazutaka@gmail.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Daniel Henrique Barboza
212aefaa53 block.curl: adding 'timeout' option
The curl hardcoded timeout (5 seconds) sometimes is not long
enough depending on the remote server configuration and network
traffic. The user should be able to set how much long he is
willing to wait for the connection.

Adding a new option to set this timeout gives the user this
flexibility. The previous default timeout of 5 seconds will be
used if this option is not present.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Benoit Canet <benoit.canet@nodalink.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:46:57 +01:00
Stefan Hajnoczi
6d0de8eb21 mirror: fix uninitialized variable delay_ns warnings
The gcc 4.1.2 compiler warns that delay_ns may be uninitialized in
mirror_iteration().

There are two break statements in the do ... while loop that skip over
the delay_ns assignment.  These are probably the cause of the warning.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Markus Armbruster
0a156f7c75 vmdk: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().

Commit 57322b7 did this all over block, but one more bdrv_getlength()
has crept in since.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:10:12 +02:00
Fam Zheng
cbf95a0b11 blkdebug: Delete BH in bdrv_aio_cancel
Otherwise error_callback_bh will access the already released acb.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:07:00 +02:00
Stefan Hajnoczi
61ed73cff4 raw-posix: fix O_DIRECT short reads
The following O_DIRECT read from a <512 byte file fails:

  $ truncate -s 320 test.img
  $ qemu-io -n -c 'read -P 0 0 512' test.img
  qemu-io: can't open device test.img: Could not read image for determining its format: Invalid argument

Note that qemu-io completes successfully without the -n (O_DIRECT)
option.

This patch fixes qemu-iotests ./check -nocache -vmdk 059.

Cc: qemu-stable@nongnu.org
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 11:00:56 +02:00
Peter Lieven
d832fb4d66 block/iscsi: fix memory corruption on iscsi resize
bs->total_sectors is not yet updated at this point. resulting
in memory corruption if the volume has grown and data is written
to the newly availble areas.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-22 10:55:22 +02:00
Michael Tokarev
13b552c2f4 block/vvfat.c: remove debugging code to reinit stderr if NULL
Just log to stderr unconditionally, like other similar code does.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-21 10:36:29 +02:00
Max Reitz
fafcfe228d quorum: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
2019d68b3b nbd: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
74b36b2eb8 blkverify: Implement bdrv_refresh_filename()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
2c31b04c94 blkdebug: Implement bdrv_refresh_filename()
Because blkdebug cannot simply create a configuration file, simply
refuse to reconstruct a plain filename and only generate an options
QDict from the rules instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Max Reitz
6c1c8d5d3e qcow2: Add runtime options for cache sizes
Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
02004bd4ba qcow2: Use g_try_new0() for cache array
With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().

While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
440ba08aea qcow2: Constant cache size in bytes
Specifying the metadata cache sizes in clusters results in less clusters
(and much less bytes) covered for small cluster sizes and vice versa.
Using a constant byte size reduces this difference, and makes it
possible to manually specify the cache size in an easily comprehensible
unit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
d4df3dbc02 block: Drop some superfluous casts from void *
They clutter the code.  Unfortunately, I can't figure out how to make
Coccinelle drop all of them, so I have to settle for common special
cases:

    @@
    type T;
    T *pt;
    void *pv;
    @@
    - pt = (T *)pv;
    + pt = pv;
    @@
    type T;
    @@
    - (T *)
      (\(g_malloc\|g_malloc0\|g_realloc\|g_new\|g_new0\|g_renew\|
	 g_try_malloc\|g_try_malloc0\|g_try_realloc\|
	 g_try_new\|g_try_new0\|g_try_renew\)(...))

Topped off with minor manual style cleanups.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
02c4f26b15 block: Use g_new() & friends to avoid multiplying sizes
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
One, it catches multiplication overflowing size_t.  Two, it returns
T * rather than void *, which lets the compiler catch more type
errors.

Perhaps a conversion to g_malloc_n() would be neater in places, but
that's merely four years old, and we can't use such newfangled stuff.

This commit only touches allocations with size arguments of the form
sizeof(T), plus two that use 4 instead of sizeof(uint32_t).  We can
make the others safe by converting to g_malloc_n() when it becomes
available to us in a couple of years.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Markus Armbruster
5839e53bbc block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

Patch created with Coccinelle, with two manual changes on top:

* Add const to bdrv_iterate_format() to keep the types straight

* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
  inexplicably misses

Coccinelle semantic patch:

    @@
    type T;
    @@
    -g_malloc(sizeof(T))
    +g_new(T, 1)
    @@
    type T;
    @@
    -g_try_malloc(sizeof(T))
    +g_try_new(T, 1)
    @@
    type T;
    @@
    -g_malloc0(sizeof(T))
    +g_new0(T, 1)
    @@
    type T;
    @@
    -g_try_malloc0(sizeof(T))
    +g_try_new0(T, 1)
    @@
    type T;
    expression n;
    @@
    -g_malloc(sizeof(T) * (n))
    +g_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc(sizeof(T) * (n))
    +g_try_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_malloc0(sizeof(T) * (n))
    +g_new0(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc0(sizeof(T) * (n))
    +g_try_new0(T, n)
    @@
    type T;
    expression p, n;
    @@
    -g_realloc(p, sizeof(T) * (n))
    +g_renew(T, p, n)
    @@
    type T;
    expression p, n;
    @@
    -g_try_realloc(p, sizeof(T) * (n))
    +g_try_renew(T, p, n)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Stefan Hajnoczi
39ba3bf69c qcow2: fix new_blocks double-free in alloc_refcount_block()
Commit de82815db1 ("qcow2: Handle failure
for potentially large allocations") introduced a double-free of
new_blocks in the alloc_refcount_block() error path.

The qemu-iotests qcow2 026 test case was failing because qemu-io
segfaulted.

Make sure new_blocks is NULL after we free it the first time.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:26 +01:00
Denis V. Lunev
d25d598020 parallels: 2TB+ parallels images support
Parallels has released in the recent updates of Parallels Server 5/6
new addition to his image format. Images with signature WithouFreSpacExt
have offsets in the catalog coded not as offsets in sectors (multiple
of 512 bytes) but offsets coded in blocks (i.e. header->tracks * 512)

In this case all 64 bits of header->nb_sectors are used for image size.

This patch implements support of this for qemu-img and also adds specific
check for an incorrect image. Images with block size greater than
INT_MAX/513 are not supported. The biggest available Parallels image
cluster size in the field is 1 Mb. Thus this limit will not hurt
anyone.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
418a7adb77 parallels: split check for parallels format in parallels_open
and rework error path a bit. There is no difference at the moment, but
the code will be definitely shorter when additional processing will
be required for WithouFreSpacExt

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
f08e2f8465 parallels: replace tabs with spaces in block/parallels.c
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Denis V. Lunev
8c27d54fa0 parallels: extend parallels format header with actual data values
Parallels image format has several additional fields inside:
- nb_sectors is actually 64 bit wide. Upper 32bits are not used for
  images with signature "WithoutFreeSpace" and must be explicitly
  zeroed according to Parallels. They will be used for images with
  signature "WithouFreSpacExt"
- inuse is magic which means that the image is currently opened for
  read/write or was not closed correctly, the magic is 0x746f6e59
- data_off is the location of the first data block. It can be zero
  and in this case data starts just beyond the header aligned to
  512 bytes. Though this field does not matter for read-only driver

This patch adds these values to struct parallels_header and adds
proper handling of nb_sectors for currently supported WithoutFreeSpace
images.

WithouFreSpacExt will be covered in next patches.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:13 +01:00
Paolo Bonzini
9e52c53b8c blkdebug: report errors on flush too
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 18:03:11 +01:00
Max Reitz
ff52aab2df qcow2: Catch !*host_offset for data allocation
qcow2_alloc_cluster_offset() uses host_offset == 0 as "no preferred
offset" for the (data) cluster range to be allocated. However, this
offset is actually valid and may be allocated on images with a corrupted
refcount table or first refcount block.

In this case, the corruption prevention should normally catch that
write anyway (because it would overwrite the image header). But since 0
is a special value here, the function assumes that nothing has been
allocated at all which it asserts against.

Because this condition is not qemu's fault but rather that of a broken
image, it shouldn't throw an assertion but rather mark the image corrupt
and show an appropriate message, which this patch does by calling the
corruption check earlier than it would be called normally (before the
assertion).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:16 +02:00
Max Reitz
8fcffa9853 qcow2: Return useful error code in refcount_init()
If bdrv_pread() returns an error, it is very unlikely that it was
ENOMEM. In this case, the return value should be passed along; as
bdrv_pread() will always either return the number of bytes read or a
negative value (the error code), the condition for checking whether
bdrv_pread() failed can be simplified (and clarified) as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
7504edf477 mirror: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the mirror block job.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
5fb09cd586 vpc: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vpc block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
d6e5993197 vmdk: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vmdk block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
a67e128a4f vhdx: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vhdx block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
17cce73578 vdi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the vdi block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:16 +02:00
Kevin Wolf
0f7a02379b rbd: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the rbd block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:16 +02:00
Kevin Wolf
4b6af3d58a raw-win32: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the raw-win32 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:16 +02:00
Kevin Wolf
50d4a858e6 raw-posix: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the raw-posix block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4f4896db5f qed: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qed block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
de82815db1 qcow2: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qcow2 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
0df93305f2 qcow1: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the qcow1 block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
f7b593d937 parallels: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the parallels block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
2347dd7b68 nfs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the nfs block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4d5a3f888c iscsi: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the iscsi block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
b546a94474 dmg: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the dmg block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
8dc7a7725b curl: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the curl block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
4ae7a52e43 cloop: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the cloop block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Kevin Wolf
7bf665ee35 bochs: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses the allocations in the bochs block driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Jeff Cody
fef6070eff block: vpc - use block layer ops in vpc_create, instead of posix calls
Use the block layer to create, and write to, the image file in the VPC
.bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:15 +02:00
Jeff Cody
dddc7750d6 block: use the standard 'ret' instead of 'result'
Most QEMU code uses 'ret' for function return values. The VDI driver
uses a mix of 'result' and 'ret'.  This cleans that up, switching over
to the standard 'ret' usage.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:15 +02:00
Jeff Cody
70747862f1 block: vdi - use block layer ops in vdi_create, instead of posix calls
Use the block layer to create, and write to, the image file in the
VDI .bdrv_create() operation.

This has a couple of benefits: Images can now be created over protocols,
and hacks such as NOCOW are not needed in the image format driver, and
the underlying file protocol appropriate for the host OS can be relied
upon.

Also some minor cleanup for error handling.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Jeff Cody
4f75b52a07 block: VHDX endian fixes
This patch contains several changes for endian conversion fixes for
VHDX, particularly for big-endian machines (multibyte values in VHDX are
all on disk in LE format).

Tests were done with existing qemu-iotests on an IBM POWER7 (8406-71Y).
This includes sample images created by Hyper-V, both with dirty logs and
without.

In addition, VHDX image files created (and written to) on a BE machine
were tested on a LE machine, and vice-versa.

Reported-by: Markus Armburster <armbru@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Jeff Cody
349592e0b9 block: vhdx - add error check
This add an error check for an invalid descriptor entry signature,
when flushing the log descriptor entries.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
76d3d83a37 block/archipelago: Add support for creating images
qemu-img archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>]
 [:segment=<segment_name>]] [size]

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
70537a8506 block/archipelago: Implement bdrv_parse_filename()
VM Image on Archipelago volume can also be specified like this:

file=archipelago:<volumename>[/mport=<mapperd_port>[:vport=<vlmcd_port>][:
segment=<segment_name>]]

Examples:

file=archipelago:my_vm_volume
file=archipelago:my_vm_volume/mport=123
file=archipelago:my_vm_volume/mport=123:vport=1234
file=archipelago:my_vm_volume/mport=123:vport=1234:segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Chrysostomos Nanakos
c9a12e751b block: Support Archipelago as a QEMU block backend
VM Image on Archipelago volume is specified like this:

file.driver=archipelago,file.volume=<volumename>[,file.mport=<mapperd_port>[,
file.vport=<vlmcd_port>][,file.segment=<segment_name>]]

'archipelago' is the protocol.

'mport' is the port number on which mapperd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'vport' is the port number on which vlmcd is listening. This is optional
and if not specified, QEMU will make Archipelago to use the default port.

'segment' is the name of the shared memory segment Archipelago stack is using.
This is optional and if not specified, QEMU will make Archipelago to use the
default value, 'archipelago'.

Examples:

file.driver=archipelago,file.volume=my_vm_volume
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234
file.driver=archipelago,file.volume=my_vm_volume,file.mport=123,
file.vport=1234,file.segment=my_segment

Signed-off-by: Chrysostomos Nanakos <cnanakos@grnet.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Chunyan Liu
000c4dfff4 qemu-img info: show nocow info
Add nocow info in 'qemu-img info' output to show whether the file
currently has NOCOW flag set or not.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Fam Zheng
c6ac36e145 vmdk: Optimize cluster allocation
This drops the unnecessary bdrv_truncate() from, and also improves,
cluster allocation code path.

Before, when we need a new cluster, get_cluster_offset truncates the
image to bdrv_getlength() + cluster_size, and returns the offset of
added area, i.e. the image length before truncating.

This is not efficient, so it's now rewritten as:

  - Save the extent file length when opening.

  - When allocating cluster, use the saved length as cluster offset.

  - Don't truncate image, because we'll anyway write data there: just
    write any data at the EOF position, in descending priority:

    * New user data (cluster allocation happens in a write request).

    * Filling data in the beginning and/or ending of the new cluster, if
      not covered by user data: either backing file content (COW), or
      zero for standalone images.

One major benifit of this change is, on host mounted NFS images, even
over a fast network, ftruncate is slow (see the example below). This
change significantly speeds up cluster allocation. Comparing by
converting a cirros image (296M) to VMDK on an NFS mount point, over
1Gbe LAN:

    $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk

    Before:
        real    0m21.796s
        user    0m0.130s
        sys     0m0.483s

    After:
        real    0m2.017s
        user    0m0.047s
        sys     0m0.190s

We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and
get a little more documentation in function comments.

Tested that this passes qemu-iotests for all VMDK subformats.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:14 +02:00
Markus Armbruster
52bf1e722d block: Avoid bdrv_get_geometry() where errors should be detected
bdrv_get_geometry() hides errors.  Use bdrv_nb_sectors() or
bdrv_getlength() instead where that's obviously inappropriate.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
75d3d21f9e block: Drop superfluous aligning of bdrv_getlength()'s value
It returns a multiple of the sector size.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
57322b7811 block: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().

Aside: a few of these callers don't handle errors.  I didn't
investigate whether they should.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Kevin Wolf
df26a35025 raw-posix: Fail gracefully if no working alignment is found
If qemu couldn't find out what O_DIRECT alignment to use with a given
file, it would run into assert(bdrv_opt_mem_align(bs) != 0); in block.c
and confuse users. This adds a more descriptive error message for such
cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-18 13:18:43 +01:00
Kevin Wolf
3baca89139 block: Add Error argument to bdrv_refresh_limits()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-18 13:18:43 +01:00
Kevin Wolf
12ac6d3db7 qcow2: Fix error path for unknown incompatible features
qcow2's report_unsupported_feature() had two bugs: A 32 bit truncation
would prevent feature table entries for bits 32-63 from being used, and
it could assign errp multiple times if there was more than one unknown
feature, resulting in an error_set() assertion failure.

Fix the truncation, make sure to set the error exactly once and add a
qemu-iotests case for it.

This fixes https://bugs.launchpad.net/qemu/+bug/1342704/

Reported-by: Maria Kustova <maria.k@catit.be>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-18 13:12:15 +01:00
Gonglei
a1abf40d6b linux-aio: Fix laio resource leak
when hotplug virtio-scsi disks using laio, the aio_nr will
increase in laio_init() by io_setup(), we can see the number by
  # cat /proc/sys/fs/aio-nr
  128
if the aio_nr attach the maxnum, which found from
  # cat /proc/sys/fs/aio-max-nr
  65536
the hotplug process will fail because of aio context leak.

Fix it by io_destroy in laio_cleanup().

Reported-by: daifulai <daifulai@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-15 15:34:13 +02:00
Kevin Wolf
8eb029c26e block: Assert qiov length matches request length
At least raw-posix relies on this because it can allocate bounce buffers
based on the request length, but access it using all of the qiov entries
later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-07-14 12:03:20 +02:00
Kevin Wolf
f06ee3d4aa qed: Make qiov match request size until backing file EOF
If a QED image has a shorter backing file and a read request to
unallocated clusters goes across EOF of the backing file, the backing
file sees a shortened request and the rest is filled with zeros.
However, the original too long qiov was used with the shortened request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-07-14 12:03:20 +02:00
Kevin Wolf
44deba5a52 qcow2: Make qiov match request size until backing file EOF
If a qcow2 image has a shorter backing file and a read request to
unallocated clusters goes across EOF of the backing file, the backing
file sees a shortened request and the rest is filled with zeros.
However, the original too long qiov was used with the shortened request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-07-14 12:03:20 +02:00
Kevin Wolf
d40593dd90 block/backup: Fix hang for unaligned image size
When doing a block backup of an image with an unaligned size (with
respect to the BACKUP_CLUSTER_SIZE), qemu would check the allocation
status of sectors after the end of the image. bdrv_is_allocated()
returns a result that is valid for 0 sectors in this case, so the backup
job ran into an endless loop.

Stop looping when seeing a result valid for 0 sectors, we're at EOF then.

The test case looks somewhat unrelated at first sight because I
originally tried to reproduce a different suspected bug that turned out
to not exist. Still a good test case and it accidentally found this one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-07-09 15:50:11 +02:00
Ming Lei
1b3abdcccf linux-aio: implement io plug, unplug and flush io queue
This patch implements .bdrv_io_plug, .bdrv_io_unplug and
.bdrv_flush_io_queue callbacks for linux-aio Block Drivers,
so that submitting I/O as a batch can be supported on linux-aio.

[Unprocessed requests are completed with -EIO instead of a bogus ret
value.
--Stefan]

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07 11:05:17 +02:00
Markus Armbruster
aa729704f4 raw-posix: Fix raw_getlength() to always return -errno on error
We got a merry mix of -1 and -errno here.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07 09:41:29 +02:00
Kevin Wolf
5a0f6fd5c8 mirror: Fix qiov size for short requests
When mirroring an image of a size that is not a multiple of the
mirror job granularity, the last request would have the right nb_sectors
argument, but a qiov that is rounded up to the next multiple of the
granularity. Don't do this.

This fixes a segfault that is caused by raw-posix being confused by this
and allocating a buffer with request length, but operating on it with
qiov length.

[s/Driver/Drive/ in qemu-iotests 041 as suggested by Eric
--Stefan]

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07 09:15:29 +02:00
Jeff Cody
13d8cc515d block: add backing-file option to block-stream
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block job.

For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.

In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.

With this extension to the block-stream api, the user is able to change
the backing file of the active layer as part of the block-stream
operation.

This allows the change to be 'safe', in the sense that if the attempt
to write the active image metadata fails, then the block-stream
operation returns failure, without disrupting the guest.

If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:47:01 +02:00
Jeff Cody
54e2690090 block: extend block-commit to accept a string for the backing file
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block commit.

For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.

In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.

With this extension to the block-commit api, the user is able to change
the backing file of the overlay image as part of the block-commit
operation.

This allows the change to be 'safe', in the sense that if the attempt
to write the overlay image metadata fails, then the block-commit
operation returns failure, without disrupting the guest.

If the commit top is the active layer, then specifying the backing
file string will be treated as an error (there is no overlay image
to modify in that case).

If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:47:01 +02:00
Peter Maydell
6764579f89 block/cow: Avoid use of uninitialized cow_bs in error path
Commit 25814e8987 introduced an error-exit code path which does
a "goto exit" before the cow_bs variable is initialized, meaning
we would call bdrv_unref() on an uninitialized variable and
likely segfault. Fix this by moving the NULL-initialization
to the top of the function and making the exit code path handle
the case where it is NULL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:15:34 +02:00
Chunyan Liu
4ab1559085 qemu-img create: add 'nocow' option
Add 'nocow' option so that users could have a chance to set NOCOW flag to
newly created files. It's useful on btrfs file system to enhance performance.

Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files. Generally, there are
two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.

This patch tries the second way, according to the option, it could add NOCOW
per file.

For most block drivers, since the create file step is in raw-posix.c, so we
can do setting NOCOW flag ioctl in raw-posix.c only.

But there are some exceptions, like block/vpc.c and block/vdi.c, they are
creating file by calling qemu_open directly. For them, do the same setting
NOCOW flag ioctl work in them separately.

[Fixed up 082.out due to the new 'nocow' creation option
--Stefan]

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:15:12 +02:00
Benoît Canet
09158f00e0 block: Add replaces argument to drive-mirror
drive-mirror will bdrv_swap the new BDS named node-name with the one
pointed by replaces when the mirroring is finished.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 20:00:00 +02:00
Stefan Hajnoczi
13344f3a17 block: acquire AioContext in qmp_query_blockstats()
Make query-blockstats safe for dataplane by acquiring the
BlockDriverState's AioContext.  This ensures that the dataplane IOThread
and the main loop's monitor code do not race.

Note the assumption that acquiring the drive's BDS AioContext also
protects ->file and ->backing_hd.  This assumption is made by other
aio_context_acquire() callers too.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:20:29 +02:00
Stefan Hajnoczi
ac46821f2c block: make bdrv_query_stats() static
This function is only called from block/qapi.c.  There is no need to
keep it public.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:19:57 +02:00
Benoît Canet
cf29a570a7 quorum: Add the rewrite-corrupted parameter to quorum
On read operations when this parameter is set and some replicas are corrupted
while quorum can be reached quorum will proceed to rewrite the correct version
of the data to fix the corrupted replicas.

This will shine with SSD where the FTL will remap the same block at another
place on rewrite.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 14:18:17 +02:00
Kevin Wolf
8ee79e707a block: Catch backing files assigned to non-COW drivers
Since we parse backing.* options to add a backing file from the command
line when the driver didn't assign one, it has been possible to have a
backing file for e.g. raw images (it just was never accessed).

This is obvious nonsense and should be rejected.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Peter Lieven
f42ca3cad1 block/nfs: add knob to set readahead
upcoming libnfs will feature internal readahead support.
Add a knob to pass the optional readahead value as a URL
parameter.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-26 13:51:01 +02:00
Peter Lieven
7c24384b3b block/nfs: fix url parameter checking
this patch fixes the incorrect usage of strncmp and
adds simple error checking by means of parse_uint_full
instead of atoi for the supplied URL parameters.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-26 13:51:01 +02:00
Fam Zheng
9e48b02540 mirror: Go through ready -> complete process for 0 len image
When mirroring or active committing a zero length image, BLOCK_JOB_READY
is not reported now, instead the job completes because we short circuit
the mirror job loop.

This is inconsistent with non-zero length images, and only confuses
management software.

Let's do the same thing when seeing a 0-length image: report ready
immediately; wait for block-job-cancel or block-job-complete; clear the
cancel flag as existing non-zero image synced case (cancelled after
ready); then jump to the exit.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-26 13:50:57 +02:00
Stefan Weil
5d831be272 Fix new typos (found by codespell)
* accomodate -> accommodate
* aquiring -> acquiring
* beacuse -> because
* loosing -> losing
* prefering -> preferring
* threshhold -> threshold

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-06-24 20:01:24 +04:00
Wenchao Xia
fe069d9d59 qapi event: convert QUORUM events
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:28 -04:00
Wenchao Xia
bcada37b19 qapi event: convert other BLOCK_JOB events
Since BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED, BLOCK_JOB_READY are
related, convert them in one patch. The block_job_event_* functions
are used to keep encapsulation of BlockJob structure.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:28 -04:00
Wenchao Xia
c120f0fa14 qapi event: convert BLOCK_IMAGE_CORRUPTED
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:27 -04:00
Wenchao Xia
a589569f2f qapi: adjust existing defines
In order to let event defines use existing types later, instead of
redefine new ones, some old type defines for spice and vnc are changed,
and BlockErrorAction is moved from block.h to qapi schema. Note that
BlockErrorAction is not merged with BlockdevOnError.

At this point, VncInfo is not made a child of VncBasicInfo, because
VncBasicInfo has mandatory fields where VncInfo makes them optional.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:01:25 -04:00
Liu Yuan
5d5da114b3 sheepdog: fix NULL dereference in sd_create
Following command

qemu-img create -f qcow2 sheepdog:test 20g

will cause core dump because aio_context is NULL in sd_create. We should
initialize it by qemu_get_aio_context() to avoid NULL dereference.

Cc: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-23 16:36:13 +08:00
Peter Lieven
8c215a9fbd block/iscsi: drop obsolete pointers from iscsi_co_writev
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 18:42:42 +02:00
Peter Lieven
9d2e256e62 block/iscsi: fix init value for iTask->retries
during rebasing the changed init value for the
retry counter was missed. This resulted in no retries
being performed at all.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 18:42:24 +02:00
Peter Lieven
e49ab19fca block/iscsi: bump libiscsi requirement to 1.9.0
This patch lifts the minimum supported libiscsi version from 1.4.0 to
1.9.0 since the BUSY patch required that change.

On one this allows us to remove all #ifdefs from the code which
makes the code easier to maintain and read. On the other hand
I would not recommend libiscsi prior to 1.8.0 for production use
because the following important libiscsi fixes for deadlocks and
protocol errors are missing prior to 1.8.0:

dbe9a1e SOCKET queue cmd PDUs directly in waitpdu queue
30df192 DATA-OUT set pdu->cmdsn appropriately
548bd22 ISCSI fix broken send logic in iscsi_scsi_async_command
14bee10 RECONNECT do not increase CmdSN for immediate PDUs
1f4a66a PDU queue out PDUs in order of itt.
562dd46 PDU avoid incrementing itt to 0xffffffff
cd09c0f PDU use serial32 arithmetic for cmdsn, maxcmdsn and expcmdsn.
89e918e SOCKET validate data_size in in_pdu header
91267f5 Limit immediate and unsolicited data to FirstBurstLength

Note that libiscsi 1.9.0 was released on Feb 24th, 2013, about
one month after 1.8.0.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 18:03:33 +02:00
Peter Lieven
9281fe9eea block/iscsi: use 16 byte CDBs only when necessary
this patch changes the driver to uses 16 Byte CDBs for
READ/WRITE only if the target requires 64bit lba addressing.

On one hand this saves 6 bytes in each PDU on the other
hand it seems that 10 Byte CDBs seems to be much better
supported and tested as a recent issue I had with a
major storage supplier lined out.

For WRITESAME the logic is a bit more tricky as WRITESAME10
with UNMAP was added really late. Thus a fallback to WRITESAME16
is possible if it supports UNMAP and WRITESAME10 not.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 08:47:10 +02:00
Peter Lieven
fcd470d857 block/iscsi: fix potential segfault on early callback
it might happen in the future that a function directly invokes its callback.
In this case we end up in a segfault because the iTask is gone when the BH
is scheduled.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 08:47:10 +02:00
Peter Lieven
efc6de0d0e block/iscsi: handle BUSY condition
this patch adds handling of BUSY status reponse from an iSCSI target.
Currently, we fail with -EIO in case of SCSI_STATUS_BUSY while the
obvious reaction would be to retry the operation after some time.
The retry time is randomly choosen from a range with exponential
growth increasing with each retry.

This patch includes most of the changes by a an upcoming patch
from Stefan Hajnoczi:

 iscsi: implement .bdrv_detach/attach_aio_context()

because I also need the reference to the aio_context for
the retry timer to work. I included the changes to maintain
better mergeability.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 08:47:10 +02:00
Chunyan Liu
c282e1fdf7 cleanup QEMUOptionParameter
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
fec9921f0a vpc.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
5820f1da51 vmdk.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
5366092c7a vhdx.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
004b7f2522 vdi.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
766181fe57 ssh.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
b222237b7b sheepdog.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
bd0cf596fd rbd.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
cd3a4cf631 raw_bsd.c: replace QEMUOptionParameter with QemuOpts
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00