Currently the file size requested by user is rounded down to nearest
sector, causing the actual file size could be a bit less than the size
user requested. Since some formats (like qcow2) record virtual disk
size in bytes, this can make the last few bytes cannot be accessed.
This patch fixes it by rounding up file size to nearest sector so that
the actual file size is no less than the requested file size.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vmdk_open_sparse() does not take ownership of buf so the caller always
needs to free it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Instead of bdrv_getlength().
Commit 57322b7 did this all over block, but one more bdrv_getlength()
has crept in since.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
Patch created with Coccinelle, with two manual changes on top:
* Add const to bdrv_iterate_format() to keep the types straight
* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
inexplicably misses
Coccinelle semantic patch:
@@
type T;
@@
-g_malloc(sizeof(T))
+g_new(T, 1)
@@
type T;
@@
-g_try_malloc(sizeof(T))
+g_try_new(T, 1)
@@
type T;
@@
-g_malloc0(sizeof(T))
+g_new0(T, 1)
@@
type T;
@@
-g_try_malloc0(sizeof(T))
+g_try_new0(T, 1)
@@
type T;
expression n;
@@
-g_malloc(sizeof(T) * (n))
+g_new(T, n)
@@
type T;
expression n;
@@
-g_try_malloc(sizeof(T) * (n))
+g_try_new(T, n)
@@
type T;
expression n;
@@
-g_malloc0(sizeof(T) * (n))
+g_new0(T, n)
@@
type T;
expression n;
@@
-g_try_malloc0(sizeof(T) * (n))
+g_try_new0(T, n)
@@
type T;
expression p, n;
@@
-g_realloc(p, sizeof(T) * (n))
+g_renew(T, p, n)
@@
type T;
expression p, n;
@@
-g_try_realloc(p, sizeof(T) * (n))
+g_try_renew(T, p, n)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.
This patch addresses the allocations in the vmdk block driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
This drops the unnecessary bdrv_truncate() from, and also improves,
cluster allocation code path.
Before, when we need a new cluster, get_cluster_offset truncates the
image to bdrv_getlength() + cluster_size, and returns the offset of
added area, i.e. the image length before truncating.
This is not efficient, so it's now rewritten as:
- Save the extent file length when opening.
- When allocating cluster, use the saved length as cluster offset.
- Don't truncate image, because we'll anyway write data there: just
write any data at the EOF position, in descending priority:
* New user data (cluster allocation happens in a write request).
* Filling data in the beginning and/or ending of the new cluster, if
not covered by user data: either backing file content (COW), or
zero for standalone images.
One major benifit of this change is, on host mounted NFS images, even
over a fast network, ftruncate is slow (see the example below). This
change significantly speeds up cluster allocation. Comparing by
converting a cirros image (296M) to VMDK on an NFS mount point, over
1Gbe LAN:
$ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk
Before:
real 0m21.796s
user 0m0.130s
sys 0m0.483s
After:
real 0m2.017s
user 0m0.047s
sys 0m0.190s
We also get rid of unchecked bdrv_getlength() and bdrv_truncate(), and
get a little more documentation in function comments.
Tested that this passes qemu-iotests for all VMDK subformats.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Instead of bdrv_getlength().
Aside: a few of these callers don't handle errors. I didn't
investigate whether they should.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add 'nocow' option so that users could have a chance to set NOCOW flag to
newly created files. It's useful on btrfs file system to enhance performance.
Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files. Generally, there are
two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.
This patch tries the second way, according to the option, it could add NOCOW
per file.
For most block drivers, since the create file step is in raw-posix.c, so we
can do setting NOCOW flag ioctl in raw-posix.c only.
But there are some exceptions, like block/vpc.c and block/vdi.c, they are
creating file by calling qemu_open directly. For them, do the same setting
NOCOW flag ioctl work in them separately.
[Fixed up 082.out due to the new 'nocow' creation option
--Stefan]
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since we parse backing.* options to add a backing file from the command
line when the driver didn't assign one, it has been possible to have a
backing file for e.g. raw images (it just was never accessed).
This is obvious nonsense and should be rejected.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Change block layer to support both QemuOpts and QEMUOptionParameter.
After this patch, it will change backend drivers one by one. At the end,
QEMUOptionParameter will be removed and only QemuOpts is kept.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Implement .bdrv_detach/attach_aio_context() interfaces to propagate
detach/attach to BDRVVmdkState->extents[].file. The block layer takes
care of ->file and ->backing_hd but doesn't know about our extents
BlockDriverStates, which is also part of the graph.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In vmdk_create and vmdk_create_extent, initialize local_err before using
it, and don't leak it on error.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will return cluster_size and needs_compressed_writes to caller, if all the
extents have the same value (or there's only one extent). Otherwise return
-ENOTSUP.
cluster_size is only reported for sparse formats.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a wrapper function to support "compressed" path in qemu-img convert.
Only support streamOptimized subformat case for now (num_extents == 1
and extent compression is true).
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Returning "Wrong medium type" for an image that does not have a valid
header is a bit weird. Improve the error by mentioning what format
was trying to open it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we can return the "right" errors, use the Error** parameter
to pass them back instead of just printing them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This prepares for propagating errors from vmdk_open_sparse and
vmdk_open_desc_file up to the caller of vmdk_open.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, we just try reading a VMDK file as both image and descriptor.
This makes it hard to choose which of the two attempts gave the best error.
We'll decide in advance if the file looks like an image or a descriptor,
and this patch is the first step to that end.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.
Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are a handful of places in the block layer where a failure path
has a valid -errno value, yet error_setg() is used. Those instances
should instead use error_setg_errno(), to preserve as much error
information as possible.
This patch replaces those instances with error_setg_errno(), so that
errno is passed up the stack in the error message.
Reported-By: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
this adds a basic vmdk corruption check. it should detect severe
table corruptions and file truncation.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Local variable "n" as int64_t avoids overflow with large sector number
calculation. See test case change for failure case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This improves vmdk_create to use bdrv_* functions to replace qemu_open
and other fd functions. The error handling are improved as well. One
difference is that bdrv_pwrite will round up buffer to sectors, so for
description file, an extra bdrv_truncate is used in the end to drop
inding zeros.
Notes:
- A bonus bug fix is correct endian is used in initializing GD entries.
- ROUND_UP and DIV_ROUND_UP are used where possible.
I tested that new code produces exactly the same file as previously.
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
VMFS extent line in description file should be with 4 fields:
RW <size> VMFS "file-name.vmdk"
Check the number explicitly and report error if offset is appended as
FLAT, which should be invalid format.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This will let misaligned but large requests use zero clusters. This
is important because the cluster size is not guest visible.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The buffer for description file was 4096 which only covers a few
hundred of extents. This changes the buffer to dynamic allocated with
g_strdup_printf in order to support bigger cases.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If you open an image temporarily just because you want to check its size
or get it flushed, there's no real reason to open the whole backing file
chain.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Implement .bdrv_get_specific_info to return the extent information.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The VMFS extent line in description file doesn't have start offset as
FLAT lines does, and it should be defaulted to 0. The flat_offset
variable is initialized to -1, so we need to set it in this case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Previously cid of parent is parsed from image file for every IO request.
We already have L1/L2 cache and don't have assumption that parent image
can be updated behind us, so remove this to get more efficiency.
The parent CID is checked only for once after opening.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
An extra 'p++' after while loop when *p == '\n' will move p to unknown
data position, risking parsing junk data or memory access violation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Convert "fprintf(stderr,..." and standardize error messages:
Remove a few local_error's and use errp.
Remove "VMDK:" or "Vmdk:" prefixes in error message and fix to upper
case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We use the extent size as cluster size for flat extents (where no L1/L2
table is allocated so it's safe) reuse sector calculating code with
sparse extents.
Don't pass in the cluster size for adding flat extent, just set it to
sectors later, then the cluster size checking will not fail.
The cluster_sectors is changed to int64_t to allow big flat extent.
Without this, flat extent opening is broken:
# qemu-img create -f vmdk -o subformat=monolithicFlat /tmp/a.vmdk 100G
Formatting '/tmp/a.vmdk', fmt=vmdk size=107374182400 compat6=off subformat='monolithicFlat' zeroed_grain=off
# qemu-img info /tmp/a.vmdk
image: /tmp/a.vmdk
file format: raw
virtual size: 0 (0 bytes)
disk size: 4.0K
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.
Signed-off-by: Max Reitz <mreitz@redhat.com>
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.
This also touches all block drivers with a mostly mechanical rename. The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.
This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
VMware ESX hosts also use different create and extent types for flat
files, respectively "vmfs" and "VMFS". This is not documented, but it
can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing
virtual machine disk (VMDK) descriptor file).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
VMware ESX hosts use a variant of the VMDK3 format, identified by the
vmfsSparse create type ad the VMFSSPARSE extent type.
It has 16 KB grain tables (L2) and a variable-size grain directory (L1).
In addition, the grain size is always 512, but that is not a problem
because it is included in the header.
The format of the extents is documented in the VMDK spec. The format
of the descriptor file is not documented precisely, but it can be
found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual
machine disk (VMDK) descriptor file for delta disks).
With these patches, vmfsSparse files only work if opened through the
descriptor file. Data files without descriptor files, as far as I
could understand, are not supported by ESX.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
--
v2: Rebase to patch 01.
Change le64_to_cpu to le32_to_cpu.
Rename vmdk_open_vmdk3 to vmdk_open_vmfs_sparse, which represents the
current usage of this format.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the
value. This patch honors the header field.
And the L2 table size is 4096 according to VMDK spec[1], instead of
1 << 9 (512).
[1]:
http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This header check is common to VMDK3 and VMDK4, so move it into
vmdk_add_extent().
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
num_gtes_per_gte is a historical typo, rename it to a more sensible
name. It means "number of GrainTableEntries per GrainTable".
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should never grow the stack beyond 1 MB, otherwise we'll fall off the
end. Thread stacks and coroutine stacks (1 MB) do not grow.
get_cluster_offset() allocates a big stack offset, it will fail for big
cluster images, change to heap allocated buffer.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
L1 table size is calculated from capacity, granularity and l2 table
size. If capacity is too big or later two are too small, the L1 table
will be too big to allocate in memory. Limit it to a reasonable range.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
header.num_gtes_per_gte determines size for L2 table. Check for too big
value before using it. Limit to 512M entries (2GB per one L2 table).
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Granularity is used to calculate the cluster size and allocate r/w
buffer. Check the value from image before using it, so we don't abort()
for unbounded memory allocation.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size and offset fields are all non-negative values, use uint64_t for
them to avoid getting negative in memory value by int overflow.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's best to make it consistent that all on disk structures are
QEMU_PACKED.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The comment was truncated. Add the missing parts, especially explain why
we need zero_dry_run.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Depending on the subformat, has_zero_init queries underlying storage for
flat extent. If it has a flat extent and its underlying storage doesn't
have zero init, return 0. Otherwise return 1.
Aligns the operator assignments.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When creating image with backing file, the driver tries to calculate the
relative path from created image file to backing file, but the path
computation is incorrect. e.g.:
$ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1
Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240
backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off
$ qemu-img info vmdk-data-snapshot1
image: vmdk-data-snapshot1
file format: vmdk
virtual size: 10G (10737418240 bytes)
disk size: 12K
-> backing file: disk.vmdk
The common part in file names, "vmdk-data-", is incorrectly forgotten by
relative_path(). As the VMDK specification has no restriction on
parentNameHint to be relative path, we simply remove this by using the
backing_file option.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Refuse to open higher version for safety.
Although we try to be compatible with published VMDK spec, VMware has
newer version from ESXi 5.1 exported OVF/OVA, which we have no knowledge
what's changed in it. And it is very likely to have more new versions in
the future, so it's not safe to open them blindly.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
the hard-coded 2k buffer on the stack won't allow reading big descriptor
files which can be generated when storing big images. For example 500G
vmdk splitted to 2G chunks.
Signed-off-by: Evgeny Budilovsky <evgeny.budilovsky@ravellosystems.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Remember to byteswap VMDK4Header.desc_offset on big-endian machines.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use special offset to write zeroes efficiently, when zeroed-grain GTE is
available. If zero-write an allocated cluster, cluster is leaked because
its offset pointer is overwritten by "0x1".
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Previously VmdkMetaData.offset is stored little endian while other
fields are cpu endian. This changes offset to cpu endian and convert
before writing to image.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add image create option "zeroed-grain" to enable zeroed-grain GTE
feature of vmdk sparse extents. When this option is on, header version
of newly created extent will be 2 and VMDK4_FLAG_ZERO_GRAIN flag bit
will be set.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduced support for zeroed-grain GTE, as specified in Virtual Disk
Format 5.0[1].
Recent VMware hosted platform products support a new “zeroed‐grain”
grain table entry (GTE). The zeroed‐grain GTE returns all zeros on
read. In other words, the zeroed‐grain GTE indicates that a grain
in the child disk is zero‐filled but does not actually occupy space
in storage. A sparse extent with zeroed‐grain GTE has the following
in its header:
* SparseExtentHeader.version = 2
* SparseExtentHeader.flags has bit 2 set
Other than the new flag and the possibly zeroed‐grain GTE, version 2
sparse extents are identical to version 1. Also, a zeroed‐grain GTE
has value 0x1 in the GT table.
[1] Virtual Disk Format 5.0, http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Internal routines in vmdk.c previously return -1 on error and 0 on
success. More return values are useful for future changes such as
zeroed-grain GTE. Change all the magic `return 0` and `return -1` to
macro names:
* VMDK_OK 0
* VMDK_ERROR (-1)
* VMDK_UNALLOC (-2)
* VMDK_ZEROED (-3)
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The previous scanf() format string stopped parsing the file name on the
first white white space, which seems to be allowed at least by VMware
Workstation.
Change the format string to collect everything between the first and
second quote as the file name, disallowing line breaks.
Signed-off-by: Philipp Hahn <hahn@univention.de>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Introduce a new option "adapter_type" when converting to vmdk images.
It can be one of the following: ide (default), buslogic, lsilogic
or legacyESX (according to the vmdk spec from vmware).
In case of a non-ide adapter, heads is set to 255 instead of the 16.
The latter is used for "ide".
Also see LP#545089
Signed-off-by: Othmar Pasteka <pasteka@kabsi.at>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This improves error reports for bochs, cow, qcow, qcow2, qed and vmdk
when a file with the wrong format is selected.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fixed a MAJOR BUG in VMDK files on file boundaries on reads
and ALSO ON WRITES WHICH MIGHT CORRUPT THE IMAGE AND DATA!!!!!!
Triggered for example with the following VMDK file (partly listed):
RW 4193792 FLAT "XP-W1-f001.vmdk" 0
RW 2097664 FLAT "XP-W1-f002.vmdk" 0
RW 4193792 FLAT "XP-W1-f003.vmdk" 0
RW 512 FLAT "XP-W1-f004.vmdk" 0
RW 4193792 FLAT "XP-W1-f005.vmdk" 0
RW 2097664 FLAT "XP-W1-f006.vmdk" 0
RW 4193792 FLAT "XP-W1-f007.vmdk" 0
RW 512 FLAT "XP-W1-f008.vmdk" 0
Patch includes:
1.) Patch fixes wrong calculation on extent boundaries. Especially it
fixes the relativeness of the sector number to the current extent.
Verfied correctness with:
1.) Converted either with Virtualbox to VDI and then with qemu-img and
then with qemu-img only:
VBoxManage clonehd --format vdi /VM/XP-W/new/XP-W1.vmdk ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi
./qemu-img convert -O raw ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi /root/QEMU/VM-XP-W1/XP-W1-via-VBOX.img
md5sum /root/QEMU/VM-XP-W/XP-W1-direct.img
md5sum /root/QEMU/VM-XP-W/XP-W1-via-VBOX.img
=> same MD5 hash
2.) Verified debug log files
3.) Run Windows XP successfully
4.) chkdsk run successfully without any errors
Signed-off-by: Gerhard Wiesinger <lists@wiesinger.com>
Acked-by: Fam Zheng <famcool@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Avoid strncpy+manual-NUL-terminate. Use pstrcpy instead.
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch supports reopen for VMDK image files. VMDK extents are added
to the existing reopen queue, so that the transactional model of reopen
is maintained with multiple image files.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The footer takes precedence over the header when it exists. It contains
the real grain directory offset that is missing in the header. Without
this patch, streamOptimized images with a footer cannot be read.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
This patch converts all block layer close calls, that correspond
to qemu_open calls, to qemu_close.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch converts all block layer open calls to qemu_open.
Note that this adds the O_CLOEXEC flag to the changed open paths
when the O_CLOEXEC macro is defined.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The qcow2, qcow, and vmdk block drivers are based on coroutines. They have a
coroutine mutex which protects internal state. We can convert the
.bdrv_is_allocated() function to .bdrv_co_is_allocated() by holding the mutex
around the cluster lookup operation.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
zlib.h is not a local include file, therefore it should be included
using <> instead of "".
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Data we read from the disk isn't necessarily null terminated and may not
contain the string we're looking for. The code needs to be a bit more careful
here.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines. For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.
The bdrv_flush callback is then unused and can be eliminated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This does the first part of the conversion to coroutines, by
wrapping bdrv_write implementations to take the mutex.
Drivers that implement bdrv_write rather than bdrv_co_writev can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This does the first part of the conversion to coroutines, by
wrapping bdrv_read implementations to take the mutex.
Drivers that implement bdrv_read rather than bdrv_co_readv can
then benefit from asynchronous operation (at least if the underlying
protocol supports it, which is not the case for raw-win32), even
though they still operate with a bounce buffer.
raw-win32 does not need the lock, because it cannot yield.
nbd also doesn't probably, but better be safe.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The big conversion of bdrv_read/write to coroutines caused the two
homonymous callbacks in BlockDriver to become reentrant. It goes
like this:
1) bdrv_read is now called in a coroutine, and calls bdrv_read or
bdrv_pread.
2) the nested bdrv_read goes through the fast path in bdrv_rw_co_entry;
3) in the common case when the protocol is file, bdrv_co_do_readv calls
bdrv_co_readv_em (and from here goes to bdrv_co_io_em), which yields
until the AIO operation is complete;
4) if bdrv_read had been called from a bottom half, the main loop
is free to iterate again: a device model or another bottom half
can then come and call bdrv_read again.
This applies to all four of read/write/flush/discard. It would also
apply to is_allocated, but it is not used from within coroutines:
besides qemu-img.c and qemu-io.c, which operate synchronously, the
only user is the monitor. Copy-on-read will introduce a use in the
block layer, and will require converting it.
The solution is "simply" to convert all drivers to coroutines! We
just need to add a CoMutex that is taken around affected operations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move vmdk_parent_open to vmdk_open. There's another path how
vmdk_parent_open can be reached:
vmdk_parse_extents() -> vmdk_open_sparse() -> vmdk_open_vmdk4() ->
vmdk_open_desc_file().
If that can happen, however, the code is bogus. vmdk_parent_open
reads from bs->file:
if (bdrv_pread(bs->file, s->desc_offset, desc, DESC_SIZE) != DESC_SIZE) {
but it is always called with s->desc_offset == 0 and with the same
bs->file. So the data that vmdk_parent_open reads comes always from the
same place, and anyway there is only one place where it can write it,
namely bs->backing_file.
So, if it cannot happen, the patched code is okay.
It is also possible that the recursive call can happen, but only once. In
that case there would still be a bug in vmdk_open_desc_file setting
s->desc_offset = 0, but the patched code is okay.
Finally, in the case where multiple recursive calls can happen the code
would need to be rewritten anyway. It is likely that this would anyway
involve adding several parameters to vmdk_parent_open, and calling it from
vmdk_open_vmdk4.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While vmdk_open_desc_file (touched by the patch) correctly changed -1
to -EINVAL, vmdk_open did not. Fix it directly in vmdk_parent_open.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Release extent_file on error in vmdk_parse_extents. Added closing files
in freeing extents.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The vSphere 4 exported image is streamOptimized extent, which is not
quite correctly handled. Ignore rdgOffset when RGD flag bit not set.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Haiku provides a specially formed vmdk image, which let qemu abort. It a
combination of sparse header and flat data (i.e. with not l1/l2 table at
all). The fix is turn to descriptor when sparse header is zero in field
'capacity'.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating streamOptimized subformat. Added subformat option
'streamOptimized', to create a image with compression enabled and each
cluster with a GrainMarker.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add support for reading/writing compressed extent.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Added flags field for compressed/streamOptimized extents, open and save
image configuration.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Factor out read/write extent code, since there will be more things to
take care of once reading/writing compressed clusters is introduced.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Enable the createType 'twoGbMaxExtentFlat'. The supporting code is
already in.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Most changes were made using these commands:
git grep -la '__attribute__((packed))'|xargs perl -pi -e 's/__attribute__\(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__((__packed__))'|xargs perl -pi -e 's/__attribute__\(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((__packed__))'|xargs perl -pi -e 's/__attribute__ \(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute((packed))'|xargs perl -pi -e 's/__attribute\(\(packed\)\)/QEMU_PACKED/'
Whitespace in linux-user/syscall_defs.h was fixed manually
to avoid warnings from scripts/checkpatch.pl.
Manual changes were also applied to hw/pc.c.
I did not fix indentation with tabs in block/vvfat.c.
The patch will show 4 errors with scripts/checkpatch.pl.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.
The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Conform coding style in vmdk.c to pass scripts/checkpatch.pl checks.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add create option 'format', with enums:
monolithicSparse
monolithicFlat
twoGbMaxExtentSparse
twoGbMaxExtentFlat
Each creates a subformat image file. The default is monolithicSparse.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Parse vmdk decriptor file and open mono flat image.
Read/write the flat extent.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The return type of get_cluster_offset was an offset that use 0 to denote
'not allocated', this will be no longer true for flat extents, as we see
flat extent file as a single huge cluster whose offset is 0 and length
is the whole file length.
So now we use int return value, 0 means success and otherwise offset
invalid.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Cid_update is the flag for updating CID on first write after opening the
image. This should be per image open rather than per program life cycle,
so change it from static var of vmdk_write to a field in BDRVVmdkState.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Flush all the file that referenced by the image.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are several occurrence of magic number 0x200 as the descriptor
offset within mono sparse image file. This is not the case for images
with separate descriptor file. So a field is added to BDRVVmdkState to
hold the correct value.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Separate vmdk_open by subformats to:
* vmdk_open_vmdk3
* vmdk_open_vmdk4
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Probe as the same behavior as VMware does.
Recognize image as monolithicFlat descriptor file when the file is text
and the first effective line (not '#' leaded comment or space line) is
either 'version=1' or 'version=2'. No space or upper case charactors
accepted.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In get_whole_cluster, the offset is not aligned to cluster when reading
from backing_hd. When the first write to child is not at the cluster
boundary, wrong address data from parent is copied to child.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The vmdk code is sloppy when handling the header descriptor during
creation of an image. Fix all header accesses in the create path to
either store native endianness or convert it when appropriate.
Reported-by: Yury Tsarev <ytsarev@novell.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All drivers use bs->file instead of s->hd for quite a while now, so it's time
to remove s->hd.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
VMDK is doing interesting things when it needs to open a backing file. This
patch changes that part to look more like in other drivers. The nice side
effect is that the file name isn't needed any more in the open function.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When trying to do COW, VMDK wrote the data back to the backing file. This
problem was revealed by the patch that made backing files read-only. This patch
does not only fix the problem, but also simplifies the VMDK code a bit.
This fixes the backing file qemu-iotests cases for VMDK.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.
For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
cleanup code is identical for error/success cases. Only difference
are goto labels.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
fail_gd error case would also free rgd_buf that was already freed
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>