num_gtes_per_gte is a historical typo; rename it to a more sensible
name. It means "number of GrainTableEntries per GrainTable".
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We should never grow the stack beyond 1 MB, otherwise we'll fall off the
end. Thread stacks and coroutine stacks (1 MB) do not grow.
get_cluster_offset() allocates a big buffer on the stack, which will fail
for images with big clusters. Change it to a heap-allocated buffer.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
L1 table size is calculated from capacity, granularity and l2 table
size. If capacity is too big or the latter two are too small, the L1
table will be too big to allocate in memory. Limit it to a reasonable
range.
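The calculation described above can be sketched as follows. This is a
minimal illustration in Python, not the driver's code: the function name
l1_table_entries and the cap MAX_L1_ENTRIES are assumptions made for the
example, and the real limit is whatever the driver chooses.

```python
# Hypothetical cap, chosen only for illustration; not taken from the driver.
MAX_L1_ENTRIES = 512 * 1024 * 1024


def l1_table_entries(capacity_sectors, granularity_sectors, l2_entries):
    """Return the number of L1 entries needed, or raise if it is unreasonable."""
    if granularity_sectors <= 0 or l2_entries <= 0:
        raise ValueError("granularity and L2 size must be positive")
    # Sectors covered by one L1 entry, i.e. by one whole L2 table.
    sectors_per_l1_entry = granularity_sectors * l2_entries
    # Round up so that a partially covered tail still gets an entry.
    entries = (capacity_sectors + sectors_per_l1_entry - 1) // sectors_per_l1_entry
    if entries > MAX_L1_ENTRIES:
        raise ValueError("L1 table too big")
    return entries
```

A large capacity combined with a tiny granularity and L2 size drives the
entry count up, which is the case the range check rejects.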
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
header.num_gtes_per_gte determines the size of an L2 table. Check for a
too big value before using it. Limit it to 512M entries (2GB per L2
table).
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Granularity is used to calculate the cluster size and allocate r/w
buffer. Check the value from image before using it, so we don't abort()
for unbounded memory allocation.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The size and offset fields are all non-negative values. Use uint64_t for
them so that integer overflow cannot produce negative in-memory values.
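As an illustration of the problem (a sketch in Python using the struct
module, not the driver's code), the same on-disk bytes read back as
negative when reinterpreted through a signed 32-bit field, and stay
non-negative in an unsigned 64-bit field. The helper names are
hypothetical.

```python
import struct


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_unsigned_64(value):
    """Round-trip value through an unsigned 64-bit field."""
    return struct.unpack("<Q", struct.pack("<Q", value))[0]


# A sector count with bit 31 set, as a large image might contain.
sectors = 0x80000000
signed_view = as_signed_32(sectors)      # negative: the sign bit is set
unsigned_view = as_unsigned_64(sectors)  # unchanged and non-negative
```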
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's best to make it consistent that all on disk structures are
QEMU_PACKED.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 3ac21627 changed the behaviour of bdrv_has_zero_init() to default
to 0. In the review for Sheepdog it turned out that enabling it is safe,
so that commit updated one BlockDriver definition of sheepdog to use
bdrv_has_zero_init_1, but missed that there are more BlockDriver
definitions in the driver. Fix these now.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The comment was truncated. Add the missing parts, especially explain why
we need zero_dry_run.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The error on armv7hl was:
block/iscsi.c: In function ‘is_request_lun_aligned’:
block/iscsi.c:251:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Werror=format=]
iscsilun->block_size, sector_num, nb_sectors);
^
This also splits the long line to comply with qemu coding guidelines.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
'dprintf' is the name of a POSIX standard function so we should not be
stealing it for our debug macro. Rename to 'DPRINTF' (in line with
a number of other source files.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Acked-by: Richard Henderson <rth@twiddle.net>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1375100199-13934-2-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Stefan Hajnoczi (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
dataplane: refuse to start if device is already in use
dataplane: enable virtio-blk x-data-plane=on live migration
migration: fix spice migration
migration: notify migration state before starting thread
block: Repair the throttling code.
gluster: Add image resize support
Message-id: 1375112172-24863-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend
can support image resizing.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All these typos were found by codespell.
sould -> should
emperical -> empirical
intialization -> initialization
successfuly -> successfully
gaurantee -> guarantee
Fix also another error (before before) in the same context.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch adds sync-modes to the drive-backup interface and
implements the FULL, NONE and TOP modes of synchronization.
FULL performs as before copying the entire contents of the drive
while preserving the point-in-time using CoW.
NONE only copies new writes to the target drive.
TOP copies changes to the topmost drive image and preserves the
point-in-time using CoW.
For sync mode TOP we create a new target image using the same backing
file as the original disk image. Then any new data that has been laid
on top of it since creation is copied in the main backup_run() loop.
There is an extra check in the 'TOP' case so that we don't bother to copy
all the data of the backing file as it already exists in the target.
This is where the bdrv_co_is_allocated() is used to determine if the
data exists in the topmost layer or below.
Also any new data being written is intercepted via the write_notifier
hook which ends up calling backup_do_cow() to copy old data out before
it gets overwritten.
For mode 'NONE' we create the new target image and only copy in the
original data from the disk image starting from the time the call was
made. This preserves the point in time data by only copying the parts
that are *going to change* to the target image. This way we can
reconstruct the final image by checking to see if the given block exists
in the new target image first, and if it does not, you can get it from
the original image. This is basically an optimization allowing you to
do point-in-time snapshots with low overhead vs the 'FULL' version.
Since there is no old data to copy out, the loop in backup_run() for the
NONE case just calls qemu_coroutine_yield() which only wakes up after
an event (usually cancel in this case). The rest is handled by the
before_write notifier which again calls backup_do_cow() to write out
the old data so it can be preserved.
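The copy-before-write idea behind mode 'NONE' can be sketched with a toy
model. This is Python for illustration only; the class NoneModeBackup and
its methods are hypothetical names, not QEMU interfaces, and clusters are
modelled as list elements.

```python
class NoneModeBackup:
    """Toy model: preserve old data only for clusters that get overwritten."""

    def __init__(self, source):
        self.source = source  # list of cluster contents, written by the guest
        self.target = {}      # sparse target: cluster index -> preserved data

    def before_write(self, index):
        # Copy the old data out once, before the first overwrite.
        if index not in self.target:
            self.target[index] = self.source[index]

    def guest_write(self, index, data):
        self.before_write(index)
        self.source[index] = data

    def point_in_time(self, index):
        # Look in the target first, then fall back to the original image.
        return self.target.get(index, self.source[index])


disk = ["a", "b", "c"]
job = NoneModeBackup(disk)
job.guest_write(1, "B")
job.guest_write(1, "BB")
snapshot = [job.point_in_time(i) for i in range(len(disk))]
```

Only the overwritten cluster ends up in the target, and the snapshot is
reconstructed by preferring the target over the original image.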
Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is what QMP wants to use. The options haven't been enabled in any
release yet, so we're still free to change them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
s->qcow and s->qcow_filename are allocated but not freed on error. Fix the
possible leaks, remove unnecessary check for bdrv_new(), propagate ret code of
bdrv_create() and also the one of enable_write_target().
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Implement bdrv_aio_discard for gluster.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the block size of an iSCSI LUN is bigger than BDRV_SECTOR_SIZE,
sector_num or nb_sectors may not be correctly aligned.
To avoid corruption, we fail requests that are misaligned.
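A minimal sketch of such an alignment check, in Python for illustration.
It assumes BDRV_SECTOR_SIZE is 512 bytes and that the LUN block size is
given in bytes; the driver's actual check may be written differently.

```python
BDRV_SECTOR_SIZE = 512  # assumed sector size in bytes


def is_request_lun_aligned(sector_num, nb_sectors, lun_block_size):
    """Return True if both the start and the length fall on LUN block boundaries."""
    offset = sector_num * BDRV_SECTOR_SIZE
    length = nb_sectors * BDRV_SECTOR_SIZE
    return offset % lun_block_size == 0 and length % lun_block_size == 0
```

With a 4096-byte LUN block, a request starting at sector 8 with 8 sectors
is aligned, while one starting at sector 1 is not and would be failed.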
Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This hack is not working (anymore). Support for misaligned offsets should
be handled at the block layer.
Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The -ENOSPC case did not work due to the missing goto.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't assume that SG_IO is always invoked with a simple buffer,
check the iovec_count and if it is >= 1 then we need to pass an array
of iovectors to libiscsi instead of just a plain buffer.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passed using the new interfaces (like -drive
file.filename=...).
This allows opening relative file names that contain a colon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The CURL driver requests partial data from the server on guest I/O
requests. For HTTP and HTTPS, it uses "Range: ***" in requests, and this
will not work if the server does not accept ranges. This patch performs
this check at open time.
* Removed curl_size_cb, which is not used: On one hand it's registered to
libcurl as CURLOPT_WRITEFUNCTION, instead of CURLOPT_HEADERFUNCTION,
which will get called with *data*, not *header*. On the other hand the
s->len is assigned unconditionally later.
In this gone function, the sscanf for "Content-Length: %zd", on
(void *)ptr, which is not guaranteed to be zero-terminated, is
potentially a security bug. So this patch fixes it as a side-effect. The
bug is reported as: https://bugs.launchpad.net/qemu/+bug/1188943
(Note the bug is marked "private" so you might not be able to see it)
* Introduced curl_header_cb, which is used to parse the header and mark
the server as accepting ranges if an "Accept-Ranges: bytes" line is seen
in the response header. If the protocol is HTTP or HTTPS but the server
response does not indicate this support, refuse to open the URL.
Note that Python's built-in SimpleHTTPServer module is an example of a
server that does not support ranges. If you need to test this driver,
get a better server or use internet URLs.
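The header check can be sketched as below. This is a Python illustration
rather than the driver's callback: the function name accepts_byte_ranges
is hypothetical, and the case-insensitive matching is an assumption made
for the example.

```python
def accepts_byte_ranges(header_lines):
    """Return True if any response header line is 'Accept-Ranges: bytes'."""
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # status line or malformed line, no header name
        if name.strip().lower() == "accept-ranges" and value.strip().lower() == "bytes":
            return True
    return False
```

A caller would refuse to open an HTTP or HTTPS URL when this returns
False for the server's response headers.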
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Depending on the subformat, has_zero_init queries underlying storage for
flat extent. If it has a flat extent and its underlying storage doesn't
have zero init, return 0. Otherwise return 1.
Aligns the operator assignments.
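The has_zero_init rule described above, as a small Python sketch. The
function name and the (is_flat, storage_has_zero_init) tuple layout are
assumptions made for the example and do not mirror the driver's data
structures.

```python
def vmdk_has_zero_init(extents):
    """extents: iterable of (is_flat, storage_has_zero_init) pairs."""
    for is_flat, storage_has_zero_init in extents:
        # A flat extent is only as zero-initialized as its underlying storage.
        if is_flat and not storage_has_zero_init:
            return 0
    return 1
```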
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
.has_zero_init defaults to 1 for all formats and protocols.
This is a dangerous default, since it means that all
newly added drivers need to manually override it to 0 if
they do not ensure that a device is zero-initialized
after bdrv_create().
If a driver needs to explicitly set this value to
1, it's easier to verify the correctness in the review process.
During review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
Both protocols support host_devices as backend,
which are not zero-initialized by default. This
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.
vpc and vmdk also defaulted to 1 although they support
fixed and flat extents, respectively. This has to be addressed
in separate patches. Both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.
A similar problem with the wrong default existed for
iscsi, most likely because the driver developer
overlooked the default value of 1.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Depending on the subformat, has_zero_init on VHD must behave like raw
and query the underlying storage (fixed) or like other sparse formats
that can always return 1 (dynamic, differencing).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When creating image with backing file, the driver tries to calculate the
relative path from created image file to backing file, but the path
computation is incorrect. e.g.:
$ qemu-img create -f vmdk -b vmdk-data-disk.vmdk vmdk-data-snapshot1
Formatting 'vmdk-data-snapshot1', fmt=vmdk size=10737418240
backing_file='vmdk-data-disk.vmdk' compat6=off zeroed_grain=off
$ qemu-img info vmdk-data-snapshot1
image: vmdk-data-snapshot1
file format: vmdk
virtual size: 10G (10737418240 bytes)
disk size: 12K
-> backing file: disk.vmdk
The common part in file names, "vmdk-data-", is incorrectly forgotten by
relative_path(). As the VMDK specification has no restriction on
parentNameHint to be relative path, we simply remove this by using the
backing_file option.
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
GlusterFS volumes can be backed by block devices, in which case
bdrv_create() doesn't make sure that the image is zeroed out. It is
currently not possible to detect whether a given image is backed by a
file or a block device, and incorrectly assuming that it is zeroed
corrupts images during qemu-img convert, so let's err on the side of
caution and always return 0.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If the remote is a regular file, set it to true (ie. reads of
uninitialized areas in a newly created file will return zeroes).
If we can't prove that, return false (a safe default).
Tested by adding a debugging print statement [not part of this commit]
and creating a remote file and a remote block device:
$ ./qemu-img create ssh://localhost/tmp/new 100M
Formatting 'ssh://localhost/tmp/new', fmt=raw size=104857600
filename ssh://localhost/tmp/new: has_zero_init = 1
$ sudo lvcreate -L 1G -n tmp /dev/fedora
Logical volume "tmp" created
$ ./qemu-img create ssh://localhost/dev/fedora/tmp 1G
Formatting 'ssh://localhost/dev/fedora/tmp', fmt=raw size=1073741824
filename ssh://localhost/dev/fedora/tmp: has_zero_init = 0
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.
We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten. The data is then written to the target device.
Currently backup cluster size is hardcoded to 65536 bytes.
[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy. Here is the full list:
* Drop BackupDumpFunc interface in favor of a target block device
* Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
* Use 0 delay instead of 1us, like other block jobs
* Unify creation/start functions into backup_start()
* Simplify cleanup, free bitmap in backup_run() instead of cb function
* Use HBitmap to avoid duplicating bitmap code
* Use bdrv_getlength() instead of accessing ->total_sectors directly
* Delete the backup.h header file, it is no longer necessary
* Move ./backup.c to block/backup.c
* Remove #ifdefed out code
* Coding style and whitespace cleanups
* Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
* Keep our own in-flight CowRequest list instead of using block.c
tracked requests. This means a little code duplication but is much
simpler than trying to share the tracked requests list and use the
backup block size.
* Add on_source_error and on_target_error error handling.
* Use trace events instead of DPRINTF()
-- stefanha]
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The raw-posix driver has code to provide a /dev/cdrom on OS X even
though it doesn't really exist. However, since commit c66a6157 the real
filename is dismissed after finding it, so opening /dev/cdrom fails.
Put the filename back into the options QDict to make this work again.
Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Refuse to open higher versions for safety.
Although we try to be compatible with the published VMDK spec, VMware has
a newer version in OVF/OVA files exported from ESXi 5.1, and we have no
knowledge of what has changed in it. More new versions are very likely
to appear in the future, so it's not safe to open them blindly.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This optimises the discard operation for freed clusters by batching
discard requests (both snapshot deletion and bdrv_discard end up
updating the refcounts cluster by cluster).
Note that we don't discard asynchronously, but keep s->lock held. This
is to avoid that a freed cluster is reallocated and written to while the
discard is still in flight.
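The batching idea can be sketched as merging adjacent byte ranges before
issuing them. This is a Python illustration only; queue_discard and the
list-of-tuples representation are hypothetical and not the qcow2 data
structures.

```python
def queue_discard(pending, offset, length):
    """Add [offset, offset + length) to pending, merging touching ranges.

    pending is a list of (start, end) tuples; a new sorted list is returned.
    """
    merged_start, merged_end = offset, offset + length
    result = []
    for start, end in pending:
        if end < merged_start or start > merged_end:
            result.append((start, end))  # disjoint, keep as is
        else:
            # Overlapping or directly adjacent: grow the merged range.
            merged_start = min(merged_start, start)
            merged_end = max(merged_end, end)
    result.append((merged_start, merged_end))
    result.sort()
    return result


pending = []
for cluster_offset in (0, 65536, 131072, 524288):
    pending = queue_discard(pending, cluster_offset, 65536)
```

Three contiguous 64 KiB clusters collapse into one request, so the
cluster-by-cluster refcount updates result in fewer, larger discards.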
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Deleted snapshots are discarded in the image file by default; discard
requests take their default from the -drive discard=... option; and other
places that free clusters must always be enabled explicitly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a refcount update reason to all callers of update_refcounts(),
so that a follow-up patch can use this information to decide whether
clusters that reach a refcount of 0 should be discarded in the image
file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# By Paolo Bonzini (3) and others
# Via Paolo Bonzini
* bonzini/scsi-next:
iscsi: reorganize iscsi_readcapacity_sync
iscsi: simplify freeing of tasks
vhost-scsi: fix k->set_guest_notifiers() NULL dereference
scsi-disk: scsi-block device for scsi pass-through should not be removable
scsi-generic: check the return value of bdrv_aio_ioctl in execute_command
scsi-generic: fix sign extension of READ CAPACITY(10) data
scsi: reset cdrom tray statuses on scsi_disk_reset
Message-id: 1371565016-2643-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avoid the goto, and use the same retry logic for the 10- and 16-
byte versions.
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Always free them in the iscsi_aio_*_acb functions and remove the
checks in their callers. Remove ifs when the task struct was
previously dereferenced (spotted by Coverity).
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Otherwise they would get passed to getaddrinfo and fail with:
address resolution failed for [::1]:1234: Name or service not known
(Broken by commit v1.4.0-736-gf17c90b)
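Stripping the brackets before name resolution can be sketched like this
(Python, for illustration; strip_ipv6_brackets is a hypothetical helper,
not the function used in the patch).

```python
def strip_ipv6_brackets(host):
    """Return host without the surrounding [] of a literal IPv6 address."""
    if len(host) >= 2 and host.startswith("[") and host.endswith("]"):
        return host[1:-1]
    return host
```

The resolver then receives "::1" rather than "[::1]", which it cannot
look up as a name.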
Signed-off-by: Ján Tomko <jtomko@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
# By Luiz Capitulino
# Via Luiz Capitulino
* luiz/queue/qmp:
qerror: drop QERR_OPEN_FILE_FAILED macro
block: bdrv_reopen_prepare(): don't use QERR_OPEN_FILE_FAILED
savevm: qmp_xen_save_devices_state(): use error_setg_file_open()
dump: qmp_dump_guest_memory(): use error_setg_file_open()
cpus: use error_setg_file_open()
blockdev: use error_setg_file_open()
block: mirror_complete(): use error_setg_file_open()
rng-random: use error_setg_file_open()
error: add error_setg_file_open() helper
Message-id: 1371484631-29510-1-git-send-email-lcapitulino@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The hard-coded 2k buffer on the stack won't allow reading big descriptor
files, which can be generated when storing big images, for example a
500G vmdk split into 2G chunks.
Signed-off-by: Evgeny Budilovsky <evgeny.budilovsky@ravellosystems.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(Found by Kamil Dudka)
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Remember to byteswap VMDK4Header.desc_offset on big-endian machines.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Just calling sd_create_branch() in snapshot_goto to roll back the image
is good enough. With this patch, the 'loadvm' process for sheepdog is
modified:
Suppose we have a snapshot chain A --> B --> C, we do 'loadvm A' so as to get
a new chain,
A --> B
|
V
C1
in the old code:
1 reload inode of A (in snapshot_goto)
2 read vmstate via A's vdi_id (loadvm_state)
3 delete C and create C1, reload inode of C1 (sd_create_branch on write)
with this patch applied:
1 reload inode of A, delete C and create C1 (in snapshot_goto)
2 read vmstate via C1's parent, that is A's vdi_id (loadvm_state)
This will fix the possible bug that QEMU exits between 2 and 3 in the old
code.
Cc: qemu-devel@nongnu.org
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>