Commit Graph

794 Commits

Author SHA1 Message Date
Paolo Bonzini
08e4ed6cde mirror: add buf-size argument to drive-mirror
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
bd48bde8f0 mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target.  However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
eee13dfe30 mirror: allow customizing the granularity
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.

Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
50717e941b block: allow customizing the granularity of the dirty bitmap
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
acc906c6c5 block: return count of dirty sectors, not chunks
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
b812f6719c mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready.  However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros.  Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying).  So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
8f0720ecbc block: implement dirty bitmap using HBitmap
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Peter Lieven
7371d56fb2 iscsi: add support for iovectors
This patch adds support for directly passing the iovec
array from QEMUIOVector if libiscsi supports it (1.8.0
or newer).

Signed-off-by: Peter Lieven <pl@kamp.de>
[Preserve the improvements from commit 4cc841b, iscsi: partly
 avoid iovec linearization in iscsi_aio_writev, 2012-11-19 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-24 15:37:55 +01:00
Paolo Bonzini
4790b03d30 iscsi: do not leak acb->buf when commands are aborted
acb->buf is freed in the WRITE(16) callback, but this may not
get called at all when commands are aborted.  Add another
free in the ABORT TASK callback, which requires setting acb->buf
to NULL everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-24 15:37:55 +01:00
Anthony Liguori
177f7fc688 Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Peter Lieven (3) and others
# Via Paolo Bonzini
* bonzini/scsi-next:
  scsi: Drop useless null test in scsi_unit_attention()
  lsi: use qbus_reset_all to reset SCSI bus
  scsi: fix segfault with 0-byte disk
  iscsi: add support for iSCSI NOPs [v2]
  iscsi: partly avoid iovec linearization in iscsi_aio_writev
  iscsi: add iscsi_create support
2013-01-23 09:08:54 -06:00
Peter Lieven
5b5d34ec98 iscsi: add support for iSCSI NOPs [v2]
This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target.
If a consecutive number of NOP-In replies fail a reconnect is initiated.
iSCSI NOPs help to ensure that the connection to the target is still operational.
This should not, but in reality may be the case even if the TCP connection is still
alive if there are bugs in either the target or the initiator implementation.

v2:
 - track the NOPs inside libiscsi so libiscsi can reset the counter
   in case it initiates a reconnect.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Peter Lieven
4cc841b57c iscsi: partly avoid iovec linearization in iscsi_aio_writev
libiscsi expects all write16 data in a linear buffer. If the
iovec only contains one buffer we can skip the linearization
step as well as the additional malloc/free and pass the
buffer directly.

Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Peter Lieven
de8864e5ae iscsi: add iscsi_create support
This patch adds support for bdrv_create. This allows e.g.
to use qemu-img to convert from any supported device to
an iscsi backed storage as destination.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Anthony Liguori
8b17ed4caa Merge remote-tracking branch 'stefanha/block' into staging
# By Kevin Wolf (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
  dataplane: support viostor virtio-pci status bit setting
  dataplane: avoid reentrancy during virtio_blk_data_plane_stop()
  win32-aio: use iov utility functions instead of open-coding them
  win32-aio: Fix memory leak
  win32-aio: Fix vectored reads
  aio: Fix return value of aio_poll()
  ide: Remove wrong assertion
  block: fix null-pointer bug on error case in block commit
2013-01-20 11:01:10 -06:00
Andreas Färber
c36dd8a09f block/raw-posix: Make hdev_aio_discard() available outside Linux
Fixes the build on OpenBSD among others.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 14:35:02 +00:00
Michael Tokarev
3249dbe661 win32-aio: use iov utility functions instead of open-coding them
We have iov_from_buf() and iov_to_buf(), use them instead of
open-coding these in block/win32-aio.c

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-18 09:57:51 +01:00
Kevin Wolf
e8bccad5ac win32-aio: Fix memory leak
The buffer is allocated for both reads and writes, and obviously it
should be freed even if an error occurs.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:58:09 +01:00
Kevin Wolf
bcbbd234d4 win32-aio: Fix vectored reads
Copying data in the right direction really helps a lot!

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:57:13 +01:00
Jeff Cody
6d759117d3 block: fix null-pointer bug on error case in block commit
This is a bug that was caught by a coverity run by Markus.  In
the error case when we errored out to exit_restore_open early in the
function, 'overlay_bs' was still NULL at that point, although it is
used to look up flags and perform a bdrv_reopen().

Move the overlay_bs lookup to where it is needed, and check for NULL
before restoring the flags.  Also get rid of the unneeded parameter
initialization.

Reported-By: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:51:11 +01:00
Markus Armbruster
7191bf311e block: Fix how mirror_run() frees its buffer
It allocates with qemu_blockalign(), therefore it must free with
qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 17:28:55 +01:00
Markus Armbruster
7479acdbce win32-aio: Fix how win32_aio_process_completion() frees buffer
win32_aio_submit() allocates it with qemu_blockalign(), therefore it
must be freed with qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 16:47:45 +01:00
Liu Yuan
f700f8e346 sheepdog: clean up sd_aio_setup()
The last two parameters of sd_aio_setup() are never used, so remove them.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 13:40:10 +01:00
Liu Yuan
4778307278 sheepdog: multiplex the rw FD to flush cache
This will reduce sockfds connected to the sheep server to one, which simply the
future hacks.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 11:18:49 +01:00
Paolo Bonzini
8238010b26 block: make discard asynchronous
This is easy with the thread pool, because we can use s->is_xfs and
s->has_discard from the worker function.

QEMU has a widespread assumption that each I/O operation writes less
than 2^32 bytes.  This patch doesn't fix it throughout of course,
but it starts correcting struct RawPosixAIOData so that there is
no regression with respect to the synchronous discard implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
fcd9d45552 raw: support discard on block devices
Block devices use a ioctl instead of fallocate, so add a separate
implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
c85191e5c9 raw-posix: remember whether discard failed
Avoid sending system calls repeatedly if they shall fail.  This
does not apply to XFS: if the filesystem-specific ioctl fails,
something weird is happening.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kusanagi Kouichi
3d4fa43e64 raw-posix: support discard on more filesystems
Linux 2.6.38 introduced the filesystem independent interface to
deallocate part of a file. As of Linux 3.7, btrfs, ext4, ocfs2,
tmpfs and xfs support it.

Even though the system calls here are in practice issued on Linux,
the code is structured to allow plugging in alternatives for other Unix
variants.  EOPNOTSUPP is used unconditionally in this patch, but it is
supported in both OpenBSD and Mac OS X since forever (see for example
http://lists.debian.org/debian-glibc/2006/02/msg00337.html).

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kevin Wolf
8d2497c355 qcow2: Fix segfault on zero-length write
One of the recent refactoring patches (commit f50f88b9) didn't take care
to initialise l2meta properly, so with zero-length writes, which don't
even enter the write loop, qemu just segfaulted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 09:08:55 +01:00
Anthony Liguori
da758bd7a3 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  dataplane: handle misaligned virtio-blk requests
  dataplane: extract virtio-blk read/write processing into do_rdwr_cmd()
  block: make qiov_is_aligned() public
  raw-posix: fix bdrv_aio_ioctl
  sheepdog: implement direct write semantics
  block: do not probe zero-sized disks

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-01-14 10:26:26 -06:00
Stefan Hajnoczi
c53b1c5114 block: make qiov_is_aligned() public
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements.  This is needed by
virtio-blk-data-plane so:

1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
b608c8dc02 raw-posix: fix bdrv_aio_ioctl
When the raw-posix aio=thread code was moved from posix-aio-compat.c
to block/raw-posix.c, there was an unintended change to the ioctl code.
The code used to return the ioctl command, which posix_aio_read()
would later morph into a zero.  This hack is not necessary anymore,
and in fact breaks scsi-generic (which expects a zero return code).
Remove it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Liu Yuan
0e7106d8b5 sheepdog: implement direct write semantics
Sheepdog supports both writeback/writethrough write but has not yet supported
DIRECTIO semantics which bypass the cache completely even if Sheepdog daemon is
set up with cache enabled.

Suppose cache is enabled on Sheepdog daemon size, the new cache control is

cache=writeback # enable the writeback semantics for write
cache=writethrough # enable the emulated writethrough semantics for write
cache=directsync # disable cache competely

Guest WCE toggling on the run time to toggle writeback/writethrough is also
supported.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
4d4545743f qemu-option: move standard option definitions out of qemu-config.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-12 17:17:53 +01:00
Stefan Weil
eb7ff6fb0b Replace remaining gmtime, localtime by gmtime_r, localtime_r
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.

[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-11 09:44:37 +01:00
Liu Yuan
d6b1ef89a1 sheepdog: pass oid directly to send_pending_req()
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:09:00 +01:00
Liu Yuan
bd751f2204 sheepdog: don't update inode when create_and_write fails
For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap
to avoid the scenario that the object is allocated but wasn't created at the
server side. This will result in VM's IO error on the failed object.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:58 +01:00
Stefan Weil
fccedc624c block/raw-win32: Fix compiler warnings (wrong format specifiers)
Commit fbcad04d6b added fprintf statements
with wrong format specifiers.

GetLastError() returns a DWORD which is unsigned long, so %lu must be used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:57 +01:00
Stefan Hajnoczi
4065742ac0 raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled.  This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 15:31:39 +01:00
Paolo Bonzini
9c17d615a6 softmmu: move include files to include/sysemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:45 +01:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
caf71f86a3 migration: move include files to include/migration/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:32 +01:00
Paolo Bonzini
737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
7b1b5d1913 qapi: move include files to include/qobject/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
f8fe796407 janitor: do not include qemu-char everywhere
Touching char/char.h basically causes the whole of QEMU to
be rebuilt.  Avoid this, it is usually unnecessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:59 +01:00
Paolo Bonzini
077805fa92 janitor: do not rely on indirect inclusions of or from qemu-char.h
Various header files rely on qemu-char.h including qemu-config.h or
main-loop.h, but they really do not need qemu-char.h at all (particularly
interesting is the case of the block layer!).  Clean this up, and also
add missing inclusions of qemu-char.h itself.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:52 +01:00
Paolo Bonzini
525877c999 build: move rules from Makefile to */Makefile.objs
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:06 +01:00
Kevin Wolf
226c3c26b9 qcow2: Factor out handle_dependencies()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
4e95314e2b qcow2: Execute run_dependent_requests() without lock
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will require that in fact the lock is not held.

Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. When changing this, it becomes an one-liner, so just inline it
completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
280d373579 qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
f50f88b9fe qcow2: Allocate l2meta only for cluster allocations
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00