Commit Graph

775 Commits

Author SHA1 Message Date
Markus Armbruster
7191bf311e block: Fix how mirror_run() frees its buffer
It allocates with qemu_blockalign(), therefore it must free with
qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 17:28:55 +01:00
Markus Armbruster
7479acdbce win32-aio: Fix how win32_aio_process_completion() frees buffer
win32_aio_submit() allocates it with qemu_blockalign(), therefore it
must be freed with qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 16:47:45 +01:00
Liu Yuan
f700f8e346 sheepdog: clean up sd_aio_setup()
The last two parameters of sd_aio_setup() are never used, so remove them.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 13:40:10 +01:00
Liu Yuan
4778307278 sheepdog: multiplex the rw FD to flush cache
This will reduce sockfds connected to the sheep server to one, which simply the
future hacks.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 11:18:49 +01:00
Paolo Bonzini
8238010b26 block: make discard asynchronous
This is easy with the thread pool, because we can use s->is_xfs and
s->has_discard from the worker function.

QEMU has a widespread assumption that each I/O operation writes less
than 2^32 bytes.  This patch doesn't fix it throughout of course,
but it starts correcting struct RawPosixAIOData so that there is
no regression with respect to the synchronous discard implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
fcd9d45552 raw: support discard on block devices
Block devices use a ioctl instead of fallocate, so add a separate
implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
c85191e5c9 raw-posix: remember whether discard failed
Avoid sending system calls repeatedly if they shall fail.  This
does not apply to XFS: if the filesystem-specific ioctl fails,
something weird is happening.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kusanagi Kouichi
3d4fa43e64 raw-posix: support discard on more filesystems
Linux 2.6.38 introduced the filesystem independent interface to
deallocate part of a file. As of Linux 3.7, btrfs, ext4, ocfs2,
tmpfs and xfs support it.

Even though the system calls here are in practice issued on Linux,
the code is structured to allow plugging in alternatives for other Unix
variants.  EOPNOTSUPP is used unconditionally in this patch, but it is
supported in both OpenBSD and Mac OS X since forever (see for example
http://lists.debian.org/debian-glibc/2006/02/msg00337.html).

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kevin Wolf
8d2497c355 qcow2: Fix segfault on zero-length write
One of the recent refactoring patches (commit f50f88b9) didn't take care
to initialise l2meta properly, so with zero-length writes, which don't
even enter the write loop, qemu just segfaulted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 09:08:55 +01:00
Anthony Liguori
da758bd7a3 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  dataplane: handle misaligned virtio-blk requests
  dataplane: extract virtio-blk read/write processing into do_rdwr_cmd()
  block: make qiov_is_aligned() public
  raw-posix: fix bdrv_aio_ioctl
  sheepdog: implement direct write semantics
  block: do not probe zero-sized disks

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-01-14 10:26:26 -06:00
Stefan Hajnoczi
c53b1c5114 block: make qiov_is_aligned() public
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements.  This is needed by
virtio-blk-data-plane so:

1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
b608c8dc02 raw-posix: fix bdrv_aio_ioctl
When the raw-posix aio=thread code was moved from posix-aio-compat.c
to block/raw-posix.c, there was an unintended change to the ioctl code.
The code used to return the ioctl command, which posix_aio_read()
would later morph into a zero.  This hack is not necessary anymore,
and in fact breaks scsi-generic (which expects a zero return code).
Remove it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Liu Yuan
0e7106d8b5 sheepdog: implement direct write semantics
Sheepdog supports both writeback/writethrough write but has not yet supported
DIRECTIO semantics which bypass the cache completely even if Sheepdog daemon is
set up with cache enabled.

Suppose cache is enabled on Sheepdog daemon size, the new cache control is

cache=writeback # enable the writeback semantics for write
cache=writethrough # enable the emulated writethrough semantics for write
cache=directsync # disable cache competely

Guest WCE toggling on the run time to toggle writeback/writethrough is also
supported.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
4d4545743f qemu-option: move standard option definitions out of qemu-config.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-12 17:17:53 +01:00
Stefan Weil
eb7ff6fb0b Replace remaining gmtime, localtime by gmtime_r, localtime_r
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.

[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-11 09:44:37 +01:00
Liu Yuan
d6b1ef89a1 sheepdog: pass oid directly to send_pending_req()
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:09:00 +01:00
Liu Yuan
bd751f2204 sheepdog: don't update inode when create_and_write fails
For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap
to avoid the scenario that the object is allocated but wasn't created at the
server side. This will result in VM's IO error on the failed object.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:58 +01:00
Stefan Weil
fccedc624c block/raw-win32: Fix compiler warnings (wrong format specifiers)
Commit fbcad04d6b added fprintf statements
with wrong format specifiers.

GetLastError() returns a DWORD which is unsigned long, so %lu must be used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:57 +01:00
Stefan Hajnoczi
4065742ac0 raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled.  This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 15:31:39 +01:00
Paolo Bonzini
9c17d615a6 softmmu: move include files to include/sysemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:45 +01:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
caf71f86a3 migration: move include files to include/migration/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:32 +01:00
Paolo Bonzini
737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
7b1b5d1913 qapi: move include files to include/qobject/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
f8fe796407 janitor: do not include qemu-char everywhere
Touching char/char.h basically causes the whole of QEMU to
be rebuilt.  Avoid this, it is usually unnecessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:59 +01:00
Paolo Bonzini
077805fa92 janitor: do not rely on indirect inclusions of or from qemu-char.h
Various header files rely on qemu-char.h including qemu-config.h or
main-loop.h, but they really do not need qemu-char.h at all (particularly
interesting is the case of the block layer!).  Clean this up, and also
add missing inclusions of qemu-char.h itself.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:52 +01:00
Paolo Bonzini
525877c999 build: move rules from Makefile to */Makefile.objs
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:06 +01:00
Kevin Wolf
226c3c26b9 qcow2: Factor out handle_dependencies()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
4e95314e2b qcow2: Execute run_dependent_requests() without lock
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will require that in fact the lock is not held.

Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. When changing this, it becomes an one-liner, so just inline it
completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
280d373579 qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
f50f88b9fe qcow2: Allocate l2meta only for cluster allocations
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
060bee8943 qcow2: Drop l2meta.cluster_offset
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
cf5c1a231e qcow2: Allocate l2meta dynamically
As soon as delayed COW is introduced, the l2meta struct is needed even
after completion of the request, so it can't live on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
593fb83cac qcow2: Introduce Qcow2COWRegion
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
1d3afd649b qcow2: Round QCowL2Meta.offset down to cluster boundary
The offset within the cluster is already present as n_start and this is
what the code uses. QCowL2Meta.offset is only needed at a cluster
granularity.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
67a7a0ebe5 qcow2: Move BLKDBG_EVENT out of the lock
We want to use these events to suspend requests for testing concurrent
AIO requests. Suspending requests while they are holding the CoMutex is
rather boring for this purpose.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
3c90c65d7a blkdebug: Implement suspend/resume of AIO requests
This allows more systematic AIO testing. The patch adds three new
operations to blkdebug:

 * Setting a "breakpoint" on a blkdebug event. The next request that
   triggers this breakpoint is suspended and is tagged with a name.
   The breakpoint is removed after a request has triggered it.

 * A suspended request (identified by it's tag) can be resumed

 * It's possible to check whether a suspended request with a given
   tag exists. This can be used for waiting for an event.

Ideally, we would instead tag requests right when they are created and
set breakpoints for individual requests. However, at this point the
block layer doesn't allow this easily, and breakpoints that trigger for
any request already allow a lot of useful testing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
9e35542b0f blkdebug: Factor out remove_rule()
The cleanup work to remove a rule depends on the type of the rule. It's
easy for the existing rules as there is no data that must be cleaned up
and is specific to a type yet, but the next patch will change this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
312a2ba0eb blkdebug: Allow usage without config file
As soon as new rules can be set during runtime, as introduced by the
next patch, blkdebug makes sense even without a config file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Fabien Chouteau
fbcad04d6b Fix error code checking for SetFilePointer() call
An error has occurred if the return value is invalid_set_file_pointer
and getlasterror doesn't return no_error.

Signed-off-by: Fabien Chouteau <chouteau@adacore.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:36:57 +01:00
Stefan Priebe
473c7f0255 rbd: Fix race between aio completition and aio cancel
This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To archieve this it introduces a new status flag which uses
-EINPROGRESS.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:05:11 +01:00
Paolo Bonzini
c208e8c2d8 raw-posix: inline paio_ioctl into hdev_aio_ioctl
clang now warns about an unused function:
  CC    block/raw-posix.o
block/raw-posix.c:707:26: warning: unused function paio_ioctl
[-Wunused-function]
static BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
                         ^
1 warning generated.

because the only use of paio_ioctl() is inside a #if defined(__linux__)
guard and it is static now.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:26 +01:00
Charles Arnold
258d2edbcd block: vpc support for ~2 TB disks
The VHD specification allows for up to a 2 TB disk size. The current
implementation in qemu emulates EIDE and ATA-2 hardware which only allows
for up to 127 GB.  This disk size limitation can be overridden by allowing
up to 255 heads instead of the normal 4 bit limitation of 16.  Doing so
allows disk images to be created of up to nearly 2 TB.  This change does
not violate the VHD format specification nor does it change how smaller
disks (ie, <=127GB) are defined.

[Charles Arnold also writes: "In analyzing a 160 GB VHD fixed disk image
created on Windows 2008 R2, it appears that MS is also ignoring the CHS
values in the footer geometry field in whatever driver they use for
accessing the image.  The CHS values are set at 65535,16,255 which
obviously doesn't represent an image size of 160 GB." -- Stefan]

Signed-off-by: Charles Arnold <carnold@suse.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:26 +01:00
Charles Arnold
1fe1fa510a block: vpc initialize the uuid footer field
Initialize the uuid field in the footer with a generated uuid.

Signed-off-by: Charles Arnold <carnold@suse.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:25 +01:00
Kevin Wolf
c57b6656c3 aio: Get rid of qemu_aio_flush()
There are no remaining users, and new users should probably be
using bdrv_drain_all() in the first place.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Peter Lieven
f807ecd574 iscsi: do not assume device is zero initialized
Without any complex checks we can't assume that an
iscsi target is initialized to zero.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:51:58 +01:00
Peter Lieven
e829b0bb05 iscsi: fix deadlock during login
If the connection is interrupted before the first login is successfully
completed qemu-kvm is waiting forever in qemu_aio_wait().

This is fixed by performing an sync login to the target. If the
connection breaks after the first successful login errors are
handled internally by libiscsi.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:50:56 +01:00
Peter Lieven
8da1e18b0c iscsi: fix segfault in url parsing
If an invalid URL is specified iscsi_get_error(iscsi) is called
with iscsi == NULL.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:46:13 +01:00
Stefan Priebe
08448d5195 use int64_t for return values from rbd instead of int
rbd / rados tends to return pretty often length of writes
or discarded blocks. These values might be bigger than int.

The steps to reproduce are:

  mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends
  a discard. Important is that you use scsi-hd and set
  discard_granularity=512. Otherwise rbd disabled discard support.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-21 09:43:23 +01:00
Stefan Hajnoczi
8ba2aae32c vdi: don't override libuuid symbols
It's poor symbol hygiene to provide a global symbols that collide with a
common library like libuuid.  If QEMU links against a shared library
that depends on uuid_generate() it can end up calling our stub version
of the function.

This exact scenario happened with GlusterFS libgfapi.so, which depends
on libglusterfs.so's uuid_generate().

Scope the uuid stubs for vdi.c only and avoid affecting other shared
objects.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2012-11-21 09:40:29 +01:00