Commit Graph

1076 Commits

Author SHA1 Message Date
Paolo Bonzini
0d09e41a51 hw: move headers to include/
Many of these should be cleaned up with proper qdev-/QOM-ification.
Right now there are many catch-all headers in include/hw/ARCH depending
on cpu.h, and this makes it necessary to compile these files per-target.
However, fixing this does not belong in these patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-04-08 18:13:10 +02:00
Kevin Wolf
c2b6ff51e4 qcow2: Fix L1 write error handling in qcow2_update_snapshot_refcount
It ignored the error code, and at least the 'goto fail' is obvious
nonsense as it creates an endless loop (if the next attempt doesn't
magically succeed) and leaves the in-memory L1 table in big-endian
instead of converting it back.

In error cases, there's no point in writing an updated L1 table, so
skip this part for them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Kevin Wolf
c2bc78b6a9 qcow2: Return real error in qcow2_update_snapshot_refcount
This fixes the error message triggered by the following script:

    cat > /tmp/blkdebug.cfg <<EOF
    [inject-error]
    event = "cluster_free"
    errno = "28"
    immediately = "off"
    EOF

    $qemu_img create -f qcow2 test.qcow2 10G
    $qemu_img snapshot -c snap test.qcow2
    $qemu_img snapshot -d snap blkdebug:/tmp/blkdebug.cfg:test.qcow2

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Stefan Hajnoczi
f9e8cacc55 oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()
The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets.
Rename to qemu_set_nonblock() just like qemu_set_cloexec().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2013-04-02 11:47:37 -04:00
Kevin Wolf
ecdd5333ab qcow2: Gather clusters in a looping loop
Instead of just checking once in exactly this order if there are
dependendies, non-COW clusters and new allocation, this starts looping
around these. This way we can, for example, gather non-COW clusters after
new allocations as long as the host cluster offsets stay contiguous.

Once handle_dependencies() is extended so that COW areas of in-flight
allocations can be overwritten, this allows to continue with gathering
other clusters (we wouldn't be able to do that without this change
because we would have missed a possible second dependency in one of the
next clusters).

This means that in the typical sequential write case, we can combine the
COW overwrite of one cluster with the allocation of the next cluster as
soon as something like Delayed COW gets actually implemented. It is only
by avoiding splitting requests this way that Delayed COW actually starts
improving performance noticably.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
2c3b32d256 qcow2: Move cluster gathering to a non-looping loop
This patch is mainly to separate the indentation change from the
semantic changes. All that really changes here is that everything moves
into a while loop, all 'goto done' become 'break' and at the end of the
loop a new 'break is inserted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
88c6588c51 qcow2: Allow requests with multiple l2metas
Instead of expecting a single l2meta, have a list of them. This allows
to still have a single I/O request for the guest data, even though
multiple l2meta may be needed in order to describe both a COW overwrite
and a new cluster allocation (typical sequential write case).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
710c2496d8 qcow2: Use byte granularity in qcow2_alloc_cluster_offset()
This gets rid of the nb_clusters and keep_clusters and the associated
complicated calculations. Just advance the number of bytes that have
been processed and everything is fine.

This patch advances the variables even after the last operation even
though they aren't used any more afterwards to make things look more
uniform. A later patch will turn the whole thing into a loop and then
it actually starts making sense.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:44 +01:00
Kevin Wolf
411d62b04b qcow2: Prepare handle_alloc/copied() for byte granularity
This makes handle_alloc() and handle_copied() return byte-granularity
host offsets instead of returning always the cluster start. This is
required so that qcow2_alloc_cluster_offset() can stop aligning
everything to cluster boundaries.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
e62daaf679 qcow2: handle_copied(): Implement non-zero host_offset
Look only for clusters that start at a given physical offset.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
c53ede9f6d qcow2: handle_copied(): Get rid of keep_clusters parameter
Now *bytes is used to return the length of the area that can be written
to without performing an allocation or COW.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
acb0467f8d qcow2: handle_copied(): Get rid of nb_clusters parameter
handle_copied() uses its bytes parameter now to determine how many
clusters it should try to find.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
0af729ec00 qcow2: Factor out handle_copied()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
83baa9a471 qcow2: Clean up handle_alloc()
Things can be simplified a bit now. No semantic changes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
c37f4cd71d qcow2: Finalise interface of handle_alloc()
The interface works completely on a byte granularity now and duplicated
parameters are removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
3b8e2e260c qcow2: handle_alloc(): Get rid of keep_clusters parameter
handle_alloc() is now called with the offset at which the actual new
allocation starts instead of the offset at which the whole write request
starts, part of which may already be processed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
f5bc635094 qcow2: handle_alloc(): Get rid of nb_clusters parameter
We already communicate the same information in *bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
10f0ed8b2f qcow2: Factor out handle_alloc()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
037689d896 qcow2: Decouple cluster allocation from cluster reuse code
This moves some code that prepares the allocation of new clusters to
where the actual allocation happens. This is the minimum required to be
able to move it to a separate function in the next patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
65eb2e35c0 qcow2: Change handle_dependency to byte granularity
This is a more precise description of what really constitutes a
dependency. The behaviour doesn't change at this point because the COW
area of the old request is still aligned to cluster boundaries and
therefore an overlap is detected wheneven the requests touch any part of
the same cluster.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
d9d74f4177 qcow2: Improve check for overlapping allocations
The old code detected an overlapping allocation even when the
allocations didn't actually overlap, but were only adjacent.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:43 +01:00
Kevin Wolf
17a71e5823 qcow2: Handle dependencies earlier
Handling overlapping allocations isn't just a detail of cluster
allocation. It is rather one of three ways to get the host cluster
offset for a write request:

1. If a request overlaps an in-flight allocations, the cluster offset
   can be taken from there (this is what handle_dependencies will evolve
   into) or the request must just wait until the allocation has
   completed. Accessing the L2 is not valid in this case, it has
   outdated information.

2. Outside overlapping areas, check the clusters that can be written to
   as they are, with no COW involved.

3. If a COW is required, allocate new clusters

Changing the code to reflect this doesn't change the behaviour because
overlaps cannot exist for clusters that are kept in step 2. It does
however make it easier for later patches to work on clusters that belong
to an allocation that is still in flight.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:42 +01:00
Kevin Wolf
9ee6439e27 qcow2: Remove bogus unlock of s->lock
The unlock wakes up the next coroutine, but the currently running
coroutine will lock it again before it yields, so this doesn't make a
lot of sense.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:42 +01:00
Kevin Wolf
c349ca4bb2 qcow2: Fix "total clusters" number in bdrv_check
This should be based on the virtual disk size, not on the size of the
image.

Interesting observation: With some VM state stored in the image file,
percentages higher than 100% are possible, even though snapshots
themselves are ignored. This is a qcow2 bug to be fixed another day: The
VM state should be discarded in the active L2 tables after completing
the snapshot creation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:52:42 +01:00
Stefan Weil
ea804cadf8 block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build)
The new parameter is unused yet.

This part was missing in commit 787e4a8500.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25 09:53:04 +01:00
Liu Yuan
d43731c758 rbd: fix compile error
Commit 787e4a85 [block: Add options QDict to bdrv_file_open() prototypes] didn't
update rbd.c accordingly.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-25 09:51:43 +01:00
Kevin Wolf
681e7ad024 nbd: Check against invalid option combinations
A file name may only specified if no host or socket path is specified.
The latter two may not appear at the same time either.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
bebbf7fa9c nbd: Use default port if only host is specified
The URL method already takes care to apply the default port when none is
specfied. Directly specifying driver-specific options required the port
number until now. Allow leaving it out and apply the default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
f5866fa438 block: Make find_image_format safe with NULL filename
In order to achieve this, the .bdrv_probe callbacks of all drivers must
cope with this. The DMG driver is the only one that bases its decision
on the filename and it needs to be changed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
6963a30d82 block: Introduce .bdrv_parse_filename callback
If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
f53a1febcd nbd: Accept -drive options for the network connection
The existing parsers for the file name now parse everything into the
bdrv_open() options QDict. Instead of using these parsers, you can now
directly specify the options on the command line, like this:

    qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1

Clearly the file=... part could use further improvement, but it's a
start.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
f17c90bed1 nbd: Keep hostname and port separate
The NBD block supports an URL syntax, for which a URL parser returns
separate hostname and port fields. It also supports the traditional qemu
syntax encoded in a filename. Until now, after parsing the URL to get
each piece of information, a new string is built to be fed to socket
functions.

Instead of building a string in the URL case that is immediately parsed
again, parse the string in both cases and use the QemuOpts interface to
qemu-sockets.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:31 +01:00
Kevin Wolf
787e4a8500 block: Add options QDict to bdrv_file_open() prototypes
The new parameter is unused yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:31 +01:00
Kevin Wolf
acdfb480ba qcow2: Fix segfault in qcow2_invalidate_cache
Need to pass an options QDict to qcow2_open() now. This fixes a segfault
on the migration target with qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19 11:48:36 +01:00
Liu Yuan
fca23f0ad2 sheepdog: show error message for halt status
Sheepdog (neither quorum nor unsafe mode) will refuse to serve IO requests when
number of alive nodes is less than that of copies specified by users. This will
return 0x19 to QEMU client which currently doesn't recognize it.

This patch adds an error description when QEMU client receives it, other than
plainly printing 'Invalid error code'

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19 11:48:36 +01:00
Stefan Hajnoczi
c4d9d19645 threadpool: drop global thread pool
Now that each AioContext has a ThreadPool and the main loop AioContext
can be fetched with bdrv_get_aio_context(), we can eliminate the concept
of a global thread pool from thread-pool.c.

The submit functions must take a ThreadPool* argument.

block/raw-posix.c and block/raw-win32.c use
aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's
ThreadPool.

tests/test-thread-pool.c must be updated to reflect the new
thread_pool_submit() function prototypes.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15 16:07:51 +01:00
MORITA Kazutaka
ed9ba72467 sheepdog: set io_flush handler in do_co_req
If an io_flush handler is not set, qemu_aio_wait doesn't invoke
callbacks.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:50 +01:00
MORITA Kazutaka
0d6db300cd sheepdog: use non-blocking fd in coroutine context
Using a blocking socket in the coroutine context reduces the chance of
switching to other work.  This patch makes the sheepdog driver use a
non-blocking fd always.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:50 +01:00
Paolo Bonzini
381b487d54 qcow2: make is_allocated return true for zero clusters
Otherwise, live migration of the top layer will miss zero clusters and
let the backing file show through.  This also matches what is done in qed.

QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files.  Check this
directly in qcow2_get_cluster_offset instead of replicating the test
everywhere.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
3647917919 qcow2: drop unnecessary flush in qcow2_update_snapshot_refcount()
We already flush when the function completes.  There is no need to flush
after every compressed cluster.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
f9cb2860bd qcow2: drop flush in update_cluster_refcount()
The update_cluster_refcount() function increments/decrements a cluster's
refcount and then returns the new refcount value.

There is no need to flush since both update_cluster_refcount() callers
already take care of this:

1. qcow2_alloc_bytes() calls update_cluster_refcount() when compressed
   sectors will be appended to an existing cluster with enough free
   space.  qcow2_alloc_bytes() already flushes so there is no need to do
   so in update_cluster_refcount().

2. qcow2_update_snapshot_refcount() sets a cache dependency on refcounts
   if it needs to update L2 entries.  It also flushes before completing.

Removing this flush significantly speeds up qcow2 snapshot creation:

  $ qemu-img create -f qcow2 test.qcow2 -o size=50G,preallocation=metadata
  $ time qemu-img snapshot -c new test.qcow2

Time drops from more than 3 minutes to under 1 second.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
2154f24e4e qcow2: flush in qcow2_update_snapshot_refcount()
Users of qcow2_update_snapshot_refcount() do not flush consistently.
qcow2_snapshot_create() flushes but qcow2_snapshot_goto() and
qcow2_snapshot_delete() do not.

Solve this by moving the bdrv_flush() into
qcow2_update_snapshot_refcount().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
c1f5bafd70 qcow2: set L2 cache dependency in qcow2_alloc_bytes()
Compressed writes use qcow2_alloc_bytes() to allocate space with byte
granularity.  The affected clusters' refcounts will be incremented but
we do not need to flush yet.

Set a L2 cache dependency on the refcount block cache, so that the
refcounts get written out before the L2 updates.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
f6977f1556 qcow2: flush refcount cache correctly in qcow2_write_snapshots()
Since qcow2 metadata is cached we need to flush the caches, not just the
underlying file.  Use bdrv_flush(bs) instead of bdrv_flush(bs->file).

Also add the error return path when bdrv_flush() fails and move the
flush after checking for qcow2_alloc_clusters() failure so that the
qcow2_alloc_clusters() error return value takes precedence.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:50 +01:00
Stefan Hajnoczi
9991923b26 qcow2: flush refcount cache correctly in alloc_refcount_block()
update_refcount() affects the refcount cache, it does not write to disk.
Therefore bdrv_flush(bs->file) does nothing.  We need to flush the
refcount cache in order to write out the refcount updates!

While we're here also add error returns when qcow2_cache_flush() fails.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-15 16:07:49 +01:00
Kevin Wolf
74c4510a3c qcow2: Allow lazy refcounts to be enabled on the command line
qcow2 images now accept a boolean lazy_refcounts options. Use it like
this:

  -drive file=test.qcow2,lazy_refcounts=on

If the option is specified on the command line, it overrides the default
specified by the qcow2 header flags that were set when creating the
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Kevin Wolf
de9c0cec6c block: Add options QDict to bdrv_open() prototype
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Kevin Wolf
1a86938f04 block: Add options QDict to .bdrv_open()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Peter Lieven
cb1b83e740 iscsi: add iscsi_truncate support
this patch adds iscsi_truncate which effectively allows for
online resizing of iscsi volumes. for this to work you have
to resize the volume on your storage and then call
block_resize command in qemu which will issue a
readcapacity16 to update the capacity.

v4:
  - factor out complete readcapacity logic into a separate function
  - handle capacity change check condition in readcapacity function
    (this happens if the block_resize cmd is the first iscsi task
    executed after a resize on the storage)

v3:
  - remove switch statement in iscsi_open
  - create separate patch for brdv_drain_all() in bdrv_truncate()

v2:
  - add a general bdrv_drain_all() before bdrv_truncate() to avoid
    in-flight AIOs while the device is truncated
  - since no AIOs are in flight we can use a sync libiscsi call
    to re-read the capacity
  - factor out the readcapacity16 logic as it is redundant
    to iscsi_open() and iscsi_truncate().

Signed-off-by: Peter Lieven <pl@kamp.de>
[allow any type of unit attention check condition in iscsi_readcapacity_sync(),
 as in Message-ID: <51263A2A.6070304@dlhnet.de> - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-05 17:51:51 +01:00
Peter Lieven
1dde716ed6 iscsi: retry read, write, flush and unmap on unit attention check conditions
the storage might return a check condition status for various reasons.
(e.g. bus reset, capacity change, thin-provisioning info etc.)

currently all these informative status responses lead to an I/O error
which is populated to the guest. this patch introduces a retry mechanism
to avoid this.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-05 17:51:50 +01:00
MORITA Kazutaka
1b8bbb46e7 sheepdog: add support for connecting to unix domain socket
This patch adds support for a unix domain socket for a connection
between qemu and local sheepdog server.  You can use the unix domain
socket with the following syntax:

 $ qemu sheepdog+unix:///<vdiname>?socket=<socket path>[#snapid]

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04 09:54:17 +01:00
MORITA Kazutaka
25af257d21 sheepdog: use inet_connect to simplify connect code
This uses the form "<host>:<port>" for the representation of the
sheepdog server to use inet_connect.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04 09:54:17 +01:00
MORITA Kazutaka
5d6768e3b8 sheepdog: accept URIs
The URI syntax is consistent with the NBD and Gluster syntax.  The
syntax is

  sheepdog[+tcp]://[host:port]/vdiname[#snapid|#tag]

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04 09:54:17 +01:00
MORITA Kazutaka
bf1c852aa9 move socket_set_nodelay to osdep.c
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04 09:54:17 +01:00
Stefan Hajnoczi
4db35162ea qcow2: support compressed clusters in BlockFragInfo
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Stefan Hajnoczi
fba31bae2d qcow2: record fragmentation statistics during check
The qemu-img check command can display fragmentation statistics:
 * Total number of clusters in virtual disk
 * Number of allocated clusters
 * Number of fragmented clusters

This patch adds fragmentation statistics support to qcow2.

Compressed and normal clusters count as allocated.  Zero clusters are
not counted as allocated unless their L2 entry has a non-zero offset
(e.g. preallocation).

Only the current L1 table counts towards the statistics - snapshots are
ignored.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Stefan Hajnoczi
801f704452 qcow2: introduce check_refcounts_l1/l2() flags
The check_refcounts_l1/l2() functions have a check_copied argument to
check that the QCOW_O_COPIED flag is consistent with refcount == 1.
This should be a bool, not an int.

However, the next patch introduces qcow2 fragmentation statistics and
also needs to pass an option to check_refcounts_l1/l2().  This is a good
opportunity to use an int flags field.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Federico Simoncelli
c6bb9ad198 qemu-img: find the image end offset during check
This patch adds the support for reporting the image end offset (in
bytes). This is particularly useful after a conversion (or a rebase)
where the destination is a block device in order to find the first
unused byte at the end of the image.

Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-22 21:21:08 +01:00
Stefan Hajnoczi
8a8f584008 block/curl: only restrict protocols with libcurl>=7.19.4
The curl_easy_setopt(state->curl, CURLOPT_PROTOCOLS, ...) interface was
introduced in libcurl 7.19.4.  Therefore we cannot protect against
CVE-2013-0249 when linking against an older libcurl.

This fixes the build failure introduced by
fb6d1bbd24.

Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <andreas.faeber@web.de>
Message-id: 1360743934-8337-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-13 11:57:35 -06:00
Stefan Hajnoczi
33ccf6675f Revert "block/vpc: Fix size calculation"
This reverts commit f880defbb0.

Jeff Cody's testing revealed that the interpretation of size differs
even between VirtualPC and HyperV.  Revert this so there is time to
consider the impact of any backwards incompatible behavior this change
creates.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-12 12:25:15 +01:00
Stefan Hajnoczi
da888d37b0 block/raw-posix: detect readonly Linux block devices using BLKROGET
Linux block devices can be set read-only with "blockdev --setro
<device>".  The same thing can be done for LVM volumes using "lvchange
--permission r <volume>".  This read-only setting is independent of
device node permissions.  Therefore the device can still be opened
O_RDWR but actual writes will fail.

This results in odd behavior for QEMU.  bdrv_open() is supposed to fail
if a read-only image is being opened with BDRV_O_RDWR.  By not failing
for Linux block devices, the guest boots up but every write produces an
I/O error.

This patch checks whether the block device is read-only so that Linux
block devices behave like regular files.

Reported-by: Sibiao Luo <sluo@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-12 12:22:49 +01:00
Stefan Weil
f880defbb0 block/vpc: Fix size calculation
The size calculated from the CHS values is not the real image (disk) size,
but usually a smaller value. This is caused by rounding effects.

Only older operating systems use CHS. Such guests won't be able to use
the whole disk. All modern operating systems use the real size.

This patch fixes https://bugs.launchpad.net/qemu/+bug/1105670/.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1360265212-22037-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-11 08:14:41 -06:00
Markus Armbruster
312fd5f290 error: Strip trailing '\n' from error string arguments (again)
Commit 6daf194d and be62a2eb got rid of a bunch, but they keep coming
back.  Tracked down with this Coccinelle semantic patch:

    @r@
	expression err, eno, cls, fmt;
	position p;
    @@
    (
	error_report(fmt, ...)@p
    |
	error_set(err, cls, fmt, ...)@p
    |
	error_set_errno(err, eno, cls, fmt, ...)@p
    |
	error_setg(err, fmt, ...)@p
    |
	error_setg_errno(err, eno, fmt, ...)@p
    )
    @script:python@
	fmt << r.fmt;
	p << r.p;
    @@
    if "\\n" in str(fmt):
	print "%s:%s:%s:%s" % (p[0].file, p[0].line, p[0].column, fmt)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1360354939-10994-4-git-send-email-armbru@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-11 08:13:19 -06:00
Stefan Hajnoczi
fb6d1bbd24 block/curl: disable extra protocols to prevent CVE-2013-0249
There is a buffer overflow in libcurl POP3/SMTP/IMAP.  The workaround is
simple: disable extra protocols so that they cannot be exploited.  Full
details here:

  http://curl.haxx.se/docs/adv_20130206.html

QEMU only cares about HTTP, HTTPS, FTP, FTPS, and TFTP.  I have tested
that this fix prevents the exploit on my host with
libcurl-7.27.0-5.fc18.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-08 11:14:20 -06:00
Andreas Färber
fdf263f63f block/raw-posix: Build fix for O_ASYNC
Commit eeb6b45d48 (block: raw-posix image
file reopen) broke the build on OpenIndiana.

illumos has no O_ASYNC. Exclude it from flags to be compared
and instead assert that it is not set where defined.

Cf. e61ab1da7e for qemu-ga.

Cc: qemu-stable@nongnu.org (1.3.x)
Cc: Jeff Cody <jcody@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 15:11:12 +01:00
Philipp Hahn
cd92347575 vmdk: Allow space in file name
The previous scanf() format string stopped parsing the file name on the
first white white space, which seems to be allowed at least by VMware
Workstation.

Change the format string to collect everything between the first and
second quote as the file name, disallowing line breaks.

Signed-off-by: Philipp Hahn <hahn@univention.de>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:29 +01:00
Kevin Wolf
46536235d8 parallels: Fix bdrv_open() error handling
Return -errno instead of -1 on errors. Hey, no memory leak to fix here
while we're touching it!

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:29 +01:00
Kevin Wolf
4f8aa2e19f dmg: Use g_free instead of free
The buffers are allocated with g_(re)alloc, so use g_free to free them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:29 +01:00
Kevin Wolf
69d34a360d dmg: Fix bdrv_open() error handling
Return -errno instead of -1 on errors and add error checks in some
places that didn't have one. Passing things by reference requires more
correct typing, replaced a few off_ts therefore - with a 32-bit off_t
this is even a fix for truncation bugs.

While touching the code, fix even some more memory leaks than in the
other drivers...

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:29 +01:00
Kevin Wolf
59294e4659 vpc: Fix bdrv_open() error handling
Return -errno instead of -1 on errors. While touching the
code, fix a memory leak.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Kevin Wolf
1a60657f57 cloop: Fix bdrv_open() error handling
Return -errno instead of -1 on errors. While touching the
code, fix a memory leak.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Kevin Wolf
5b7d7dfd19 bochs: Fix bdrv_open() error handling
Return -errno instead of -1 on errors. While touching the
code, fix a memory leak.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Liu Yuan
6f74c260b4 sheepdog: pass vdi_id to sheep daemon for sd_close()
Sheep daemon needs vdi_id to identify which vdi is closed to release resources
such as object cache.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Othmar Pasteka
7f2039f611 vmdk: Allow selecting SCSI adapter in image creation
Introduce a new option "adapter_type" when converting to vmdk images.
It can be one of the following: ide (default), buslogic, lsilogic
or legacyESX (according to the vmdk spec from vmware).

In case of a non-ide adapter, heads is set to 255 instead of the 16.
The latter is used for "ide".

Also see LP#545089

Signed-off-by: Othmar Pasteka <pasteka@kabsi.at>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Markus Armbruster
6528499fa4 g_malloc(0) and g_malloc0(0) return NULL; simplify
Once upon a time, it was decided that qemu_malloc(0) should abort.
Switching to glib retired that bright idea.  Some code that was added
to cope with it (e.g. in commits 702ef63, b76b6e9) is still around.
Bury it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-30 11:14:46 +01:00
Paolo Bonzini
88ff0e48ee mirror: do nothing on zero-sized disk
On a zero-sized disk we need to break out of the job successfully
before bdrv_dirty_iter_init is called, otherwise you will get an
assertion failure with the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Stefan Weil
0e87ba2ccb block/vdi: Check for bad signature
vdi_open did not check for a bad signature.
This check was only in vdi_probe.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Stefan Weil
8937f8222c block/vdi: Improved return values from vdi_open
vdi_open returned -1 in case of any error, but it should return an
error code (negative value of errno or -EMEDIUMTYPE).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Stefan Weil
9f0470bb2d block/vdi: Improve debug output for signature
The signature is a 32 bit value and needs up to 8 hex digits for printing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Stefan Weil
15bac0d54f block: Use error code EMEDIUMTYPE for wrong format in some block drivers
This improves error reports for bochs, cow, qcow, qcow2, qed and vmdk
when a file with the wrong format is selected.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Paolo Bonzini
884fea4e87 mirror: support arbitrarily-sized iterations
Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks.  This limits the number of I/O operations and makes
mirroring efficient even with a small granularity.  Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Paolo Bonzini
402a47411b mirror: support more than one in-flight AIO operation
With AIO support in place, we can start copying more than one chunk
in parallel.  This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.

Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.

In addition, two different iterations on the HBitmap may want to
copy the same cluster.  We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:35 +01:00
Paolo Bonzini
08e4ed6cde mirror: add buf-size argument to drive-mirror
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
bd48bde8f0 mirror: switch mirror_iteration to AIO
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target.  However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
eee13dfe30 mirror: allow customizing the granularity
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.

Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
50717e941b block: allow customizing the granularity of the dirty bitmap
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
acc906c6c5 block: return count of dirty sectors, not chunks
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
b812f6719c mirror: perform COW if the cluster size is bigger than the granularity
When mirroring runs, the backing files for the target may not yet be
ready.  However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros.  Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying).  So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.

However, we want to lower the granularity for efficiency, so we need
a better solution.  The solution is to always copy a whole cluster the
first time it is touched.  The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
8f0720ecbc block: implement dirty bitmap using HBitmap
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Peter Lieven
7371d56fb2 iscsi: add support for iovectors
This patch adds support for directly passing the iovec
array from QEMUIOVector if libiscsi supports it (1.8.0
or newer).

Signed-off-by: Peter Lieven <pl@kamp.de>
[Preserve the improvements from commit 4cc841b, iscsi: partly
 avoid iovec linearization in iscsi_aio_writev, 2012-11-19 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-24 15:37:55 +01:00
Paolo Bonzini
4790b03d30 iscsi: do not leak acb->buf when commands are aborted
acb->buf is freed in the WRITE(16) callback, but this may not
get called at all when commands are aborted.  Add another
free in the ABORT TASK callback, which requires setting acb->buf
to NULL everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-24 15:37:55 +01:00
Anthony Liguori
177f7fc688 Merge remote-tracking branch 'bonzini/scsi-next' into staging
# By Peter Lieven (3) and others
# Via Paolo Bonzini
* bonzini/scsi-next:
  scsi: Drop useless null test in scsi_unit_attention()
  lsi: use qbus_reset_all to reset SCSI bus
  scsi: fix segfault with 0-byte disk
  iscsi: add support for iSCSI NOPs [v2]
  iscsi: partly avoid iovec linearization in iscsi_aio_writev
  iscsi: add iscsi_create support
2013-01-23 09:08:54 -06:00
Peter Lieven
5b5d34ec98 iscsi: add support for iSCSI NOPs [v2]
This patch will send NOP-Out PDUs every 5 seconds to the iSCSI target.
If a consecutive number of NOP-In replies fail a reconnect is initiated.
iSCSI NOPs help to ensure that the connection to the target is still operational.
This should not, but in reality may be the case even if the TCP connection is still
alive if there are bugs in either the target or the initiator implementation.

v2:
 - track the NOPs inside libiscsi so libiscsi can reset the counter
   in case it initiates a reconnect.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Peter Lieven
4cc841b57c iscsi: partly avoid iovec linearization in iscsi_aio_writev
libiscsi expects all write16 data in a linear buffer. If the
iovec only contains one buffer we can skip the linearization
step as well as the additional malloc/free and pass the
buffer directly.

Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Peter Lieven
de8864e5ae iscsi: add iscsi_create support
This patch adds support for bdrv_create. This allows e.g.
to use qemu-img to convert from any supported device to
an iscsi backed storage as destination.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-22 15:07:03 +01:00
Anthony Liguori
8b17ed4caa Merge remote-tracking branch 'stefanha/block' into staging
# By Kevin Wolf (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
  dataplane: support viostor virtio-pci status bit setting
  dataplane: avoid reentrancy during virtio_blk_data_plane_stop()
  win32-aio: use iov utility functions instead of open-coding them
  win32-aio: Fix memory leak
  win32-aio: Fix vectored reads
  aio: Fix return value of aio_poll()
  ide: Remove wrong assertion
  block: fix null-pointer bug on error case in block commit
2013-01-20 11:01:10 -06:00
Andreas Färber
c36dd8a09f block/raw-posix: Make hdev_aio_discard() available outside Linux
Fixes the build on OpenBSD among others.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 14:35:02 +00:00
Michael Tokarev
3249dbe661 win32-aio: use iov utility functions instead of open-coding them
We have iov_from_buf() and iov_to_buf(), use them instead of
open-coding these in block/win32-aio.c

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-18 09:57:51 +01:00
Kevin Wolf
e8bccad5ac win32-aio: Fix memory leak
The buffer is allocated for both reads and writes, and obviously it
should be freed even if an error occurs.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:58:09 +01:00
Kevin Wolf
bcbbd234d4 win32-aio: Fix vectored reads
Copying data in the right direction really helps a lot!

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:57:13 +01:00
Jeff Cody
6d759117d3 block: fix null-pointer bug on error case in block commit
This is a bug that was caught by a coverity run by Markus.  In
the error case when we errored out to exit_restore_open early in the
function, 'overlay_bs' was still NULL at that point, although it is
used to look up flags and perform a bdrv_reopen().

Move the overlay_bs lookup to where it is needed, and check for NULL
before restoring the flags.  Also get rid of the unneeded parameter
initialization.

Reported-By: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-17 10:51:11 +01:00
Markus Armbruster
7191bf311e block: Fix how mirror_run() frees its buffer
It allocates with qemu_blockalign(), therefore it must free with
qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 17:28:55 +01:00
Markus Armbruster
7479acdbce win32-aio: Fix how win32_aio_process_completion() frees buffer
win32_aio_submit() allocates it with qemu_blockalign(), therefore it
must be freed with qemu_vfree(), not g_free().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 16:47:45 +01:00
Liu Yuan
f700f8e346 sheepdog: clean up sd_aio_setup()
The last two parameters of sd_aio_setup() are never used, so remove them.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 13:40:10 +01:00
Liu Yuan
4778307278 sheepdog: multiplex the rw FD to flush cache
This will reduce sockfds connected to the sheep server to one, which simply the
future hacks.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 11:18:49 +01:00
Paolo Bonzini
8238010b26 block: make discard asynchronous
This is easy with the thread pool, because we can use s->is_xfs and
s->has_discard from the worker function.

QEMU has a widespread assumption that each I/O operation writes less
than 2^32 bytes.  This patch doesn't fix it throughout of course,
but it starts correcting struct RawPosixAIOData so that there is
no regression with respect to the synchronous discard implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
fcd9d45552 raw: support discard on block devices
Block devices use a ioctl instead of fallocate, so add a separate
implementation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Paolo Bonzini
c85191e5c9 raw-posix: remember whether discard failed
Avoid sending system calls repeatedly if they shall fail.  This
does not apply to XFS: if the filesystem-specific ioctl fails,
something weird is happening.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kusanagi Kouichi
3d4fa43e64 raw-posix: support discard on more filesystems
Linux 2.6.38 introduced the filesystem independent interface to
deallocate part of a file. As of Linux 3.7, btrfs, ext4, ocfs2,
tmpfs and xfs support it.

Even though the system calls here are in practice issued on Linux,
the code is structured to allow plugging in alternatives for other Unix
variants.  EOPNOTSUPP is used unconditionally in this patch, but it is
supported in both OpenBSD and Mac OS X since forever (see for example
http://lists.debian.org/debian-glibc/2006/02/msg00337.html).

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:47 +01:00
Kevin Wolf
8d2497c355 qcow2: Fix segfault on zero-length write
One of the recent refactoring patches (commit f50f88b9) didn't take care
to initialise l2meta properly, so with zero-length writes, which don't
even enter the write loop, qemu just segfaulted.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 09:08:55 +01:00
Anthony Liguori
da758bd7a3 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  dataplane: handle misaligned virtio-blk requests
  dataplane: extract virtio-blk read/write processing into do_rdwr_cmd()
  block: make qiov_is_aligned() public
  raw-posix: fix bdrv_aio_ioctl
  sheepdog: implement direct write semantics
  block: do not probe zero-sized disks

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-01-14 10:26:26 -06:00
Stefan Hajnoczi
c53b1c5114 block: make qiov_is_aligned() public
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements.  This is needed by
virtio-blk-data-plane so:

1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
b608c8dc02 raw-posix: fix bdrv_aio_ioctl
When the raw-posix aio=thread code was moved from posix-aio-compat.c
to block/raw-posix.c, there was an unintended change to the ioctl code.
The code used to return the ioctl command, which posix_aio_read()
would later morph into a zero.  This hack is not necessary anymore,
and in fact breaks scsi-generic (which expects a zero return code).
Remove it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Liu Yuan
0e7106d8b5 sheepdog: implement direct write semantics
Sheepdog supports both writeback/writethrough write but has not yet supported
DIRECTIO semantics which bypass the cache completely even if Sheepdog daemon is
set up with cache enabled.

Suppose cache is enabled on Sheepdog daemon size, the new cache control is

cache=writeback # enable the writeback semantics for write
cache=writethrough # enable the emulated writethrough semantics for write
cache=directsync # disable cache competely

Guest WCE toggling on the run time to toggle writeback/writethrough is also
supported.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
4d4545743f qemu-option: move standard option definitions out of qemu-config.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-12 17:17:53 +01:00
Stefan Weil
eb7ff6fb0b Replace remaining gmtime, localtime by gmtime_r, localtime_r
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.

[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-11 09:44:37 +01:00
Liu Yuan
d6b1ef89a1 sheepdog: pass oid directly to send_pending_req()
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:09:00 +01:00
Liu Yuan
bd751f2204 sheepdog: don't update inode when create_and_write fails
For the error case such as SD_RES_NO_SPACE, we shouldn't update the inode bitmap
to avoid the scenario that the object is allocated but wasn't created at the
server side. This will result in VM's IO error on the failed object.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:58 +01:00
Stefan Weil
fccedc624c block/raw-win32: Fix compiler warnings (wrong format specifiers)
Commit fbcad04d6b added fprintf statements
with wrong format specifiers.

GetLastError() returns a DWORD which is unsigned long, so %lu must be used.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 16:08:57 +01:00
Stefan Hajnoczi
4065742ac0 raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled.  This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-02 15:31:39 +01:00
Paolo Bonzini
9c17d615a6 softmmu: move include files to include/sysemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:45 +01:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
caf71f86a3 migration: move include files to include/migration/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:32 +01:00
Paolo Bonzini
737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
7b1b5d1913 qapi: move include files to include/qobject/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
f8fe796407 janitor: do not include qemu-char everywhere
Touching char/char.h basically causes the whole of QEMU to
be rebuilt.  Avoid this, it is usually unnecessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:59 +01:00
Paolo Bonzini
077805fa92 janitor: do not rely on indirect inclusions of or from qemu-char.h
Various header files rely on qemu-char.h including qemu-config.h or
main-loop.h, but they really do not need qemu-char.h at all (particularly
interesting is the case of the block layer!).  Clean this up, and also
add missing inclusions of qemu-char.h itself.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:52 +01:00
Paolo Bonzini
525877c999 build: move rules from Makefile to */Makefile.objs
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:29:06 +01:00
Kevin Wolf
226c3c26b9 qcow2: Factor out handle_dependencies()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
4e95314e2b qcow2: Execute run_dependent_requests() without lock
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will require that in fact the lock is not held.

Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. When changing this, it becomes an one-liner, so just inline it
completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
280d373579 qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
f50f88b9fe qcow2: Allocate l2meta only for cluster allocations
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
060bee8943 qcow2: Drop l2meta.cluster_offset
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
cf5c1a231e qcow2: Allocate l2meta dynamically
As soon as delayed COW is introduced, the l2meta struct is needed even
after completion of the request, so it can't live on the stack.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
593fb83cac qcow2: Introduce Qcow2COWRegion
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
1d3afd649b qcow2: Round QCowL2Meta.offset down to cluster boundary
The offset within the cluster is already present as n_start and this is
what the code uses. QCowL2Meta.offset is only needed at a cluster
granularity.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-13 15:37:59 +01:00
Kevin Wolf
67a7a0ebe5 qcow2: Move BLKDBG_EVENT out of the lock
We want to use these events to suspend requests for testing concurrent
AIO requests. Suspending requests while they are holding the CoMutex is
rather boring for this purpose.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
3c90c65d7a blkdebug: Implement suspend/resume of AIO requests
This allows more systematic AIO testing. The patch adds three new
operations to blkdebug:

 * Setting a "breakpoint" on a blkdebug event. The next request that
   triggers this breakpoint is suspended and is tagged with a name.
   The breakpoint is removed after a request has triggered it.

 * A suspended request (identified by it's tag) can be resumed

 * It's possible to check whether a suspended request with a given
   tag exists. This can be used for waiting for an event.

Ideally, we would instead tag requests right when they are created and
set breakpoints for individual requests. However, at this point the
block layer doesn't allow this easily, and breakpoints that trigger for
any request already allow a lot of useful testing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
9e35542b0f blkdebug: Factor out remove_rule()
The cleanup work to remove a rule depends on the type of the rule. It's
easy for the existing rules as there is no data that must be cleaned up
and is specific to a type yet, but the next patch will change this.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Kevin Wolf
312a2ba0eb blkdebug: Allow usage without config file
As soon as new rules can be set during runtime, as introduced by the
next patch, blkdebug makes sense even without a config file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Fabien Chouteau
fbcad04d6b Fix error code checking for SetFilePointer() call
An error has occurred if the return value is invalid_set_file_pointer
and getlasterror doesn't return no_error.

Signed-off-by: Fabien Chouteau <chouteau@adacore.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:36:57 +01:00
Stefan Priebe
473c7f0255 rbd: Fix race between aio completition and aio cancel
This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To archieve this it introduces a new status flag which uses
-EINPROGRESS.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:05:11 +01:00
Paolo Bonzini
c208e8c2d8 raw-posix: inline paio_ioctl into hdev_aio_ioctl
clang now warns about an unused function:
  CC    block/raw-posix.o
block/raw-posix.c:707:26: warning: unused function paio_ioctl
[-Wunused-function]
static BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
                         ^
1 warning generated.

because the only use of paio_ioctl() is inside a #if defined(__linux__)
guard and it is static now.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:26 +01:00
Charles Arnold
258d2edbcd block: vpc support for ~2 TB disks
The VHD specification allows for up to a 2 TB disk size. The current
implementation in qemu emulates EIDE and ATA-2 hardware which only allows
for up to 127 GB.  This disk size limitation can be overridden by allowing
up to 255 heads instead of the normal 4 bit limitation of 16.  Doing so
allows disk images to be created of up to nearly 2 TB.  This change does
not violate the VHD format specification nor does it change how smaller
disks (ie, <=127GB) are defined.

[Charles Arnold also writes: "In analyzing a 160 GB VHD fixed disk image
created on Windows 2008 R2, it appears that MS is also ignoring the CHS
values in the footer geometry field in whatever driver they use for
accessing the image.  The CHS values are set at 65535,16,255 which
obviously doesn't represent an image size of 160 GB." -- Stefan]

Signed-off-by: Charles Arnold <carnold@suse.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:26 +01:00
Charles Arnold
1fe1fa510a block: vpc initialize the uuid footer field
Initialize the uuid field in the footer with a generated uuid.

Signed-off-by: Charles Arnold <carnold@suse.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-11 11:04:25 +01:00
Kevin Wolf
c57b6656c3 aio: Get rid of qemu_aio_flush()
There are no remaining users, and new users should probably be
using bdrv_drain_all() in the first place.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Peter Lieven
f807ecd574 iscsi: do not assume device is zero initialized
Without any complex checks we can't assume that an
iscsi target is initialized to zero.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:51:58 +01:00
Peter Lieven
e829b0bb05 iscsi: fix deadlock during login
If the connection is interrupted before the first login is successfully
completed qemu-kvm is waiting forever in qemu_aio_wait().

This is fixed by performing an sync login to the target. If the
connection breaks after the first successful login errors are
handled internally by libiscsi.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:50:56 +01:00
Peter Lieven
8da1e18b0c iscsi: fix segfault in url parsing
If an invalid URL is specified iscsi_get_error(iscsi) is called
with iscsi == NULL.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-28 12:46:13 +01:00
Stefan Priebe
08448d5195 use int64_t for return values from rbd instead of int
rbd / rados tends to return pretty often length of writes
or discarded blocks. These values might be bigger than int.

The steps to reproduce are:

  mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends
  a discard. Important is that you use scsi-hd and set
  discard_granularity=512. Otherwise rbd disabled discard support.

Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-21 09:43:23 +01:00
Stefan Hajnoczi
8ba2aae32c vdi: don't override libuuid symbols
It's poor symbol hygiene to provide a global symbols that collide with a
common library like libuuid.  If QEMU links against a shared library
that depends on uuid_generate() it can end up calling our stub version
of the function.

This exact scenario happened with GlusterFS libgfapi.so, which depends
on libglusterfs.so's uuid_generate().

Scope the uuid stubs for vdi.c only and avoid affecting other shared
objects.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2012-11-21 09:40:29 +01:00
Jeff Cody
1bc6b705ee block: add bdrv_reopen() support for raw hdev, floppy, and cdrom
For hdev, floppy, and cdrom, the reopen() handlers are the same as
for the file reopen handler.  For floppy and cdrom types, however,
we keep O_NONBLOCK, as in the _open function.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-11-21 09:40:29 +01:00
Gerhard Wiesinger
b1649fae49 vmdk: Fix data corruption bug in WRITE and READ handling
Fixed a MAJOR BUG in VMDK files on file boundaries on reads
and ALSO ON WRITES WHICH MIGHT CORRUPT THE IMAGE AND DATA!!!!!!

Triggered for example with the following VMDK file (partly listed):
RW 4193792 FLAT "XP-W1-f001.vmdk" 0
RW 2097664 FLAT "XP-W1-f002.vmdk" 0
RW 4193792 FLAT "XP-W1-f003.vmdk" 0
RW 512 FLAT "XP-W1-f004.vmdk" 0
RW 4193792 FLAT "XP-W1-f005.vmdk" 0
RW 2097664 FLAT "XP-W1-f006.vmdk" 0
RW 4193792 FLAT "XP-W1-f007.vmdk" 0
RW 512 FLAT "XP-W1-f008.vmdk" 0

Patch includes:
1.) Patch fixes wrong calculation on extent boundaries. Especially it
fixes the relativeness of the sector number to the current extent.

Verfied correctness with:
1.) Converted either with Virtualbox to VDI and then with qemu-img and
    then with qemu-img only:

    VBoxManage clonehd --format vdi /VM/XP-W/new/XP-W1.vmdk ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi
    ./qemu-img convert -O raw ~/.VirtualBox/Harddisks/XP-W1-new-test.vdi /root/QEMU/VM-XP-W1/XP-W1-via-VBOX.img
    md5sum /root/QEMU/VM-XP-W/XP-W1-direct.img
    md5sum /root/QEMU/VM-XP-W/XP-W1-via-VBOX.img
    => same MD5 hash

2.) Verified debug log files
3.) Run Windows XP successfully
4.) chkdsk run successfully without any errors

Signed-off-by: Gerhard Wiesinger <lists@wiesinger.com>
Acked-by: Fam Zheng <famcool@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14 18:19:23 +01:00
Stefan Hajnoczi
d7331bed11 aio: rename AIOPool to AIOCBInfo
Now that AIOPool no longer keeps a freelist, it isn't really a "pool"
anymore.  Rename it to AIOCBInfo and make it const since it no longer
needs to be modified.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14 18:19:21 +01:00
Stefan Weil
cee40d2d2d block: Workaround for older versions of MinGW gcc
Versions before gcc-4.6 don't support unnamed fields in initializers
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676).

Offset and OffsetHigh belong to an unnamed struct which is part of an
unnamed union. Therefore the original code does not work with older
versions of gcc.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14 18:19:21 +01:00
Kevin Wolf
a354807706 qcow2: Fix refcount table size calculation
A missing factor for the refcount table entry size in the calculation
could mean that too little memory was allocated for the in-memory
representation of the table, resulting in a buffer overflow.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
2012-11-14 18:19:21 +01:00
Paolo Bonzini
1d7d2a9d21 nbd: accept URIs
The URI syntax is consistent with the Gluster syntax.  Export names
are specified in the path, preceded by one or more (otherwise unused)
slashes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-12 14:38:28 +01:00
Paolo Bonzini
d04b0bbbc9 nbd: accept relative path to Unix socket
Adding the "is_unix" member now will simplify the parsing of NBD URIs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-12 11:33:29 +01:00
Paolo Bonzini
f563a5d7a8 Merge remote-tracking branch 'origin/master' into threadpool
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:42:51 +01:00
Paolo Bonzini
a27365265c raw-win32: implement native asynchronous I/O
With the new support for EventNotifiers in the AIO event loop, we
can hook a completion port to every opened file and use asynchronous
I/O on them.

Wine's support is extremely inefficient, also because it really does
the I/O synchronously on regular files. (!)  But it works, and it is
good to keep the Win32 and POSIX ports as similar as possible.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:13 +01:00
Paolo Bonzini
10fb6e0682 raw-posix: move linux-aio.c to block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:13 +01:00
Paolo Bonzini
fc4edb84bf raw-win32: add emulated AIO support
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:13 +01:00
Paolo Bonzini
9f8540ecef raw-posix: rename raw-posix-aio.h, hide unavailable prototypes
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:12 +01:00
Paolo Bonzini
de81a16936 raw: merge posix-aio-compat.c into block/raw-posix.c
Making the qemu_paiocb specific to raw devices will let us access members
of the BDRVRawState arbitrarily.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:12 +01:00
Paolo Bonzini
47e6b251a5 block: switch posix-aio-compat to threadpool
This is not meant for portability, but to remove code duplication.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-31 10:38:12 +01:00
Paolo Bonzini
f42b22077b aio: add Win32 implementation
The Win32 implementation will only accept EventNotifiers, thus a few
drivers are disabled under Windows.  EventNotifiers are a good match
for the GSource implementation, too, because the Win32 port of glib
allows to place their HANDLEs in a GPollFD.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-30 09:30:53 +01:00
Paolo Bonzini
b952b5589a mirror: add support for on-source-error/on-target-error
Error management is important for mirroring; otherwise, an error on the
target (even something as "innocent" as ENOSPC) requires to start again
with a full copy.  Similar to on_read_error/on_write_error, two separate
knobs are provided for on_source_error (reads) and on_target_error (writes).
The default is 'report' for both.

The 'ignore' policy will leave the sector dirty, so that it will be
retried later.  Thus, it will not cause corruption.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:22 +02:00
Paolo Bonzini
d63ffd87ac mirror: implement completion
Switching to the target of the migration is done mostly asynchronously,
and reported to management via the BLOCK_JOB_COMPLETED event; the only
synchronous phase is opening the backing files.  bdrv_open_backing_file
can always be done, even for migration of the full image (aka sync:
'full').  In this case, qmp_drive_mirror will create the target disk
with no backing file at all, and bdrv_open_backing_file will be a no-op.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:22 +02:00
Paolo Bonzini
893f7ebafe mirror: introduce mirror job
This patch adds the implementation of a new job that mirrors a disk to
a new image while letting the guest continue using the old image.
The target is treated as a "black box" and data is copied from the
source to the target in the background.  This can be used for several
purposes, including storage migration, continuous replication, and
observation of the guest I/O in an external program.  It is also a
first step in replacing the inefficient block migration code that is
part of QEMU.

The job is possibly never-ending, but it is logically structured into
two phases: 1) copy all data as fast as possible until the target
first gets in sync with the source; 2) keep target in sync and
ensure that reopening to the target gets a correct (full) copy
of the source data.

The second phase is indicated by the progress in "info block-jobs"
reporting the current offset to be equal to the length of the file.
When the job is cancelled in the second phase, QEMU will run the
job until the source is clean and quiescent, then it will report
successful completion of the job.

In other words, the BLOCK_JOB_CANCELLED event means that the target
may _not_ be consistent with a past state of the source; the
BLOCK_JOB_COMPLETED event means that the target is consistent with
a past state of the source.  (Note that it could already happen
that management lost the race against QEMU and got a completion
event instead of cancellation).

It is not yet possible to complete the job and switch over to the target
disk.  The next patches will fix this and add many refinements to the
basic idea introduced here.  These include improved error management,
some tunable knobs and performance optimizations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Paolo Bonzini
65f4632243 block: rename block_job_complete to block_job_completed
The imperative will be used for the QMP command.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Jeff Cody
d5208c45be block: in commit, determine base image from the top image
This simplifies some code and error checking, and also fixes a bug.

bdrv_find_backing_image() should only be passed absolute filenames,
or filenames relative to the chain.  In the QMP message handler for
block commit, when looking up the base do so from the determined top
image, so we know it is reachable from top.

Some of the error messages put out by block-commit have changed
slightly, which causes 2 tests cases for block-commit to fail.
This patch updates the test cases to look for the correct error
output.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
MORITA Kazutaka
2f5368017f sheepdog: use bool for boolean variables
This improves readability.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-10-12 10:47:35 +02:00
Aurelien Jarno
048d3612a5 Merge branch 'trivial-patches' of git://github.com/stefanha/qemu
* 'trivial-patches' of git://github.com/stefanha/qemu:
  versatilepb: Use symbolic indices for ARM PIC
  qdev: kill bogus comment
  qemu-barrier: Fix compiler version check for future gcc versions
  hw: Add missing 'static' attribute for QEMUMachine
  cleanup useless return sentence
  qemu-sockets: Fix compiler warning (regression for MinGW)
  vnc: Fix spelling (hellmen -> hellman) in comment
  slirp: Fix spelling in comment (enought -> enough, insure -> ensure)
  tcg/arm: Use tcg_out_mov_reg rather than inline equivalent code
  cpu: Add missing 'static' attribute to qemu_global_mutex
  configure: Support empty target list (--target-list=)
  hw: Fix return value check for bdrv_read, bdrv_write
2012-10-06 18:54:14 +02:00
Amos Kong
4d5b97da35 cleanup useless return sentence
This patch cleans up return sentences in the end of void functions.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
2012-10-05 15:10:21 +02:00
Jim Meyering
00ea188125 qcow2: mark this file's sole strncpy use as justified
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-05 07:58:38 -05:00
Jim Meyering
d66f8e7bd3 vmdk: relative_path: use pstrcpy in place of strncpy
Avoid strncpy+manual-NUL-terminate.  Use pstrcpy instead.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-05 07:58:36 -05:00
Jim Meyering
3178e2755e sheepdog: avoid a few buffer overruns
* parse_vdiname: Use pstrcpy, not strncpy, when the destination
buffer must be NUL-terminated.
* sd_open: Likewise, avoid buffer overrun.
* do_sd_create: Likewise.  Leave the preceding memset, since
pstrcpy does not NUL-fill, and filename needs that.
* sd_snapshot_create: Add a comment/question.
* find_vdi_name: Remove a useless memset.
* sd_snapshot_goto: Remove a useless memset.
Use pstrcpy to NUL-terminate, because find_vdi_name requires
that its vdi arg (filename parameter) be NUL-terminated.
It seems ok not to NUL-fill the buffer.
Do the same for snapid: remove useless memset-0 (instead,
zero tag[0]).  Use pstrcpy, not strncpy.
* sd_snapshot_list: Use pstrcpy, not strncpy to write
into the ->name member.  Each must be NUL-terminated.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-05 07:58:36 -05:00
Paolo Bonzini
8f96b5be92 blkdebug: process all set_state rules in the old state
Currently it is impossible to write a blkdebug script that ping-pongs
between two states, because the second set-state rule will use the
state that is set in the first.  If you have

    [set-state]
    event = "..."
    state = "1"
    new_state = "2"

    [set-state]
    event = "..."
    state = "2"
    new_state = "1"

for example the state will remain locked at 1.  This can be fixed
by first processing all rules, and then setting the state.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:56 +02:00
Paolo Bonzini
1d809098aa stream: add on-error argument
This patch adds support for error management to streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:56 +02:00
Paolo Bonzini
92aa5c6d77 iostatus: move BlockdevOnError declaration to QAPI
This will let block-stream reuse the enum.  Places that used the enums
are renamed accordingly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:26 +02:00
Paolo Bonzini
2f0c9fe64c block: move job APIs to separate files
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:14:26 +02:00
Jeff Cody
747ff60263 block: add live block commit functionality
This adds the live commit coroutine.  This iteration focuses on the
commit only below the active layer, and not the active layer itself.

The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base.  At the end, intermediate images are deleted, and the
chain stitched together.  Images are restored to their original open
flags upon completion.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 18:23:12 +02:00
Bharata B Rao
8d6d89cb63 block: Support GlusterFS as a QEMU block backend.
This patch adds gluster as the new block backend in QEMU. This gives
QEMU the ability to boot VM images from gluster volumes. Its already
possible to boot from VM images on gluster volumes using FUSE mount, but
this patchset provides the ability to boot VM images from gluster volumes
by by-passing the FUSE layer in gluster. This is made possible by
using libgfapi routines to perform IO on gluster volumes directly.

VM Image on gluster volume is specified like this:

file=gluster[+transport]://[server[:port]]/volname/image[?socket=...]

'gluster' is the protocol.

'transport' specifies the transport type used to connect to gluster
management daemon (glusterd). Valid transport types are
tcp, unix and rdma. If a transport type isn't specified, then tcp
type is assumed.

'server' specifies the server where the volume file specification for
the given volume resides. This can be either hostname, ipv4 address
or ipv6 address. ipv6 address needs to be within square brackets [ ].
If transport type is 'unix', then 'server' field should not be specifed.
The 'socket' field needs to be populated with the path to unix domain
socket.

'port' is the port number on which glusterd is listening. This is optional
and if not specified, QEMU will send 0 which will make gluster to use the
default port. If the transport type is unix, then 'port' should not be
specified.

'volname' is the name of the gluster volume which contains the VM image.

'image' is the path to the actual VM image that resides on gluster volume.

Examples:

file=gluster://1.2.3.4/testvol/a.img
file=gluster+tcp://1.2.3.4/testvol/a.img
file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
file=gluster+rdma://1.2.3.4:24007/testvol/a.img

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 17:58:12 +02:00
Anthony Liguori
444dbc381b Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  block: remove keep_read_only flag from BlockDriverState struct
  block: convert bdrv_commit() to use bdrv_reopen()
  block: vpc image file reopen
  block: vdi image file reopen
  block: vmdk image file reopen
  block: qcow image file reopen
  block: qcow2 image file reopen
  block: qed image file reopen
  block: raw image file reopen
  block: raw-posix image file reopen
  block: purge s->aligned_buf and s->aligned_buf_size from raw-posix.c
  block: use BDRV_O_NOCACHE instead of s->aligned_buf in raw-posix.c
  block: do not parse BDRV_O_CACHE_WB in block drivers
  block: move open flag parsing in raw block drivers to helper functions
  block: move aio initialization into a helper function
  block: Framework for reopening files safely
  block: make bdrv_set_enable_write_cache() modify open_flags
  block: correctly set the keep_read_only flag
  blockdev: preserve readonly and snapshot states across media changes
2012-09-25 16:06:16 -05:00
Jeff Cody
3fe4b70008 block: vpc image file reopen
There is currently nothing that needs to be done for VPC image
file reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
ecfe2bbabb block: vdi image file reopen
There is currently nothing that needs to be done for VDI reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
3897575f1c block: vmdk image file reopen
This patch supports reopen for VMDK image files.  VMDK extents are added
to the existing reopen queue, so that the transactional model of reopen
is maintained with multiple image files.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
d177692ede block: qcow image file reopen
These are the stubs for the file reopen drivers for the qcow format.

There is currently nothing that needs to be done by the qcow driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
21d82ac95f block: qcow2 image file reopen
These are the stubs for the file reopen drivers for the qcow2 format.

There is currently nothing that needs to be done by the qcow2 driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
f9cb20f167 block: qed image file reopen
These are the stubs for the file reopen drivers for the qed format.

There is currently nothing that needs to be done by the qed driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
01bdddb5aa block: raw image file reopen
These are the stubs for the file reopen drivers for the raw format.

There is currently nothing that needs to be done by the raw driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
eeb6b45d48 block: raw-posix image file reopen
This is derived from the Supriya Kannery's reopen patches.

This contains the raw-posix driver changes for the bdrv_reopen_*
functions.  All changes are staged into a temporary scratch buffer
during the prepare() stage, and copied over to the live structure
during commit().  Upon abort(), all changes are abandoned, and the
live structures are unmodified.

The _prepare() will create an extra fd - either by means of a dup,
if possible, or opening a new fd if not (for instance, access
control changes).  Upon _commit(), the original fd is closed and
the new fd is used.  Upon _abort(), the duplicate/new fd is closed.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
3d1807ac67 block: purge s->aligned_buf and s->aligned_buf_size from raw-posix.c
The aligned_buf pointer and aligned_buf size are no longer used in
raw_posix.c, so remove all references to them.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
9acc5a06d4 block: use BDRV_O_NOCACHE instead of s->aligned_buf in raw-posix.c
Rather than check for a non-NULL aligned_buf to determine if
raw_aio_submit needs to check for alignment, check for the presence
of BDRV_O_NOCACHE in the bs->open_flags.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
39c9fb9565 block: do not parse BDRV_O_CACHE_WB in block drivers
Block drivers should ignore BDRV_O_CACHE_WB in .bdrv_open flags,
and in the bs->open_flags.

This patch removes the code, leaving the behaviour behind as if
BDRV_O_CACHE_WB was set.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
6a8dc0422e block: move open flag parsing in raw block drivers to helper functions
Code motion, to move parsing of open flags into a helper function.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
fc32a72dc1 block: move aio initialization into a helper function
Move AIO initialization for raw-posix block driver into a helper function.

In addition to just code motion, the aio_ctx pointer is checked for NULL,
prior to calling laio_init(), to make sure laio_init() is only run once.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Ronnie Sahlberg
40a13ca8d2 iSCSI: We dont need to explicitely call qemu_notify_event() any more
We no longer need to explicitely call qemu_notify_event() any more
since this is now done automatically any time the filehandles we listen
to change.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-09-21 16:12:33 +02:00
Ronnie Sahlberg
f1a12821d7 iSCSI: We need to support SG_IO also from iscsi_ioctl()
We need to support SG_IO from the synchronous iscsi_ioctl() since
scsi-block uses this to do an INQ to the device to discover its properties
This patch makes scsi-block work with iscsi.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-09-21 16:05:51 +02:00
Stefan Weil
514f21a5d4 vdi: Fix warning from clang
ccc-analyzer reports these warnings:

block/vdi.c:704:13: warning: Dereference of null pointer
            bmap[i] = VDI_UNALLOCATED;
            ^
block/vdi.c:702:13: warning: Dereference of null pointer
            bmap[i] = i;
            ^

Moving some code into the if block fixes this.
It also avoids calling function write with 0 bytes of data.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Stefan Weil
45724d6d02 block/curl: Fix wrong free statement
Report from smatch:
block/curl.c:546 curl_close(21) info: redundant null check on s->url calling free()

The check was redundant, and free was also wrong because the memory
was allocated using g_strdup.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
MORITA Kazutaka
1f7a48de44 sheepdog: fix savevm and loadvm
This patch sets data to be sent to Sheepdog correctly and fixes savevm
and loadvm operations on a Sheepdog image.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Anthony Liguori
cdedd9d867 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  qemu-iotests: add backing file smaller than image test case
  stream: complete early if end of backing file is reached
  qed: refuse unaligned zero writes with a backing file
2012-08-31 10:04:18 -05:00
Stefan Hajnoczi
571cd9dcc7 stream: complete early if end of backing file is reached
It is possible to create an image that is larger than its backing file.
Reading beyond the end of the backing file produces zeroes if no writes
have been made to those sectors in the image file.

This patch finishes streaming early when the end of the backing file is
reached.  Without this patch the block job hangs and continually tries
to stream the first sectors beyond the end of the backing file.

To reproduce the hung block job bug:

  $ qemu-img create -f qcow2 backing.qcow2 128M
  $ qemu-img create -f qcow2 -o backing_file=backing.qcow2 image.qcow2 6G
  $ qemu -drive if=virtio,cache=none,file=image.qcow2
  (qemu) block_stream virtio0
  (qemu) info block-jobs

The qemu-iotests 030 streaming test still passes.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-29 15:23:35 +02:00
Stefan Hajnoczi
ef72f76e58 qed: refuse unaligned zero writes with a backing file
Zero writes have cluster granularity in QED.  Therefore they can only be
used to zero entire clusters.

If the zero write request leaves sectors untouched, zeroing the entire
cluster would obscure the backing file.  Instead return -ENOTSUP, which
is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
regular write.

The qemu-iotests 034 test cases covers this scenario.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-29 15:23:35 +02:00
Ronnie Sahlberg
135b908878 iscsi: Set number of blocks to 0 for blank CDROM devices
The number of blocks of the device is used to compute the device size
in bdrv_getlength()/iscsi_getlength().
For MMC devices, the ReturnedLogicalBlockAddress in the READCAPACITY10
has a special meaning when it is 0.
In this case it does not mean that LBA 0 is the last accessible LBA,
and thus the device has 1 readable block, but instead it means that the
disc is blank and there are no readable blocks.

This change ensures that when the iSCSI LUN is loaded with a blank
DVD-R disk or similar that bdrv_getlength() will return the correct
size of the device as 0 bytes.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-08-28 14:50:08 +02:00
Anthony Liguori
a9b670b139 Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
  virtio-scsi: add backwards-compatibility properties for 1.1 and earlier machines
  iscsi: fix races between task completion and abort
  iscsi: simplify iscsi_schedule_bh
  iscsi: move iscsi_schedule_bh and iscsi_readv_writev_bh_cb
  Revert "iscsi: Fix NULL dereferences / races between task completion and abort"
2012-08-22 13:31:17 -05:00
Anthony Liguori
7b2f89c435 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  virtio-blk: hide VIRTIO_BLK_F_CONFIG_WCE from old machine types
  Documentation: Warn against qemu-img on active image
  vmdk: Read footer for streamOptimized images
  vmdk: Fix header structure

Conflicts:
	hw/virtio-blk.c
2012-08-22 13:01:05 -05:00
Jim Meyering
a7e47d4bfc sheepdog: don't leak socket file descriptor upon connection failure
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-22 10:47:14 -05:00
Paolo Bonzini
1bd075f29e iscsi: fix races between task completion and abort
This patch fixes two main issues with block/iscsi.c:

1) iscsi_task_mgmt_abort_task_async calls iscsi_scsi_task_cancel which
was also directly called in iscsi_aio_cancel

2) a race between task completion and task abortion could happen cause
the scsi_free_scsi_task were done before iscsi_schedule_bh has finished.
To fix this, all the freeing of IscsiTasks and releasing of the AIOCBs
is centralized in iscsi_bh_cb, independent of whether the SCSI command
has completed or was cancelled.

3) iscsi_aio_cancel was not synchronously waiting for the end of the
command.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-20 15:58:47 +02:00
Paolo Bonzini
cfb3f5064a iscsi: simplify iscsi_schedule_bh
It is always used with the same callback, remove the argument.  And
its return value is never used, assume allocation succeeds.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-20 15:58:47 +02:00
Paolo Bonzini
27cbd828c6 iscsi: move iscsi_schedule_bh and iscsi_readv_writev_bh_cb
Put these functions at the beginning, to avoid forward references
in the next patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-20 15:58:47 +02:00
Paolo Bonzini
b209091957 Revert "iscsi: Fix NULL dereferences / races between task completion and abort"
This reverts commit 64e69e8092.  The commit
returned immediately from iscsi_aio_cancel, risking corruption in case the
following happens:

    guest                  qemu                 target
  =========================================================================
    send write 1 -------->
                           send write 1 -------->
    cancel write 1 ------>
                           cancel write 1 ------>
       <------------------ cancellation processed
    send write 2 -------->
                           send write 2 -------->
                               <---------------- completed write 2
       <------------------ completed write 2
                               <---------------- completed write 1
                               <---------------- cancellation not done

Here, the guest would see write 2 superseding write 1, when in fact the
outcome could have been the opposite.  The right behavior is to return
only after the target says whether the cancellation was done or not, and
it will be implemented by the next three patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-20 15:50:45 +02:00
Kevin Wolf
65bd155c73 vmdk: Read footer for streamOptimized images
The footer takes precedence over the header when it exists. It contains
the real grain directory offset that is missing in the header. Without
this patch, streamOptimized images with a footer cannot be read.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2012-08-17 13:27:02 +02:00
Kevin Wolf
7a736bfa4e vmdk: Fix header structure
Commit bb45ded9 swapped gd_offset and rgd_offset. This is wrong.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-17 11:14:19 +02:00
Stefan Priebe
64e69e8092 iscsi: Fix NULL dereferences / races between task completion and abort
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-15 13:16:22 +02:00
Corey Bryant
2e1e79dae7 block: Convert close calls to qemu_close
This patch converts all block layer close calls, that correspond
to qemu_open calls, to qemu_close.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-15 10:48:57 +02:00
Corey Bryant
6165f4d85d block: Convert open calls to qemu_open
This patch converts all block layer open calls to qemu_open.

Note that this adds the O_CLOEXEC flag to the changed open paths
when the O_CLOEXEC macro is defined.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-15 10:48:57 +02:00
Corey Bryant
e174082835 block: Prevent detection of /dev/fdset/ as floppy
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-15 10:48:57 +02:00
Anthony Liguori
53810bab3a Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony:
  qemu-iotests: skip 039 with ./check -nocache
  block: add BLOCK_O_CHECK for qemu-img check
  qcow2: mark image clean after repair succeeds
  qed: mark image clean after repair succeeds
  blockdev: flip default cache mode from writethrough to writeback
  virtio-blk: disable write cache if not negotiated
  virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
  qemu-iotests: Save some sed processes
  ahci: Fix sglist memleak in ahci_dma_rw_buf()
  ahci: Fix ahci cdrom read corruptions for reads > 128k
  virtio-blk: fix use-after-free while handling scsi commands
2012-08-11 19:48:50 -05:00
Anthony Liguori
312942619a Merge remote-tracking branch 'bonzini/scsi-next' into staging
* bonzini/scsi-next:
  scsi-disk: add support for the UNMAP command
  scsi-disk: improve out-of-range LBA detection for WRITE SAME
  scsi-disk: more assertions and resets for aiocb
  virtio-scsi: do not compare 32-bit QEMU tags against 64-bit virtio-scsi tags
  iscsi: Pick default initiator-name based on the name of the VM
  iscsi: reorganize code for parse_initiator_name
  iscsi: do not leak initiator_name
2012-08-11 17:11:23 -05:00
Stefan Hajnoczi
058f8f16db block: add BLOCK_O_CHECK for qemu-img check
Image formats with a dirty bit, like qed and qcow2, repair dirty image
files upon open with BDRV_O_RDWR.  Performing automatic repair when
qemu-img check runs is not ideal because the bdrv_open() call repairs
the image before the actual bdrv_check() call from qemu-img.c.

Fix this "double repair" since it leads to confusing output from
qemu-img check.  Tell the block driver that this image is being opened
just for bdrv_check().  This skips automatic repair and qemu-img.c can
invoke it manually with bdrv_check().

Update the golden output for qemu-iotests 039 to reflect the new
qemu-img check output.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-10 10:25:12 +02:00
Stefan Hajnoczi
acbe59829e qcow2: mark image clean after repair succeeds
The dirty bit is cleared after image repair succeeds in qcow2_open().
Move this into qcow2_check() so that all callers benefit from this
behavior when fix mode is enabled.

This is necessary so qemu-img check can call .bdrv_check() and mark the
image clean.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-10 10:25:12 +02:00
Stefan Hajnoczi
b10170aca0 qed: mark image clean after repair succeeds
The dirty bit is cleared after image repair succeeds in qed_open().
Move this into qed_check() so that all callers benefit from this
behavior when fix=true.

This is necessary so qemu-img check can call .bdrv_check() and mark the
image clean.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-10 10:25:12 +02:00
Ronnie Sahlberg
31459f463a iscsi: Pick default initiator-name based on the name of the VM
This patch updates the iscsi layer to automatically pick a 'unique'
initiator-name based on the name of the vm in case the user has not set
an explicit iqn-name to use.

Create a new function qemu_get_vm_name() that returns the name of the VM,
if specified.

This way we can thus create default names to use as the initiator name
based on the guest session.

If the VM is not named via the '-name' command line argument, the iscsi
initiator-name used wiull simply be

    iqn.2008-11.org.linux-kvm

If a name for the VM was specified with the '-name' option, iscsi will
use a default initiatorname of

    iqn.2008-11.org.linux-kvm:<name>

These names are just the default iscsi initiator name that qemu will
generate/use only when the user has not set an explicit initiator name
via the commandlines or config files.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
2012-08-09 15:04:09 +02:00
Paolo Bonzini
f2ef4a6dd9 iscsi: reorganize code for parse_initiator_name
Merge the occurrences of the "iqn.2008-11.org.linux-kvm" string
to avoid duplication.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-08 14:51:59 +02:00
Paolo Bonzini
b93c94f7ec iscsi: do not leak initiator_name
The argument of iscsi_create_context is never freed by libiscsi,
which in fact calls strdup on it.  Avoid a leak.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-08-08 14:51:59 +02:00
Stefan Hajnoczi
bfe8043e92 qcow2: implement lazy refcounts
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty.  In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.

Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM).  Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.

The lazy refcount optimization must be enabled at image creation time:

  qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
  qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Stefan Hajnoczi
c61d0004bc qcow2: introduce dirty bit
This patch adds an incompatible feature bit to mark images that have not
been closed cleanly.  When a dirty image file is opened a consistency
check and repair is performed.

Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-06 22:39:14 +02:00
Markus Armbruster
4480e0f924 vvfat: Do not clobber the user's geometry
vvfat creates a virtual VFAT filesystem with a certain logical
geometry that depends on its options.  It sets the "geometry hint" to
this geometry.  It is the only block driver to do this.

The geometry hint is about about *physical* geometry, and used only by
certain hard disk device models.

vvfat's hint is normally invisible for device models, because
bdrv_open() puts a raw format on top of vvfat's fat protocol.  That
raw format is where drive_init() puts the user's geometry (if any),
and where the device model gets it from.

Nobody complained, because the default physical geometry is the same
as vvfat's logical geometry:

    opts        LCHS        def. PCHS
                1024,16,63  same
    :32:        1024,16,63  same
    :16:        1024,16,63  same
    :12:          64,16,63  same

Except when you specify :floppy:

    opts        LCHS        def. PCHS
       :floppy:   80, 2,36  5,16,63
    :32:floppy:   80, 2,36  5,16,63
    :16:floppy:   80, 2,36  5,16,63
    :12:floppy:   80, 2,18  2,16,63

Silly thing to do for use with a hard disk.

However, the "raw" format can be suppressed by adding an
redundant-looking "format=vvfat" to "file=fat:FOO".  Then, vvfat's
hint clobbers the user's geometry, i.e. -drive options cyls, heads,
secs get silently ignored.  Don't do that.

No change without format=vvfat.  With it, the user's hard disk
geometry (-drive options cyls, heads, secs) is now obeyed, and the
default hard disk geometry with :floppy: now matches the one without
format=vvfat.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:30 +02:00
Markus Armbruster
f91cbefe2d vvfat: Fix partition table
Unless parameter ":floppy:" is given, vvfat creates a virtual image
with DOS MBR defining a single partition which holds the FAT file
system.  The size of the virtual image depends on the width of the
FAT: 32 MiB (CHS 64, 16, 63) for 12 bit FAT, 504 MiB (CHS 1024, 16,
63) for 16 and 32 bit FAT, leaving (64*16-1)*63 = 64449 and
(1024*16-1)*64 = 1032129 sectors for the partition.

However, it screws up the end of the partition in the MBR:

    FAT width param.  start CHS  end CHS     start LBA  size
        :32:          0,1,1      1023,14,63       63    1032065
        :16:          0,1,1      1023,14,55       63    1032057
        :12:          0,1,1        63,14,55       63      64377

The actual FAT file system nevertheless assumes the partition has
1032129 or 64449 sectors.  Oops.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:30 +02:00
Christoph Hellwig
19db9b9042 sheepdog: do not blindly memset all read buffers
Only buffers that map to unallocated blocks need to be zeroed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:29 +02:00
MORITA Kazutaka
cddd4ac7a2 sheepdog: always use coroutine-based network functions
This reduces some code duplication.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:29 +02:00
Anthony Liguori
23797df3d9 Merge remote-tracking branch 'mjt/mjt-iov2' into staging
* mjt/mjt-iov2:
  rewrite iov_send_recv() and move it to iov.c
  cleanup qemu_co_sendv(), qemu_co_recvv() and friends
  export iov_send_recv() and use it in iov_send() and iov_recv()
  rename qemu_sendv to iov_send, change proto and move declarations to iov.h
  change qemu_iovec_to_buf() to match other to,from_buf functions
  consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
  allow qemu_iovec_from_buffer() to specify offset from which to start copying
  consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
  rewrite iov_* functions
  change iov_* function prototypes to be more appropriate
  virtio-serial-bus: use correct lengths in control_out() message

Conflicts:
	tests/Makefile

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-07-09 12:35:06 -05:00
Anthony Liguori
715cc00ce1 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony: (24 commits)
  block: Factor bdrv_read_unthrottled() out of guess_disk_lchs()
  qtest: Tidy up temporary files properly
  fdc: Drop broken code for user-defined floppy geometry
  fdc_test: introduce test_sense_interrupt
  fdc_test: update media_change test
  fdc: fix interrupt handling
  fdc: rewrite seek and DSKCHG bit handling
  block: introduce bdrv_swap, implement bdrv_append on top of it
  block: copy over job and dirty bitmap fields in bdrv_append
  raw: hook into blkdebug
  blkdebug: optionally tie errors to a specific sector
  blkdebug: store list of active rules
  blkdebug: pass getlength to underlying file
  blkdebug: tiny cleanup
  blkdebug: remove sync i/o events
  sheepdog: traverse pending_list from the first for each time
  sheepdog: split outstanding list into inflight and pending
  sheepdog: make sure we don't free aiocb before sending all requests
  sheepdog: use coroutine based socket functions in coroutine context
  sheepdog: restart I/O when socket becomes ready in do_co_req()
  ...
2012-07-09 10:29:40 -05:00
Paolo Bonzini
5c171afa4c raw: hook into blkdebug
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
e4780db429 blkdebug: optionally tie errors to a specific sector
This makes blkdebug scripts more powerful, and independent of the
exact sequence of operations performed by streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
571cd43e57 blkdebug: store list of active rules
This prepares for the next patch, where some active rules may actually
not trigger depending on input to readv/writev.  Store the active rules
in a SIMPLEQ (so that it can be emptied easily with QSIMPLEQ_INIT), and
fetch the errno/once/immediately arguments from there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
e130225587 blkdebug: pass getlength to underlying file
This is required when using blkdebug with raw format.  Unlike qcow2/QED,
raw asks blkdebug for the length of the file, it doesn't get it from
a header.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
368e8dd10a blkdebug: tiny cleanup
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
820100fd15 blkdebug: remove sync i/o events
These are unused, except (by mistake more or less) in QED.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
MORITA Kazutaka
7dc1cde05b sheepdog: traverse pending_list from the first for each time
The pending list can be modified in other coroutine context
sd_co_rw_vector, so we need to traverse the list from the first again
after we send the pending request.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
MORITA Kazutaka
c292ee6a67 sheepdog: split outstanding list into inflight and pending
outstanding_list_head is used for both pending and inflight requests.
This patch splits it and improves readability.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
MORITA Kazutaka
1d732d7d7c sheepdog: make sure we don't free aiocb before sending all requests
This patch increments the pending counter before sending requests, and
make sures that aiocb is not freed while sending them.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
MORITA Kazutaka
b97564f4c5 sheepdog: use coroutine based socket functions in coroutine context
This removes blocking network I/Os in coroutine context.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
MORITA Kazutaka
2dfcca3b68 sheepdog: restart I/O when socket becomes ready in do_co_req()
Currently, no one reenters the yielded coroutine.  This fixes it.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
MORITA Kazutaka
1b6ac9985a sheepdog: fix dprintf format strings
This fixes warnings about dprintf format in debug mode.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
Stefan Hajnoczi
206e6d8551 qcow2: preserve free_byte_offset when qcow2_alloc_bytes() fails
When qcow2_alloc_clusters() error handling code was introduced in commit
5d757b563d, the value of free_byte_offset
was clobbered in the error case.  This patch keeps free_byte_offset at 0
so we will try to allocate clusters again next time this function is
called.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
Stefan Hajnoczi
b35278f754 qcow2: fix #ifdef'd qcow2_check_refcounts() callers
The DEBUG_ALLOC qcow2.h macro enables additional consistency checks
throughout the code.  This makes it easier to spot corruptions that are
introduced during development.  Since consistency check is an expensive
operation the DEBUG_ALLOC macro is used to compile checks out in normal
builds and qcow2_check_refcounts() calls missed the addition of a new
function argument.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:01 +02:00
Ronnie Sahlberg
622695a458 ISCSI: force use of sg for SMC and SSC devices
If the device we open is a SMC or SSC device, then force the use of sg. We
dont have any medium changer or tape emulation so only passthrough via
real sg or scsi-generic via iscsi would work anyway.

Forcing sg also makes qemu skip trying to read from the device to guess
the image format by reading from the device (find_image_format()).
SMC devices do not implement READ6/10/12/16 so it is not possible to
read from them (SSC have different CDBs).

With this patch I can successfully manage a SMC device wiht iscsi in
passthrough mode.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
[Added TYPE_TAPE handling - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-07-02 10:18:41 +02:00