Commit Graph

1405 Commits

Author SHA1 Message Date
Alberto Garcia
81e5f78a9f block: use bdrv_get_device_or_node_name() in error messages
There are several error messages that identify a BlockDriverState by
its device name. However those errors can be produced in nodes that
don't have a device name associated.

In those cases we should use bdrv_get_device_or_node_name() to fall
back to the node name and produce a more meaningful message. The
messages are also updated to use the more generic term 'node' instead
of 'device'.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 9823a1f0514fdb0692e92868661c38a9e00a12d6.1428485266.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Alberto Garcia
9b2aa84f87 block: add bdrv_get_device_or_node_name()
This function gets the device name associated with a BlockDriverState,
or its node name if the device name is empty.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 4fa30aa8d61d9052ce266fd5429a59a14e941255.1428485266.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Paolo Bonzini
0b5a24454f block: avoid unnecessary bottom halves
bdrv_aio_* APIs can use coroutines to achieve asynchronicity.  However,
the coroutine may terminate without having yielded back to the caller
(for example because of something that invokes a nested event loop,
or because the coroutine is doing nothing at all).  In this case,
the bdrv_aio_* API must delay the completion to the next iteration
of the main loop, because bdrv_aio_* will never invoke the callback
before returning.

This can be done with a bottom half, and indeed bdrv_aio_* is always
using one for simplicity.  It is possible to gain some performance
(~3%) by avoiding this in the common case.  A new field in the
BlockAIOCBCoroutine struct is set to true until the first time the
corotine has yielded to its creator, and completion goes through a
new function bdrv_co_complete.  If the flag is false, bdrv_co_complete
invokes the callback immediately.  If it is true, the caller will
notice that the coroutine has completed and schedule the bottom
half itself.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427524638-28157-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Fam Zheng
69da3b0b47 block: Pause block jobs in bdrv_drain_all
This is necessary to suppress more IO requests from being generated from
block job coroutines.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1428069921-2957-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:09 +02:00
Fam Zheng
de50a20a4c block: Switch to host monotonic clock for IO throttling
Currently, throttle timers won't make any progress when VCPU is not
running, which would stall the request queue in utils, qtest, vm
suspending, and live migration, without special handling.

Block jobs are confusingly inconsistent between with and without
throttling: if user sets a bps limit, stops the vm, then start a block
job, the block job will not make any progress; in contrary, if user
unsets the bps limit, or if it's not set, the block job will run
normally.

After this patch, with the host clock, even if the VCPUs are stopped,
the throttle queues will be processed.

This patch also enables potential to add throttle to bdrv_drain_all.
Currently all requests are drained immediately. In other words whenever
it is called, IO throttling goes ineffective (examples: system reset,
migration and many block job operations.). This is a loophole that guest
could exploit. If we use the host clock, we can later just trust the
nested poll. This could be done on top.

Note that for qemu-iotests case 093, which uses qtest, we still keep vm
clock so the script can control the clock stepping in order to be
deterministic.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1427268446-6426-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Stefan Hajnoczi
786a4ea82e Convert (ffs(val) - 1) to ctz32(val)
This commit was generated mechanically by coccinelle from the following
semantic patch:

@@
expression val;
@@
- (ffs(val) - 1)
+ ctz32(val)

The call sites have been audited to ensure the ffs(0) - 1 == -1 case
never occurs (due to input validation, asserts, etc).  Therefore we
don't need to worry about the fact that ctz32(0) == 32.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1427124571-28598-5-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Fam Zheng
fc3959e466 block: Fix unaligned zero write
If the zero write is not aligned, bdrv_co_do_pwritev will segfault
because of accessing to the NULL qiov passed in by bdrv_co_write_zeroes.
Fix this by allocating a local qiov in bdrv_co_do_pwritev if the request
is not aligned. (In this case the padding iovs are necessary anyway, so
it doesn't hurt.)

Also add a check at the end of bdrv_co_do_pwritev to clear the zero flag
if padding is involved.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1427160230-4489-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-27 10:01:12 +00:00
Fam Zheng
d51a2427f6 block: Drop bdrv_find
All callers are converted, so drop it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-03-16 12:10:30 -04:00
Markus Armbruster
a1f688f415 block: Deprecate QCOW/QCOW2 encryption
We've steered users away from QCOW/QCOW2 encryption for a while,
because it's a flawed design (commit 136cd19 Describe flaws in
qcow/qcow2 encryption in the docs).

In addition to flawed crypto, we have comically bad usability, and
plain old bugs.  Let me show you.

= Example images =

I'm going to use a raw image as backing file, and two QCOW2 images,
one encrypted, and one not:

    $ qemu-img create -f raw backing.img 4m
    Formatting 'backing.img', fmt=raw size=4194304
    $ qemu-img create -f qcow2 -o encryption,backing_file=backing.img,backing_fmt=raw geheim.qcow2 4m
    Formatting 'geheim.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=on cluster_size=65536 lazy_refcounts=off
    $ qemu-img create -f qcow2 -o backing_file=backing.img,backing_fmt=raw normal.qcow2 4m
    Formatting 'normal.qcow2', fmt=qcow2 size=4194304 backing_file='backing.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off

= Usability issues =

== Confusing startup ==

When no image is encrypted, and you don't give -S, QEMU starts the
guest immediately:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio normal.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: running

But as soon as there's an encrypted image in play, the guest is *not*
started, with no notification whatsoever:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: paused (prelaunch)

If the user figured out that he needs to type "cont" to enter his
keys, the confusion enters the next level: "cont" asks for at most
*one* key.  If more are needed, it then silently does nothing.  The
user has to type "cont" once per encrypted image:

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio -drive if=none,file=geheim.qcow2 -drive if=none,file=geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: paused (prelaunch)
    (qemu) c
    none0 (geheim.qcow2) is encrypted.
    Password: ******
    (qemu) info status
    VM status: paused (prelaunch)
    (qemu) c
    none1 (geheim.qcow2) is encrypted.
    Password: ******
    (qemu) info status
    VM status: running

== Incorrect passwords not caught ==

All existing encryption schemes give you the GIGO treatment: garbage
password in, garbage data out.  Guests usually refuse to mount
garbage, but other usage is prone to data loss.

== Need to stop the guest to add an encrypted image ==

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) info status
    VM status: running
    (qemu) drive_add "" if=none,file=geheim.qcow2
    Guest must be stopped for opening of encrypted image
    (qemu) stop
    (qemu) drive_add "" if=none,file=geheim.qcow2
    OK

Commit c3adb58 added this restriction.  Before, we could expose images
lacking an encryption key to guests, with potentially catastrophic
results.  See also "Use without key is not always caught".

= Bugs =

== Use without key is not always caught ==

Encrypted images can be in an intermediate state "opened, but no key".
The weird startup behavior and the need to stop the guest are there to
ensure the guest isn't exposed to that state.  But other things still
are!

* drive_backup

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) drive_backup -f ide0-hd0 out.img raw
    Formatting 'out.img', fmt=raw size=4194304

  I guess this writes encrypted data to raw image out.img.  Good luck
  with figuring out how to decrypt that again.

* commit

    $ qemu-system-x86_64 -nodefaults -display none -monitor stdio geheim.qcow2
    QEMU 2.2.50 monitor - type 'help' for more information
    (qemu) commit ide0-hd0

  I guess this writes encrypted data into the unencrypted raw backing
  image, effectively destroying it.

== QMP device_add of usb-storage fails when it shouldn't ==

When the image is encrypted, device_add creates the device, defers
actually attaching it to when the key becomes available, then fails.
This is wrong.  device_add must either create the device and succeed,
or do nothing and fail.

    $ qemu-system-x86_64 -nodefaults -display none -usb -qmp stdio -drive if=none,id=foo,file=geheim.qcow2
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 2}, "package": ""}, "capabilities": []}}
    { "execute": "qmp_capabilities" }
    {"return": {}}
    { "execute": "device_add", "arguments": { "driver": "usb-storage", "id": "bar", "drive": "foo" } }
    {"error": {"class": "DeviceEncrypted", "desc": "'foo' (geheim.qcow2) is encrypted"}}
    {"execute":"device_del","arguments": { "id": "bar" } }
    {"timestamp": {"seconds": 1426003440, "microseconds": 237181}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/bar/bar.0/legacy[0]"}}
    {"timestamp": {"seconds": 1426003440, "microseconds": 238231}, "event": "DEVICE_DELETED", "data": {"device": "bar", "path": "/machine/peripheral/bar"}}
    {"return": {}}

This stuff is worse than useless, it's a trap for users.

If people become sufficiently interested in encrypted images to
contribute a cryptographically sane implementation for QCOW2 (or
whatever other format), then rewriting the necessary support around it
from scratch will likely be easier and yield better results than
fixing up the existing mess.

Let's deprecate the mess now, drop it after a grace period, and move
on.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-16 17:07:25 +01:00
Ekaterina Tumanova
892b7de832 block: add bdrv functions for geometry and blocksize
Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Markus Armbruster
dc523cd348 qemu-img: Suppress unhelpful extra errors in convert, amend
img_convert() and img_amend() use qemu_opts_do_parse(), which reports
errors with qerror_report_err().  Its error messages aren't helpful
here, the caller reports one that actually makes sense.  Reproducer:

    $ qemu-img convert -o backing_format=raw in.img out.img
    qemu-img: Invalid parameter 'backing_format'
    qemu-img: Invalid options for file format 'raw'

To fix, propagate errors through qemu_opts_do_parse().  This lifts the
error reporting into callers.  Drop it from img_convert() and
img_amend(), keep it in qemu_chr_parse_compat(), bdrv_img_create().

Since I'm touching qemu_opts_do_parse() anyway, write a function
comment for it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:51:21 +01:00
Markus Armbruster
f43e47dbf6 QemuOpts: Drop qemu_opt_set(), rename qemu_opt_set_err(), fix use
qemu_opt_set() is a wrapper around qemu_opt_set() that reports the
error with qerror_report_err().

Most of its users assume the function can't fail.  Make them use
qemu_opt_set_err() with &error_abort, so that should the assumption
ever break, it'll break noisily.

Just two users remain, in util/qemu-config.c.  Switch them to
qemu_opt_set_err() as well, then rename qemu_opt_set_err() to
qemu_opt_set().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:49:31 +01:00
Markus Armbruster
6be4194b92 block: Suppress unhelpful extra errors in bdrv_img_create()
bdrv_img_create() uses qemu_opt_set(), which reports errors with
qerror_report_err().  Its error messages aren't helpful here, the
caller reports one that actually makes sense.  I don't know how to
trigger the error conditions, though.

Switch to qemu_opt_set_err() to get rid of the unwanted messages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:48:31 +01:00
Markus Armbruster
39101f2511 QemuOpts: Convert qemu_opt_set_number() to Error, fix its use
Return the Error object instead of reporting it with
qerror_report_err().

Change callers that assume the function can't fail to pass
&error_abort, so that should the assumption ever break, it'll break
noisily.

Turns out all callers outside its unit test assume that.  We could
drop the Error ** argument, but that would make the interface less
regular, so don't.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26 14:47:32 +01:00
Max Reitz
b9c649470b block: Keep bdrv_check*_request()'s return value
Do not throw away the value returned by bdrv_check_request() and
bdrv_check_byte_request().

Fix up some coding style issues in the proximity of the affected hunks.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1423162705-32065-17-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:19 +00:00
Max Reitz
c0191e763b block: Remove "growable" from BDS
Now that request clamping is done in the BlockBackend, the "growable"
field can be removed from the BlockDriverState. All BDSs are now treated
as being "growable" (that is, they are allowed to grow; they are not
necessarily actually able to).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:19 +00:00
Max Reitz
b65a5e12a4 block: Add Error parameter to bdrv_find_protocol()
The argument given to bdrv_find_protocol() is just a file name, which
makes it difficult for the caller to reconstruct what protocol
bdrv_find_protocol() was hoping to find. This patch adds an Error
parameter to that function to solve this issue.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:18 +00:00
Markus Armbruster
b1ca639184 block: Eliminate silly QERR_ macros used for encryption keys
The QERR_ macros are leftovers from the days of "rich" error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06 11:46:32 -05:00
Markus Armbruster
4d2855a348 block: New bdrv_add_key(), convert monitor to use it
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06 11:46:32 -05:00
Peter Lieven
75af1f34cd block: introduce BDRV_REQUEST_MAX_SECTORS
we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
 INT_MAX
 INT_MAX >> BDRV_SECTOR_BITS
 UINT_MAX >> BDRV_SECTOR_BITS
 SIZE_MAX >> BDRV_SECTOR_BITS

This patches introdocues a macro for the maximal allowed sectors
per request and uses it at several places.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:22 +01:00
Peter Lieven
f4564d53c6 block: add accounting for merged requests
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Lieven
98764152ad block: change default for discard and write zeroes to INT_MAX
do not trim requests if the driver does not supply a limit
through BlockLimits. For write zeroes we still keep a limit
for the unsupported path to avoid allocating a big bounce buffer.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Jeff Cody
a1a11d10ab block: remove unused variable in bdrv_commit
As Stefan pointed out, the variable 'filename' in bdrv_commit is unused,
despite being maintained in previous patches.

With this patch, get rid of the variable for good.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Fam Zheng
bb00021de0 block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}
Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET,
block-commit involves two asymmetric devices.

This change is not user-visible (yet), because commit only works with
device names.

But once we enable backing reference in blockdev-add, or specifying
node-name in block-commit command, we don't want the user to start two
commit jobs on the same backing chain, which will corrupt things because
of the final bdrv_swap.

Before we have per category blockers, splitting this type is still
better.

[Resolved virtio-blk dataplane conflict by replacing
BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}.
They are safe since the block job runs in the same AioContext as the
dataplane IOThread.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Peter Lieven
095e4fa4b5 block: limited request size in write zeroes unsupported path
If bs->bl.max_write_zeroes is large and we end up in the unsupported
path we might allocate a lot of memory for the iovector and/or even
generate an oversized requests.

Fix this by limiting the request by the minimum of the reported
maximum transfer size or 16MB (32768 sectors).

Reported-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 1420457389-16332-1-git-send-email-pl@kamp.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy
c4237dfa63 block: fix spoiling all dirty bitmaps by mirror and migration
Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
such mistakes in future, old functions bdrv_(set,reset)_dirty are made
static, for internal block usage.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
CC: John Snow <jsnow@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
291680186f block: Relative backing file for image creation
Relative backing filenames are always relative to the backed image's
directory; the same applies to image creation. Therefore, if the backing
file has to be opened for determining its size (in case the size has not
been explicitly specified) its filename should be interpreted relative
to the new image's base directory and not relative to qemu's working
directory.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
9f07429e88 block: JSON filenames and relative backing files
When using a relative backing file name, qemu needs to know the
directory of the top image file. For JSON filenames, such a directory
cannot be easily determined (e.g. how do you determine the directory of
a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow
relative filenames for the backing file of BDSs only having a JSON
filename.

Furthermore, BDS::exact_filename should be used whenever possible. If
BDS::filename is not equal to BDS::exact_filename, the former will
always be a JSON object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz
0a82855a1a block: Get full backing filename from string
Introduce bdrv_get_full_backing_filename_from_filename(), a function
which takes the name of the backed file and a potentially relative
backing filename to produce the full (absolute) backing filename.

Use this function from bdrv_get_full_backing_filename().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Paolo Bonzini
e012b78cf5 block: do not allocate an iovec per read of a growable/zero_after_eof BDS
Most reads do not go past the end of the file, and they can use the
input QEMUIOVector instead of creating one.  This removes the
qemu_iovec_* functions from the profile.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Fam Zheng
43c5d8f800 block: Don't add trailing space in "Formating..." message
Change the message printing code to output a separator for each option
string before it instead of after, then we don't one more extra ' ' in
the end.

To update qemu-iotests output files, most of the times one would just
copy the *.out.bad to *.out. With this change we will not have the
space disliked by checkpatch.pl.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1418110684-19528-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 16:52:33 +00:00
Max Reitz
5c98415b2a vmdk: Fix error for JSON descriptor file names
If vmdk blindly tries to use path_combine() using bs->file->filename as
the base file name, this will result in a bad error message for JSON
file names when calling bdrv_open(). It is better to only try
bs->file->exact_filename; if that is empty, bs->file->filename will be
useless for path_combine() and an error should be emitted (containing
bs->file->filename because desc_file_path (which is
bs->file->exact_filename) is empty).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 13:14:10 +00:00
Max Reitz
c614972408 block: Check create_opts before image creation
If a driver supports image creation, it needs to set the .create_opts
field. We can use that to make sure .create_opts for both drivers
involved is not NULL in bdrv_img_create(), which is important so that
the create_opts pointer in that function is not NULL after the
qemu_opts_append() calls and when going into qemu_opts_create().

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:20 +01:00
Max Reitz
ef8104378c block: Omit bdrv_find_format for essential drivers
We can always assume raw, file and qcow2 being available; so do not use
bdrv_find_format() to locate their BlockDriver objects but statically
reference the respective objects.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Kevin Wolf
c5f6e493bb block: Don't probe for unknown backing file format
If a qcow2 image specifies a backing file format that doesn't correspond
to any format driver that qemu knows, we shouldn't fall back to probing,
but simply error out.

Not looking up the backing file driver in bdrv_open_backing_file(), but
just filling in the "driver" option if it isn't there moves us closer to
the goal of having everything in QDict options and gets us the error
handling of bdrv_open(), which correctly refuses unknown drivers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416935562-7760-4-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
38f3ef574b raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.

Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest.  A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.

Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing.  Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing.  QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.

All of these additions that allow to avoid format probing have to be
specified explicitly. The default still allows the attack.

In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.

Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.

The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).

Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.

Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf
7cddd3728e block: Read only one sector for format probing
The only image format driver that even potentially accesses anything
after 512 bytes in its bdrv_probe() implementation is VMDK, which reads
a plain-text descriptor file. In practice, the field it's looking for
seems to come first and will be well within the first 512 bytes, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Markus Armbruster
c6684249fd block: Factor bdrv_probe_all() out of find_image_format()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416497234-29880-6-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Fam Zheng
20a9e77dfa block: Add bdrv_get_node_name
This returns the node name of a BDS. Remove the TODO comment and expect
the callers to be explicit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng
04df765ab4 block: Add bdrv_next_node
Similar to bdrv_next, this traverses through graph_bdrv_states. Will be
useful to enumerate all the named nodes.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng
f3a9cfddae block: Fix max nb_sectors in bdrv_make_zero
In bdrv_rw_co we report -EINVAL for nb_sectors > INT_MAX /
BDRV_SECTOR_SIZE, so a caller shouldn't exceed it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1415603264-21497-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-14 09:20:35 +00:00
Peter Maydell
776346cd63 trivial patches for 2014-11-11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1
 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp
 NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0
 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE
 W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg
 M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8=
 =klsW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging

trivial patches for 2014-11-11

# gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2014-11-11:
  block: Fix comment for bdrv_co_get_block_status
  sysbus: Correct SYSTEM_BUS(obj) defines
  target-i386: cpu: keeping function parameters alignment on new line
  xen-hvm: Remove redundant variable 'xstate'
  coroutine-sigaltstack: Change jmp_buf to sigjmp_buf
  pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b'
  gdbstub: Add a missing case of signal number translation in gdbstub
  numa: make 'info numa' take into account hotplugged memory
  slirp/smbd: modify/set several parameters in generated smbd.conf
  qemu-doc.texi: fix typos in x509 examples
  icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-11-11 14:50:10 +00:00
Fam Zheng
705be728c0 block: Fix comment for bdrv_co_get_block_status
It returns more information than binary, fix the comment.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-11-11 17:36:19 +03:00
Max Reitz
e56934bece block: Propagate error in bdrv_img_create()
If the specified backing file could not be opened, do not generate a new
error message which contains the message which has been generated by
bdrv_open(), but just propagate the latter.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-06 12:45:47 +01:00
Stefan Hajnoczi
5a7e7a0bad block: let mirror blockjob run in BDS AioContext
The mirror block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.

Note that to_replace is treated separately from other BlockDriverStates
in that it does not need to be in the same AioContext.  Explicitly
acquire/release to_replace's AioContext when accessing it.

The completion code in block/mirror.c must perform BDS graph
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

The bdrv_drain_all() call is not allowed outside the main loop since it
could lead to lock ordering problems.  Use bdrv_drain(bs) instead
because we have acquired the AioContext so nothing else can sneak in
I/O.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi
5b98db0ad3 block: add bdrv_drain()
Now that op blockers are in use, we can ensure that no other sources are
generating I/O on a BlockDriverState.  Therefore it is possible to drain
requests for a single BDS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Max Reitz
7748543420 block: Add status callback to bdrv_amend_options()
Depending on the changed options and the image format,
bdrv_amend_options() may take a significant amount of time. In these
cases, a way to be informed about the operation's status is desirable.

Since the operation is rather complex and may fundamentally change the
image, implementing it as AIO or a coroutine does not seem feasible. On
the other hand, implementing it as a block job would be significantly
more difficult than a simple callback and would not add benefits other
than progress report to the amending operation, because it should not
actually be run as a block job at all.

A callback may not be very pretty, but it's very easy to implement and
perfectly fits its purpose here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Peter Maydell
573742a543 block.c: Fix type of IoOperationType variable in send_qmp_error_event()
The local variable 'ac' in send_qmp_error_event() is declared with the
wrong type, which causes clang to complain when it is initialized
and again when it is used:

block.c:3655:20: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion]
    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
       ~           ^~~~~~~~~~~~~~~~~~~~~~
block.c:3655:45: warning: implicit conversion from enumeration type 'enum IoOperationType' to different enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') [-Wenum-conversion]
    ac = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
       ~                                    ^~~~~~~~~~~~~~~~~~~~~~~
block.c:3656:62: warning: implicit conversion from enumeration type 'BlockErrorAction' (aka 'enum BlockErrorAction') to different enumeration type 'IoOperationType' (aka 'enum IoOperationType') [-Wenum-conversion]
    qapi_event_send_block_io_error(bdrv_get_device_name(bs), ac, action,
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           ^~

Correct the type to IoOperationType, and rename the variable
to 'optype' to match its correct type.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Message-id: 1412969583-21045-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
6c5a42ac34 block: avoid creating oversized writes in multiwrite_merge
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Peter Lieven
2647fab57d BlockLimits: introduce max_transfer_length
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Max Reitz
59c9a95fd2 block: Respect underlying file's EOF
When falling through to the underlying file in
bdrv_co_get_block_status(), if it returns that the query offset is
beyond the file end (by setting *pnum to 0), return the range to be
zero and do not let the number of sectors for which information could be
obtained be overwritten.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:02 +02:00
Max Reitz
9ebd844805 block: Add qemu_{,try_}blockalign0()
These functions call their non-0-counterparts and then fill the
allocated buffer with 0 (if the allocation has been successful).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Markus Armbruster
a7f53e26a6 block: Lift device model API into BlockBackend
Move device model attachment / detachment and the BlockDevOps device
model callbacks and their wrappers from BlockDriverState to
BlockBackend.

Wrapper calls in block.c change from

    bdrv_dev_FOO_cb(bs, ...)

to

    if (bs->blk) {
        bdrv_dev_FOO_cb(bs->blk, ...);
    }

No change, because both bdrv_dev_change_media_cb() and
bdrv_dev_resize_cb() do nothing when no device model is attached, and
a device model can be attached only when bs->blk.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster
097310b53e block: Rename BlockDriverCompletionFunc to BlockCompletionFunc
I'll use it with block backends shortly, and the name is going to fit
badly there.  It's a block layer thing anyway, not just a block driver
thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7c84b1b831 block: Rename BlockDriverAIOCB* to BlockAIOCB*
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there.  It's a block layer thing anyway, not just a
block driver thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster
7f06d47eff block: Merge BlockBackend and BlockDriverState name spaces
BlockBackend's name space is separate only to keep the initial patches
simple.  Time to merge the two.

Retain bdrv_find() and bdrv_get_device_name() for now, to keep this
series manageable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
bfb197e0d9 block: Eliminate BlockDriverState member device_name[]
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields().  The latter is used only to undo damage
done by bdrv_swap().  The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa.  Furthermore,
blk_new_with_bs() keeps the two names equal.

Therefore, device_name[] is redundant.  Eliminate it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
fea68bb6e9 block: Eliminate bdrv_iterate(), use bdrv_next()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
18e46a033d block: Connect BlockBackend and DriveInfo
Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
return the BlockBackend instead of the DriveInfo.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
7e7d56d9e0 block: Connect BlockBackend to BlockDriverState
Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState.  Callers have to unref both.  The commit after next
will relieve them of the need to unref the BlockDriverState.

Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon().  To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del().  Unfortunately, hiding turns the
BlockBackend's name into the empty string.  Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster
e4e9986b1c block: Split bdrv_new_root() off bdrv_new()
Creating an anonymous BDS can't fail.  Make that obvious.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Alexey Kardashevskiy
7ea2d269cb block/migration: Disable cache invalidate for incoming migration
When migrated using libvirt with "--copy-storage-all", at the end of
migration there is race between NBD mirroring task trying to do flush
and migration completion, both end up invalidating cache. Since qcow2
driver does not handle this situation very well, random crashes happen.

This disables the BDRV_O_INCOMING flag for the block device being migrated
once the cache has been invalidated.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

--

fixed parens by hand
Signed-off-by: Juan Quintela <quintela@redhat.com>
2014-10-14 09:35:21 +02:00
Markus Armbruster
f5bebbbb28 util: Emancipate id_wellformed() from QemuOpts
IDs have long spread beyond QemuOpts: not everything with an ID
necessarily goes through QemuOpts.  Commit 9aebf3b is about such a
case: block layer names are meant to be well-formed IDs, but some of
them don't go through QemuOpts, and thus weren't checked.  The commit
fixed that the straightforward way: rename the internal QemuOpts
helper id_wellformed() to qemu_opts_id_wellformed() and give it
external linkage.

Instead of using it directly in block.c, the commit adds wrapper
bdrv_is_valid_name(), probably to hide the connection to QemuOpts.

Go one logical step further: emancipate IDs from QemuOpts.  Rename the
function back to id_wellformed(), and put it in another file.  While
there, clean up its value to bool.  Peel off the bdrv_is_valid_name()
wrapper.

[Replaced stray return 0 with return false to match bool returns used
elsewhere in id_wellformed().
--Stefan]

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-03 10:30:33 +01:00
Kevin Wolf
9aebf3b892 block: Validate node-name
The device_name of a BlockDriverState is currently checked because it is
always used as a QemuOpts ID and qemu_opts_create() checks whether such
IDs are wellformed.

node-name is supposed to share the same namespace, but it isn't checked
currently. This patch adds explicit checks both for device_name and
node-name so that the same rules will still apply even if QemuOpts won't
be used any more at some point.

qemu-img used to use names with spaces in them, which isn't allowed any
more. Replace them with underscores.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-25 15:24:32 +02:00
Markus Armbruster
d224469d87 block: Improve message for device name clashing with node name
Suggested-by: Benoit Canet <benoit.canet@nodalink.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Markus Armbruster
3ae59580a0 block: Keep DriveInfo alive until BlockDriverState dies
If the BDS's refcnt > 0, drive_del() destroys the DriveInfo, but not
the BDS.  This can happen in three places:

* Device model destruction during unplug: blockdev_auto_del()

* Xen IDE unplug: pci_piix3_xen_ide_unplug()

* drive_del command when no device model is attached: do_drive_del()

The other callers of drive_del are on error paths where refcnt == 1.

If the user somehow manages to plug in a device model using a BDS that
has gone through drive_del(), the legacy configuration passed in
DriveInfo doesn't reach the device model, and automatic deletion on
unplug doesn't work.  Worse, some device models such as scsi-disk
crash when DriveInfo doesn't exist.

This is theoretical; I didn't research an actual reproducer. The problem
was introduced when we replaced DriveInfo reference counting by BDS
reference counting in commit a94a3fa..fa510eb.

Fix by keeping DriveInfo alive until its BDS dies.

This affects qemu_drive_opts: now you can't reuse the same ID for new
drive options until the BDS dies.  Before, you could, but since the
code always attempts to create a BDS with the same ID next, the
enclosing operation "create a new drive" failed anyway.  Different
error path, same result.

Unfortunately, the fix involves use of blockdev.c stuff from block.c,
which is a layering violation.  Fortunately, my forthcoming
BlockBackend work will get rid of it again.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-25 15:24:14 +02:00
Fam Zheng
8007429a99 block: Rename qemu_aio_release -> qemu_aio_unref
Suggested-by: Benoît Canet <benoit.canet@irqsave.net>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:17 +01:00
Fam Zheng
ca5fd113b8 block: Drop AIOCBInfo.cancel
Now that all the implementations are converted to asynchronous version
and we can emulate synchronous cancellation with it. Let's drop the
unused member.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:16 +01:00
Fam Zheng
f600ac1902 block: Drop bdrv_em_aiocb_info.cancel
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:39:00 +01:00
Fam Zheng
3acabd685e block: Drop bdrv_em_co_aiocb_info.cancel
Also drop the now unused ->done pointer.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:59 +01:00
Fam Zheng
02c50efe08 block: Add bdrv_aio_cancel_async
This is the async version of bdrv_aio_cancel, which doesn't block the
caller. It guarantees that the cb is called either before returning or
some time later.

bdrv_aio_cancel can base on bdrv_aio_cancel_async, later we can convert
all .io_cancel implementations to .io_cancel_async, and the aio_poll is
the common logic. In the end, .io_cancel can be dropped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:58 +01:00
Fam Zheng
f197fe2b2c block: Add refcnt in BlockDriverAIOCB
This will be useful in synchronous cancel emulation with
bdrv_aio_cancel_async.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22 11:38:57 +01:00
Luiz Capitulino
624ff5736e block: extend BLOCK_IO_ERROR with reason string
BLOCK_IO_ERROR events are logged by libvirt, which helps with
post mortem analysis of guests. However, one information that
we miss today is a human readable string describing the cause
of the I/O error.

This commit adds that string it to BLOCK_IO_ERROR. Note that
this string is a debugging aid for humans, meaning that it
should not parsed by applications.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-11 17:14:13 +02:00
Benoît Canet
5366d0c8bc block: Make the block accounting functions operate on BlockAcctStats
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models structures.

Note that bdrv_get_stats was introduced so device models can retrieve the
BlockAcctStats structure of a BlockDriverState without being aware of it's
layout.
This function should go away when BlockAcctStats will be embedded in the device
models structures.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
5e5a94b605 block: Extract the block accounting code
The plan is to add new accounting metrics (latency, invalid requests, failed
requests, queue depth) and block.c is overpopulated so it will be better to work
in a separate module.

Moreover the long term plan is to have statistics in each of the BDS of the graph
for metrology purpose; this means that the device model statistics must move from
the topmost BDS to the device model.

So we need to decouple the statistic code from BlockDriverState.

This is another argument for the extraction of the code in a separate module.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Benoit Canet <benoit@irqsave.net>
CC: Fam Zheng <famz@redhat.com>
CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
CC: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Benoît Canet
0ddd0ad96a block: Extract the BlockAcctStats structure
Extract the block accounting statistics into a structure so the block device
models can hold them in the future.

CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Luiz Capitulino
c7c2ff0c7e block: extend BLOCK_IO_ERROR event with nospace indicator
Management software, such as RHEV's vdsm, want to be able to allocate
disk space on demand. The basic use case is to start a VM with a small
disk and then the disk is enlarged when QEMU hits a ENOSPC condition.

To this end, the management software has to be notified when QEMU
encounters ENOSPC. The solution implemented by this commit is simple:
it extends the BLOCK_IO_ERROR with a 'nospace' key, which is true
when QEMU is stopped due to ENOSPC.

Note that support for querying this event is already present in
query-block by means of the 'io-status' key. Also, the new 'nospace'
BLOCK_IO_ERROR field shares the same semantics with 'io-status',
which basically means that werror= has to be set to either
'stop' or 'enospc' to enable 'nospace'.

Finally, this commit also updates the 'io-status' key doc in the
schema with a list of supported device models.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10 10:41:29 +02:00
Liu Yuan
6bb4515849 block: kill tail whitespace in block.c
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-08 11:12:42 +01:00
Stefan Hajnoczi
391827eb10 block: fix overlapping multiwrite requests
When request A is a strict superset of request B:

  AAAAAAAA
    BBBB

multiwrite_merge() merges them as follows:

  AABBBB

The tail of request A should have been included:

  AABBBBAA

This patch fixes data loss but this code path is probably rare.  Since
guests cannot assume ordering between in-flight requests, few
applications submit overlapping write requests.

Reported-by: Slava Pestov <sviatoslav.pestov@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-29 14:09:43 +01:00
Max Reitz
33384421b3 block: Add AIO context notifiers
If a long-running operation on a BDS wants to always remain in the same
AIO context, it somehow needs to keep track of the BDS changing its
context. This adds a function for registering callbacks on a BDS which
are called whenever the BDS is attached or detached from an AIO context.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29 10:48:45 +01:00
Stefan Hajnoczi
ada4240103 block: sort formats alphabetically in bdrv_iterate_format()
Format names are best consumed in alphabetical order.  This makes
human-readable output easy to produce.

bdrv_iterate_format() already has an array of format strings.  Sort them
before invoking the iteration callback.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
2014-08-28 13:42:25 +01:00
Max Reitz
91af701412 block: Add bdrv_refresh_filename()
Some block devices may not have a filename in their BDS; and for some,
there may not even be a normal filename at all. To work around this, add
a function which tries to construct a valid filename for the
BDS.filename field.

If a filename exists or a block driver is able to reconstruct a valid
filename (which is placed in BDS.exact_filename), this can directly be
used.

If no filename can be constructed, we can still construct an options
QDict which is then converted to a JSON object and prefixed with the
"json:" pseudo protocol prefix. The QDict is placed in
BDS.full_open_options.

For most block drivers, this process can be done automatically; those
that need special handling may define a .bdrv_refresh_filename() method
to fill BDS.exact_filename and BDS.full_open_options themselves.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 14:31:56 +02:00
Markus Armbruster
5839e53bbc block: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

Patch created with Coccinelle, with two manual changes on top:

* Add const to bdrv_iterate_format() to keep the types straight

* Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
  inexplicably misses

Coccinelle semantic patch:

    @@
    type T;
    @@
    -g_malloc(sizeof(T))
    +g_new(T, 1)
    @@
    type T;
    @@
    -g_try_malloc(sizeof(T))
    +g_try_new(T, 1)
    @@
    type T;
    @@
    -g_malloc0(sizeof(T))
    +g_new0(T, 1)
    @@
    type T;
    @@
    -g_try_malloc0(sizeof(T))
    +g_try_new0(T, 1)
    @@
    type T;
    expression n;
    @@
    -g_malloc(sizeof(T) * (n))
    +g_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc(sizeof(T) * (n))
    +g_try_new(T, n)
    @@
    type T;
    expression n;
    @@
    -g_malloc0(sizeof(T) * (n))
    +g_new0(T, n)
    @@
    type T;
    expression n;
    @@
    -g_try_malloc0(sizeof(T) * (n))
    +g_try_new0(T, n)
    @@
    type T;
    expression p, n;
    @@
    -g_realloc(p, sizeof(T) * (n))
    +g_renew(T, p, n)
    @@
    type T;
    expression p, n;
    @@
    -g_try_realloc(p, sizeof(T) * (n))
    +g_try_renew(T, p, n)

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20 11:51:28 +02:00
Max Reitz
908bcd540f block: Catch !bs->drv in bdrv_check()
qemu-img check calls bdrv_check() twice if the first run repaired some
inconsistencies. If the first run however again triggered corruption
prevention (on qcow2) due to very bad inconsistencies, bs->drv may be
NULL afterwards. Thus, bdrv_check() should check whether bs->drv is set.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:16 +02:00
Kevin Wolf
857d4f46c3 block: Handle failure for potentially large allocations
Some code in the block layer makes potentially huge allocations. Failure
is not completely unexpected there, so avoid aborting qemu and handle
out-of-memory situations gracefully.

This patch addresses bounce buffer allocations in block.c. While at it,
convert bdrv_commit() from plain g_malloc() to qemu_try_blockalign().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:15 +02:00
Kevin Wolf
7d2a35cc92 block: Introduce qemu_try_blockalign()
This function returns NULL instead of aborting when an allocation fails.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-08-15 15:07:15 +02:00
Jeff Cody
9a4d5ca607 block: allow bdrv_unref() to be passed NULL pointers
If bdrv_unref() is passed a NULL BDS pointer, it is safe to
exit with no operation.  This will allow cleanup code to blindly
call bdrv_unref() on a BDS that has been initialized to NULL.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15 15:07:14 +02:00
Stefan Hajnoczi
2a87151fb2 block: bump coroutine pool size for drives
When a BlockDriverState is associated with a storage controller
DeviceState we expect guest I/O.  Use this opportunity to bump the
coroutine pool size by 64.

This patch ensures that the coroutine pool size scales with the number
of drives attached to the guest.  It should increase coroutine pool
usage (which makes qemu_coroutine_create() fast) without hogging too
much memory when fewer drives are attached.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-08-15 15:07:14 +02:00
Markus Armbruster
52bf1e722d block: Avoid bdrv_get_geometry() where errors should be detected
bdrv_get_geometry() hides errors.  Use bdrv_nb_sectors() or
bdrv_getlength() instead where that's obviously inappropriate.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
75d3d21f9e block: Drop superfluous aligning of bdrv_getlength()'s value
It returns a multiple of the sector size.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
57322b7811 block: Use bdrv_nb_sectors() where sectors, not bytes are wanted
Instead of bdrv_getlength().

Aside: a few of these callers don't handle errors.  I didn't
investigate whether they should.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
30a7f2fc91 block: Use bdrv_nb_sectors() in bdrv_co_get_block_status()
Instead of bdrv_getlength().

Replace variables length, length2 by total_sectors, nb_sectors2.
Bonus: use total_sectors instead of the slightly unclean
bs->total_sectors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
4049082c4b block: Use bdrv_nb_sectors() in bdrv_aligned_preadv()
Instead of bdrv_getlength().  Eliminate variable len.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
d32f7c101b block: Use bdrv_nb_sectors() in bdrv_make_zero()
Instead of bdrv_getlength().

Variable target_size is initially in bytes, then changes meaning to
sectors.  Ugh.  Replace by target_sectors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:13 +02:00
Markus Armbruster
65a9bb25d6 block: New bdrv_nb_sectors()
A call to retrieve the image size converts between bytes and sectors
several times:

* BlockDriver method bdrv_getlength() returns bytes.

* refresh_total_sectors() converts to sectors, rounding up, and stores
  in total_sectors.

* bdrv_getlength() converts total_sectors back to bytes (now rounded
  up to a multiple of the sector size).

* Callers wanting sectors rather bytes convert it right back.
  Example: bdrv_get_geometry().

bdrv_nb_sectors() provides a way to omit the last two conversions.
It's exactly bdrv_getlength() with the conversion to bytes omitted.
It's functionally like bdrv_get_geometry() without its odd error
handling.

Reimplement bdrv_getlength() and bdrv_get_geometry() on top of
bdrv_nb_sectors().

The next patches will convert some users of bdrv_getlength() to
bdrv_nb_sectors().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-15 15:07:12 +02:00
Kevin Wolf
3baca89139 block: Add Error argument to bdrv_refresh_limits()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-18 13:18:43 +01:00
Kevin Wolf
8eb029c26e block: Assert qiov length matches request length
At least raw-posix relies on this because it can allocate bounce buffers
based on the request length, but access it using all of the qiov entries
later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-07-14 12:03:20 +02:00
Kevin Wolf
33f461e0c5 block: Make qiov match the request size until EOF
If a read request goes across EOF, the block driver sees a shortened
request that stops at EOF (the rest is memsetted in block.c), however
the original qiov was used for this request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-07-14 12:03:20 +02:00
Paolo Bonzini
b47ec2c456 block: prefer aio_poll to qemu_aio_wait
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-07-09 15:50:11 +02:00
Kevin Wolf
01fb2705bd block: Fix bdrv_is_allocated() return value
bdrv_is_allocated() should return either 0 or 1 in successful cases.
We're lucky that currently, the callers that rely on this (e.g. because
they check for ret == 1) don't seem to break badly. They just might skip
some optimisation or in the case of qemu-io 'map' print separate lines
where a single line would suffice. In theory, a wrong allocation status
could lead to image corruption with certain operations, so let's fix
this quickly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-07-09 15:50:11 +02:00
Ming Lei
448ad91db4 block: block: introduce APIs for submitting IO as a batch
This patch introduces three APIs so that following
patches can support queuing I/O requests and submitting them
as a batch for improving I/O performance.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07 11:05:17 +02:00
Jeff Cody
54e2690090 block: extend block-commit to accept a string for the backing file
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block commit.

For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.

In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.

With this extension to the block-commit api, the user is able to change
the backing file of the overlay image as part of the block-commit
operation.

This allows the change to be 'safe', in the sense that if the attempt
to write the overlay image metadata fails, then the block-commit
operation returns failure, without disrupting the guest.

If the commit top is the active layer, then specifying the backing
file string will be treated as an error (there is no overlay image
to modify in that case).

If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:47:01 +02:00
Jeff Cody
5a6684d2b9 block: add helper function to determine if a BDS is in a chain
This is a small helper function, to determine if 'base' is in the
chain of BlockDriverState 'top'.  It returns true if it is in the chain,
and false otherwise.

If either argument is NULL, it will also return false.

Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:47:01 +02:00
Jeff Cody
4caf0fcd45 block: simplify bdrv_find_base() and bdrv_find_overlay()
This simplifies the function bdrv_find_overlay().  With this change,
bdrv_find_base() is just a subset of usage of bdrv_find_overlay(),
so this also takes advantage of that.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 10:15:34 +02:00
Chen Gang
6b8aeca574 block.c: Don't return success for bdrv_append_temp_snapshot() failure
When failure occurs, 'ret' need be set, or may return 0 to indicate
success. Previously, an error was set in errp, but 0 was returned
anyway. So let bdrv_append_temp_snapshot() return an error code and
use that for the bdrv_open() return value.

Also, error_propagate() need be called only one time within a function.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 20:00:00 +02:00
Benoît Canet
09158f00e0 block: Add replaces argument to drive-mirror
drive-mirror will bdrv_swap the new BDS named node-name with the one
pointed by replaces when the mirroring is finished.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 20:00:00 +02:00
Jeff Cody
9c75e168bc block: check for RESIZE blocker in the QMP command, not bdrv_truncate()
If we check for the RESIZE blocker in bdrv_truncate(), that means a
commit will fail if the overlay layer is larger than the base, due to
the backing blocker.

This is a regression in behavior from 2.0; currently, commit will try to
grow the size of the base image to match the overlay size, if the
overlay size is larger.

By moving this into the QMP command qmp_block_resize(), it allows
usage of bdrv_truncate() within block jobs.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 11:37:35 +02:00
Kevin Wolf
20cca275c6 block: Remove a special case for protocols
The only semantic change is that bs->open_flags gets BDRV_O_PROTOCOL set
now. This isn't useful, but it doesn't hurt either. The code that was
previously skipped by 'goto done' is automatically disabled because
protocol drivers don't support backing files (and if they did, this
would probably be a fix) and can't have snapshot_flags set.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
8ee79e707a block: Catch backing files assigned to non-COW drivers
Since we parse backing.* options to add a backing file from the command
line when the driver didn't assign one, it has been possible to have a
backing file for e.g. raw images (it just was never accessed).

This is obvious nonsense and should be rejected.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
76c591b013 block: Remove second bdrv_open() recursion
This recursion was introduced in commit 505d7583 in order to allow
nesting image formats. It only ever takes effect when the user
explicitly specifies a driver name and that driver isn't suitable for
the protocol level.

We can check this earlier in bdrv_open() and if the explicitly
requested driver is a format driver, clear BDRV_O_PROTOCOL so that
another bs->file layer is opened.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
b348f3311c block: Inline bdrv_file_open()
It doesn't do much any more, we can move the code to bdrv_open() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
f4788adcb4 block: Use common driver selection code for bdrv_open_file()
This moves the bdrv_open_file() call a bit down so that it can use the
bdrv_open() code that selects the right block driver.

The code between the old and the new call site is either common code
(the error message for an unknown driver has been unified now) or
doesn't run with cleared BDRV_O_PROTOCOL (added an if block in one
place, whereas the right path was already asserted in another place)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-06-26 13:51:01 +02:00
Kevin Wolf
17b005f1d4 block: Always pass driver name through options QDict
The "driver" entry in the options QDict is now only missing if we're
opening an image with format probing.

We also catch cases now where both the drv argument and a "driver"
option is specified, e.g. by specifying -drive format=qcow2,driver=raw

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
5e5c4f63f4 block: Move json: parsing to bdrv_fill_options()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
462f5bcf69 block: Move bdrv_fill_options() call to bdrv_open()
bs->options now contains the modified version of the options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Kevin Wolf
f54120ff1a block: Create bdrv_fill_options()
The idea of bdrv_fill_options() is to convert every parameter for
opening images, in particular the filename and flags, to entries in the
options QDict.

This patch starts with moving the filename parsing and driver probing
part from bdrv_file_open() to the new function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Chen Gang
5db97df274 block.c: Remove useless 'buf' variable
'buf' is not used actually, so remove it and related snprintf() statement.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-06-24 20:01:24 +04:00
Wenchao Xia
5a2d2cbd88 qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERROR
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:27 -04:00
Wenchao Xia
a5ee7bd454 qapi event: convert DEVICE_TRAY_MOVED
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:27 -04:00
Wenchao Xia
a589569f2f qapi: adjust existing defines
In order to let event defines use existing types later, instead of
redefine new ones, some old type defines for spice and vnc are changed,
and BlockErrorAction is moved from block.h to qapi schema. Note that
BlockErrorAction is not merged with BlockdevOnError.

At this point, VncInfo is not made a child of VncBasicInfo, because
VncBasicInfo has mandatory fields where VncInfo makes them optional.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:01:25 -04:00
Paolo Bonzini
2bd3bce8ef block: asynchronously stop the VM on I/O errors
With virtio-blk dataplane, I/O errors might occur while QEMU is
not in the main I/O thread.  However, it's invalid to call vm_stop
when we're neither in a VCPU thread nor in the main I/O thread,
even if we were to take the iothread mutex around it.

To avoid this problem, we can raise a request to the main I/O thread,
similar to what QEMU does when vm_stop is called from a CPU thread.
We know that bdrv_error_action is called from an AIO callback, and
the moment at which the callback will fire is not well-defined; it
depends on the moment at which the disk or OS finishes the operation,
which can happen at any time.  Note that QEMU is certainly not in a CPU
thread and we do not need to call cpu_stop_current() like vm_stop() does.

However, we need to ensure that any action taken by management will
result in correct detection of the error _and_ a running VM.  In particular:

- the event must be raised after the iostatus has been set, so that
"info block" will return an iostatus that matches the event.

- the VM must be stopped after the iostatus has been set, so that
"info block" will return an iostatus that matches the runstate.

The ordering between the STOP and BLOCK_IO_ERROR events is preserved;
BLOCK_IO_ERROR is documented to come first.

This makes bdrv_error_action() thread safe (assuming QMP events are,
which is attacked by a separate series).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-23 16:36:13 +08:00
Chunyan Liu
c282e1fdf7 cleanup QEMUOptionParameter
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
83d0521a1e change block layer to support both QemuOpts and QEMUOptionParamter
Change block layer to support both QemuOpts and QEMUOptionParameter.
After this patch, it will change backend drivers one by one. At the end,
QEMUOptionParameter will be removed and only QemuOpts is kept.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:20 +08:00
Stefan Hajnoczi
13af91ebf0 throttle: add throttle_detach/attach_aio_context()
Block I/O throttling uses timers and currently always adds them to the
main loop.  Throttling will break if bdrv_set_aio_context() is used to
move a BlockDriverState to a different AioContext.

This patch adds throttle_detach/attach_aio_context() interfaces so the
throttling timers and uses them to move timers to the new AioContext.
Note that bdrv_set_aio_context() already drains all requests so we're
sure no throttled requests are pending.

The test cases need to be updated since the throttle_init() interface
has changed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-06-04 09:56:12 +02:00
Stefan Hajnoczi
dcd042282d block: add bdrv_set_aio_context()
Up until now all BlockDriverState instances have used the QEMU main loop
for fd handlers, timers, and BHs.  This is not scalable on SMP guests
and hosts so we need to move to a model with multiple event loops on
different host CPUs.

bdrv_set_aio_context() assigns the AioContext event loop to use for a
particular BlockDriverState.  It first detaches the entire
BlockDriverState graph from the current AioContext and then attaches to
the new AioContext.

This function will be used by virtio-blk data-plane to assign a
BlockDriverState to its IOThread AioContext.  Make
bdrv_aio_set_context() public since data-plane should not include
block_int.h.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:11 +02:00
Stefan Hajnoczi
9b536adcbe block: acquire AioContext in bdrv_drain_all()
Modify bdrv_drain_all() to take into account that BlockDriverState
instances may be running in different AioContexts.

This patch changes the implementation of bdrv_drain_all() while
preserving the semantics.  Previously kicking throttled requests and
checking for pending requests were done across all BlockDriverState
instances in sequence.  Now we process each BlockDriverState in turn,
making sure to acquire and release its AioContext.

This prevents race conditions between the thread executing
bdrv_drain_all() and the thread running the AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:11 +02:00
Stefan Hajnoczi
ed78cda3de block: acquire AioContext in bdrv_*_all()
bdrv_close_all(), bdrv_commit_all(), bdrv_flush_all(),
bdrv_invalidate_cache_all(), and bdrv_clear_incoming_migration_all() are
called by main loop code and touch all BlockDriverState instances.

Some BlockDriverState instances may be running in another AioContext.
Make sure to acquire the AioContext before closing the BlockDriverState.

This will protect against race conditions once virtio-blk data-plane is
using the BlockDriverState from another AioContext event loop.

Note that this patch does not convert bdrv_drain_all() yet since that
conversion is non-trivial.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:11 +02:00
Stefan Hajnoczi
2572b37a47 block: use BlockDriverState AioContext
Drop the assumption that we're using the main AioContext.  Convert
qemu_aio_wait() to aio_poll() and qemu_bh_new() to aio_bh_new() so the
BlockDriverState AioContext is used.

Note there is still one qemu_aio_wait() left in bdrv_create() but we do
not have a BlockDriverState there and only main loop code invokes this
function.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:11 +02:00
Markus Armbruster
b20e61e0d5 block: Plug memory leak on brv_open_image() error path
Introduced in commit da557a.  Spotted by Coverity.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-30 14:26:54 +02:00
Fam Zheng
ce782938b8 block: Drop redundant bdrv_refresh_limits
The above bdrv_set_backing_hd already does this.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
826b6ca0b0 block: Add backing_blocker in BlockDriverState
This makes use of op_blocker and blocks all the operations except for
commit target, on each BlockDriverState->backing_hd.

The asserts for op_blocker in bdrv_swap are removed because with this
change, the target of block commit has at least the backing blocker of
its child, so the assertion is not true. Callers should do their check.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
920beae103 block: Use bdrv_set_backing_hd everywhere
We need to handle the coming backing_blocker properly, so don't open
code the assignment, instead, call bdrv_set_backing_hd to change
backing_hd.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
8d24cce1e3 block: Add bdrv_set_backing_hd()
This is the common but non-trivial steps to assign or change the
backing_hd of BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
3718d8ab65 block: Replace in_use with operation blocker
This drops BlockDriverState.in_use with op_blockers:

  - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1).

  - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0).

  - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs).

    The specific types are used, e.g. in place of starting block backup,
    bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...).

    There is one exception in block_job_create, where
    bdrv_op_blocker_is_empty() is used, because we don't know the operation
    type here. This doesn't matter because in a few commits away we will drop
    the check and move it to callers that _do_ know the type.

  - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use).

Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at
this moment. So although the checks are specific to op types, this
changes can still be seen as identical logic with previously with
in_use. The difference is error message are improved because of blocker
error info.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
fbe40ff780 block: Introduce op_blockers to BlockDriverState
BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX
elements. Each list is a list of blockers of an operation type
(BlockOpType), that marks this BDS as currently blocked for a certain
type of operation with reason errors stored in the list. The rule of
usage is:

 * BDS user who wants to take an operation should check if there's any
   blocker of the type with bdrv_op_is_blocked().

 * BDS user who wants to block certain types of operation, should call
   bdrv_op_block (or bdrv_op_block_all to block all types of operations,
   which is similar to the existing bdrv_set_in_use()).

 * A blocker is only referenced by op_blockers, so the lifecycle is
   managed by caller, and shouldn't be lost until unblock, so typically
   a caller does these:

   - Allocate a blocker with error_setg or similar, call bdrv_op_block()
     to block some operations.
   - Hold the blocker, do his job.
   - Unblock operations that it blocked, with the same reason pointer
     passed to bdrv_op_unblock().
   - Release the blocker with error_free().

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Peter Lieven
465bee1da8 block: optimize zero writes with bdrv_write_zeroes
this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.

I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.

a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

QCOW2         [off]     [on]     [unmap]
-----
runtime:       14secs    1.1secs  1.1secs
filesize:      937M      18M      18M

iSCSI         [off]     [on]     [unmap]
----
runtime:       9.3s      0.9s     0.9s

b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct

QCOW2         [off]     [on]     [unmap]
-----
runtime:       246secs   18secs   18secs
filesize:      51G       192K     192K
throughput:    203M/s    2.3G/s   2.3G/s

iSCSI*        [off]     [on]     [unmap]
----
runtime:       8mins     45secs   33secs
throughput:    106M/s    1.2G/s   1.6G/s
allocated:     100%      100%     0%

* The storage was connected via an 1Gbit interface.
  It seems to internally handle writing zeroes
  via WRITESAME16 very fast.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-19 13:42:27 +02:00
Max Reitz
4993f7ea7e block: Allow JSON filenames
If the filename given to bdrv_open() is prefixed with "json:", parse the
rest as a JSON object and merge the result into the options QDict. If
there are conflicts, the options QDict takes precedence.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-19 11:36:49 +02:00
Kevin Wolf
e88ae2264d block: Fix bdrv_is_allocated() for short backing files
bdrv_is_allocated() shouldn't return true for sectors that are
unallocated, but after the end of a short backing file, even though
such sectors are (correctly) marked as containing zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-05-19 11:36:48 +02:00
Kevin Wolf
b1e6fc0817 block: Fix open flags with BDRV_O_SNAPSHOT
The immediately visible effect of this patch is that it fixes committing
a temporary snapshot to its backing file. Previously, it would fail with
a "permission denied" error because bdrv_inherited_flags() forced the
backing file to be read-only, ignoring the r/w reopen of bdrv_commit().

The bigger problem this revealed is that the original open flags must
actually only be applied to the temporary snapshot, and the original
image file must be treated as a backing file of the temporary snapshot
and get the right flags for that.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 20:57:31 +02:00
Kevin Wolf
f1f25a2e2e block: Fix open_flags in bdrv_reopen()
Use the same function as bdrv_open() for determining what the right
flags for bs->file are. Without doing this, a reopen means that
bs->file loses BDRV_O_CACHE_WB or BDRV_O_UNMAP if bs doesn't have it as
well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
7e3d98dd31 Revert "block: another bdrv_append fix"
This reverts commit 3a389e7926. The commit
was wrong and what it tried to fix just works today without any change.

What the commit tried to fix:

    When creating live snapshots, the new image file is opened with
    BDRV_O_NO_BACKING because the whole backing chain is already opened.
    It is then appended to the chain using bdrv_append(). The result of
    this was that the image had a backing file, but BDRV_O_NO_BACKING
    was still set. This is obviously inconsistent.

    There used to be some places in qemu that closed and image and then
    opened it again, with its old flags (a bdrv_open()/close() sequence
    involves reopening the whole backing file chain, too). In this case
    the BDRV_O_NO_BACKING flag meant that the backing chain wasn't
    reopened and only the top layer was left.

    (Most, but not all of these places are replaced by bdrv_reopen()
    today, which doesn't touch the backing files at all.)

    Other places that looked at bs->open_flags weren't interested in
    BDRV_O_NO_BACKING, so no breakage there.

What it actually did:

    The commit moved the BDRV_O_NO_BACKING away to the backing file.
    Because the bdrv_open()/close() sequences only looked at the flags
    of the top level BlockDriverState and used it for the whole chain,
    the flag didn't hurt there any more. Obviously, it is still
    inconsistent because the backing file may have another backing file,
    but without practical impact.

    At the same time, it swapped all other flags. This is practically
    irrelevant as long as live snapshots only allow opening the new
    layer with the same flags as the old top layer. It still doesn't
    make any sense, and it is a time bomb that explodes as soon as the
    flags can differ.

    bdrv_append_temp_snapshot() is such a case: It adds the new flag
    BDRV_O_TEMPORARY for the temporary snapshot. The swapping of commit
    3a389e79 results in the following nonsensical configuration:

    bs->open_flags:                     BDRV_O_TEMPORARY cleared
    bs->file->open_flags:               BDRV_O_TEMPORARY set
    bs->backing_hd->open_flags:         BDRV_O_TEMPORARY set
    bs->backing_hd->file->open_flags:   BDRV_O_TEMPORARY cleared

    We're still lucky because the format layer ignores the flag and the
    protocol layer happens to get the right value, but sooner or later
    this is bound to go wrong...

What the right fix would have been:

    Simply clear the BDRV_O_NO_BACKING flag when the BlockDriverState is
    appended to an existing backing file chain, because now it does have
    a backing file.

    Commit 4ddc07ca already implemented this silently in bdrv_append(),
    so we don't have to come up with a new fix.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
8bfea15dda block: Unlink temporary files in raw-posix/win32
Instead of having unlink() calls in the generic block layer, where we
aren't even guarateed to have a file name, move them to those block
drivers that are actually used and that always have a filename. Gets us
rid of some #ifdefs as well.

The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open
flag so that it is inherited in the protocol layer and the raw-posix and
raw-win32 drivers can unlink the file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
5669b44de5 block: Remove BDRV_O_COPY_ON_READ for bs->file
Copy on Read makes sense on the format level where backing files are
implemented, but it's not required on the protocol level. While it
shouldn't actively break anything to have COR enabled on both layers,
needless serialisation and allocation checks may impact performance.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
317fc44ef2 block: Create bdrv_backing_flags()
Instead of manipulation flags inline, move the derivation of the flags
of a backing file into a new function next to the existing functions
that derive flags for bs->file and for the block driver open function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
0b50cc8853 block: Create bdrv_inherited_flags()
Instead of having bdrv_open_flags() as a function that creates flags for
several unrelated places and then adding open-coded flags on top, create
a new function that derives the flags for bs->file from the flags for bs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Jeff Cody
e855e4fb7b block: Ignore duplicate or NULL format_name in bdrv_iterate_format
Some block drivers have multiple BlockDriver instances with identical
format_name fields (e.g. gluster, nbd).

Both qemu-img and qemu will use bdrv_iterate_format() to list the
supported formats when a help option is invoked.  As protocols and
formats may register multiple drivers, redundant listings of formats
occur (e.g., "Supported formats: ... gluster gluster gluster gluster ...
").

Since the list of driver formats will be small, this performs a simple
linear search on format_name, and ignores any duplicates.

The end result change is that the iterator will no longer receive
duplicate string names, nor will it receive NULL pointers.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-29 11:58:07 +02:00
Markus Armbruster
0fb6395c0c Use error_is_set() only when necessary (again)
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out.  Unnecessarily hard for
optimizers, static checkers, and human readers.  Commit 84d18f0 dumbed
it down to obvious, but a few more have crept in since, and
documentation was overlooked.  Dumb these down, too.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-25 18:05:06 +02:00
Benoît Canet
1ba4b6a553 block: Prevent coroutine stack overflow when recursing in bdrv_open_backing_file.
In 1.7.1 qcow2_create2 reopen the file for flushing without the BDRV_O_NO_BACKING
flags.

As a consequence the code would recursively open the whole backing chain.

These three stack arrays would pile up through the recursion and lead to a coroutine
stack overflow.

Convert these array to malloced buffers in order to streamline the coroutine
footprint.

Symptoms where freezes or segfaults on production machines while taking QMP externals
snapshots. The overflow disturbed coroutine switching.

[Resolved conflicts on qemu.git/master since the patch was against v1.7.1
--Stefan]

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-25 18:05:05 +02:00
Kevin Wolf
f2d953ec31 block: Catch duplicate IDs in bdrv_new()
Since commit f298d071, block devices added with blockdev-add don't have
a QemuOpts around in dinfo->opts. Consequently, we can't rely any more
on QemuOpts catching duplicate IDs for block devices.

This patch adds a new check for duplicate IDs to bdrv_new(), and moves
the existing check that the ID isn't already taken for a node-name there
as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:28 +02:00
Kevin Wolf
98522f63f4 block: Add errp to bdrv_new()
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.

Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:20 +02:00
Kevin Wolf
636ea3708c block: Remove -errno return value from bdrv_assign_node_name
It takes an errp argument. That's enough for error handling.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Fam Zheng
b8afb520e4 block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
9ce10c0bdc block: Check bdrv_getlength() return value in bdrv_make_zero()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
da15ee5134 block: Catch integer overflow in bdrv_rw_co()
Insanely large requests could cause an integer overflow in
bdrv_rw_co() while converting sectors to bytes. This patch catches the
problem and returns an error (if we hadn't overflown the integer here,
bdrv_check_byte_request() would have rejected the request, so we're not
breaking anything that was supposed to work before).

We actually do have a test case that triggers behaviour where we
accidentally let such a request pass, so that it would return success,
but read 0 bytes instead of the requested 4 GB. It fails now like it
should.

If the vdi block driver wants to be able to deal with huge images, it
can't read the whole block bitmap at once into memory like it does
today, but needs to use a metadata cache like qcow2 does.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
1dd3a44753 block: Limit size to INT_MAX in bdrv_check_byte_request()
Commit 8f4754ed intended to protect against integer overflow bugs in
block drivers by making sure that a single request that is passed to
drivers is no longer than INT_MAX bytes.

However, meanwhile there are some callers that don't use that code path
any more but call bdrv_check_byte_request() directy, so let's add a
check there as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
54db38a479 block: Fix nb_sectors check in bdrv_check_byte_request()
nb_sectors is signed, check for negative values.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
f187743acd block: Check bdrv_getlength() return value in bdrv_append_temp_snapshot()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:52 +02:00
Kevin Wolf
b998875dcf block: Fix snapshot=on for protocol parsed from filename
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.

This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.

As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:51 +02:00
Kevin Wolf
e3fa4bfa72 block: Don't parse 'filename' option
When using the QDict option 'filename', it is supposed to be interpreted
literally. The code did correctly avoid guessing the protocol from any
string before the first colon, but it still called bdrv_parse_filename()
which would, for example, incorrectly remove a 'file:' prefix in the
raw-posix driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 17:10:25 +02:00
Kevin Wolf
8f4754ede5 block: Limit request size (CVE-2014-0143)
Limiting the size of a single request to INT_MAX not only fixes a
direct integer overflow in bdrv_check_request() (which would only
trigger bad behaviour with ridiculously huge images, as in close to
2^64 bytes), but can also prevent overflows in all block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:35 +02:00
Kevin Wolf
5a8a30db47 block: Add error handling to bdrv_invalidate_cache()
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-19 09:39:41 +01:00
Markus Armbruster
c3adb58fe0 blockdev: Refuse to open encrypted image unless paused
Opening an encrypted image takes an additional step: setting the key.
Between open and the key set, the image must not be used.

We have some protection against accidental use in place: you can't
unpause a guest while we're missing keys.  You can, however, hot-plug
block devices lacking keys into a running guest just fine, or insert
media lacking keys.  In the latter case, notifying the guest of the
insert is delayed until the key is set, which may suffice to protect
at least some guests in common usage.

This patch makes the protection apply in more cases, in a rather
heavy-handed way: it doesn't let you open encrypted images unless
we're in a paused state.

It doesn't extend the protection to users other than the guest (block
jobs?).  Use of runstate_check() from block.c is disgusting.  Best I
can do right now.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-14 16:24:42 +01:00
Max Reitz
9562f69cfd block: Unlink temporary file
If the image file cannot be opened and was created as a temporary file,
it should be deleted; thus, in this case, we should jump to the
"unlink_and_fail" label and not just to "fail".

Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:42:24 +01:00
Benoît Canet
b5042a3622 block: Rewrite the snapshot authorization mechanism for block filters.
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.

They will need to do two things:

-Set the is_filter field of their block driver to true.

-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)

[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Max Reitz
938789ea92 block: bs->drv may be NULL in bdrv_debug_resume()
Currently, bdrv_debug_resume() requires every bs->drv in the BDS stack
to be NULL until a bs->drv with an implementation of bdrv_debug_resume()
is found. For a normal function, this would be fine, but this is a
function for debugging purposes and should therefore allow intermediate
BDS not to have a driver (i.e., be "ejected"). Otherwise, it is hard to
debug such situations.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Kevin Wolf
3456a8d185 block: Update image size in bdrv_invalidate_cache()
After migration has completed, we call bdrv_invalidate_cache() so that
drivers which cache some data drop their stale copy of the data and
reread it from the image file to get a new version of data that the
source modified while the migration was running.

Reloading metadata from the image file is useless, though, if the size
of the image file stays stale (this is a value that is cached for all
image formats in block.c). Reads from (meta)data after the old EOF
return only zeroes, causing image corruption.

We need to update bs->total_sectors in all layers that could potentially
have changed their size (i.e. backing files are not a concern - if they
are changed, we're in bigger trouble)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Kevin Wolf
eb909c7f72 block: Fix error path segfault in bdrv_open()
Using an invalid option for a block device that is opened with
BDRV_O_PROTOCOL led to drv = NULL, and when trying to include the driver
name in the error message, qemu dereferenced it:

    $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
    Segmentation fault (core dumped)

With this patch applied, the expected error message is printed:

    $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
    qemu-system-x86_64: -drive file=/tmp/test.qcow2,file.foo=bar: could
    not open disk image /tmp/test.qcow2: Block protocol 'file' doesn't
    support the option 'foo'

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-06 17:29:24 +01:00
Max Reitz
cd5d031e75 block: Keep "filename" option after parsing
Currently, bdrv_file_open() always removes the "filename" option from
the options QDict after bdrv_parse_filename() has been (successfully)
called. However, for drivers with bdrv_needs_filename, it makes more
sense for bdrv_parse_filename() to overwrite the "filename" option and
for bdrv_file_open() to fetch the filename from there.

Since there currently are no drivers that implement
bdrv_parse_filename() and have bdrv_needs_filename set, this does not
change current behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-06 16:18:01 +01:00
Benoît Canet
90ce8a061b block: make bdrv_swap rebuild the bs graph node list field.
Moving only the node_name one field could lead to some inconsitencies where a
node_name was defined on a bs which was not registered in the graph node list.

bdrv_swap between a named node bs and a non named node bs would lead to this.

bdrv_make_anon would then crash because it would try to remove the bs from the
graph node list while it is not in it.

This patch remove named node bses from the graph node list before doing the swap
then insert them back.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-06 11:33:10 +01:00
Kevin Wolf
47ea2de2d6 block: Fix bs->request_alignment assertion for bs->sg=1
For sg backends, bs->request_alignment is meaningless and may be 0.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-05 16:58:37 +01:00
Amit Shah
69bef7931e block: use /var/tmp instead of /tmp for -snapshot
If TMPDIR is not specified, the default was to use /tmp for the working
copy of the block devices.  Update this to /var/tmp instead, so systems
using tmp-on-tmpfs don't end up inadvertently using RAM for the block
device.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-28 18:59:07 +01:00
Max Reitz
f7d9fd8c72 block: Remove bdrv_open_image()'s force_raw option
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
5acd9d81e1 block: Reuse success path from bdrv_open()
The fail and success paths of bdrv_file_open() may be further shortened
by reusing code already existent in bdrv_open(). This includes
bdrv_file_open() not taking the reference to options which allows the
removal of QDECREF(options) in that function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
5469a2a688 block: Handle bs->options in bdrv_open() only
The fail paths of bdrv_file_open() and bdrv_open() naturally exhibit
similarities, thus it is possible to reuse the one from bdrv_open() and
shorten the one in bdrv_file_open() accordingly.

Also, setting bs->options in bdrv_file_open() is not necessary if it is
already done in bdrv_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
d4446eae63 block: Remove bdrv_new() from bdrv_file_open()
Change bdrv_file_open() to take a simple pointer to an already existing
BDS instead of an indirect one. The BDS will be created in bdrv_open()
if necessary.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
5d12aa63c7 block: Reuse reference handling from bdrv_open()
Remove the reference parameter and the related handling code from
bdrv_file_open(), since it exists in bdrv_open() now as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
2e40134bfd block: Make bdrv_file_open() static
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.

Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
ddf5636dc9 block: Add reference parameter to bdrv_open()
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
f67503e5bd block: Change BDS parameter of bdrv_open() to **
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:21 +01:00
Kevin Wolf
e6dc8a1f83 block: Fix bdrv_is_first_non_filter()
Consider top level BlockDriverStates as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Tested-by: Benoit Canet <benoit@irqsave.net>
2014-02-21 21:02:21 +01:00
Peter Maydell
4c0c9bbe78 Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
* remotes/qmp-unstable/queue/qmp:
  monitor: Add object_add class argument completion.
  monitor: Add object_del id argument completion.
  monitor: Add device_add device argument completion.
  monitor: Add device_del id argument completion.
  qmp: expose list of supported character device backends
  Use error_is_set() only when necessary
  QMP: allow JSON dict arguments in qmp-shell
  hmp: migrate command (without -d) now blocks correctly

Conflicts:
	blockdev.c

[PMM: resolved trivial conflict in blockdev.c]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-02-20 12:10:23 +00:00
Markus Armbruster
84d18f065f Use error_is_set() only when necessary
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out.  Unnecessarily hard for
optimizers, static checkers, and human readers.  Dumb it down to
obvious.

Gets rid of several dozen Coverity false positives.

Note that the obvious form is already used in many places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-02-17 11:57:23 -05:00
Benoît Canet
0c5e94ee83 block: Open by reference will try device then node_name.
Since we introduced node_name for named bs of the graph modify the opening by
reference to use it as a fallback.

This patch also enforce the separation of the device id and graph node
namespaces.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-14 18:05:39 +01:00
Benoît Canet
dd67fa5052 block: Relax bdrv_lookup_bs constraints.
The following patch will reuse bdrv_lookup_bs in order to open images by
references so the rules of usage of bdrv_lookup_bs must be relaxed a bit.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-14 18:05:39 +01:00
Kevin Wolf
e96126ffa5 block: Fix 32 bit truncation in mark_request_serialising()
On 32 bit hosts, size_t is too small for align as the bitmask
~(align - 1) will zero out the higher 32 bits of the offset.

While at it, change the local overlap_bytes variable to unsigned to
match the field in BdrvTrackedRequest.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf
5f5bcd80f8 block: Don't call ROUND_UP with negative values
The behaviour of the ROUND_UP macro with negative numbers isn't obvious.
It happens to do the right thing in this please, but better avoid it.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf
af91f9a73c block: bdrv_aligned_pwritev: Assert overlap range
This adds assertions that the request that we actually end up passing to
the block driver (which includes RMW data and has therefore potentially
been rounded to alignment boundaries) is fully covered by the
overlap_{offset,size} fields of the associated BdrvTrackedRequest.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf
99c4a85ce6 block: Fix memory leaks in bdrv_co_do_pwritev()
The error path for a failure in one of the two bdrv_aligned_preadv()
calls leaked head_buf or tail_buf, respectively. This fixes the memory
leak.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf
765003db02 block: Fail gracefully with missing filename
This fixes a regression introduced in commit 2a05cbe42 ('block: Allow
block devices without files'):

$ qemu-system-x86_64 -drive driver=file
qemu-system-x86_64: block.c:892: bdrv_open_common: Assertion
`!drv->bdrv_needs_filename || filename != ((void *)0)' failed.

Now the respective check must be performed not only in bdrv_file_open(),
but also in bdrv_open().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-09 09:12:38 +01:00
Kevin Wolf
d5103588aa block: Switch bdrv_io_limits_intercept() to byte granularity
Request sizes used to be rounded down to the next sector boundary,
allowing to bypass the I/O limit. Now all requests are accounted for
with their exact byte size.

Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:28 +01:00
Kevin Wolf
9e1cb96d9a qemu-iotests: Test pwritev RMW logic
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:25 +01:00
Kevin Wolf
8407d5d7e2 block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
a3ef657185 block: Make bdrv_pread() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
775aa8b6e0 block: Change coroutine wrapper to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
28de2dcd88 block: Assert serialisation assumptions in pwritev
If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.

However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.

This is an important assumption that may not be broken or data
corruption will happen. Document it with some assertions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
3b8242e0ea block: Align requests in bdrv_co_do_pwritev()
This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
6460440f34 block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
7327145f63 block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
2dbafdc012 block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
ec746e10cb block: Make zero-after-EOF work with larger alignment
Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
65afd211c7 block: Allow waiting for overlapping requests between begin/end
Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
793ed47a7a block: Switch BdrvTrackedRequest to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
6601553e27 block: Introduce bdrv_co_do_pwritev()
This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
244eadef5c block: write: Handle COR dependency after I/O throttling
First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
b404f72036 block: Introduce bdrv_aligned_pwritev()
This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
1b0288ae7f block: Introduce bdrv_co_do_preadv()
Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
d0c7f642f5 block: Introduce bdrv_aligned_preadv()
This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
c25f53b06e raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
1b7fd72955 block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
339064d506 block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
1ff735bdc4 block: Detect unaligned length in bdrv_qiov_is_aligned()
For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
355ef4ac95 block: Update BlockLimits when they might have changed
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
466ad822de block: Inherit opt_transfer_length
When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
d34682cd4a block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
dabfa6cc2e block: Fix bdrv_commit return value
bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.

Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.

Also clean up the handling of I/O errors to return the real error code
instead of -EIO.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 16:53:51 +01:00
Jeff Cody
72706ea4cd block: resize backing file image during offline commit, if necessary
Currently, if an image file is logically larger than its backing file,
committing it via 'qemu-img commit' will fail.

For instance, if we have a base image with a virtual size 10G, and a
snapshot image of size 20G, then committing the snapshot offline with
'qemu-img commit' will likely fail.

This will automatically attempt to resize the base image, if the
snapshot image to be committed is larger.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:12:49 +01:00
Benoît Canet
212a5a8f09 block: Create authorizations mechanism for external snapshot and resize.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
12d3ba821d qmp: Allow to change password on named block driver states.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>

There was two candidate ways to implement named node manipulation:

1)
{ 'command': 'block_passwd', 'data': {'*device': 'str',
                                      '*node-name': 'str', 'password': 'str'}
}

2)

{ 'command': 'block_passwd', 'data': {'device': 'str',
                                      '*device-is-node': 'bool',
                                      'password': 'str'} }

Luiz proposed 1 and says 2 was an abuse of the QMP interface and proposed to
rewrite the QMP block interface for 2.0.

Luiz does not like in 1 the fact that 2 fields are optional but one of them must
be specified leading to an abuse of the QMP semantic.

Kevin argumented that 2 what a clear abuse of the device field and would not be
practical when reading fast some log file because the user would read "device"
and think that a device is manipulated when it's in fact a node name.
Documentation of 1 make it pretty clear what to do for the user.

Kevin argued that all bs are node including devices ones so 2 does not make
sense.

Kevin also argued that rewriting the QMP block interface would not make disapear
the current one.

Kevin pushed the argument that making the QAPI generator compatible with the
semantic of the operation would need a rewrite that no one has done yet.

A vote has been done on the list to elect the version to use and 1 won.

For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
c13163fba1 qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet
6913c0c2ce block: Allow the user to define "node-name" option both on command line and QMP.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:06:47 +01:00
Benoît Canet
dc364f4cdc block: Add bs->node_name to hold the name of a bs node of the bs graph.
Add the minimum of code to prepare for the following patches.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 14:33:01 +01:00
Peter Feiner
d80ac658f2 block: fix backing file segfault
When a backing file is opened such that (1) a protocol is directly
used as the block driver and (2) the block driver has bdrv_file_open,
bdrv_open_backing_file segfaults. The problem arises because
bdrv_open_common returns without setting bd->backing_hd->file.

To effect (1), you seem to have to use the -F flag in qemu-img. There
are several block drivers that satisfy (2), such as "file" and "nbd".
Here are some concrete examples:

    #!/bin/bash

    echo Test file format
    ./qemu-img create -f file base.file 1m
    ./qemu-img create -f qcow2 -F file -o backing_file=base.file\
        file-overlay.qcow2
    ./qemu-img convert -O raw file-overlay.qcow2 file-convert.raw

    echo Test nbd format
    SOCK=$PWD/nbd.sock
    ./qemu-img create -f raw base.raw 1m
    ./qemu-nbd -t -k $SOCK base.raw &
    trap "kill $!" EXIT
    while ! test -e $SOCK; do sleep 1; done
    ./qemu-img create -f qcow2 -F nbd -o backing_file=nbd:unix:$SOCK\
        nbd-overlay.qcow2
    ./qemu-img convert -O raw nbd-overlay.qcow2 nbd-convert.raw

Without this patch, the two qemu-img convert commands segfault.

This is a regression that was introduced in v1.7 by
dbecebddfa.

Signed-off-by: Peter Feiner <peter@gridcentric.ca>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 13:47:52 +01:00
Max Reitz
505d758334 block: Allow recursive "file"s
It should be possible to use a format as a driver for a file which in
turn requires another file, i.e., nesting file formats.

Allowing nested file formats results in e.g. qcow2 BlockDriverStates
never being directly passed to bdrv_open_common() from bdrv_file_open(),
but instead being handed through bdrv_open(). This changes the error
message when trying to give a filename to qcow2, i.e. trying to use it
as a driver for the protocol level. Therefore, change the reference
output of I/O test 051 accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz
054963f8f0 block: Use bdrv_open_image() in bdrv_open()
Using bdrv_open_image() instead of bdrv_file_open() directly in
bdrv_open() is easier.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz
da557aac18 block: Add bdrv_open_image()
Add a common function for opening images to be used for block drivers
specified through BlockdevRefs in an option QDict. The difference from
bdrv_file_open() is that this function may invoke bdrv_open() instead,
allowing auto-detection of the driver to be used; and second, it
automatically extracts the BlockdevRef from the option QDict.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz
2a05cbe426 block: Allow block devices without files
blkdebug and blkverify will, in order to retain compatibility, not
support the field "file" implicitly through bdrv_open(). In order to be
able to use those drivers without giving a filename anyway, it is
necessary to be able to have block devices without files implicitly
opened by bdrv_open(). This is the case, if there was neither a file
name, a reference to an existing block device to use as a file nor
options specific to the file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Max Reitz
2258e3fe20 block: Pass reference to bdrv_file_open()
With that now being possible, bdrv_open() should try to extract a block
device reference from the options and pass it to bdrv_file_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Max Reitz
72daa72eee block: Allow reference for bdrv_file_open()
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Peter Lieven
3d94ce60ae block: expect get_block_status errors in bdrv_make_zero
during testing around with 4k LUNs a bad target implementation
triggert an -EIO in iscsi_get_block_status, but it got never caught
resulting in an infinite loop.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-12-13 14:49:50 +01:00
Stefan Hajnoczi
0b06ef3bdd block: clean up bdrv_drain_all() throttling comments
Since cc0681c454 ("block: Enable the new
throttling code in the block layer.") bdrv_drain_all() no longer spins.
The code used to look as follows:

  do {
      busy = qemu_aio_wait();

      /* FIXME: We do not have timer support here, so this is effectively
       * a busy wait.
       */
      QTAILQ_FOREACH(bs, &bdrv_states, list) {
          while (qemu_co_enter_next(&bs->throttled_reqs)) {
              busy = true;
          }
      }
  } while (busy);

Note that throttle requests are kicked but I/O throttling limits are
still in effect.  The loop spins until the vm_clock time allows the
request to make progress and complete.

The new throttling code introduced bdrv_start_throttled_reqs().  This
function not only kicks throttled requests but also temporarily disables
throttling so requests can run.

The outdated FIXME comment can be removed.  Also drop the busy = true
assignment since we overwrite it immediately afterwards.

Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-06 16:53:51 +01:00
Max Reitz
66f6b8143b block: Close backing file early in bdrv_img_create
Leaving the backing file open although it is not needed anymore can
cause problems if it is opened through a block driver which allows
exclusive access only and if the create function of the block driver
used for the top image (the one being created) tries to close and reopen
the image file (which will include opening the backing file a second
time).

In particular, this will happen with a backing file opened through
qemu-nbd and using qcow2 as the top image file format (which reopens the
image to flush it to disk).

In addition, the BlockDriverState in bdrv_img_create() is used for the
backing file only; it should therefore be made local to the respective
block.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-04 11:29:19 +01:00
Paolo Bonzini
b8d71c09f3 block: make bdrv_co_do_write_zeroes stricter in producing aligned requests
Right now, bdrv_co_do_write_zeroes will only try to align the
beginning of the request.  However, it is simpler for many
formats to expect the block layer to separate both the head *and*
the tail.  This makes sure that the format's bdrv_co_write_zeroes
function will be called with aligned sector_num and nb_sectors for
the bulk of the request.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
7ce21016b6 block: handle ENOTSUP from discard in generic code
Similar to write_zeroes, let the generic code receive a ENOTSUP for
discard operations.  Since bdrv_discard has advisory semantics,
we can just swallow the error.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
d5ef94d43d block: add bdrv_aio_write_zeroes
This will be used by the SCSI layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
94d6ff21f4 block: add flags argument to bdrv_co_write_zeroes tracepoint
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini
d20d9b7c67 block: add flags to BlockRequest
This lets bdrv_co_do_rw receive flags, so that it can be used for
zero writes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:48 +01:00
Paolo Bonzini
d51e9fe505 block: generalize BlockLimits handling to cover bdrv_aio_discard too
bdrv_co_discard is only covering drivers which have a .bdrv_co_discard()
implementation, but not those with .bdrv_aio_discard(). Not very nice,
and easy to avoid.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:48 +01:00
Kevin Wolf
c9fbb99d41 block: Use BDRV_O_NO_BACKING where appropriate
If you open an image temporarily just because you want to check its size
or get it flushed, there's no real reason to open the whole backing file
chain.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2013-11-29 17:41:09 +01:00
Kevin Wolf
9fd3171af9 block: Enable BDRV_O_SNAPSHOT with driver-specific options
In the case of snapshot=on, don't rely on the backing file path in the
temporary image any more, but override the backing file with the given
set of options. This way, block drivers that don't use a file name can
be accessed with snapshot=on, for example:

    -drive file.driver=nbd,file.host=localhost,snapshot=on

Which becomes internally something like:

    file.filename=/tmp/vl.AWQZCu,backing.file.driver=nbd,backing.file.host=localhost

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:37 +01:00
Fam Zheng
4cc70e9337 blkdebug: add "remove_break" command
This adds "remove_break" command which is the reverse of blkdebug
command "break": it removes all breakpoints with given tag and resumes
all the requests.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29 13:40:37 +01:00
Fam Zheng
21b5683508 qapi: Change BlockDirtyInfo to list
We have multiple dirty bitmaps in BDS now, switch QAPI to allow query
it (BlockInfo.dirty_bitmaps), and also drop old BlockInfo.dirty.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:36 +01:00
Fam Zheng
e4654d2d94 block: per caller dirty bitmap
Previously a BlockDriverState has only one dirty bitmap, so only one
caller (e.g. a block job) can keep track of writing. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the
lifecycle is managed with these new functions:

    bdrv_create_dirty_bitmap
    bdrv_release_dirty_bitmap

Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap.

In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:

    bdrv_get_dirty
    bdrv_dirty_iter_init
    bdrv_get_dirty_count

bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
internally walk the list of all dirty bitmaps and set them one by one.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:33 +01:00
Peter Lieven
c3d8688470 block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks
this patch does 2 things:
a) only do additional call outs if BDRV_BLOCK_ZERO is not already set.
b) use the newly introduced bdrv_unallocated_blocks_are_zero()
   to return the zero state of an unallocated block. the used callout
   to bdrv_has_zero_init() is only valid right after bdrv_create.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:52 +01:00
Peter Lieven
d75cbb5e68 block: introduce bdrv_make_zero
this patch adds a call to completely zero out a block device.
the operation is sped up by checking the block status and
only writing zeroes to the device if they currently do not
return zeroes. optionally the zero writing can be sped up
by setting the flag BDRV_REQ_MAY_UNMAP to emulate the zero
write by unmapping if the driver supports it.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:52 +01:00
Peter Lieven
6f14da5247 block: honour BlockLimits in bdrv_co_discard
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
c31cb70728 block: honour BlockLimits in bdrv_co_do_write_zeroes
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
4ce786914b block: add wrappers for logical block provisioning information
This adds 2 wrappers to read the unallocated_blocks_are_zero and
can_write_zeroes_with_unmap info from the BDI. The wrappers are
required to check for the existence of a backing_hd and
if the devices are opened with the correct flags.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
d32f35cbc5 block: introduce BDRV_REQ_MAY_UNMAP request flag
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
aa7bfbfff7 block: add flags to bdrv_*_write_zeroes
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven
6faac15fa8 block: make BdrvRequestFlags public
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Kevin Wolf
06d22aa367 block: Fail if requested driver is not available
If an explicit driver option is present, but doesn't specify a valid
driver, then bdrv_open() should fail instead of probing the format.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-15 13:37:48 +01:00
Fam Zheng
b04b6b6ec3 block: Print its file name if backing file opening failed
If backing file doesn't exist, the error message is confusing and
misleading:

    $ qemu /tmp/a.qcow2
    qemu: could not open disk image /tmp/a.qcow2: Could not open file: No
    such file or directory

But...

    $ ls /tmp/a.qcow2
    /tmp/a.qcow2

    $ qemu-img info /tmp/a.qcow2
    image: /tmp/a.qcow2
    file format: qcow2
    virtual size: 8.0G (8589934592 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: /tmp/b.qcow2

Because...

    $ ls /tmp/b.qcow2
    ls: cannot access /tmp/b.qcow2: No such file or directory

This is not intuitive. It's better to have the missing file's name in
the error message. With this patch:

    $ qemu-io -c 'read 0 512' /tmp/a.qcow2
    qemu-io: can't open device /tmp/a.qcow2: Could not open backing
    file: Could not open '/stor/vm/arch.raw': No such file or directory
    no file open, try 'help open'

Which is a little bit better.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-14 13:09:06 +01:00
Fam Zheng
7e382003f1 block: Round up total_sectors
Since b94a2610, bdrv_getlength() is omitted when probing image. VMDK
monolithicFlat is broken by that because a file < 512 bytes can't be
read with its total_sectors truncated to 0. This patch round up the size
to BDRV_SECTOR_SIZE, when a image size is not sector aligned.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-08 10:44:24 +01:00
Max Reitz
17826bc159 block: Save errno before error_setg_errno
error_setg_errno() may overwrite errno; therefore, its value should be
read before calling that function and not afterwards.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00
Kevin Wolf
b94a261057 block: Avoid unecessary drv->bdrv_getlength() calls
The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.

This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.

It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.

This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-29 13:10:26 +01:00
Thibaut LAURENT
87a5debd31 block: Disable BDRV_O_COPY_ON_READ for the backing file
Since commit 0ebd24e0a2,
bdrv_open_common will throw an error when trying to open a file
read-only with the BDRV_O_COPY_ON_READ flag set.
Although BDRV_O_RDWR is unset for the backing files,
BDRV_O_COPY_ON_READ is still passed on if copy-on-read was requested
for the drive. Let's unset this flag too before opening the backing
file, or bdrv_open_common will fail.

Signed-off-by: Thibaut LAURENT <thibaut.laurent@gmail.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-29 13:06:39 +01:00
Max Reitz
61ed268453 block: Don't copy backing file name on error
bdrv_open_backing_file() tries to copy the backing file name using
pstrcpy directly after calling bdrv_open() to open the backing file
without checking whether that was actually successful. If it was not,
ps->backing_hd->file will probably be NULL and qemu will crash.

Fix this by moving pstrcpy after checking whether bdrv_open() succeeded.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-28 17:35:52 +01:00
Kevin Wolf
0ebd24e0a2 blockdev: Don't disable COR automatically with blockdev-add
If a read-only device is configured with copy-on-read=on, the old code
only prints a warning and automatically disables copy on read. Make it
a real error for blockdev-add.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-10-11 16:50:02 +02:00
Kevin Wolf
8f94a6e40e block: Improve driver whitelist checks
The main intent of this patch is to consolidate the whitelist checks to
a single point in the code instead of spreading it everywhere. This adds
a nicer error message for read-only whitelisting, too, in places where
it was still missing.

The patch also contains a bonus bug fix: By finding the format first in
bdrv_open() and then independently checking against the whitelist only
later, we avoid the case that use of a non-whitelisted format results in
probing rather than an error message. Previously, this could happen when
using the driver=... option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2013-10-11 16:50:00 +02:00
Benoît Canet
f6186f49e2 block: Add BlockDriver.bdrv_check_ext_snapshot.
This field is used by blkverify to disable external snapshots creation.
It will also be used by block filters like quorum to disable external
snapshot creation.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Peter Lieven
92bc50a5ad block/get_block_status: avoid redundant callouts on raw devices
if a raw device like an iscsi target or host device is used
the current implementation makes a second call out to get
the block status of bs->file.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz
eae041fe6f block: Add bdrv_get_specific_info
Add a function for retrieving an ImageInfoSpecific object from a block
driver.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Dunrong Huang
d4cea8dfb9 block: use correct filename
The content filename point to may be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()

So it's better to use bs->filename.

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-07 13:23:19 +02:00
Dunrong Huang
2fa9aa59cf block: use correct filename for error report
The content filename point to will be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()

So it's better to use bs->filename.

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:41:50 +02:00
Fam Zheng
d055a1fec3 block: use DIV_ROUND_UP in bdrv_co_do_readv
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-26 14:11:06 +02:00
Benoît Canet
5726d872f3 qdict: Extract qdict_extract_subqdict
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Benoît Canet
030be32184 block: introduce BlockDriver.bdrv_needs_filename to enable some drivers.
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.

The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.

The first exception to this rule will be the quorum driver.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Peter Lieven
1f9db2243c block/get_block_status: avoid segfault if there is no backing_hd
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Peter Lieven
3e0a233d86 block/get_block_status: set *pnum = 0 on error
if the call is invoked through bdrv_is_allocated the caller might
expect *pnum = 0 on error. however, a new implementation of
bdrv_get_block_status might only return a negative exit value on
error while keeping *pnum untouched.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Fam Zheng
dbecebddfa block: fix backing file overriding
Providing backing.file.filename doesn't override backing file as expected:

    $ x86_64-softmmu/qemu-system-x86_64 -drive \
        file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2

    qemu-system-x86_64: -drive \
        file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2: could not
        open disk image /tmp/child.qcow2: Can't specify 'file' and 'filename'
        options at the same time

With

    $ qemu-img info /tmp/child.qcow2
    image: /tmp/child.qcow2
    file format: qcow2
    virtual size: 1.0G (1073741824 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: /tmp/fake.qcow2

This fixes it by calling bdrv_get_full_backing_filename only if
backing.file.filename is not provided. Also save the backing file name
to bs->backing_file so the information is correct with HMP "info block".

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Fam Zheng
bcb9d66e85 block: don't lose data from last incomplete sector
To read the last sector that is not aligned to sector boundary, current
code for growable backends, since commit 893a8f6 "block: Produce zeros
when protocols reading beyond end of file", drops the data and directly
returns zeroes. That is incorrect.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-20 19:27:26 +02:00
Max Reitz
cc84d90ff5 block: Error parameter for create functions
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
34b5d2c68e block: Error parameter for open functions
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
d5124c00d8 bdrv: Use "Error" for creating images
Add an Error ** parameter to BlockDriver.bdrv_create to allow more
specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz
015a1036a7 bdrv: Use "Error" for opening images
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:47 +02:00
Max Reitz
6f176b48f9 block: Image file option amendment
This patch adds the "amend" option to qemu-img which allows changing
image options on existing image files. It also adds the generic bdrv
implementation which is basically just a wrapper for the image format
specific function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Paolo Bonzini
5daa74a6eb block: look for zero blocks in bs->file
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
918e92d71b block: add default get_block_status implementation for protocols
Protocols return raw data, so you can assume the offsets to pass
through unchanged.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
f0ad5712d5 block: return BDRV_BLOCK_ZERO past end of backing file
If the sectors are unallocated and we are past the end of the
backing file, they will read as zero.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
415b5b013c block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
Alternatively, this could use a "discard zeroes data" flag returned
by bdrv_get_info.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
4333bb7140 block: define get_block_status return value
Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-8
are left for future extensions.

The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior.  Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
b6b8a33354 block: introduce bdrv_get_block_status API
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.

This also touches all block drivers with a mostly mechanical rename.  The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
11212d8fa0 block: make bdrv_has_zero_init return false for copy-on-write-images
This helps implementing is_allocated on top of get_block_status.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
d663640c04 block: expect errors from bdrv_co_is_allocated
Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.

Fix the callers to always look for errors.

Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
4f5786376e block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini
617ccb466e block: do not use ->total_sectors in bdrv_co_is_allocated
This is more robust when the device has removable media.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini
bdad13b9de block: make bdrv_co_is_allocated static
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini
df2a6f29a5 block: keep bs->total_sectors up to date even for growable block devices
If a BlockDriverState is growable, after every write we need to
check if bs->total_sectors might have changed.  With this change,
bdrv_getlength does not need anymore a system call.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
4f6fd3491c block: make bdrv_delete() static
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.

This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng
9fcb025146 block: implement reference count for BlockDriverState
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Benoît Canet
cc0681c454 block: Enable the new throttling code in the block layer.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Kevin Wolf
09da4a7292 block: Remove redundant assertion
The failing condition is checked immediately before the assertion, so
keeping the assertion is kind of redundant.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Alex Bligh
bc72ad6754 aio / timers: Switch entire codebase to the new timer API
This is an autogenerated patch using scripts/switch-timer-api.

Switch the entire code base to using the new timer API.

Note this patch may introduce some line length issues.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:14:24 +02:00
MORITA Kazutaka
893a8f6220 block: Produce zeros when protocols reading beyond end of file
While Asias is debugging an issue creating qcow2 images on top of
non-file protocols.  It boils down to this example using NBD:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'

Notice the open -g option to set bs->growable.  This means you can
read/write beyond end of file.  Reading beyond end of file is supposed
to produce zeroes.

We rely on this behavior in qcow2_create2() during qcow2 image
creation.  We create a new file and then write the qcow2 header
structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).

Here is the output from the qemu-io NBD example above:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
...

We are not zeroing the buffer!  As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.

[Adapted this patch to use bs->zero_beyond_eof.
-- Stefan]

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:14:56 +02:00
Asias He
0d51b4debe block: Introduce bs->zero_beyond_eof
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).

We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.

[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:10:21 +02:00
Stefan Hajnoczi
88266f5aa7 block: stop relying on io_flush() in bdrv_drain_all()
If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush().  This is used to spin
during synchronous I/O.

Stop relying on .io_flush() and instead check
QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active
requests.

This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O.  Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.

Note this patch moves bs->throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it.  In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.

We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain.  This is necessary since the new bdrv_drain_all() traverses the
backing file chain.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:34 +02:00
Stefan Hajnoczi
e1b5c52e04 block: ensure bdrv_drain_all() works during bdrv_delete()
In bdrv_delete() make sure to call bdrv_make_anon() *after* bdrv_close()
so that the device is still seen by bdrv_drain_all() when iterating
bdrv_states.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:34 +02:00
Benoît Canet
b681a1c73e block: Repair the throttling code.
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-29 17:07:37 +02:00
Kevin Wolf
74fe54f2a1 block: Allow "driver" option on the top level
This is traditionally -drive format=..., which is now translated into
the new driver option. This gives us a more consistent way to select the
driver of BlockDriverStates that can be used in QMP context, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-07-26 21:10:11 +02:00
Peter Lieven
4e7395e84f block: fix bdrv_read_unthrottled()
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:22 +08:00
Peter Lieven
4105eaaab9 block: add bdrv_write_zeroes()
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-19 12:29:21 +08:00
Kevin Wolf
f0f0fdfeec block: Add return value for bdrv_flush_all()
bdrv_flush() can fail, and bdrv_flush_all() should return an error as
well if this happens for a block device. It returns the first error
return now, but still at least tries to flush the remaining devices even
in error cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-15 09:51:27 +02:00
Kevin Wolf
98289620e0 block: Don't parse protocol from file.filename
One of the major reasons for doing something new for -blockdev and
blockdev-add was that the old block layer code parses filenames instead
of just taking them literally. So we should really leave it untouched
when it's passing using the new interfaces (like -drive
file.filename=...).

This allows opening relative file names that contain a colon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-07-15 09:49:00 +02:00
Stefan Hajnoczi
58fda173e1 block: fix bdrv_flush() ordering in bdrv_close()
Since 80ccf93b we flush the block device during close.  The
bdrv_drain_all() call should come before bdrv_flush() to ensure guest
write requests have completed.  Otherwise we may miss pending writes
when flushing.

Call bdrv_drain_all() again for safety as the final step after
bdrv_flush().  This should not be necessary but we can be paranoid here
in case bdrv_flush() left I/O pending.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2013-07-05 10:52:23 +02:00
Peter Lieven
3ac216270a block: change default of .has_zero_init to 0
.has_zero_init defaults to 1 for all formats and protocols.

this is a dangerous default since this means that all
new added drivers need to manually overwrite it to 0 if
they do not ensure that a device is zero initialized
after bdrv_create().

if a driver needs to explicitly set this value to
1 its easier to verify the correctness in the review process.

during review of the existing drivers it turned out
that ssh and gluster had a wrong default of 1.
both protocols support host_devices as backend
which are not by default zero initialized. this
wrong assumption will lead to possible corruption
if qemu-img convert is used to write to such a backend.

vpc and vmdk also defaulted to 1 altough they support
fixed respectively flat extends. this has to be addresses
in separate patches. both formats as well as the mentioned
ssh and gluster are turned to the default of 0 with this
patch for safety.

a similar problem with the wrong default existed for
iscsi most likely because the driver developer did
oversee the default value of 1.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 13:52:35 +02:00
Stefan Hajnoczi
d616b22474 block: add bdrv_add_before_write_notifier()
The bdrv_add_before_write_notifier() function installs a callback that
is invoked before a write request is processed.  This will be used to
implement copy-on-write point-in-time snapshots where we need to copy
out old data before overwriting it.

Note that BdrvTrackedRequest is moved to block_int.h since it is passed
to .notify() functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-28 09:20:26 +02:00
Kevin Wolf
50b05b6f2e block: Always enable discard on the protocol level
Turning on discard options in qcow2 doesn't help a lot when the discard
requests that it issues are thrown away by the raw-posix layer. This
patch always enables discard functionality on the protocol level so that
it's the image format's responsibility to send (or not) discard
requests. Requests sent by the guest will be allowed or ignored by the
top level BlockDriverState, which depends on the discard=... option like
before.

In particular, this means that even without specifying options, the
qcow2 default of discarding deleted snapshots actually takes effect now,
both for qemu and qemu-img.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-24 10:25:17 +02:00
Luiz Capitulino
d8b6895f7a block: bdrv_reopen_prepare(): don't use QERR_OPEN_FILE_FAILED
The call to drv->bdrv_reopen_prepare() can fail due to reasons
other than an open failure. Unfortunately, we can't use errno
nor -ret, cause they are not always set.

Stick to a generic error message then.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2013-06-17 11:01:14 -04:00
Kevin Wolf
bf736fe34c blkdebug: Add BLKDBG_FLUSH_TO_OS/DISK events
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-06-06 11:27:22 +02:00
Wenchao Xia
f364ec65b5 block: move qmp and info dump related code to block/qapi.c
This patch is a pure code move patch, except following modification:
1 get_human_readable_size() is changed to static function.
2 dump_human_image_info() is renamed to bdrv_image_info_dump().
3 in qmp_query_block() and qmp_query_blockstats, use bdrv_next(bs)
instead of direct traverse of global array 'bdrv_states'.
4 collect_snapshots() and collect_image_info() are renamed, unused parameter
*fmt in collect_image_info() is removed.
5 code style fix.

To avoid conflict and tip better, macro in header file is BLOCK_QAPI_H
instead of QAPI_H. Now block.h and snapshot.h are at the same level in
include path, block_int.h and qapi.h will both include them.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-04 13:56:30 +02:00
Wenchao Xia
de08c606f9 block: move snapshot code in block.c to block/snapshot.c
All snapshot related code, except bdrv_snapshot_dump() and
bdrv_is_snapshot(), is moved to block/snapshot.c. bdrv_snapshot_dump()
will be moved to another file later. bdrv_is_snapshot() is not related
with internal snapshot. It also fixes small code style errors reported
by check script.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-04 13:56:30 +02:00
Stefan Hajnoczi
29d782710f block: drop bs_snapshots global variable
The bs_snapshots global variable points to the BlockDriverState which
will be used to save vmstate.  This is really a savevm.c concept but was
moved into block.c:bdrv_snapshots() when it became clear that hotplug
could result in a dangling pointer.

While auditing the block layer's global state I came upon bs_snapshots
and realized that a variable is not necessary here.  Simply find the
first BlockDriverState capable of internal snapshots each time this is
needed.

The behavior of bdrv_snapshots() is preserved across hotplug because new
drives are always appended to the bdrv_states list.  This means that
calling the new find_vmstate_bs() function is idempotent - it returns
the same BlockDriverState unless it was hot-unplugged.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-04 13:56:29 +02:00
Fam Zheng
b64ec4e4ad block: add block driver read only whitelist
We may want to include a driver in the whitelist for read only tasks
such as diagnosing or exporting guest data (with libguestfs as a good
example). This patch introduces a readonly whitelist option, and for
backward compatibility, the old configure option --block-drv-whitelist
is now an alias to rw whitelist.

Drivers in readonly list is only permitted to open file readonly, and
returns -ENOTSUP for RW opening.

E.g. To include vmdk readonly, and others read+write:
    ./configure --target-list=x86_64-softmmu \
                --block-drv-rw-whitelist=qcow2,raw,file,qed \
                --block-drv-ro-whitelist=vmdk

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-06-04 12:11:58 +02:00
Kevin Wolf
f3f4d2c09b block: Add hint to -EFBIG error message
The limit of qcow2 files at least depends on the cluster size. If the
image format has a cluster_size option, suggest to increase it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-05-14 16:44:33 +02:00
Kevin Wolf
456736710d block: Fix build with tracing enabled
filename was still uninitialised when it's used as a parameter to a
tracing function, so let's move the initialisation. Also, commit c2ad1b0c
forgot to add a NULL check, which this patch adds while we're at it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1366645720-11384-1-git-send-email-kwolf@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-04-22 11:31:41 -05:00
Kevin Wolf
1cb6f50644 block: Allow overriding backing.file.filename
If a filename is passed in the driver-specific options from the command
line, the backing file path from the image is ignored now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-04-22 11:37:12 +02:00
Kevin Wolf
56d1b4d21d block: Remove filename parameter from .bdrv_file_open()
It is unused now in all block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-04-22 11:34:35 +02:00
Kevin Wolf
035fccdf79 block: Enable filename option
This allows using the file.filename option instead of the string that
comes from -drive file=... and is passed around as a separate parameter.
The goal is to get rid of this parameter and use the options QDict more
consistently.

With this option you can access not only the top-level image, but
specify a filename for the backing file (currently only if no backing
file exists, but we'll allow overriding it later)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-04-22 10:27:59 +02:00
Kevin Wolf
31ca6d077c block: Add driver-specific options for backing files
Options starting in "backing." are passed to the backing file now. If
you don't need to specify the filename for the backing file, you can add
it on the command line instead of in the image file:

$ qemu-nbd -t /tmp/test.img
$ qemu-img create -f qcow2 empty.qcow2 1G
$ qemu-system-x86_64 -drive file=empty.qcow2,backing.file.driver=nbd,\
    backing.file.host=localhost

Note that this doesn't override the backing filename from the image. If
the image has one, this will fail because NBD doesn't want the options
and a filename at the same time.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-04-22 10:27:59 +02:00
Kevin Wolf
2af5ef70af block: Fail gracefully when using a format driver on protocol level
Specifying the wrong driver could fail an assertion:

$ qemu-system-x86_64 -drive file.driver=qcow2,file=x
qemu-system-x86_64: block.c:721: bdrv_open_common: Assertion `file !=
((void *)0)' failed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-22 10:27:59 +02:00
Kevin Wolf
8d3b1a2d0b block: Introduce bdrv_pwritev() for qcow2_save_vmstate
Directly pass the QEMUIOVector on instead of linearising it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15 08:26:18 +02:00
Kevin Wolf
cf8074b382 block: Introduce bdrv_writev_vmstate
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-04-15 08:26:18 +02:00
Stefan Hajnoczi
0775437faf block: clean up I/O throttling wait_time code
The wait_time variable is in seconds.  Reflect this in a comment and use
NANOSECONDS_PER_SECOND instead of BLOCK_IO_SLICE_TIME * 10 (which
happens to have the right value).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Stefan Hajnoczi
e660fb8b3c block: drop duplicated slice extension code
The current slice is extended when an I/O request exceeds the limit.
There is no need to extend the slice every time we check a request.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Stefan Hajnoczi
ae29d6c64b block: keep I/O throttling slice time constant
It is not necessary to adjust the slice time at runtime.  We already
extend the current slice in order to carry over accounting into the next
slice.  Changing the actual slice time value introduces oscillations.

The guest may experience large changes in throughput or IOPS from one
moment to the next when slice times are adjusted.

Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Stefan Hajnoczi
5905fbc9c9 block: fix I/O throttling accounting blind spot
I/O throttling relies on bdrv_acct_done() which is called when a request
completes.  This leaves a blind spot since we only charge for completed
requests, not submitted requests.

For example, if there is 1 operation remaining in this time slice the
guest could submit 3 operations and they will all be submitted
successfully since they don't actually get accounted for until they
complete.

Originally we probably thought this is okay since the requests will be
accounted when the time slice is extended.  In practice it causes
fluctuations since the guest can exceed its I/O limit and it will be
punished for this later on.

Account for I/O upon submission so that I/O limits are enforced
properly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-04-05 18:58:05 +02:00
Kevin Wolf
5d186eb03e block: Fix direct use of protocols as driver for bdrv_open()
bdrv_open_common() implements direct use of protocols by copying the
pre-opened BlockDriverStates to bs using bdrv_swap(). It did however
first set some fields in bs, which end up in file after the swap. When
bdrv_open() destroys file, it appears to be open, and because it isn't,
qemu could segfault while trying to close it.

Reorder the operations to return immediately in such cases so that file
is correctly detected as closed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-28 11:58:40 +01:00
Kevin Wolf
c2ad1b0c46 block: Allow omitting the file name when using driver-specific options
After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.

In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.

Now an NBD connection can be opened like this:

  qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
08b392e151 block: Rename variable to avoid shadowing
bdrv_open() uses two different variables called options. Rename one of
them to avoid confusion and to allow the outer one to be accessed
everywhere.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
6963a30d82 block: Introduce .bdrv_parse_filename callback
If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:32 +01:00
Kevin Wolf
707ff8282b block: Pass bdrv_file_open() options to block drivers
Specify -drive file.option=... on the command line to pass the option to
the protocol instead of the format driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:31 +01:00
Kevin Wolf
787e4a8500 block: Add options QDict to bdrv_file_open() prototypes
The new parameter is unused yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-03-22 17:51:31 +01:00
Peter Lieven
5c916681ae Revert "block: complete all IOs before .bdrv_truncate"
brdv_truncate() is also called from readv/writev commands on self-
growing file based storage. this will result in requests waiting
for theirselves to complete.

This reverts commit 9a665b2b86.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-22 17:51:31 +01:00
Stefan Hajnoczi
4d70655bcb block: fix BDRV_O_SNAPSHOT protocol detection
realpath(3) is used to get an absolute path to the image file when
creating a -drive snapshot=on temporary qcow2.  This does not work for
protocols since their filenames ("proto:foo:...") do not correspond to
file system paths.

Commit 7c96d46ec2 ("Let snapshot work with
protocols") skipped realpath(3) for protocols.  Later on the "raw"
format was introduced and broke the check.

Use path_has_protocol(filename) to decide if this image uses a protocol
or a filename.

Reported-by: Richard Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-03-19 11:48:37 +01:00
Stefan Hajnoczi
85d126f3ee block: add bdrv_get_aio_context()
For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.

Add the stub now so that the block layer can begin accessing its
AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-03-15 16:07:51 +01:00
Kevin Wolf
b6ad491a49 block: Add options QDict to bdrv_open_common()
The options are passed down to the block drivers, which are supposed to
remove all options they have processed. Anything that is left over in
the end is an unknown option and results in an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Kevin Wolf
de9c0cec6c block: Add options QDict to bdrv_open() prototype
It doesn't do anything yet except storing the options QDict in the
BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Kevin Wolf
1a86938f04 block: Add options QDict to .bdrv_open()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-15 16:07:49 +01:00
Jeff Cody
272d2d8e12 block: for HMP commit() operations on 'all', skip non-COW drives
During a commit of 'all' using the HMP non-live commit, the operation
is aborted and returns error on the first error enountered.  When
non-COW drives are in use (e.g. ejected floppy, cdrom, or drives without
a backing parent), that means a commit all will return an error of either
-ENOMEDIUM or -ENOTSUP.  This is not desirable, so for the 'all' commit
case, only attempt the commit if both bs->drv and bs->backing_hd are
present.

More succinctly: 'commit all' now means a commit on all COW drives.

This means an individual commit to a specific non-COW drive will still
return the appropriate error (-ENOMEDIUM if eject / not present, -ENOTSUP
if no backing file).

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-04 09:54:17 +01:00
Paolo Bonzini
9e8f1835ea block: implement BDRV_O_UNMAP
It is better to present homogeneous hardware independent of the storage
technology that is chosen on the host, hence we make discard a host
parameter; the user can choose whether to pass it down to the image
format and protocol, or to ignore it.

Using DISCARD with filesystems can cause very severe fragmentation, so it
is left default-off for now.  This can change later when we implement the
"anchor" operation for efficient management of preallocated files.

There is still one choice to make: whether DISCARD has an effect on the
dirty bitmap or not.  I chose yes, though there is a disadvantage: if
the guest is buggy and issues discards for data that is in use, there
will be no way to migrate storage for that guest without downgrading
the machine type to an older one.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-22 21:29:42 +01:00
Peter Lieven
9a665b2b86 block: complete all IOs before .bdrv_truncate
bdrv_truncate() invalidates the bdrv_check_request() result for
in-flight requests, so there should better be none.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-02-22 21:21:10 +01:00
Miroslav Rezanina
f382d43a91 qemu-img: Add "Quiet mode" option
There can be a need to turn output to stdout off. This patch adds a -q option
that enable "Quiet mode". In Quiet mode, only errors are printed out.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Miroslav Rezanina
b35b2bba5b block: Add synchronous wrapper for bdrv_co_is_allocated_above
There's no synchronous wrapper for bdrv_co_is_allocated_above function
so it's not possible to check for sector allocation in an image with
a backing file.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-22 21:21:09 +01:00
Vishvananda Ishaya
63ba17d39f block: Fix is_allocated_above with resized files
In an image chain, if the base image is smaller than the current
image, we need to make sure to use the current images count of
unallocated blocks once we get to the end of the base image. Without
this change the code will return 0 blocks when it gets to the end
of the base image and mirror_run will fail its assertion.

Signed-off-by: Vishvananda Ishaya <vishvananda@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-02-01 14:58:28 +01:00
Paolo Bonzini
50717e941b block: allow customizing the granularity of the dirty bitmap
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:34 +01:00
Paolo Bonzini
acc906c6c5 block: return count of dirty sectors, not chunks
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
343bded4ec block: make round_to_clusters public
This is needed in the following patch.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
8f0720ecbc block: implement dirty bitmap using HBitmap
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-25 18:18:33 +01:00
Paolo Bonzini
df702c9b4c block: clear dirty bitmap when discarding
Note that resetting bits in the dirty bitmap is done _before_ actually
processing the request.  Writes, instead, set bits after the request
is completed.

This way, when there are concurrent write and discard requests, the
outcome will always be that the blocks are marked dirty.  This scenario
should never happen, but it is safer to do it this way.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 10:03:48 +01:00
Peter Lieven
029d091e49 block: fix initialization in bdrv_io_limits_enable()
bdrv_io_limits_enable() starts a new slice, but does not set io_base
correctly for that slice.

Here is how io_base is used:

    bytes_base  = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write];
    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;

    if (bytes_base + bytes_res <= bytes_limit) {
        /* no wait */
    } else {
        /* operation needs to be throttled */
    }

As a result, any I/O operations that are triggered between now and
bs->slice_end are incorrectly limited.  If 10 MB of data has been
written since the VM was started, QEMU thinks that 10 MB of data has
been written in this slice. This leads to a I/O lockup in the guest.

We fix this by delaying the start of a new slice to the next
call of bdrv_exceed_io_limits().

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15 09:24:38 +01:00
Stefan Hajnoczi
c53b1c5114 block: make qiov_is_aligned() public
The qiov_is_aligned() function checks whether a QEMUIOVector meets a
BlockDriverState's alignment requirements.  This is needed by
virtio-blk-data-plane so:

1. Move the function from block/raw-posix.c to block/block.c.
2. Make it public in block/block.h.
3. Rename to bdrv_qiov_is_aligned().
4. Change return type from int to bool.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Paolo Bonzini
8e895599a1 block: do not probe zero-sized disks
A blank CD or DVD is visible as a zero-sized disks.  Probing such
disks will lead to an EIO and a failure to start the VM.  Treating
them as raw is a better solution.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14 10:06:56 +01:00
Stefan Weil
eb7ff6fb0b Replace remaining gmtime, localtime by gmtime_r, localtime_r
This allows removing of MinGW specific code and improves
reentrancy for POSIX hosts.

[Removed unused ret variable in qemu_get_timedate() to fix warning:
vl.c: In function ‘qemu_get_timedate’:
vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable]
-- Stefan Hajnoczi]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-11 09:44:37 +01:00
Paolo Bonzini
9c17d615a6 softmmu: move include files to include/sysemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:45 +01:00
Paolo Bonzini
1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini
83c9089e73 monitor: move include files to include/monitor/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:32 +01:00
Paolo Bonzini
737e150e89 block: move include files to include/block/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini
7b1b5d1913 qapi: move include files to include/qobject/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Kevin Wolf
41c695c749 qemu-io: Add AIO debugging commands
This makes the blkdebug suspend/resume functionality available in
qemu-io. Use it like this:

  $ ./qemu-io blkdebug::/tmp/test.qcow2
  qemu-io> break write_aio req_a
  qemu-io> aio_write 0 4k
  qemu-io> blkdebug: Suspended request 'req_a'
  qemu-io> resume req_a
  blkdebug: Resuming request 'req_a'
  qemu-io> wrote 4096/4096 bytes at offset 0
  4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-12 12:33:48 +01:00
Luiz Capitulino
d92ada2202 block: bdrv_img_create(): drop unused error handling code
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:05:10 +01:00
Luiz Capitulino
71c79813d8 block: bdrv_img_create(): add Error ** argument
This commit adds an Error ** argument to bdrv_img_create() and set it
appropriately on error.

Callers of bdrv_img_create() pass NULL for the new argument and still
rely on bdrv_img_create()'s return value. Next commits will change
callers to use the Error object instead.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:05:10 +01:00
Kevin Wolf
f500a6d3c2 block: Avoid second open for format probing
This fixes problems that are caused by the additional open/close cycle
of the existing format probing, for example related to qemu-nbd without
-t option or file descriptor passing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Kevin Wolf
7b27245239 block: Factor out bdrv_open_flags
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Kevin Wolf
d318aea932 block: Improve bdrv_aio_co_cancel_em
Instead of waiting for all requests to complete, wait just for the
specific request that should be cancelled.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11 11:04:25 +01:00
Stefan Weil
89c9bc3d14 block: Fix regression for MinGW (assertion caused by short string)
The local string tmp_filename is passed to function get_tmp_filename
which expects a string with minimum size MAX_PATH for w32 hosts.

MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short.

Commit eba25057b9 introduced this
regression.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-24 19:54:43 +00:00
Stefan Hajnoczi
d7331bed11 aio: rename AIOPool to AIOCBInfo
Now that AIOPool no longer keeps a freelist, it isn't really a "pool"
anymore.  Rename it to AIOCBInfo and make it const since it no longer
needs to be modified.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14 18:19:21 +01:00
Stefan Hajnoczi
d37c975fb1 aio: use g_slice_alloc() for AIOCB pooling
AIO control blocks are frequently acquired and released because each aio
request involves at least one AIOCB.  Therefore, we pool them to avoid
heap allocation overhead.

The problem with the freelist approach in AIOPool is thread-safety.  If
we want BlockDriverStates to associate with AioContexts that execute in
multiple threads, then a global freelist becomes a problem.

This patch drops the freelist and instead uses g_slice_alloc() which is
tuned for per-thread fixed-size object pools.  qemu_aio_get() and
qemu_aio_release() are now thread-safe.

Note that the change from g_malloc0() to g_slice_alloc() should be safe
since the freelist reuse case doesn't zero the AIOCB either.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14 18:19:21 +01:00
Anthony Liguori
90c45b3031 Merge remote-tracking branch 'kwolf/for-anthony' into staging
* kwolf/for-anthony: (32 commits)
  osdep: Less restrictive F_SEFL in qemu_dup_flags()
  qemu-iotests: add testcases for mirroring on-source-error/on-target-error
  qmp: add pull_event function
  mirror: add support for on-source-error/on-target-error
  iostatus: forward block_job_iostatus_reset to block job
  qemu-iotests: add mirroring test case
  mirror: implement completion
  qmp: add drive-mirror command
  mirror: introduce mirror job
  block: introduce BLOCK_JOB_READY event
  block: add block-job-complete
  block: rename block_job_complete to block_job_completed
  block: export dirty bitmap information in query-block
  block: introduce new dirty bitmap functionality
  block: add bdrv_open_backing_file
  block: add bdrv_query_stats
  block: add bdrv_query_info
  qemu-config: Add new -add-fd command line option
  monitor: Prevent removing fd from set during init
  monitor: Enable adding an inherited fd to an fd set
  ...

Conflicts:
	vl.c

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-29 10:34:05 -05:00
Paolo Bonzini
3bd293c3fd iostatus: forward block_job_iostatus_reset to block job
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:22 +02:00
Paolo Bonzini
b9a9b3a462 block: export dirty bitmap information in query-block
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Paolo Bonzini
1755da16e3 block: introduce new dirty bitmap functionality
Assert that write_compressed is never used with the dirty bitmap.
Setting the bits early is wrong, because a coroutine might concurrently
examine them and copy incomplete data from the source.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Paolo Bonzini
9156df12a4 block: add bdrv_open_backing_file
Mirroring runs without the backing file so that it can be copied outside
QEMU.  However, we need to add it at the time the job is completed and
QEMU switches to the target.  Factor out the common bits of opening an
image and completing a mirroring operation.

The new function does not assume that the file is closed immediately after
it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Paolo Bonzini
9887b61661 block: add bdrv_query_stats
qmp_query_blockstat cannot have errors, remove the Error argument and
create a new public function bdrv_query_stats out of it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Paolo Bonzini
ac84adac48 block: add bdrv_query_info
Extract it out of the implementation of "info block".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Luiz Capitulino
80168bff43 block: bdrv_create(): don't leak cco.filename on error
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:19 +02:00
Jeff Cody
b1b1d783ea block: make bdrv_find_backing_image compare canonical filenames
Currently, bdrv_find_backing_image compares bs->backing_file with
what is passed in as a backing_file name.  Mismatches may occur,
however, when bs->backing_file and backing_file are not both
absolute or relative.

Use path_combine() to make sure any relative backing filenames are
relative to the current image filename being searched, and then use
realpath() to make all comparisons based on absolute filenames.

If either backing_file or bs->backing_file is determine to be a
protocol, then no filename normalization is performed.

This also changes bdrv_find_backing_image to no longer be recursive,
but iterative.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24 10:26:18 +02:00
Paolo Bonzini
d7d512f609 block: add close notifiers
The first user of close notifiers will be the embedded NBD server.
It would be possible to use them to do some of the ad hoc processing
(e.g. for block jobs and I/O limits) that is currently done by
bdrv_close.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-23 22:39:32 +02:00
Paolo Bonzini
3cbc002c34 block: prepare code for adding block notifiers
There is no reason in principle to skip job cancellation and draining
of pending I/O when there is no medium in the disk.  Do these unconditionally,
which also prepares the code for the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-23 22:39:32 +02:00
Jim Meyering
c2cba3d931 block: avoid buffer overrun by using pstrcpy, not strncpy
Also, use PATH_MAX, rather than the arbitrary 1024.
Using PATH_MAX is more consistent with other filename-related
variables in this file, like backing_filename and tmp_filename.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-05 07:58:36 -05:00
Paolo Bonzini
32c81a4a6e block: introduce block job error
The following behaviors are possible:

'report': The behavior is the same as in 1.1.  An I/O error,
respectively during a read or a write, will complete the job immediately
with an error code.

'ignore': An I/O error, respectively during a read or a write, will be
ignored.  For streaming, the job will complete with an error and the
backing file will be left in place.  For mirroring, the sector will be
marked again as dirty and re-examined later.

'stop': The job will be paused and the job iostatus will be set to
failed or nospace, while the VM will keep running.  This can only be
specified if the block device has rerror=stop and werror=stop or enospc.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.

It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa.
This is not really avoidable since stopping the VM completes all pending
I/O requests.  In fact, it is already possible now that a series of
BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop
calls bdrv_drain_all and this can generate further errors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:56 +02:00
Paolo Bonzini
3e1caa5f76 iostatus: reorganize io error code
Move the common part of IDE/SCSI/virtio error handling to the block
layer.  The new function bdrv_error_action subsumes all three of
bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err.

The same scheme will be used for errors in block jobs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:56 +02:00
Paolo Bonzini
1ceee0d5cc iostatus: change is_read to a bool
Do this while we are touching this part of the code, before introducing
more uses of "int is_read".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:56 +02:00
Paolo Bonzini
92aa5c6d77 iostatus: move BlockdevOnError declaration to QAPI
This will let block-stream reuse the enum.  Places that used the enums
are renamed accordingly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:40:26 +02:00
Paolo Bonzini
ff06f5f351 iostatus: rename BlockErrorAction, BlockQMPEventAction
We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers;
drivers should only be told whether to stop/report/ignore the error.
On the other hand, we want to keep using the nicer BlockErrorAction
name in the drivers.  So rename the enums, while leaving aside the
names of the enum values for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:14:32 +02:00
Paolo Bonzini
2f0c9fe64c block: move job APIs to separate files
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 19:14:26 +02:00
Jeff Cody
79fac5680d block: helper function, to find the base image of a chain
This is a simple helper function, that will return the base image
of a given image chain.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 18:23:44 +02:00
Jeff Cody
6ebdcee2d8 block: add support functions for live commit, to find and delete images.
Add bdrv_find_overlay(), and bdrv_drop_intermediate().

bdrv_find_overlay():  given 'bs' and the active (topmost) BDS of an image chain,
                    find the image that is the immediate top of 'bs'

bdrv_drop_intermediate():
                    Given 3 BDS (active, top, base), drop images above
                    base up to and including top, and set base to be the
                    backing file of top's overlay node.

                    E.g., this converts:

                    bottom <- base <- intermediate <- top <- active

                    to

                    bottom <- base <- active

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 18:22:44 +02:00
Jeff Cody
dc1c13d969 block: remove keep_read_only flag from BlockDriverState struct
The keep_read_only flag is no longer used, in favor of the bdrv
flag BDRV_O_ALLOW_RDWR.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:13 +02:00
Jeff Cody
0bce597d6e block: convert bdrv_commit() to use bdrv_reopen()
Currently, bdrv_commit() reopens images r/w itself, via risky
_delete() and _open() calls. Use the new safe method for drive reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
e971aa1273 block: Framework for reopening files safely
This is based on Supriya Kannery's bdrv_reopen() patch series.

This provides a transactional method to reopen multiple
images files safely.

Image files are queue for reopen via bdrv_reopen_queue(), and the
reopen occurs when bdrv_reopen_multiple() is called.  Changes are
staged in bdrv_reopen_prepare() and in the equivalent driver level
functions.  If any of the staged images fails a prepare, then all
of the images left untouched, and the staged changes for each image
abandoned.

Block drivers are passed a reopen state structure, that contains:
    * BDS to reopen
    * flags for the reopen
    * opaque pointer for any driver-specific data that needs to be
      persistent from _prepare to _commit/_abort
    * reopen queue pointer, if the driver needs to queue additional
      BDS for a reopen

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
55b110f24e block: make bdrv_set_enable_write_cache() modify open_flags
bdrv_set_enable_write_cache() sets the bs->enable_write_cache flag,
but without the flag recorded in bs->open_flags, then next time
a reopen() is performed the enable_write_cache setting may be
inadvertently lost.

This will set the flag in open_flags, so it is preserved across
reopens.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
be028adced block: correctly set the keep_read_only flag
I believe the bs->keep_read_only flag is supposed to reflect
the initial open state of the device. If the device is initially
opened R/O, then commit operations, or reopen operations changing
to R/W, are prohibited.

Currently, the keep_read_only flag is only accurate for the active
layer, and its backing file. Subsequent images end up always having
the keep_read_only flag set.

For instance, what happens now:

[  base  ]  kro = 1, ro = 1
    |
    v
[ snap-1 ]  kro = 1, ro = 1
    |
    v
[ snap-2 ]  kro = 0, ro = 1
    |
    v
[ active ]  kro = 0, ro = 0

What we want:

[  base  ]  kro = 0, ro = 1
    |
    v
[ snap-1 ]  kro = 0, ro = 1
    |
    v
[ snap-2 ]  kro = 0, ro = 1
    |
    v
[ active ]  kro = 0, ro = 0

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Dunrong Huang
fe235a06e1 block: Don't forget to delete temporary file
The caller would not delete temporary file after failed get_tmp_filename().

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Pavel Hrdina
9ca111544c block: fix block tray status
The tray status should change also if you eject empty block device.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Kevin Wolf
d4c8232923 block: Flush parent to OS with cache=unsafe
Commit 29cdb251 already added a comment that no unnecessary flushes to
disk will occur, this patch makes the code even get to the point of the
comment. This is mostly theoretical because in practice we only stack
one format on top of one protocol, the former implementing flush_to_os
and the latter only flush_to_disk. It starts to matter when drivers that
are not on top implement flush_to_os.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-08-15 15:14:43 +02:00
Luiz Capitulino
c75a1a8a5a qmp: query-block: add 'encryption_key_missing' field
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2012-08-13 13:20:06 -03:00
Benoît Canet
2e3e331710 block: Use bdrv_get_backing_file_depth()
Use the dedicated counting function in qmp_query_block in order to
propagate the backing file depth to HMP and add backing_file_depth
to qmp-commands.hx

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-08-03 10:10:51 -03:00
Benoît Canet
f198fd1c9a block: create bdrv_get_backing_file_depth()
Create bdrv_get_backing_file_depth() in order to be able to show
in QMP and HMP how many ancestors backing an image a block device
have.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-08-03 10:10:38 -03:00
Blue Swirl
0ed8b6f67f Avoid returning void
It's silly and non-conforming to standards to return void,
don't do it.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-07-28 09:23:11 +00:00
Markus Armbruster
2b584959ed block: Geometry and translation hints are now useless, purge them
There are two producers of these hints: drive_init() on behalf of
-drive, and hd_geometry_guess().

The only consumer of the hint is hd_geometry_guess().

The callers of hd_geometry_guess() call it only when drive_init()
didn't set the hints.  Therefore, drive_init()'s hints are never used.

Thus, hd_geometry_guess() only ever sees hints it produced itself in a
prior call.  Only the first call computes something, subsequent calls
just repeat the first call's results.  However, hd_geometry_guess() is
never called more than once: the device models don't, and the block
device is destroyed on unplug.  Thus, dropping the repeat feature
doesn't break anything now.

If a block device wasn't destroyed on unplug and could be reused with
a new device, then repeating old results would be wrong.  Thus,
dropping the repeat feature prevents future breakage.

This renders the hints unused.  Purge them from the block layer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:31 +02:00
Markus Armbruster
9db1c0f7a9 hd-geometry: Move disk geometry guessing back from block.c
Commit f3d54fc4 factored it out of hw/ide.c for reuse.  Sensible,
except it was put into block.c.  Device-specific functionality should
be kept in device code, not the block layer.  Move it to
hw/hd-geometry.c, and make stylistic changes required to keep
checkpatch.pl happy.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:30 +02:00
Markus Armbruster
61a8d649ff fdc: Move floppy geometry guessing back from block.c
Commit 5bbdbb46 moved it to block.c because "other geometry guessing
functions already reside in block.c".  Device-specific functionality
should be kept in device code, not the block layer.  Move it back.

Disk geometry guessing is still in block.c.  To be moved out in a
later patch series.

Bonus: the floppy type used in pc_cmos_init() now obviously matches
the one in the FDrive.  Before, we relied on
bdrv_get_floppy_geometry_hint() picking the same type both in
fd_revalidate() and in pc_cmos_init().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:29 +02:00
Anthony Liguori
23797df3d9 Merge remote-tracking branch 'mjt/mjt-iov2' into staging
* mjt/mjt-iov2:
  rewrite iov_send_recv() and move it to iov.c
  cleanup qemu_co_sendv(), qemu_co_recvv() and friends
  export iov_send_recv() and use it in iov_send() and iov_recv()
  rename qemu_sendv to iov_send, change proto and move declarations to iov.h
  change qemu_iovec_to_buf() to match other to,from_buf functions
  consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
  allow qemu_iovec_from_buffer() to specify offset from which to start copying
  consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
  rewrite iov_* functions
  change iov_* function prototypes to be more appropriate
  virtio-serial-bus: use correct lengths in control_out() message

Conflicts:
	tests/Makefile

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-07-09 12:35:06 -05:00
Markus Armbruster
07d27a442e block: Factor bdrv_read_unthrottled() out of guess_disk_lchs()
To prepare move of guess_disk_lchs() into hw/, where it poking
BlockDriverState member io_limits_enabled directly would be unclean.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 17:21:02 +02:00
Markus Armbruster
1f69c2b022 fdc: Drop broken code for user-defined floppy geometry
bdrv_get_floppy_geometry_hint() fails to store through its parameter
drive when bs has a geometry hint.  Makes fd_revalidate() assign
random crap to drv->drive.

Has been broken that way for ages.  Harmless, because:

* The only way to set a geometry hint is -drive if=none,cyls=...
  Since commit c219331e, probably unintentional.

* The only use of drv->drive is as argument to another
  bdrv_get_floppy_geometry_hint().  Which doesn't use it, since the
  geometry hint is still there.

Drop the broken code, ignore -drive parameter cyls, heads and secs for
floppies even with if=none, just like before commit c219331e.  Matches
-help, which explains cyls, heads, secs as "hard disk physical
geometry".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:03 +02:00
Paolo Bonzini
4ddc07cac2 block: introduce bdrv_swap, implement bdrv_append on top of it
The new function can be made a bit nicer than bdrv_append.  It swaps the
whole contents, and then swaps back (using the usual t=a;a=b;b=t idiom)
the fields that need to stay on top.  Thus, it does not need explicit
bdrv_detach_dev, bdrv_iostatus_disable, etc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
a9fc4408e3 block: copy over job and dirty bitmap fields in bdrv_append
While these should not be in use at the time a transaction is started,
a command in the prepare phase of a transaction might have added them,
so they need to be brought over.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Markus Armbruster
f8d6bba1c1 block: Replace bdrv_get_format() by bdrv_get_format_name()
So callers don't need to know anything about maximum name length.
Returning a pointer is safe, because the name string lives as long as
the block driver it names, and block drivers don't die.

Requested by Peter Maydell.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
e1e9b0aca0 block: always open drivers in writeback mode
Formats are entirely in charge of flushes for metadata writes.  For
guest-initiated writes, a writethrough cache is faked in the block layer.
So we can always open in writeback mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
425b01487a block: add bdrv_set_enable_write_cache
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
c4a248a138 block: copy enable_write_cache in bdrv_append
Because the guest will be able to flip enable_write_cache, the actual
state may not match what is used to open the new snapshot.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
f05fa4ad03 block: flush in writethrough mode after writes
We want to make the formats handle their own flushes
autonomously, while keeping for guests the ability to use a writethrough
cache.  Since formats will write metadata via bs->file, bdrv_co_do_writev
is the only place where we need to add a flush.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Markus Armbruster
c843328783 block: New bdrv_get_flags()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Kevin Wolf
4534ff5426 qemu-img check -r for repairing images
The QED block driver already provides the functionality to not only
detect inconsistencies in images, but also fix them. However, this
functionality cannot be manually invoked with qemu-img, but the
check happens only automatically during bdrv_open().

This adds a -r switch to qemu-img check that allows manual invocation
of an image repair.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini
188a7bbf94 stream: move is_allocated_above to block.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Michael Tokarev
d5e6b1619c change qemu_iovec_to_buf() to match other to,from_buf functions
It now allows specifying offset within qiov to start from and
amount of bytes to copy.  Actual implementation is just a call
to iov_to_buf().

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Michael Tokarev
1b093c480a consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
qemu_iovec_concat() is currently a wrapper for
qemu_iovec_copy(), use the former (with extra
"0" arg) in a few places where it is used.

Change skip argument of qemu_iovec_copy() from
uint64_t to size_t, since size of qiov itself
is size_t, so there's no way to skip larger
sizes.  Rename it to soffset, to make it clear
that the offset is applied to src.

Also change the only usage of uint64_t in
hw/9pfs/virtio-9p.c, in v9fs_init_qiov_from_pdu() -
all callers of it actually uses size_t too,
not uint64_t.

One added restriction: as for all other iovec-related
functions, soffset must point inside src.

Order of argumens is already good:
 qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                   int c, size_t bytes)
vs:
 qemu_iovec_concat(QEMUIOVector *dst,
                   QEMUIOVector *src,
                   size_t soffset, size_t sbytes)
(note soffset is after _src_ not dst, since it applies to src;
for memset it applies to qiov).

Note that in many places where this function is used,
the previous call is qemu_iovec_reset(), which means
many callers actually want copy (replacing dst content),
not concat.  So we may want to add a wrapper like
qemu_iovec_copy() with the same arguments but which
calls qemu_iovec_reset() before _concat().

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Michael Tokarev
03396148bc allow qemu_iovec_from_buffer() to specify offset from which to start copying
Similar to
 qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                   int c, size_t bytes);
the new prototype is:
 qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
                     const void *buf, size_t bytes);

The processing starts at offset bytes within qiov.

This way, we may copy a bounce buffer directly to
a middle of qiov.

This is exactly the same function as iov_from_buf() from
iov.c, so use the existing implementation and rename it
to qemu_iovec_from_buf() to be shorter and to match the
utility function.

As with utility implementation, we now assert that the
offset is inside actual iovec.  Nothing changed for
current callers, because `offset' parameter is new.

While at it, stop using "bounce-qiov" in block/qcow2.c
and copy decrypted data directly from cluster_data
instead of recreating a temp qiov for doing that.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Jim Meyering
eba25057b9 block: prevent snapshot mode $TMPDIR symlink attack
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.

get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.

This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 14:48:40 +08:00
Paolo Bonzini
dc5a137125 qemu-img: make "info" backing file output correct and easier to use
qemu-img info should use the same logic as qemu when printing the
backing file path, or debugging becomes quite tricky.  We can also
simplify the output in case the backing file has an absolute path
or a protocol.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
6405875cdd block: move field reset from bdrv_open_common to bdrv_close
bdrv_close should leave fields in the same state as bdrv_new.  It is
not up to bdrv_open_common to fix the mess.

Also, backing_format was not being re-initialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
947995c09e block: protect path_has_protocol from filenames with colons
path_has_protocol will erroneously return "true" if the colon is part
of a filename.  These names are common with stable device names produced
by udev.  We cannot fully protect against this in case the filename
does not have a path component (e.g. if the current directory is
/dev/disk/by-path), but in the common case there will be a slash before
and path_has_protocol can easily detect that and return false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
f53f4da9c6 block: simplify path_is_absolute
On Windows, all the logic is already in is_windows_drive and
is_windows_drive_prefix.  On POSIX, there is no need to look
out for colons.

The win32 code changes the behaviour in some cases, we could have
something like "d:foo.img". The old code would treat it as relative
path, the new one as absolute. Now the path is absolute, because to
go from c:/program files/blah to d:foo.img you cannot say c:/program
files/blah/d:foo.img.  You have to say d:foo.img.  But you could also
say it's relative because (I think, at least it was like that in DOS
15 years ago) d:foo.img is relative to the current path of drive D.
Considering how path_is_absolute is used by path_combine, I think it's
better to treat it as absolute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
fa4478d5c8 block: wait for job callback in block_job_cancel_sync
The limitation on not having I/O after cancellation cannot really be
kept.  Even streaming has a very small race window where you could
cancel a job and have it report completion.  If this window is hit,
bdrv_change_backing_file() will yield and possibly cause accesses to
dangling pointers etc.

So, let's just assume that we cannot know exactly what will happen
after the coroutine has set busy to false.  We can set a very lax
condition:

- if we cancel the job, the coroutine won't set it to false again
(and hence will not call co_sleep_ns again).

- block_job_cancel_sync will wait for the coroutine to exit, which
pretty much ensures no race.

Instead, we track the coroutine that executes the job and put very
strict conditions on what to do while it is quiescent (busy = false).
First of all, the coroutine must never set busy = false while the job
has been cancelled.  Second, the coroutine can be reentered arbitrarily
while it is quiescent, so you cannot really do anything but co_sleep_ns at
that time.  This condition is obeyed by the block_job_sleep_ns function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
4513eafe92 block: add block_job_sleep_ns
This function abstracts the pretty complex semantics of the "busy"
member of BlockJob.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
0ac9377d04 block: fully delete bs->file when closing
We are reusing bs->file across close/open, which may not cause any
known bugs but is a recipe for trouble.  Prefer bdrv_delete, and
enjoy the new invariant in the implementation of bdrv_delete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
a275fa42fa block: do not reuse the backing file across bdrv_close/bdrv_open
This is another bug caused by not doing a full cleanup of the BDS
across close/open.  This was found with mirroring by Shaolong Hu,
but it can probably be reproduced also with eject or change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
3a389e7926 block: another bdrv_append fix
bdrv_append must also copy open_flags to the top, because the snapshot
has BDRV_O_NO_BACKING set.  This causes interesting results if you
later use drive-reopen (not upstream) to reopen the image, and lose
the backing file in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
e023b2e244 block: fix snapshot on QED
QED's opaque data includes a pointer back to the BlockDriverState.
This breaks when bdrv_append shuffles data between bs_new and bs_top.
To avoid this, add a "rebind" function that tells the driver about
the new relationship between the BlockDriverState and its opaque.

The patch also adds rebind to VVFAT for completeness, even though
it is not used with live snapshots.

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
71df14fcbe block: fix allocation size for dirty bitmap
Also reuse elsewhere the new constant for sizeof(unsigned long) * 8.

The dirty bitmap is allocated in bits but declared as unsigned long.
Thus, its memory block is accessed beyond its end unless the image
is a multiple of 64 chunks (i.e. a multiple of 64 MB).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
63090dac3a block: open backing file as read-only when probing for size
bdrv_img_create will temporarily open the backing file to probe its size.
However, this could be done with a read-write open if the wrong flags are
passed to bdrv_img_create.  Since there is really no documentation on
what flags can be passed, assume that bdrv_img_create receives the flags
with which the new image will be opened; sanitize them when opening
the backing file.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
469ef350e1 block: update in-memory backing file and format
These are needed to print "info block" output correctly.  QCOW2 does this
because it needs it to write the header, but QED does not, and common code
is the right place to do it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
5f3777945d block: push bdrv_change_backing_file error checking up from drivers
This check applies to all drivers, but QED lacks it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
4c355d53c6 block: add the support to drain throttled requests
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
[ Iterate until all block devices have processed all requests,
  add comments. - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
5b7e1542cf block: make bdrv_create adopt coroutine
The current qemu.git introduces failure with preallocation and some
sizes:

qemu-img create -f qcow2 new.img 976563K -o preallocation=metadata
qemu-img: qemu-coroutine-lock.c:111: qemu_co_mutex_unlock: Assertion
`mutex->locked == 1' failed.

And lock needs to work in coroutine context. So to fix this issue, we
need to make bdrv_create adopt coroutine at first.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Stefan Hajnoczi
c83c66c3b5 block: add 'speed' optional parameter to block-stream
Allow streaming operations to be started with an initial speed limit.
This eliminates the window of time between starting streaming and
issuing block-job-set-speed.  Users should use the new optional 'speed'
parameter instead so that speed limits are in effect immediately when
the job starts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
882ec7ce53 block: change block-job-set-speed argument from 'value' to 'speed'
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
9e6636c72d block: use Error mechanism instead of -errno for block_job_set_speed()
There are at least two different errors that can occur in
block_job_set_speed(): the job might not support setting speeds or the
value might be invalid.

Use the Error mechanism to report the error where it occurs.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
fd7f8c6537 block: use Error mechanism instead of -errno for block_job_create()
The block job API uses -errno return values internally and we convert
these to Error in the QMP functions.  This is ugly because the Error
should be created at the point where we still have all the relevant
information.  More importantly, it is hard to add new error cases to
this case since we quickly run out of -errno values without losing
information.

Go ahead and use Error directly and don't convert later.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Kevin Wolf
621f058940 qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:30 +02:00
Liu Yuan
80ccf93b88 qemu-img: let 'qemu-img convert' flush data
The 'qemu-img convert -h' advertise that the default cache mode is
'writeback', while in fact it is 'unsafe'.

This patch 1) fix the help manual and 2) let bdrv_close() call bdrv_flush()

2) is needed because some backend storage doesn't have a self-flush
mechanism(for e.g., sheepdog), so we need to call bdrv_flush() to make
sure the image is really writen to the storage instead of hanging around
writeback cache forever.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 11:42:41 +02:00
Kevin Wolf
7094f12f86 block: Drain requests in bdrv_close
If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults, -EBADF return values and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 15:48:52 +02:00
Benoît Canet
077892696b block: add a function to clear incoming live migration flags
This function will clear all BDRV_O_INCOMING flags.

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:27:56 +02:00
Jeff Cody
f6801b83d0 block: bdrv_append() fixes
A few fixups for bdrv_append():

The new bs (bs_new) passed into bdrv_append() should be anonymous.  Rather
than call bdrv_make_anon() to enforce this, use an assert to catch when a caller
is passing in a bs_new that is not anonymous.

Also, the new top layer should have its backing_format reflect the original
top's format.

And last, after the swap of bs contents, the device_name will have been copied
down. This needs to be cleared to reflect the anonymity of the bs that was
pushed down.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Paolo Bonzini
9f25eccc1c block: set job->speed in block_set_speed
There is no need to do this in every implementation of set_speed
(even though there is only one right now).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
3e914655f2 block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.

The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep.  So add
a flag saying whether streaming has in-flight I/O.  If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.

This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Zhi Yong Wu
498e386c58 block: disable I/O throttling on sync api
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
29cdb2513c block: push recursive flushing up from drivers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:39 +02:00
Stefan Hajnoczi
e88774971c block: handle -EBUSY in bdrv_commit_all()
Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress.  This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.

The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-12 15:14:06 +01:00
Jeff Cody
8802d1fdd4 qapi: Introduce blockdev-group-snapshot-sync command
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.

It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned.  The failure case should not interrupt
any operations.

Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields.  Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 15:48:33 +01:00
Paolo Bonzini
b6a127a156 block: drop aio_multiwrite in BlockDriver
These were never used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Hervé Poussineau
f8d3d12857 block: add a transfer rate for floppy types
Floppies must be read at a specific transfer rate, depending of its own format.
Update floppy description table to include required transfer rate.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:46 +01:00
Luiz Capitulino
6f382ed226 qmp: add DEVICE_TRAY_MOVED event
It's emitted whenever the tray is moved by the guest or by HMP/QMP
commands.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:50 -02:00
Luiz Capitulino
f36f394952 block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:05 -02:00
Luiz Capitulino
329c0a48a9 block: Rename bdrv_mon_event() & BlockMonEventAction
They are QMP events, not monitor events. Rename them accordingly.

Also, move bdrv_emit_qmp_error_event() up in the file. A new event will
be added soon and it's good to have them next each other.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:22:35 -02:00
Stefan Hajnoczi
79c053bde9 block: perform zero-detection during copy-on-read
Copy-on-Read populates the image file with data read from a backing
image.  In order to avoid bloating the image file when all zeroes are
read we should scan the buffer and perform an optimized zero write
operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Stefan Hajnoczi
f08f2ddae0 block: add .bdrv_co_write_zeroes() interface
The ability to zero regions of an image file is a useful primitive for
higher-level features such as image streaming or zero write detection.

Image formats may support an optimized metadata representation instead
of writing zeroes into the image file.  This allows zero writes to be
potentially faster than regular write operations and also preserve
sparseness of the image file.

The .bdrv_co_write_zeroes() interface should be implemented by block
drivers that wish to provide efficient zeroing.

Note that this operation is different from the discard operation, which
may leave the contents of the region indeterminate.  That means
discarded blocks are not guaranteed to contain zeroes and may contain
junk data instead.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Marcelo Tosatti
e8a6bb9caa block: add bdrv_find_backing_image
Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Hajnoczi
eeec61f291 block: add BlockJob interface for long-running operations
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
470c05047a block: make copy-on-read a per-request flag
Previously copy-on-read could only be enabled for all requests to a
block device.  This means requests coming from the guest as well as
QEMU's internal requests would perform copy-on-read when enabled.

For image streaming we want to support finer-grained behavior than just
populating the image file from its backing image.  Image streaming
supports partial streaming where a common backing image is preserved.
In this case guest requests should not perform copy-on-read because they
would indiscriminately copy data which should be left in a backing image
from the backing chain.

Introduce a per-request flag for copy-on-read so that a block device can
process both regular and copy-on-read requests.  Overlapping reads and
writes still need to be serialized for correctness when copy-on-read is
happening, so add an in-flight reference count to track this.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
2d3735d3bf block: check bdrv_in_use() before blockdev operations
Long-running block operations like block migration and image streaming
must have continual access to their block device.  It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.

This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Paolo Bonzini
3f3aace830 block: avoid useless checks on acb->bh
Coverity is confused by this "if" and reports leaks on acb->bh.
The bottom half is always deleted before releasing the AIOCB,
in either bdrv_aio_cancel_em or bdrv_aio_bh_cb.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Paolo Bonzini
df9309fb43 block: simplify failure handling for bdrv_aio_multiwrite
Now that early failure of bdrv_aio_writev is not possible anymore,
mcb->num_requests can be set before the loop starts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Paolo Bonzini
ad54ae80c7 block: bdrv_aio_* do not return NULL
Initially done with the following semantic patch:

@ rule1 @
expression E;
statement S;
@@
  E =
(
   bdrv_aio_readv
|  bdrv_aio_writev
|  bdrv_aio_flush
|  bdrv_aio_discard
|  bdrv_aio_ioctl
)
     (...);
(
- if (E == NULL) { ... }
|
- if (E)
    { <... S ...> }
)

which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Stefan Hajnoczi
922453bca6 block: convert qemu_aio_flush() calls to bdrv_drain_all()
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O.  Most of these places actually want to drain all block
requests but there is no block layer API to do so.

This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete.  As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:56:06 +01:00
Stefan Hajnoczi
5f8b6491f2 block: wait_for_overlapping_requests() deadlock detection
Debugging a reentrant request deadlock was fun but in the future we need
a quick and obvious way of detecting such bugs.  Add an assert that
checks we are not about to deadlock when waiting for another request.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:52:34 +01:00
Stefan Hajnoczi
bd9533e36e block: implement bdrv_co_is_allocated() boundary cases
Cases beyond the end of the disk image are only implemented for block
drivers that do not provide .bdrv_co_is_allocated().  It's worth making
these cases generic so that block drivers that do implement
.bdrv_co_is_allocated() also get them for free.

Suggested-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:39 +01:00
Stefan Hajnoczi
ab1859218a block: core copy-on-read logic
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
d83947ac6d block: request overlap detection
Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
f4658285f9 block: wait for overlapping requests
When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
53fec9d3fd block: add interface to toggle copy-on-read
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can
be used to programmatically enable or disable copy-on-read for a block
device.  Later patches add the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
dbffbdcfff block: add request tracking
The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
060f51c9de block: add bdrv_co_is_allocated() interface
This patch introduces the public bdrv_co_is_allocated() interface which
can be used to query image allocation status while the VM is running.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
6aebab140d block: drop .bdrv_is_allocated() interface
Now that all block drivers have been converted to
.bdrv_co_is_allocated() we can drop .bdrv_is_allocated().

Note that the public bdrv_is_allocated() interface is still available
but is in fact a synchronous wrapper around .bdrv_co_is_allocated().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
376ae3f1cb block: add .bdrv_co_is_allocated()
This patch adds the .bdrv_co_is_allocated() interface which is identical
to .bdrv_is_allocated() but runs in coroutine context.  Running in
coroutine context implies that other coroutines might be performing I/O
at the same time.   Therefore it must be safe to run while the following
BlockDriver functions are in-flight:

    .bdrv_co_readv()
    .bdrv_co_writev()
    .bdrv_co_flush()
    .bdrv_co_is_allocated()

The new .bdrv_co_is_allocated() interface is useful because it can be
used when a VM is running, whereas .bdrv_is_allocated() is a synchronous
interface that does not cope with parallel requests.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Stefan Hajnoczi
05c4af54c6 block: use public bdrv_is_allocated() interface
There is no need for bdrv_commit() to use the BlockDriver
.bdrv_is_allocated() interface directly.  Converting to the public
interface gives us the freedom to drop .bdrv_is_allocated() entirely in
favor of a new .bdrv_co_is_allocated() in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Zhi Yong Wu
727f005e6a hmp/qmp: add block_set_io_throttle
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
98f90dba5e block: add I/O throttling algorithm
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
0563e19151 block: add the blockio limits command line support
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Anthony Liguori
0f15423c32 block: allow migration to work with image files (v3)
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.

Today, image formats aggressively cache mutable data to improve performance.  In
some cases, this happens before a guest even starts.  When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.

This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.

NB, this still requires coherent shared storage.  Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-21 14:58:48 -06:00
Kevin Wolf
ca716364f0 block: Make cache=unsafe flush to the OS
cache=unsafe completely ignored bdrv_flush, because flushing the host disk
costs a lot of performance. However, this means that qcow2 images (and
potentially any other format) can lose data even after the guest has issued a
flush if the qemu process crashes/is killed. In case of a host crash, data loss
is certainly expected with cache=unsafe, but if just the qemu process dies this
is a bit too unsafe.

Now that we have two separate flush functions, we can choose to flush
everythign to the OS, but don't enforce that it's physically written to the
disk.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
eb489bb1ec block: Introduce bdrv_co_flush_to_os
qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
consists of writing back that cache to the protocol and only then flushes the
protocol in order to get everything stable on disk.

This introduces a separate bdrv_co_flush_to_os to reflect the split.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
c68b89acd6 block: Rename bdrv_co_flush to bdrv_co_flush_to_disk
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Paolo Bonzini
025ccaa7f9 block: add eject request callback
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted.  Add support for these events in the ATAPI
and SCSI cd drive device models.

To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed".  This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command.  They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout.  However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:57 +01:00
Anthony Liguori
8494a397b6 Merge remote-tracking branch 'kwolf/for-anthony' into staging
Conflicts:
	block/vmdk.c
2011-10-31 11:09:00 -05:00
Stefan Hajnoczi
03f541bd6e block: reinitialize across bdrv_close()/bdrv_open()
Several BlockDriverState fields are not being reinitialized across
bdrv_close()/bdrv_open().  Make sure they are reset to their default
values.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:50 +02:00
Stefan Hajnoczi
e7c637967e block: set bs->read_only before .bdrv_open()
Several block drivers set bs->read_only in .bdrv_open() but
block.c:bdrv_open_common() clobbers its value.  Additionally, QED uses
bdrv_is_read_only() in .bdrv_open() to decide whether to perform
consistency checks.

The correct ordering is to initialize bs->read_only from the open flags
before calling .bdrv_open().  This way block drivers can override it if
necessary and can use bdrv_is_read_only() in .bdrv_open().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Kevin Wolf
2b5728164f block: Fix bdrv_open use after free
tmp_filename was used outside the block it was defined in, i.e. after it went
out of scope. Move its declaration to the top level.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Kevin Wolf
3574c60819 block: Remove dead code
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-28 19:25:49 +02:00
Luiz Capitulino
f795e743bd Drop qemu-objects.h from modules that don't require it
Previous commits dropped most qobjects usage from qemu modules
(now they are a low level interface used by the QAPI). However,
some modules still include the qemu-objects.h header file.

This commit drops qemu-objects.h from some of those modules
and includes qjson.h instead, which is what they actually need.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
f11f57e405 qapi: Convert query-blockstats
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
b202381800 qapi: Convert query-block
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
58e21ef5ab block: Rename the BlockIOStatus enum values
The biggest change is to rename its prefix from BDRV_IOS to
BLOCK_DEVICE_IO_STATUS.

Next commit will convert the query-block command to the QAPI
and that's how the enumeration is going to be generated.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Luiz Capitulino
d6bf279e7a block: iostatus: Drop BDRV_IOS_INVAL
A future commit will convert bdrv_info() to the QAPI and it won't
provide IOS_INVAL.

Luckily all we have to do is to add a new 'iostatus_enabled'
member to BlockDriverState and use it instead.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-10-27 11:48:47 -02:00
Paolo Bonzini
6db39ae2e2 block: change discard to co_discard
Since coroutine operation is now mandatory, convert both bdrv_discard
implementations to coroutines.  For qcow2, this means taking the lock
around the operation.  raw-posix remains synchronous.

The bdrv_discard callback is then unused and can be eliminated.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini
8b94ff8573 block: change flush to co_flush
Since coroutine operation is now mandatory, convert all bdrv_flush
implementations to coroutines.  For qcow2, this means taking the lock.
Other implementations are simpler and just forward bdrv_flush to the
underlying protocol, so they can avoid the lock.

The bdrv_flush callback is then unused and can be eliminated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:14 +02:00
Paolo Bonzini
4265d620c5 block: add bdrv_co_discard and bdrv_aio_discard support
This similarly adds support for coroutine and asynchronous discard.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Paolo Bonzini
07f0761574 block: unify flush implementations
Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:13 +02:00
Paolo Bonzini
35246a6825 block: rename bdrv_co_rw_bh
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-21 17:34:12 +02:00
Stefan Hajnoczi
09f085d59d block: drop bdrv_has_async_rw()
Commit cd74d83345e0e3b708330ab8c4cd9111bb82cda6 ("block: switch
bdrv_read()/bdrv_write() to coroutines") removed the bdrv_has_async_rw()
callers.  This patch removes bdrv_has_async_rw() since it is no longer
used.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
f8c35c1d59 block: drop .bdrv_read()/.bdrv_write() emulation
There is no need to emulate .bdrv_read()/.bdrv_write() since these
interfaces are only called if aio and coroutine interfaces are not
present.  All valid BlockDrivers must implement either sync, aio, or
coroutine interfaces.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
8c5873d697 block: drop emulation functions that use coroutines
Block drivers that implement coroutine functions used to get sync and
aio wrappers.  This is no longer necessary since all request processing
now happens in a coroutine.  If a block driver implements the coroutine
interface then none of the other interfaces will be invoked.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-14 17:31:22 +02:00
Stefan Hajnoczi
1a6e115b19 block: switch bdrv_aio_writev() to coroutines
More sync, aio, and coroutine unification.  Make bdrv_aio_writev() go
through coroutine request processing.

Remove the dirty block callback mechanism which was needed only for aio
processing and can be done more naturally in coroutine context.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
6b7cb2479b block: mark blocks dirty on coroutine write completion
The aio write operation marks blocks dirty when the write operation
completes.  The coroutine write operation marks blocks dirty before
issuing the write operation.

It seems safest to mark the block dirty when the operation completes so
that anything tracking dirty blocks will not act before the change has
been made to the image file.

Make the coroutine write operation dirty blocks on write completion.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
b2a6137166 block: switch bdrv_aio_readv() to coroutines
More sync, aio, and coroutine unification.  Make bdrv_aio_readv() go
through coroutine request processing.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:54 +02:00
Stefan Hajnoczi
1c9805a398 block: switch bdrv_read()/bdrv_write() to coroutines
The bdrv_read()/bdrv_write() functions call .bdrv_read()/.bdrv_write().
They should go through bdrv_co_do_readv() and bdrv_co_do_writev()
instead in order to unify request processing code across sync, aio, and
coroutine interfaces.  This is also an important step towards removing
BlockDriverState .bdrv_read()/.bdrv_write() in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
c5fbe57111 block: split out bdrv_co_do_readv() and bdrv_co_do_writev()
The public interface for I/O in coroutine context is bdrv_co_readv() and
bdrv_co_writev().  Split out the request processing code into
bdrv_co_do_readv() and bdrv_co_writev() so that it can be called
internally when we refactor all request processing to use coroutines.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
1ed20acf2f block: directly invoke .bdrv_* from emulation functions
The emulation functions which supply default BlockDriver .bdrv_*()
functions given another implemented .bdrv_*() function should not use
public bdrv_*() interfaces.  This patch ensures they invoke .bdrv_*()
directly to avoid adding an extra layer of coroutine request processing
and possibly entering an infinite loop.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:53 +02:00
Stefan Hajnoczi
a652d16025 block: directly invoke .bdrv_aio_*() in bdrv_co_io_em()
We will unify block layer request processing across sync, aio, and
coroutines and this means a .bdrv_co_*() emulation function should not
call back into the public interface.  There's no need here, just call
.bdrv_aio_*() directly.

The gory details: bdrv_co_io_em() cannot call back into the public
bdrv_aio_*() interface since that will be handled using coroutines,
which causes us to call into bdrv_co_io_em() again in an infinite loop
:).

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-13 15:02:27 +02:00
Luiz Capitulino
d2078cc238 HMP: Print 'io-status' information
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:42:45 +02:00
Luiz Capitulino
f04ef60100 QMP: query-status: Add 'io-status' key
Contains the I/O status for the given device. The key is only present
if the device supports it and the VM is configured to stop on errors.

Please, check the documentation being added in this commit for more
information.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:42:45 +02:00
Luiz Capitulino
28a7282a5d block: Keep track of devices' I/O status
This commit adds support to the BlockDriverState type to keep track
of devices' I/O status.

There are three possible status: BDRV_IOS_OK (no error), BDRV_IOS_ENOSPC
(no space error) and BDRV_IOS_FAILED (any other error). The distinction
between no space and other errors is important because a management
application may want to watch for no space in order to extend the
space assigned to the VM and put it to run again.

Qemu devices supporting the I/O status feature have to enable it
explicitly by calling bdrv_iostatus_enable() _and_ have to be
configured to stop the VM on errors (ie. werror=stop|enospc or
rerror=stop).

In case of multiple errors being triggered in sequence only the first
one is stored. The I/O status is always reset to BDRV_IOS_OK when the
'cont' command is issued.

Next commits will add support to some devices and extend the
query-block/info block commands to return the I/O status information.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-10-11 09:41:47 +02:00
Stefan Hajnoczi
59370aaa56 trace: add arguments to bdrv_co_io_em() trace event
It is useful to know the BlockDriverState as well as the
sector_num/nb_sectors of an emulated .bdrv_co_*() request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-03 10:56:27 +01:00
Stefan Hajnoczi
28dcee10c5 trace: trace bdrv_open_common()
bdrv_open_common() is a useful point to trace since it reveals the
filename and block driver for a given BlockDriverState.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-10-03 10:55:50 +01:00
Markus Armbruster
7d4b4ba5c2 block: New change_media_cb() parameter load
To let device models distinguish between eject and load.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
29e05f2022 block: Reset buffer alignment on detach
BlockDriverState member buffer_alignment is initially 512.  The device
model may set them, with bdrv_set_buffer_alignment().  If the device
model gets detached (hot unplug), the device's alignment is left
behind.  Only okay because device hot unplug automatically destroys
the BlockDriverState.  But that's a questionable feature, best not to
rely on it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
7b6f9300d5 block: New bdrv_set_buffer_alignment()
Device models should be able to set it without an unclean include of
block_int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:22 +02:00
Markus Armbruster
e4def80b36 block: Show whether the virtual tray is open in info block
Need to ask the device, so this requires new BlockDevOps member
is_tray_open().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
9e6a4c9177 block: Drop BlockDriverState member removable
It's a confused mess (see previous commit).  No users remain.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
2c6942fa7b block: Clean up remaining users of "removable"
BlockDriverState member removable is a confused mess.  It is true when
an ide-cd, scsi-cd or floppy qdev is attached, or when the
BlockDriverState was created with -drive if={floppy,sd} or -drive
if={ide,scsi,xen,none},media=cdrom ("created removable"), except when
an ide-hd, scsi-hd, scsi-generic or virtio-blk qdev is attached.

Three users remain:

1. eject_device(), via bdrv_is_removable() uses it to determine
   whether a block device can eject media.

2. bdrv_info() is monitor command "info block".  QMP documentation
   says "true if the device is removable, false otherwise".  From the
   monitor user's point of view, the only sensible interpretation of
   "is removable" is "can eject media with monitor commands eject and
   change".

A block device can eject media unless a device is attached that
doesn't support it.  Switch the two users over to new
bdrv_dev_has_removable_media() that returns exactly that.

3. bdrv_getlength() uses to suppress its length cache when media can
   change (see commit 46a4e4e6).  Media change is either monitor
   command change (updates the length cache), monitor command eject
   (doesn't update the length cache, easily fixable), or physical
   media change (invalidates length cache, not so easily fixable).

I'm refraining from improving anything here, because this series is
long enough already.  Instead, I simply switch it over to
bdrv_dev_has_removable_media() as well.

This changes the behavior of the length cache and of monitor commands
eject and change in two cases:

a. drive not created removable, no device attached

   The commit makes the drive removable, and defeats the length cache.

   Example: -drive if=none

b. drive created removable, but the attached drive is non-removable,
   and doesn't call bdrv_set_removable(..., 0) (most devices don't)

   The commit makes the drive non-removable, and enables the length
   cache.

   Example: -drive if=xen,media=cdrom -M xenpv

   The other non-removable devices that don't call
   bdrv_set_removable() can't currently use a drive created removable,
   either because they aren't qdevified, or because they lack a drive
   property.  Won't stay that way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:21 +02:00
Markus Armbruster
025e849a50 block: Rename bdrv_set_locked() to bdrv_lock_medium()
While there, make the locked parameter bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
f107639a6f block: Drop medium lock tracking, ask device models instead
Requires new BlockDevOps member is_medium_locked().  Implement for IDE
and SCSI CD-ROMs.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
fdec4404dd block: Leave enforcing tray lock to device models
The device model knows best when to accept the guest's eject command.
No need to detour through the block layer.

bdrv_eject() can't fail anymore.  Make it void.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
22cf56c4d8 block: Drop tray status tracking, no longer used
Commit 4be9762a is now completely redone.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
a1aff5bf67 block: Revert entanglement of bdrv_is_inserted() with tray status
Commit 4be9762a changed bdrv_is_inserted() to fail when the tray is
open.  Unfortunately, there are two different kinds of users, with
conflicting needs.

1. Device models using bdrv_eject(), currently ide-cd and scsi-cd.
They expect bdrv_is_inserted() to reflect the tray status.  Commit
4be9762a makes them happy.

2. Code that wants to know whether a BlockDriverState has media, such
as find_image_format(), bdrv_flush_all().  Commit 4be9762a makes them
unhappy.  In particular, it breaks flush on VM stop for media ejected
by the guest.

Revert the change to bdrv_is_inserted().  Check the tray status in the
device models instead.

Note on IDE: Since only ATAPI devices have a tray, and they don't
accept ATA commands since the recent commit "ide: Reject ATA commands
specific to drive kinds", checking in atapi.c suffices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-12 15:17:20 +02:00
Markus Armbruster
07b70bfbb3 savevm: Include writable devices with removable media
savevm and loadvm silently ignore block devices with removable media,
such as floppies and SD cards.  Rolling back a VM to a previous
checkpoint will *not* roll back writes to block devices with removable
media.

Moreover, bdrv_is_removable() is a confused mess, and wrong in at
least one case: it considers "-drive if=xen,media=cdrom -M xenpv"
removable.  It'll be cleaned up later in this series.

Read-only block devices are also ignored, but that's okay.

Fix by ignoring only read-only block devices and empty block devices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:07 +02:00
Markus Armbruster
c602a489f9 block: Clean up bdrv_flush_all()
Change (!bdrv_is_removable(bs) || bdrv_is_inserted(bs)) to just
bdrv_is_inserted().  Rationale:

    The value of bdrv_is_removable(bs) matters only when
    bdrv_is_inserted(bs) is false.

    bdrv_is_inserted(bs) is true when bs is open (bs->drv != NULL) and
    not an empty host drive (CD-ROM or floppy).

    Therefore, bdrv_is_removable(bs) matters only when:

    1. bs is not open
       old: may call bdrv_flush(bs), which does nothing
       new: won't call

    2. bs is an empty host drive
       old: may call bdrv_flush(bs), which calls driver method
            raw_flush(), which calls fdatasync() or equivalent, which
            can't do anything useful while the drive is empty
       new: won't call

Result is bs->drv && !bdrv_is_read_only(bs) && bdrv_is_inserted(bs).
bdrv_is_inserted(bs) implies bs->drv.  Drop the redundant test.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:07 +02:00
Markus Armbruster
8e49ca4624 block: Leave tracking media change to device models
hw/fdc.c is the only one that cares.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:24:06 +02:00
Markus Armbruster
145feb176f block: Split change_cb() into change_media_cb(), resize_cb()
Multiplexing callbacks complicates matters needlessly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster
0e49de5232 block: Generalize change_cb() to BlockDevOps
So we can more easily add device model callbacks.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Markus Armbruster
fa879d62eb block: Attach non-qdev devices as well
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.

While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Stefan Weil
541dc0d47f Use new macro QEMU_PACKED for packed structures
Most changes were made using these commands:

git grep -la '__attribute__((packed))'|xargs perl -pi -e 's/__attribute__\(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__((__packed__))'|xargs perl -pi -e 's/__attribute__\(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((__packed__))'|xargs perl -pi -e 's/__attribute__ \(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute((packed))'|xargs perl -pi -e 's/__attribute\(\(packed\)\)/QEMU_PACKED/'

Whitespace in linux-user/syscall_defs.h was fixed manually
to avoid warnings from scripts/checkpatch.pl.

Manual changes were also applied to hw/pc.c.

I did not fix indentation with tabs in block/vvfat.c.
The patch will show 4 errors with scripts/checkpatch.pl.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-09-03 10:45:59 +00:00
Christoph Hellwig
c488c7f649 block: latency accounting
Account the total latency for read/write/flush requests.  This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-26 18:18:38 +02:00
Christoph Hellwig
a597e79ce1 block: explicit I/O accounting
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest originating I/O
 - we do not double count I/O ops if the device model handles it
   chunk wise
 - we only account I/O once it actuall is done
 - can extent I/O accounting to synchronous or coroutine I/O easily
 - implement I/O latency tracking easily (see the next patch)

I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet.  Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 18:18:42 +02:00
Christoph Hellwig
e8045d6726 block: include flush requests in info blockstats
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Stefan Hajnoczi
92196b2f56 block: add cache=directsync parameter to -drive
This patch adds -drive cache=directsync for O_DIRECT | O_SYNC host file
I/O with no disk write cache presented to the guest.

This mode is useful when guests may not be sending flushes when
appropriate and therefore leave data at risk in case of power failure.
When cache=directsync is used, write operations are only completed to
the guest when data is safely on disk.

This new mode is like cache=writethrough but it bypasses the host page
cache.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Stefan Hajnoczi
c3993cdca3 block: parse cache mode flags in a single place
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode.  Previously this was duplicated in blockdev.c and
qemu-img.c.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 14:15:17 +02:00
Robert Wang
d62b5dea30 fix code format
Fix code format to make checkpatch.pl happy.

Signed-off-by: Robert Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-22 10:17:52 -05:00
Anthony Liguori
7267c0947d Use glib memory allocation and free functions
qemu_malloc/qemu_free no longer exist after this commit.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20 23:01:08 -05:00
Kevin Wolf
e7a8a7837a block: Use bdrv_co_* instead of synchronous versions in coroutines
If we're already in a coroutine, there is no reason to use the synchronous
version of block layer functions when a coroutine one exists. This makes
bdrv_read/write/flush use bdrv_co_* when used inside a coroutine.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-04 11:27:15 +02:00
Kevin Wolf
384acbf46b async: Remove AsyncContext
The purpose of AsyncContexts was to protect qcow and qcow2 against reentrancy
during an emulated bdrv_read/write (which includes a qemu_aio_wait() call and
can run AIO callbacks of different requests if it weren't for AsyncContexts).

Now both qcow and qcow2 are protected by CoMutexes and AsyncContexts can be
removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:41 +02:00
Kevin Wolf
f9f05dc58c block: Add bdrv_co_readv/writev emulation
In order to be able to call bdrv_co_readv/writev for drivers that don't
implement the functions natively, add an emulation that uses the AIO functions
to implement them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Kevin Wolf
6848542018 block: Emulate AIO functions with bdrv_co_readv/writev
Use the bdrv_co_readv/writev callbacks to implement bdrv_aio_readv/writev and
bdrv_read/write if a driver provides the coroutine version instead of the
synchronous or AIO version.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Kevin Wolf
da1fa91d6c block: Add bdrv_co_readv/writev
Add new block driver callbacks bdrv_co_readv/writev, which work on a
QEMUIOVector like bdrv_aio_*, but don't need a callback. The function may only
be called inside a coroutine, so a block driver implementing this interface can
yield instead of blocking during I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Frediano Ziglio
5bf3f8e4f7 block: Removed unused function bdrv_write_sync
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:29 +02:00
Markus Armbruster
49aa46bb4b block: Don't let locked flag prevent medium load
Commit aea2a33c made bdrv_eject() obey the locked flag.  Correct for
medium eject (eject_flag set), incorrect for medium load (eject_flag
clear).  See MMC-5 Table 341 "Actions for Lock/Unlock/Eject".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster
822e1cd17e block: Make BlockDriver method bdrv_eject() return void
Callees always return 0, except for FreeBSD's cdrom_eject(), which
returns -ENOTSUP when the device is in a terminally wedged state.

The only caller is bdrv_eject(), and it maps -ENOTSUP to 0 since
commit 4be9762a.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster
a19712b0db block: Reset device model callbacks on detach
BlockDriverState members change_cb and change_opaque are initially
null.  The device model may set them, with bdrv_set_change_cb().  If
the device model gets detached (hot unplug), they're left dangling.
Only safe because device hot unplug automatically destroys the
BlockDriverState.  But that's a questionable feature, best not to rely
on it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:09:11 +02:00
Fam Zheng
4a1d5e1fde block: add bdrv_get_allocated_file_size() operation
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.

The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:08 +02:00
Kevin Wolf
d220894e02 bdrv_img_create: Fix segfault
Block drivers that don't support creating images don't have a size option. Fail
gracefully instead of segfaulting when trying to access the option's value.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08 11:56:40 +02:00
Christoph Hellwig
a659979328 block: clarify the meaning of BDRV_O_NOCACHE
Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics.  All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-08 10:39:32 +02:00
Markus Armbruster
8d278467ff block: Remove type hint, it's guest matter, doesn't belong here
No users of bdrv_get_type_hint() left.  bdrv_set_type_hint() can make
the media removable by side effect.  Make that explicit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:23 +02:00
Markus Armbruster
d8aeeb31d5 block QMP: Deprecate query-block's "type", drop info block's "type="
query-block's specification documents response member "type" with
values "hd", "cdrom", "floppy", "unknown".

Its value is unreliable: a block device used as floppy has type
"floppy" if created with if=floppy, but type "hd" if created with
if=none.

That's because with if=none, the type is at best a declaration of
intent: the drive can be connected to any guest device.  Its type is
really the guest device's business.  Reporting it here is wrong.

No known user of QMP uses "type".  It's unlikely that any unknown
users exist, because its value is useless unless you know how the
block device was created.  But then you also know the true value.

Fixing the broken value risks breaking (hypothetical!) clients that
somehow rely on the current behavior.  Not fixing the value risks
breaking (hypothetical!) clients that rely on the value to be
accurate.  Can't entirely avoid hypothetical lossage.  Change the
value to be always "unknown".

This makes "info block" always report "type=unknown".  Pointless.
Change it to not report the type.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:19 +02:00
Stefan Weil
a1c7273b82 Fix typos in comments and code (occured -> occurred and related)
The code changed here is an unused data type name (evt_flush_occurred).

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-08 10:02:18 +01:00
Stefan Weil
ebabb67a17 Fix typo in code and comments
Replace writeable -> writable

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-05-06 08:19:25 +01:00
Stefan Hajnoczi
46a4e4e608 block: Do not cache device size for removable media
The block layer caches the device size to avoid doing lseek(fd, 0,
SEEK_END) every time this value is needed.  For removable media the
device size becomes stale if a new medium is inserted.  This patch
simply prevents device size caching for removable media.

A smarter solution is to update the cached device size when a new medium
is inserted.  Given that there are currently bugs with CD-ROM media
change I do not want to implement that approach until we've gotten
things correct first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Stefan Hajnoczi
b8c6d09589 trace: Trace bdrv_set_locked()
It can be handy to know when the guest locks/unlocks the CD-ROM tray.
This trace event makes that possible.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Ryan Harper
d22b2f41c4 Do not delete BlockDriverState when deleting the drive
When removing a drive from the host-side via drive_del we currently have
the following path:

drive_del
qemu_aio_flush()
bdrv_close()    // zaps bs->drv, which makes any subsequent I/O get
                // dropped.  Works as designed
drive_uninit()
bdrv_delete()   // frees the bs.  Since the device is still connected to
                // bs, any subsequent I/O is a use-after-free.

The value of bs->drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.

To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer.  Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices.  This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().

The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.

We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives.  This
also allows for removing Drives with no devices associated either.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-04-07 13:51:47 +02:00
Ryan Harper
301db7c2dd Don't allow multiwrites against a block device without underlying medium
If the block device has been closed, we no longer have a medium to submit
IO against, check for this before submitting io.  This prevents a segfault
further in the code where we dereference elements of the block driver.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-03-15 13:21:14 +01:00
Stefan Hajnoczi
a13aac04e1 trace: Trace bdrv_aio_flush()
Add a trace event for bdrv_aio_flush() to complement the existing
bdrv_aio_readv() and bdrv_aio_writev() events.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-03-07 15:34:42 +00:00
Blue Swirl
5bbdbb4676 fdc: move floppy geometry guessing to block.c
Other geometry guessing functions already reside in block.c.

Remove some unused or debugging only fields.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-02-20 09:33:17 +00:00
Marcelo Tosatti
8591675f44 block: enable in_use flag
Set block device in use during block migration, disallow drive_del and
bdrv_truncate for in use devices.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Marcelo Tosatti
db593f2565 Add flag to indicate external users to block device
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.

Add a flag to indicate that.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Christoph Hellwig
db97ee6a97 block: tell drivers about an image resize
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Stefan Hajnoczi
96df67d1c3 block: Use backing format driver during image creation
The backing format should be honored during image creation.  For some
reason we currently use the image format to open the backing file.  This
fails when the backing file has a different format than the image being
created.  Keep the image and backing format drivers completely separate.

Also print the backing filename if there is an error opening the backing
file instead of the image filename.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24 16:49:50 +01:00
Blue Swirl
71df0eeb98 block: delete a write-only variable
Avoid a warning with GCC 4.6.0:
/src/qemu/block.c: In function 'bdrv_img_create':
/src/qemu/block.c:2862:25: error: variable 'fmt' set but not used [-Werror=unused-but-set-variable]

CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2011-01-06 18:25:37 +00:00
Christoph Hellwig
bb8bf76fb1 block: add discard support
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
4f70f249ca bdrv_img_create() use proper errno return values
Kevin suggested to have bdrv_img_create() return proper -errno values
on error.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
792da93a63 Prevent creating an image with the same filename as backing file
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
f88e1a4201 qemu-img.c: Re-factor img_create()
This patch re-factors img_create() moving the code doing the actual
work into block.c where it can be shared with QEMU. This is needed to
be able to create images from QEMU to be used for live snapshots.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Stefan Hajnoczi
df2dbb4a50 block: Fix the use of protocols in backing files
Backing filenames may contain a protocol.  The code currently doesn't
consider this case and produces filenames that embed "<protocol>:".
Don't combine filenames if the backing filename contains a protocol.

Based on an earlier patch by Anthony Liguori <aliguori@us.ibm.com>.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:10:59 +01:00
Stefan Hajnoczi
9e0b22f4f2 block: Introduce path_has_protocol() function
The bdrv_find_protocol() function returns NULL if an unknown protocol
name is given.  It returns the "file" protocol when the filename
contains no protocol at all.  This makes it difficult to distinguish
between paths which contain a protocol and those which do not.

Factor out a helper function that tests whether or not a filename has a
protocol.  The next patch makes use of this function.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:10:59 +01:00
Stefan Hajnoczi
16905d7175 block: Make bdrv_create_file() ':' handling consistent
Filenames may start with "<protocol>:" to explicitly use a protocol like
nbd.  Filenames with unknown protocols are rejected in most of QEMU
except for bdrv_create_file().  Even if a file with an invalid filename
can be created, QEMU cannot use it since all the other relevant
functions reject such paths.  Make bdrv_create_file() consistent.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-14 15:44:21 +01:00
Marcelo Tosatti
4dcafbb1eb block: set sector dirty on AIO write completion
Sectors are marked dirty in the bitmap on AIO submission. This is wrong
since data has not reached storage.

Set a given sector as dirty in the dirty bitmap on AIO completion, so that
reading a sector marked as dirty is guaranteed to return uptodate data.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-11-21 09:16:56 -06:00
Marcelo Tosatti
6d59fec11e block: fix shift in dirty bitmap calculation
Otherwise upper 32 bits of bitmap entries are not correctly calculated.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-11-21 09:16:56 -06:00
Kevin Wolf
205ef7961f block: Allow bdrv_flush to return errors
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 12:52:16 +01:00
edison
51ef67270b Copy snapshots out of QCOW2 disk
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.

Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp

Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Stefan Hajnoczi
bbf0a44081 trace: Trace bdrv_aio_{readv,writev}
Observing block layer aio readv/writev operations is useful for
debugging image formats or understanding guest disk I/O patterns.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-10-09 08:17:03 +00:00
Stefan Hajnoczi
6d519a5f95 trace: Trace virtio-blk, multiwrite, and paio_submit
This patch adds trace events that make it possible to observe
virtio-blk.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-09-09 16:22:45 -05:00
Anthony Liguori
8b33d9eeba Revert "Make default invocation of block drivers safer (v3)"
This reverts commit 79368c81bf.

Conflicts:

	block.c

I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-08 17:09:15 -05:00
Kevin Wolf
ee1811965f block: Fix image re-open in bdrv_commit
Arguably we should re-open the backing file with the backing file format and
not with the format of the snapshot image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-30 18:29:22 +02:00
Markus Armbruster
4be9762adb block: Change bdrv_eject() not to drop the image
bdrv_eject() gets called when a device model opens or closes the tray.

If the block driver implements method bdrv_eject(), that method gets
called.  Drivers host_cdrom implements it, and it opens and closes the
physical tray, and nothing else.  When a device model opens, then
closes the tray, media changes only if the user actively changes the
physical media while the tray is open.  This is matches how physical
hardware behaves.

If the block driver doesn't implement method bdrv_eject(), we do
something quite different: opening the tray severs the connection to
the image by calling bdrv_close(), and closing the tray does nothing.
When the device model opens, then closes the tray, media is gone,
unless the user actively inserts another one while the tray is open,
with a suitable change command in the monitor.  This isn't how
physical hardware behaves.  Rather inconvenient when programs
"helpfully" eject media to give you a chance to change it.  The way
bdrv_eject() behaves here turns that chance into a must, which is not
what these programs or their users expect.

Change the default action not to call bdrv_close().  Instead, note the
tray status in new BlockDriverState member tray_open.  Use it in
bdrv_is_inserted().

Arguably, the device models should keep track of tray status
themselves.  But this is less invasive.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Kevin Wolf
336c1c1255 block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Kevin Wolf
8a4266144e block: Change bdrv_commit to handle multiple sectors at once
bdrv_commit copies the image to its backing file sector by sector, which
is (surprise!) relatively slow. Let's take a larger buffer and handle more
sectors at once if possible.

With a 1G qcow2 file, this brought the time bdrv_commit takes down from
5:06 min to 1:14 min for me.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Blue Swirl
199630b62e Fix -snapshot deleting images on disk change
Block device change command did not copy BDRV_O_SNAPSHOT flag. Thus
the new image did not have this flag and the file got deleted during
opening.

Fix by copying BDRV_O_SNAPSHOT flag.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-26 13:39:40 +02:00
Stefan Weil
c98ac35d87 block: Use error codes from lower levels for error message
"No such file or directory" is a misleading error message
when a user tries to open a file with wrong permissions.

Cc: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-26 13:39:40 +02:00
Anthony Liguori
79368c81bf Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-15 08:17:06 -05:00
Kevin Wolf
9ac228e02c qcow2/vdi: Change check to distinguish error cases
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Kevin Wolf
e076f3383b qemu-img check: Distinguish different kinds of errors
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:48 +02:00
Kevin Wolf
de189a1b4a block: Handle multiwrite errors only when all requests have completed
Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:

* The temporary data that is freed (qiov, possibly zero buffer) is still used
  by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
  that it may free buffers etc. which are still in use.

Just remember the error value and do the cleanup when all requests have
completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 15:44:12 +02:00
Kevin Wolf
453f9a1652 block: Fix early failure in multiwrite
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 15:44:12 +02:00
Markus Armbruster
7d0d69509a block: Fix virtual media change for if=none
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed.  It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.

The type hint is only set by drive_init().  It sets BDRV_TYPE_FLOPPY
for if=floppy.  It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.

if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.

For the same reason, if=none works when it's used by ide-drive or
scsi-disk.  For other guest devices, there are problems:

* fdc: you can't change virtual media

    $ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
    QEMU 0.12.50 monitor - type 'help' for more information
    (qemu) eject foo
    Device 'foo' is not removable

  unless you add media=cdrom, but that makes it readonly.

* virtio: if you add media=cdrom, you can change virtual media.  If
  you eject, the guest gets I/O errors.  If you change, the guest sees
  the drive's contents suddenly change.

* scsi-generic: if you add media=cdrom, you can change virtual media.
  I didn't test what that does to the guest or the physical device,
  but it can't be pretty.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
3ac906f771 block: Clean up bdrv_snapshots()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
f9092b108f savevm: Survive hot-unplug of snapshot device
savevm.c keeps a pointer to the snapshot block device.  If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn.  Unplugging a guest device that uses it
does the trick:

    $ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
    QEMU 0.12.50 monitor - type 'help' for more information
    (qemu) info snapshots
    No available block device supports snapshots
    (qemu) drive_add auto if=none,file=tmp.qcow2
    OK
    (qemu) device_add usb-storage,id=foo,drive=none1
    (qemu) info snapshots
    Snapshot devices: none1
    Snapshot list (from none1):
    ID        TAG                 VM SIZE                DATE       VM CLOCK
    (qemu) device_del foo
    (qemu) info snapshots
    Snapshot devices:
    Segmentation fault (core dumped)

Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
18846dee1a block: Catch attempt to attach multiple devices to a blockdev
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.

Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place.  Detach before the second attach
there.

Also catch attempt to delete while a guest device model is attached.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Ryan Harper
15c7733bb2 Don't reset bs->is_temporary in bdrv_open_common
To fix https://bugs.launchpad.net/qemu/+bug/597402 where qemu fails to
call unlink() on temporary snapshots due to bs->is_temporary getting clobbered
in bdrv_open_common() after being set in bdrv_open() which calls the former.

We don't need to initialize bs->is_temporary in bdrv_open_common().

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Christoph Hellwig
39508e7adb block: allow filenames with colons again for host devices
Before the raw/file split we used to allow filenames with colons for host
device only.  While this was more by accident than by design people rely
on it, so we need to bring it back.

So move the host device probing to be before the protocol detection
again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:01 +02:00
Kevin Wolf
f08145fe16 block: Add bdrv_(p)write_sync
Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes is what is
implemented in this patch - we can try to change that later)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:02 +02:00
Blue Swirl
5ffbbc67b5 block: fix a warning and possible truncation
Fix a warning from OpenBSD gcc (3.3.5 (propolice)):
/src/qemu/block.c: In function `bdrv_info_stats_bs':
/src/qemu/block.c:1548: warning: long long int format, long unsigned
int arg (arg 6)

There may be also truncation effects.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:42:30 +02:00
Markus Armbruster
2f399b0aad block: New bdrv_next()
This is a more flexible alternative to bdrv_iterate().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Markus Armbruster
6ab4b5ab8f block: Decouple block device "commit all" from DriveInfo
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init().  This misses host block devices defined by
other means.  Such means don't exist now, but will be introduced later
in this series.

Change them to use new bdrv_commit_all(), which iterates over all host
block devices.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Markus Armbruster
abd7f68d08 block: Move error actions from DriveInfo to BlockDriverState
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Miguel Di Ciurcio Filho
feeee5aca7 savevm: Really verify if a drive supports snapshots
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.

First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.

Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.

So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().

static BlockDriverState *get_bs_snapshots(void)
{
    BlockDriverState *bs;
    DriveInfo *dinfo;

    if (bs_snapshots)
        return bs_snapshots;
    QTAILQ_FOREACH(dinfo, &drives, next) {
        bs = dinfo->bdrv;
        if (bdrv_can_snapshot(bs))
            goto ok;
    }
    return NULL;
 ok:
    bs_snapshots = bs;
    return bs;
}

bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.

Later on in do_savevm(), we find:

    QTAILQ_FOREACH(dinfo, &drives, next) {
        bs1 = dinfo->bdrv;
        if (bdrv_has_snapshot(bs1)) {
            /* Write VM state size only to the image that contains the state */
            sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
            ret = bdrv_snapshot_create(bs1, sn);
            if (ret < 0) {
                monitor_printf(mon, "Error while creating snapshot on '%s'\n",
                               bdrv_get_device_name(bs1));
            }
        }
    }

bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.

So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.

This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.

bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.

The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.

Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:58 +02:00
MORITA Kazutaka
7cdb1f6d30 block: call the snapshot handlers of the protocol drivers
When snapshot handlers are not defined in the format driver, it is
better to call the ones of the protocol driver.  This enables us to
implement snapshot support in the protocol driver.

We need to call bdrv_close() and bdrv_open() handlers of the format
driver before and after bdrv_snapshot_goto() call of the protocol.  It is
because the contents of the block driver state may need to be changed
after loading vmstate.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
MORITA Kazutaka
2bc93fed76 close all the block drivers before the qemu process exits
This patch calls the close handler of the block driver before the qemu
process exits.

This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
Kevin Wolf
08a00559f0 block: Assume raw for drives without media
qemu -cdrom /dev/cdrom with an empty CD-ROM drive doesn't work any more because
we try to guess the format and when this fails (because there is no medium) we
exit with an error message.

This patch should restore the old behaviour by assuming raw format for such
drives.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:40 +02:00
Jes Sorensen
eb5a316514 Cleanup: Be consistent and use BDRV_SECTOR_SIZE instead of 512
Clean up block.c and use BDRV_SECTOR_SIZE rather than hard coded
numbers (512) when referring to sector size throughout the code.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:39 +02:00
Jes Sorensen
3e82990b52 Cleanup: bdrv_open() no need to shift total_size just to shift back.
In bdrv_open() there is no need to shift total_size >> 9 just to
multiply it by 512 again just a few lines later, since this is the
only place the variable is used.

Mask with BDRV_SECTOR_MASK to protect against case where we are
passed a corrupted image.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-04 11:43:39 +02:00
Anthony Liguori
358c360feb Merge remote branch 'kwolf/for-anthony' into staging 2010-06-03 14:55:49 -05:00
Luiz Capitulino
637503d122 Monitor: Drop QMP documentation from code
Previous commit added QMP documentation to the qemu-monitor.hx
file, it's is a copy of this information.

While it's good to keep it near code, maintaining two copies of
the same information is too hard and has little benefit as we
don't expect client writers to consult the code to find how to
use a QMP command.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-06-01 13:48:43 -05:00
Nicholas A. Bellinger
1a39685910 block: Add missing bdrv_delete() for SG_IO BlockDriver in find_image_format()
This patch adds a missing bdrv_delete() call in find_image_format() so that a
SG_IO BlockDriver properly releases the temporary BlockDriverState *bs created
from bdrv_file_open()

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reported-by: Chris Krumme <chris.krumme@windriver.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:29:17 +02:00
MORITA Kazutaka
b50cbabc1b add support for protocol driver create_options
This patch enables protocol drivers to use their create options which
are not supported by the format.  For example, protcol drivers can use
a backing_file option with raw format.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Kevin Wolf
cbf1dff2f1 block: Fix multiwrite with overlapping requests
With overlapping requests, the total number of sectors is smaller than the sum
of the nb_sectors of both requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-28 13:14:25 +02:00
Alexander Graf
016f5cf6ff Add cache=unsafe parameter to -drive
Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.

So let's add a new cache value to -drive that allows us to set the cache
policy to most aggressive, disabling flushes. We call this mode "unsafe",
as guest data is not guaranteed to survive host crashes anymore.

This patch also adds a noop function for aio, so we can do nothing in AIO
fashion.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-05-26 20:05:14 +02:00
Nicholas Bellinger
396759ad4a block: Add SG_IO device check in refresh_total_sectors()
This patch adds a special case check for scsi-generic devices in
refresh_total_sectors() to skip the subsequent BlockDriver->bdrv_getlength()
that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for
BlockDriverState->sg=1 devices.

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Nicholas Bellinger
f8ea0b00e0 block: Make find_image_format() return 'raw' BlockDriver for SG_IO devices
This patch adds a special BlockDriverState->sg check in block.c:find_image_format()
after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine
if we are dealing with a Linux host scsi-generic device.

The patch then returns the BlockDriver * from bdrv_find_format("raw"), skipping the
subsequent bdrv_read() and rest of find_image_format().

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Christoph Hellwig
77be4366ba block: fix sector comparism in multiwrite_req_compare
The difference between the start sectors of two requests can be larger
than the size of the "int" type, which can lead to a not correctly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.

So instead of doing the cute sector arithmetics trick spell out the
exact comparisms.

Spotted by Kevin Wolf based on a testcase from Michael Tokarev.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Kevin Wolf
35ed5de6be block: Remove special case for vvfat
The special case doesn't really us buy anything. Without it vvfat works more
consistently as a protocol. We get raw on top of vvfat now, which works just
as well as using vvfat directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Daniel P. Berrange
21955137ee Fix docs for block stats monitor command
The 'parent' field in the 'query-blockstats' monitor command is
part of the top level block device QDict, not part of the 2nd
level 'stats' QDict.

* block.c: Fix docs for 'parent' field in block stats monitor
  command output

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Bruce Rogers
af474591e5 use qemu_free() instead of free()
There is a call to free() where qemu_free() should instead be used.

Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
c33491978c block: Fix bdrv_commit
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
209930818b block: Fix protocol detection for Windows devices
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
b666d23950 block: Avoid unchecked casts for AIOCBs
Use container_of for one direction and &acb->common for the other one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Jan Kiszka
d748768c09 block: Release allocated options after bdrv_open
They aren't used afterwards nor supposed to be stored by a bdrv_create
handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Kevin Wolf
294cc35f3d block: Add wr_highest_sector blockstat
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.

The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
51762288b4 block: Cache total_sectors to reduce bdrv_getlength calls
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device.  Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.

Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead.  This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.

Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
557df6aca2 block: Set backing_hd to NULL after deleting it
It is safer to set backing_hd to NULL after deleting it so that any use
after deletion is obvious during development.  Happy segfaulting!

This patch should be applied after Kevin Wolf's "vmdk: Convert to
bdrv_open" so that vmdk does not segfault on close.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:31 +02:00
Kevin Wolf
f2feebbd93 block: bdrv_has_zero_init
This fixes the problem that qemu-img's use of no_zero_init only considered the
no_zero_init flag of the format driver, but not of the underlying protocols.

Between the raw/file split and this fix, converting to host devices is broken.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
66f82ceed6 block: Open the underlying image file in generic code
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.

For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
5791533251 block: Avoid forward declaration of bdrv_open_common
Move bdrv_open_common so it's defined before its callers and remove the forward
declaration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
b6ce07aa83 block: Split bdrv_open
bdrv_open contains quite some code that is only useful for opening images (as
opposed to opening files by a protocol), for example snapshots.

This patch splits the code so that we have bdrv_open_file() for files (uses
protocols), bdrv_open() for images (uses format drivers) and bdrv_open_common()
for the code common for opening both images and files.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Christoph Hellwig
84a12e6648 block: separate raw images from the file protocol
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.

This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.

The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.

For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.

The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation.  Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img.  I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.

There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.

Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.

Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Stefan Hajnoczi
1e1ea48d42 block: Free iovec arrays allocated by multiwrite_merge()
A new iovec array is allocated when creating a merged write request.
This patch ensures that the iovec array is deleted in addition to its
qiov owner.

Reported-by: Leszek Urbanski <tygrys@moo.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:58 +02:00
Stefan Hajnoczi
8a22f02a88 block: Convert first_drv to QLIST
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
1b7bdbc13c block: Convert bdrv_first to QTAILQ
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
b66460e4e9 block: Do not export bdrv_first
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list.  However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Christoph Hellwig
6db956039d block: get rid of the BDRV_O_FILE flag
BDRV_O_FILE is only used to communicate between bdrv_file_open and bdrv_open.
It affects two things:  first bdrv_open only searches for protocols using
find_protocol instead of all image formats and host drivers.  We can easily
move that to the caller and pass the found driver to bdrv_open.  Second
it is used to not force a read-write open of a snapshot file.  But we never
use bdrv_file_open to open snapshots and this behaviour doesn't make sense
to start with.

qemu-io abused the BDRV_O_FILE for it's growable option, switch it to
using bdrv_file_open to make sure we only open files as growable were
we can actually support that.

This patch requires Kevin's "[PATCH] Replace calls of old bdrv_open" to
be applied first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
d6e9098e10 Replace calls of old bdrv_open
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
8b9b0cc2fd blkdebug: Add events and rules
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.

Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
7eb58a6c55 block: Fix multiwrite memory leak in error case
Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.

Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:35 +02:00
Kevin Wolf
0f0b604b00 block: Fix error code in multiwrite for immediate failures
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:33 +02:00
Kevin Wolf
cb6d3ca07b block: Fix multiwrite error handling
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:14:23 +02:00
Shahar Havivi
fd04a2aeda Wrong error message in block_passwd command
Signed-off-by: Shahar Havivi <shaharh@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-03-17 10:41:38 -05:00
Naphtali Sprei
4dca4b639c block: more read-only changes, related to backing files
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
  If upgrade fail, back to read-only. If also fail, "disconnect" the drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:32:15 -06:00
Luiz Capitulino
ba14414174 Monitor: remove unneeded checks
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 13:46:17 -06:00
Christoph Hellwig
15dc2697a5 block: saner flags filtering in bdrv_open2
Clean up the current mess about figuring out which flags to pass to the
driver.  BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Luiz Capitulino
2582bfedd2 block: BLOCK_IO_ERROR QMP event
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.

The following information is currently provided in the event:

- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")

Event example:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Liran Schour
aaa0eb75e2 Count dirty blocks and expose an API to get dirty count
This will manage dirty counter for each device and will allow to get the
dirty counter from above.

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-09 16:56:14 -06:00
Christoph Hellwig
e2a305fb13 block: avoid creating too large iovecs in multiwrite_merge
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Add a MAX_IOV defintion for platforms that don't have it.  For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 17:08:03 -06:00
Herve Poussineau
f8a83245d9 win32: pair qemu_memalign() with qemu_vfree()
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.

Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 16:41:06 -06:00