Commit Graph

192 Commits

Author SHA1 Message Date
Alexander Graf
016f5cf6ff Add cache=unsafe parameter to -drive
Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.

So let's add a new cache value to -drive that allows us to set the cache
policy to most aggressive, disabling flushes. We call this mode "unsafe",
as guest data is not guaranteed to survive host crashes anymore.

This patch also adds a noop function for aio, so we can do nothing in AIO
fashion.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-05-26 20:05:14 +02:00
Nicholas Bellinger
396759ad4a block: Add SG_IO device check in refresh_total_sectors()
This patch adds a special case check for scsi-generic devices in
refresh_total_sectors() to skip the subsequent BlockDriver->bdrv_getlength()
that will be returning -ESPIPE from block/raw-posic.c:raw_getlength() for
BlockDriverState->sg=1 devices.

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Nicholas Bellinger
f8ea0b00e0 block: Make find_image_format() return 'raw' BlockDriver for SG_IO devices
This patch adds a special BlockDriverState->sg check in block.c:find_image_format()
after bdrv_file_open() -> block/raw-posix.c:hdev_open() has been called to determine
if we are dealing with a Linux host scsi-generic device.

The patch then returns the BlockDriver * from bdrv_find_format("raw"), skipping the
subsequent bdrv_read() and rest of find_image_format().

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Christoph Hellwig
77be4366ba block: fix sector comparism in multiwrite_req_compare
The difference between the start sectors of two requests can be larger
than the size of the "int" type, which can lead to a not correctly
sorted multiwrite array and thus spurious I/O errors and filesystem
corruption due to incorrect request merges.

So instead of doing the cute sector arithmetics trick spell out the
exact comparisms.

Spotted by Kevin Wolf based on a testcase from Michael Tokarev.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-21 11:49:19 +02:00
Kevin Wolf
35ed5de6be block: Remove special case for vvfat
The special case doesn't really us buy anything. Without it vvfat works more
consistently as a protocol. We get raw on top of vvfat now, which works just
as well as using vvfat directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Daniel P. Berrange
21955137ee Fix docs for block stats monitor command
The 'parent' field in the 'query-blockstats' monitor command is
part of the top level block device QDict, not part of the 2nd
level 'stats' QDict.

* block.c: Fix docs for 'parent' field in block stats monitor
  command output

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Bruce Rogers
af474591e5 use qemu_free() instead of free()
There is a call to free() where qemu_free() should instead be used.

Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
c33491978c block: Fix bdrv_commit
When reopening the image, don't guess the driver, but use the same driver as
was used before. This is important if the format=... option was used for that
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
209930818b block: Fix protocol detection for Windows devices
We can't assume the file protocol for Windows devices, they need the same
detection as other files for which an explicit protocol is not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Kevin Wolf
b666d23950 block: Avoid unchecked casts for AIOCBs
Use container_of for one direction and &acb->common for the other one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-17 10:20:05 +02:00
Jan Kiszka
d748768c09 block: Release allocated options after bdrv_open
They aren't used afterwards nor supposed to be stored by a bdrv_create
handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Kevin Wolf
294cc35f3d block: Add wr_highest_sector blockstat
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.

The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
51762288b4 block: Cache total_sectors to reduce bdrv_getlength calls
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device.  Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.

Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead.  This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.

Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Stefan Hajnoczi
557df6aca2 block: Set backing_hd to NULL after deleting it
It is safer to set backing_hd to NULL after deleting it so that any use
after deletion is obvious during development.  Happy segfaulting!

This patch should be applied after Kevin Wolf's "vmdk: Convert to
bdrv_open" so that vmdk does not segfault on close.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:31 +02:00
Kevin Wolf
f2feebbd93 block: bdrv_has_zero_init
This fixes the problem that qemu-img's use of no_zero_init only considered the
no_zero_init flag of the format driver, but not of the underlying protocols.

Between the raw/file split and this fix, converting to host devices is broken.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
66f82ceed6 block: Open the underlying image file in generic code
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.

For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
5791533251 block: Avoid forward declaration of bdrv_open_common
Move bdrv_open_common so it's defined before its callers and remove the forward
declaration.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Kevin Wolf
b6ce07aa83 block: Split bdrv_open
bdrv_open contains quite some code that is only useful for opening images (as
opposed to opening files by a protocol), for example snapshots.

This patch splits the code so that we have bdrv_open_file() for files (uses
protocols), bdrv_open() for images (uses format drivers) and bdrv_open_common()
for the code common for opening both images and files.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Christoph Hellwig
84a12e6648 block: separate raw images from the file protocol
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.

This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.

The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.

For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.

The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation.  Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img.  I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.

There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.

Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.

Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Stefan Hajnoczi
1e1ea48d42 block: Free iovec arrays allocated by multiwrite_merge()
A new iovec array is allocated when creating a merged write request.
This patch ensures that the iovec array is deleted in addition to its
qiov owner.

Reported-by: Leszek Urbanski <tygrys@moo.pl>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:58 +02:00
Stefan Hajnoczi
8a22f02a88 block: Convert first_drv to QLIST
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
1b7bdbc13c block: Convert bdrv_first to QTAILQ
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
b66460e4e9 block: Do not export bdrv_first
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list.  However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Christoph Hellwig
6db956039d block: get rid of the BDRV_O_FILE flag
BDRV_O_FILE is only used to communicate between bdrv_file_open and bdrv_open.
It affects two things:  first bdrv_open only searches for protocols using
find_protocol instead of all image formats and host drivers.  We can easily
move that to the caller and pass the found driver to bdrv_open.  Second
it is used to not force a read-write open of a snapshot file.  But we never
use bdrv_file_open to open snapshots and this behaviour doesn't make sense
to start with.

qemu-io abused the BDRV_O_FILE for it's growable option, switch it to
using bdrv_file_open to make sure we only open files as growable were
we can actually support that.

This patch requires Kevin's "[PATCH] Replace calls of old bdrv_open" to
be applied first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
d6e9098e10 Replace calls of old bdrv_open
What is known today as bdrv_open2 becomes the new bdrv_open. All remaining
callers of the old function are converted to the new one. In some places they
even know the right format, so they should have used bdrv_open2 from the
beginning.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
8b9b0cc2fd blkdebug: Add events and rules
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.

Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Kevin Wolf
7eb58a6c55 block: Fix multiwrite memory leak in error case
Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.

Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:35 +02:00
Kevin Wolf
0f0b604b00 block: Fix error code in multiwrite for immediate failures
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:39:33 +02:00
Kevin Wolf
cb6d3ca07b block: Fix multiwrite error handling
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:14:23 +02:00
Shahar Havivi
fd04a2aeda Wrong error message in block_passwd command
Signed-off-by: Shahar Havivi <shaharh@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-03-17 10:41:38 -05:00
Naphtali Sprei
4dca4b639c block: more read-only changes, related to backing files
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
  If upgrade fail, back to read-only. If also fail, "disconnect" the drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:32:15 -06:00
Luiz Capitulino
ba14414174 Monitor: remove unneeded checks
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 13:46:17 -06:00
Christoph Hellwig
15dc2697a5 block: saner flags filtering in bdrv_open2
Clean up the current mess about figuring out which flags to pass to the
driver.  BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Luiz Capitulino
2582bfedd2 block: BLOCK_IO_ERROR QMP event
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.

The following information is currently provided in the event:

- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")

Event example:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Liran Schour
aaa0eb75e2 Count dirty blocks and expose an API to get dirty count
This will manage dirty counter for each device and will allow to get the
dirty counter from above.

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-09 16:56:14 -06:00
Christoph Hellwig
e2a305fb13 block: avoid creating too large iovecs in multiwrite_merge
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Add a MAX_IOV defintion for platforms that don't have it.  For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 17:08:03 -06:00
Herve Poussineau
f8a83245d9 win32: pair qemu_memalign() with qemu_vfree()
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.

Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 16:41:06 -06:00
Christoph Hellwig
6987307ca3 block: clean up bdrv_open2 structure a bit
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:02 -06:00
Naphtali Sprei
37226ad946 No need anymoe for bdrv_set_read_only
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:01 -06:00
Kevin Wolf
9a8c4cceaf block: Return original error codes in bdrv_pread/write
Don't assume -EIO but return the real error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Anthony Liguori
3e39789b64 Revert "block: prevent multiwrite_merge from creating too large iovecs"
This reverts commit 0076bc0c1d.

Kevin Wolf pointed out that this breaks the mingw32 build.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 10:12:23 -06:00
Christoph Hellwig
0076bc0c1d block: prevent multiwrite_merge from creating too large iovecs
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:40 -06:00
Christoph Hellwig
1d44952fc7 block: fix cache flushing in bdrv_commit
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:11 -06:00
Naphtali Sprei
03cbdac7ef Disable fall-back to read-only when cannot open drive's file for read-write
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Naphtali Sprei
f5edb014ed Clean-up a little bit the RW related bits of BDRV_O_FLAGS. BDRV_O_RDONLY gone (and so is BDRV_O_ACCESS). Default value for bdrv_flags (0/zero) is READ-ONLY. Need to explicitly request READ-WRITE.
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Christoph Hellwig
3f5075ae63 block: flush backing_hd in the right place
The backing device is only modified from bdrv_commit.  So instead of
flushing it every time bdrv_flush is called for the front-end device
only flush it after we're written data to it in bdrv_commit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
756e6736a1 block: Add bdrv_change_backing_file
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
b783e409bf block: Introduce BDRV_O_NO_BACKING
If an image references a backing file that doesn't exist, qemu-img info fails
to open this image. Exactly in this case the info would be valuable, though:
the user might want to find out which file is missing.

This patch introduces a BDRV_O_NO_BACKING flag to ignore the backing file when
opening the image. qemu-img info is the first user and provides info now even
if the backing file is invalid.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kirill A. Shutemov
114cdfa908 block.c: fix warning with _FORTIFY_SOURCE
CC    block.o
cc1: warnings being treated as errors
block.c: In function 'bdrv_open2':
block.c:400: error: ignoring return value of 'realpath', declared with attribute warn_unused_result

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-12-25 18:19:22 +00:00
Luiz Capitulino
218a536a7a block: Convert bdrv_info_stats() to QObject
Each device statistic information is stored in a QDict and
the returned QObject is a QList of all devices.

This commit should not change user output.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-12 07:59:49 -06:00