Commit Graph

134 Commits

Author SHA1 Message Date
Markus Armbruster
fa879d62eb block: Attach non-qdev devices as well
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.

While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-09-06 11:23:51 +02:00
Christoph Hellwig
c488c7f649 block: latency accounting
Account the total latency for read/write/flush requests.  This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-26 18:18:38 +02:00
Christoph Hellwig
a597e79ce1 block: explicit I/O accounting
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.

This means:
 - we do not count internal requests from image formats in addition
   to guest originating I/O
 - we do not double count I/O ops if the device model handles it
   chunk wise
 - we only account I/O once it actuall is done
 - can extent I/O accounting to synchronous or coroutine I/O easily
 - implement I/O latency tracking easily (see the next patch)

I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet.  Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-25 18:18:42 +02:00
Christoph Hellwig
e8045d6726 block: include flush requests in info blockstats
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-23 17:41:14 +02:00
Kevin Wolf
da1fa91d6c block: Add bdrv_co_readv/writev
Add new block driver callbacks bdrv_co_readv/writev, which work on a
QEMUIOVector like bdrv_aio_*, but don't need a callback. The function may only
be called inside a coroutine, so a block driver implementing this interface can
yield instead of blocking during I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-02 15:53:40 +02:00
Markus Armbruster
822e1cd17e block: Make BlockDriver method bdrv_eject() return void
Callees always return 0, except for FreeBSD's cdrom_eject(), which
returns -ENOTSUP when the device is in a terminally wedged state.

The only caller is bdrv_eject(), and it maps -ENOTSUP to 0 since
commit 4be9762a.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Markus Armbruster
7bf37feddc block: Make BlockDriver method bdrv_set_locked() return void
The only caller is bdrv_set_locked(), and it ignores the value.

Callees always return 0, except for FreeBSD's cdrom_set_locked(),
which returns -ENOTSUP when the device is in a terminally wedged
state.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-08-01 12:10:28 +02:00
Fam Zheng
4a1d5e1fde block: add bdrv_get_allocated_file_size() operation
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.

The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:08 +02:00
Fam Zheng
f66fd6c383 VMDK: create different subformats
Add create option 'format', with enums:
    monolithicSparse
    monolithicFlat
    twoGbMaxExtentSparse
    twoGbMaxExtentFlat
Each creates a subformat image file. The default is monolithicSparse.

Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-07-19 15:39:07 +02:00
Devin Nakamura
39aa9a12cc Replaced tabs with spaces in block.h and block_int.h
Signed-off-by: Devin Nakamura <devin122@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-06-15 14:36:15 +02:00
Markus Armbruster
8d278467ff block: Remove type hint, it's guest matter, doesn't belong here
No users of bdrv_get_type_hint() left.  bdrv_set_type_hint() can make
the media removable by side effect.  Make that explicit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-05-19 10:26:23 +02:00
Marcelo Tosatti
db593f2565 Add flag to indicate external users to block device
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.

Add a flag to indicate that.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-02-07 12:51:19 +01:00
Christoph Hellwig
db97ee6a97 block: tell drivers about an image resize
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31 10:03:00 +01:00
Stefan Hajnoczi
75411d236d qed: Add QEMU Enhanced Disk image format
This patch introduces the qed on-disk layout and implements image
creation.  Later patches add read/write and other functionality.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:04 +01:00
Christoph Hellwig
bb8bf76fb1 block: add discard support
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-17 16:11:03 +01:00
Jes Sorensen
eec77d9e71 qemu-img: Deprecate obsolete -6 and -e options
If -6 or -e is specified, an error message is printed and we exit. It
does not print help() to avoid the error message getting lost in the
noise.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-12-14 15:44:21 +01:00
Gleb Natapov
1ca4d09ae0 Add bootindex parameter to net/block/fd device
If bootindex is specified on command line a string that describes device
in firmware readable way is added into sorted list. Later this list will
be passed into firmware to control boot order.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-12-11 21:32:46 +00:00
Kevin Wolf
205ef7961f block: Allow bdrv_flush to return errors
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2010-11-04 12:52:16 +01:00
edison
51ef67270b Copy snapshots out of QCOW2 disk
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.

Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp

Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-10-22 14:49:35 +02:00
Anthony Liguori
8b33d9eeba Revert "Make default invocation of block drivers safer (v3)"
This reverts commit 79368c81bf.

Conflicts:

	block.c

I haven't been able to come up with a solution yet for the corruption caused by
unaligned requests from the IDE disk so revert until a solution can be written.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-09-08 17:09:15 -05:00
Markus Armbruster
4be9762adb block: Change bdrv_eject() not to drop the image
bdrv_eject() gets called when a device model opens or closes the tray.

If the block driver implements method bdrv_eject(), that method gets
called.  Drivers host_cdrom implements it, and it opens and closes the
physical tray, and nothing else.  When a device model opens, then
closes the tray, media changes only if the user actively changes the
physical media while the tray is open.  This is matches how physical
hardware behaves.

If the block driver doesn't implement method bdrv_eject(), we do
something quite different: opening the tray severs the connection to
the image by calling bdrv_close(), and closing the tray does nothing.
When the device model opens, then closes the tray, media is gone,
unless the user actively inserts another one while the tray is open,
with a suitable change command in the monitor.  This isn't how
physical hardware behaves.  Rather inconvenient when programs
"helpfully" eject media to give you a chance to change it.  The way
bdrv_eject() behaves here turns that chance into a must, which is not
what these programs or their users expect.

Change the default action not to call bdrv_close().  Instead, note the
tray status in new BlockDriverState member tray_open.  Use it in
bdrv_is_inserted().

Arguably, the device models should keep track of tray status
themselves.  But this is less invasive.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Kevin Wolf
336c1c1255 block: Fix bdrv_has_zero_init
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-08-03 15:57:22 +02:00
Christoph Hellwig
55459498b2 block: default to 0 minimal / optiomal I/O size
Currently we set them to 512 bytes unless manually specified.  Unforuntaly
some brain-dead partitioning tools create unaligned partitions if they
get low enough optiomal I/O size values, so don't report any at all
unless explicitly set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-26 13:39:39 +02:00
Anthony Liguori
79368c81bf Make default invocation of block drivers safer (v3)
CVE-2008-2004 described a vulnerability in QEMU whereas a malicious user could
trick the block probing code into accessing arbitrary files in a guest.  To
mitigate this, we added an explicit format parameter to -drive which disabling
block probing.

Fast forward to today, and the vast majority of users do not use this parameter.
libvirt does not use this by default nor does virt-manager.

Most users want block probing so we should try to make it safer.

This patch adds some logic to the raw device which attempts to detect a write
operation to the beginning of a raw device.  If the first 4 bytes happen to
match an image file that has a backing file that we support, it scrubs the
signature to all zeros.  If a user specifies an explicit format parameter, this
behavior is disabled.

I contend that while a legitimate guest could write such a signature to the
header, we would behave incorrectly anyway upon the next invocation of QEMU.
This simply changes the incorrect behavior to not involve a security
vulnerability.

I've tested this pretty extensively both in the positive and negative case.  I'm
not 100% confident in the block layer's ability to deal with zero sized writes
particularly with respect to the aio functions so some additional eyes would be
appreciated.

Even in the case of a single sector write, we have to make sure to invoked the
completion from a bottom half so just removing the zero sized write is not an
option.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-15 08:17:06 -05:00
Kevin Wolf
9ac228e02c qcow2/vdi: Change check to distinguish error cases
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-06 17:05:49 +02:00
Markus Armbruster
18846dee1a block: Catch attempt to attach multiple devices to a blockdev
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.

Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place.  Detach before the second attach
there.

Also catch attempt to delete while a guest device model is attached.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Markus Armbruster
f8b6cc0070 qdev: Decouple qdev_prop_drive from DriveInfo
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman.  This prepares the ground for block devices that don't have
a DriveInfo.

Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device.  I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.

Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=...  Use drive_get_by_blockdev() there.

Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-07-02 13:18:02 +02:00
Christoph Hellwig
1e297c3235 block: fix physical_block_size calculation
Both SCSI and virtio expect the physical block size relative to the
logical block size.  So get the factor first before calculating the
log2.

Reported-by: Mike Cao <bcao@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-22 14:38:01 +02:00
Markus Armbruster
abd7f68d08 block: Move error actions from DriveInfo to BlockDriverState
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-06-15 09:41:59 +02:00
Kevin Wolf
294cc35f3d block: Add wr_highest_sector blockstat
This adds the wr_highest_sector blockstat which implements what is generally
known as the high watermark. It is the highest offset of a sector written to
the respective BlockDriverState since it has been opened.

The query-blockstat QMP command is extended to add this value to the result,
and also to add the statistics of the underlying protocol in a new "parent"
field. Note that to get the "high watermark" of a qcow2 image, you need to look
into the wr_highest_sector field of the parent (which can be a file, a
host_device, ...). The wr_highest_sector of the qcow2 BlockDriverState itself
is the highest offset on the _virtual_ disk that the guest has written to.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:32 +02:00
Kevin Wolf
66f82ceed6 block: Open the underlying image file in generic code
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.

For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-05-03 10:07:30 +02:00
Stefan Hajnoczi
8a22f02a88 block: Convert first_drv to QLIST
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
1b7bdbc13c block: Convert bdrv_first to QTAILQ
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Stefan Hajnoczi
b66460e4e9 block: Do not export bdrv_first
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list.  However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:21:57 +02:00
Kevin Wolf
8b9b0cc2fd blkdebug: Add events and rules
Block drivers can trigger a blkdebug event whenever they reach a place where it
could be useful to inject an error for testing/debugging purposes.

Rules are read from a blkdebug config file and describe which action is taken
when an event is triggered. For now this is only injecting an error (with a few
options) or changing the state (which is an integer). Rules can be declared to
be active only in a specific state; this way later rules can distiguish on
which path we came to trigger their event.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2010-04-23 16:08:46 +02:00
Christoph Hellwig
8cfacf0790 block: add logical_block_size property
Add a logical block size attribute as various guest side tools only
increase the filesystem sector size based on it, not the advisory
physical block size.

For scsi we already have support for a different logical block size
in place for CDROMs that we can built upon.  Only my recent block
device characteristics VPD page needs some fixups.  Note that we
leave the logial block size for CDROMs hardcoded as the 2k value
is expected for it in general.

For virtio-blk we already have a feature flag claiming to support
a variable logical block size that was added for the s390 kuli
hypervisor.  Interestingly it does not actually change the units
in which the protocol works, which is still fixed at 512 bytes,
but only communicates a different minimum I/O granularity.  So
all we need to do in virtio is to add a trap for unaligned I/O
and round down the device size to the next multiple of the logical
block size.

IDE does not support any other logical block size than 512 bytes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-03-17 10:42:27 -05:00
Naphtali Sprei
4dca4b639c block: more read-only changes, related to backing files
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
  If upgrade fail, back to read-only. If also fail, "disconnect" the drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:32:15 -06:00
Christoph Hellwig
428c149b0b block: add topology qdev properties
Add three new qdev properties to export block topology information to
the guest.  This is needed to get optimal I/O alignment for RAID arrays
or SSDs.

The options are:

 - physical_block_size to specify the physical block size of the device,
   this is going to increase from 512 bytes to 4096 kilobytes for many
   modern storage devices
 - min_io_size to specify the minimal I/O size without performance impact,
   this is typically set to the RAID chunk size for arrays.
 - opt_io_size to specify the optimal sustained I/O size, this is
   typically the RAID stripe width for arrays.

I decided to not auto-probe these values from blkid which might easily
be possible as I don't know how to deal with these issues on migration.

Note that we specificly only set the physical_block_size, and not the
logial one which is the unit all I/O is described in.  The reason for
that is that IDE does not support increasing the logical block size and
at last for now I want to stick to one meachnisms in queue and allow
for easy switching of transports for a given backing image which would
not be possible if scsi and virtio use real 4k sectors, while ide only
uses the physical block exponent.

To make this more common for the different block drivers introduce a
new BlockConf structure holding all common block properties and a
DEFINE_BLOCK_PROPERTIES macro to add them all together, mirroring
what is done for network drivers.  Also switch over all block drivers
to use it, except for the floppy driver which has weird driveA/driveB
properties and probably won't require any advanced block options ever.

Example usage for a virtio device with 4k physical block size and
8k optimal I/O size:

  -drive file=scratch.img,media=disk,cache=none,id=scratch \
  -device virtio-blk-pci,drive=scratch,physical_block_size=4096,opt_io_size=8192

aliguori: updated patch to take into account BLOCK events

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 16:53:25 -06:00
Liran Schour
aaa0eb75e2 Count dirty blocks and expose an API to get dirty count
This will manage dirty counter for each device and will allow to get the
dirty counter from above.

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-09 16:56:14 -06:00
Kevin Wolf
756e6736a1 block: Add bdrv_change_backing_file
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
12c09b8ce2 qemu-img: There is more than one host device driver
I haven't heard yet of anyone using qemu-img to copy an image to a real floppy,
but it's a valid use case.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 11:45:50 -06:00
Jan Kiszka
c6d2283068 block migration: Cleanup dirty tracking code
This switches the dirty bitmap to a true bitmap, reducing its footprint
(specifically in caches). It moreover fixes off-by-one bugs in
set_dirty_bitmap (nb_sectors+1 were marked) and bdrv_get_dirty (limit
check allowed one sector behind end of drive). And is drops redundant
dirty_tracking field from BlockDriverState.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
lirans@il.ibm.com
7cd1e32a86 Expose a mechanism to trace block writes
To support live migration without shared storage we need to be able to trace
writes to disk while migrating. This Patch expose dirty block tracking per
device to be polled from upper layer.

Changes from v4:
- Register dirty tracking for each block device.
- Minor coding style issues.
- Block.c will now manage a dirty bitmap per device once
  bdrv_set_dirty_tracking() is called. Bitmap is polled by the upper
  layer (block-migration.c).

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-17 08:03:31 -06:00
Christoph Hellwig
b2e12bc6e3 block: add aio_flush operation
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously.  Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix.  Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Christoph Hellwig
e900a7b748 block: add enable_write_cache flag
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest.  The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.

Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Kevin Wolf
40b4f53967 Add bdrv_aio_multiwrite
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.

Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.

As a nice side effect, also other formats take advantage of dealing with less
and larger requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:18:06 -05:00
Kevin Wolf
a35e1c177d qcow2: Metadata preallocation
This introduces a qemu-img create option for qcow2 which allows the metadata to
be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
tables, but no data is written to them. Metadata is quite small, so this
happens in almost no time.

Especially with qcow2 on virtio this helps to gain a bit of performance during
the initial writes. However, as soon as create a snapshot, we're back to the
normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
while we're still having trouble with qcow2 on virtio.

Note that the option is disabled by default and needs to be specified
explicitly using qemu-img create -f qcow2 -o preallocation=metadata.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:20 -05:00
Christoph Hellwig
45566e9c99 replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstate
The VM state offset is a concept internal to the image format.  Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 08:28:13 -05:00
Christoph Hellwig
508c7cb3fa block: add bdrv_probe_device method
Add a bdrv_probe_device method to all BlockDriver instances implementing
host devices to move matching of host device types into the actual drivers.
For now we keep exacly the old matching behaviour based on the devices names,
although we really should have better detetion methods based on device
information in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15 14:04:22 +02:00
Christoph Hellwig
c16b5a2ca0 fully split aio_pool from BlockDriver
Now that we have a separate aio pool structure we can remove those
aio pool details from BlockDriver.

Every driver supporting AIO now needs to declare a static AIOPool
with the aiocb size and the cancellation method.  This cleans up the
current code considerably and will make it cleaner and more obvious
to support two different aio implementations behind a single
BlockDriver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-27 09:46:03 -05:00
Kevin Wolf
73c632edc4 qcow2: Allow different cluster sizes
Add an option to specify the cluster size of a newly created qcow2 image.
Default is 4k which is the same value that was hard-coded before.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-22 10:50:32 -05:00
Kevin Wolf
0e7e1989f7 Convert all block drivers to new bdrv_create
Now we can make use of the newly introduced option structures. Instead of
having bdrv_create carry more and more parameters (which are format specific in
most cases), just pass a option structure as defined by the driver itself.

bdrv_create2() contains an emulation of the old interface to simplify the
transition.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-22 10:50:31 -05:00
aliguori
e268ca5232 implement qemu_blockalign (Stefano Stabellini)
this patch adds a buffer_alignment field to BlockDriverState and
implements a qemu_blockalign function that uses that field to allocate a
memory aligned buffer to be used by the block driver.
buffer_alignment is initialized to 512 but each block driver can set
a different value (at the moment none of them do).
This patch modifies ide.c, block-qcow.c, block-qcow2.c and block.c to
use qemu_blockalign instead of qemu_memalign.
There is only one place left that still uses qemu_memalign to allocate
buffers used by block drivers that is posix-aio-compat:handle_aiocb_rw
because it is not possible to get the BlockDriverState from that
function. However I think it is not important because posix-aio-compat
already deals with driver specific code so it is supposed to know its
own needs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7229 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-22 20:20:00 +00:00
aliguori
e97fc193e1 Introduce bdrv_check (Kevin Wolf)
From: Kevin Wolf <kwolf@redhat.com>

Introduce a new bdrv_check function pointer for block drivers. Modify qcow2 to
return an error status in check_refcounts(), so it can implement bdrv_check.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7214 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-21 23:11:50 +00:00
aliguori
f141eafe28 push down vector linearization to posix-aio-compat.c (Christoph Hellwig)
Make all AIO requests vectored and defer linearization until the actual
I/O thread.  This prepares for using native preadv/pwritev.

Also enables asynchronous direct I/O by handling that case in the I/O thread.

Qcow and qcow2 propably want to be adopted to directly deal with multi-segment
requests, but that can be implemented later.


Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7020 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-07 18:43:24 +00:00
aliguori
178e08a58f Fix savevm after BDRV_FILE size enforcement
We now enforce that you cannot write beyond the end of a non-growable file.
qcow2 files are not growable but we rely on them being growable to do
savevm/loadvm.  Temporarily allow them to be growable by introducing a new
API specifically for savevm read/write operations.

Reported-by: malc
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6994 c046a42c-6fe2-441c-8c8c-71466251a162
2009-04-05 19:10:55 +00:00
aliguori
5eb456396d block: support known backing format for image create and open (Uri Lublin)
Added a backing_format field to BlockDriverState.
Added bdrv_create2 and drv->bdrv_create2 to create an image with
a known backing file format.
Upon bdrv_open2 if backing format is known use it, instead of
probing the (backing) image.

Signed-off-by: Uri Lublin <uril@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6908 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-28 17:55:10 +00:00
aliguori
221f715d90 new scsi-generic abstraction, use SG_IO (Christoph Hellwig)
Okay, I started looking into how to handle scsi-generic I/O in the
new world order.

I think the best is to use the SG_IO ioctl instead of the read/write
interface as that allows us to support scsi passthrough on disk/cdrom
devices, too.  See Hannes patch on the kvm list from August for an
example.

Now that we always do ioctls we don't need another abstraction than
bdrv_ioctl for the synchronous requests for now, and for asynchronous
requests I've added a aio_ioctl abstraction keeping it simple.

Long-term we might want to move the ops to a higher-level abstraction
and let the low-level code fill out the request header, but I'm lazy
enough to leave that to the people trying to support scsi-passthrough
on a non-Linux OS.

Tested lightly by issuing various sg_ commands from sg3-utils in a guest
to a host CDROM device.


Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6895 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-28 17:28:41 +00:00
aliguori
6bbff9a0b4 Refactor aio callback allocation to use an aiocb pool (Avi Kivity)
Move the AIOCB allocation code to use a dedicate structure, AIOPool.  AIOCB
specific information, such as the AIOCB size and cancellation routine, is
moved into the pool.

At present, there is exactly one pool per block format driver, maintaining
the status quo.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6870 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-20 18:25:59 +00:00
aliguori
eda578e559 Drop internal bdrv_pread()/bdrv_pwrite() APIs (Avi Kivity)
Now that scsi generic no longer uses bdrv_pread() and bdrv_pwrite(), we can
drop the corresponding internal APIs, which overlap bdrv_read()/bdrv_write()
and, being byte oriented, are unnatural for a block device.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6824 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-12 19:57:16 +00:00
aliguori
04eeb8b6d6 Add internal scsi generic block API (Avi Kivity)
Add an internal API for the generic block layer to send scsi generic commands
to block format driver.  This means block format drivers no longer need
to consider overloaded nb_sectors parameters.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6823 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-12 19:57:12 +00:00
aliguori
c0f4ce7751 monitor: Rework early disk password inquiry (Jan Kiszka)
Reading the passwords for encrypted hard disks during early startup is
broken (I guess for quiet a while now):
 - No monitor terminal is ready for input at this point
 - Forcing all mux'ed terminals into monitor mode can confuse other
   users of that channels

To overcome these issues and to lay the ground for a clean decoupling of
monitor terminals, this patch changes the initial password inquiry as
follows:
 - Prevent autostart if there is some encrypted disk
 - Once the user tries to resume the VM, prompt for all missing
   passwords
 - Only resume if all passwords were accepted

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6707 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-05 23:01:01 +00:00
aliguori
71d0770c4c Fix CVE-2008-0928 - insufficient block device address range checking (Anthony Liguori)
Introduce a growable flag that's set by bdrv_file_open().  Block devices should
never be growable, only files that are being used by block devices.

I went through Fabrice's early comments about the patch that was first applied.
While I disagree with that patch, I also disagree with Fabrice's suggestion.

There's no good reason to do the checks in the block drivers themselves.  It
just increases the possibility that this bug could show up again.  Since we're
calling bdrv_getlength() to determine the length, we're giving the block drivers
a chance to chime in and let us know what range is valid.

Basically, this patch makes the BlockDriver API guarantee that all requests are
within 0..bdrv_getlength() which to me seems like a Good Thing.

What do others think?

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6677 c046a42c-6fe2-441c-8c8c-71466251a162
2009-03-03 17:37:16 +00:00
aliguori
b0a7b120a3 qemu: record devfn on block driver instance (Marcelo Tosatti)
Record PCIDev on the BlockDriverState structure to locate for release
on hot-removal.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6597 c046a42c-6fe2-441c-8c8c-71466251a162
2009-02-11 15:20:29 +00:00
aliguori
4fc9af53d8 Use an option rom instead of boot sector for -kernel
Generate an option rom instead of using a hijacked boot sector for kernel
booting.  This just requires adding a small option ROM header and a few more
instructions to the boot sector to take over the int19 vector and run our
boot code.

A disk is no longer needed when using -kernel on x86.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>



git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5650 c046a42c-6fe2-441c-8c8c-71466251a162
2008-11-08 16:27:07 +00:00
blueswir1
7ee930d031 Fix warnings that would be caused by ld flag --warn-common
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5240 c046a42c-6fe2-441c-8c8c-71466251a162
2008-09-17 19:04:14 +00:00
aurel32
b5eff35546 Revert fix for CVE-2008-0928. Will be fixed in a different way later.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4041 c046a42c-6fe2-441c-8c8c-71466251a162
2008-03-11 23:30:22 +00:00
aurel32
902b27d0b8 Fix CVE-2008-0928 - insufficient block device address range checking
Qemu 0.9.1 and earlier does not perform range checks for block device
read or write requests, which allows guest host users with root
privileges to access arbitrary memory and escape the virtual machine.


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4037 c046a42c-6fe2-441c-8c8c-71466251a162
2008-03-11 17:17:59 +00:00
ths
985a03b0ce Real SCSI device passthrough (v4), by Laurent Vivier.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3851 c046a42c-6fe2-441c-8c8c-71466251a162
2007-12-24 16:10:43 +00:00
ths
a36e69ddfe Collecting block device statistics, by Richard W.M. Jones.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3760 c046a42c-6fe2-441c-8c8c-71466251a162
2007-12-02 05:18:19 +00:00
pbrook
faf07963cb Split block API from vl.h.
Remove QEMU_TOOL. Replace with QEMU_IMG and NEED_CPU_H.
Avoid linking qemu-img against whole system emulatior.


git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3578 c046a42c-6fe2-441c-8c8c-71466251a162
2007-11-11 02:51:17 +00:00
ths
3b46e62427 find -type f | xargs sed -i 's/[\t ]*$//g' # Yes, again. Note the star in the regex.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3177 c046a42c-6fe2-441c-8c8c-71466251a162
2007-09-17 08:09:54 +00:00
ths
ec36ba1474 vmdk compatibility level 6 images, by Soren Hansen.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3175 c046a42c-6fe2-441c-8c8c-71466251a162
2007-09-16 21:59:02 +00:00
ths
5fafdf24ef find -type f | xargs sed -i 's/[\t ]$//g' # on most files
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3173 c046a42c-6fe2-441c-8c8c-71466251a162
2007-09-16 21:08:06 +00:00
bellard
19cb37389f better support of host drives
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@2124 c046a42c-6fe2-441c-8c8c-71466251a162
2006-08-19 11:45:59 +00:00
pbrook
ce1a14dc0d Dynamically allocate AIO Completion Blocks.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@2098 c046a42c-6fe2-441c-8c8c-71466251a162
2006-08-07 02:38:06 +00:00
bellard
d15a771da1 qcow2 is now used for '-snapshot' - keep BlockDriverState.total_sectors
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@2094 c046a42c-6fe2-441c-8c8c-71466251a162
2006-08-06 13:35:09 +00:00
bellard
faea38e786 multiple snapshot support
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@2086 c046a42c-6fe2-441c-8c8c-71466251a162
2006-08-05 21:31:00 +00:00
bellard
83f6409109 async file I/O API
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@2075 c046a42c-6fe2-441c-8c8c-71466251a162
2006-08-01 16:21:11 +00:00
pbrook
7a6cba611d Disk cache flush support.
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1949 c046a42c-6fe2-441c-8c8c-71466251a162
2006-06-04 11:39:07 +00:00
bellard
95389c8681 qcow_make_empty() support (Johannes Schindelin)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1716 c046a42c-6fe2-441c-8c8c-71466251a162
2005-12-18 18:28:15 +00:00
bellard
46d4767d93 better BIOS ATA translation support
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1153 c046a42c-6fe2-441c-8c8c-71466251a162
2004-11-16 01:45:27 +00:00
bellard
e2731add29 fixed block close() method prototype
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1068 c046a42c-6fe2-441c-8c8c-71466251a162
2004-09-18 19:32:11 +00:00
bellard
ea2384d36e new disk image layer
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1037 c046a42c-6fe2-441c-8c8c-71466251a162
2004-08-01 21:59:26 +00:00