Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When SCSI passthrough is being used by the guest with virtio-blk, the
guest is not able to detect disk failures. This is because the status
field is expected by the guest driver to include also the msg_status,
host_status and driver_status fields, but the device is only passing
down the SCSI status.
The patch fixes this, and also makes sure that the guest always sees a
CHECK_CONDITION status when there is valid sense data.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Next commit will convert the query-status command to use the
RunState type as generated by the QAPI.
In order to "transparently" replace the current enum by the QAPI
one, we have to make some changes to some enum values.
As the changes are simple renames, I'll do them in one shot. The
changes are:
- Rename the prefix from RSTATE_ to RUN_STATE_
- RUN_STATE_SAVEVM to RUN_STATE_SAVE_VM
- RUN_STATE_IN_MIGRATE to RUN_STATE_INMIGRATE
- RUN_STATE_PANICKED to RUN_STATE_INTERNAL_ERROR
- RUN_STATE_POST_MIGRATE to RUN_STATE_POSTMIGRATE
- RUN_STATE_PRE_LAUNCH to RUN_STATE_PRELAUNCH
- RUN_STATE_PRE_MIGRATE to RUN_STATE_PREMIGRATE
- RUN_STATE_RESTORE to RUN_STATE_RESTORE_VM
- RUN_STATE_PRE_MIGRATE to RUN_STATE_FINISH_MIGRATE
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Today, when notifying a VM state change with vm_state_notify(),
we pass a VMSTOP macro as the 'reason' argument. This is not ideal
because the VMSTOP macros tell why qemu stopped and not exactly
what the current VM state is.
One example to demonstrate this problem is that vm_start() calls
vm_state_notify() with reason=0, which turns out to be VMSTOP_USER.
This commit fixes that by replacing the VMSTOP macros with a proper
state type called RunState.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Device models should be able to set it without an unclean include of
block_int.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's convenience stuff for block device models, so block.h isn't the
ideal home either, but better than block_int.h.
Permits moving some #include "block_int.h" from device model .h into
.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's a confused mess (see previous commit). No users remain.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qemu-common.h is not a system include file, so it should be included
with "" instead of <>. Otherwise incremental builds might fail
because only local include files are checked for changes.
* linux-user/syscall.c included the file twice.
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Multiplexing callbacks complicates matters needlessly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.
This means:
- we do not count internal requests from image formats in addition
to guest originating I/O
- we do not double count I/O ops if the device model handles it
chunk wise
- we only account I/O once it actuall is done
- can extent I/O accounting to synchronous or coroutine I/O easily
- implement I/O latency tracking easily (see the next patch)
I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet. Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Calling virtio_cleanup() will free up memory allocated in
virtio_common_init().
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It needs to be a qdev property, because it belongs to the drive's
guest part. Precedence: commit a0fef654 and 6ced55a5.
Bonus: info qtree now shows the serial number.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Like all block drivers virtio-blk should not allow small than block size
granularity access. But given that the protocol specifies a
byte unit length field we currently accept such requests, which cause
qemu to abort() in lower layers. Add checks to the main read and
write handlers to catch them early.
Reported-by: Conor Murphy <conor_murphy_virt@hotmail.com>
Tested-by: Conor Murphy <conor_murphy_virt@hotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Define and use dedicated constants for vm_stop reasons, they actually
have nothing to do with the EXCP_* defines used so far. At this chance,
specify more detailed reasons so that VM state change handlers can
evaluate them.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Raise a config change interrupt when the size changed. This allows
virtio-blk guest drivers to read-read the information from the
config space once it got the config chaged interrupt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio-blk doesn't work on cross-endian configuration, as endianness is
not handled correctly.
This patch adds missing endianness conversions to make virtio-blk
working. Tested on the following configurations:
- i386 guest on x86_64 host
- ppc guest on x86_64 host
- i386 guest on mips host
- ppc guest on mips host
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If bootindex is specified on command line a string that describes device
in firmware readable way is added into sorted list. Later this list will
be passed into firmware to control boot order.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Errors should be logged using error_report() so they go to the
appropriate monitor.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fix virtio-blk to use the usual completion path that involves werror handling
instead of directly completing the request in cases where bdrv_aio_flush
returns NULL.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The werror option now affects not only write requests, but also flush requests.
Previously, it was not possible to stop a VM on a failed flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
in_sg[].iovec and out_sg[].ioved are pointer to (source) host memory and
therefore invalid after migration. When loading the device state we must
create a new mapping on the destination host.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Changing block.h or blockdev.h resulted in recompiling most objects.
Move DriveInfo typedef and BlockInterfaceType enum definitions
to qemu-common.h and rearrange blockdev.h use to decrease churn.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Otherwise we can't migrate after we've removed a virtio block device.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Disks without media make no sense. For SCSI, a Linux guest kernel
complains during boot. I didn't try other combinations.
scsi-generic doesn't need the additional check, because it already
requires bdrv_is_sg(), which fails without media.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move the check from virtio_blk_init_pci(), where it protects only
virtio-blk-pci, to virtio_blk_init(). Without that, virtio-blk-s390
initializes without a drive. I figure that can lead to null pointer
dereferences.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds the final missing bits for support of
passing a serial/id string to a virtio-blk guest driver.
The guest-side component already exists in the virtio
driver, and has recently been reworked by Ryan to export
a /sys interface for retrieval of the id from guest userland.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman. This prepares the ground for block devices that don't have
a DriveInfo.
Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.
Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=... Use drive_get_by_blockdev() there.
Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Although it is really rare to get in to the while loop, the list
operation in the loop is obviously wrong.
Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass the MultiReqBuffer structure down all the way to the I/O submission
instead of takin it apart. Also mark num_writes unsigned as it can't
go negative, and take the check for any pending I/O requests into the
submission function. Last but not least rename do_multiwrite to
virtio_submit_multiwrite to fit the general naming scheme and make clear
what it does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There is a 1:1 relation between VirtIOBlock and BlockDriverState instances,
no need to track it because it won't change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Anything that moves hundreds of lines out of vl.c can't be all bad.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up virtio-blk.c to be more consistent using BDRV_SECTOR_SIZE
instead of hard coded 512 values.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before issuing the barrier to the block driver we need to flush our oustanding
queue of write requests, as the flush is supposed to be issued after them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The VirtIOBlockRequest structure is about 40 KB in size. This patch
avoids zeroing every request by only initializing fields that are read.
The other fields are either written to or may not be used at all.
Oprofile shows about 10% of CPU samples in memset called by
virtio_blk_alloc_request(). The workload is
dd if=/dev/vda of=/dev/null iflag=direct bs=8k running concurrently 4
times. This patch makes memset disappear to the bottom of the profile.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_set_geometry_hint call below is not needed - it's just setting
what was just read.
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
virtio_blk_req_complete frees the request, so we can't access it any more when
calling bdrv_mon_event. Use the pointer that was copied earlier.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Add a logical block size attribute as various guest side tools only
increase the filesystem sector size based on it, not the advisory
physical block size.
For scsi we already have support for a different logical block size
in place for CDROMs that we can built upon. Only my recent block
device characteristics VPD page needs some fixups. Note that we
leave the logial block size for CDROMs hardcoded as the 2k value
is expected for it in general.
For virtio-blk we already have a feature flag claiming to support
a variable logical block size that was added for the s390 kuli
hypervisor. Interestingly it does not actually change the units
in which the protocol works, which is still fixed at 512 bytes,
but only communicates a different minimum I/O granularity. So
all we need to do in virtio is to add a trap for unaligned I/O
and round down the device size to the next multiple of the logical
block size.
IDE does not support any other logical block size than 512 bytes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The next commit will move the STOP event into do_vm_stop(), to
have the expected event sequence we need to emit the I/O error
event before calling vm_stop().
The expected sequence is:
{ "event": "BLOCK_IO_ERROR" [...] }
{ "event": "STOP" }
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Export all topology information in the block config structure,
guarded by a new VIRTIO_BLK_F_TOPOLOGY feature flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add three new qdev properties to export block topology information to
the guest. This is needed to get optimal I/O alignment for RAID arrays
or SSDs.
The options are:
- physical_block_size to specify the physical block size of the device,
this is going to increase from 512 bytes to 4096 kilobytes for many
modern storage devices
- min_io_size to specify the minimal I/O size without performance impact,
this is typically set to the RAID chunk size for arrays.
- opt_io_size to specify the optimal sustained I/O size, this is
typically the RAID stripe width for arrays.
I decided to not auto-probe these values from blkid which might easily
be possible as I don't know how to deal with these issues on migration.
Note that we specificly only set the physical_block_size, and not the
logial one which is the unit all I/O is described in. The reason for
that is that IDE does not support increasing the logical block size and
at last for now I want to stick to one meachnisms in queue and allow
for easy switching of transports for a given backing image which would
not be possible if scsi and virtio use real 4k sectors, while ide only
uses the physical block exponent.
To make this more common for the different block drivers introduce a
new BlockConf structure holding all common block properties and a
DEFINE_BLOCK_PROPERTIES macro to add them all together, mirroring
what is done for network drivers. Also switch over all block drivers
to use it, except for the floppy driver which has weird driveA/driveB
properties and probably won't require any advanced block options ever.
Example usage for a virtio device with 4k physical block size and
8k optimal I/O size:
-drive file=scratch.img,media=disk,cache=none,id=scratch \
-device virtio-blk-pci,drive=scratch,physical_block_size=4096,opt_io_size=8192
aliguori: updated patch to take into account BLOCK events
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>