Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes to VMs like Amazon EBS. This
patch adds a qemu block driver for Sheepdog.
Sheepdog features are:
- No node in the cluster is special (no metadata node, no control
node, etc)
- Linear scalability in performance and capacity
- No single point of failure
- Autonomous management (zero configuration)
- Useful volume management support such as snapshot and cloning
- Thin provisioning
- Autonomous load balancing
The more details are available at the project site:
http://www.osrg.net/sheepdog/
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
drive_init() doesn't permit invalid CHS for if=ide, but that's
worthless: we get it via if=none and -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
drive_init() doesn't permit option readonly for if=ide, but that's
worthless: we get it via if=none and -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It still always succeeds. The next commits will add failures.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The two aren't independent variables. Make that obvious.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use error_report(), because it points to the error location.
Reword "tried to assign twice" messages to make it clear that we're
complaining about the unit property.
Report invalid unit property instead of failing silently.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Needed for decent error locations when complaining about options
outside of qemu_opts_foreach(). That one sets the location
already.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
drive_init() doesn't permit rerror for if=scsi, but that's worthless:
we get it via if=none and -device.
Moreover, scsi-generic doesn't support werror. Since drive_init()
doesn't catch that, option werror was silently ignored even with
if=scsi.
Wart: unlike drive_init(), we don't reject the default action when
it's explicitly specified. That's because we can't distinguish "no
rerror option" from "rerror=report", or "no werror" from
"rerror=enospc". Left for another day.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some of the failures are internal errors, and hw_error() is okay then.
But the common way to fail is bad user input, e.g. -global
isa-fdc.driveA=foo where drive foo has an unsupported rerror value.
exit(1) instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
drive_init() doesn't permit them for if=floppy, but that's worthless:
we get them via if=none and -global.
This can make device initialization fail. Since all callers of
fdctrl_init_isa() ignore its value, change it to die instead of
returning failure. Without this, some callers would ignore the
failure, and others would crash.
Wart: unlike drive_init(), we don't reject the default action when
it's explicitly specified. That's because we can't distinguish "no
rerror option" from "rerror=report", or "no werror" from
"rerror=enospc". Left for another day.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
raw_pread_aligned() retries up to two times if the block device backs
a virtual CD-ROM (a drive with media=cdrom and if=ide, scsi, xen or
none). This makes no sense. Whether retrying reads can correct read
errors can only depend on what we're reading, not on how the result
gets used. We need to check what whether we're reading from a
physical CD-ROM or floppy here.
I doubt retrying is useful even then. Left for another day.
Impact:
* Virtual CD-ROM backed by host_cdrom behaves the same.
* Virtual CD-ROM backed by file or host_device no longer retries.
* A drive backed by host_cdrom now retries even if it's not a virtual
CD-ROM.
* Any drive backed by host_floppy now retries.
While there, clean up gratuitous use of goto.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
init_blk_migration_it() skips drives with type hint BDRV_TYPE_CDROM.
The intention is to skip read-only drives. However, BDRV_TYPE_CDROM
is only a hint. It is currently sufficent for read-only. But it's
not necessary, and it may not remain sufficient.
Use bdrv_is_read_only() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since commit cb4e5f8e, monitor command change makes the new media
readonly iff the type hint is BDRV_TYPE_CDROM, i.e. the drive was
created with media=cdrom. The intention is to avoid changing a block
device's read-only-ness. However, BDRV_TYPE_CDROM is only a hint. It
is currently sufficent for read-only. But it's not necessary, and it
may not remain sufficient.
Use bdrv_is_read_only() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds the final missing bits for support of
passing a serial/id string to a virtio-blk guest driver.
The guest-side component already exists in the virtio
driver, and has recently been reworked by Ryan to export
a /sys interface for retrieval of the id from guest userland.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Don't try to be clever by freeing all temporary data and calling all callbacks
when the return value (an error) is certain. Doing so has at least two
important problems:
* The temporary data that is freed (qiov, possibly zero buffer) is still used
by the requests that have not yet completed.
* Calling the callbacks for all requests in the multiwrite means for the caller
that it may free buffers etc. which are still in use.
Just remember the error value and do the cleanup when all requests have
completed.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch removes exit(1) from error(), and properly releases
resources such as a block driver and an allocated memory.
For testing the Sheepdog block driver with qemu-iotests, it is
necessary to call bdrv_delete() before the program exits. Because the
driver releases the lock of VM images in the close handler.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Drives defined with -drive if=ide get get created along with the IDE
controller, inside machine->init(). That's before cmos_init().
Drives defined with -device get created during generic device init.
That's after cmos_init(). Because of that, CMOS has no information on
them (type, geometry, translation). Older versions of Windows such as
XP reportedly choke on that.
Split off the part of CMOS initialization that needs to know about
-device devices, and turn it into a reset handler, so it runs after
device creation.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
savevm.c keeps a pointer to the snapshot block device. If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn. Unplugging a guest device that uses it
does the trick:
$ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) info snapshots
No available block device supports snapshots
(qemu) drive_add auto if=none,file=tmp.qcow2
OK
(qemu) device_add usb-storage,id=foo,drive=none1
(qemu) info snapshots
Snapshot devices: none1
Snapshot list (from none1):
ID TAG VM SIZE DATE VM CLOCK
(qemu) device_del foo
(qemu) info snapshots
Snapshot devices:
Segmentation fault (core dumped)
Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
state = 0 in rules means that the rule is valid for any state. Therefore it's
impossible to have a rule that works only in the initial state. This changes
the initial state from 0 to 1 to make this possible.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Forgetting to free them means that the next instance inherits all rules and
gets its own rules only additionally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The list head was initialized to point to the wrong list, so all actions ended
up being handled as inject-error even if they were set-state in fact.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.
Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place. Detach before the second attach
there.
Also catch attempt to delete while a guest device model is attached.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make the property point to BlockDriverState, cutting out the DriveInfo
middleman. This prepares the ground for block devices that don't have
a DriveInfo.
Currently all user-defined ones have a DriveInfo, because the only way
to define one is -drive & friends (they go through drive_init()).
DriveInfo is closely tied to -drive, and like -drive, it mixes
information about host and guest part of the block device. I'm
working towards a new way to define block devices, with clean
host/guest separation, and I need to get DriveInfo out of the way for
that.
Fortunately, the device models are perfectly happy with
BlockDriverState, except for two places: ide_drive_initfn() and
scsi_disk_initfn() need to check the DriveInfo for a serial number set
with legacy -drive serial=... Use drive_get_by_blockdev() there.
Device model code should now use DriveInfo only when explicitly
dealing with drives defined the old way, i.e. without -device.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We automatically delete blockdev host parts on unplug of the guest
device. Too much magic, but we can't change that now.
The delete happens early in the guest device teardown, before the
connection to the host part is severed. Thus, the guest part's
pointer to the host part dangles for a brief time. No actual harm
comes from this, but we'll catch such dangling pointers a few commits
down the road. Clean up the dangling pointers by delaying the
automatic deletion until the guest part's pointer is gone.
Device usb-storage deliberately makes two qdev properties refer to the
same drive, because it automatically creates a second device. Again,
too much magic we can't change now. Multiple references worked okay
before, but now free_drive() dies for the second one. Zap the extra
reference.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To fix https://bugs.launchpad.net/qemu/+bug/597402 where qemu fails to
call unlink() on temporary snapshots due to bs->is_temporary getting clobbered
in bdrv_open_common() after being set in bdrv_open() which calls the former.
We don't need to initialize bs->is_temporary in bdrv_open_common().
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers of ide_create_drive() ignore its value. Currently
harmless, because it fails only when qdev_init() fails, which fails
only when ide_drive_initfn() fails, which never fails.
Brittle. Change it to die instead of silently ignoring failure.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
None of its callers checks for failure. scsi_hot_add() can crash
because of that:
(qemu) drive_add 4 if=scsi,format=host_device,file=/dev/sg1
scsi-generic: scsi generic interface too old
Segmentation fault (core dumped)
Fix all callers, not just scsi_hot_add().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before the raw/file split we used to allow filenames with colons for host
device only. While this was more by accident than by design people rely
on it, so we need to bring it back.
So move the host device probing to be before the protocol detection
again.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
People were wondering why qemu-img check failed after they tried to preallocate
a large qcow2 file and ran out of disk space.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
i386 cpuid.c currently claims XSAVE is supported in the CPUID filter,
but that's not true: Only FXSAVE is supported. Remove that bit
from the filter.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>