Previously multiwrite_user_cb was never called if a request in the multiwrite
batch failed right away because it did set mcb->error immediately. Make it look
more like a normal callback to fix this.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
If upgrade fail, back to read-only. If also fail, "disconnect" the drive.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Clean up the current mess about figuring out which flags to pass to the
driver. BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.
The following information is currently provided in the event:
- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")
Event example:
{ "event": "BLOCK_IO_ERROR",
"data": { "device": "ide0-hd1",
"operation": "write",
"action": "stop" },
"timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This will manage dirty counter for each device and will allow to get the
dirty counter from above.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.
Add a MAX_IOV defintion for platforms that don't have it. For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.
Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The backing device is only modified from bdrv_commit. So instead of
flushing it every time bdrv_flush is called for the front-end device
only flush it after we're written data to it in bdrv_commit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If an image references a backing file that doesn't exist, qemu-img info fails
to open this image. Exactly in this case the info would be valuable, though:
the user might want to find out which file is missing.
This patch introduces a BDRV_O_NO_BACKING flag to ignore the backing file when
opening the image. qemu-img info is the first user and provides info now even
if the backing file is invalid.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
CC block.o
cc1: warnings being treated as errors
block.c: In function 'bdrv_open2':
block.c:400: error: ignoring return value of 'realpath', declared with attribute warn_unused_result
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Each device statistic information is stored in a QDict and
the returned QObject is a QList of all devices.
This commit should not change user output.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Each block device information is stored in a QDict and the
returned QObject is a QList of all devices.
This commit should not change user output.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This switches the dirty bitmap to a true bitmap, reducing its footprint
(specifically in caches). It moreover fixes off-by-one bugs in
set_dirty_bitmap (nb_sectors+1 were marked) and bdrv_get_dirty (limit
check allowed one sector behind end of drive). And is drops redundant
dirty_tracking field from BlockDriverState.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of duplicating the definition of constants or introducing
trivial retrieval functions move the SECTOR constants into the public
block API. This also obsoletes sector_per_block in BlkMigState.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
To support live migration without shared storage we need to be able to trace
writes to disk while migrating. This Patch expose dirty block tracking per
device to be polled from upper layer.
Changes from v4:
- Register dirty tracking for each block device.
- Minor coding style issues.
- Block.c will now manage a dirty bitmap per device once
bdrv_set_dirty_tracking() is called. Bitmap is polled by the upper
layer (block-migration.c).
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We have code for a quite a few block formats. While I trust that all
of these formats are useful at least for some people in some
circumstances, some of them are of a kind that friends don't let
friends use in production.
This patch provides an optional block format whitelist, default off.
If a whitelist is configured with --block-drv-whitelist, QEMU proper
can use only whitelisted formats. Other programs, like qemu-img, are
not affected.
Drivers for formats off the whitelist still participate in format
probing, to ensure all programs probe exactly the same. Without that,
QEMU proper would be prone to treat images with a format off the
whitelist as raw when the image's format is probed.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This is a slightly revised patch for adding readonly flag to the -drive command.
Even though this patch is "stand-alone", it assumes a previous related patch (in Anthony staging tree), that passes
the readonly attribute of the drive to the guest OS, applied first.
This enables sharing same image between guests, with readonly access.
Implementaion mark the drive as read_only and changes the flags when actually opening the file.
The readonly attribute of a qcow also passed to it's base file.
For ide that cannot pass the readonly attribute to the guest OS, disallow the readonly flag.
Also, return error code from bdrv_truncate for readonly drive.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
bdrv_read/write emulation is used as the perfect example why we need something
like AsyncContexts. So maybe they better start using it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously. Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix. Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest. The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.
Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.
Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.
As a nice side effect, also other formats take advantage of dealing with less
and larger requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now that do have a nicer interface to work against we can add Linux native
AIO support. It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.
This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.
To enable native kernel aio use the aio=native sub-command on the
drive command line. I have also added an option to qemu-io to
test the aio support without needing a guest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The VM state offset is a concept internal to the image format. Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 6a7ad299 ("Call qemu_bh_delete at bdrv_aio_bh_cb") deletes emulated
aio bottom halves to prevent endless accumulation. However, it leaves a
stale ->bh pointer, which is then waited on when the aio is reused.
Zeroing the pointer fixes the issue, allowing vmdk format images to be used.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fix missing strnlen (a GNU extension) problems by using qemu_strnlen
used for user emulators also for system emulators.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Problem: It is impossible to feed filenames with the character colon because
qemu interprets such names as a protocol. For example filename scsi:0, is
interpreted as a protocol by name "scsi".
This patch allows user to espace colon characters. For example the above
filename can now be expressed either as 'scsi\:0' or as file:scsi:0
anything following the "file:" tag is interpreted verbatin. However if "file:"
tag is omitted then any colon characters in the string must be escaped using
backslash.
Here are couple of examples:
scsi\:0\:abc is a local file scsi:0:abc
http\://myweb is a local file by name http://myweb
file:scsi:0:abc is a local file scsi:0:abc
file:http://myweb is a local file by name http://myweb
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Section 10.8.25 ("START/STOP UNIT Command") of SFF-8020i states that
if the device is locked we should refuse to eject if the device is
locked.
ASC_MEDIA_REMOVAL_PREVENTED is the appropriate return in this case.
In order to stop itself from ejecting the media it is running from,
Fedora's installer (anaconda) requires the CDROMEJECT ioctl() to fail
if the drive has been previously locked.
See also https://bugzilla.redhat.com/501412
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Also replave qemu_bh_cancel with qemu_bh_delete in bdrv_aio_cancel_em.
Otherwise the bh will live forever in the bh list.
Signed-off-by: Dor Laor <dor@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a bdrv_probe_device method to all BlockDriver instances implementing
host devices to move matching of host device types into the actual drivers.
For now we keep exacly the old matching behaviour based on the devices names,
although we really should have better detetion methods based on device
information in the future.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of declaring one BlockDriver for all host devices declared one
for each type: a generic one for normal disk devices, a Linux floppy
driver and a CDROM driver for Linux and FreeBSD. This gets rid of a lot
of messy ifdefs and switching based on the type in the various removal
device methods.
block.c grows a new method to find the correct host device driver based
on OS-sepcific criteria, which will later into the actual drivers in a
later patch in this series.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Now that we have a separate aio pool structure we can remove those
aio pool details from BlockDriver.
Every driver supporting AIO now needs to declare a static AIOPool
with the aiocb size and the cancellation method. This cleans up the
current code considerably and will make it cleaner and more obvious
to support two different aio implementations behind a single
BlockDriver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch converts the remaining users of bdrv_create2 to bdrv_create and
removes the now unused function.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now we can make use of the newly introduced option structures. Instead of
having bdrv_create carry more and more parameters (which are format specific in
most cases), just pass a option structure as defined by the driver itself.
bdrv_create2() contains an emulation of the old interface to simplify the
transition.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>