Add some notes about Linux AIO explaining why we don't use AIO in
some situations.
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Callees always return 0, except for FreeBSD's cdrom_eject(), which
returns -ENOTSUP when the device is in a terminally wedged state.
The only caller is bdrv_eject(), and it maps -ENOTSUP to 0 since
commit 4be9762a.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only caller is bdrv_set_locked(), and it ignores the value.
Callees always return 0, except for FreeBSD's cdrom_set_locked(),
which returns -ENOTSUP when the device is in a terminally wedged
state.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.
The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On Linux x86_64 host with 32bit userspace, running
qemu or even just "qemu-img create -f qcow2 some.img 1G"
causes a kernel warning:
ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(00005326){t:'S';sz:0} arg(7fffffff) on some.img
ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img
ioctl 00005326 is CDROM_DRIVE_STATUS,
ioctl 801c0204 is FDGETPRM.
The warning appears because the Linux compat-ioctl handler for these
ioctls only applies to block devices, while qemu also uses the ioctls on
plain files. Work around by calling fstat() the ensure the ioctls are
only used on block devices.
Signed-off-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
use the correct way to get the size of a disk device or partition
From: Adam Hamsik <haad@netbsd.org>
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On NetBSD a userland process is better with the character device
interface. In addition, a block device can't be opened twice; if a Xen
backend opens it, qemu can't and vice-versa.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics. All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch removes all references to signal.h when qemu-common.h is included
as they become redundant.
Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes. Support for other
hole punching mechanisms can be added when they become available.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Move timer init functions to a new file, qemu-timer-common.c. Make other
critical timer functions inlined to preserve performance in
qemu-timer.c, also move muldiv64() (used by the inline functions)
to qemu-timer.h.
Adjust block/raw-posix.c and simpletrace.c to use get_clock() directly.
Remove a similar/duplicate definition in qemu-tool.c.
Adjust hw/omap_clk.c to include qemu-timer.h because muldiv64() is used
there.
After this change, tracing can be used also for user code and
simpletrace on Win32.
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Replace the hardcoded handling of 512 byte alignment with bs->buffer_alignment
to handle larger sector size devices correctly.
Note that we can not rely on it to be initialize in bdrv_open, so deal
with the worst case there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow symbolic links which point to /dev/sgX devices.
Signed-off-by: Bernhard Kohl <bernhard.kohl@nsn.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On Linux, we have code to detect CD-ROMs using an ioctl. We shouldn't lose
anything but false positives by removing the check for a /dev/cd* path.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Assuming that any image on a block device is not properly zero-initialized is
actually wrong: Only raw images have this problem. Any other image format
shouldn't care about it, they initialize everything properly themselves.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no need to have a second set of integral types.
Replace them by the standard types from stdint.h.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
raw_pread_aligned() retries up to two times if the block device backs
a virtual CD-ROM (a drive with media=cdrom and if=ide, scsi, xen or
none). This makes no sense. Whether retrying reads can correct read
errors can only depend on what we're reading, not on how the result
gets used. We need to check what whether we're reading from a
physical CD-ROM or floppy here.
I doubt retrying is useful even then. Left for another day.
Impact:
* Virtual CD-ROM backed by host_cdrom behaves the same.
* Virtual CD-ROM backed by file or host_device no longer retries.
* A drive backed by host_cdrom now retries even if it's not a virtual
CD-ROM.
* Any drive backed by host_floppy now retries.
While there, clean up gratuitous use of goto.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up raw-posix.c to be more consistent using BDRV_SECTOR_SIZE
instead of hard coded 512 values.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch combines the lseek+read/write calls to use pread/pwrite
instead. This will result in fewer system calls and is already used by
AIO.
Thanks to Jan Kiszka <jan.kiszka@siemens.com> for identifying excessive
lseek and Christoph Hellwig <hch@lst.de> for confirming that this
approach should work.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.
For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're running into various problems because the "raw" file access, which
is used internally by the various image formats is entangled with the
"raw" image format, which maps the VM view 1:1 to a file system.
This patch renames the raw file backends to the file protocol which
is treated like other protocols (e.g. nbd and http) and adds a new
"raw" image format which is just a wrapper around calls to the underlying
protocol.
The patch is surprisingly simple, besides changing the probing logical
in block.c to only look for image formats when using bdrv_open and
renaming of the old raw protocols to file there's almost nothing in there.
For creating images, a new bdrv_create_file is introduced which guesses the
protocol to use. This allows using qemu-img create -f raw (or just using the
default) for both files and host devices. Converting the other format drivers
to use this function to create their images is left for later patches.
The only issues still open are in the handling of the host devices.
Firstly in current qemu we can specifiy the host* format names
on various command line acceping images, but the new code can't
do that without adding some translation. Second the layering breaks
the no_zero_init flag in the BlockDriver used by qemu-img. I'm not
happy how this is done per-driver instead of per-state so I'll
prepare a separate patch to clean this up.
There's some more cleanup opportunity after this patch, e.g. using
separate lists and registration functions for image formats vs
protocols and maybe even host drivers, but this can be done at a
later stage.
Also there's a check for protocol in bdrv_open for the BDRV_O_SNAPSHOT
case that I don't quite understand, but which I fear won't work as
expected - possibly even before this patch.
Note that this patch requires various recent block patches from Kevin
and me, which should all be in his block queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split up the raw_getlength into separate generic, solaris and BSD
versions to reduce the ifdef maze a bit. The BSD variant still
is a complete maze, but to clean it up properly we'd need some
people using the BSD variants to figure out what code is used
for what variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_open already takes care of this for us.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Now that we output an error message according to the returned error code in
qemu-img, let's return the real error codes. "Input/output error" for
everything isn't helpful.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This shouldn't happen under any normal circumstances. However, it looks like
it's possible to achieve this with corrupted images. Without this patch
raw_pread is hanging in an endless loop in such cases.
The patch is not affecting growable files, for which such reads happen in
normal use cases. raw_pread_aligned already handles these cases and won't
return zero in the first place.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.
Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The BDRV_O_CREAT option is unused inside qemu and partially duplicates
the bdrv_create method. Remove it, and the -C option to qemu-io which
isn't used in qemu-iotests anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Current legacy floppy detection is hardcoded based on source file
name. Make this smarter on linux by attempting a floppy specific
ioctl.
v2:
Give ioctl check higher priority than filename check
s/IDE/legacy/
v3:
Actually initialize 'prio' variable
Check for ioctl success rather than absence of specific failure
v4:
Explicitly mention that change is linux specific.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Current CDROM detection is hardcoded based on source file name.
Make this smarter on linux by attempting a CDROM specific ioctl.
This makes '-cdrom /dev/sr0' succeed with no media present.
v2:
Give ioctl check higher priority than filename check.
v3:
Actually initialize 'prio' variable.
Check for ioctl success rather than absence of specific failure.
v4:
Explicitly mention that change is linux specific.
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We're leaking file descriptors to child processes. Set FD_CLOEXEC on file
descriptors that don't need to be passed to children to stop this misbehaviour.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
I haven't heard yet of anyone using qemu-img to copy an image to a real floppy,
but it's a valid use case.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The context parameter in paio_submit isn't used anyway, so there is no reason
why block drivers should need to remember it. This also avoids passing a Linux
AIO context to paio_submit (which doesn't do any harm as long as the parameter
is unused, but it is highly confusing).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When using Linux AIO raw still falls back to POSIX AIO sometimes, so we should
initialize it.
Not initializing it happens to work if POSIX AIO is used by another drive, or
if the format is not specified (probing the format uses POSIX AIO) or by pure
luck (e.g. it doesn't seem to happen any more with qcow2 since we have re-added
synchronous qcow2 functions).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Today host_devices have a create function, so they also need a create_options
field to prevent qemu-img from complaining.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously. Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix. Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we are flushing the caches for our image files we only care about the
data (including the metadata required for accessing it) but not things
like timestamp updates. So try to use fdatasync instead of fsync to
implement the flush operations.
Unfortunately many operating systems still do not support fdatasync,
so we add a qemu_fdatasync wrapper that uses fdatasync if available
as per the _POSIX_SYNCHRONIZED_IO feature macro or fsync otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch fixes linker errors when building QEMU without Linux AIO support.
It is based on suggestions from malc and Kevin Wolf.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now that do have a nicer interface to work against we can add Linux native
AIO support. It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.
This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.
To enable native kernel aio use the aio=native sub-command on the
drive command line. I have also added an option to qemu-io to
test the aio support without needing a guest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.
So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).
There's now a very small interface between the AIO providers and raw-posix.c:
- an init routine is called from raw_open_common to return an AIO context
for this drive. An AIO implementation may either re-use one context
for all drives, or use a different one for each as the Linux native
AIO support will do.
- an submit routine is called from the aio_reav/writev methods to submit
an AIO request
There are no indirect calls involved in this interface as we need to
decide which one to call manually. We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned. That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.
The old posix-aio-compat.h headers is removed now that most of it's
content is private to posix-aio-compat.c, and instead we add a new
block/raw-posix-aio.h headers is created containing only the tiny interface
between raw-posix.c and the AIO implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
As requested by Anthony make pthreads mandatory. This means we will always
have AIO available on posix hosts, and it will also allow enabling the I/O
thread unconditionally once it's ready.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>