Make all AIO requests vectored and defer linearization until the actual
I/O thread. This prepares for using native preadv/pwritev.
Also enables asynchronous direct I/O by handling that case in the I/O thread.
Qcow and qcow2 propably want to be adopted to directly deal with multi-segment
requests, but that can be implemented later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@7020 c046a42c-6fe2-441c-8c8c-71466251a162
This was reported by malc.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6993 c046a42c-6fe2-441c-8c8c-71466251a162
This patch allows the use a host_device as the destination for "qemu-img
convert".
I added a ->bdrv_create function host_device. It merely verifies that
the device exists and is large enough.
A check is needed in the qemu-img convert loop to ensure that we write
out all 0 sectors to the host_device. Otherwise they end up with stale
garbage where all zero sectors were expected.
I also made the check against bdrv_is_allocated enabled for everything
_except_ host devices, since there is no point in making the block
backend write a bunch of zeros just so that we can memcmp them
immediately afterwards. Host devices can't benefit from this because
there is no way to differentiate between a sector being unallocated
because it was never written, or because it was written with all zeros
and then made a trip through qemu-img convert.
Finally, there is an unrelated fix for a typo in the error message
printed if the destination device does not support ->bdrv_create.
Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6978 c046a42c-6fe2-441c-8c8c-71466251a162
Okay, I started looking into how to handle scsi-generic I/O in the
new world order.
I think the best is to use the SG_IO ioctl instead of the read/write
interface as that allows us to support scsi passthrough on disk/cdrom
devices, too. See Hannes patch on the kvm list from August for an
example.
Now that we always do ioctls we don't need another abstraction than
bdrv_ioctl for the synchronous requests for now, and for asynchronous
requests I've added a aio_ioctl abstraction keeping it simple.
Long-term we might want to move the ops to a higher-level abstraction
and let the low-level code fill out the request header, but I'm lazy
enough to leave that to the people trying to support scsi-passthrough
on a non-Linux OS.
Tested lightly by issuing various sg_ commands from sg3-utils in a guest
to a host CDROM device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6895 c046a42c-6fe2-441c-8c8c-71466251a162
This improves physical cdrom support on FreeBSD hosts to be almost as
good as on Linux, with the only notable exception that you still need to
either have the guest itself eject the disc if you want to take it
out/change it, or do a change command in the monitor after taking out
a disc in case a guest cannot eject it itself - otherwise the guest may
continue using state (like size) of the old disc.
Signed-off-by: Juergen Lock <nox@jelal.kn-bremen.de>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6888 c046a42c-6fe2-441c-8c8c-71466251a162
The changes introduced by r6824 broke a subtle, and admittedly obscure, aspect
of the block API. While bdrv_{pread,pwrite} return the number of bytes read
or written upon success, bdrv_{read,write} returns a zero upon success.
When using bdrv_pread for bdrv_read, special care must be taken to handle this
case.
This fixes certain guest images (notably linux-0.2 provided on the qemu
website).
Reported-by: malc <av1474@comtv.ru>
Reported-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6828 c046a42c-6fe2-441c-8c8c-71466251a162
Now that scsi generic no longer uses bdrv_pread() and bdrv_pwrite(), we can
drop the corresponding internal APIs, which overlap bdrv_read()/bdrv_write()
and, being byte oriented, are unnatural for a block device.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6824 c046a42c-6fe2-441c-8c8c-71466251a162
Add an internal API for the generic block layer to send scsi generic commands
to block format driver. This means block format drivers no longer need
to consider overloaded nb_sectors parameters.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6823 c046a42c-6fe2-441c-8c8c-71466251a162
Consistently use the C99 named initializer format for the BlockDriver
methods to make the method table more readable and more easily
extensible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6768 c046a42c-6fe2-441c-8c8c-71466251a162
Hi all,
this small patch fixes a bug in the list iteration of raw_aio_remove.
Cheers,
Stefano
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6644 c046a42c-6fe2-441c-8c8c-71466251a162
Currently when qemu_paio_read or qemu_paio_write return an error we call
qemu_aio_release without removing the request from the list.
I know that in the current implementation qemu_paio_write\read don't return
any error, but still the behavior is wrong, especially considering
that the implementation of these two functions is likely to change in is
the future.
This patch fixes the problem adding a raw_aio_remove function that
removes the callback from the queue and also calls qemu_aio_release.
raw_aio_remove is called by raw_aio_read, raw_aio_write and
raw_aio_cancel.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6470 c046a42c-6fe2-441c-8c8c-71466251a162
This is a large patch that changes all occurrences of logfile/loglevel
global variables to use the new qemu_log*() macros.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@6338 c046a42c-6fe2-441c-8c8c-71466251a162
glibc implements posix-aio as a thread pool and imposes a number of limitations.
1) it limits one request per-file descriptor. we hack around this by dup()'ing
file descriptors which is hideously ugly
2) it's impossible to add new interfaces and we need a vectored read/write
operation to properly support a zero-copy API.
What has been suggested to me by glibc folks, is to implement whatever new
interfaces we want and then it can eventually be proposed for standardization.
This requires that we implement our own posix-aio implementation though.
This patch implements posix-aio using pthreads. It immediately eliminates the
need for fd pooling.
It performs at least as well as the current posix-aio code (in some
circumstances, even better).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5996 c046a42c-6fe2-441c-8c8c-71466251a162
This patch switches the read handle of the signaling pipe into
non-blocking mode. This avoids unwanted blocking reads and also
allows to read all bytes out of the signaling pipe in case we got
signaled more that once before the handler ran.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5716 c046a42c-6fe2-441c-8c8c-71466251a162
O_DSYNC isn't available on OS X.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5486 c046a42c-6fe2-441c-8c8c-71466251a162
This patch changes the cache= option to accept none, writeback, or writethough
to control the host page cache behavior. By default, writethrough caching is
now used which internally is implemented by using O_DSYNC to open the disk
images. When using -snapshot, writeback is used by default since data integrity
it not at all an issue.
cache=none has the same behavior as cache=off previously. The later syntax is
still supported by now deprecated. I also cleaned up the O_DIRECT
implementation to avoid many of the #ifdefs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5485 c046a42c-6fe2-441c-8c8c-71466251a162
Replace signalfd with signal handler/pipe. There is no way to interrupt
the CPU execution loop when a file descriptor becomes readable. This
results in a large performance regression in sparc emulation during
bootup.
This patch switches us to signal handler/pipe which was originally
suggested by Ian Jackson. The signal handler lets us interrupt the
CPU emulation loop while the write to a pipe lets us avoid the
select/signal race condition.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5451 c046a42c-6fe2-441c-8c8c-71466251a162
Be more friendly when signalfd() fails, and also add configure checks to detect
that syscall(SYS_signalfd) actually works. malc pointed out that some installs
do not have /usr/include/linux headers that are in sync with the glibc headers
so why SYS_signalfd is defined, it's #defined to _NR_signalfd which is not
defined in the /usr/include/linux header.
While this is a distro bug, it doesn't hurt to do a more thorough job in
detection.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5334 c046a42c-6fe2-441c-8c8c-71466251a162
struct aioinit isn't defined on BSD it appears so we need to guard everything
in an #if defined(__linux__).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5325 c046a42c-6fe2-441c-8c8c-71466251a162
This patch implements a simple fd pool to allow many AIO requests with
posix-aio. The result is significantly improved performance (identical to that
reported for linux-aio) for both cache=on and cache=off.
The fundamental problem with posix-aio is that it limits itself to one thread
per-file descriptor. I don't know why this is, but this patch provides a simple
mechanism to work around this (duplicating the file descriptor).
This isn't a great solution, but it seems like a reasonable intermediate step
between posix-aio and a custom thread-pool to replace it.
Ryan Harper will be posting some performance analysis he did comparing posix-aio
with fd pooling against linux-aio. The size of the posix-aio thread pool and
the fd pool were largely determined by him based on this analysis.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5323 c046a42c-6fe2-441c-8c8c-71466251a162
__GLIBC_PREREQ is defined in such a way that the ! cannot be used in front of
it on FreeBSD. Also, -lpthread is not implied by the build and we definitely
use it for compatfd support.
While at it, I added a default initialization for posix-aio that seems to
perform well in our testing.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5322 c046a42c-6fe2-441c-8c8c-71466251a162
RedHat 9 shipped glibc 2.3. Modern versions of glibc do not have the aio thread
exit issue that the comment references. This patch adjusts the check to only
limit aio_init on glibc versions < 2.4.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5304 c046a42c-6fe2-441c-8c8c-71466251a162
This patch refactors the AIO layer to allow multiple AIO implementations. It's
only possible because of the recent signalfd() patch.
Right now, the AIO infrastructure is pretty specific to the block raw backend.
For other block devices to implement AIO, the qemu_aio_wait function must
support registration. This patch introduces a new function,
qemu_aio_set_fd_handler, which can be used to register a file descriptor to be
called back. qemu_aio_wait() now polls a set of file descriptors registered
with this function until one becomes readable or writable.
This patch should allow the implementation of alternative AIO backends (via a
thread pool or linux-aio) and AIO backends in non-traditional block devices
(like NBD).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5297 c046a42c-6fe2-441c-8c8c-71466251a162
This prevents two signalfd() threads from being spawned. This problem was
originally spotted by Blue Swirl.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5289 c046a42c-6fe2-441c-8c8c-71466251a162
The protocol_name "file" was added to the block driver when async IO was
introduced. This can be used to select that a file is treated as a raw
device instead of probing for the type. However, protocols are not subject
to path interpretation which cases qcow2 images with raw base images to not
function is the path was specified relatively.
The fix is simply to remove the protocol_name from the raw block driver. The
proper way to force the use of a raw block format is to use the format= option
with -drive.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5233 c046a42c-6fe2-441c-8c8c-71466251a162
My previous commit broke the build. This was spotted by C.W. Betts.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5231 c046a42c-6fe2-441c-8c8c-71466251a162
Right now, we sprinkle #if defined(QEMU_IMG) && defined(QEMU_NBD) all over the
code. It's ugly and causes us to have to build multiple object files for
linking against qemu and the tools.
This patch introduces a new file, qemu-tool.c which contains enough for
qemu-img, qemu-nbd, and QEMU to all share the same objects.
This also required getting qemu-nbd to be a bit more Windows friendly. I also
changed the Windows block-raw to use normal IO instead of overlapping IO since
we don't actually do AIO yet on Windows. I changed the various #if 0's to
#if WIN32_AIO to make it easier for someone to eventually fix AIO on Windows.
After this patch, there are no longer any #ifdef's related to qemu-img and
qemu-nbd.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5226 c046a42c-6fe2-441c-8c8c-71466251a162
OpenBSD doesn't use AIO so don't try to build compatfd when not using AIO.
Also make sure to call qemu_aio_init() from bdrv_init. Everything that uses
bdrv calls bdrv_init so it makes sense to init aio from there instead of
in every single tool.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5197 c046a42c-6fe2-441c-8c8c-71466251a162
This patch introduces signalfd() to work around the signal/select race in
checking for AIO completions. For platforms that don't support signalfd(), we
emulate it with threads.
There was a long discussion about this approach. I don't believe there are any
fundamental problems with this approach and I believe eliminating the use of
signals is a good thing.
I've tested Windows and Linux using Windows and Linux guests. I've also checked
for disk IO performance regressions.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5187 c046a42c-6fe2-441c-8c8c-71466251a162