Add a new VIRTIO_BLK_F_WCACHE feature to virtio-blk to indicate that we have
a volatile write cache that needs controlled flushing. Implement a
VIRTIO_BLK_T_FLUSH operation to flush it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously. Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix. Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If we are flushing the caches for our image files we only care about the
data (including the metadata required for accessing it) but not things
like timestamp updates. So try to use fdatasync instead of fsync to
implement the flush operations.
Unfortunately many operating systems still do not support fdatasync,
so we add a qemu_fdatasync wrapper that uses fdatasync if available
as per the _POSIX_SYNCHRONIZED_IO feature macro or fsync otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest. The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.
Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
initialize vectors for all vqs to VIRTIO_NO_VECTOR rather than 0 which
is a valid vector. This fixes migration which happened before driver
was loaded.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Amit Shah <amit.shah@redhat.com>
Tested-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
commit bf011293fa made virtio-blk-pci not
PCI-compliant, since it makes region 0 (which is an i/o region)
size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size.
When the ATA serial number feature is off, which is the default,
make the device spec compliant again, by making region 0 smaller.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Tested-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In usb-linux.c:usb_host_handle_control, we pass a 1024-byte buffer and
length to the kernel. However, the length was provided by the caller
of dev->handle_packet, and is not checked, so the kernel might provide
too much data and overflow our buffer.
For example, hw/usb-uhci.c could set the length to 2047.
hw/usb-ohci.c looks like it might go up to 4096 or 8192.
This causes a qemu crash, as reported here:
http://www.mail-archive.com/kvm@vger.kernel.org/msg18447.html
This patch increases the usb-linux.c buffer size to 2048 to fix the
specific device reported, and adds a check to avoid the overflow in
any case.
Signed-off-by: Jim Paris <jim@jtan.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Compilation for MIPS host (not part of official QEMU)
checks __mips_isa_rev which is not always defined.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It is quite common for virtio-blk to submit more than one write request in a
row to the qemu block layer. Use bdrv_aio_multiwrite to allow block drivers to
optimize its handling of the requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.
Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.
As a nice side effect, also other formats take advantage of dealing with less
and larger requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If PVR settings enable illegal insn trap, trap when QEMU finds an
insn it knows nothing about.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
The microblaze gives MMU faults priority. For stores we still
have a flaw that the value leaks to memory in the case of an
unaligned exception.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
* Correct PVR checks for masking off individual exceptions.
* Correct FPU exception code.
* Set EAR on unaligned and unassigned exceptions.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Also split the isa bits into a separate source file, so we don't drag in
a dependency for isa-bus.o for machines which want ne2k_pci only.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Like isa_create_simple, but doesn't call qdev_init, so one can set
properties after creating and before initializing the device.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
isa-bus owns the isa irqs now, so it can hand them out directly.
There is no need for the separate isa_connect_irqs step, drop it.
Also hard-code isa interrupts which can't be configured anyway.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Lot of ISA devices work at fixed addresses, so having iobase
as bus property doesn't make much sense. Devices which can
have different iobases will get a device property.
Also simply hard-code stuff which can't be configured anyway.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
PCI device entries have to have a default version, not 2, because they are
used in the midle of other structures that can have _any_ version number.
We can't use proper versioning here until we have SubSections support.
Why we didn't noticed before? Because in a PC, the only device ported with
a version less that 2 is piix_pm, and for that one, default pci values are
right. If you use a virtio-console, you will see that its state it is not
loaded back.
Thanks to Amit Shah for reporting the problem and help debug the fix.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Doing this will make the vcpu ioctl be issued from the I/O thread, instead
of cpu thread. The correct behaviour is to call it from within the cpu thread,
as soon as we are ready to go.
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The RTC emulation does not set the IRQ flags independent of the IRQ enable bits.
The original MC146818A datasheet from 1984 notes:
"flag bits in Register C [...] are set independent of the
state of the corresponding enable bits in Register B"
Similar sections can be found in newer documentation e.g. in rtc82885.
Qemu and Bochs set the IRQ flags only if they are enabled,
which breaks drivers polling on them.
The following patch corrects this for the update-ended-flag in Qemu only.
It does not fix the handling of the other flags.
Signed-off-by: Bernhard Kauer <kauer@tudos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
qemu-kvm: fix segfault when running kvm without /dev/kvm, falling back
to non-accelerated mode
We're seeing segfaults on systems without access to /dev/kvm. It
looks like the global kvm_allowed is being set just a little too late
in vl.c. This patch moves the kvm initialization a bit higher in the
vl.c main, just after options processing, and solves the segfaults.
We're carrying this patch in Ubuntu 9.10 Alpha. Please apply
upstream, or advise if and why this might not be the optimal solution.
Signed-off-by: Dustin Kirkland <kirkland@canonical.com>
Move the kvm_init() call a bit higher to fix a segfault when
/dev/kvm is not available. The kvm_allowed global needs
to be set correctly a little earlier.
Signed-off-by: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We should set $linux_aio to 'no' if detection failed, otherwise
its contents will be empty, which is a bug as we test for 'yes'
or 'no'.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
gcc 4.3.2 throws warnings when DEBUG_BUFFERED_FILE is defined, because
we are using the wrong format specifiers to print size_t/ssize_t values.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When two AIO requests write to the same cluster, and this cluster is
unallocated, currently both requests allocate a new cluster and the second one
merges the first one when it is completed. This means an cluster allocation, a
read and a cluster deallocation which cause some overhead. If we simply let the
second request wait until the first one is done, we improve overall performance
with AIO requests (specifially, qcow2/virtio combinations).
This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster is limited so that it
either doesn't touch the allocation of the first request (so it can have a
non-overlapping allocation) or it waits for the first request to complete.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The wrong version of the preallocation patch has been applied, so this is the
remaining diff.
We can't use truncate to grow the image file to the right size because we don't
know if metadata has been written after the last data cluster. In this case
truncate would shrink the file and destroy its metadata. Write a zero sector at
the end of the virtual disk instead to ensure that the file is big enough.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
With cc-option we are testing if gcc just accept a particular option, we don't need CFLAGS at all. And this fixes the recursive problem with CFLAGS
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Here's a patch to fix the issue introduced by me, as Reimar Döffinger pointed out,
Reimar Döffinger wrote:
> On Thu, Aug 13, 2009 at 03:01:20PM +0300, Naphtali Sprei wrote:
>> Bug fix for segfault when run as i82551 HW:
>> Use Extended TBD only when HW supports it (i82558 and up).
>>
>> Added assertions to guard from such buffer overflow
>> Introduce the MAX_TCB_BYTE_COUNT macro
>> Allocate buf big enough as HW needs (MAX_ETH_FRAME_SIZE -> MAX_TCB_BYTE_COUNT)
>>
>>
>> I don't feel 100% OK with the "s->device >= i82558B" condition
>> since it relies on the numeric (hex) value of those defines, which currently
>> is correct, but changes (which I don't forsee now) might break it.
>
> It seems this was applied. Unfortunately this breaks things on FreeBSD.
> There seem to be multiple issues.
> First, the intel document says the 82551, 82550, 82559 models are all
> supersets of the 82558. Or in other words: they all support this
> feature.
> Only the 82557 does not.
> But then even for that the FreeBSD driver will fail.
> The reason for that is this line:
> eeprom_contents[0xa] = 0x4000;
> the value here must be 0x01000 for all 82557 models it seems.
Correct the logic of determining devices that supports
extended TxCB: only the 82557 do not support it.
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>