virtio has the equivalent of:
if (vq->last_avail_index != vring_avail_idx(vq)) {
read descriptor head at vq->last_avail_index;
}
In theory, processor can reorder descriptor head
read to happen speculatively before the index read.
this would trigger the following race:
host descriptor head read <- reads invalid head from ring
guest writes valid descriptor head
guest writes avail index
host avail index read <- observes valid index
as a result host will use an invalid head value.
This was not observed in the field by me but after
the experience with the previous two races
I think it is prudent to address this theoretical race condition.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This fixes an issue dual to the one fixed by
patch 'virtio: add missing mb() on notification'
and applies on top.
In this case, to enable vq kick to exit to host,
qemu writes out used flag then reads the
avail index. if these are reordered we get a race:
host avail index read: ring is empty
guest avail index write
guest flag read: exit disabled
host used flag write: enable exit
which results in a lost exit: host will never be notified about the
avail index update. Again, happens in the field but only seems to
trigger on some specific hardware.
Insert an smp_mb barrier operation to ensure the correct ordering.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If a guest sets very short timeouts, and asks for a timer to be reloaded on
timeout, QEMU can go to 100%CPU utilisation and become unresponsive,
as it is spending all its time generating timeout interrupts. On real
hardware this doesn't matter, as the interrupts are just coalesced,
and the effect is to have the interrupt asserted all the time.
This patch is a band-aid, that prevents timeouts less than 10
microseconds from being set. 10 microseconds is a limit that was
determined empirically on a variety of machines as the shortest that
allowed QEMU to pick up a control-a c sequence to get at the monitor.
Reported-by: Anna Lyons <anna.lyons@nicta.com.au>
Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Specify the root to search from as argument. This avoids hardcoding
"/machine" in some places and makes it more flexible.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* kwolf/for-anthony: (38 commits)
qemu-iotests: Fix test 031 for qcow2 v3 support
qemu-iotests: Add -o and make v3 the default for qcow2
qcow2: Zero write support
qemu-iotests: Test backing file COW with zero clusters
qemu-iotests: add a simple test for write_zeroes
qcow2: Support for feature table header extension
qcow2: Support reading zero clusters
qcow2: Version 3 images
qcow2: Ignore reserved bits in check_refcounts
qcow2: Ignore reserved bits in refcount table entries
qcow2: Simplify count_cow_clusters
qcow2: Refactor qcow2_free_any_clusters
qcow2: Ignore reserved bits in L1/L2 entries
qcow2: Fail write_compressed when overwriting data
qcow2: Ignore reserved bits in count_contiguous_clusters()
qcow2: Ignore reserved bits in get_cluster_offset
qcow2: Save disk size in snapshot header
Specification for qcow2 version 3
qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()
iotests: Resolve test failures caused by hostname
...
Fix BCD mask for date. The most visible effect of this patch is
Solaris 2.5.1 doesn't hang at boot if the day of month is >21.
Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* origin/master: (27 commits)
target-arm: Move reset handling to arm_cpu_reset
target-arm: Drop cpu_reset_model_id()
target-arm: Move cache ID register setup to cpu specific init fns
target-arm: Move OMAP cp15_i_{max,min} reset to cpu_state_reset
target-arm: Move feature register setup to per-CPU init fns
target-arm: Move iWMMXT wCID reset to cpu_state_reset
target-arm: Drop JTAG_ID documentation
target-arm: Move SCTLR reset value setup to per cpu init fns
target-arm: Move CTR setup to per cpu init fns
target-arm: Move MVFR* setup to per cpu init fns
target-arm: Move FPSID config to cpu init fns
target-arm: Move feature bit settings to CPU init fns
target-arm: Add QOM subclasses for each ARM cpu implementation
target-arm: remind to keep arm features in sync with linux-user/elfload.c
tci: GETPC() macro must return an uintptr_t
gdbstub: Synchronize CPU state unconditionally in gdb_set_cpu_pc
softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile
target-xtensa: add tests for LOOPNEZ and LOOPGTZ
target-xtensa: fix LOOPNEZ/LOOPGTZ translation
qtest: add m48t59 tests for Sparc
...
* stefanha/trivial-patches:
Add .gitignore for tests/
e1000: Fix spelling (segmentaion -> segmentation) in debug output
spice-qemu-char.c: Show what name is unsupported
pflash_cfi01: remove redundant line
qxl: Add missing GCC_FMT_ATTR and fix format specifier
fix block_job_set_speed name in documentation
error.c: don't return value for void function
* bonzini/scsi-next:
scsi: add SANITIZE command
SCSI emulation: should tell the guest that we actually support thin provisioning
SCSI emulation: Support unmap via WRITE_SAME_10.
scsi: advertise DPOFUA
scsi: small refactoring of MMC mode-sense
scsi: support FUA on reads
scsi: add a started field to SCSIDiskReq
scsi: force unit access on VERIFY
scsi: add support for FUA on writes
scsi: move scsi_flush_complete around
scsi: make code more homogeneous in AIO callback functions
scsi: add missing test for cancelled request
virtio-scsi: add multiqueue capability
virtio: add virtio_queue_get_id
virtio-scsi: prepare migration format for multiqueue
scsi: fix memory leak
On reset of the mpcore timer/watchdog block we need to
delete the qemu_timer in case it was running.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The versatile i2c controller implementation was separated to
its own file called versatile_i2c.c. This is done as a preparation
for adding i2c support to the versatilepb board.
Signed-off-by: Oskar Andero <oskar.andero@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This was reported by https://bugs.launchpad.net/qemu/+bug/984476.
I also changed the case for 'error'.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Eric Bénard <eric@eukrea.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
val is an uint64_t, therefore %d was not correct.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
[Actually, we should report it only if discard_granularity is nonzero.
Older SBC drafts assigned 0 to thin provisioning and 1 to thick
(resource-provisioned, they call it). Newer drafts assign respectively
1 and 2 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was added in SBC r26 in place of the reserved bits that were
present up to that version.
It is the same as WRITE_SAME_16 as far as QEMU is concerned.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The IDE PIO write sector code path uses bdrv_write() and hence can make
the guest unresponsive while the I/O request is in progress. This patch
converts ide_sector_write() to use bdrv_aio_writev() by using the
BUSY_STAT bit to tell the guest that the request is in progress.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The IDE PIO interface currently uses bdrv_read() to perform reads
synchronously. Synchronous I/O in the vcpu thread is bad because it
prevents the guest from executing code - it makes the guest
unresponsive.
This patch converts IDE PIO to use bdrv_aio_readv(). We simply need to
use the BUSY_STAT status so the guest knows to wait while we are busy.
The only external user of ide_sector_read() is restart behavior on I/O
errors and it is not affected by this change. We still need to restart
I/O in the same way.
Migration is also unaffected if I understand the code correctly. We
continue to use the same transfer function and the BUSY_STAT status
should never be migrated since we flush I/O before migrating device
state.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To force unit access, add a flush operation after the actual write.
WRITE AND VERIFY commands always flush according to SBC, so do it
even though we do not perform the reread.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
First scsi_flush_complete, like scsi_dma_complete, is always called with
an active AIOCB.
Second, always test for "ret < 0" to check for errors.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adding multiqueue is as simple as creating more than one virtqueues,
and saving the queue number for each request.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Serializing virtio-scsi requests needs a simple way to get from a
VirtQueue to the number of the queue. The virtio_queue_get_id
provides this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to restore requests correctly from a multitude of virtqueues,
we need to store the id of the request queue that each request came
from.
Do this even for single-queue, by storing a hard-coded zero, to
simplify future implementation of multiqueue.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
scsibus_get_dev_path is leaking id if it is not NULL. Fix it.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* origin/master:
Allow controlling volume with PulseAudio backend
configure: pa_simple is not needed anymore
Do not use pa_simple PulseAudio API
audio/spice: add support for volume control
hw/ac97: add support for volume control
hw/ac97: the volume mask is not only 0x1f
hw/ac97: remove USE_MIXER code
audio: don't apply volume effect if backend has VOICE_VOLUME_CAP
audio: add VOICE_VOLUME ctl
Notify any listeners such as vnc that the displaysurface has been
changed, otherwise they will segfault when first accessing the freed old
displaysurface data.
Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The addition of those values caused a regression where not specifying
any value for the vram bar size would result in a 4096 _byte_ surface
area. This is ok for the windows driver but causes the X driver to be
unusable. Also, it's a regression. This patch returns the default
behavior of having a 64 megabyte vram BAR.
Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
xc_hvm_inject_msi is only available on Xen >= 4.2: add a dummy
compatibility function for Xen < 4.2.
Also enable msi support only on Xen >= 4.2.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Combine output volume with Master and PCM registers values.
Use default values in mixer_reset ().
Set volume on post-load to update backend values.
v4,v5:
- fix some code style
Signed-off-by: Marc-Andr? Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
It's a case by case (see Table 66. AC ?97 Baseline Audio Register Map)
Signed-off-by: Marc-Andr? Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
That code doesn't compile. The interesting bits for volume control are
going to be rewritten in the following patch.
Signed-off-by: Marc-Andr? Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>
Not sure what the purpose of the assert() was, in any case it is bogous.
We can arrive there if transfer descriptors passed to us from the guest
failed to pass sanity checks, i.e. it is guest-triggerable. We deal
with that case by resetting the host controller. Everything is ok, no
need to throw a core dump here.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Also cleanup (reset) our device state when we reject a device due to a
speed mismatch.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The sofv value only ever gets a value assigned and is never used (read)
anywhere, so we can just drop it.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch carries a complete rewrite of the usb descriptor parser.
Changes / improvements:
* We are using the USBDescriptor struct instead of hard-coded offsets
now to access descriptor data.
* (debug) printfs are all gone, tracepoints have been added instead.
* We don't try (and fail) to skip over unneeded descriptors. We parse
them all one by one. We keep track of which configuration, interface
and altsetting we are looking at and use this information to figure
which desciptors are in use and which we can ignore.
* On parse errors we clear all endpoint information, which will
disallow any communication with the device, except control endpoint
messages. This makes sure we don't end up with a silly device state
where half of the endpoints got enabled and the other half was left
disabled.
* Some sanity checks have been added.
The new parser is more robust and also leaves complete device
information in the trace log if you enable the ush_host_parse_*
tracepoints.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>