2010-02-22 19:57:54 +03:00
|
|
|
#ifndef __QEMU_BARRIER_H
|
|
|
|
#define __QEMU_BARRIER_H 1
|
|
|
|
|
2010-06-25 18:56:49 +04:00
|
|
|
/* Compiler barrier */
|
|
|
|
#define barrier() asm volatile("" ::: "memory")
|
|
|
|
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
#if defined(__i386__)
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
|
2012-10-07 22:07:11 +04:00
|
|
|
#include "compiler.h" /* QEMU_GNUC_PREREQ */
|
2012-10-06 14:46:15 +04:00
|
|
|
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
/*
|
2012-04-23 16:46:22 +04:00
|
|
|
* Because of the strongly ordered x86 storage model, wmb() and rmb() are nops
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
* on x86(well, a compiler barrier only). Well, at least as long as
|
|
|
|
* qemu doesn't do accesses to write-combining memory or non-temporal
|
|
|
|
* load/stores from C code.
|
|
|
|
*/
|
|
|
|
#define smp_wmb() barrier()
|
2012-04-23 16:46:22 +04:00
|
|
|
#define smp_rmb() barrier()
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
/*
|
|
|
|
* We use GCC builtin if it's available, as that can use
|
|
|
|
* mfence on 32 bit as well, e.g. if built with -march=pentium-m.
|
|
|
|
* However, on i386, there seem to be known bugs as recently as 4.3.
|
|
|
|
* */
|
2012-10-04 01:11:02 +04:00
|
|
|
#if QEMU_GNUC_PREREQ(4, 4)
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
#define smp_mb() __sync_synchronize()
|
|
|
|
#else
|
|
|
|
#define smp_mb() asm volatile("lock; addl $0,0(%%esp) " ::: "memory")
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#elif defined(__x86_64__)
|
|
|
|
|
|
|
|
#define smp_wmb() barrier()
|
2012-04-23 16:46:22 +04:00
|
|
|
#define smp_rmb() barrier()
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
#define smp_mb() asm volatile("mfence" ::: "memory")
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
|
2011-11-01 13:39:49 +04:00
|
|
|
#elif defined(_ARCH_PPC)
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
|
|
|
|
/*
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
* We use an eieio() for wmb() on powerpc. This assumes we don't
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
* need to order cacheable and non-cacheable stores with respect to
|
|
|
|
* each other
|
|
|
|
*/
|
|
|
|
#define smp_wmb() asm volatile("eieio" ::: "memory")
|
2012-04-23 16:46:22 +04:00
|
|
|
|
|
|
|
#if defined(__powerpc64__)
|
|
|
|
#define smp_rmb() asm volatile("lwsync" ::: "memory")
|
|
|
|
#else
|
|
|
|
#define smp_rmb() asm volatile("sync" ::: "memory")
|
|
|
|
#endif
|
|
|
|
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
#define smp_mb() asm volatile("sync" ::: "memory")
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
|
|
|
|
#else
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For (host) platforms we don't have explicit barrier definitions
|
|
|
|
* for, we use the gcc __sync_synchronize() primitive to generate a
|
|
|
|
* full barrier. This should be safe on all platforms, though it may
|
2012-04-23 16:46:22 +04:00
|
|
|
* be overkill for wmb() and rmb().
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
*/
|
|
|
|
#define smp_wmb() __sync_synchronize()
|
virtio: add missing mb() on notification
During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.
Guest does the reverse: writes the index/flag,
then checks the used ring.
The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:
host avail flag read
guest enable interrupts: avail flag write
guest check used ring: ring is empty
host used index write
which results in a lost interrupt: guest will never be notified
about the used ring update.
This actually can happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and this has actually been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the regordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.
Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.
Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.
As we need asm for rmb,wmb anyway, it's just as well to
use it for mb.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-04-22 17:45:53 +04:00
|
|
|
#define smp_mb() __sync_synchronize()
|
2012-04-23 16:46:22 +04:00
|
|
|
#define smp_rmb() __sync_synchronize()
|
Barriers in qemu-barrier.h should not be x86 specific
qemu-barrier.h contains a few macros implementing memory barrier
primitives used in several places throughout qemu. However, apart
from the compiler-only barrier, the defined wmb() is correct only for
x86, or platforms which are similarly strongly ordered.
This patch addresses the FIXME about this by making the wmb() macro
arch dependent. On x86, it remains a compiler barrier only, but with
a comment explaining in more detail the conditions under which this is
correct. On weakly-ordered powerpc, an "eieio" instruction is used,
again with explanation of the conditions under which it is sufficient.
On other platforms, we use the __sync_synchronize() primitive,
available in sufficiently recent gcc (4.2 and after?). This should
implement a full barrier which will be sufficient on all platforms,
although it may be overkill in some cases. Other platforms can add
optimized versions in future if it's worth it for them.
Without proper memory barriers, it is easy to reproduce ordering
problems with virtio on powerpc; specifically, the QEMU puts new
element into the "used" ring and then updates the ring free-running
counter. Without a barrier between these under the right
circumstances, the guest linux driver can receive an interrupt, read
the counter change but find the ring element to be handled still has
an old value, leading to an "id %u is not a head!\n" error message.
Similar problems are likely to be possible with kvm on other weakly
ordered platforms.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-09-20 06:05:21 +04:00
|
|
|
|
|
|
|
#endif
|
|
|
|
|
2010-02-22 19:57:54 +03:00
|
|
|
#endif
|