This patch lets vhost support multiqueue. The idea is simple, just launching
multiple threads of vhost and let each of vhost thread processing a subset of
the virtqueues of the device. After this change each emulated device can have
multiple vhost threads as its backend.
To do this, a virtqueue index were introduced to record to first virtqueue that
will be handled by this vhost_net device. Based on this and nvqs, vhost could
calculate its relative index to setup vhost_net device.
Since we may have many vhost/net devices for a virtio-net device. The setting of
guest notifiers were moved out of the starting/stopping of a specific vhost
thread. The vhost_net_{start|stop}() were renamed to
vhost_net_{start|stop}_one(), and a new vhost_net_{start|stop}() were introduced
to configure the guest notifiers and start/stop all vhost/vhost_net devices.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Once upon a time, it was decided that qemu_malloc(0) should abort.
Switching to glib retired that bright idea. Some code that was added
to cope with it (e.g. in commits 702ef63, b76b6e9) is still around.
Bury it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Support backend guest notifier masking in vhost-net:
create eventfd at device init, when masked,
make vhost use that as eventfd instead of
sending an interrupt.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This makes it possible to use started flag for sanity checking
of callbacks that happen during start/stop.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pass nvqs to set_guest_notifiers. This makes it possible to
save on irqfds by not allocating one for the control vq
for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Using the AddressSpace type reduces confusion, as you can't accidentally
supply the MemoryRegion you're interested in.
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The path to /dev/vhost-net is currently hardcoded in vhost_dev_init().
This needs to be changed so that /dev/vhost-scsi can be used. Pass in
the device path instead of hardcoding it.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Under Win32, EventNotifiers will not have event_notifier_get_fd, so we
cannot call it in common code such as hw/virtio-pci.c. Pass a pointer to
the notifier, and only retrieve the file descriptor in kvm-specific code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It's clear from the surrounding code that
start < end so it's enough to assert end < log_size.
However, it's better to make this explicit in case
we refactor the code again.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the vhost log is resized, we want to sync up to
the size of the old log. With that end address in place,
ignore regions that start after then end rather than
hitting assert.
This also addresses the following crash report:
When migrating a vm using vhost-net we hit the following assertion:
qemu-kvm: /usr/src/packages/BUILD/qemu-kvm-0.15.1/hw/vhost.c:30:
vhost_dev_sync_region: Assertion `start / (0x1000 * (8 *
sizeof(vhost_log_chunk_t))) < dev->log_size' failed.
The cases which the end < start check is intended to catch, such as
for vga video memory, will also likely trigger the assertion.
Reorder the code to handle this correctly.
Reported-by: Josh Durgin <josh.durgin@dreamhost.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Current memory listeners are incremental; that is, they are expected to
maintain their own state, and receive callbacks for changes to that state.
This patch adds support for stateless listeners; these work by receiving
a ->begin() callback (which tells them that new state is coming), a
sequence of ->region_add() and ->region_nop() callbacks, and then a
->commit() callback which signifies the end of the new state. They should
ignore ->region_del() callbacks.
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows reverse iteration, which in turns allows consistent ordering
among multiple listeners:
l1->add
l2->add
l2->del
l1->del
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Instead of each target knowing or guessing the guest page size,
just pass the desired size of dirtied memory area.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
vhost memory management doesn't care about non-memory (e.g. PIO) or non-RAM
regions. Adjust the filtering to reflect that, and move it earlier so it
applies to mem_sections too.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
A memset() used to delete an entry in an array did not take into account
the array element's size.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
MemoryListener::region_add() gives us a slice of a MemoryRegion, not a
region. Adjust the userspace address to reflect that.
Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Drop the use of cpu_register_phys_memory_client() in favour of the new
MemoryListener API. The new API simplifies the caller, since there is no
need to deal with splitting and merging slots; however this is not exploited
in this patch.
Signed-off-by: Avi Kivity <avi@redhat.com>
When the vhost notifier is disabled, the userspace handler runs
immediately: virtio_pci_set_host_notifier_internal might
call virtio_queue_notify_vq.
Since the VQ state and the tap backend state aren't
recovered yet, this causes
"Guest moved used index from XXX to YYY" assertions.
The solution is to split out host notifier handling
from vhost VQ setup and disable notifiers as our last step
when we stop vhost-net. For symmetry enable them first thing
on start.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The 'to' can go negative when the first region gets removed
(it gets incremented by to 0 immediately afterward), which
makes the assertion fail. Nothing breaks if
to < 0 here so just remove the assert.
Tested-by: David Ahern <daahern@cisco.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost dev stop failed to clear the log field.
Typically not an issue as dev start overwrites this field,
but if logging gets disabled before the following start,
it doesn't so this causes a double free.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cirrus VGA (at least) calls register memory region
with the same values again and again. The
registration in vhost-net slows this a lot,
optimize by checking that the same data is already registered.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost was passing a physical address to cpu_physical_memory_set_dirty,
which is wrong: we need to translate to ram address first.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Note: this lead to crashes during migration, so the patch
is needed on the stable branch too.
In order to use log_start/log_stop with Xen as well in the vga code,
this two operations have been put in CPUPhysMemoryClient.
The two new functions cpu_physical_log_start,cpu_physical_log_stop are
used in hw/vga.c and replace the kvm_log_start/stop. With this, vga does
no longer depends on kvm header.
[ Jan: rebasing and style fixlets ]
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.
We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.
Added a vhostforce flag to force vhost-net back on.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We still need advance address even we find there's no dirty pages in
current chunk.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When using irqfd with vhost-net to inject interrupts,
a single evenfd might inject multiple interrupts.
Implementing this is much easier with a single
per-device callback to set guest notifiers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Extract range functions from pci.h. These will be used by later patches
by non-PCI devices. Adjust current users.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This header is not present on my system and causes a build
failure, but is also not used in these files, so remove it.
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
We need to know ring layout to allocate log buffer.
So init rings first.
Also fixes a theoretical memory-leak-on-error.
https://bugzilla.redhat.com/show_bug.cgi?id=615228
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
So the userspace headers define KERNEL_STRICT_NAMES and there's no
conflict on type definition for older kernels.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
u_int64_t raises compiler error messages:
CC libhw32/virtio.o
/qemu/ar7/hw/virtio.c: In function ‘virtio_queue_get_avail_size’:
/qemu/ar7/hw/virtio.c:776: error: ‘u_int64_t’ undeclared (first use in this function)
/qemu/ar7/hw/virtio.c:776: error: (Each undeclared identifier is reported only once
/qemu/ar7/hw/virtio.c:776: error: for each function it appears in.)
Replacing u_int64_t by uint64_t helps.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds vhost net device support in qemu. Will be tied to tap device
and virtio by following patches. Raw backend is currently missing,
will be worked on/submitted separately.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>