Commit Graph

87 Commits

Author SHA1 Message Date
mst@redhat.com
5430a28fe4 vhost: force vhost off for non-MSI guests
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.

We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.

Added a vhostforce flag to force vhost-net back on.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-02-01 16:50:44 -06:00
Aurelien Jarno
44b15bc5c6 virtio-net: fix cross-endianness support
virtio-net used to work on cross-endianness configurations, but doesn't
anymore with recent guest kernels, as the new features don't handle
endianness correctly.

This patch fixes wrong conversion, and add missing ones to make
virtio-net working. Tested on the following configurations:
- i386 guest on x86_64 host
- ppc guest on x86_64 host
- i386 guest on mips host
- ppc guest on mips host

Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2011-01-29 15:07:56 +01:00
Michael S. Tsirkin
85cf2a8d74 virtio: move vmstate change tracking to core
Move tracking vmstate change from virtio-net to virtio.c
as it is going to be used by virito-blk and virtio-pci
for the ioeventfd support.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2011-01-10 14:44:07 +02:00
Anthony Liguori
b254b0d15d Merge remote branch 'mst/for_anthony' into staging 2010-12-17 08:21:29 -06:00
Gleb Natapov
1ca4d09ae0 Add bootindex parameter to net/block/fd device
If bootindex is specified on command line a string that describes device
in firmware readable way is added into sorted list. Later this list will
be passed into firmware to control boot order.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2010-12-11 21:32:46 +00:00
Michael S. Tsirkin
783e770693 virtio-net: stop/start bh when appropriate
Avoid sending out packets, and modifying
memory, when VM is stopped.
Add assert statements to verify this does not happen.

Avoid scheduling bh when vhost-net is started.

Stop bh when driver disabled bus mastering
(we must not access memory after this).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
2010-12-09 12:47:48 +02:00
Michael S. Tsirkin
9547732304 virtio-net: don't dma while vm is stopped
DMA into memory while VM is stopped makes it
hard to debug migration (consequitive saves
result in different files).
Fixing this completely is a large effort,
this patch does this for virtio-net.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
2010-12-09 12:47:48 +02:00
Stefan Hajnoczi
e7b43f7e60 virtio-net: Convert fprintf() to error_report()
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-11-21 09:16:58 -06:00
Michael S. Tsirkin
afbaa7b438 virtio-net: unify vhost-net start/stop
Move all of vhost-net start/stop logic to a single routine,
and call it from everywhere.

Additionally, start/stop vhost-net on link up/down:
we should not transmit anything if user asked us to
put the link down.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2010-10-07 12:19:47 +02:00
Alex Williamson
a697a334b3 virtio-net: Introduce a new bottom half packet TX
Based on a patch from Mark McLoughlin, this patch introduces a new
bottom half packet transmitter that avoids the latency imposed by
the tx_timer approach.  Rather than scheduling a timer when a TX
packet comes in, schedule a bottom half to be run from the iothread.
The bottom half handler first attempts to flush the queue with
notification disabled (this is where we could race with a guest
without txburst).  If we flush a full burst, reschedule immediately.
If we send short of a full burst, try to re-enable notification.
To avoid a race with TXs that may have occurred, we must then
flush again.  If we find some packets to send, the guest it probably
active, so we can reschedule again.

tx_timer and tx_bh are mutually exclusive, so we can re-use the
tx_waiting flag to indicate one or the other needs to be setup.
This allows us to seamlessly migrate between timer and bh TX
handling.

The bottom half handler becomes the new default and we add a new
tx= option to virtio-net-pci.  Usage:

-device virtio-net-pci,tx=timer # select timer mitigation vs "bh"

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-09-07 20:29:29 +03:00
Alex Williamson
4b4b8d361c virtio-net: Rename tx_timer_active to tx_waiting
De-couple this from the timer since we might want to use
different backends to send the packet.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-09-07 20:29:28 +03:00
Alex Williamson
e3f30488e5 virtio-net: Limit number of packets sent per TX flush
If virtio_net_flush_tx() is called with notification disabled, we can
race with the guest, processing packets at the same rate as they
get produced.  The trouble is that this means we have no guaranteed
exit condition from the function and can spend minutes in there.
Currently flush_tx is only called with notification on, which seems
to limit us to one pass through the queue per call.  An upcoming
patch changes this.

Also add an option to set this value on the command line as different
workloads may wish to use different values.  We can't necessarily
support any random value, so this is a developer option: x-txburst=
Usage:

-device virtio-net-pci,x-txburst=64 # 64 packets per tx flush

One pass through the queue (256) seems to be a good default value
for this, balancing latency with throughput.  We use a signed int
for x-txburst because 2^31 packets in a burst would take many, many
minutes to process and it allows us to easily return a negative
value value from virtio_net_flush_tx() to indicate a back-off
or error condition.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-09-07 20:29:26 +03:00
Alex Williamson
f0c07c7c7b virtio-net: Make tx_timer timeout configurable
Add an option to make the TX mitigation timer adjustable as a device
option.  The 150us hard coded default used currently is reasonable,
but may not be suitable for all workloads, this gives us a way to
adjust it using a single binary.  We can't support any random option
though, so use the "x-" prefix to indicate this is a developer
option.  Usage:

-device virtio-net-pci,x-txtimer=500000,... # .5ms timeout

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-09-07 20:29:24 +03:00
Michael S. Tsirkin
279a42535d virtio-net: correct packet length math
We were requesting too much when checking buffer
length: size already includes host header length.

Further, we should not exit if we get a packet that
is too long, since this might not be under control
of the guest. Just drop the packet.

Red Hat bz 591494

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-07-11 23:31:52 +03:00
Alex Williamson
01657c867d virtio-net: Incorporate a DeviceState pointer and let savevm track instances
Stuff a pointer to the DeviceState into the VirtIONet structure so that
we can easily remove the vmstate entry later.  Also, let vmstate track
the instance number (it should always be zero internally since the
device path should now be unique).

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-06 10:36:28 -05:00
Alex Williamson
0be71e324f savevm: Add DeviceState param
When available, we'd like to be able to access the DeviceState
when registering a savevm.  For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-07-06 10:36:28 -05:00
Michael S. Tsirkin
940cda94dc virtio-net: truncating packet
virtio net attempts to peek into virtio queue to
determine that we have enough space for the complete
packet to fit. However, it fails to account for space
consumed by virtio net header when it does this,
under stress this results in a failure
with the message 'truncating packet'.

redhat bz 591494.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-06-07 17:23:04 +03:00
Michael S. Tsirkin
7f97448122 virtio-net: stop vhost backend on vmstop
vhost net currently keeps running after vmstop,
which causes trouble as qemy does not check
for dirty pages anymore.
The fix is to simply keep vm and vhost running/stopped
status in sync.

Tested-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-06-02 11:54:45 +03:00
Michael S. Tsirkin
57c3229ba1 virtio-net: return with value in void function
virtio-net has return with value in a void function.
No idea why does it compile with gcc,
but this isn't standard C.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-05-12 18:33:39 +03:00
Amit Shah
e4d5639dbb iov: Introduce a new file for helpers around iovs, add iov_from_buf()
The virtio-net code uses iov_fill() which fills an iov from a linear
buffer. The virtio-serial-bus code does something similar in an
open-coded function.

Create a new iov.c file that has iov_from_buf().

Convert virtio-net and virtio-serial-bus over to use this functionality.
virtio-net used ints to hold sizes, the new function is going to use
size_t types.

Later commits will add the opposite functionality -- going from an iov
to a linear buffer.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-04-28 08:58:22 -05:00
David L Stevens
dc14a39781 vhost: fix features ack
vhost driver in qemu didn't ack features, and this happens
to work because we don't really require any features. However,
it's better not to rely on this. This patch passes features to
vhost as guest acks them.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-13 23:58:20 +02:00
Michael S. Tsirkin
9bc6304c15 virtio-net: vhost net support
This connects virtio-net to vhost net backend.
The code is structured in a way analogous to what we have with vnet
header capability in tap.

We start/stop backend on driver start/stop as
well as on save and vm start (for migration).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-04-01 13:56:43 -05:00
Markus Armbruster
1ecda02b24 error: Replace qemu_error() by error_report()
error_report() terminates the message with a newline.  Strip it it
from its arguments.

This fixes a few error messages lacking a newline:
net_handle_fd_param()'s "No file descriptor named %s found", and
tap_open()'s "vnet_hdr=1 requested, but no kernel support for
IFF_VNET_HDR available" (all three versions).

There's one place that passes arguments without newlines
intentionally: load_vmstate().  Fix it up.
2010-03-16 16:58:32 +01:00
Markus Armbruster
2f7920166d error: Move qemu_error & friends into their own header 2010-03-16 16:55:05 +01:00
Tom Lendacky
06b1297017 virtio-net: fix network stall under load
Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest make more buffers available before
qemu can enable notifications.

Signed-off-by: Tom Lendacky <toml@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 12:48:48 -06:00
Amit Shah
22c253d9d6 virtio: net: remove dead assignment
clang-analyzer points out value assigned to 'len' is not used.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Michael S. Tsirkin
c9f79a3f79 virtio-net: mac property is mandatory
Mac feature bit isn't going to work as all network cards already have a
'mac' property to set the mac address.  Remove it from mask and add in
get_features.

Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-12 14:32:19 -06:00
Michael S. Tsirkin
8172539d21 virtio: add features as qdev properties
Add feature bits as properties to virtio. This makes it possible to e.g. define
machine without indirect buffer support, which is required for 0.10
compatibility, or without hardware checksum support, which is required for 0.11
compatibility.  Since default values for optional features are now set by qdev,
get_features callback has been modified: it sets non-optional bits, and clears
bits not supported by host.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-11 13:40:59 -06:00
Michael S. Tsirkin
704a76fcd2 virtio: rename features -> guest_features
Rename features->guest_features. This is
what they are, avoid confusion with
host features which we also need to keep around.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-11 13:40:59 -06:00
Mark McLoughlin
665a3b071d net: remove VLANClientState members now in NetClientInfo
Add a NetClientInfo pointer to VLANClientState and use that
for the typecode and function pointers.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 09:41:34 -06:00
Mark McLoughlin
eb6b6c121a net: convert virtio to NICState
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 09:41:30 -06:00
Dustin Kirkland
184bd04845 whitelist host virtio networking features
This patch is a followup to 8eca6b1bc7,
fixing crashes when guests with 2.6.25 virtio drivers have saturated
virtio network connections.

https://bugs.edge.launchpad.net/ubuntu/+source/qemu-kvm/+bug/458521

That patch should have been whitelisting *_HOST_* rather than the the
*_GUEST_* features.

I tested this by running an Ubuntu 8.04 Hardy guest (2.6.24 kernel +
2.6.25-virtio driver).  I saturated both the incoming, and outgoing
network connection with nc, seeing sustained 6MB/s up and 6MB/s down
bitrates for ~20 minutes.  Previously, this crashed immediately.  Now,
the guest does not crash and maintains network connectivity throughout
the test.

Signed-off-by: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:03 -06:00
Mark McLoughlin
cdd5cc12ba virtio-net: split the has_buffers() logic from can_receive()
We should only return zero from receive() for a condition which we'll
get notification of when it changes. Currently, we're returning zero
if the guest driver is not ready, but we won't ever flush our queue
when that status changes.

Also, don't check buffer space in can_receive(), but instead just allow
receive() to return zero when this condition occurs and have the caller
handle queueing the packet.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:02 -06:00
Mark McLoughlin
3cbe04c442 virtio-net: fix macaddr config regression
This commit:

    commit 97b15621
    virtio: use qdev properties for configuration.

    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

makes a guest using virtio-net see an empty macaddr because we never
copy the macaddr into the location that virtio_net_get_config() uses.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-30 09:42:35 -05:00
Mark McLoughlin
a8ed73f73d net: move more stuff into net/tap-win32.c, add net/tap.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-30 08:39:27 -05:00
Mark McLoughlin
7200ac3c7c net: move net-checksum.c under net/
Also add a new net/checksum.h header

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-30 08:39:26 -05:00
Mark McLoughlin
0ce0e8f498 virtio-net: add tap_has_ufo flag to saved state
If we tell the guest we support UFO and then migrate to host which
doesn't support it, we will find ourselves in grave difficulties.

Prevent this scenario by adding a flag to virtio-net's savevm format
which indicates whether the device requires host UFO support.

[v2:
  - add has_ufo uint8_t field for ease of vmstate conversion
  - use qemu_error()
]

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:29:05 -05:00
Sridhar Samudrala
6c9f58ba3b Enable UFO on virtio-net and tap devices
Enable UFO on the host tap device if supported and allow setting UFO
on virtio-net in the guest.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:29:04 -05:00
Anthony Liguori
1d41b0c1ec Work around dhclient brokenness
With the latest GSO/csum offload patches, any guest using an unpatched version
of dhclient (any Ubuntu guest, for instance), will no longer be able to get
a DHCP address.

dhclient is actually at fault here.  It uses AF_PACKET to receive DHCP responses
but does not check auxdata to see if the packet has a valid csum.  This causes
it to throw out the DHCP responses it gets from the virtio interface as there
is not a valid checksum.

Fedora has carried a patch to fix their dhclient (it's needed for Xen too) but
this patch has not made it into a release of dhclient.  AFAIK, the patch is in
the dhclient CVS but I cannot confirm since their CVS is not public.

This patch, suggested by Rusty, looks for UDP packets (of a normal MTU) and
explicitly adds a checksum to them if they are missing one.

This allows unpatched dhclients to continue to work without needing to update
the guest kernels.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:29:04 -05:00
Mark McLoughlin
f5436dd96a virtio-net: enable tap offload if guest supports it
We query the guest's feature set to see if it supports offload and,
if so, we enable those features on the tap interface.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:29:03 -05:00
Mark McLoughlin
3a330134b3 virtio-net: add vnet_hdr support
With '-netdev tap,id=foo -nic model=virtio,netdev=foo' virtio-net can
detect that its peer (i.e. the tap backend) supports vnet headers
and advertise to the guest that it can send packets with partial
checksums and/or TSO packets.

One complication is that if we're migrating and the source host
supports IFF_VNET_HDR but the destination host doesn't, we can't then
stop the guest from using those features. In this scenario, we just
fail the migration.

[v2:
 - add has_vnet_hdr uint32_t field for ease of vmstate conversion
 - use qemu_error()
]

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:29:01 -05:00
Gerd Hoffmann
97b156213e virtio: use qdev properties for configuration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:40 -05:00
Glauber Costa
a61d1f6701 notify io_thread at the end of rx handling
This is a backport from qemu-kvm. Just instead of using kvm's specific
notification mechanism, we use qemu_notify_event()

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-22 10:58:49 -05:00
Michael S. Tsirkin
ffe6370c9f qemu/net: flag to control the number of vectors a nic has
Add an option to specify the number of MSI-X vectors for PCI NIC cards. This
can also be used to disable MSI-X, for compatibility with old qemu. This
option currently only affects virtio cards.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-24 09:09:15 -05:00
Michael S. Tsirkin
566e2d3e88 qemu/net: request 3 vectors in virtio-net
Request up to 3 vectors in virtio-net. Actual bindings might supply
less.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-24 09:09:15 -05:00
Mark McLoughlin
6243375f9b virtio-net: implement async packet sending
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-22 10:15:30 -05:00
Mark McLoughlin
e16044ef2e virtio-net: enable mergeable receive buffers
When virtio-net was merged in from qemu-kvm.git, the VNET_HDR related
features were dropped from the code.

However, VIRTIO_NET_F_MRG_RXBUF appears to have accidentally been
dropped too. Re-instate that now.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-22 10:15:28 -05:00
Alex Williamson
4ffb17f5c3 virtio-net: Increase filter and control limits
Increase the size of the perfect filter table and control queue depth.
This should give us more headroom in the MAC filter and is known to be
needed by at least one guest user.  Increasing the control queue depth
allows a guest to feed several commands back to back if they so desire
rather than using the send and wait approach Linux uses.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2009-06-09 11:38:50 +01:00
Alex Williamson
015cb16699 virtio-net: Add new RX filter controls
Add a few new RX modes to better control the receive_filter.  These
are all fairly obvious features that hardware could provide.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2009-06-09 11:38:50 +01:00
Alex Williamson
2d9aba3961 virtio-net: MAC filter optimization
The MAC filter table is received from the guest as two separate
buffers, one with unicast entries, the other with multicast
entries.  If we track the index dividing the two sets, we can
avoid searching the part of the table with the wrong type of
entries.

We could store this index as part of the save image, but its
trivially easy to discover it on load.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2009-06-09 11:38:50 +01:00