The most common use of -net tap is to connect a tap device to a bridge. This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.
This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root. The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu. The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.
By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users. We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.
Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.
A typical invocation would be similar to one of the following:
qemu linux.img -net bridge -net nic,model=virtio
qemu linux.img -net tap,helper="/usr/local/libexec/qemu-bridge-helper"
-net nic,model=virtio
qemu linux.img -netdev bridge,id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
qemu linux.img -netdev tap,helper="/usr/local/libexec/qemu-bridge-helper",id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
The default bridge that we attach to is br0. The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.
Alternatively, if a user wants to use a different bridge, a typical invocation
would be simliar to one of the following:
qemu linux.img -net bridge,br=qemubr0 -net nic,model=virtio
qemu linux.img -net tap,helper="/usr/local/libexec/qemu-bridge-helper --br=qemubr0"
-net nic,model=virtio
qemu linux.img -netdev bridge,br=qemubr0,id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
qemu linux.img -netdev tap,helper="/usr/local/libexec/qemu-bridge-helper --br=qemubr0",id=hn0
-device virtio-net-pci,netdev=hn0,id=nic1
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch fixes a bug where child processes of launch_script() can
misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
causes a permanent hang.
Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
processes by calling waitpid(-1, ...). This required other
fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
having the final wait status being intercepted by the SIGCHLD
handler:
7c3370d4fe
Since then, the qemu_add_child_watch() interface was added to allow
registration of such processes and reap only from that specific set
of PIDs:
4d54ec7898
As a result, we can now avoid blocking SIGCHLD in launch_script(), so
drop that behavior.
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch removes all references to signal.h when qemu-common.h is included
as they become redundant.
Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
When MSI is off, each interrupt needs to be bounced through the io
thread when it's set/cleared, so vhost-net causes more context switches and
higher CPU utilization than userspace virtio which handles networking in
the same thread.
We'll need to fix this by adding level irq support in kvm irqfd,
for now disable vhost-net in these configurations.
Added a vhostforce flag to force vhost-net back on.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
virtio-net expects set_offload to succeed after
peer cleanup.
Since we don't have an open fd anymore, make it so.
Fixes warning about the failure of offload setting.
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Frontends calling tap_get_vhost_net get an invalid pointer after the
peer backend has been deleted. Jason Wang <jasowang@redhat.com> reports
this leading to a crash in ack_features when we remove the vhost-net
bakend of a virtio nic.
The fix is simply to clear the backend pointer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make host vnet header length a structure field in
preparation for using this support in linux kernel.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
will be used by virtio-net for vhost net support
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds vhost binary option to tap, to enable vhost net accelerator.
Default is off for now, we'll be able to make default on long term
when we know it's stable.
vhostfd option can be used by management, to pass in the fd. Assigning
vhostfd implies vhost=on.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Will be used by vhost to attach/detach to backend.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Right now, downscript is not invoked reliably. If you execute 'quit' from the
monitor, it won't be invoked.
This fixes that by converting tap to use an exit_notifier to execute the
downscript. In this case, allowing an exit notifier to include state is
critically important for the conversion.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
error_report() terminates the message with a newline. Strip it it
from its arguments.
This fixes a few error messages lacking a newline:
net_handle_fd_param()'s "No file descriptor named %s found", and
tap_open()'s "vnet_hdr=1 requested, but no kernel support for
IFF_VNET_HDR available" (all three versions).
There's one place that passes arguments without newlines
intentionally: load_vmstate(). Fix it up.
net_check_clients() prints this when an VLAN has host devices, but no
guest devices. It uses VLANState members nb_guest_devs and
nb_host_devs to keep track of these devices. However, -device does
not update nb_guest_devs, only net_init_nic() does that, for -net nic.
Check the VLAN clients directly, and remove the counters.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When vhost is bound to a backend device, we need to stop polling it when
vhost is started, and restart polling when vhost is stopped.
Add an API for that for use by vhost, and implement in tap backend.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
net_tap_init() always sets vnet_hdr using qemu_opt_get_bool(), but
initialize it in net_init_tap() just to reduce confusion.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Add a NetClientInfo pointer to VLANClientState and use that
for the typecode and function pointers.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Okay, let's try re-enabling the drain-entire-queue behaviour, with a
difference - before each subsequent packet, use qemu_can_send_packet()
to check that we can send it. This is similar to how we check before
polling the tap fd and avoids having to drop a packet if the receiver
cannot handle it.
This patch should be a performance improvement since we no longer have
to go through the mainloop for each packet.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
If qemu_send_packet_async() returns zero, it means the packet has been
queued and the sent callback will be invoked once it has been flushed.
This is only possible where the NIC's receive() handler returns zero
and promises to notify the networking core that room is available in its
queue again.
In the case where the receive handler does not have this capability
(and its queue fills up) it returns -1 and the networking core does not
queue up the packet. This condition is indicated by a -1 return from
qemu_send_packet_async().
Currently, tap handles this condition simply by dropping the packet. It
should do its best to avoid getting into this situation by checking such
NIC's have room for a packet before copying the packet from the tap
interface.
tap_send() used to achieve this by only reading a single packet before
returning to the mainloop. That way, tap_can_send() is called before
reading each packet.
tap_send() was changed to completely drain the tap interface queue
without taking into account the situation where the NIC returns an
error and the packet is not queued. Let's start fixing this by
reverting to the previous behaviour of reading one packet at a time.
Reported-by: Scott Tsai <scottt.tw@gmail.com>
Tested-by: Sven Rudolph <Sven_Rudolph@drewag.de>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently compiling the tap sources breaks on Mac OS X. This is because of:
1) tap-linux.h requiring Linux includes
2) typos
3) missing #includes
This patch adds what's necessary to compile tap happily on Mac OS X.
I haven't tested if using tap actually works, but I don't think that's a
major issue as that code was probably seriously untested before already.
I didn't split the patch, because it's only a few lines of code and
splitting is probably not worth the effort here.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Okay, this makes the tap options available on AIX even though there's
no support, but if we want to do it right we should have not compile
the tap code at all on AIX using e.g. CONFIG_TAP.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>