error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.
Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXY0Q3AAoJECgfDbjSjVRpkVcH/2gTHRE9yUoWe6ROvPV67BKx
8Iy9GzJ3BMO3RolVZEA5KXIevn5TG+pV274BZEuXMD3AL/molv279p0o/gvBYoqq
V0jNH2MO+MV6D9OzhUXcgWSejvybF5W07ojPDU/hlgtFXPZFbJDyt95MWaLiilOg
cCtTuRqgrrRaypcnnk/CIDbC+Ek2kAYdgQHQbfj9ihle3TWO8R0bSXnFqSaqCIkM
4slMlv8y82fODeiO83nkpfAP1NCnfnRC8r8Gv7hbEUTlZQntavx5DuYdiIx6nsJE
W0g+Gpe1o0+jRuMnucGIUZvqzZ0e/I0wZuV16Nsfx+Rbd5+4CzTxZda5Qb05v7I=
=BHbJ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc, pci, virtio: new features, cleanups, fixes
Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 17 Jun 2016 01:28:39 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
MAINTAINERS: add Marcel to PCI
msi_init: change return value to 0 on success
fix some coding style problems
pci core: assert ENOSPC when add capability
test: start vhost-user reconnect test
tests: append i386 tests
vhost-net: save & restore vring enable state
vhost-net: save & restore vhost-user acked features
vhost-net: do not crash if backend is not present
vhost-user: disconnect on start failure
qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
tests/vhost-user-bridge: workaround stale vring base
tests/vhost-user-bridge: add client mode
vhost-user: add ability to know vhost-user backend disconnection
pci: fix pci_requester_id()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
tests/Makefile.include
It has:
1. More newlines make the code block well separated.
2. Add more comments for msi_init.
3. Fix a indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: put PCI Express capability init function
together, make it more readable.
cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)
Restore the vring state when the vhost-user backend is started, so it
can process the ring.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.
To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h. Include it in
sysemu/os-posix.h and remove it from everywhere else.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).
This patch is the result of coccinelle script
scripts/coccinelle/round.cocci
CC: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Since mit_delay can never be 0 this if statement is
superfluous.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch fixes used-uninitialized false
positive while compiling with ust tracing
backend plus gcc 4.6.3:
hw/net/e1000e.c: In function ‘e1000e_io_write’:
hw/net/e1000e.c:170:39: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
hw/net/e1000e.c: In function ‘e1000e_io_read’:
hw/net/e1000e.c:145:35: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
cc1: all warnings being treated as errors
make: *** [hw/net/e1000e.o] Error 1
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-id: 1465023763-10773-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ust trace backend has limitation of maximum 10
arguments per event. Traces with more arguments
cannot be compiled for this backend.
Trace e1000e_rx_rss_ip6 introduced by previous
commits has 11 arguments and fails to compile with
ust trace backend.
This patch fixes the problem by splitting this
tracepoint into two successive tracepoints with
smaller number of arguments.
For more information see comment regarding TP_ARGS
in lttng/tracepoint.h:
/*
* TP_ARGS takes tuples of type, argument separated by a comma.
* It can take up to 10 tuples (which means that less than 10 tuples is
* fine too).
* Each tuple is also separated by a comma.
*/
Build log generated by this problem:
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/oslib-posix.c:36:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/oslib-posix.c:36:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/oslib-posix.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./trace/generated-tracers.h:9:0,
from /home/travis/build/qemu/qemu/include/trace.h:4,
from util/hbitmap.c:16:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
from util/hbitmap.c:16:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/hbitmap.o] Error 1
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Message-id: 1464894748-27803-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Version: GnuPG v1
iQEcBAABAgAGBQJXT9DWAAoJEO8Ells5jWIRgFAH/1ZDXm8V523AMDOEvBAWgqur
Dj8ZaIwFkqJp7xtLdhS0yKF3xW+vtgx9k+Qftk0S8qEiFKPbThR8iB5VNuesErwd
AZhWo4bnVhKwtWyMw3BDRDK1N4huAWPMZEva1xovR/Cc9v5IG5mx57/K3Zz5C8ec
Jsn4DsLKN0q7W0D0dlnbEOkSjl6iKJchvfPCR6UfvrU7BxfXaCZ9Z7Sfh8ec6tfr
iMgcV9u3A3Zs72gTM9/jdKx8vOrWtdKJufJ8s2Bctc7CyfBNWwnV8PjndhEe3Xvs
vlYeJopdpDPsdMkMtYD6cevtEgvD5yhOBndJ7et807jjuCvUf837tMhodKkFk9M=
=SjIZ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request: (31 commits)
Add ENET device to i.MX6 SOC.
Add ENET/Gbps Ethernet support to FEC device
i.MX: move FEC device to a register array structure.
i.MX: Rename i.MX FEC defines to ENET_XXX
i.MX: reset TX/RX descriptors when FEC is disabled.
i.MX: Fix FEC code for ECR register reset value.
i.MX: Fix FEC code for MDIO address selection
i.MX: Fix FEC code for MDIO operation selection
net: handle optional VLAN header in checksum computation.
net: improve UDP/TCP checksum computation.
e1000e: Introduce qtest for e1000e device
net: Introduce e1000e device emulation
e1000: Move out code that will be reused in e1000e
e1000_regs: Add definitions for Intel 82574-specific bits
vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
net_pkt: Extend packet abstraction as required by e1000e functionality
rtl8139: Move more TCP definitions to common header
net_pkt: Name vmxnet3 packet abstractions more generic
vmxnet3: Use common MAC address tracing macros
net: Add macros for MAC address tracing
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The ENET device (present in i.MX6) is "derived" from FEC and backward
compatible with it.
This patch adds the necessary support of the added feature in the ENET
device to allow Linux to use it (on supported processors).
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This is to prepare for the ENET Gb device of the i.MX6.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
RX adn TX descriptors are reseted when the FEC device is disabled through ECR.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual ECR register is
initialized at 0xf0000000 at reset time.
We fix the value.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
When writing to MMFR register, the MDIO device and adress are selected by
bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits
that need to be used by the Phy chip/address decoding function.
This patch fixes the number of bits used from 9 to 10.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to the FEC chapter of i.MX25 reference manual
When writing the MMFR register, bit 29 and 28 select the requested operation.
* 10 means read operation with valid MII mgmt frame
* 11 means read operation with non compliant MII mgmt frame
* 01 means write operation with valid MII mgmt frame
* 00 means write operation with non compliant MII mgmt frame
So while bit 28 does change beween read/write for valid MII mgmt frame, the
mening is inverted for non compliant MII mgmt frame.
Bit 29 on the other hand means read/write whatever the type of mgmt frame
involved.
So this patch change the operation selection from bit 28 to bit 29 as it is
more generic.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.
This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.
Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.
The device code is split into two parts:
1. hw/net/e1000e.c: QEMU-specific code for a network device;
2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.
The new device name is e1000e.
Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf
Throughput measurement results (iperf2):
Fedora 22 guest, TCP, RX
4 ++------------------------------------------+
| |
| X X X X X
3.5 ++ X X X X |
| X |
| |
3 ++ |
G | X |
b | |
/ 2.5 ++ |
s | |
| |
2 ++ |
| |
| |
1.5 X+ |
| |
+ + + + + + + + + + + +
1 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, TCP, TX
18 ++-------------------------------------------+
| X |
16 ++ X X X X X
| X |
14 ++ |
| |
12 ++ |
G | X |
b 10 ++ |
/ | |
s 8 ++ |
| |
6 ++ X |
| |
4 ++ |
| X |
2 ++ X |
X + + + + + + + + + + +
0 ++--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Fedora 22 guest, UDP, RX
3 ++------------------------------------------+
| X
| |
2.5 ++ |
| |
| |
2 ++ X |
G | |
b | |
/ 1.5 ++ |
s | X |
| |
1 ++ |
| |
| X |
0.5 ++ |
| X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Fedora 22 guest, UDP, TX
1 ++------------------------------------------+
| X
0.9 ++ |
| |
0.8 ++ |
0.7 ++ |
| |
G 0.6 ++ |
b | |
/ 0.5 ++ |
s | X |
0.4 ++ |
| |
0.3 ++ |
0.2 ++ X |
| |
0.1 ++ X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, TCP, RX
3.2 ++------------------------------------------+
| X |
3 ++ |
| |
2.8 ++ |
| |
2.6 ++ X |
G | X X X X X
b 2.4 ++ X X |
/ | |
s 2.2 ++ |
| |
2 ++ |
| X X |
1.8 ++ |
| |
1.6 X+ |
+ + + + + + + + + + + +
1.4 ++--+---+---+---+---+---+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, TCP, TX
14 ++-------------------------------------------+
| |
| X X
12 ++ |
| |
10 ++ |
| |
G | |
b 8 ++ |
/ | X |
s 6 ++ |
| |
| |
4 ++ X |
| |
2 ++ |
| X X X |
+ X X + + X X + + + + +
0 X+--+---+---+---+---+----+---+---+---+---+---+
32 64 128 256 512 1 2 4 8 16 32 64
B B B B B KB KB KB KB KB KB KB
Buffer size
Windows 2012R2 guest, UDP, RX
1.6 ++------------------------------------------X
| |
1.4 ++ |
| |
1.2 ++ |
| X |
| |
G 1 ++ |
b | |
/ 0.8 ++ |
s | |
0.6 ++ X |
| |
0.4 ++ |
| X |
| |
0.2 ++ X |
X + + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Windows 2012R2 guest, UDP, TX
0.6 ++------------------------------------------+
| X
| |
0.5 ++ |
| |
| |
0.4 ++ |
G | |
b | |
/ 0.3 ++ X |
s | |
| |
0.2 ++ |
| |
| X |
0.1 ++ |
| X |
X X + + + +
0 ++-------+--------+-------+--------+--------+
32 64 128 256 512 1
B B B B B KB
Datagram size
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Code that will be shared moved to a separate files.
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
To make this device and network packets
abstractions ready for IOMMU.
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.
Changes are:
1. Support iovec lists for RX buffers
2. Deeper RX packets parsing
3. Loopback option for TX packets
4. Extended VLAN headers handling
5. RSS processing for RX packets
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch drops "vmx" prefix from packet abstractions names
to emphasize the fact they are generic and not tied to any
specific network device.
These abstractions will be reused by e1000e emulation implementation
introduced by following patches so their names need generalization.
This patch (except renamed files, adjusted comments and changes in MAINTAINTERS)
was produced by:
git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.
When flushing the receive queue, it's much better if we'd have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.
This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
229376 16384 60.00 1738970 0 3798.83
229376 60.00 23 0.05
That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the value look much better:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
229376 16384 60.00 1789104 0 3908.35
229376 60.00 22818 49.85
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When receiving packets over MIPSnet network device, it uses
receive buffer of size 1514 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.
Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.
Reduce size of a buffer allocated on stack to 0x600 bytes, which is the
maximal frame length when HUGEN bit is not set in MODER, only allocate
buffer on heap when that is too small. Thus heap is not used in typical
use case.
Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Drop local definitions of MII registers and use constants from mii.h for
registers and register bits. No functional changes.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reserve this to CPU state serialization.
Luckily, they were only used by sPAPR devices and these are ppc64
only. So there is no change to migration format.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When receiving packets over Stellaris ethernet controller, it
uses receive buffer of size 2048 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.
Reported-by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1460095428-22698-1-git-send-email-ppandit@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Through CP_TX_OWN and CP_RX_OWN points to the same bit, we'd better use
CP_TX_OWN for tx descriptor handling.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Indicate that autonegotiation is complete in the MII BMSR. This fixes
networking on xtfpga platform in linux v4.5.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This reverts commit 9596ef7c7b.
This workaround in order to fix endless interrupts is no
longer needed because it was superseded by the previous patch
(e1000: Fixing interrupt pace).
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch introduces an upper bound for number of interrupts
per second. Without this bound an interrupt storm can occur as
it has been observed on Windows 10 when disabling the device.
According to the SPEC - Intel PCI/PCI-X Family of Gigabit
Ethernet Controllers Software Developer's Manual, section
13.4.18 - the Ethernet controller guarantees a maximum
observable interrupt rate of 7813 interrupts/sec. If there is
no upper bound this could lead to an interrupt storm by e1000
(when mit_delay < 500) causing interrupts to fire at a very high
pace.
Thus if mit_delay < 500 then the delay should be set to the
minimum delay possible which is 500. This can be calculated
easily as follows:
Interval = 10^9 / (7813 * 256) = 500.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJW9ErPAAoJEL/70l94x66DJfEH/A/QkMpAhrgNdyVsahzsGrzE
wx5gHFIc1nBYxyr62w4apUb5jPB7zaXu0LA7EAWDeAe0pyP8hZzLT9kJyOEDsuJu
zwKN2QeLSNMtPbnbKN0I/YQ2za2xX1V5ruhSeOJoVslUI214hgnAURaGshhQNzuZ
2CluDT9KgL5cQifAnKs5kJrwhIYShYNQB+1eDC/7wk28dd/EH+sPALIoF+rqrSmt
Zu4Mdqd+9Ns+oKOjA6br9ULq/Hzg0aDfY82J+XLVVqfF3PXQe8rTDmuMf/7jTn+M
Un7ZOcei9oZF2/9vfAfKQpDCcgD9HvOUSbgqV/ubmkPPmN/LNJzeKj0fBhrRN+Y=
=K12D
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support
# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (28 commits)
target-i386: implement PKE for TCG
config.status: Pass extra parameters
char: translate from QIOChannel error to errno
exec: fix error handling in file_ram_alloc
cputlb: modernise the debug support
qemu-log: support simple pid substitution for logs
target-arm: dfilter support for in_asm
qemu-log: dfilter-ise exec, out_asm, op and opt_op
qemu-log: new option -dfilter to limit output
qemu-log: Improve the "exec" TB execution logging
qemu-log: Avoid function call for disabled qemu_log_mask logging
qemu-log: correct help text for -d cpu
tcg: pass down TranslationBlock to tcg_code_gen
util: move declarations out of qemu-common.h
Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
hw: explicitly include qemu-common.h and cpu.h
include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
Move ParallelIOArg from qemu-common.h to sysemu/char.h
Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
scripts/clean-includes
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.
Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).
After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking care of different buffer sizes. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.
Though it seems at a first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.
To avoid problems with migration from/to older version of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.
Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from creeping 3MB/s up to 20MB/s!
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Refactor the code a little bit by extracting the code that reads
and writes the receive buffer list page into separate functions.
There should be no functional change in this patch, this is just
a preparation for the upcoming extensions that introduce receive
buffer pools.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed. This replacement improves the readability and
understandability of code.
For example,
timer_mod(fdctrl->result_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));
NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.
Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Manually drop redundant includes that scripts/clean-includes misses,
e.g. because they're hidden in generator programs, or they use the
wrong kind of delimiter.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add property to specify rocker world. All ports will be assigned to this
world.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Until now, 0 is returned in this error case. Fix it ro return -ENOMEM.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Port to world assignment should be permitted only by qemu user. Driver
should not be able to do it, so forbid that possibility.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. Registers PSTART & PSTOP
define ring buffer size & location. Setting these registers
to invalid values could lead to infinite loop or OOB r/w
access issues. Add check to avoid it.
Reported-by: Yang Hongke <yanghongke@huawei.com>
Tested-by: Yang Hongke <yanghongke@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Since guest_mask_notifier can not be used in vhost-user mode due
to buffering implied by unix control socket, force
use_mask_notifier on virtio devices of vhost-user interfaces, and
send correct callfd to the guest at vhost start.
Using guest_notifier_mask function in vhost-user case may
break interrupt mask paradigm, because mask/unmask is not
really done when returning from guest_notifier_mask call, instead
message is posted in a unix socket, and processed later.
Add an option boolean flag 'use_mask_notifier' to disable the use
of guest_notifier_mask in virtio pci.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cross-endian is now handled by the core virtio-net code.
This patch reverts:
commit 5be7d9f1b1
vhost-net: tell tap backend about the vnet endianness
and
commit cf0a628f6e81bfc9b7a944fa0b80c3594836df56
net: set endianness on all backend devices
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
When running a fully emulated device in cross-endian conditions, including
a virtio 1.0 device offered to a big endian guest, we need to fix the vnet
headers. This is currently handled by the virtio_net_hdr_swap() function
in the core virtio-net code but it should actually be handled by the net
backend.
With this patch, virtio-net now tries to configure the backend to do the
endian fixing when the device starts (i.e. drivers sets the CONFIG_OK bit).
If the backend cannot support the requested endiannes, we have to fallback
onto virtio_net_hdr_swap(): this is recorded in the needs_vnet_hdr_swap flag,
to be used in the TX and RX paths.
Note that we reset the backend to the default behaviour (guest native
endianness) when the device stops (i.e. device status had CONFIG_OK bit and
driver unsets it). This is needed, with the linux tap backend at least,
otherwise the guest may lose network connectivity if rebooted into a
different endianness.
The current vhost-net code also tries to configure net backends. This will
be no more needed and will be reverted in a subsequent patch.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Similar to the previous patch, it's nice to have all functions
in the tree that involve a visitor and a name for conversion to
or from QAPI to consistently stick the 'name' parameter next
to the Visitor parameter.
Done by manually changing include/qom/object.h and qom/object.c,
then running this Coccinelle script and touching up the fallout
(Coccinelle insisted on adding some trailing whitespace).
@ rule1 @
identifier fn;
typedef Object, Visitor, Error;
identifier obj, v, opaque, name, errp;
@@
void fn
- (Object *obj, Visitor *v, void *opaque, const char *name,
+ (Object *obj, Visitor *v, const char *name, void *opaque,
Error **errp) { ... }
@@
identifier rule1.fn;
expression obj, v, opaque, name, errp;
@@
fn(obj, v,
- opaque, name,
+ name, opaque,
errp)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp). This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order. It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.
Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.
Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.
Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
$ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings'). The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.
// Part 1: Swap declaration order
@@
type TV, TErr, TObj, T1, T2;
identifier OBJ, ARG1, ARG2;
@@
void visit_start_struct
-(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
{ ... }
@@
type bool, TV, T1;
identifier ARG1;
@@
bool visit_optional
-(TV v, T1 ARG1, const char *name)
+(TV v, const char *name, T1 ARG1)
{ ... }
@@
type TV, TErr, TObj, T1;
identifier OBJ, ARG1;
@@
void visit_get_next_type
-(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
{ ... }
@@
type TV, TErr, TObj, T1, T2;
identifier OBJ, ARG1, ARG2;
@@
void visit_type_enum
-(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
{ ... }
@@
type TV, TErr, TObj;
identifier OBJ;
identifier VISIT_TYPE =~ "^visit_type_";
@@
void VISIT_TYPE
-(TV v, TObj OBJ, const char *name, TErr errp)
+(TV v, const char *name, TObj OBJ, TErr errp)
{ ... }
// Part 2: swap caller order
@@
expression V, NAME, OBJ, ARG1, ARG2, ERR;
identifier VISIT_TYPE =~ "^visit_type_";
@@
(
-visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
+visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
|
-visit_optional(V, ARG1, NAME)
+visit_optional(V, NAME, ARG1)
|
-visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
+visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
|
-visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
+visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
|
-VISIT_TYPE(V, OBJ, NAME, ERR)
+VISIT_TYPE(V, NAME, OBJ, ERR)
)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Included here:
Refactoring and bugfix patches in PC/ACPI.
New commands for ipmi.
Virtio optimizations.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWtj8KAAoJECgfDbjSjVRpBIQIAJSB9xwTcBLXwD0+8z5lqjKC
GTtuVbHU0+Y/eO8O3llN5l+SzaRtPHo18Ele20Oz7IQc0ompANY273K6TOlyILwB
rOhrub71uqpOKbGlxXJflroEAXb78xVK02lohSUvOzCDpwV+6CS4ZaSer7yDCYkA
MODZj7rrEuN0RmBWqxbs1R7Mj2CeQJzlgTUNTBGCLEstoZGFOJq8FjVdG5P1q8vI
fnI9mGJ1JsDnmcUZe/bTFfB4VreqeQ7UuGyNAMMGnvIbr0D1a+CoaMdV7/HZ+KyT
5TIs0siVdhZei60A/Cq2OtSVCbj5QdxPBLhZfwJCp6oU4lh2U5tSvva0mh7MwJ0=
=D/cA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc and misc cleanups and fixes, virtio optimizations
Included here:
Refactoring and bugfix patches in PC/ACPI.
New commands for ipmi.
Virtio optimizations.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sat 06 Feb 2016 18:44:26 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream: (45 commits)
net: set endianness on all backend devices
fix MSI injection on Xen
intel_iommu: large page support
dimm: Correct type of MemoryHotplugState->base
pc: set the OEM fields in the RSDT and the FADT from the SLIC
acpi: add function to extract oem_id and oem_table_id from the user's SLIC
acpi: expose oem_id and oem_table_id in build_rsdt()
acpi: take oem_id in build_header(), optionally
pc: Eliminate PcGuestInfo struct
pc: Move APIC and NUMA data from PcGuestInfo to PCMachineState
pc: Move PcGuestInfo.fw_cfg to PCMachineState
pc: Remove PcGuestInfo.isapc_ram_fw field
pc: Remove RAM size fields from PcGuestInfo
pc: Remove compat fields from PcGuestInfo
acpi: Don't save PcGuestInfo on AcpiBuildState
acpi: Remove guest_info parameters from functions
pc: Simplify xen_load_linux() signature
pc: Simplify pc_memory_init() signature
pc: Eliminate struct PcGuestInfoState
pc: Move PcGuestInfo declaration to top of file
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit 5be7d9f1b1
vhost-net: tell tap backend about the vnet endianness
makes vhost net to set the endianness of the device, but only for
the first device.
In case of multiqueue, we have multiple devices... This patch sets the
endianness for all the devices of the interface.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0. We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.
The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items. Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.
The patch is pretty large, but changes to each device are testable
more or less independently. Splitting it would mostly add churn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The start_xmit() and e1000_receive_iov() functions implement DMA transfers
iterating over a set of descriptors that the guest's e1000 driver
prepares:
- the TDLEN and RDLEN registers store the total size of the descriptor
area,
- while the TDH and RDH registers store the offset (in whole tx / rx
descriptors) into the area where the transfer is supposed to start.
Each time a descriptor is processed, the TDH and RDH register is bumped
(as appropriate for the transfer direction).
QEMU already contains logic to deal with bogus transfers submitted by the
guest:
- Normally, the transmit case wants to increase TDH from its initial value
to TDT. (TDT is allowed to be numerically smaller than the initial TDH
value; wrapping at or above TDLEN bytes to zero is normal.) The failsafe
that QEMU currently has here is a check against reaching the original
TDH value again -- a complete wraparound, which should never happen.
- In the receive case RDH is increased from its initial value until
"total_size" bytes have been received; preferably in a single step, or
in "s->rxbuf_size" byte steps, if the latter is smaller. However, null
RX descriptors are skipped without receiving data, while RDH is
incremented just the same. QEMU tries to prevent an infinite loop
(processing only null RX descriptors) by detecting whether RDH assumes
its original value during the loop. (Again, wrapping from RDLEN to 0 is
normal.)
What both directions miss is that the guest could program TDLEN and RDLEN
so low, and the initial TDH and RDH so high, that these registers will
immediately be truncated to zero, and then never reassume their initial
values in the loop -- a full wraparound will never occur.
The condition that expresses this is:
xdh_start >= s->mac_reg[XDLEN] / sizeof(desc)
i.e., TDH or RDH start out after the last whole rx or tx descriptor that
fits into the TDLEN or RDLEN sized area.
This condition could be checked before we enter the loops, but
pci_dma_read() / pci_dma_write() knows how to fill in buffers safely for
bogus DMA addresses, so we just extend the existing failsafes with the
above condition.
This is CVE-2016-1981.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Prasad Pandit <ppandit@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1296044
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
gem_transmit copies a packet from guest into an tx_packet[2048]
array on stack, with size limited by descriptor length set by guest. If
guest is malicious and specifies a descriptor length that is too large,
and should packet size exceed array size, this results in a buffer
overflow.
Reported-by: 刘令 <liuling-it@360.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
While receiving packets in 'gem_receive' routine, if Frame Check
Sequence(FCS) is enabled, it copies the packet into a local
buffer without checking its size. Add check to validate packet
length against the buffer size to avoid buffer overflow.
Reported-by: Ling Liu <liuling-it@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-19-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-15-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-14-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-13-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-5-git-send-email-peter.maydell@linaro.org
In Xen 4.7 we are refactoring parts libxenctrl into a number of
separate libraries which will provide backward and forward API and ABI
compatiblity.
One such library will be libxengnttab which provides access to grant
tables.
In preparation for this switch the compatibility layer in xen_common.h
(which support building with older versions of Xen) to use what will
be the new library API. This means that the gnttab shim will disappear
for versions of Xen which include libxengnttab.
To simplify things for the <= 4.0.0 support we wrap the int fd in a
malloc(sizeof int) such that the handle is always a pointer. This
leads to less typedef headaches and the need for
XC_HANDLER_INITIAL_VALUE etc for these interfaces.
Note that this patch does not add any support for actually using
libxengnttab, it just adjusts the existing shims.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Replace the uint8 softfloat-specific typedef with uint8_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint8\b/uint8_t/g'
together with manual removal of the typedef definition and
manual fixing of more erroneous uses found via test compilation.
It turns out that the only code using this type is an accidental
use where uint8_t was intended anyway...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-7-git-send-email-peter.maydell@linaro.org
Replace the uint32 softfloat-specific typedef with uint32_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint32\b/uint32_t/g'
together with manual removal of the typedef definition,
manual undoing of various mis-hits, and another couple of
fixes found via test compilation.
All the uses in hw/ were using the wrong type by mistake.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-5-git-send-email-peter.maydell@linaro.org
Device init() methods aren't supposed to call hw_error(), they should
report the error and fail cleanly. Do that.
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Signed-off-by: Markus Armbruster <armbru@pond.sub.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-Id: <1450370121-5768-5-git-send-email-armbru@redhat.com>
eth.h and slirp.h both define ETH_ALEN and ETH_P_IP
rtl8139.c and eth.h both define ETH_HLEN
Move the related constant (ETH_P_ARP) from slirp.h to eth.h, and
remove the duplicates; make slirp.h include eth.h
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
While processing transmit(tx) descriptors in 'tx_consume' routine
the switch emulator suffers from an off-by-one error, if a
descriptor was to have more than allowed(ROCKER_TX_FRAGS_MAX=16)
fragments. Fix an incorrect bounds check to avoid it.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Following the previous patch which changed vmxnet3 to be a pci express
device, this patch introduces a boolean property 'x-disable-pcie' whose
default is false.
Setting 'x-disable-pcie' to 'on' preserves the old 'pci device' (non
express) behavior. This allows migration to older versions.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Report the DSN extended PCI capability at 0x100.
DSN value is a transformation of device MAC address, as calculated
by VMware virtual hardware.
DSN is reported only if device is pcie.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Report the 'express endpoint' capability if on a PCIE bus.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Introduce a class type for vmxnet3, and the usual
DEVICE_CLASS/DEVICE_GET_CLASS macros.
No semantic change.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Following the previous patches, where vmxnet3's pci's msi/msix
capability offsets and msix's PBA table offsets have been changed, this
patch introduces a boolean property 'x-old-msi-offsets' to vmxnet3,
whose default is false.
Setting 'x-old-msi-offsets' to 'on' preserves the old offsets behavior,
which allows migration to older versions.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Place the PBA table at 0x1000, as placed by VMware virtual hardware.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Place device reported PCI capabilities at the same offsets as placed by
the VMware virtual hardware: MSI at [84], MSI-X at [9c].
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
VMXNET3_DEVICE_VERSION is used as return value for accessing
UPT Revision Report and Selection register. So rename it
to VMXNET3_UPT_REVISION.
Signed-off-by: Miao Yan <yanmiaoebest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Return 0 on unknown command, this is what esxi (5.x+) behaves.
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
VMXNET3_CMD_GET_DEV_EXTRA_INFO should return 0 for emulation
mode
This behavior can be observed by the following steps:
1) run a Linux distro on esxi server (5.x+)
2) modify vmxnet3 Linux driver to read the register:
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DEV_EXTRA_INFO);
ret = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
pr_info("vmxnet3 dev_info: 0x%x\n", ret);
The kernel log will have some like the following message:
[ 7005.111170] vmxnet3 dev_info: 0x0
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
VMXNET3_CMD_GET_DID_LO should return PCI ID of the device
and VMXNET3_CMD_GET_DID_HI should return vmxnet3 revision ID.
This behavior can be observed by the following steps:
1) run a Linux distro on esxi server (5.x+)
2) modify vmxnet3 Linux driver to read DID_HI and DID_LO:
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DID_LO);
lo = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_DID_HI);
high = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
pr_info("vmxnet3 DID lo: 0x%x, high: 0x%x\n", lo, high);
The kernel log will have something like the following message:
[ 7005.111170] vmxnet3 DID lo: 0x7b0, high: 0x1
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
When reading device status, 0 means device is successfully
activated and 1 means error.
This behavior can be observed by the following steps:
1) run a Linux distro on esxi server (5.5+)
2) modify vmxnet3 Linux driver to give it an invalid
address to 'adapter->shared_pa' which is the
shared memory for guest/host communication
This will trigger device activation failure and kernel
log will have the following message:
[ 7138.403256] vmxnet3 0000:03:00.0 eth1: Failed to activate dev: error 1
So return 1 on device activation failure instead of -1;
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Vmxnet3 device emulator does not check if the device is active
before activating it, also it did not free the transmit & receive
buffers while deactivating the device, thus resulting in memory
leakage on the host. This patch fixes both these issues to avoid
host memory leakage.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Macro VMW_SHPRN(...) is already defined vmxnet3_debug.h,
so remove the duplication
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Vmxnet3 uses the following debug macro style:
#ifdef SOME_DEBUG
# define debug(...) do{ printf(...); } while (0)
# else
# define debug(...) do{ } while (0)
#endif
If SOME_DEBUG is undefined, then format string inside the
debug macro will never be checked by compiler. Code is
likely to break in the future when SOME_DEBUG is enabled
because of lack of testing. This patch changes this
to the following:
#define debug(...) \
do { if (SOME_DEBUG_ENABLED) printf(...); } while (0)
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Use %zu specifier for size_t in printf, otherwise build would fail
on platforms where size_t is not unsigned long
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Macro MAC_FMT and MAC_ARG are not defined, but used in vmxnet3_net_init().
This will cause build error when debug level is raised in
vmxnet3_debug.h (enable all VMXNET3_DEBUG_xxx).
Use VMXNET_MF and VXMNET_MA instead.
Signed-off-by: Miao Yan <yanmiaobest@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Version: GnuPG v1
iQEcBAABAgAGBQJWZZJPAAoJEO8Ells5jWIRmp0H/26aFXVEgZykkUVNbqq05r7w
AI7podQlFOAESJHqZtR8FMaH8TAZ5GhphP4pn0PsWp54VjwcYZbdoME+dhZ4Elyc
WDanRHIweLv/zVg6+M8oHhw5GMaxtFLoLWrf0oanbUW9IZZmmM3COz/Y31hSVrR2
EzEJi1VZZhpMj3ibeOJns4MrugYrne8MtOdvusE/Uw2rJBTiStnWw1eTk8RmkNcg
5un1mQZxFU2AcNzmWdmWJmjY0rCnR3HhtTdZOwjM6uZGIJ9hbsItGzqiGadBfozI
fUtIa2HZahioe0VIzoB0snXnAuhV1jA0Uy18i04dPvgQOmiVSRjQNE2/lwQflyE=
=Pad3
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 07 Dec 2015 14:06:07 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
lan9118: log and ignore access to invalid registers, rather than aborting
lan9118: fix emulation of MAC address loaded bit in E2P_CMD register
vmxnet3: silence warning
pcnet: fix rx buffer overflow(CVE-2015-7512)
net: pcnet: add check to validate receive data size(CVE-2015-7504)
e1000: fix hang of win2k12 shutdown with flood ping
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With this change, access to invalid/unimplemented device registers are
logged as a "guest error" rather than aborting qemu with
hw_error. This enables drivers for similar devices (e.g. SMSC 9221),
by simply ignoring the unimplemented writes. It's also closer to what
real hardware does.
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There appears to have been a longstanding typo in the implementation
of the "MAC address loaded" bit in the E2P_CMD (EEPROM command)
register. The code was using 0x10, but the controller spec says it
should be bit 8 (0x100).
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
vmxnet3 always produces a warning under qtest.
This is not a user error, don't warn.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Backends could provide a packet whose length is greater than buffer
size. Check for this and truncate the packet to avoid rx buffer
overflow in this case.
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In loopback mode, pcnet_receive routine appends CRC code to the
receive buffer. If the data size given is same as the buffer size,
the appended CRC code overwrites 4 bytes after s->buffer. Added a
check to avoid that.
Reported by: Qinghao Tang <luodalongde@gmail.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
e1000 driver in Win2k12 is really well rotten. It 100% hangs on shutdown
of UP VM under flood ping. The guest checks card state and reinjects
itself interrupt in a loop. This is fatal for UP machine.
There is no good way to fix this misbehavior but to kludge it. The
emulation has interrupt throttling register aka ITR which limits
interrupt rate and allows the guest to proceed this phase.
There is no problem with this kludge for Linux guests - it adjust the
value of it itself.
On the other hand according to the initial research in
commit e9845f0985
Author: Vincenzo Maffione <v.maffione@gmail.com>
Date: Fri Aug 2 18:30:52 2013 +0200
e1000: add interrupt mitigation support
...
Interrupt mitigation boosts performance when the guest suffers from
an high interrupt rate (i.e. receiving short UDP packets at high packet
rate). For some numerical results see the following link
http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf
this should also boost performance a bit.
See https://bugzilla.redhat.com/show_bug.cgi?id=874406 for additional
details.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vincenzo Maffione <v.maffione@gmail.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04592.html
shows an example how an endless loop in function action_command can
be achieved.
During my code review, I noticed a 2nd case which can result in an
endless loop.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Some features (such as ctrl vq) are supported
by qemu without need to communicate with the
backend.
Drop them from the feature mask so we set them
unconditionally.
Reported-by: Victor Kaplansky <vkaplans@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
First of all, RESET_OWNER message is sent incorrectly, as it's sent
before GET_VRING_BASE. And the reset message would let the later call
get nothing correct.
And, sending SET_VRING_ENABLE at stop, which has already been done,
makes more sense than RESET_OWNER.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Version: GnuPG v1
iQEcBAABAgAGBQJWREdzAAoJEO8Ells5jWIRI18H/0CEDVwj7AJHLEpAv07hX2iS
jfq6Osgj5hDChv43+66Clz3owog3m9NfPKWxBMvIw5c/Q1mFvNuxZcUaVOzX2dT4
E+IwIsZxXOANIGPYtCxOhARz1zNSDxJxgYPMVuIDZ+uZVJqYeCjdduMGzgy8wt8H
qiquUCI2sktg97AntZqzp8iWfZZIN5w6uNbf3FvgwIffWDxGRPt8wY6dlwgIpsx2
uFd9PMwtj7lJyV9guy36FdrS7MhVTCF5/5GIerPj2nN1ByJp9vu5InzPAlmZNRSZ
KxKcBnmkLsnT3nDN86ZS6ajDyjeEgWSVdrQS9MHDURfinADuuqjbJkhME/UhG+g=
=vRNP
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 12 Nov 2015 08:01:55 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net: netmap: use error_setg() helpers in place of error_report()
net: netmap: Fix compilation issue
e1000: Introducing backward compatibility command line parameter
e1000: Implementing various counters
e1000: Fixing the packet address filtering procedure
e1000: Fixing the received/transmitted octets' counters
e1000: Fixing the received/transmitted packets' counters
e1000: Trivial implementation of various MAC registers
e1000: Introduced an array to control the access to the MAC registers
e1000: Add support for migrating the entire MAC registers' array
e1000: Cosmetic and alignment fixes
slirp: Fix type casts and format strings in debug code
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This follows the previous patches, where support for migrating the
entire MAC registers' array, and some new MAC registers were introduced.
This patch introduces the e1000-specific boolean parameter
"extra_mac_registers", which is on by default. Setting it to off will
enable migration to older versions of QEMU, but will disable the read
and write access to the new registers, that were introduced since adding
the ability to migrate the entire MAC array.
Example for usage to enable backward compatibility and to disable the
new MAC registers:
qemu-system-x86_64 -device e1000,extra_mac_registers=off,... ...
As mentioned above, the default value is "on".
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This implements the following Statistic registers (various counters)
according to Intel's specs:
TSCTC GOTCL GOTCH GORCL GORCH MPRC BPRC RUC ROC
BPTC MPTC PTC... PRC...
PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in the next patch in this series).
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Previously, if promiscuous unicast was enabled, a packet was received
straight away, even if it was a multicast or a broadcast packet. This
patch fixes that behavior, while making the filtering procedure a bit
more human-readable.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Previously, these 64-bit registers did not stick at their maximal
values when (and if) they reached them, as they should do, according to
the specs.
This patch introduces a function that takes care of such registers,
avoiding code duplication, making the relevant parts more compatible
with the QEMU coding style, while ensuring that in the unlikely case
of reaching the maximal value, the counter will stick there, as it
supposed to.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
According to Intel's specs, these counters (as the other Statistic
registers) stick at 0xffffffff when this maximal value is reached.
Previously, they would reset after the max. value.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
These registers appear in Intel's specs, but were not implemented.
These registers are now implemented trivially, i.e. they are initiated
with zero values, and if they are RW, they can be written or read by the
driver, or read only if they are R (essentially retaining their zero
values). For these registers no other procedures are performed.
For the trivially implemented Diagnostic registers, a debug warning is
produced on read/write attempts.
PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in a later patch in this series).
The registers implemented here are:
Transmit:
RW: AIT
Management:
RW: WUC WUS IPAV IP6AT* IP4AT* FFLT* WUPM* FFMT* FFVT*
Diagnostic:
RW: RDFH RDFT RDFHS RDFTS RDFPC PBM* TDFH TDFT TDFHS
TDFTS TDFPC
Statistic:
RW: FCRUC
R: RNBC TSCTFC MGTPRC MGTPDC MGTPTC RFC RJC SCC ECOL
LATECOL MCC COLC DC TNCRS SEC CEXTERR RLEC XONRXC
XONTXC XOFFRXC XOFFTXC
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The array of uint8_t's which is introduced here, contains access metadata
about the MAC registers: if a register is accessible, but partly implemented,
or if a register requires a certain compatibility flag in order to be
accessed. Currently, 6 hypothetical flags are supported (3 exist for e1000
so far) but in the future, if more than 6 flags will be needed, the datatype
of this array can simply be swapped for a larger one.
This patch is intended to solve the following current problems:
1) In a scenario of migration between different versions of QEMU, which
differ by the MAC registers implemented in them, some registers need not to
be active if a compatibility flag is set, in order to preserve the machine's
state perfectly for the older version. Checking this for each register
individually, would create a lot of clutter in the code.
2) Some registers are (or may be) only partly implemented (e.g.
placeholders that allow reading and writing, but lack other functions).
In such cases it is better to print a debug warning on read/write attempts.
As above, dealing with this functionality on a per-register level, would
require longer and more messy code.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch makes the migration of the entire array of MAC registers
possible during live migration. The entire array is just 128 KB long, so
practically no penalty should be felt when transmitting it, additionally
to the previously transmitted individual registers. The advantage here is
eliminating the need to introduce new vmstate subsections in the future,
when additional MAC registers will be implemented.
Backward compatibility is preserved by introducing a e1000-specific
boolean parameter (in a later patch), which will be on by default.
Setting it to off would enable migration to older versions of QEMU.
Additionally, this parameter will be used to control the access to the
extra MAC registers in the future.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This fixes some alignment and cosmetic issues. The changes are made
in order that the following patches in this series will look like
integral parts of the code surrounding them, while conforming to the
coding style. Although some changes in unrelated areas are also made.
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A few uses of error_set(ERROR_CLASS_GENERIC_ERROR) were missed in
c6bd8c706, or have snuck in since. Nuke them.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447224690-9743-19-git-send-email-eblake@redhat.com>
Acked-by: Andreas Färber <afaerber@suse.de>
[Indentation tidied up, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The goal is to have debug code always compiled during build.
We standardize all debug output on the following format:
[QOM_TYPE_NAME]reporting_function: debug message
The qemu_log_mask() output is following the same format as the
above debug.
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 57e565982db94fb433c32dfa17608888464d21de.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Guest OS may issue VMXNET3_CMD_GET_STATS even before device was
activated (for example in linux, after insmod but prior net-dev open).
Accessing shared descriptors prior device activation is illegal as the
VMXNET3State structures have not been fully initialized.
As a result, guest memory gets corrupted and may lead to guest OS
crashes.
Fix, by not filling the stats descriptors if device is inactive.
Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Dana Rubin <dana.rubin@ravellosystems.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Set initial MAC address to the one specified by the command line.
Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A new vhost user message is added to allow QEMU to ask to vhost user backend to
broadcast a fake RARP after live migration for guest without GUEST_ANNOUNCE
capability.
This new message is sent only if the backend supports the new
VHOST_USER_PROTOCOL_F_RARP protocol feature.
The payload of this new message is the MAC address of the guest (not known by
the backend). The MAC address is copied in the first 6 bytes of a u64 to avoid
to create a new payload message type.
This new message has no equivalent ioctl so a new callback is added in the
userOps structure to send the request.
Upon reception of this new message the vhost user backend must generate and
broadcast a fake RARP request to notify the migration is terminated.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
[Rebased and fixed checkpatch errors - Marc-André]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Some vhost user backends are able to support live migration.
To provide this service the following features must be added:
1. Add the VIRTIO_NET_F_GUEST_ANNOUNCE capability to vhost-net when netdev
backend is vhost-user.
2. Provide a nop receive callback to vhost-user.
This callback is called by:
* qemu_announce_self after a migration to send fake RARP to avoid network
outage for peers talking to the migrated guest.
- For guest with GUEST_ANNOUNCE capabilities, guest already sends GARP
when the bit VIRTIO_NET_S_ANNOUNCE is set.
=> These packets must be discarded.
- For guest without GUEST_ANNOUNCE capabilities, migration termination
is notified when the guest sends packets.
=> These packets can be discarded.
* virtio_net_tx_bh with a dummy boot to send fake bootp/dhcp request.
BIOS guest manages virtio driver to send 4 bootp/dhcp request in case of
dummy boot.
=> These packets must be discarded.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Replace the generic vhost_call() by specific functions for each
function call to help with type safety and changing arguments.
While doing this, I found that "unsigned long long" and "uint64_t" were
used interchangeably and causing compilation warnings, using uint64_t
instead, as the vhost & protocol specifies.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[Fix enum usage and MQ - Thibaut Collet]
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Version: GnuPG v1
iQEcBAABAgAGBQJWG2e/AAoJEO8Ells5jWIRcYcH/2D11W8cToCBjGDuw/u9K1ht
S3oGyFasOEq3lm3+a3zQE+vDw0RDkjLEMhcTVwNskJQl6k6Ts5JleTZ6wffvUKPM
UCozgPOCt1ZAdGskwdbByc+NhaVBHIiEsmlbDKqP22CENdDx6GWjcFW4brA4tQJQ
AW36EH77j/M+7/KiSukcUfIexILUZJRfN+ICJVyNTpGsqUNJtFqiVPBMPyJhKCEq
3pr3yJ2lf78SAEF5kBeBc9r/PDWUhtqExBsrK0L8Ey1FdrCy8ldqDPGecT4TsxNv
W/KX5AqhKSsMI8DQKdbv/IKaUdjYWNjTRQ2Qjm8Vt0hcW0PhxR0NYi6bV4yjDNM=
=f26Q
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 12 Oct 2015 08:56:47 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
tests: add test cases for netfilter object
netfilter: add a netbuffer filter
net/queue: export qemu_net_queue_append_iov
netfilter: print filter info associate with the netdev
netfilter: add an API to pass the packet to next filter
net/queue: introduce NetQueueDeliverFunc
net: merge qemu_deliver_packet and qemu_deliver_packet_iov
netfilter: hook packets before net queue send
init/cleanup of netfilter object
vl.c: init delayed object after net_init_clients
vmxnet3: Add support for VMXNET3_CMD_GET_ADAPTIVE_RING_INFO command
e1000: use alias for default model
vmxnet3: Support reading IMR registers on bar0
net/vmxnet3: Refine l2 header validation
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Some drivers (e.g. vmware-tools) issue the VMXNET3_CMD_GET_ADAPTIVE_RING_INFO
command.
Currently, due to lack of support, a bogus value (-1) is returned.
Support this command, returning the "adaptive-ring disabled" flag.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Instead of duplicating the "e1000-82540em" device model as "e1000",
make the latter an alias for the former.
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Instead of asserting, return the actual IMR register value.
This is aligned with what's returned on ESXi.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Tested-by: Dana Rubin <dana.rubin@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Validation of l2 header length assumed minimal packet size as
eth_header + 2 * vlan_header regardless of the actual protocol.
This caused crash for valid non-IP packets shorter than 22 bytes, as
'tx_pkt->packet_type' hasn't been assigned for such packets, and
'vmxnet3_on_tx_done_update_stats()' expects it to be properly set.
Refine header length validation in 'vmxnet_tx_pkt_parse_headers'.
Check its return value during packet processing flow.
As a side effect, in case IPv4 and IPv6 header validation failure,
corrupt packets will be dropped.
Signed-off-by: Dana Rubin <dana.rubin@ravellosystems.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T). Same Coccinelle semantic patchas in commit b45c03f.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When packet is truncated during receiving, we drop the packets but
neither discard the descriptor nor add and signal used
descriptor. This will lead several issues:
- sg mappings are leaked
- rx will be stalled if a lots of packets were truncated
In order to be consistent with vhost, fix by discarding the descriptor
in this case.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Originally, timers were ticks based, and it made sense to
add ticks to current time to know when to trigger an alarm.
But since commit:
7447545 change all other clock references to use nanosecond resolution accessors
All timers use nanoseconds and we need to convert ticks to nanoseconds, by
doing something like:
y = muldiv64(x, get_ticks_per_sec(), PCI_FREQUENCY)
where x is the number of device ticks and y the number of system ticks.
y is used as nanoseconds in timer functions,
it works because 1 tick is 1 nanosecond.
(get_ticks_per_sec() is 10^9)
But as PCI frequency is 33 MHz, we can also do:
y = x * 30; /* 33 MHz PCI period is 30 ns */
Which is much more simple.
This implies a 33.333333 MHz PCI frequency,
but this is correct.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Originally, timers were ticks based, and it made sense to
add ticks to current time to know when to trigger an alarm.
But since commit:
7447545 change all other clock references to use nanosecond resolution accessors
All timers use nanoseconds and we need to convert ticks to nanoseconds, by
doing something like:
y = muldiv64(x, get_ticks_per_sec(), PCI_FREQUENCY)
where x is the number of device ticks and y the number of system ticks.
y is used as nanoseconds in timer functions,
it works because 1 tick is 1 nanosecond.
(get_ticks_per_sec() is 10^9)
But as PCI frequency is 33 MHz, we can also do:
y = x * 30; /* 33 MHz PCI period is 30 ns */
Which is much more simple.
This implies a 33.333333 MHz PCI frequency,
but this is correct.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a new message, VHOST_USER_SET_VRING_ENABLE, to enable or disable
a specific virt queue, which is similar to attach/detach queue for
tap device.
virtio driver on guest doesn't have to use max virt queue pair, it
could enable any number of virt queue ranging from 1 to max virt
queue pair.
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
This patch is initially based a patch from Nikolay Nikolaev.
This patch adds vhost-user multiple queue support, by creating a nc
and vhost_net pair for each queue.
Qemu exits if find that the backend can't support the number of requested
queues (by providing queues=# option). The max number is queried by a
new message, VHOST_USER_GET_QUEUE_NUM, and is sent only when protocol
feature VHOST_USER_PROTOCOL_F_MQ is present first.
The max queue check is done at vhost-user initiation stage. We initiate
one queue first, which, in the meantime, also gets the max_queues the
backend supports.
In older version, it was reported that some messages are sent more times
than necessary. Here we came an agreement with Michael that we could
categorize vhost user messages to 2 types: non-vring specific messages,
which should be sent only once, and vring specific messages, which should
be sent per queue.
Here I introduced a helper function vhost_user_one_time_request(), which
lists following messages as non-vring specific messages:
VHOST_USER_SET_OWNER
VHOST_USER_RESET_DEVICE
VHOST_USER_SET_MEM_TABLE
VHOST_USER_GET_QUEUE_NUM
For above messages, we simply ignore them when they are not sent the first
time.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
This is for querying how many queues the backend supports if it has mq
support(when VHOST_USER_PROTOCOL_F_MQ flag is set from the quried
protocol features).
vhost_net_get_max_queues() is the interface to export that value, and
to tell if the backend supports # of queues user requested, which is
done in the following patch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Quote from Michael:
We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
Support a separate bitmask for vhost-user protocol features,
and messages to get/set protocol features.
Invoke them at init.
No features are defined yet.
[ leverage vhost_user_call for request handling -- Yuanhan Liu ]
Signed-off-by: Michael S. Tsirkin <address@hidden>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
After commit 019a3edbb2 ("virtio: make
features 64bit wide"). Device's guest_features was actually set after
vdc->load(). This breaks the assumption that device specific load()
function can check guest_features. For virtio-net, self announcement
and guest offloads won't work after migration.
Fixing this by defer them to virtio_net_load() where guest_features
were guaranteed to be set. Other virtio devices looks fine.
Fixes: 019a3edbb2
("virtio: make features 64bit wide")
Cc: qemu-stable@nongnu.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Symptom:
$ qemu-system-x86_64 -m 10000000
Unexpected error in ram_block_add() at /work/armbru/qemu/exec.c:1456:
upstream-qemu: cannot set up guest memory 'pc.ram': Cannot allocate memory
Aborted (core dumped)
Root cause: commit ef701d7 screwed up handling of out-of-memory
conditions. Before the commit, we report the error and exit(1), in
one place, ram_block_add(). The commit lifts the error handling up
the call chain some, to three places. Fine. Except it uses
&error_abort in these places, changing the behavior from exit(1) to
abort(), and thus undoing the work of commit 3922825 "exec: Don't
abort when we can't allocate guest memory".
The three places are:
* memory_region_init_ram()
Commit 4994653 (right after commit ef701d7) lifted the error
handling further, through memory_region_init_ram(), multiplying the
incorrect use of &error_abort. Later on, imitation of existing
(bad) code may have created more.
* memory_region_init_ram_ptr()
The &error_abort is still there.
* memory_region_init_rom_device()
Doesn't need fixing, because commit 33e0eb5 (soon after commit
ef701d7) lifted the error handling further, and in the process
changed it from &error_abort to passing it up the call chain.
Correct, because the callers are realize() methods.
Fix the error handling after memory_region_init_ram() with a
Coccinelle semantic patch:
@r@
expression mr, owner, name, size, err;
position p;
@@
memory_region_init_ram(mr, owner, name, size,
(
- &error_abort
+ &error_fatal
|
err@p
)
);
@script:python@
p << r.p;
@@
print "%s:%s:%s" % (p[0].file, p[0].line, p[0].column)
When the last argument is &error_abort, it gets replaced by
&error_fatal. This is the fix.
If the last argument is anything else, its position is reported. This
lets us check the fix is complete. Four positions get reported:
* ram_backend_memory_alloc()
Error is passed up the call chain, ultimately through
user_creatable_complete(). As far as I can tell, it's callers all
handle the error sanely.
* fsl_imx25_realize(), fsl_imx31_realize(), dp8393x_realize()
DeviceClass.realize() methods, errors handled sanely further up the
call chain.
We're good. Test case again behaves:
$ qemu-system-x86_64 -m 10000000
qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory
[Exit 1 ]
The next commits will repair the rest of commit ef701d7's damage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1441983105-26376-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
The SOFT_RST or RXEN in the control register can be used as a condition
to unblock the net layer via can_receive(). So check for possible
flushes on RCR changes. This will drop all pending packets on soft
reset or disable which is the functional intent of the can_receive()
logic.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Message-id: b114d4c96f4afbdaa15f1361d9c07e3021755915.1441873621.git.crosthwaite.peter@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Return false from can_receive() when the FIFO doesn't have a free RX
slot. This fixes a bug in the current code where the allocated buffer
is freed before the fifo pop, triggering a premature flush of queued RX
packets. It also will handle a corner case, where the guest manually
frees the allocated buffer before popping the rx FIFO (hence it is not
enough to just delay the flush_queued_packets()).
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Message-id: 97bfdfc5cbce0bd5e0cbbbff35ce7a1bf6f8603d.1441873621.git.crosthwaite.peter@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Check that the core can once again receive packets before asking the
net layer to do a flush. This will make it more convenient to flush
packets when adding new conditions to can_receive.
Add missing if braces while moving the can_receive() core code.
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Message-id: 92e15e12a6964274f4bc0eb71b61a7d94326f6c6.1441873621.git.crosthwaite.peter@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. While receiving packets
via ne2000_receive() routine, a local 'index' variable
could exceed the ring buffer size, leading to an infinite
loop situation.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. While receiving packets
via ne2000_receive() routine, a local 'index' variable
could exceed the ring buffer size, which could lead to a
memory buffer overflow. Added other checks at initialisation.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
While processing transmit descriptors, it could lead to an infinite
loop if 'bytes' was to become zero; Add a check to avoid it.
[The guest can force 'bytes' to 0 by setting the hdr_len and mss
descriptor fields to 0.
--Stefan]
Signed-off-by: P J P <pjp@fedoraproject.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1441383666-6590-1-git-send-email-stefanha@redhat.com