Commit Graph

556 Commits

Author SHA1 Message Date
Peter Maydell
b0ad00b8c9 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXaFInAAoJEJykq7OBq3PI6VsH/0Sfgbdo1RksYuQwb/y92sCW
 EN+lxUZ+OLfgrc8PYgNZwfSM3rsfYhznL0MAXOeEe7Ahabi07w7DhGR8WvwfAOlI
 G96FRuvrIPfv5u6U6fwS4CvG3TIHVLxfHKCsTpPUmH8U5CNx/x/tpjNiWN1dj6t+
 sXybSjYHfZfiZy2tI9MFIFWCdxnF/pl0QAPhbRqc8Y/RQTDrPKRjLpz+nitN/u96
 5TS7KlELyQuP91YMmLceYSmIkHbxW703h+iE2n4hov0uZCP8Jil+2Jsd3ziQSRlL
 j6LqexQ2ViBGdDSfiZGYES2VPlsHOCwb4G+IgWBStfZg1ppaXENvcDzPrgrB+L4=
 =eUnF
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Mon 20 Jun 2016 21:29:27 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request: (42 commits)
  trace: split out trace events for linux-user/ directory
  trace: split out trace events for qom/ directory
  trace: split out trace events for target-ppc/ directory
  trace: split out trace events for target-s390x/ directory
  trace: split out trace events for target-sparc/ directory
  trace: split out trace events for net/ directory
  trace: split out trace events for audio/ directory
  trace: split out trace events for ui/ directory
  trace: split out trace events for hw/alpha/ directory
  trace: split out trace events for hw/arm/ directory
  trace: split out trace events for hw/acpi/ directory
  trace: split out trace events for hw/vfio/ directory
  trace: split out trace events for hw/s390x/ directory
  trace: split out trace events for hw/pci/ directory
  trace: split out trace events for hw/ppc/ directory
  trace: split out trace events for hw/9pfs/ directory
  trace: split out trace events for hw/i386/ directory
  trace: split out trace events for hw/isa/ directory
  trace: split out trace events for hw/sd/ directory
  trace: split out trace events for hw/sparc/ directory
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-20 22:30:34 +01:00
Daniel P. Berrange
cd8c2fe77b trace: split out trace events for hw/net/ directory
Move all trace-events for files in the hw/net/ directory to
their own file.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1466066426-16657-11-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-20 17:22:15 +01:00
Eduardo Habkost
621ff94d50 error: Remove NULL checks on error_propagate() calls
error_propagate() already ignores local_err==NULL, so there's no
need to check it before calling.

Coccinelle patch used to perform the changes added to
scripts/coccinelle/error_propagate_null.cocci.

Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-06-20 16:38:13 +02:00
Peter Maydell
7263a903c3 pc, pci, virtio: new features, cleanups, fixes
Beginning of reconnect support for vhost-user.
 Misc cleanups and fixes.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXY0Q3AAoJECgfDbjSjVRpkVcH/2gTHRE9yUoWe6ROvPV67BKx
 8Iy9GzJ3BMO3RolVZEA5KXIevn5TG+pV274BZEuXMD3AL/molv279p0o/gvBYoqq
 V0jNH2MO+MV6D9OzhUXcgWSejvybF5W07ojPDU/hlgtFXPZFbJDyt95MWaLiilOg
 cCtTuRqgrrRaypcnnk/CIDbC+Ek2kAYdgQHQbfj9ihle3TWO8R0bSXnFqSaqCIkM
 4slMlv8y82fODeiO83nkpfAP1NCnfnRC8r8Gv7hbEUTlZQntavx5DuYdiIx6nsJE
 W0g+Gpe1o0+jRuMnucGIUZvqzZ0e/I0wZuV16Nsfx+Rbd5+4CzTxZda5Qb05v7I=
 =BHbJ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 17 Jun 2016 01:28:39 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  MAINTAINERS: add Marcel to PCI
  msi_init: change return value to 0 on success
  fix some coding style problems
  pci core: assert ENOSPC when add capability
  test: start vhost-user reconnect test
  tests: append i386 tests
  vhost-net: save & restore vring enable state
  vhost-net: save & restore vhost-user acked features
  vhost-net: do not crash if backend is not present
  vhost-user: disconnect on start failure
  qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
  tests/vhost-user-bridge: workaround stale vring base
  tests/vhost-user-bridge: add client mode
  vhost-user: add ability to know vhost-user backend disconnection
  pci: fix pci_requester_id()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	tests/Makefile.include
2016-06-17 11:25:46 +01:00
Cao jin
52ea63dea4 fix some coding style problems
It has:
1. More newlines make the code block well separated.
2. Add more comments for msi_init.
3. Fix a indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: put PCI Express capability init function
   together, make it more readable.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
bfc6cf31ce vhost-net: save & restore vring enable state
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)

Restore the vring state when the vhost-user backend is started, so it
can process the ring.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
a463215b08 vhost-net: save & restore vhost-user acked features
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.

To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Marc-André Lureau
72b65f922b vhost-net: do not crash if backend is not present
Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-06-17 03:28:03 +03:00
Paolo Bonzini
e9abfcb57f clean-includes: run it once more
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Paolo Bonzini
02d0e09503 os-posix: include sys/mman.h
qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 18:39:03 +02:00
Laurent Vivier
df5d1c17b6 rocker: Use DIV_ROUND_UP
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).

This patch is the result of coccinelle script
scripts/coccinelle/round.cocci

CC: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:25 +03:00
Sameeh Jubran
b92233b329 e1000: Removing unnecessary if statement
Since mit_delay can never be 0 this if statement is
superfluous.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-06-07 18:19:23 +03:00
Dmitry Fleytman
de5dca1b79 e1000e: Fix build with gcc 4.6.3 and ust tracing
This patch fixes used-uninitialized false
positive while compiling with ust tracing
backend plus gcc 4.6.3:

hw/net/e1000e.c: In function ‘e1000e_io_write’:
hw/net/e1000e.c:170:39: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
hw/net/e1000e.c: In function ‘e1000e_io_read’:
hw/net/e1000e.c:145:35: error: ‘idx’ may be used uninitialized in this function [-Werror=uninitialized]
cc1: all warnings being treated as errors
make: *** [hw/net/e1000e.o] Error 1

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-id: 1465023763-10773-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-06 09:42:54 +01:00
Dmitry Fleytman
defbaec160 e1000e: Fix build with ust trace backend
ust trace backend has limitation of maximum 10
arguments per event. Traces with more arguments
cannot be compiled for this backend.

Trace e1000e_rx_rss_ip6 introduced by previous
commits has 11 arguments and fails to compile with
ust trace backend.

This patch fixes the problem by splitting this
tracepoint into two successive tracepoints with
smaller number of arguments.

For more information see comment regarding TP_ARGS
in lttng/tracepoint.h:

/*
* TP_ARGS takes tuples of type, argument separated by a comma.
* It can take up to 10 tuples (which means that less than 10 tuples is
* fine too).
* Each tuple is also separated by a comma.
*/

Build log generated by this problem:

In file included from ./trace/generated-tracers.h:9:0,
                 from /home/travis/build/qemu/qemu/include/trace.h:4,
                 from util/oslib-posix.c:36:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
                 from util/oslib-posix.c:36:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/oslib-posix.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./trace/generated-tracers.h:9:0,
                 from /home/travis/build/qemu/qemu/include/trace.h:4,
                 from util/hbitmap.c:16:
./trace/generated-ust-provider.h:16556:3: error: unknown type name ‘_TP_EXPROTO_Bool’
In file included from /home/travis/build/qemu/qemu/include/trace.h:4:0,
                 from util/hbitmap.c:16:
./trace/generated-tracers.h: In function ‘trace_e1000e_rx_rss_ip6’:
./trace/generated-tracers.h:8379:431: error: expected string literal before ‘_SDT_ASM_OPERANDS_ipv6_enabled’
./trace/generated-tracers.h:8379:431: error: implicit declaration of function ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=implicit-function-declaration]
./trace/generated-tracers.h:8379:431: error: nested extern declaration of ‘__tracepoint_cb_qemu___e1000e_rx_rss_ip6’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [util/hbitmap.o] Error 1

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Message-id: 1464894748-27803-1-git-send-email-dmitry@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-03 11:06:09 +01:00
Peter Maydell
2c107d7684 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXT9DWAAoJEO8Ells5jWIRgFAH/1ZDXm8V523AMDOEvBAWgqur
 Dj8ZaIwFkqJp7xtLdhS0yKF3xW+vtgx9k+Qftk0S8qEiFKPbThR8iB5VNuesErwd
 AZhWo4bnVhKwtWyMw3BDRDK1N4huAWPMZEva1xovR/Cc9v5IG5mx57/K3Zz5C8ec
 Jsn4DsLKN0q7W0D0dlnbEOkSjl6iKJchvfPCR6UfvrU7BxfXaCZ9Z7Sfh8ec6tfr
 iMgcV9u3A3Zs72gTM9/jdKx8vOrWtdKJufJ8s2Bctc7CyfBNWwnV8PjndhEe3Xvs
 vlYeJopdpDPsdMkMtYD6cevtEgvD5yhOBndJ7et807jjuCvUf837tMhodKkFk9M=
 =SjIZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request: (31 commits)
  Add ENET device to i.MX6 SOC.
  Add ENET/Gbps Ethernet support to FEC device
  i.MX: move FEC device to a register array structure.
  i.MX: Rename i.MX FEC defines to ENET_XXX
  i.MX: reset TX/RX descriptors when FEC is disabled.
  i.MX: Fix FEC code for ECR register reset value.
  i.MX: Fix FEC code for MDIO address selection
  i.MX: Fix FEC code for MDIO operation selection
  net: handle optional VLAN header in checksum computation.
  net: improve UDP/TCP checksum computation.
  e1000e: Introduce qtest for e1000e device
  net: Introduce e1000e device emulation
  e1000: Move out code that will be reused in e1000e
  e1000_regs: Add definitions for Intel 82574-specific bits
  vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
  net_pkt: Extend packet abstraction as required by e1000e functionality
  rtl8139: Move more TCP definitions to common header
  net_pkt: Name vmxnet3 packet abstractions more generic
  vmxnet3: Use common MAC address tracing macros
  net: Add macros for MAC address tracing
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-02 14:26:57 +01:00
Jean-Christophe Dubois
a699b410d7 Add ENET/Gbps Ethernet support to FEC device
The ENET device (present in i.MX6) is "derived" from FEC and backward
compatible with it.

This patch adds the necessary support of the added feature in the ENET
device to allow Linux to use it (on supported processors).

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
db0de35268 i.MX: move FEC device to a register array structure.
This is to prepare for the ENET Gb device of the i.MX6.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
1bb3c37182 i.MX: Rename i.MX FEC defines to ENET_XXX
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ff4b325f5e i.MX: reset TX/RX descriptors when FEC is disabled.
According to the FEC chapter of i.MX25 reference manual

RX adn TX descriptors are reseted when the FEC device is disabled through ECR.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ccdb81d327 i.MX: Fix FEC code for ECR register reset value.
According to the FEC chapter of i.MX25 reference manual ECR register is
initialized at 0xf0000000 at reset time.

We fix the value.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
b413643a5c i.MX: Fix FEC code for MDIO address selection
According to the FEC chapter of i.MX25 reference manual

When writing to MMFR register, the MDIO device and adress are selected by
bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits
that need to be used by the Phy chip/address decoding function.

This patch fixes the number of bits used from 9 to 10.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
4816dc168b i.MX: Fix FEC code for MDIO operation selection
According to the FEC chapter of i.MX25 reference manual

When writing the MMFR register, bit 29 and 28 select the requested operation.
 * 10 means read operation with valid MII mgmt frame
 * 11 means read operation with non compliant MII mgmt frame
 * 01 means write operation with valid MII mgmt frame
 * 00 means write operation with non compliant MII mgmt frame

So while bit 28 does change beween read/write for valid MII mgmt frame, the
mening is inverted for non compliant MII mgmt frame.

Bit 29 on the other hand means read/write whatever the type of mgmt frame
involved.

So this patch change the operation selection from bit 28 to bit 29 as it is
more generic.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Dmitry Fleytman
6f3fbe4ed0 net: Introduce e1000e device emulation
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.

This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.

Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.

The device code is split into two parts:

  1. hw/net/e1000e.c: QEMU-specific code for a network device;
  2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.

The new device name is e1000e.

Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf

Throughput measurement results (iperf2):

                Fedora 22 guest, TCP, RX
    4 ++------------------------------------------+
      |                                           |
      |                           X   X   X   X   X
  3.5 ++          X   X   X   X                   |
      |       X                                   |
      |                                           |
    3 ++                                          |
G     |   X                                       |
b     |                                           |
/ 2.5 ++                                          |
s     |                                           |
      |                                           |
    2 ++                                          |
      |                                           |
      |                                           |
  1.5 X+                                          |
      |                                           |
      +   +   +   +   +   +   +   +   +   +   +   +
    1 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

               Fedora 22 guest, TCP, TX
  18 ++-------------------------------------------+
     |                        X                   |
  16 ++                           X   X   X   X   X
     |                   X                        |
  14 ++                                           |
     |                                            |
  12 ++                                           |
G    |               X                            |
b 10 ++                                           |
/    |                                            |
s  8 ++                                           |
     |                                            |
   6 ++          X                                |
     |                                            |
   4 ++                                           |
     |       X                                    |
   2 ++  X                                        |
     X   +   +   +   +   +    +   +   +   +   +   +
   0 ++--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

                Fedora 22 guest, UDP, RX
    3 ++------------------------------------------+
      |                                           X
      |                                           |
  2.5 ++                                          |
      |                                           |
      |                                           |
    2 ++                                 X        |
G     |                                           |
b     |                                           |
/ 1.5 ++                                          |
s     |                         X                 |
      |                                           |
    1 ++                                          |
      |                                           |
      |                 X                         |
  0.5 ++                                          |
      |        X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

                Fedora 22 guest, UDP, TX
    1 ++------------------------------------------+
      |                                           X
  0.9 ++                                          |
      |                                           |
  0.8 ++                                          |
  0.7 ++                                          |
      |                                           |
G 0.6 ++                                          |
b     |                                           |
/ 0.5 ++                                          |
s     |                                  X        |
  0.4 ++                                          |
      |                                           |
  0.3 ++                                          |
  0.2 ++                        X                 |
      |                                           |
  0.1 ++                X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, TCP, RX
  3.2 ++------------------------------------------+
      |                                   X       |
    3 ++                                          |
      |                                           |
  2.8 ++                                          |
      |                                           |
  2.6 ++                              X           |
G     |   X                   X   X           X   X
b 2.4 ++      X       X                           |
/     |                                           |
s 2.2 ++                                          |
      |                                           |
    2 ++                                          |
      |           X       X                       |
  1.8 ++                                          |
      |                                           |
  1.6 X+                                          |
      +   +   +   +   +   +   +   +   +   +   +   +
  1.4 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

             Windows 2012R2 guest, TCP, TX
  14 ++-------------------------------------------+
     |                                            |
     |                                        X   X
  12 ++                                           |
     |                                            |
  10 ++                                           |
     |                                            |
G    |                                            |
b  8 ++                                           |
/    |                                    X       |
s  6 ++                                           |
     |                                            |
     |                                            |
   4 ++                               X           |
     |                                            |
   2 ++                                           |
     |           X   X            X               |
     +   X   X   +   +   X    X   +   +   +   +   +
   0 X+--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

              Windows 2012R2 guest, UDP, RX
  1.6 ++------------------------------------------X
      |                                           |
  1.4 ++                                          |
      |                                           |
  1.2 ++                                          |
      |                                  X        |
      |                                           |
G   1 ++                                          |
b     |                                           |
/ 0.8 ++                                          |
s     |                                           |
  0.6 ++                        X                 |
      |                                           |
  0.4 ++                                          |
      |                 X                         |
      |                                           |
  0.2 ++       X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, UDP, TX
  0.6 ++------------------------------------------+
      |                                           X
      |                                           |
  0.5 ++                                          |
      |                                           |
      |                                           |
  0.4 ++                                          |
G     |                                           |
b     |                                           |
/ 0.3 ++                                 X        |
s     |                                           |
      |                                           |
  0.2 ++                                          |
      |                                           |
      |                         X                 |
  0.1 ++                                          |
      |                 X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
093454e21d e1000: Move out code that will be reused in e1000e
Code that will be shared moved to a separate files.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
06e7fa0ad7 e1000_regs: Add definitions for Intel 82574-specific bits
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
111710107d vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
To make this device and network packets
abstractions ready for IOMMU.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
eb700029c7 net_pkt: Extend packet abstraction as required by e1000e functionality
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packets parsing
  3. Loopback option for TX packets
  4. Extended VLAN headers handling
  5. RSS processing for RX packets

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
66409b7c8b rtl8139: Move more TCP definitions to common header
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
605d52e62f net_pkt: Name vmxnet3 packet abstractions more generic
This patch drops "vmx" prefix from packet abstractions names
to emphasize the fact they are generic and not tied to any
specific network device.

These abstractions will be reused by e1000e emulation implementation
introduced by following patches so their names need generalization.

This patch (except renamed files, adjusted comments and changes in MAINTAINTERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
ab64787201 vmxnet3: Use common MAC address tracing macros
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
a4b387e623 vmxnet3: Use generic function for DSN capability definition
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Thomas Huth
5c29dd8c28 hw/net/spapr_llan: Provide counter with dropped rx frames to the guest
The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Thomas Huth
8836630f5d hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.

When flushing the receive queue, it's much better if we'd have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.

This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
 netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
 netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1738970      0    3798.83
229376           60.00          23              0.05

That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the value look much better:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1789104      0    3908.35
229376           60.00       22818             49.85

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Prasad J Pandit
3af9187fc6 net: mipsnet: check packet length against buffer
When receiving packets over MIPSnet network device, it uses
receive buffer of size 1514 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.

Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-05-25 15:46:07 +08:00
Zhou Jie
ea4d824168 hw/net/opencores_eth: Allocating Large sized arrays to heap
open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.

Reduce size of a buffer allocated on stack to 0x600 bytes, which is the
maximal frame length when HUGEN bit is not set in MODER, only allocate
buffer on heap when that is too small. Thus heap is not used in typical
use case.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-23 22:10:16 +03:00
Max Filippov
aa8e0ab975 hw/net/opencores_eth: use mii.h
Drop local definitions of MII registers and use constants from mii.h for
registers and register bits. No functional changes.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-23 22:10:16 +03:00
Paolo Bonzini
03dd024ff5 hw: explicitly include qemu/log.h
Move the inclusion out of hw/hw.h, most files do not need it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
cbd62f8616 hw: do not use VMSTATE_*TL
Reserve this to CPU state serialization.

Luckily, they were only used by sPAPR devices and these are ppc64
only.  So there is no change to migration format.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Stefan Weil
cb8d4c8f54 Fix some typos found by codespell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Prasad J Pandit
3a15cc0e1e net: stellaris_enet: check packet length against receive buffer
When receiving packets over Stellaris ethernet controller, it
uses receive buffer of size 2048 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.

Reported-by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1460095428-22698-1-git-send-email-ppandit@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-11 14:22:33 +01:00
Peter Maydell
24790aefe0 Xtensa-related fixes:
- fix networking on xtfpga platform in linux v4.5 by indicating
   autonegotiation completion in opencores_eth MII BMSR.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXBuBXAAoJEFH5zJH4P6BEwAwP/0+MvaQCwv0IXKA6lPgUgDFl
 oofrkJhC/4eV+n0LF7v6kl2WeZvK+kG4qJHDqp9ARlGel5qzGQ8pORqx+S7U3TrD
 Qxw6xwbYNOf+vstDnN36fnd7rUlRwl5QjaWXA3zemya9uaFKX748mq5AV52T6GU+
 C3jmUiMMlE/2B5EkLiR6D1K/MkAsf41plvfIhTozxcxQqjNkeIzYRwli7TAKiDxI
 KZZGCIC0Jh1ZHevvbX3bdrj+0xf+pDeB5tG26ZnXjf2zbrnft589PqNsNZ2TcZ6m
 RBNYOJBqQx+h0FHCNonMveZKNZ6HvIMC+AHmwM8nfmRxPMeqWyLwyiI0w5x0vhW5
 bnsAcPY67BaVJddDDD+udfMGUAHAb1QttcwNYUOTal3SJGDGbUmKb2OVyeRmXpoH
 4DIOlhno45bd7pagkefENR9MuOqwxrzm4qkkyazYogwyXilLPOrVgYHW2COQRsJ8
 PG180PDsY0FYmmf1i/64LJDKXOIjf1VnFDTC3lWoVbibDiNtvAEvAMgAdnNqmTHT
 Xl1RuDdZke28TmsU4gRKiLsGIwentiTJxNMiKCjpDYsrTIN7f2GwirL8wNnYuz79
 iNKPcWOh/fNxTnLVkheoyzc1fOPUzq36I+RALxA2WkvEmOUdKobXCmABDz0ByQBh
 Z+/B0kXXQ0hAK84qR6eB
 =otcW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/xtensa/tags/20160408-xtensa' into staging

Xtensa-related fixes:

- fix networking on xtfpga platform in linux v4.5 by indicating
  autonegotiation completion in opencores_eth MII BMSR.

# gpg: Signature made Thu 07 Apr 2016 23:33:59 BST using RSA key ID F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"

* remotes/xtensa/tags/20160408-xtensa:
  opencores_eth: indicate autonegotiation completion

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 11:28:49 +01:00
Jason Wang
91731d5f6d rtl8139: using CP_TX_OWN for ownership transferring during tx
Through CP_TX_OWN and CP_RX_OWN points to the same bit, we'd better use
CP_TX_OWN for tx descriptor handling.

Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Max Filippov
34fe9af09b opencores_eth: indicate autonegotiation completion
Indicate that autonegotiation is complete in the MII BMSR. This fixes
networking on xtfpga platform in linux v4.5.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-04-04 07:08:26 +03:00
Sameeh Jubran
8e0f7dd251 Revert "e1000: fix hang of win2k12 shutdown with flood ping"
This reverts commit 9596ef7c7b.

This workaround in order to fix endless interrupts is no
longer needed because it was superseded by the previous patch
(e1000: Fixing interrupt pace).

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:42 +08:00
Sameeh Jubran
74004e8ce4 e1000: Fixing interrupts pace.
This patch introduces an upper bound for number of interrupts
per second. Without this bound an interrupt storm can occur as
it has been observed on Windows 10 when disabling the device.

According to the SPEC - Intel PCI/PCI-X Family of Gigabit
Ethernet Controllers Software Developer's Manual, section
13.4.18 - the Ethernet controller guarantees a maximum
observable interrupt rate of 7813 interrupts/sec. If there is
no upper bound this could lead to an interrupt storm by e1000
(when mit_delay < 500) causing interrupts to fire at a very high
pace.
Thus if mit_delay < 500 then the delay should be set to the
minimum delay possible which is 500. This can be calculated
easily as follows:

Interval = 10^9 / (7813 * 256) = 500.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:36 +08:00
Peter Maydell
84a5a80148 * Log filtering from Alex and Peter
* Chardev fix from Marc-André
 * config.status tweak from David
 * Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
 * get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
 * Coverity fix from myself
 * PKE implementation from myself, based on rth's XSAVE support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW9ErPAAoJEL/70l94x66DJfEH/A/QkMpAhrgNdyVsahzsGrzE
 wx5gHFIc1nBYxyr62w4apUb5jPB7zaXu0LA7EAWDeAe0pyP8hZzLT9kJyOEDsuJu
 zwKN2QeLSNMtPbnbKN0I/YQ2za2xX1V5ruhSeOJoVslUI214hgnAURaGshhQNzuZ
 2CluDT9KgL5cQifAnKs5kJrwhIYShYNQB+1eDC/7wk28dd/EH+sPALIoF+rqrSmt
 Zu4Mdqd+9Ns+oKOjA6br9ULq/Hzg0aDfY82J+XLVVqfF3PXQe8rTDmuMf/7jTn+M
 Un7ZOcei9oZF2/9vfAfKQpDCcgD9HvOUSbgqV/ubmkPPmN/LNJzeKj0fBhrRN+Y=
 =K12D
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support

# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (28 commits)
  target-i386: implement PKE for TCG
  config.status: Pass extra parameters
  char: translate from QIOChannel error to errno
  exec: fix error handling in file_ram_alloc
  cputlb: modernise the debug support
  qemu-log: support simple pid substitution for logs
  target-arm: dfilter support for in_asm
  qemu-log: dfilter-ise exec, out_asm, op and opt_op
  qemu-log: new option -dfilter to limit output
  qemu-log: Improve the "exec" TB execution logging
  qemu-log: Avoid function call for disabled qemu_log_mask logging
  qemu-log: correct help text for -d cpu
  tcg: pass down TranslationBlock to tcg_code_gen
  util: move declarations out of qemu-common.h
  Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
  hw: explicitly include qemu-common.h and cpu.h
  include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
  isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
  Move ParallelIOArg from qemu-common.h to sysemu/char.h
  Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	scripts/clean-includes
2016-03-24 21:42:40 +00:00
Thomas Huth
57c522f47b hw/net/spapr_llan: Enable the RX buffer pools by default for new machines
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking care of different buffer sizes. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though it seems at a first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.

To avoid problems with migration from/to older version of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from creeping 3MB/s up to 20MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
d6f39fdfcd hw/net/spapr_llan: Extract rx buffer code into separate functions
Refactor the code a little bit by extracting the code that reads
and writes the receive buffer list page into separate functions.
There should be no functional change in this patch, this is just
a preparation for the upcoming extensions that introduce receive
buffer pools.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Rutuja Shah
73bcb24d93 Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed.  This replacement improves the readability and
understandability of code.

For example,

    timer_mod(fdctrl->result_timer,
	      qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));

NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.

Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00