Commit Graph

391 Commits

Author SHA1 Message Date
Ladi Prosek
f8693c2cd0 virtio-rng: ask for more data if queue is not fully drained
This commit effectively reverts:

  commit 4621c1768e
  Author: Amit Shah <amit.shah@redhat.com>
  Date:   Wed Nov 21 11:21:19 2012 +0530

  virtio-rng: remove extra request for entropy

but instead of calling virtio_rng_process unconditionally, it
first checks to see if the queue is empty as a little bit of
optimization.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456998514-19271-1-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Paolo Bonzini
fee089e4e2 vring: remove
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:19 +02:00
Paolo Bonzini
adb3feda8d virtio: export vring_notify as virtio_should_notify
Virtio dataplane needs to trigger the irq manually through the
guest notifier.  Export virtio_should_notify so that it can be
used around event_notifier_set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:18 +02:00
Paolo Bonzini
a1afb6062e virtio: add AioContext-specific function for host notifiers
This is used to register ioeventfd with a dataplane thread.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:18 +02:00
Paolo Bonzini
8b1fe1cedf vring: make vring_enable_notification return void
Make the API more similar to the regular virtqueue API.  This will
help when modifying the code to not use vring.c anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-02-25 13:14:18 +02:00
Vladimir Sementsov-Ogievskiy
2b75f84823 balloon: Use only 'pc-dimm' type dimm for ballooning
For now there are only two dimm's: pc-dimm and nvdimm. This patch is
actually needed to disable ballooning on nvdimm. But, to avoid future
bugs, instead of disallowing nvdimm, we allow only pc-dimm. So, if
someone adds new dimm which should be balloon-able, then this ability
should be explicitly specified here.

Why ballooning for nvdimm should be disabled for now:

NVDIMM for now is planned to use as a backing store for DAX filesystem
in the guest and thus this memory is excluded from guest memory
management and LRUs.

In this case libvirt running QEMU along with configured balloon almost
immediately inflates balloon and effectively kill the guest as
qemu counts nvdimm as part of the ram.

Counting dimm devices as part of the ram for ballooning was started from
commit 463756d03:
 virtio-balloon: Fix balloon not working correctly when hotplug memory

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-25 13:14:18 +02:00
Vladimir Sementsov-Ogievskiy
e8dc06d225 virtio-balloon: rewrite get_current_ram_size()
Use pc_dimm_built_list() instead of qmp_pc_dimm_device_list()

Actually, Qapi is not related to this internal helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-25 13:14:18 +02:00
Vladimir Sementsov-Ogievskiy
39de99843e move get_current_ram_size to virtio-balloon.c
get_current_ram_size() is used only in virtio-balloon.c
This patch moves it into virtio-balloon and make it static, to allow
some balloon-specific tuning.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-23 12:55:16 +02:00
Michael S. Tsirkin
ffe42cc14c vhost-user: don't merge regions with different fds
vhost currently merges regions with contiguious virtual and physical
addresses.  This breaks for vhost-user since that also needs fds to
match.

Add a vhost_ops entry to compare the fds for vhost-user only.

Cc: qemu-stable@nongnu.org
Cc: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-23 12:55:16 +02:00
Victor Kaplansky
5669655aaf vhost-user interrupt management fixes
Since guest_mask_notifier can not be used in vhost-user mode due
to buffering implied by unix control socket, force
use_mask_notifier on virtio devices of vhost-user interfaces, and
send correct callfd to the guest at vhost start.

Using guest_notifier_mask function in vhost-user case may
break interrupt mask paradigm, because mask/unmask is not
really done when returning from guest_notifier_mask call, instead
message is posted in a unix socket, and processed later.

Add an option boolean flag 'use_mask_notifier' to disable the use
of guest_notifier_mask in virtio pci.

Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-18 16:13:56 +02:00
Greg Kurz
46f70ff148 vhost: simplify vhost_needs_vring_endian()
After the call to virtio_vdev_has_feature(), we only care for legacy
devices, so we don't need the extra check in virtio_is_big_endian().

Also the device_endian field is always set (VIRTIO_DEVICE_ENDIAN_UNKNOWN
may only happen on a virtio_load() path that cannot lead here), so we
don't need the assert() either.

This open codes the device_endian checking in vhost_needs_vring_endian().
It also adds a comment to explain the logic, as recent reviews showed the
cross-endian tweaks aren't that obvious.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-16 12:05:18 +02:00
Greg Kurz
e58481234e vhost: move virtio 1.0 check to cross-endian helper
Indeed vhost doesn't need to ask for vring endian fixing if the device is
virtio 1.0, since it is already handled by the in-kernel vhost driver. This
patch simply consolidates the logic into the existing helper.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-16 12:05:17 +02:00
Greg Kurz
a122ab2472 virtio: move cross-endian helper to vhost
If target is bi-endian (ppc64, arm), the virtio_legacy_is_cross_endian()
indeed returns the runtime state of the virtio device. However, it returns
false unconditionally in the general case. This sounds a bit strange
given the name of the function.

This helper is only useful for vhost actually, where indeed non bi-endian
targets don't have to deal with cross-endian issues.

This patch moves the helper to vhost.c and gives it a more appropriate name.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-16 12:05:17 +02:00
Eric Blake
337283dffb qapi: Drop unused 'kind' for struct/enum visit
visit_start_struct() and visit_type_enum() had a 'kind' argument
that was usually set to either the stringized version of the
corresponding qapi type name, or to NULL (although some clients
didn't even get that right).  But nothing ever used the argument.
It's even hard to argue that it would be useful in a debugger,
as a stack backtrace also tells which type is being visited.

Therefore, drop the 'kind' argument as dead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-22-git-send-email-eblake@redhat.com>
[Harmless rebase mistake cleaned up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:57 +01:00
Eric Blake
d7bce9999d qom: Swap 'name' next to visitor in ObjectPropertyAccessor
Similar to the previous patch, it's nice to have all functions
in the tree that involve a visitor and a name for conversion to
or from QAPI to consistently stick the 'name' parameter next
to the Visitor parameter.

Done by manually changing include/qom/object.h and qom/object.c,
then running this Coccinelle script and touching up the fallout
(Coccinelle insisted on adding some trailing whitespace).

    @ rule1 @
    identifier fn;
    typedef Object, Visitor, Error;
    identifier obj, v, opaque, name, errp;
    @@
     void fn
    - (Object *obj, Visitor *v, void *opaque, const char *name,
    + (Object *obj, Visitor *v, const char *name, void *opaque,
       Error **errp) { ... }

    @@
    identifier rule1.fn;
    expression obj, v, opaque, name, errp;
    @@
     fn(obj, v,
    -   opaque, name,
    +   name, opaque,
        errp)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Eric Blake
51e72bc1dd qapi: Swap visit_* arguments for consistent 'name' placement
JSON uses "name":value, but many of our visitor interfaces were
called with visit_type_FOO(v, &value, name, errp).  This can be
a bit confusing to have to mentally swap the parameter order to
match JSON order.  It's particularly bad for visit_start_struct(),
where the 'name' parameter is smack in the middle of the
otherwise-related group of 'obj, kind, size' parameters! It's
time to do a global swap of the parameter ordering, so that the
'name' parameter is always immediately after the Visitor argument.

Additional reason in favor of the swap: the existing include/qjson.h
prefers listing 'name' first in json_prop_*(), and I have plans to
unify that file with the qapi visitors; listing 'name' first in
qapi will minimize churn to the (admittedly few) qjson.h clients.

Later patches will then fix docs, object.h, visitor-impl.h, and
those clients to match.

Done by first patching scripts/qapi*.py by hand to make generated
files do what I want, then by running the following Coccinelle
script to affect the rest of the code base:
 $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'`
I then had to apply some touchups (Coccinelle insisted on TAB
indentation in visitor.h, and botched the signature of
visit_type_enum() by rewriting 'const char *const strings[]' to
the syntactically invalid 'const char*const[] strings').  The
movement of parameters is sufficient to provoke compiler errors
if any callers were missed.

    // Part 1: Swap declaration order
    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_start_struct
    -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type bool, TV, T1;
    identifier ARG1;
    @@
     bool visit_optional
    -(TV v, T1 ARG1, const char *name)
    +(TV v, const char *name, T1 ARG1)
     { ... }

    @@
    type TV, TErr, TObj, T1;
    identifier OBJ, ARG1;
    @@
     void visit_get_next_type
    -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj, T1, T2;
    identifier OBJ, ARG1, ARG2;
    @@
     void visit_type_enum
    -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp)
     { ... }

    @@
    type TV, TErr, TObj;
    identifier OBJ;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
     void VISIT_TYPE
    -(TV v, TObj OBJ, const char *name, TErr errp)
    +(TV v, const char *name, TObj OBJ, TErr errp)
     { ... }

    // Part 2: swap caller order
    @@
    expression V, NAME, OBJ, ARG1, ARG2, ERR;
    identifier VISIT_TYPE =~ "^visit_type_";
    @@
    (
    -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR)
    +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -visit_optional(V, ARG1, NAME)
    +visit_optional(V, NAME, ARG1)
    |
    -visit_get_next_type(V, OBJ, ARG1, NAME, ERR)
    +visit_get_next_type(V, NAME, OBJ, ARG1, ERR)
    |
    -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR)
    +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR)
    |
    -VISIT_TYPE(V, OBJ, NAME, ERR)
    +VISIT_TYPE(V, NAME, OBJ, ERR)
    )

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Eric Blake
4fa45492c3 qom: Use typedef for Visitor
No need to repeat 'struct Visitor' when we already have it in
typedefs.h.  Omitting the redundant 'struct' also makes a later
patch easier to search for all object property callbacks that
are associated with a Visitor.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1454075341-13658-18-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:56 +01:00
Eric Blake
9dbb8fa7ef balloon: Improve use of qapi visitor
Rework the control flow of balloon_stats_get_all() to make it
easier for a later patch to split visit_end_struct().  Also
switch to the uint64 visitor to match the data type.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1454075341-13658-10-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08 17:29:55 +01:00
Vincenzo Maffione
1cdd2ee54a virtio: combine write of an entry into used ring
Fill in an element of the used ring with a single combined access to the
guest physical memory, rather than using two separated accesses.
This reduces the overhead due to expensive address translation.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <e4a89a767a4a92cbb6bcc551e151487eb36e1722.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Vincenzo Maffione
be1fea9bc2 virtio: read avail_idx from VQ only when necessary
The virtqueue_pop() implementation needs to check if the avail ring
contains some pending buffers. To perform this check, it is not
always necessary to fetch the avail_idx in the VQ memory, which is
expensive. This patch introduces a shadow variable tracking avail_idx
and modifies virtio_queue_empty() to access avail_idx in physical
memory only when necessary.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <b617d6459902773d9f4ab843bfaca764f5af8eda.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Vincenzo Maffione
b796fcd1bf virtio: cache used_idx in a VirtQueue field
Accessing used_idx in the VQ requires an expensive access to
guest physical memory. Before this patch, 3 accesses are normally
done for each pop/push/notify call. However, since the used_idx is
only written by us, we can track it in our internal data structure.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Message-Id: <3d062ec54e9a7bf9fb325c1fd693564951f2b319.1450218353.git.v.maffione@gmail.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
aa570d6fb6 virtio: combine the read of a descriptor
Compared to vring, virtio has a performance penalty of 10%.  Fix it
by combining all the reads for a descriptor in a single address_space_read
call.  This also simplifies the code nicely.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
5dba97ebdc vring: slim down allocation of VirtQueueElements
Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
3b3b062821 virtio: slim down allocation of VirtQueueElements
Build the addresses and s/g lists on the stack, and then copy them
to a VirtQueueElement that is just as big as required to contain this
particular s/g list.  The cost of the copy is minimal compared to that
of a large malloc.

When virtqueue_map is used on the destination side of migration or on
loadvm, the iovecs have already been split at memory region boundary,
so we can just reuse the out_num/in_num we find in the file.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
3724650db0 virtio: introduce virtqueue_alloc_element
Allocate the arrays for in_addr/out_addr/in_sg/out_sg outside the
VirtQueueElement.  For now, virtqueue_pop and vring_pop keep
allocating a very large VirtQueueElement.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
ab281c1781 virtio: introduce qemu_get/put_virtqueue_element
Move allocation to virtio functions also when loading/saving a
VirtQueueElement.  This will also let the load/save functions
keep backwards compatibility when the VirtQueueElement layout
is changed.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-06 20:44:08 +02:00
Paolo Bonzini
51b19ebe43 virtio: move allocation to virtqueue_pop/vring_pop
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0.  We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.

The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items.  Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.

The patch is pretty large, but changes to each device are testable
more or less independently.  Splitting it would mostly add churn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-06 20:39:07 +02:00
Dr. David Alan Gilbert
3e996cc583 Fix virtio migration
I misunderstood the vmstate macro definition when I reworked the
virtio .get/.put.
The VMSTATE_STRUCT_VARRAY_KNOWN, was described as being for "a
variable length array (i.e. _type *_field) but we know the
length".  However it actually specified operation for arrays embedded in
the struct (i.e. _type _field[]) since it lacked the VMS_POINTER
flag. This caused offset calculation to be completely off, examining and
potentially sending random data instead of the VirtQueue content.

Replace the otherwise unused VMSTATE_STRUCT_VARRAY_KNOWN with a
VMSTATE_STRUCT_VARRAY_POINTER_KNOWN that includes the VMS_POINTER flag
(so now actually doing what it advertises) and use it in the virtio
migration code.

Fixes and description as per Sascha's suggestions/debug.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Tested-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>

Fixes: 50e5ae4dc3
Fixes: 2cf0148674
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-02-04 19:53:02 +02:00
Peter Maydell
9b8bfe21be virtio: Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-15-git-send-email-peter.maydell@linaro.org
2016-01-29 15:07:23 +00:00
Peter Maydell
649a1bbaf9 VirtFS update:
Cleanups mostly isolating virtio related details into separate files. This
 is done to enable easy addition of Xen transport for VirtFS.
 
 The changes include:
 
 1. Rename a bunch of files and functions to make clear they are generic.
 2. disentangle virtio transport code and generic 9pfs code.
 3. Some function name clean-up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWlJdzAAoJEN5BpP4ExOI689QP/i4nPm/TFRlScnyXX6METu/E
 vDJv+9lWlJJ57len4MKDmGZYG+AbFzAY8BWtK4Ssr33aj/8GeNSq8u3rBJoE7SdJ
 BKQXAdP7mJRJOPh0WfCbeGaGa95hPzyOfZZs+IHXhulNFagraMrfWcNNdyNXsaw1
 uhKZbB4QmM29vyj0Mp7/ynMP2WJbS2sxkoJDhOjelGxS0E2JdSE7UO0h6l2WUk5B
 OK7YdsaO8tge8/45ECD/veIwOex55OeKHbZyQjgx0MK7QLhowEGNyY2r7wpjJB8Z
 xicGpY9/iY/YHqJhtKTa8vrs3tUlPl669u3QqNpXDGpwYMkNwfvyljx2tr3t1Nn5
 KSxfkYzOsf9TUnf+maMlFVJkMMQWshxR7zfr26+Fo/O+PJKsoF6Jdr63V/p3yNH8
 G8QoLhWv1sQfV15sFGUSjbTeIfhOAXPE+tYnAg3tn+PEFMoROUAxvDDMFYQzuVtZ
 IfzdhgFTQUVNzWxsa20pVSJ36+z+3TFzdEnTyRKixreZSkrvDJ62RfPgjtfbZkf+
 Of+TUvDmHiUcIexBZJeQhu/VcsLwuEAxOtomfsOxfrYVTmFWAoAmFzFn5Q6dr9Sd
 XV7/L0ApTnkFlQ9d2NgNkIKeMyL2BXQbAYSRs3bCIcvoSWctRa/Zx/L+BTp+iGwy
 3FrR2/jpdXKPeFmz+iZ7
 =ZBhb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kvaneesh/tags/for-upstream-signed' into staging

VirtFS update:

Cleanups mostly isolating virtio related details into separate files. This
is done to enable easy addition of Xen transport for VirtFS.

The changes include:

1. Rename a bunch of files and functions to make clear they are generic.
2. disentangle virtio transport code and generic 9pfs code.
3. Some function name clean-up.

# gpg: Signature made Tue 12 Jan 2016 06:04:35 GMT using RSA key ID 04C4E23A
# gpg: Good signature from "Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 4846 9DE7 1860 360F A6E9  968C DE41 A4FE 04C4 E23A

* remotes/kvaneesh/tags/for-upstream-signed: (25 commits)
  9pfs: introduce V9fsVirtioState
  9pfs: factor out v9fs_device_{,un}realize_common
  9pfs: rename virtio-9p.c to 9p.c
  9pfs: rename virtio_9p_set_fd_limit to use v9fs_ prefix
  9pfs: move handle_9p_output and make it static function
  9pfs: export pdu_{submit,alloc,free}
  9pfs: factor out virtio_9p_push_and_notify
  9pfs: break out 9p.h from virtio-9p.h
  9pfs: break out virtio_init_iov_from_pdu
  9pfs: factor out pdu_push_and_notify
  9pfs: factor out virtio_pdu_{,un}marshal
  9pfs: make pdu_{,un}marshal proper functions
  9pfs: PDU processing functions should start pdu_ prefix
  9pfs: PDU processing functions don't need to take V9fsState as argument
  fsdev: rename virtio-9p-marshal.{c,h} to 9p-iov-marshal.{c,h}
  fsdev: break out 9p-marshal.{c,h} from virtio-9p-marshal.{c,h}
  9pfs: remove dead code
  9pfs: merge hw/virtio/virtio-9p.h into hw/9pfs/virtio-9p.h
  9pfs: rename virtio-9p-xattr{,-user}.{c,h} to 9p-xattr{,-user}.{c,h}
  9pfs: rename virtio-9p-synth.{c,h} to 9p-synth.{c,h}
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-01-12 17:37:22 +00:00
Wei Liu
00588a0aa2 9pfs: introduce V9fsVirtioState
V9fsState now only contains generic fields. Introduce V9fsVirtioState
for virtio transport.  Change virtio-pci and virtio-ccw to use
V9fsVirtioState.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2016-01-12 11:04:14 +05:30
Cornelia Huck
8a1be662a6 virtio: fix error message for number of queues
There's no such thing as "PCI queues" in the virtio core.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-01-09 23:20:20 +02:00
Dr. David Alan Gilbert
50e5ae4dc3 migration/virtio: Remove simple .get/.put use
The 'virtqueue_state' and 'ringsize' can be saved using VMSTATE
macros rather than hand coded .get/.put

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
2016-01-09 23:20:20 +02:00
Wei Liu
756cb74a59 9pfs: merge hw/virtio/virtio-9p.h into hw/9pfs/virtio-9p.h
The deleted file only contained V9fsConf which wasn't virtio specific.
Merge that to the general header of 9pfs.

Fixed header inclusions as I went along.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2016-01-08 12:48:11 +05:30
Shmulik Ladkani
0560b0e97d virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
In 1811e64 'hw/virtio: Add PCIe capability to virtio devices', the
QEMU_PCI_CAP_EXPRESS capability was added to virtio's pci_dev, within
'virtio_pci_realize' - the pci device object realization method.

This occurs to late, as 'pci_qdev_realize' (DeviceClass.realize of
TYPE_PCI_DEVICE) has already been called, without knowing that the
device instance is indeed an "express" instance, thus allocating
insufficient pci config space.

As a result, device may crash upon attempt to write to the PCIE config
space.

Fix, by arming the QEMU_PCI_CAP_EXPRESS capability early in virtio-pci's
own DeviceClass realize method.

This also makes code cleaner, as 'virtio_pci_realize' may now access the
'pci_is_express' predicate when needed.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
2015-12-02 21:51:33 +02:00
Cornelia Huck
11380b3619 virtio: handle non-virtio-1-capable backend for ccw
If you run a qemu advertising VERSION_1 with an old kernel where
vhost did not yet support VERSION_1, you'll end up with a device
that is {modern pci|ccw revision 1} but does not advertise VERSION_1.
This is not a sensible configuration and is rejected by the Linux
guest drivers.

To fix this, add a ->post_plugged() callback invoked after features
have been queried that can handle the VERSION_1 bit being withdrawn
and change ccw to fall back to revision 0 if VERSION_1 is gone.

Note that pci is _not_ fixed; we'll need to rethink the approach
for the next release but at least for pci it's not a regression.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 19:34:11 +02:00
Michael S. Tsirkin
449e357810 Revert "vhost: send SET_VRING_ENABLE at start/stop"
This reverts commit 3a12f32229.

In case of live migration several queues can be enabled and not only the
first one. So informing backend that only the first queue is enabled is
wrong.

Reported-by: Thibaut Collet <thibaut.collet@6wind.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-26 12:02:11 +02:00
Michael S. Tsirkin
48854f57ce vhost-user: fix log size
commit 2b8819c6ee
("vhost-user: modify SET_LOG_BASE to pass mmap size and offset")
passes log size in units of 4 byte chunks instead of the
expected size in bytes.

Fix this up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-18 18:49:27 +02:00
Michael S. Tsirkin
dc3db6adde vhost-user: start/stop all rings
We are currently only sending VRING_ENABLE message for the first ring,
that's wrong: we must start/stop them all.

Reported-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 18:48:31 +02:00
Michael S. Tsirkin
5421f318ec vhost-user: print original request on error
When we get an unexpected response, print out
the original request.
Helps debug protocol errors tremendously.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 14:35:16 +02:00
Yuanhan Liu
923e2d98ed vhost: let SET_VRING_ENABLE message depends on protocol feature
But not depend on PROTOCOL_F_MQ feature bit. So that we could use
SET_VRING_ENABLE to sign the backend on stop, even if MQ is disabled.

That's reasonable, since we will have one queue pair at least.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 12:02:54 +02:00
Marcel Apfelbaum
1811e64c35 hw/virtio: Add PCIe capability to virtio devices
The virtio devices are converted to PCI-Express
if they are plugged into a PCI-Express bus and
the 'modern' protocol is enabled.

Devices plugged directly into the Root Complex as
Integrated Endpoints remain PCI.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 16:23:16 +02:00
Yuanhan Liu
3a12f32229 vhost: send SET_VRING_ENABLE at start/stop
Send SET_VRING_ENABLE at start/stop, to give the backend
an explicit sign of our state.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Yuanhan Liu
60915dc469 vhost: rename RESET_DEVICE backto RESET_OWNER
This patch basically reverts commit d1f8b30e.

It turned out that it breaks stuff, so revert it:
    http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

CC: "Michael S. Tsirkin" <mst@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Victor Kaplansky
2b8819c6ee vhost-user: modify SET_LOG_BASE to pass mmap size and offset
Unlike the kernel, vhost-user application accesses log table by
mmaping it to its user space. This change adds two new fields to
VhostUserMsg payload: mmap_size, and mmap_offset and make QEMU to
pass the to vhost-user application in VHOST_USER_SET_LOG_BASE
request.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Jason Wang
393f04d3ab virtio-pci: unbreak queue_enable read
Guest always get zero when reading queue_enable. This violates
spec. Fixing this by setting the queue_enable to true during any guest
writing and setting it to zero during reset.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Jason Wang
9824d2a39d virtio-pci: introduce pio notification capability for modern device
We used to use mmio for notification. This could be slow on some arch
(e.g on x86 without EPT). So this patch introduces pio bar and a pio
notification cap for modern device. This ability is enabled through
property "modern-pio-notify" for virtio pci devices and was disabled
by default. Management can enable when it thinks it was needed.

Benchmarks shows almost no obvious difference compared to legacy
device on machines without ept. Thanks Wenli Quan <wquan@redhat.com>
for the benchmarking.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
bc85ccfdf5 virtio-pci: use zero length mmio eventfd for 1.0 notification cap when possible
We use data match eventfd for 1.0 notification currently. This could
be slow since software decoding is needed for mmio exit. To speed this
up, we can switch to use zero length mmio eventfd for 1.0 notification
since we can examine the queue index directly from the writing
address. KVM kernel module can utilize this by registering it to fast
mmio bus which could be as fast as pio on ept capable machine when
fast mmio is supported by host kernel.

Lots of improvements were seen on a ept capable machine:

Guest RX:(TCP)
size/session/+throughput%/+cpu%/-+per cpu%/
64/1/+1.6807%/[-16.2421%]/[+21.3984%]/
64/2/+0.6091%/[-11.0187%]/[+13.0678%]/
64/4/+0.0553%/[-5.9768%]/[+6.4155%]/
64/8/+0.1206%/[-4.0057%]/[+4.2984%]/
256/1/-0.0031%/[-10.1166%]/[+11.2517%]/
256/2/-0.5058%/[-6.1656%]/+6.0317%]/
...

Guest TX:(TCP)
size/session/+throughput%/+cpu%/-+per cpu%/
64/1/[+18.9183%]/-0.2823%/[+19.2550%]/
64/2/[+13.5714%]/[+2.2675%]/[+11.0533%]/
64/4/[+13.1070%]/[+2.1817%]/[+10.6920%]/
64/8/[+13.0426%]/[+2.0887%]/[+10.7299%]/
256/1/[+36.2761%]/+6.3434%/[+28.1471%]/
...
1024/1/[+44.8873%]/+2.0811%/[+41.9335%]/
...
1024/4/+0.0228%/[-2.2044%]/[+2.2774%]/
...
16384/2/+0.0127%/[-5.0346%]/[+5.3148%]/
...
65535/1/[+0.0062%]/[-4.1183%]/[+4.3017%]/
65535/2/+0.0004%/[-4.2311%]/[+4.4185%]/
65535/4/+0.0107%/[-4.6106%]/[+4.8446%]/
65535/8/-0.0090%/[-5.5178%]/[+5.8306%]/

Latency:(TCP_RR)
size/session/+transaction rate%/+cpu%/-+per cpu%/
64/1/[+6.5248%]/[-9.2882%]/[+17.4322%]/
64/25/[+11.0854%]/[+0.8000%]/[+10.2038%]/
64/50/[+12.1076%]/[+2.4627%]/[+9.4131%]/
256/1/[+5.3677%]/[+10.5669%]/-4.7024%/
256/25/[+5.6402%]/-0.8962%/[+6.5955%]/
256/50/[+5.9685%]/[+1.7766%]/[+4.1188%]/
4096/1/+0.2508%/[-10.4941%]/[+12.0047%]/
4096/25/[+1.8533%]/-0.0273%/+1.8812%/
4096/50/[+1.2156%]/-1.4134%/+2.6667%/

Notes: data with '[]' is the one whose significance is greater than 95%.

Thanks Wenli Quan <wquan@redhat.com> for the benchmarking.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
a6df8adf3e virtio-pci: fix 1.0 virtqueue migration
We don't migrate the followings fields for virtio-pci:

uint32_t dfselect;
uint32_t gfselect;
uint32_t guest_features[2];
struct {
    uint16_t num;
    bool enabled;
    uint32_t desc[2];
    uint32_t avail[2];
    uint32_t used[2];
} vqs[VIRTIO_QUEUE_MAX];

This will confuse driver if migrating during initialization. Solves
this issue by:

- introduce transport specific callbacks to load and store extra
  virtqueue states.
- add a new subsection for virtio to migrate transport specific modern
  device state.
- implement pci specific callbacks.
- add a new property for virtio-pci for whether or not to migrate
  extra state.
- compat the migration for 2.4 and elder machine types

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-12 15:49:32 +02:00
Dr. David Alan Gilbert
371ff5a3f0 Inhibit ballooning during postcopy
Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.

To avoid this confusion, disable ballooning during postcopy.

When disabled we drop balloon requests from the guest.  Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.
Guest initiated ballooning will not know if it's really freed a page
of host memory or not.

Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00