2014-06-10 14:02:57 +04:00
Vhost-user Protocol
===================

Copyright (c) 2014 Virtual Open Systems Sarl.

This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
===================
|
|
|
|
|
|
|
|
This protocol aims to complement the ioctl interface used to control the
vhost implementation in the Linux kernel. It implements the control plane needed
to establish virtqueue sharing with a user space process on the same host. It
uses communication over a Unix domain socket to share file descriptors in the
ancillary data of the message.

The protocol defines two sides of the communication, master and slave. Master is
the application that shares its virtqueues, in our case QEMU. Slave is the
consumer of the virtqueues.

In the current implementation QEMU is the Master, and the Slave is intended to
be a software Ethernet switch running in user space, such as Snabbswitch.

Master and slave can be either a client (i.e. connecting) or server (listening)
in the socket communication.
|
|
|
|
|
|
|
|
Message Specification
---------------------

Note that all numbers are in the machine native byte order. A vhost-user message
consists of 3 header fields and a payload:

------------------------------------
| request | flags | size | payload |
------------------------------------

 * Request: 32-bit type of the request
 * Flags: 32-bit bit field:
   - Lower 2 bits are the version (currently 0x01)
   - Bit 2 is the reply flag; it must be set on each reply from the slave
 * Size: 32-bit size of the payload
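The header layout and the flag bits described above can be sketched in C as
follows. This is only an illustration: the macro, struct and function names
(VU_VERSION, vu_header, vu_reply_flags and so on) are made up for this example,
and only the field widths and bit positions are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names are illustrative; only the bit layout comes from the text. */
#define VU_VERSION       0x1u
#define VU_VERSION_MASK  0x3u         /* lower 2 bits of flags */
#define VU_REPLY_MASK    (0x1u << 2)  /* bit 2 of flags */

struct vu_header {
    uint32_t request;  /* type of the request */
    uint32_t flags;    /* version and reply flag */
    uint32_t size;     /* size of the payload that follows */
};

/* Three 32-bit fields pack without padding, so the header is 12 bytes. */
static_assert(sizeof(struct vu_header) == 12, "unexpected header padding");

/* Extract the protocol version from the flags field. */
static inline uint32_t vu_version(uint32_t flags)
{
    return flags & VU_VERSION_MASK;
}

/* True if the reply flag is set. */
static inline bool vu_is_reply(uint32_t flags)
{
    return (flags & VU_REPLY_MASK) != 0;
}

/* Flags a slave would place in a reply: current version plus reply bit. */
static inline uint32_t vu_reply_flags(void)
{
    return VU_VERSION | VU_REPLY_MASK;
}
```

A master could use vu_is_reply() and vu_version() to sanity-check an incoming
reply before looking at its payload.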
|
|
|
|
|
|
|
|
|
|
|
|
Depending on the request type, payload can be:

 * A single 64-bit integer
   -------
   | u64 |
   -------

   u64: a 64-bit unsigned integer

 * A vring state description
   ---------------
   | index | num |
   ---------------

   Index: a 32-bit index
   Num: a 32-bit number
|
|
|
|
|
|
|
|
 * A vring address description
   -------------------------------------------------------
   | index | flags | descriptor | used | available | log |
   -------------------------------------------------------

   Index: a 32-bit vring index
   Flags: 32-bit vring flags
   Descriptor: a 64-bit user address of the vring descriptor table
   Used: a 64-bit user address of the vring used ring
   Available: a 64-bit user address of the vring available ring
   Log: a 64-bit guest address for logging
|
|
|
|
|
|
|
|
 * Memory regions description
   ---------------------------------------------------
   | num regions | padding | region0 | ... | region7 |
   ---------------------------------------------------

   Num regions: a 32-bit number of regions
   Padding: 32-bit

   A region is:
   -----------------------------------------------------
   | guest address | size | user address | mmap offset |
   -----------------------------------------------------

   Guest address: a 64-bit guest address of the region
   Size: a 64-bit size
   User address: a 64-bit user address
   mmap offset: 64-bit offset where region starts in the mapped memory
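As an illustration of how a slave might use a region table, the sketch below
translates a master user address into a pointer inside the slave's own
mapping. It rests on assumptions not stated in the text: that the slave has
mapped each region's file descriptor from offset 0 and recorded the result in
a hypothetical mmap_base field, and that the region's data starts at
mmap_base + mmap_offset. The names mem_region and user_to_local are invented
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    uint64_t guest_addr;   /* guest address of the region */
    uint64_t size;         /* size of the region */
    uint64_t user_addr;    /* address in the master's address space */
    uint64_t mmap_offset;  /* where the region starts in the mapping */
    uint8_t *mmap_base;    /* hypothetical: where the slave mapped the fd */
};

/*
 * Translate a master user address into a local pointer.
 * Returns NULL when no region covers the address.
 */
static inline void *user_to_local(const struct mem_region *regions,
                                  size_t count, uint64_t uaddr)
{
    assert(regions != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const struct mem_region *r = &regions[i];

        /* Written as a subtraction so the range check cannot overflow. */
        if (uaddr >= r->user_addr && uaddr - r->user_addr < r->size) {
            return r->mmap_base + r->mmap_offset + (uaddr - r->user_addr);
        }
    }
    return NULL;
}
```

The same loop keyed on guest_addr instead of user_addr would translate guest
addresses, for example the log address of a vring.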
|
In QEMU the vhost-user message is implemented with the following struct:

typedef struct VhostUserMsg {
    VhostUserRequest request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        struct vhost_vring_state state;
        struct vhost_vring_addr addr;
        VhostUserMemory memory;
    };
} QEMU_PACKED VhostUserMsg;
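A receiver can read such a message in two steps: first the fixed header, then
as many payload bytes as the size field announces. The sketch below shows that
idea with plain read() calls. It is a simplification under stated assumptions:
the helper names (vu_hdr, read_full, vu_read_msg) are invented, the size field
is taken to count bytes, EINTR is treated as a failure, and messages that
carry file descriptors would need recvmsg() instead of read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative header type: three 32-bit fields, as in the text. */
struct vu_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
};

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Read one message: the header first, then hdr->size bytes of payload.
 * Fails if the announced payload does not fit in the caller's buffer.
 */
static int vu_read_msg(int fd, struct vu_hdr *hdr, void *payload, size_t cap)
{
    assert(hdr != NULL && payload != NULL);

    if (read_full(fd, hdr, sizeof(*hdr)) < 0) {
        return -1;
    }
    if (hdr->size > cap) {
        return -1;
    }
    return read_full(fd, payload, hdr->size);
}
```

Rejecting an oversized size field before reading the payload keeps a
misbehaving peer from overrunning the receive buffer.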
|
|
|
|
|
|
|
|
Communication
-------------

The protocol for vhost-user is based on the existing implementation of vhost
for the Linux Kernel. Most messages that can be sent via the Unix domain socket
implementing vhost-user have an equivalent ioctl to the kernel implementation.

The communication consists of master sending message requests and slave sending
message replies. Most of the requests don't require replies. Here is a list of
the ones that do:

 * VHOST_USER_GET_FEATURES
 * VHOST_USER_GET_PROTOCOL_FEATURES
 * VHOST_USER_GET_VRING_BASE
 * VHOST_USER_GET_QUEUE_NUM

There are several messages that the master sends with file descriptors passed
in the ancillary data:

 * VHOST_USER_SET_MEM_TABLE
 * VHOST_USER_SET_LOG_FD
 * VHOST_USER_SET_VRING_KICK
 * VHOST_USER_SET_VRING_CALL
 * VHOST_USER_SET_VRING_ERR
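Passing a file descriptor in the ancillary data of a Unix domain socket
message is done with sendmsg() and an SCM_RIGHTS control message. The sketch
below shows one way to send and receive a buffer together with a single
descriptor. The function names send_with_fd and recv_with_fd are invented for
this example, and a real implementation would have to handle several
descriptors (for instance one per memory region) and partial transfers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send buf plus one fd as SCM_RIGHTS data. Returns bytes sent or -1. */
static ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    union fd_ctl ctl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Receive into buf; *fd becomes the passed descriptor, or -1 if none. */
static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
{
    union fd_ctl ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    *fd = -1;

    n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return -1;
    }
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            memcpy(fd, CMSG_DATA(c), sizeof(int));
            break;
        }
    }
    return n;
}
```

The descriptor number seen by the receiver generally differs from the one the
sender used; only the underlying open file is shared.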
|
|
|
|
|
|
|
|
If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.

Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.
As older slaves don't support negotiating protocol features,
a feature bit was dedicated for this purpose:
#define VHOST_USER_F_PROTOCOL_FEATURES 30
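In practice this means a master checks bit 30 of the feature mask returned by
the slave before it sends any protocol feature message. A minimal sketch of
that check, with a helper name (vu_has_protocol_features) chosen for this
example only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES 30

/* The bit number must fit in the 64-bit feature mask. */
static_assert(VHOST_USER_F_PROTOCOL_FEATURES < 64, "bit out of range");

/*
 * True if the feature mask replied to VHOST_USER_GET_FEATURES allows the
 * master to send VHOST_USER_GET_PROTOCOL_FEATURES and
 * VHOST_USER_SET_PROTOCOL_FEATURES.
 */
static inline bool vu_has_protocol_features(uint64_t features)
{
    return (features >> VHOST_USER_F_PROTOCOL_FEATURES) & 1;
}
```

When the helper returns false the master would skip protocol feature
negotiation entirely, which is how older slaves keep working.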
|
|
|
|
|
vhost-user: add multiple queue support
This patch is initially based a patch from Nikolay Nikolaev.
This patch adds vhost-user multiple queue support, by creating a nc
and vhost_net pair for each queue.
Qemu exits if find that the backend can't support the number of requested
queues (by providing queues=# option). The max number is queried by a
new message, VHOST_USER_GET_QUEUE_NUM, and is sent only when protocol
feature VHOST_USER_PROTOCOL_F_MQ is present first.
The max queue check is done at vhost-user initiation stage. We initiate
one queue first, which, in the meantime, also gets the max_queues the
backend supports.
In older version, it was reported that some messages are sent more times
than necessary. Here we came an agreement with Michael that we could
categorize vhost user messages to 2 types: non-vring specific messages,
which should be sent only once, and vring specific messages, which should
be sent per queue.
Here I introduced a helper function vhost_user_one_time_request(), which
lists following messages as non-vring specific messages:
VHOST_USER_SET_OWNER
VHOST_USER_RESET_DEVICE
VHOST_USER_SET_MEM_TABLE
VHOST_USER_GET_QUEUE_NUM
For above messages, we simply ignore them when they are not sent the first
time.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
2015-09-23 07:20:00 +03:00
|
|
|
Multiple queue support
----------------------

Multiple queue is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set:
#define VHOST_USER_PROTOCOL_F_MQ 0

The max number of queues the slave supports can be queried with message
VHOST_USER_GET_QUEUE_NUM. Master should stop when the number of
requested queues is bigger than that.
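The queue limit check can be sketched as a small predicate over the protocol
feature mask and the maximum the slave reports in its reply to
VHOST_USER_GET_QUEUE_NUM. The function name vu_queues_ok is hypothetical, and
treating a slave without the MQ bit as limited to a single queue is an
assumption of this example rather than something the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_MQ 0

/*
 * Decide whether the master may proceed with `requested` queues.
 *   protocol_features: reply to VHOST_USER_GET_PROTOCOL_FEATURES
 *   max_queues:        reply to VHOST_USER_GET_QUEUE_NUM (unused without MQ)
 * Assumption: without the MQ bit only a single queue is available.
 */
static inline bool vu_queues_ok(uint64_t protocol_features,
                                uint64_t max_queues, uint64_t requested)
{
    bool mq = (protocol_features >> VHOST_USER_PROTOCOL_F_MQ) & 1;

    assert(requested > 0);
    if (!mq) {
        return requested == 1;
    }
    return requested <= max_queues;
}
```

A master would call this once during initialisation and stop if it returns
false.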
|
|
|
|
|
|
|
|
As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue. One queue pair
is enabled initially. More queues are enabled dynamically, by sending
message VHOST_USER_SET_VRING_ENABLE.
|
|
|
|
Message types
-------------

 * VHOST_USER_GET_FEATURES

      Id: 1
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get the features bitmask from the underlying vhost implementation.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
|
|
|
|
|
|
|
 * VHOST_USER_SET_FEATURES

      Id: 2
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable features in the underlying vhost implementation using a bitmask.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
|
|
|
|
|
|
|
|
 * VHOST_USER_GET_PROTOCOL_FEATURES

      Id: 15
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get the protocol feature bitmask from the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_PROTOCOL_FEATURES

      Id: 16
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable protocol features in the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.
|
|
|
|
|
|
|
 * VHOST_USER_SET_OWNER

      Id: 3
      Equivalent ioctl: VHOST_SET_OWNER
      Master payload: N/A

      Issued when a new connection is established. It sets the current Master
      as an owner of the session. This can be used on the Slave as a
      "session start" flag.
|
|
|
 * VHOST_USER_RESET_DEVICE

      Id: 4
      Equivalent ioctl: VHOST_RESET_DEVICE
      Master payload: N/A

      Issued when a connection is about to be closed. The Master will no
      longer own this connection (and will usually close it).
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_MEM_TABLE

      Id: 5
      Equivalent ioctl: VHOST_SET_MEM_TABLE
      Master payload: memory regions description

      Sets the memory map regions on the slave so it can translate the vring
      addresses. In the ancillary data there is an array of file descriptors
      for each memory mapped region. The size and ordering of the fds matches
      the number and ordering of memory regions.

 * VHOST_USER_SET_LOG_BASE

      Id: 6
      Equivalent ioctl: VHOST_SET_LOG_BASE
      Master payload: u64

      Sets the logging base address.

 * VHOST_USER_SET_LOG_FD

      Id: 7
      Equivalent ioctl: VHOST_SET_LOG_FD
      Master payload: N/A

      Sets the logging file descriptor, which is passed as ancillary data.
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_NUM

      Id: 8
      Equivalent ioctl: VHOST_SET_VRING_NUM
      Master payload: vring state description

      Sets the size of the vring (its number of descriptors).
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_ADDR

      Id: 9
      Equivalent ioctl: VHOST_SET_VRING_ADDR
      Master payload: vring address description
      Slave payload: N/A

      Sets the addresses of the different aspects of the vring.

 * VHOST_USER_SET_VRING_BASE

      Id: 10
      Equivalent ioctl: VHOST_SET_VRING_BASE
      Master payload: vring state description

      Sets the base offset in the available vring.
|
|
|
|
|
|
|
|
 * VHOST_USER_GET_VRING_BASE

      Id: 11
      Equivalent ioctl: VHOST_GET_VRING_BASE
      Master payload: vring state description
      Slave payload: vring state description

      Get the available vring base offset.
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_KICK

      Id: 12
      Equivalent ioctl: VHOST_SET_VRING_KICK
      Master payload: u64

      Set the event file descriptor for adding buffers to the vring. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling should be used
      instead of waiting for a kick.
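The u64 payload shared by the kick, call and error messages can be built and
decoded with two masks, as in the sketch below. The macro and function names
are chosen for this example; only the bit positions (index in bits 0-7,
invalid FD flag in bit 8) come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names for the two fields of the u64 payload. */
#define VU_VRING_IDX_MASK   0xffu         /* bits 0-7: vring index */
#define VU_VRING_NOFD_MASK  (0x1u << 8)   /* bit 8: invalid FD flag */

static_assert((VU_VRING_IDX_MASK & VU_VRING_NOFD_MASK) == 0,
              "index and flag bits must not overlap");

/* Build the payload for a vring; set the flag when no fd is attached. */
static inline uint64_t vu_vring_fd_payload(uint8_t index, bool have_fd)
{
    uint64_t payload = index;

    if (!have_fd) {
        payload |= VU_VRING_NOFD_MASK;
    }
    return payload;
}

/* Vring index carried in the payload. */
static inline unsigned vu_payload_index(uint64_t payload)
{
    return (unsigned)(payload & VU_VRING_IDX_MASK);
}

/* True if a file descriptor is expected in the ancillary data. */
static inline bool vu_payload_has_fd(uint64_t payload)
{
    return (payload & VU_VRING_NOFD_MASK) == 0;
}
```

A slave that sees vu_payload_has_fd() return false would not look for a
descriptor in the ancillary data and would fall back to polling that vring.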
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_CALL

      Id: 13
      Equivalent ioctl: VHOST_SET_VRING_CALL
      Master payload: u64

      Set the event file descriptor to signal when buffers are used. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling will be used
      instead of waiting for the call.
|
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_ERR

      Id: 14
      Equivalent ioctl: VHOST_SET_VRING_ERR
      Master payload: u64

      Set the event file descriptor to signal when an error occurs. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data.
|
|
|
|
|
|
|
 * VHOST_USER_GET_QUEUE_NUM

      Id: 17
      Equivalent ioctl: N/A
      Master payload: N/A
      Slave payload: u64

      Query how many queues the backend supports. This request should be
      sent only when VHOST_USER_PROTOCOL_F_MQ is set in the protocol
      features queried by VHOST_USER_GET_PROTOCOL_FEATURES.
|
|
|
|
|
|
|
 * VHOST_USER_SET_VRING_ENABLE

      Id: 18
      Equivalent ioctl: N/A
      Master payload: vring state description

      Signal the slave to enable or disable the corresponding vring.
|