5ad204bf2a
This patch introduces two new messages, VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared buffer between QEMU and the backend. First, QEMU uses VHOST_USER_GET_INFLIGHT_FD to get the shared buffer from the backend. QEMU should then send it back through VHOST_USER_SET_INFLIGHT_FD each time vhost-user is started. The backend uses this shared buffer to track inflight I/O. QEMU should retrieve a new one when the VM is reset. Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Chai Wen <chaiwen@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20190228085355.9614-2-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Vhost-user Protocol
===================

Copyright (c) 2014 Virtual Open Systems Sarl.

This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
===================

This protocol is aiming to complement the ioctl interface used to control the
vhost implementation in the Linux kernel. It implements the control plane needed
to establish virtqueue sharing with a user space process on the same host. It
uses communication over a Unix domain socket to share file descriptors in the
ancillary data of the message.

The protocol defines 2 sides of the communication, master and slave. Master is
the application that shares its virtqueues, in our case QEMU. Slave is the
consumer of the virtqueues.

In the current implementation QEMU is the Master, and the Slave is the
external process consuming the virtio queues, for example a software
Ethernet switch running in user space, such as Snabbswitch, or a block
device backend processing read & write to a virtual disk. In order to
facilitate interoperability between various backend implementations,
it is recommended to follow the "Backend program conventions"
described in this document.

Master and slave can be either a client (i.e. connecting) or server (listening)
in the socket communication.

Message Specification
---------------------

Note that all numbers are in the machine native byte order. A vhost-user message
consists of 3 header fields and a payload:

------------------------------------
| request | flags | size | payload |
------------------------------------

 * Request: 32-bit type of the request
 * Flags: 32-bit bit field:
   - Lower 2 bits are the version (currently 0x01)
   - Bit 2 is the reply flag - needs to be sent on each reply from the slave
   - Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
     details.
 * Size: 32-bit size of the payload

Depending on the request type, payload can be:

 * A single 64-bit integer
   -------
   | u64 |
   -------

   u64: a 64-bit unsigned integer

 * A vring state description
   ---------------
   | index | num |
   ---------------

   Index: a 32-bit index
   Num: a 32-bit number

 * A vring address description
   -------------------------------------------------------
   | index | flags | descriptor | used | available | log |
   -------------------------------------------------------

   Index: a 32-bit vring index
   Flags: a 32-bit vring flags
   Descriptor: a 64-bit ring address of the vring descriptor table
   Used: a 64-bit ring address of the vring used ring
   Available: a 64-bit ring address of the vring available ring
   Log: a 64-bit guest address for logging

   Note that a ring address is an IOVA if VIRTIO_F_IOMMU_PLATFORM has been
   negotiated. Otherwise it is a user address.

 * Memory regions description
   ---------------------------------------------------
   | num regions | padding | region0 | ... | region7 |
   ---------------------------------------------------

   Num regions: a 32-bit number of regions
   Padding: 32-bit

   A region is:
   -----------------------------------------------------
   | guest address | size | user address | mmap offset |
   -----------------------------------------------------

   Guest address: a 64-bit guest address of the region
   Size: a 64-bit size
   User address: a 64-bit user address
   mmap offset: a 64-bit offset where region starts in the mapped memory

 * Log description
   -------------------------
   | log size | log offset |
   -------------------------

   log size: size of area used for logging
   log offset: offset from start of supplied file descriptor
       where logging starts (i.e. where guest address 0 would be logged)

 * An IOTLB message
   ---------------------------------------------------------
   | iova | size | user address | permissions flags | type |
   ---------------------------------------------------------

   IOVA: a 64-bit I/O virtual address programmed by the guest
   Size: a 64-bit size
   User address: a 64-bit user address
   Permissions: an 8-bit value:
    - 0: No access
    - 1: Read access
    - 2: Write access
    - 3: Read/Write access
   Type: an 8-bit IOTLB message type:
    - 1: IOTLB miss
    - 2: IOTLB update
    - 3: IOTLB invalidate
    - 4: IOTLB access fail

 * Virtio device config space
   -----------------------------------
   | offset | size | flags | payload |
   -----------------------------------

   Offset: a 32-bit offset of virtio device's configuration space
   Size: a 32-bit configuration space access size in bytes
   Flags: a 32-bit value:
    - 0: Vhost master messages used for writeable fields
    - 1: Vhost master messages used for live migration
   Payload: Size bytes array holding the contents of the virtio
       device's configuration space

 * Vring area description
   -----------------------
   | u64 | size | offset |
   -----------------------

   u64: a 64-bit integer containing the vring index and flags
   Size: a 64-bit size of this area
   Offset: a 64-bit offset of this area from the start of the
       supplied file descriptor

 * Inflight description
   -----------------------------------------------------
   | mmap size | mmap offset | num queues | queue size |
   -----------------------------------------------------

   mmap size: a 64-bit size of area to track inflight I/O
   mmap offset: a 64-bit offset of this area from the start
       of the supplied file descriptor
   num queues: a 16-bit number of virtqueues
   queue size: a 16-bit size of virtqueues

In QEMU the vhost-user message is implemented with the following struct:

typedef struct VhostUserMsg {
    VhostUserRequest request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        struct vhost_vring_state state;
        struct vhost_vring_addr addr;
        VhostUserMemory memory;
        VhostUserLog log;
        struct vhost_iotlb_msg iotlb;
        VhostUserConfig config;
        VhostUserVringArea area;
        VhostUserInflight inflight;
    };
} QEMU_PACKED VhostUserMsg;

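As a minimal sketch of how the flags field described above might be handled, the helpers below extract the version, the reply flag and the need_reply flag, and build the header of a slave reply. The `VU_*` constant names, the `VuHeader` type and the helper names are hypothetical and chosen for this example only; the sketch also assumes that a reply carries the same request type as the message it answers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; the values follow the flags layout described above. */
#define VU_VERSION_MASK     0x3u        /* lower 2 bits: protocol version */
#define VU_VERSION          0x1u        /* current version */
#define VU_REPLY_FLAG       (1u << 2)   /* set on each reply from the slave */
#define VU_NEED_REPLY_FLAG  (1u << 3)   /* sender asks for an acknowledgement */

/* The three header fields that precede every payload (hypothetical type). */
typedef struct VuHeader {
    uint32_t request;
    uint32_t flags;
    uint32_t size;      /* payload size in bytes */
} VuHeader;

static inline uint32_t vu_flags_version(uint32_t flags)
{
    return flags & VU_VERSION_MASK;
}

static inline bool vu_flags_is_reply(uint32_t flags)
{
    return (flags & VU_REPLY_FLAG) != 0;
}

static inline bool vu_flags_need_reply(uint32_t flags)
{
    return (flags & VU_NEED_REPLY_FLAG) != 0;
}

/* Build the header of a slave reply to `req` carrying `payload_size` bytes. */
static inline VuHeader vu_make_reply(const VuHeader *req, uint32_t payload_size)
{
    assert(req != NULL);
    VuHeader rep = {
        .request = req->request,              /* assumed: reply echoes the type */
        .flags = VU_VERSION | VU_REPLY_FLAG,  /* need_reply is not propagated */
        .size = payload_size,
    };
    return rep;
}
```

A receiver would typically check vu_flags_version() before looking at the payload, and use the size field to know how many payload bytes follow the header.
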
Communication
-------------

The protocol for vhost-user is based on the existing implementation of vhost
for the Linux Kernel. Most messages that can be sent via the Unix domain socket
implementing vhost-user have an equivalent ioctl to the kernel implementation.

The communication consists of master sending message requests and slave sending
message replies. Most of the requests don't require replies. Here is a list of
the ones that do:

 * VHOST_USER_GET_FEATURES
 * VHOST_USER_GET_PROTOCOL_FEATURES
 * VHOST_USER_GET_VRING_BASE
 * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 * VHOST_USER_GET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)

[ Also see the section on REPLY_ACK protocol extension. ]

There are several messages that the master sends with file descriptors passed
in the ancillary data:

 * VHOST_USER_SET_MEM_TABLE
 * VHOST_USER_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
 * VHOST_USER_SET_LOG_FD
 * VHOST_USER_SET_VRING_KICK
 * VHOST_USER_SET_VRING_CALL
 * VHOST_USER_SET_VRING_ERR
 * VHOST_USER_SET_SLAVE_REQ_FD
 * VHOST_USER_SET_INFLIGHT_FD (if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)

If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.

Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.
As older slaves don't support negotiating protocol features,
a feature bit was dedicated for this purpose:
#define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the
backend when the ring is enabled.

If ring is started but disabled, client must process the
ring without talking to the backend.

For example, for a networking device, in the disabled state
client must not supply any new RX packets, but must process
and discard any TX packets.

If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized
in an enabled state.

If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized
in a disabled state. Client must not pass data to/from the backend until ring is enabled by
VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled by
VHOST_USER_SET_VRING_ENABLE with parameter 0.

Each ring is initialized in a stopped state; client must not process it until
ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that file
descriptor is readable) on the descriptor specified by
VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
VHOST_USER_GET_VRING_BASE.

While processing the rings (whether they are enabled or not), client must
support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Multiple queue is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.

The max number of queue pairs the slave supports can be queried with message
VHOST_USER_GET_QUEUE_NUM. Master should stop when the number of
requested queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue. One queue pair
is enabled initially. More queues are enabled dynamically, by sending
message VHOST_USER_SET_VRING_ENABLE.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the VHOST_F_LOG_ALL vhost feature.

To start/stop logging of data/used ring writes, server may send messages
VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with
VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
VHOST_VRING_F_LOG is part of ring's flags.

Dirty pages are of size:
#define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
VHOST_USER_SET_LOG_BASE message when the slave has
VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.

The size of the log is supplied as part of VhostUserMsg
which should be large enough to cover all known guest
addresses. Log starts at the supplied offset in the
supplied file descriptor.
The log covers from address 0 to the maximum of guest
regions. In pseudo-code, to mark page at "addr" as dirty:

page = addr / VHOST_LOG_PAGE
log[page / 8] |= 1 << (page % 8)

Where addr is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

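The dirty-page marking above can be sketched in C11 with an atomic OR, so that a concurrent reader of the log never observes a lost update. This is only an illustration: the function names are hypothetical, and the sketch assumes that the slave has already mapped the log and treats it as an array of atomic bytes starting at the supplied offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_LOG_PAGE 0x1000

/*
 * Mark the page containing guest physical address `addr` as dirty.
 * `log` points at the start of the log area and `log_size` is its size
 * in bytes. Returns 0 on success, -1 if the page is not covered by the log.
 */
static int vu_log_page(_Atomic uint8_t *log, uint64_t log_size, uint64_t addr)
{
    assert(log != NULL);
    uint64_t page = addr / VHOST_LOG_PAGE;

    if (page / 8 >= log_size) {
        return -1;
    }
    /* Atomic OR: the log may be concurrently manipulated by the master. */
    atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)));
    return 0;
}

/* Mark every page touched by a write of `len` bytes starting at `addr`. */
static int vu_log_write(_Atomic uint8_t *log, uint64_t log_size,
                        uint64_t addr, uint64_t len)
{
    if (len == 0) {
        return 0;
    }
    uint64_t first = addr / VHOST_LOG_PAGE;
    uint64_t last = (addr + len - 1) / VHOST_LOG_PAGE;

    for (uint64_t page = first; page <= last; page++) {
        if (vu_log_page(log, log_size, page * VHOST_LOG_PAGE) != 0) {
            return -1;
        }
    }
    return 0;
}
```

A write that straddles a page boundary dirties both pages, which is why vu_log_write() iterates from the first to the last page touched rather than marking only the page of the start address.
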
Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG
is set for this ring), log_guest_addr should be used to calculate the log
offset: the write to first byte of the used ring is logged at this offset from
log start. Also note that this value might be outside the legal guest physical
address range (i.e. does not have to be covered by the VhostUserMemory table),
but the bit offset of the last byte of the ring must fall within
the size supplied by VhostUserLog.

VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by
the source. No further update must be done before rings are
restarted.

In postcopy migration the slave is started before all the memory has been
received from the source host, and care must be taken to avoid accessing pages
that have yet to be received. The slave opens a 'userfault'-fd and registers
the memory with it; this fd is then passed back over to the master.
The master services requests on the userfaultfd for pages that are accessed
and when the page is available it performs WAKE ioctl's on the userfaultfd
to wake the stalled slave. The client indicates support for this via the
VHOST_USER_PROTOCOL_F_PAGEFAULT feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
VHOST_USER_SET_MEM_TABLE message. Each region has two base addresses: a guest
address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the VIRTIO_F_IOMMU_PLATFORM feature has not been negotiated:

 * Guest addresses map to the vhost memory region containing that guest
   address.

When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated:

 * Guest addresses are also called I/O virtual addresses (IOVAs). They are
   translated to user addresses via the IOTLB.

 * The vhost memory region guest address is not used.

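For the case where VIRTIO_F_IOMMU_PLATFORM has not been negotiated, the region lookup described above might be sketched as follows. The `VuRegion` type and both function names are hypothetical, and `mmap_addr` is an assumption of this sketch: it stands for the address at which the slave itself mapped the region's file descriptor. Addresses are kept as integers so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory table entry, plus where the slave mapped it (assumed field). */
typedef struct VuRegion {
    uint64_t guest_addr;   /* guest address of the region */
    uint64_t size;         /* size of the region in bytes */
    uint64_t user_addr;    /* master's user address of the region */
    uint64_t mmap_offset;  /* where the region starts in the mapped memory */
    uint64_t mmap_addr;    /* slave-local base address of the mapping */
} VuRegion;

/* Translate a guest address to a slave-local address, or 0 if unmapped. */
static uint64_t vu_gpa_to_va(const VuRegion *regions, size_t n, uint64_t gpa)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const VuRegion *r = &regions[i];
        if (gpa >= r->guest_addr && gpa - r->guest_addr < r->size) {
            return gpa - r->guest_addr + r->mmap_addr + r->mmap_offset;
        }
    }
    return 0;
}

/* Translate a master user address (e.g. a ring address) the same way. */
static uint64_t vu_uva_to_va(const VuRegion *regions, size_t n, uint64_t uva)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const VuRegion *r = &regions[i];
        if (uva >= r->user_addr && uva - r->user_addr < r->size) {
            return uva - r->user_addr + r->mmap_addr + r->mmap_offset;
        }
    }
    return 0;
}
```

When VIRTIO_F_IOMMU_PLATFORM has been negotiated, a guest address would first have to be translated to a user address through the IOTLB before a lookup like vu_uva_to_va() applies.
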
IOMMU support
-------------

When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master
sends IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
requests to the slave with a struct vhost_iotlb_msg as payload. For update
events, the iotlb payload has to be filled with the update message type (2),
the I/O virtual address, the size, the user virtual address, and the
permissions flags. Addresses and size must be within vhost memory regions set
via the VHOST_USER_SET_MEM_TABLE request. For invalidation events, the iotlb
payload has to be filled with the invalidation message type (3), the I/O virtual
address and the size. On success, the slave is expected to reply with a zero
payload, non-zero otherwise.

The slave relies on the slave communication channel (see "Slave communication"
section below) to send IOTLB miss and access failure events, by sending
VHOST_USER_SLAVE_IOTLB_MSG requests to the master with a struct vhost_iotlb_msg
as payload. For miss events, the iotlb payload has to be filled with the miss
message type (1), the I/O virtual address and the permissions flags. For access
failure events, the iotlb payload has to be filled with the access failure
message type (4), the I/O virtual address and the permissions flags.
For synchronization purposes, the slave may rely on the reply-ack feature,
so the master may send a reply when operation is completed if the reply-ack
feature is negotiated and the slave requests a reply. For miss events, completed
operation means either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if the IOTLB
miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update messages,
as the slave sends IOTLB miss messages for the guest virtual memory areas it
needs to access.

Slave communication
-------------------

An optional communication channel is provided if the slave declares
VHOST_USER_PROTOCOL_F_SLAVE_REQ protocol feature, to allow the slave to make
requests to the master.

The fd is provided via VHOST_USER_SET_SLAVE_REQ_FD ancillary data.

A slave may then send VHOST_USER_SLAVE_* messages to the master
using this fd communication channel.

If VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD protocol feature is negotiated,
slave can send file descriptors (at most 8 descriptors in each message)
to master via ancillary data using this fd communication channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to resubmit
inflight I/Os. If virtqueue is processed in order, we can easily achieve
that by getting the inflight descriptors from descriptor table (split virtqueue)
or descriptor ring (packed virtqueue). However, it can't work when we process
descriptors out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve this
problem, slave needs to allocate an extra buffer to store this information of inflight
descriptors and share it with master for persistence. VHOST_USER_GET_INFLIGHT_FD and
VHOST_USER_SET_INFLIGHT_FD are used to transfer this buffer between master
and slave. The format of this buffer is described below:

-------------------------------------------------------
| queue0 region | queue1 region | ... | queueN region |
-------------------------------------------------------

N is the number of available virtqueues. Slave could get it from num queues
field of VhostUserInflight.

For split virtqueue, queue region can be implemented as:

typedef struct DescStateSplit {
    /* Indicate whether this descriptor is inflight or not.
     * Only available for head-descriptor. */
    uint8_t inflight;

    /* Padding */
    uint8_t padding[5];

    /* Maintain a list for the last batch of used descriptors.
     * Only available when batching is used for submitting */
    uint16_t next;

    /* Used to preserve the order of fetching available descriptors.
     * Only available for head-descriptor. */
    uint64_t counter;
} DescStateSplit;

typedef struct QueueRegionSplit {
    /* The feature flags of this region. Now it's initialized to 0. */
    uint64_t features;

    /* The version of this region. It's 1 currently.
     * Zero value indicates an uninitialized buffer */
    uint16_t version;

    /* The size of DescStateSplit array. It's equal to the virtqueue
     * size. Slave could get it from queue size field of VhostUserInflight. */
    uint16_t desc_num;

    /* The head of list that track the last batch of used descriptors. */
    uint16_t last_batch_head;

    /* Store the idx value of used ring */
    uint16_t used_idx;

    /* Used to track the state of each descriptor in descriptor table */
    DescStateSplit desc[0];
} QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

    1. Get the next available head-descriptor index from available ring, i

    2. Set desc[i].counter to the value of global counter

    3. Increase global counter by 1

    4. Set desc[i].inflight to 1

When supplying used buffers to the driver:

    1. Get corresponding used head-descriptor index, i

    2. Set desc[i].next to last_batch_head

    3. Set last_batch_head to i

    4. Steps 1,2,3 may be performed repeatedly if batching is possible

    5. Increase the idx value of used ring by the size of the batch

    6. Set the inflight field of each DescStateSplit entry in the batch to 0

    7. Set used_idx to the idx value of used ring

When reconnecting:

    1. If the value of used_idx does not match the idx value of used ring (means
       the inflight field of DescStateSplit entries in last batch may be incorrect),

       (a) Subtract the value of used_idx from the idx value of used ring to get
           last batch size of DescStateSplit entries

       (b) Set the inflight field of each DescStateSplit entry to 0 in last batch
           list which starts from last_batch_head

       (c) Set used_idx to the idx value of used ring

    2. Resubmit inflight DescStateSplit entries in order of their counter value

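The split virtqueue steps above can be sketched in C as shown below. This is an illustration under several assumptions rather than a reference implementation: all function names are hypothetical, the region is allocated on the heap instead of living in the shared buffer, the global counter is kept by the caller, the flexible array is spelled `desc[]`, and the memory barriers a real slave would need between steps are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DescStateSplit {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DescStateSplit;

typedef struct QueueRegionSplit {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DescStateSplit desc[];
} QueueRegionSplit;

/* Allocate a zeroed region (demo only; normally part of the shared buffer). */
static QueueRegionSplit *inflight_region_new(uint16_t desc_num)
{
    QueueRegionSplit *q = calloc(1, sizeof(*q) + desc_num * sizeof(q->desc[0]));
    if (q != NULL) {
        q->version = 1;
        q->desc_num = desc_num;
    }
    return q;
}

/* Receiving an available buffer with head-descriptor index `i`. */
static void inflight_get(QueueRegionSplit *q, uint64_t *global_counter, uint16_t i)
{
    assert(i < q->desc_num);
    q->desc[i].counter = (*global_counter)++;
    q->desc[i].inflight = 1;
}

/* Supplying a used buffer, steps 2-3: link `i` into the last batch list. */
static void inflight_pre_put(QueueRegionSplit *q, uint16_t i)
{
    assert(i < q->desc_num);
    q->desc[i].next = q->last_batch_head;
    q->last_batch_head = i;
}

/* Steps 6-7, after the used ring idx was advanced by `batch` to `ring_idx`. */
static void inflight_post_put(QueueRegionSplit *q, uint16_t batch, uint16_t ring_idx)
{
    uint16_t i = q->last_batch_head;
    while (batch--) {
        q->desc[i].inflight = 0;
        i = q->desc[i].next;
    }
    q->used_idx = ring_idx;
}

/* Reconnecting, step 1: complete a batch that was interrupted by a crash. */
static void inflight_recover(QueueRegionSplit *q, uint16_t ring_idx)
{
    if (q->used_idx != ring_idx) {
        inflight_post_put(q, (uint16_t)(ring_idx - q->used_idx), ring_idx);
    }
}

/* Reconnecting, step 2: list inflight head indices in `out`, oldest first. */
static uint16_t inflight_collect(const QueueRegionSplit *q, uint16_t *out)
{
    uint16_t n = 0;
    for (uint16_t i = 0; i < q->desc_num; i++) {
        if (!q->desc[i].inflight) {
            continue;
        }
        uint16_t j = n++;   /* insertion sort by counter value */
        while (j > 0 && q->desc[out[j - 1]].counter > q->desc[i].counter) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;
    }
    return n;
}
```

Because the used ring idx is advanced before the inflight fields are cleared, a crash between those two steps leaves used_idx behind the ring; inflight_recover() then clears exactly the entries of the last batch, so that they are not resubmitted.
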
For packed virtqueue, queue region can be implemented as:

typedef struct DescStatePacked {
    /* Indicate whether this descriptor is inflight or not.
     * Only available for head-descriptor. */
    uint8_t inflight;

    /* Padding */
    uint8_t padding;

    /* Link to the next free entry */
    uint16_t next;

    /* Link to the last entry of descriptor list.
     * Only available for head-descriptor. */
    uint16_t last;

    /* The length of descriptor list.
     * Only available for head-descriptor. */
    uint16_t num;

    /* Used to preserve the order of fetching available descriptors.
     * Only available for head-descriptor. */
    uint64_t counter;

    /* The buffer id */
    uint16_t id;

    /* The descriptor flags */
    uint16_t flags;

    /* The buffer length */
    uint32_t len;

    /* The buffer address */
    uint64_t addr;
} DescStatePacked;

typedef struct QueueRegionPacked {
    /* The feature flags of this region. Now it's initialized to 0. */
    uint64_t features;

    /* The version of this region. It's 1 currently.
     * Zero value indicates an uninitialized buffer */
    uint16_t version;

    /* The size of DescStatePacked array. It's equal to the virtqueue
     * size. Slave could get it from queue size field of VhostUserInflight. */
    uint16_t desc_num;

    /* The head of free DescStatePacked entry list */
    uint16_t free_head;

    /* The old head of free DescStatePacked entry list */
    uint16_t old_free_head;

    /* The used index of descriptor ring */
    uint16_t used_idx;

    /* The old used index of descriptor ring */
    uint16_t old_used_idx;

    /* Device ring wrap counter */
    uint8_t used_wrap_counter;

    /* The old device ring wrap counter */
    uint8_t old_used_wrap_counter;

    /* Padding */
    uint8_t padding[7];

    /* Used to track the state of each descriptor fetched from descriptor ring */
    DescStatePacked desc[0];
} QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

    1. Get the next available descriptor entry from descriptor ring, d

    2. If d is head descriptor,

       (a) Set desc[old_free_head].num to 0

       (b) Set desc[old_free_head].counter to the value of global counter

       (c) Increase global counter by 1

       (d) Set desc[old_free_head].inflight to 1

    3. If d is last descriptor, set desc[old_free_head].last to free_head

    4. Increase desc[old_free_head].num by 1

    5. Set desc[free_head].addr, desc[free_head].len, desc[free_head].flags,
       desc[free_head].id to d.addr, d.len, d.flags, d.id

    6. Set free_head to desc[free_head].next

    7. If d is last descriptor, set old_free_head to free_head

When supplying used buffers to the driver:

    1. Get corresponding used head-descriptor entry from descriptor ring, d

    2. Get corresponding DescStatePacked entry, e

    3. Set desc[e.last].next to free_head

    4. Set free_head to the index of e

    5. Steps 1,2,3,4 may be performed repeatedly if batching is possible

    6. Increase used_idx by the size of the batch and update used_wrap_counter if needed

    7. Update d.flags

    8. Set the inflight field of each head DescStatePacked entry in the batch to 0

    9. Set old_free_head, old_used_idx, old_used_wrap_counter to free_head, used_idx,
       used_wrap_counter

When reconnecting:

    1. If used_idx does not match old_used_idx (means the inflight field of DescStatePacked
       entries in last batch may be incorrect),

       (a) Get the next descriptor ring entry through old_used_idx, d

       (b) Use old_used_wrap_counter to calculate the available flags

       (c) If d.flags is not equal to the calculated flags value (means slave has
           submitted the buffer to guest driver before crash, so it has to commit the
           in-progress update), set old_free_head, old_used_idx, old_used_wrap_counter
           to free_head, used_idx, used_wrap_counter

    2. Set free_head, used_idx, used_wrap_counter to old_free_head, old_used_idx,
       old_used_wrap_counter (roll back any in-progress update)

    3. Set the inflight field of each DescStatePacked entry in free list to 0

    4. Resubmit inflight DescStatePacked entries in order of their counter value

Protocol features
-----------------

#define VHOST_USER_PROTOCOL_F_MQ             0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD      1
#define VHOST_USER_PROTOCOL_F_RARP           2
#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
#define VHOST_USER_PROTOCOL_F_MTU            4
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
#define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
#define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
#define VHOST_USER_PROTOCOL_F_CONFIG         9
#define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
#define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12

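The values above are bit positions, not masks. A minimal sketch of how a master might test them and pick the protocol features to enable is given below; the helper names are hypothetical, and taking the intersection of what the slave offers and what the master supports is an assumption of this example, not a rule stated by this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES       30
#define VHOST_USER_PROTOCOL_F_MQ             0
#define VHOST_USER_PROTOCOL_F_REPLY_ACK      3
#define VHOST_USER_PROTOCOL_F_CONFIG         9
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12

/* True if bit number `bit` is set in the bitmask `features`. */
static inline bool vu_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features & (UINT64_C(1) << bit)) != 0;
}

/*
 * Choose the mask to send with VHOST_USER_SET_PROTOCOL_FEATURES.
 * `slave_features` is the reply to VHOST_USER_GET_FEATURES and
 * `slave_protocol_features` the reply to VHOST_USER_GET_PROTOCOL_FEATURES.
 * Returns 0 if the slave did not report VHOST_USER_F_PROTOCOL_FEATURES.
 */
static inline uint64_t vu_negotiate_protocol_features(uint64_t slave_features,
                                                      uint64_t slave_protocol_features,
                                                      uint64_t master_supported)
{
    if (!vu_has_feature(slave_features, VHOST_USER_F_PROTOCOL_FEATURES)) {
        return 0;
    }
    return slave_protocol_features & master_supported;
}
```

With this helper, an extension such as inflight I/O tracking would only be used when vu_has_feature() reports VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD in the negotiated mask.
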
Master message types
--------------------

 * VHOST_USER_GET_FEATURES

      Id: 1
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get from the underlying vhost implementation the features bitmask.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.

 * VHOST_USER_SET_FEATURES

      Id: 2
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable features in the underlying vhost implementation using a bitmask.
      Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
      VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.

 * VHOST_USER_GET_PROTOCOL_FEATURES

      Id: 15
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get the protocol feature bitmask from the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.

 * VHOST_USER_SET_PROTOCOL_FEATURES

      Id: 16
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable protocol features in the underlying vhost implementation.
      Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
      VHOST_USER_GET_FEATURES.
      Note: slave that reported VHOST_USER_F_PROTOCOL_FEATURES must support
      this message even before VHOST_USER_SET_FEATURES was called.

 * VHOST_USER_SET_OWNER

      Id: 3
      Equivalent ioctl: VHOST_SET_OWNER
      Master payload: N/A

      Issued when a new connection is established. It sets the current Master
      as an owner of the session. This can be used on the Slave as a
      "session start" flag.

 * VHOST_USER_RESET_OWNER

      Id: 4
      Master payload: N/A

      This is no longer used. Used to be sent to request disabling
      all rings, but some clients interpreted it to also discard
      connection state (this interpretation would lead to bugs).
      It is recommended that clients either ignore this message,
      or use it to disable all rings.

 * VHOST_USER_SET_MEM_TABLE

      Id: 5
      Equivalent ioctl: VHOST_SET_MEM_TABLE
      Master payload: memory regions description
      Slave payload: (postcopy only) memory regions description

      Sets the memory map regions on the slave so it can translate the vring
      addresses. In the ancillary data there is an array of file descriptors
      for each memory mapped region. The size and ordering of the fds matches
      the number and ordering of memory regions.

      When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with
      the bases of the memory mapped regions to the master. The slave must
      have mmap'd the regions but not yet accessed them and should not yet generate
      a userfault event. Note NEED_REPLY_MASK is not set in this case.
      QEMU will then reply back to the list of mappings with an empty
      VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
      message may the guest start accessing the memory and generating faults.

 * VHOST_USER_SET_LOG_BASE

      Id: 6
      Equivalent ioctl: VHOST_SET_LOG_BASE
      Master payload: u64
      Slave payload: N/A

      Sets logging shared memory space.
      When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol
      feature, the log memory fd is provided in the ancillary data of
      VHOST_USER_SET_LOG_BASE message, the size and offset of shared
      memory area provided in the message.

 * VHOST_USER_SET_LOG_FD

      Id: 7
      Equivalent ioctl: VHOST_SET_LOG_FD
      Master payload: N/A

      Sets the logging file descriptor, which is passed as ancillary data.

* VHOST_USER_SET_VRING_NUM
|
|
|
|
Id: 8
|
|
Equivalent ioctl: VHOST_SET_VRING_NUM
|
|
Master payload: vring state description
|
|
|
|
Set the size of the queue.
|
|
|
|
* VHOST_USER_SET_VRING_ADDR
|
|
|
|
Id: 9
|
|
Equivalent ioctl: VHOST_SET_VRING_ADDR
|
|
Master payload: vring address description
|
|
Slave payload: N/A
|
|
|
|
Sets the addresses of the different aspects of the vring.
|
|
|
|
* VHOST_USER_SET_VRING_BASE
|
|
|
|
Id: 10
|
|
Equivalent ioctl: VHOST_SET_VRING_BASE
|
|
Master payload: vring state description
|
|
|
|
Sets the base offset in the available vring.
|
|
|
|
* VHOST_USER_GET_VRING_BASE
|
|
|
|
Id: 11
|
|
Equivalent ioctl: VHOST_USER_GET_VRING_BASE
|
|
Master payload: vring state description
|
|
Slave payload: vring state description
|
|
|
|
Get the available vring base offset.
|
|
|
|
* VHOST_USER_SET_VRING_KICK
|
|
|
|
Id: 12
|
|
Equivalent ioctl: VHOST_SET_VRING_KICK
|
|
Master payload: u64
|
|
|
|
Set the event file descriptor for adding buffers to the vring. It
|
|
is passed in the ancillary data.
|
|
Bits (0-7) of the payload contain the vring index. Bit 8 is the
|
|
invalid FD flag. This flag is set when there is no file descriptor
|
|
in the ancillary data. This signals that polling should be used
|
|
instead of waiting for a kick.
|
|
|
|
* VHOST_USER_SET_VRING_CALL
|
|
|
|
Id: 13
|
|
Equivalent ioctl: VHOST_SET_VRING_CALL
|
|
Master payload: u64
|
|
|
|
Set the event file descriptor to signal when buffers are used. It
|
|
is passed in the ancillary data.
|
|
Bits (0-7) of the payload contain the vring index. Bit 8 is the
|
|
invalid FD flag. This flag is set when there is no file descriptor
|
|
in the ancillary data. This signals that polling will be used
|
|
instead of waiting for the call.
|
|
|
|
* VHOST_USER_SET_VRING_ERR
|
|
|
|
Id: 14
|
|
Equivalent ioctl: VHOST_SET_VRING_ERR
|
|
Master payload: u64
|
|
|
|
Set the event file descriptor to signal when an error occurs. It
|
|
is passed in the ancillary data.
|
|
Bits (0-7) of the payload contain the vring index. Bit 8 is the
|
|
invalid FD flag. This flag is set when there is no file descriptor
|
|
in the ancillary data.
|
|
|
|
* VHOST_USER_GET_QUEUE_NUM
|
|
|
|
Id: 17
|
|
Equivalent ioctl: N/A
|
|
Master payload: N/A
|
|
Slave payload: u64
|
|
|
|
Query how many queues the backend supports. This request should be
|
|
sent only when VHOST_USER_PROTOCOL_F_MQ is set in the protocol
features returned by VHOST_USER_GET_PROTOCOL_FEATURES.
|
|
|
|
* VHOST_USER_SET_VRING_ENABLE
|
|
|
|
Id: 18
|
|
Equivalent ioctl: N/A
|
|
Master payload: vring state description
|
|
|
|
Signal the slave to enable or disable the corresponding vring.
|
|
This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
|
|
has been negotiated.
|
|
|
|
* VHOST_USER_SEND_RARP
|
|
|
|
Id: 19
|
|
Equivalent ioctl: N/A
|
|
Master payload: u64
|
|
|
|
Ask the vhost-user backend to broadcast a fake RARP to announce that the
migration has completed, for guests that do not support GUEST_ANNOUNCE.
|
|
Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
|
|
VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP
|
|
is present in VHOST_USER_GET_PROTOCOL_FEATURES.
|
|
The first 6 bytes of the payload contain the MAC address of the guest to
|
|
allow the vhost user backend to construct and broadcast the fake RARP.
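
As a rough illustration of the payload layout just described, the following sketch extracts the MAC address from the raw 8-byte payload. The function names are hypothetical, and zero-filling the two trailing bytes is an assumption, since the text above does not say what they contain.

```python
def rarp_payload_from_mac(mac):
    """Pack a MAC string such as "52:54:00:12:34:56" into 8 payload bytes."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly 6 octets")
    # Assumption: the two bytes after the MAC address are left as zero.
    return octets + b"\x00\x00"


def rarp_mac_from_payload(raw):
    """Return the MAC address held in the first 6 bytes of the payload."""
    if len(raw) != 8:
        raise ValueError("expected an 8-byte u64 payload")
    return ":".join("%02x" % b for b in raw[:6])
```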
|
|
|
|
* VHOST_USER_NET_SET_MTU
|
|
|
|
Id: 20
|
|
Equivalent ioctl: N/A
|
|
Master payload: u64
|
|
|
|
Set host MTU value exposed to the guest.
|
|
This request should be sent only when VIRTIO_NET_F_MTU feature has been
|
|
successfully negotiated, VHOST_USER_F_PROTOCOL_FEATURES is present in
|
|
VHOST_USER_GET_FEATURES and protocol feature bit
|
|
VHOST_USER_PROTOCOL_F_NET_MTU is present in
|
|
VHOST_USER_GET_PROTOCOL_FEATURES.
|
|
If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
|
|
with zero in case the specified MTU is valid, or non-zero otherwise.
|
|
|
|
* VHOST_USER_SET_SLAVE_REQ_FD
|
|
|
|
Id: 21
|
|
Equivalent ioctl: N/A
|
|
Master payload: N/A
|
|
|
|
Set the socket file descriptor for slave initiated requests. It is passed
|
|
in the ancillary data.
|
|
This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
|
|
has been negotiated, and protocol feature bit VHOST_USER_PROTOCOL_F_SLAVE_REQ
|
|
is present in VHOST_USER_GET_PROTOCOL_FEATURES.
|
|
If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
|
|
with zero for success, non-zero otherwise.
|
|
|
|
* VHOST_USER_IOTLB_MSG
|
|
|
|
Id: 22
|
|
Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
|
|
Master payload: struct vhost_iotlb_msg
|
|
Slave payload: u64
|
|
|
|
Send IOTLB messages with struct vhost_iotlb_msg as payload.
|
|
Master sends such requests to update and invalidate entries in the device
|
|
IOTLB. The slave has to acknowledge the request by sending zero as u64
|
|
payload for success, non-zero otherwise.
|
|
This request should be sent only when VIRTIO_F_IOMMU_PLATFORM feature
|
|
has been successfully negotiated.
|
|
|
|
* VHOST_USER_SET_VRING_ENDIAN
|
|
|
|
Id: 23
|
|
Equivalent ioctl: VHOST_SET_VRING_ENDIAN
|
|
Master payload: vring state description
|
|
|
|
Set the endianness of a VQ for legacy devices. Little-endian is indicated
|
|
with state.num set to 0 and big-endian is indicated with state.num set
|
|
to 1. Other values are invalid.
|
|
This request should be sent only when VHOST_USER_PROTOCOL_F_CROSS_ENDIAN
|
|
has been negotiated.
|
|
Backends that negotiated this feature should handle both endiannesses
|
|
and expect this message once (per VQ) during device configuration
|
|
(i.e. before the master starts the VQ).
|
|
|
|
* VHOST_USER_GET_CONFIG
|
|
|
|
Id: 24
|
|
Equivalent ioctl: N/A
|
|
Master payload: virtio device config space
|
|
Slave payload: virtio device config space
|
|
|
|
When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is
submitted by the vhost-user master to fetch the contents of the virtio
device configuration space. The vhost-user slave's payload size MUST match
the master's request; the slave uses a zero-length payload to
indicate an error to the vhost-user master. The vhost-user master may
cache the contents to avoid repeated VHOST_USER_GET_CONFIG calls.
|
|
|
|
* VHOST_USER_SET_CONFIG
|
|
|
|
Id: 25
|
|
Equivalent ioctl: N/A
|
|
Master payload: virtio device config space
|
|
Slave payload: N/A
|
|
|
|
When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, this message is
|
|
submitted by the vhost-user master when the guest changes the virtio
device configuration space. It can also be used for live migration
on the destination host. The vhost-user slave must check the flags
|
|
field, and slaves MUST NOT accept SET_CONFIG for read-only
|
|
configuration space fields unless the live migration bit is set.
|
|
|
|
* VHOST_USER_CREATE_CRYPTO_SESSION
|
|
|
|
Id: 26
|
|
Equivalent ioctl: N/A
|
|
Master payload: crypto session description
|
|
Slave payload: crypto session description
|
|
|
|
Create a session for crypto operation. The server side must return the
|
|
session id, 0 or positive for success, negative for failure.
|
|
This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION
|
|
feature has been successfully negotiated.
|
|
It's a required feature for crypto devices.
|
|
|
|
* VHOST_USER_CLOSE_CRYPTO_SESSION
|
|
|
|
Id: 27
|
|
Equivalent ioctl: N/A
|
|
Master payload: u64
|
|
|
|
Close a session for crypto operation which was previously
|
|
created by VHOST_USER_CREATE_CRYPTO_SESSION.
|
|
This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION
|
|
feature has been successfully negotiated.
|
|
It's a required feature for crypto devices.
|
|
|
|
* VHOST_USER_POSTCOPY_ADVISE
|
|
Id: 28
|
|
Master payload: N/A
|
|
Slave payload: userfault fd
|
|
|
|
When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the
|
|
master advises the slave that a migration with postcopy enabled is underway;
the slave must open a userfaultfd for later use.
|
|
Note that at this stage the migration is still in precopy mode.
|
|
|
|
* VHOST_USER_POSTCOPY_LISTEN
|
|
Id: 29
|
|
Master payload: N/A
|
|
|
|
Master advises slave that a transition to postcopy mode has happened.
|
|
The slave must ensure that shared memory is registered with userfaultfd
|
|
to cause faulting of non-present pages.
|
|
|
|
This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
|
|
thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
|
|
|
|
* VHOST_USER_POSTCOPY_END
|
|
Id: 30
|
|
Slave payload: u64
|
|
|
|
Master advises that postcopy migration has now completed. The
|
|
slave must disable the userfaultfd. The response is an acknowledgement
|
|
only.
|
|
When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, this message
|
|
is sent at the end of the migration, after VHOST_USER_POSTCOPY_LISTEN
|
|
was previously sent.
|
|
The value returned is an error indication; 0 is success.
|
|
|
|
* VHOST_USER_GET_INFLIGHT_FD
|
|
Id: 31
|
|
Equivalent ioctl: N/A
|
|
Master payload: inflight description
|
|
|
|
When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
|
|
successfully negotiated, this message is submitted by master to get
|
|
a shared buffer from slave. The shared buffer will be used to track
|
|
inflight I/O by slave. QEMU should retrieve a new one when the VM is reset.
|
|
|
|
* VHOST_USER_SET_INFLIGHT_FD
|
|
Id: 32
|
|
Equivalent ioctl: N/A
|
|
Master payload: inflight description
|
|
|
|
When VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD protocol feature has been
|
|
successfully negotiated, this message is submitted by master to send
|
|
the shared inflight buffer back to the slave so that the slave can recover
inflight I/O after a crash or restart.
|
|
|
|
Slave message types
|
|
-------------------
|
|
|
|
* VHOST_USER_SLAVE_IOTLB_MSG
|
|
|
|
Id: 1
|
|
Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
|
|
Slave payload: struct vhost_iotlb_msg
|
|
Master payload: N/A
|
|
|
|
Send IOTLB messages with struct vhost_iotlb_msg as payload.
|
|
Slave sends such requests to notify of an IOTLB miss, or an IOTLB
|
|
access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
|
|
and slave sets the VHOST_USER_NEED_REPLY flag, master must respond with
|
|
zero when operation is successfully completed, or non-zero otherwise.
|
|
This request should be sent only when VIRTIO_F_IOMMU_PLATFORM feature
|
|
has been successfully negotiated.
|
|
|
|
* VHOST_USER_SLAVE_CONFIG_CHANGE_MSG
|
|
|
|
Id: 2
|
|
Equivalent ioctl: N/A
|
|
Slave payload: N/A
|
|
Master payload: N/A
|
|
|
|
When VHOST_USER_PROTOCOL_F_CONFIG is negotiated, the vhost-user slave sends
such messages to notify that the virtio device's configuration space has
changed. For host devices that support this feature, the host
driver can send a VHOST_USER_GET_CONFIG message to the slave to get the latest
content. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated and the slave sets
the VHOST_USER_NEED_REPLY flag, the master must respond with zero when the
operation is successfully completed, or non-zero otherwise.
|
|
|
|
* VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG
|
|
|
|
Id: 3
|
|
Equivalent ioctl: N/A
|
|
Slave payload: vring area description
|
|
Master payload: N/A
|
|
|
|
Sets host notifier for a specified queue. The queue index is contained
|
|
in the u64 field of the vring area description. The host notifier is
|
|
described by the file descriptor (typically it's a VFIO device fd) which
|
|
is passed as ancillary data and the size (which is mmap size and should
|
|
be the same as host page size) and offset (which is mmap offset) carried
|
|
in the vring area description. QEMU can mmap the file descriptor based
|
|
on the size and offset to get a memory range. Registering a host notifier
|
|
means mapping this memory range to the VM as the specified queue's notify
|
|
MMIO region. Slave sends this request to tell QEMU to de-register the
|
|
existing notifier if any and register the new notifier if the request is
|
|
sent with a file descriptor.
|
|
This request should be sent only when VHOST_USER_PROTOCOL_F_HOST_NOTIFIER
|
|
protocol feature has been successfully negotiated.
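
The mapping step described above (mmap the received file descriptor using the size and offset from the vring area description) might look roughly like the sketch below. The three-field u64/size/offset layout, the native byte order and the use of bits 0-7 for the queue index are assumptions here, and map_host_notifier is a hypothetical helper, not QEMU's implementation.

```python
import mmap
import struct

# Assumed layout of the vring area description: u64, size, offset,
# each a 64-bit unsigned value in native byte order.
VRING_AREA_FMT = "=QQQ"


def map_host_notifier(fd, area):
    """Map the notifier region described by 'area' from file descriptor 'fd'.

    Returns (queue index, mmap object).
    """
    u64, size, offset = struct.unpack(VRING_AREA_FMT, area)
    queue_index = u64 & 0xFF  # assumption: index lives in bits 0-7
    if size != mmap.PAGESIZE:
        raise ValueError("size should be the same as the host page size")
    # mmap duplicates the descriptor, so the caller may close 'fd' afterwards.
    region = mmap.mmap(fd, size, access=mmap.ACCESS_WRITE, offset=offset)
    return queue_index, region
```

A real master would additionally expose the mapped range to the guest as the queue's notify MMIO region, and de-register any previously registered notifier first; both steps are outside the scope of this sketch.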
|
|
|
|
VHOST_USER_PROTOCOL_F_REPLY_ACK:
|
|
-------------------------------
|
|
The original vhost-user specification only demands replies for certain
|
|
commands. This differs from the vhost protocol implementation where commands
|
|
are sent over an ioctl() call and block until the client has completed.
|
|
|
|
With this protocol extension negotiated, the sender (QEMU) can set the
|
|
"need_reply" [Bit 3] flag to any command. This indicates that
|
|
the client MUST respond with a Payload VhostUserMsg indicating success or
|
|
failure. The payload should be set to zero on success or non-zero on failure,
|
|
unless the message already has an explicit reply body.
|
|
|
|
The response payload gives QEMU a deterministic indication of the result
|
|
of the command. Today, QEMU is expected to terminate the main vhost-user
|
|
loop upon receiving such errors. In future, QEMU could be taught to be more
|
|
resilient for selective requests.
|
|
|
|
For the message types that already solicit a reply from the client, the
|
|
presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
|
|
no behavioural change. (See the 'Communication' section for details.)
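
To make the flag handling concrete, here is a minimal sketch of building a request with need_reply set and checking the acknowledgement. Only the position of need_reply (bit 3) is stated in this section; the three-u32 header layout, the version value in the low flag bits and the reply flag in bit 2 are assumptions based on the message format described elsewhere in this document, and all names are hypothetical.

```python
import struct

HDR_FMT = "=III"           # assumed header: request, flags, size (u32 each)
VERSION = 0x1              # assumed: version carried in the low flag bits
REPLY_FLAG = 1 << 2        # assumed: bit 2 marks a reply
NEED_REPLY_FLAG = 1 << 3   # "need_reply", bit 3


def build_request(request, payload=b"", need_reply=False):
    """Serialize a request, optionally asking for an explicit ack."""
    flags = VERSION | (NEED_REPLY_FLAG if need_reply else 0)
    return struct.pack(HDR_FMT, request, flags, len(payload)) + payload


def build_ack(request, status):
    """Serialize the u64 ack a client sends back: zero means success."""
    header = struct.pack(HDR_FMT, request, VERSION | REPLY_FLAG, 8)
    return header + struct.pack("=Q", status)


def ack_succeeded(message):
    """Return True if 'message' is an ack carrying a zero status."""
    request, flags, size = struct.unpack_from(HDR_FMT, message)
    if not flags & REPLY_FLAG or size != 8:
        raise ValueError("not a u64 reply message")
    (status,) = struct.unpack_from("=Q", message, struct.calcsize(HDR_FMT))
    return status == 0
```

For example, build_request(20, struct.pack("=Q", 1500), need_reply=True) would produce a VHOST_USER_NET_SET_MTU request (Id 20) whose sender then waits for an ack and treats a non-zero status as failure.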
|
|
|
|
Backend program conventions
|
|
---------------------------
|
|
|
|
vhost-user backends can provide various devices & services and may
|
|
need to be configured manually depending on the use case. However, it
|
|
is a good idea to follow the conventions listed here when
|
|
possible. Users, QEMU or libvirt, can then rely on some common
|
|
behaviour to avoid heterogeneous configuration and management of the
|
|
backend programs and facilitate interoperability.
|
|
|
|
Each backend installed on a host system should come with at least one
|
|
JSON file that conforms to the vhost-user.json schema. Each file
|
|
informs the management applications about the backend type, and binary
|
|
location. In addition, it defines rules for management apps for
|
|
picking the highest priority backend when multiple match the search
|
|
criteria (see @VhostUserBackend documentation in the schema file).
|
|
|
|
If the backend is not capable of enabling a requested feature on the
|
|
host (such as 3D acceleration with virgl), or the initialization
|
|
failed, the backend should fail to start early and exit with a status
|
|
!= 0. It may also print a message to stderr for further details.
|
|
|
|
The backend program must not daemonize itself, but it may be
|
|
daemonized by the management layer. It may also have restricted
|
|
access to the system.
|
|
|
|
File descriptors 0, 1 and 2 will exist, and have regular
|
|
stdin/stdout/stderr usage (they may have been redirected to /dev/null
|
|
by the management layer, or to a log handler).
|
|
|
|
The backend program must end (as quickly and cleanly as possible) when
|
|
the SIGTERM signal is received. If it has not exited after a few seconds,
the management layer may send it SIGKILL.
|
|
|
|
The following command line options have an expected behaviour. They
|
|
are mandatory, unless explicitly said differently:
|
|
|
|
* --socket-path=PATH
|
|
|
|
This option specifies the location of the vhost-user Unix domain socket.
|
|
It is incompatible with --fd.
|
|
|
|
* --fd=FDNUM
|
|
|
|
When this argument is given, the backend program is started with the
|
|
vhost-user socket as file descriptor FDNUM. It is incompatible with
|
|
--socket-path.
|
|
|
|
* --print-capabilities
|
|
|
|
Output to stdout the backend capabilities in JSON format, and then
|
|
exit successfully. Other options and arguments should be ignored, and
|
|
the backend program should not perform its normal function. The
|
|
capabilities can be reported dynamically depending on the host
|
|
capabilities.
|
|
|
|
The JSON output is described in the vhost-user.json schema, by
|
|
@VHostUserBackendCapabilities. Example:
|
|
{
  "type": "foo",
  "features": [
    "feature-a",
    "feature-b"
  ]
}
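
The command line conventions above could be handled along the following lines. This is only a sketch: the program name "vhost-user-foo", the capabilities content (copied from the example above) and all function names are placeholders, and the actual serving of the vhost-user socket is left out.

```python
import argparse
import json
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="vhost-user-foo")  # placeholder name
    # --socket-path and --fd are incompatible with each other.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--socket-path", metavar="PATH")
    group.add_argument("--fd", metavar="FDNUM", type=int)
    parser.add_argument("--print-capabilities", action="store_true")
    return parser


def capabilities():
    # Placeholder content; a real backend may compute this from the host.
    return {"type": "foo", "features": ["feature-a", "feature-b"]}


def main(argv):
    # Checked before parsing so that other options and arguments are ignored.
    if "--print-capabilities" in argv:
        print(json.dumps(capabilities(), indent=2))
        return 0
    args = build_parser().parse_args(argv)
    if args.socket_path is None and args.fd is None:
        print("one of --socket-path or --fd is required", file=sys.stderr)
        return 1
    # A real backend would install a SIGTERM handler here and start serving
    # the socket given by args.socket_path or args.fd, without daemonizing.
    return 0
```

A script would typically end with sys.exit(main(sys.argv[1:])) so that failures are reported through a non-zero exit status, as required above.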
|
|
|
|
vhost-user-input
|
|
----------------
|
|
|
|
Command line options:
|
|
|
|
* --evdev-path=PATH (optional)
|
|
|
|
Specify the Linux input device.
|
|
|
|
* --no-grab (optional)
|
|
|
|
Do not request exclusive access to the input device.
|
|
|
|
vhost-user-gpu
|
|
--------------
|
|
|
|
Command line options:
|
|
|
|
* --render-node=PATH (optional)
|
|
|
|
Specify the GPU DRM render node.
|
|
|
|
* --virgl (optional)
|
|
|
|
Enable virgl rendering support.
|