VFIO updates 2023-02-16
* Initial v2 migration support for vfio (Avihai Horon)
* Add Cédric as vfio reviewer (Cédric Le Goater)

Merge tag 'vfio-updates-20230216.0' of https://gitlab.com/alex.williamson/qemu into staging

# -----BEGIN PGP SIGNATURE-----
#
# iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmPumhUbHGFsZXgud2ls
# bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsijnMP/0Rz/lsGxym76mXtr5WY
# OR5SDFpifpaUVi+1xTugYFPnZvN+RdnlcLrcp1g8G+lmd4ANqwT0b9XTTTI8WTau
# DhSHW/05WgAOrf/jOSV29oNSf7jtGJZcDbAy8f5NXxwK/IRlJEDJfCaqxwYSyYf1
# nfC0ZwMTrBrA6pzF5OzIJSkhl/uPwlTsBxRnbN86Z22rE128ASjUtj1jir4rPLg0
# ClUn7Rrdk/Y6uXIB9c6TFC+wmG0QAVsklWIeNLUFWUak4H0gqp7AUmMlJV99i5Q7
# 3H4Zjspwn79llvGm4X1QpuLaop2QaIQaW4FTpzRSftelEosjIjkTCMrWTb4MKff1
# cgT0dmC1Hht+zQ0MPbmgeaiwPH/V7r+J9GffG6p2b4itdHmrKVsqKQMSQS/IJFBw
# eiO1rENRXNcTnC29jPUhe1IS1DEwCNkWm9NgJoC5WPJYQXsiEvo4YDH/30FnByXg
# KQdd5OxR7o6qJM5e4PUn4wd9sHsYU8IsIEJdKnynoS9qUdPqv0tJ+tLYWcBhQPJq
# M8R+mDwImMzw0bgurg4607VgL9HJEXna2rgdd5hcMq88M+M5OpmowXlk4TTY4Ha9
# lmWSndYJG6npNY4NXcxbe4x5H8ndvHcO+g3weynsxPFjnL959NzQyWNFXFDBqBg3
# fhNVqYTrMOcEN5uv18o+mnsG
# =oK7/
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 16 Feb 2023 21:03:17 GMT
# gpg: using RSA key 42F6C04E540BD1A99E7B8A90239B9B6E3BB08B22
# gpg: issuer "alex.williamson@redhat.com"
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full]
# gpg: aka "Alex Williamson <alex@shazbot.org>" [full]
# gpg: aka "Alex Williamson <alwillia@redhat.com>" [full]
# gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" [full]
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22

* tag 'vfio-updates-20230216.0' of https://gitlab.com/alex.williamson/qemu:
  MAINTAINERS: Add myself as VFIO reviewer
  docs/devel: Align VFIO migration docs to v2 protocol
  vfio: Alphabetize migration section of VFIO trace-events file
  vfio/migration: Remove VFIO migration protocol v1
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Block multiple devices migration
  vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one
  vfio/migration: Allow migration without VFIO IOMMU dirty tracking support
  vfio/migration: Fix NULL pointer dereference bug
  linux-headers: Update to v6.2-rc8

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
commit 9b0699ab80
diff --git a/MAINTAINERS b/MAINTAINERS
@@ -1995,6 +1995,7 @@ F: hw/usb/dev-serial.c

VFIO
M: Alex Williamson <alex.williamson@redhat.com>
R: Cédric Le Goater <clg@redhat.com>
S: Supported
F: hw/vfio/*
F: include/hw/vfio/
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
@@ -7,46 +7,43 @@ the guest is running on source host and restoring this saved state on the
destination host. This document details how saving and restoring of VFIO
devices is done in QEMU.

Migration of VFIO devices consists of two phases: the optional pre-copy phase,
and the stop-and-copy phase. The pre-copy phase is iterative and allows to
accommodate VFIO devices that have a large amount of data that needs to be
transferred. The iterative pre-copy phase of migration allows for the guest to
continue whilst the VFIO device state is transferred to the destination, this
helps to reduce the total downtime of the VM. VFIO devices can choose to skip
the pre-copy phase of migration by returning pending_bytes as zero during the
pre-copy phase.
Migration of VFIO devices currently consists of a single stop-and-copy phase.
During the stop-and-copy phase the guest is stopped and the entire VFIO device
data is transferred to the destination.

The pre-copy phase of migration is currently not supported for VFIO devices.
Support for VFIO pre-copy will be added later on.

Note that currently VFIO migration is supported only for a single device. This
is due to VFIO migration's lack of P2P support. However, P2P support is planned
to be added later on.

A detailed description of the UAPI for VFIO device migration can be found in
the comment for the ``vfio_device_migration_info`` structure in the header
file linux-headers/linux/vfio.h.
the comment for the ``vfio_device_mig_state`` structure in the header file
linux-headers/linux/vfio.h.
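As a rough illustration of that UAPI (a sketch under stated assumptions, not
QEMU code): user space requests a device-state transition with the
``VFIO_DEVICE_FEATURE`` ioctl and ``struct vfio_device_feature_mig_state``.
The helper name and the ``device_fd`` parameter below are illustrative only::

  #include <errno.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /*
   * Sketch: ask the kernel to move a VFIO device to new_state.  device_fd
   * is an open VFIO device file descriptor (an assumption of the example).
   * Transitions that produce a device data stream (e.g. RUNNING ->
   * STOP_COPY) report a data_fd to read from; others report -1.
   */
  static int vfio_set_device_state(int device_fd,
                                   enum vfio_device_mig_state new_state)
  {
      uint64_t buf[(sizeof(struct vfio_device_feature) +
                    sizeof(struct vfio_device_feature_mig_state) +
                    sizeof(uint64_t) - 1) / sizeof(uint64_t)] = {};
      struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
      struct vfio_device_feature_mig_state *mig_state =
          (struct vfio_device_feature_mig_state *)feature->data;

      feature->argsz = sizeof(buf);
      feature->flags = VFIO_DEVICE_FEATURE_SET |
                       VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
      mig_state->device_state = new_state;

      if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature)) {
          return -errno;
      }

      return mig_state->data_fd;
  }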

VFIO implements the device hooks for the iterative approach as follows:

* A ``save_setup`` function that sets up the migration region and sets _SAVING
  flag in the VFIO device state.
* A ``save_setup`` function that sets up migration on the source.

* A ``load_setup`` function that sets up the migration region on the
  destination and sets _RESUMING flag in the VFIO device state.
* A ``load_setup`` function that sets the VFIO device on the destination in
  _RESUMING state.

* A ``state_pending_exact`` function that reads pending_bytes from the vendor
  driver, which indicates the amount of data that the vendor driver has yet to
  save for the VFIO device.

* A ``save_live_iterate`` function that reads the VFIO device's data from the
  vendor driver through the migration region during iterative phase.

* A ``save_state`` function to save the device config space if it is present.

* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
  VFIO device state and iteratively copies the remaining data for the VFIO
  device until the vendor driver indicates that no data remains (pending bytes
  is zero).
* A ``save_live_complete_precopy`` function that sets the VFIO device in
  _STOP_COPY state and iteratively copies the data for the VFIO device until
  the vendor driver indicates that no data remains.

* A ``load_state`` function that loads the config section and the data
  sections that are generated by the save functions above
  sections that are generated by the save functions above.

* ``cleanup`` functions for both save and load that perform any migration
  related cleanup, including unmapping the migration region
  related cleanup.
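For orientation, the hooks listed above are collected into a
``SaveVMHandlers`` table and registered per device. The sketch below mirrors
the ``savevm_vfio_handlers`` table added by this series in
hw/vfio/migration.c; the handler implementations and the ``id``/``vbasedev``
variables are assumed to come from the surrounding code::

  static const SaveVMHandlers savevm_vfio_handlers = {
      .save_setup = vfio_save_setup,
      .save_cleanup = vfio_save_cleanup,
      .state_pending_exact = vfio_state_pending_exact,
      .save_live_complete_precopy = vfio_save_complete_precopy,
      .save_state = vfio_save_state,
      .load_setup = vfio_load_setup,
      .load_cleanup = vfio_load_cleanup,
      .load_state = vfio_load_state,
  };

  /* Registered per device, e.g. from vfio_migration_init(): */
  register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
                       &savevm_vfio_handlers, vbasedev);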

The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +68,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
can also be written by the device. There is currently no device or IOMMU
support for dirty page tracking in hardware.

By default, dirty pages are tracked when the device is in pre-copy as well as
stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
the destination in both phases. Copying dirty pages in pre-copy phase helps
QEMU to predict if it can achieve its downtime tolerances. If QEMU during
pre-copy phase keeps finding dirty pages continuously, then it understands
that even in stop-and-copy phase, it is likely to find dirty pages and can
predict the downtime accordingly.
By default, dirty pages are tracked during pre-copy as well as stop-and-copy
phase. So, a page pinned by the vendor driver will be copied to the destination
in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
finding dirty pages continuously, then it understands that even in stop-and-copy
phase, it is likely to find dirty pages and can predict the downtime
accordingly.
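The tracking described above is driven through the VFIO IOMMU dirty-pages
interface. A minimal sketch of the start/stop call, modeled on
``vfio_set_dirty_page_tracking()`` in hw/vfio/common.c further down in this
diff (``container_fd`` is an assumption of the example)::

  #include <errno.h>
  #include <stdbool.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* Start or stop dirty-page tracking on a VFIO type1 IOMMU container. */
  static int vfio_dirty_tracking(int container_fd, bool start)
  {
      struct vfio_iommu_type1_dirty_bitmap dirty = {
          .argsz = sizeof(dirty),
          .flags = start ? VFIO_IOMMU_DIRTY_PAGES_FLAG_START
                         : VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP,
      };

      return ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &dirty) ? -errno : 0;
  }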

QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -111,23 +108,22 @@ Live migration save path
              |
migrate_init spawns migration_thread
Migration thread then calls each device's .save_setup()
(RUNNING, _SETUP, _RUNNING|_SAVING)
(RUNNING, _SETUP, _RUNNING)
              |
(RUNNING, _ACTIVE, _RUNNING|_SAVING)
(RUNNING, _ACTIVE, _RUNNING)
If device is active, get pending_bytes by .state_pending_exact()
If total pending_bytes >= threshold_size, call .save_live_iterate()
Data of VFIO device for pre-copy phase is copied
Iterate till total pending bytes converge and are less than threshold
              |
On migration completion, vCPU stops and calls .save_live_complete_precopy for
each active device. The VFIO device is then transitioned into _SAVING state
(FINISH_MIGRATE, _DEVICE, _SAVING)
each active device. The VFIO device is then transitioned into _STOP_COPY state
(FINISH_MIGRATE, _DEVICE, _STOP_COPY)
              |
For the VFIO device, iterate in .save_live_complete_precopy until
pending data is 0
(FINISH_MIGRATE, _DEVICE, _STOPPED)
(FINISH_MIGRATE, _DEVICE, _STOP)
              |
(FINISH_MIGRATE, _COMPLETED, _STOPPED)
(FINISH_MIGRATE, _COMPLETED, _STOP)
Migration thread schedules cleanup bottom half and exits

Live migration resume path
@@ -136,7 +132,7 @@ Live migration resume path

::

Incoming migration calls .load_setup for each device
(RESTORE_VM, _ACTIVE, _STOPPED)
(RESTORE_VM, _ACTIVE, _STOP)
              |
For each device, .load_state is called for that device section data
(RESTORE_VM, _ACTIVE, _RESUMING)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
@@ -40,6 +40,8 @@
#include "trace.h"
#include "qapi/error.h"
#include "migration/migration.h"
#include "migration/misc.h"
#include "migration/blocker.h"
#include "sysemu/tpm.h"

VFIOGroupList vfio_group_list =
@@ -336,6 +338,58 @@ bool vfio_mig_active(void)
    return true;
}

static Error *multiple_devices_migration_blocker;
static unsigned int vfio_migratable_device_num(void)
{
    VFIOGroup *group;
    VFIODevice *vbasedev;
    unsigned int device_num = 0;

    QLIST_FOREACH(group, &vfio_group_list, next) {
        QLIST_FOREACH(vbasedev, &group->device_list, next) {
            if (vbasedev->migration) {
                device_num++;
            }
        }
    }

    return device_num;
}

int vfio_block_multiple_devices_migration(Error **errp)
{
    int ret;

    if (multiple_devices_migration_blocker ||
        vfio_migratable_device_num() <= 1) {
        return 0;
    }

    error_setg(&multiple_devices_migration_blocker,
               "Migration is currently not supported with multiple "
               "VFIO devices");
    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
    if (ret < 0) {
        error_free(multiple_devices_migration_blocker);
        multiple_devices_migration_blocker = NULL;
    }

    return ret;
}

void vfio_unblock_multiple_devices_migration(void)
{
    if (!multiple_devices_migration_blocker ||
        vfio_migratable_device_num() > 1) {
        return;
    }

    migrate_del_blocker(multiple_devices_migration_blocker);
    error_free(multiple_devices_migration_blocker);
    multiple_devices_migration_blocker = NULL;
}
static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
{
    VFIOGroup *group;
@@ -354,8 +408,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
                return false;
            }

            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
                && (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
            if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
                migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
                return false;
            }
        }
@@ -363,13 +417,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
    return true;
}

static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
/*
 * Check if all VFIO devices are running and migration is active, which is
 * essentially equivalent to the migration being in pre-copy phase.
 */
static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
{
    VFIOGroup *group;
    VFIODevice *vbasedev;
    MigrationState *ms = migrate_get_current();

    if (!migration_is_setup_or_active(ms->state)) {
    if (!migration_is_active(migrate_get_current())) {
        return false;
    }

@@ -381,8 +438,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
            return false;
        }

        if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
            (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
        if (migration->device_state == VFIO_DEVICE_STATE_RUNNING) {
            continue;
        } else {
            return false;
@@ -461,7 +517,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
    };

    if (iotlb && container->dirty_pages_supported &&
        vfio_devices_all_running_and_saving(container)) {
        vfio_devices_all_running_and_mig_active(container)) {
        return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
    }

@@ -488,6 +544,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
        return -errno;
    }

    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
        cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                            DIRTY_CLIENTS_NOCODE);
    }

    return 0;
}
@@ -1201,6 +1263,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
        .argsz = sizeof(dirty),
    };

    if (!container->dirty_pages_supported) {
        return;
    }

    if (start) {
        dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
    } else {
@@ -1236,6 +1302,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
    uint64_t pages;
    int ret;

    if (!container->dirty_pages_supported) {
        cpu_physical_memory_set_dirty_range(ram_addr, size,
                                            tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                            DIRTY_CLIENTS_NOCODE);
        return 0;
    }

    dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));

    dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1482,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
{
    VFIOContainer *container = container_of(listener, VFIOContainer, listener);

    if (vfio_listener_skipped_section(section) ||
        !container->dirty_pages_supported) {
    if (vfio_listener_skipped_section(section)) {
        return;
    }
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "qemu/cutils.h"
#include "qemu/units.h"
#include <linux/vfio.h>
#include <sys/ioctl.h>

@@ -44,310 +45,124 @@
#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL)
#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL)

/*
 * This is an arbitrary size based on migration of mlx5 devices, where typically
 * total device migration size is on the order of 100s of MB. Testing with
 * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
 */
#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)

static int64_t bytes_transferred;
static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
|
||||
off_t off, bool iswrite)
|
||||
static const char *mig_state_to_str(enum vfio_device_mig_state state)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
|
||||
pread(vbasedev->fd, val, count, off);
|
||||
if (ret < count) {
|
||||
error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
|
||||
HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
|
||||
vbasedev->name, off, strerror(errno));
|
||||
return (ret < 0) ? ret : -EINVAL;
|
||||
switch (state) {
|
||||
case VFIO_DEVICE_STATE_ERROR:
|
||||
return "ERROR";
|
||||
case VFIO_DEVICE_STATE_STOP:
|
||||
return "STOP";
|
||||
case VFIO_DEVICE_STATE_RUNNING:
|
||||
return "RUNNING";
|
||||
case VFIO_DEVICE_STATE_STOP_COPY:
|
||||
return "STOP_COPY";
|
||||
case VFIO_DEVICE_STATE_RESUMING:
|
||||
return "RESUMING";
|
||||
default:
|
||||
return "UNKNOWN STATE";
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
|
||||
off_t off, bool iswrite)
|
||||
{
|
||||
int ret, done = 0;
|
||||
__u8 *tbuf = buf;
|
||||
|
||||
while (count) {
|
||||
int bytes = 0;
|
||||
|
||||
if (count >= 8 && !(off % 8)) {
|
||||
bytes = 8;
|
||||
} else if (count >= 4 && !(off % 4)) {
|
||||
bytes = 4;
|
||||
} else if (count >= 2 && !(off % 2)) {
|
||||
bytes = 2;
|
||||
} else {
|
||||
bytes = 1;
|
||||
}
|
||||
|
||||
ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
count -= bytes;
|
||||
done += bytes;
|
||||
off += bytes;
|
||||
tbuf += bytes;
|
||||
}
|
||||
return done;
|
||||
}
|
||||
|
||||
#define vfio_mig_read(f, v, c, o) vfio_mig_rw(f, (__u8 *)v, c, o, false)
|
||||
#define vfio_mig_write(f, v, c, o) vfio_mig_rw(f, (__u8 *)v, c, o, true)
|
||||
|
||||
#define VFIO_MIG_STRUCT_OFFSET(f) \
|
||||
offsetof(struct vfio_device_migration_info, f)
|
||||
/*
|
||||
* Change the device_state register for device @vbasedev. Bits set in @mask
|
||||
* are preserved, bits set in @value are set, and bits not set in either @mask
|
||||
* or @value are cleared in device_state. If the register cannot be accessed,
|
||||
* the resulting state would be invalid, or the device enters an error state,
|
||||
* an error is returned.
|
||||
*/
|
||||
|
||||
static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
|
||||
uint32_t value)
|
||||
static int vfio_migration_set_state(VFIODevice *vbasedev,
|
||||
enum vfio_device_mig_state new_state,
|
||||
enum vfio_device_mig_state recover_state)
|
||||
{
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
VFIORegion *region = &migration->region;
|
||||
off_t dev_state_off = region->fd_offset +
|
||||
VFIO_MIG_STRUCT_OFFSET(device_state);
|
||||
uint32_t device_state;
|
||||
uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
|
||||
sizeof(struct vfio_device_feature_mig_state),
|
||||
sizeof(uint64_t))] = {};
|
||||
struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
|
||||
struct vfio_device_feature_mig_state *mig_state =
|
||||
(struct vfio_device_feature_mig_state *)feature->data;
|
||||
int ret;
|
||||
|
||||
ret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
|
||||
dev_state_off);
|
||||
if (ret < 0) {
|
||||
feature->argsz = sizeof(buf);
|
||||
feature->flags =
|
||||
VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
|
||||
mig_state->device_state = new_state;
|
||||
if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
|
||||
/* Try to set the device in some good state */
|
||||
ret = -errno;
|
||||
|
||||
if (recover_state == VFIO_DEVICE_STATE_ERROR) {
|
||||
error_report("%s: Failed setting device state to %s, err: %s. "
|
||||
"Recover state is ERROR. Resetting device",
|
||||
vbasedev->name, mig_state_to_str(new_state),
|
||||
strerror(errno));
|
||||
|
||||
goto reset_device;
|
||||
}
|
||||
|
||||
error_report(
|
||||
"%s: Failed setting device state to %s, err: %s. Setting device in recover state %s",
|
||||
vbasedev->name, mig_state_to_str(new_state),
|
||||
strerror(errno), mig_state_to_str(recover_state));
|
||||
|
||||
mig_state->device_state = recover_state;
|
||||
if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
|
||||
ret = -errno;
|
||||
error_report(
|
||||
"%s: Failed setting device in recover state, err: %s. Resetting device",
|
||||
vbasedev->name, strerror(errno));
|
||||
|
||||
goto reset_device;
|
||||
}
|
||||
|
||||
migration->device_state = recover_state;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
device_state = (device_state & mask) | value;
|
||||
|
||||
if (!VFIO_DEVICE_STATE_VALID(device_state)) {
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
ret = vfio_mig_write(vbasedev, &device_state, sizeof(device_state),
|
||||
dev_state_off);
|
||||
if (ret < 0) {
|
||||
int rret;
|
||||
|
||||
rret = vfio_mig_read(vbasedev, &device_state, sizeof(device_state),
|
||||
dev_state_off);
|
||||
|
||||
if ((rret < 0) || (VFIO_DEVICE_STATE_IS_ERROR(device_state))) {
|
||||
hw_error("%s: Device in error state 0x%x", vbasedev->name,
|
||||
device_state);
|
||||
return rret ? rret : -EIO;
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
migration->device_state = device_state;
|
||||
trace_vfio_migration_set_state(vbasedev->name, device_state);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
|
||||
uint64_t data_size, uint64_t *size)
|
||||
{
|
||||
void *ptr = NULL;
|
||||
uint64_t limit = 0;
|
||||
int i;
|
||||
|
||||
if (!region->mmaps) {
|
||||
if (size) {
|
||||
*size = MIN(data_size, region->size - data_offset);
|
||||
}
|
||||
return ptr;
|
||||
}
|
||||
|
||||
for (i = 0; i < region->nr_mmaps; i++) {
|
||||
VFIOMmap *map = region->mmaps + i;
|
||||
|
||||
if ((data_offset >= map->offset) &&
|
||||
(data_offset < map->offset + map->size)) {
|
||||
|
||||
/* check if data_offset is within sparse mmap areas */
|
||||
ptr = map->mmap + data_offset - map->offset;
|
||||
if (size) {
|
||||
*size = MIN(data_size, map->offset + map->size - data_offset);
|
||||
}
|
||||
break;
|
||||
} else if ((data_offset < map->offset) &&
|
||||
(!limit || limit > map->offset)) {
|
||||
migration->device_state = new_state;
|
||||
if (mig_state->data_fd != -1) {
|
||||
if (migration->data_fd != -1) {
|
||||
/*
|
||||
* data_offset is not within sparse mmap areas, find size of
|
||||
* non-mapped area. Check through all list since region->mmaps list
|
||||
* is not sorted.
|
||||
* This can happen if the device is asynchronously reset and
|
||||
* terminates a data transfer.
|
||||
*/
|
||||
limit = map->offset;
|
||||
}
|
||||
}
|
||||
error_report("%s: data_fd out of sync", vbasedev->name);
|
||||
close(mig_state->data_fd);
|
||||
|
||||
if (!ptr && size) {
|
||||
*size = limit ? MIN(data_size, limit - data_offset) : data_size;
|
||||
}
|
||||
return ptr;
|
||||
}
|
||||
|
||||
static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
|
||||
{
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
VFIORegion *region = &migration->region;
|
||||
uint64_t data_offset = 0, data_size = 0, sz;
|
||||
int ret;
|
||||
|
||||
ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
|
||||
region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
ret = vfio_mig_read(vbasedev, &data_size, sizeof(data_size),
|
||||
region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
|
||||
migration->pending_bytes);
|
||||
|
||||
qemu_put_be64(f, data_size);
|
||||
sz = data_size;
|
||||
|
||||
while (sz) {
|
||||
void *buf;
|
||||
uint64_t sec_size;
|
||||
bool buf_allocated = false;
|
||||
|
||||
buf = get_data_section_size(region, data_offset, sz, &sec_size);
|
||||
|
||||
if (!buf) {
|
||||
buf = g_try_malloc(sec_size);
|
||||
if (!buf) {
|
||||
error_report("%s: Error allocating buffer ", __func__);
|
||||
return -ENOMEM;
|
||||
}
|
||||
buf_allocated = true;
|
||||
|
||||
ret = vfio_mig_read(vbasedev, buf, sec_size,
|
||||
region->fd_offset + data_offset);
|
||||
if (ret < 0) {
|
||||
g_free(buf);
|
||||
return ret;
|
||||
}
|
||||
return -EBADF;
|
||||
}
|
||||
|
||||
qemu_put_buffer(f, buf, sec_size);
|
||||
|
||||
if (buf_allocated) {
|
||||
g_free(buf);
|
||||
}
|
||||
sz -= sec_size;
|
||||
data_offset += sec_size;
|
||||
migration->data_fd = mig_state->data_fd;
|
||||
}
|
||||
|
||||
ret = qemu_file_get_error(f);
|
||||
trace_vfio_migration_set_state(vbasedev->name, mig_state_to_str(new_state));
|
||||
|
||||
if (!ret && size) {
|
||||
*size = data_size;
|
||||
return 0;
|
||||
|
||||
reset_device:
|
||||
if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
|
||||
hw_error("%s: Failed resetting device, err: %s", vbasedev->name,
|
||||
strerror(errno));
|
||||
}
|
||||
|
||||
bytes_transferred += data_size;
|
||||
migration->device_state = VFIO_DEVICE_STATE_RUNNING;
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
|
||||
uint64_t data_size)
|
||||
{
|
||||
VFIORegion *region = &vbasedev->migration->region;
|
||||
uint64_t data_offset = 0, size, report_size;
|
||||
int ret;
|
||||
|
||||
do {
|
||||
ret = vfio_mig_read(vbasedev, &data_offset, sizeof(data_offset),
|
||||
region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_offset));
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (data_offset + data_size > region->size) {
|
||||
/*
|
||||
* If data_size is greater than the data section of migration region
|
||||
* then iterate the write buffer operation. This case can occur if
|
||||
* size of migration region at destination is smaller than size of
|
||||
* migration region at source.
|
||||
*/
|
||||
report_size = size = region->size - data_offset;
|
||||
data_size -= size;
|
||||
} else {
|
||||
report_size = size = data_size;
|
||||
data_size = 0;
|
||||
}
|
||||
|
||||
trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
|
||||
|
||||
while (size) {
|
||||
void *buf;
|
||||
uint64_t sec_size;
|
||||
bool buf_alloc = false;
|
||||
|
||||
buf = get_data_section_size(region, data_offset, size, &sec_size);
|
||||
|
||||
if (!buf) {
|
||||
buf = g_try_malloc(sec_size);
|
||||
if (!buf) {
|
||||
error_report("%s: Error allocating buffer ", __func__);
|
||||
return -ENOMEM;
|
||||
}
|
||||
buf_alloc = true;
|
||||
}
|
||||
|
||||
qemu_get_buffer(f, buf, sec_size);
|
||||
|
||||
if (buf_alloc) {
|
||||
ret = vfio_mig_write(vbasedev, buf, sec_size,
|
||||
region->fd_offset + data_offset);
|
||||
g_free(buf);
|
||||
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
size -= sec_size;
|
||||
data_offset += sec_size;
|
||||
}
|
||||
|
||||
ret = vfio_mig_write(vbasedev, &report_size, sizeof(report_size),
|
||||
region->fd_offset + VFIO_MIG_STRUCT_OFFSET(data_size));
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
} while (data_size);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vfio_update_pending(VFIODevice *vbasedev)
|
||||
{
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
VFIORegion *region = &migration->region;
|
||||
uint64_t pending_bytes = 0;
|
||||
int ret;
|
||||
|
||||
ret = vfio_mig_read(vbasedev, &pending_bytes, sizeof(pending_bytes),
|
||||
region->fd_offset + VFIO_MIG_STRUCT_OFFSET(pending_bytes));
|
||||
if (ret < 0) {
|
||||
migration->pending_bytes = 0;
|
||||
return ret;
|
||||
}
|
||||
ret = qemu_file_get_to_fd(f, migration->data_fd, data_size);
|
||||
trace_vfio_load_state_device_data(vbasedev->name, data_size, ret);
|
||||
|
||||
migration->pending_bytes = pending_bytes;
|
||||
trace_vfio_update_pending(vbasedev->name, pending_bytes);
|
||||
return 0;
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
|
||||
@ -398,9 +213,55 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
|
||||
{
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
|
||||
if (migration->region.mmaps) {
|
||||
vfio_region_unmap(&migration->region);
|
||||
close(migration->data_fd);
|
||||
migration->data_fd = -1;
|
||||
}
|
||||
|
||||
static int vfio_query_stop_copy_size(VFIODevice *vbasedev,
                                     uint64_t *stop_copy_size)
{
    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
                              sizeof(struct vfio_device_feature_mig_data_size),
                              sizeof(uint64_t))] = {};
    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
    struct vfio_device_feature_mig_data_size *mig_data_size =
        (struct vfio_device_feature_mig_data_size *)feature->data;

    feature->argsz = sizeof(buf);
    feature->flags =
        VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIG_DATA_SIZE;

    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
        return -errno;
    }

    *stop_copy_size = mig_data_size->stop_copy_length;

    return 0;
}

/* Returns 1 if end-of-stream is reached, 0 if more data and -errno if error */
static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
{
    ssize_t data_size;

    data_size = read(migration->data_fd, migration->data_buffer,
                     migration->data_buffer_size);
    if (data_size < 0) {
        return -errno;
    }
    if (data_size == 0) {
        return 1;
    }

    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
    qemu_put_be64(f, data_size);
    qemu_put_buffer(f, migration->data_buffer, data_size);
    bytes_transferred += data_size;

    trace_vfio_save_block(migration->vbasedev->name, data_size);

    return qemu_file_get_error(f);
}
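/*
 * Sketch only (not part of the diff): the receive side mirrors
 * vfio_save_block() above -- device state bytes taken from the migration
 * stream are funneled straight into the kernel-provided data_fd, which is
 * what vfio_load_buffer() later in this patch does via qemu_file_get_to_fd().
 */
static int vfio_load_buffer_sketch(QEMUFile *f, VFIOMigration *migration,
                                   uint64_t data_size)
{
    /* Copy data_size bytes from the migration stream into the device. */
    return qemu_file_get_to_fd(f, migration->data_fd, data_size);
}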
/* ---------------------------------------------------------------------- */
|
||||
@ -409,169 +270,100 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
int ret;
|
||||
|
||||
trace_vfio_save_setup(vbasedev->name);
|
||||
uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE;
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
|
||||
|
||||
if (migration->region.mmaps) {
|
||||
/*
|
||||
* Calling vfio_region_mmap() from migration thread. Memory API called
|
||||
* from this function require locking the iothread when called from
|
||||
* outside the main loop thread.
|
||||
*/
|
||||
qemu_mutex_lock_iothread();
|
||||
ret = vfio_region_mmap(&migration->region);
|
||||
qemu_mutex_unlock_iothread();
|
||||
if (ret) {
|
||||
error_report("%s: Failed to mmap VFIO migration region: %s",
|
||||
vbasedev->name, strerror(-ret));
|
||||
error_report("%s: Falling back to slow path", vbasedev->name);
|
||||
}
|
||||
vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
|
||||
migration->data_buffer_size = MIN(VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE,
|
||||
stop_copy_size);
|
||||
migration->data_buffer = g_try_malloc0(migration->data_buffer_size);
|
||||
if (!migration->data_buffer) {
|
||||
error_report("%s: Failed to allocate migration data buffer",
|
||||
vbasedev->name);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
|
||||
VFIO_DEVICE_STATE_V1_SAVING);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to set state SAVING", vbasedev->name);
|
||||
return ret;
|
||||
}
|
||||
trace_vfio_save_setup(vbasedev->name, migration->data_buffer_size);
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
|
||||
|
||||
ret = qemu_file_get_error(f);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
return 0;
|
||||
return qemu_file_get_error(f);
|
||||
}
|
||||
|
||||
static void vfio_save_cleanup(void *opaque)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
|
||||
g_free(migration->data_buffer);
|
||||
migration->data_buffer = NULL;
|
||||
vfio_migration_cleanup(vbasedev);
|
||||
trace_vfio_save_cleanup(vbasedev->name);
|
||||
}
|
||||
|
||||
static void vfio_state_pending(void *opaque, uint64_t *must_precopy,
|
||||
uint64_t *can_postcopy)
|
||||
/*
|
||||
* Migration size of VFIO devices can be as little as a few KBs or as big as
|
||||
* many GBs. This value should be big enough to cover the worst case.
|
||||
*/
|
||||
#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
|
||||
|
||||
/*
|
||||
* Only exact function is implemented and not estimate function. The reason is
|
||||
* that during pre-copy phase of migration the estimate function is called
|
||||
* repeatedly while pending RAM size is over the threshold, thus migration
|
||||
* can't converge and querying the VFIO device pending data size is useless.
|
||||
*/
|
||||
static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
|
||||
uint64_t *can_postcopy)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
int ret;
|
||||
uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
|
||||
|
||||
ret = vfio_update_pending(vbasedev);
|
||||
if (ret) {
|
||||
return;
|
||||
}
|
||||
/*
|
||||
* If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
|
||||
* reported so downtime limit won't be violated.
|
||||
*/
|
||||
vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
|
||||
*must_precopy += stop_copy_size;
|
||||
|
||||
*must_precopy += migration->pending_bytes;
|
||||
|
||||
trace_vfio_state_pending(vbasedev->name, *must_precopy, *can_postcopy);
|
||||
trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
|
||||
stop_copy_size);
|
||||
}
|
||||
|
||||
static int vfio_save_iterate(QEMUFile *f, void *opaque)
|
||||
static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
uint64_t data_size;
|
||||
int ret;
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
|
||||
|
||||
if (migration->pending_bytes == 0) {
|
||||
ret = vfio_update_pending(vbasedev);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (migration->pending_bytes == 0) {
|
||||
qemu_put_be64(f, 0);
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
|
||||
/* indicates data finished, goto complete phase */
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
ret = vfio_save_buffer(f, vbasedev, &data_size);
|
||||
/* We reach here with device state STOP only */
|
||||
ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
|
||||
VFIO_DEVICE_STATE_STOP);
|
||||
if (ret) {
|
||||
error_report("%s: vfio_save_buffer failed %s", vbasedev->name,
|
||||
strerror(errno));
|
||||
return ret;
|
||||
}
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
|
||||
do {
|
||||
ret = vfio_save_block(f, vbasedev->migration);
|
||||
if (ret < 0) {
|
||||
return ret;
|
||||
}
|
||||
} while (!ret);
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
|
||||
ret = qemu_file_get_error(f);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* Reset pending_bytes as state_pending* are not called during
|
||||
* savevm or snapshot case, in such case vfio_update_pending() at
|
||||
* the start of this function updates pending_bytes.
|
||||
* If setting the device in STOP state fails, the device should be reset.
|
||||
* To do so, use ERROR state as a recover state.
|
||||
*/
|
||||
migration->pending_bytes = 0;
|
||||
trace_vfio_save_iterate(vbasedev->name, data_size);
|
||||
return 0;
|
||||
}
|
||||
ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
|
||||
VFIO_DEVICE_STATE_ERROR);
|
||||
trace_vfio_save_complete_precopy(vbasedev->name, ret);
|
||||
|
||||
static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
uint64_t data_size;
|
||||
int ret;
|
||||
|
||||
ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_RUNNING,
|
||||
VFIO_DEVICE_STATE_V1_SAVING);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to set state STOP and SAVING",
|
||||
vbasedev->name);
|
||||
return ret;
|
||||
}
|
||||
|
||||
ret = vfio_update_pending(vbasedev);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
while (migration->pending_bytes > 0) {
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
|
||||
ret = vfio_save_buffer(f, vbasedev, &data_size);
|
||||
if (ret < 0) {
|
||||
error_report("%s: Failed to save buffer", vbasedev->name);
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (data_size == 0) {
|
||||
break;
|
||||
}
|
||||
|
||||
ret = vfio_update_pending(vbasedev);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
||||
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
|
||||
|
||||
ret = qemu_file_get_error(f);
|
||||
if (ret) {
|
||||
return ret;
|
||||
}
|
||||
|
||||
ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to set state STOPPED", vbasedev->name);
|
||||
return ret;
|
||||
}
|
||||
|
||||
trace_vfio_save_complete_precopy(vbasedev->name);
|
||||
return ret;
|
||||
}
|
||||
|
||||
@ -591,28 +383,9 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
|
||||
static int vfio_load_setup(QEMUFile *f, void *opaque)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
int ret = 0;
|
||||
|
||||
if (migration->region.mmaps) {
|
||||
ret = vfio_region_mmap(&migration->region);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to mmap VFIO migration region %d: %s",
|
||||
vbasedev->name, migration->region.nr,
|
||||
strerror(-ret));
|
||||
error_report("%s: Falling back to slow path", vbasedev->name);
|
||||
}
|
||||
}
|
||||
|
||||
ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_MASK,
|
||||
VFIO_DEVICE_STATE_V1_RESUMING);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to set state RESUMING", vbasedev->name);
|
||||
if (migration->region.mmaps) {
|
||||
vfio_region_unmap(&migration->region);
|
||||
}
|
||||
}
|
||||
return ret;
|
||||
return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
|
||||
vbasedev->migration->device_state);
|
||||
}
|
||||
|
||||
static int vfio_load_cleanup(void *opaque)
|
||||
@ -621,6 +394,7 @@ static int vfio_load_cleanup(void *opaque)
|
||||
|
||||
vfio_migration_cleanup(vbasedev);
|
||||
trace_vfio_load_cleanup(vbasedev->name);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -678,12 +452,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
|
||||
return ret;
|
||||
}
|
||||
|
||||
static SaveVMHandlers savevm_vfio_handlers = {
|
||||
static const SaveVMHandlers savevm_vfio_handlers = {
|
||||
.save_setup = vfio_save_setup,
|
||||
.save_cleanup = vfio_save_cleanup,
|
||||
.state_pending_exact = vfio_state_pending,
|
||||
.state_pending_estimate = vfio_state_pending,
|
||||
.save_live_iterate = vfio_save_iterate,
|
||||
.state_pending_exact = vfio_state_pending_exact,
|
||||
.save_live_complete_precopy = vfio_save_complete_precopy,
|
||||
.save_state = vfio_save_state,
|
||||
.load_setup = vfio_load_setup,
|
||||
@ -696,56 +468,33 @@ static SaveVMHandlers savevm_vfio_handlers = {
|
||||
static void vfio_vmstate_change(void *opaque, bool running, RunState state)
|
||||
{
|
||||
VFIODevice *vbasedev = opaque;
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
uint32_t value, mask;
|
||||
enum vfio_device_mig_state new_state;
|
||||
int ret;
|
||||
|
||||
if (vbasedev->migration->vm_running == running) {
|
||||
return;
|
||||
}
|
||||
|
||||
if (running) {
|
||||
/*
|
||||
* Here device state can have one of _SAVING, _RESUMING or _STOP bit.
|
||||
* Transition from _SAVING to _RUNNING can happen if there is migration
|
||||
* failure, in that case clear _SAVING bit.
|
||||
* Transition from _RESUMING to _RUNNING occurs during resuming
|
||||
* phase, in that case clear _RESUMING bit.
|
||||
* In both the above cases, set _RUNNING bit.
|
||||
*/
|
||||
mask = ~VFIO_DEVICE_STATE_MASK;
|
||||
value = VFIO_DEVICE_STATE_V1_RUNNING;
|
||||
new_state = VFIO_DEVICE_STATE_RUNNING;
|
||||
} else {
|
||||
/*
|
||||
* Here device state could be either _RUNNING or _SAVING|_RUNNING. Reset
|
||||
* _RUNNING bit
|
||||
*/
|
||||
mask = ~VFIO_DEVICE_STATE_V1_RUNNING;
|
||||
|
||||
/*
|
||||
* When VM state transition to stop for savevm command, device should
|
||||
* start saving data.
|
||||
*/
|
||||
if (state == RUN_STATE_SAVE_VM) {
|
||||
value = VFIO_DEVICE_STATE_V1_SAVING;
|
||||
} else {
|
||||
value = 0;
|
||||
}
|
||||
new_state = VFIO_DEVICE_STATE_STOP;
|
||||
}
|
||||
|
||||
ret = vfio_migration_set_state(vbasedev, mask, value);
|
||||
/*
|
||||
* If setting the device in new_state fails, the device should be reset.
|
||||
* To do so, use ERROR state as a recover state.
|
||||
*/
|
||||
ret = vfio_migration_set_state(vbasedev, new_state,
|
||||
VFIO_DEVICE_STATE_ERROR);
|
||||
if (ret) {
|
||||
/*
|
||||
* Migration should be aborted in this case, but vm_state_notify()
|
||||
* currently does not support reporting failures.
|
||||
*/
|
||||
error_report("%s: Failed to set device state 0x%x", vbasedev->name,
|
||||
(migration->device_state & mask) | value);
|
||||
qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
|
||||
if (migrate_get_current()->to_dst_file) {
|
||||
qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
|
||||
}
|
||||
}
|
||||
vbasedev->migration->vm_running = running;
|
||||
|
||||
trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
|
||||
(migration->device_state & mask) | value);
|
||||
mig_state_to_str(new_state));
|
||||
}
|
||||
|
||||
static void vfio_migration_state_notifier(Notifier *notifier, void *data)
|
||||
@ -754,7 +503,6 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
|
||||
VFIOMigration *migration = container_of(notifier, VFIOMigration,
|
||||
migration_state);
|
||||
VFIODevice *vbasedev = migration->vbasedev;
|
||||
int ret;
|
||||
|
||||
trace_vfio_migration_state_notifier(vbasedev->name,
|
||||
MigrationStatus_str(s->state));
|
||||
@ -764,34 +512,57 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
|
||||
case MIGRATION_STATUS_CANCELLED:
|
||||
case MIGRATION_STATUS_FAILED:
|
||||
bytes_transferred = 0;
|
||||
ret = vfio_migration_set_state(vbasedev,
|
||||
~(VFIO_DEVICE_STATE_V1_SAVING |
|
||||
VFIO_DEVICE_STATE_V1_RESUMING),
|
||||
VFIO_DEVICE_STATE_V1_RUNNING);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to set state RUNNING", vbasedev->name);
|
||||
}
|
||||
/*
|
||||
* If setting the device in RUNNING state fails, the device should
|
||||
* be reset. To do so, use ERROR state as a recover state.
|
||||
*/
|
||||
vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
|
||||
VFIO_DEVICE_STATE_ERROR);
|
||||
}
|
||||
}
|
||||
|
||||
static void vfio_migration_exit(VFIODevice *vbasedev)
|
||||
{
|
||||
VFIOMigration *migration = vbasedev->migration;
|
||||
|
||||
vfio_region_exit(&migration->region);
|
||||
vfio_region_finalize(&migration->region);
|
||||
g_free(vbasedev->migration);
|
||||
vbasedev->migration = NULL;
|
||||
}
|
||||
|
||||
static int vfio_migration_init(VFIODevice *vbasedev,
|
||||
struct vfio_region_info *info)
|
||||
static int vfio_migration_query_flags(VFIODevice *vbasedev, uint64_t *mig_flags)
|
||||
{
|
||||
uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
|
||||
sizeof(struct vfio_device_feature_migration),
|
||||
sizeof(uint64_t))] = {};
|
||||
struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
|
||||
struct vfio_device_feature_migration *mig =
|
||||
(struct vfio_device_feature_migration *)feature->data;
|
||||
|
||||
feature->argsz = sizeof(buf);
|
||||
feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;
|
||||
if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
|
||||
if (errno == ENOTTY) {
|
||||
error_report("%s: VFIO migration is not supported in kernel",
|
||||
vbasedev->name);
|
||||
} else {
|
||||
error_report("%s: Failed to query VFIO migration support, err: %s",
|
||||
vbasedev->name, strerror(errno));
|
||||
}
|
||||
|
||||
return -errno;
|
||||
}
|
||||
|
||||
*mig_flags = mig->flags;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vfio_migration_init(VFIODevice *vbasedev)
|
||||
{
|
||||
int ret;
|
||||
Object *obj;
|
||||
VFIOMigration *migration;
|
||||
char id[256] = "";
|
||||
g_autofree char *path = NULL, *oid = NULL;
|
||||
uint64_t mig_flags = 0;
|
||||
|
||||
if (!vbasedev->ops->vfio_get_object) {
|
||||
return -EINVAL;
|
||||
@ -802,27 +573,21 @@ static int vfio_migration_init(VFIODevice *vbasedev,
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
vbasedev->migration = g_new0(VFIOMigration, 1);
|
||||
vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
|
||||
vbasedev->migration->vm_running = runstate_is_running();
|
||||
|
||||
ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
|
||||
info->index, "migration");
|
||||
ret = vfio_migration_query_flags(vbasedev, &mig_flags);
|
||||
if (ret) {
|
||||
error_report("%s: Failed to setup VFIO migration region %d: %s",
|
||||
vbasedev->name, info->index, strerror(-ret));
|
||||
goto err;
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (!vbasedev->migration->region.size) {
|
||||
error_report("%s: Invalid zero-sized VFIO migration region %d",
|
||||
vbasedev->name, info->index);
|
||||
ret = -EINVAL;
|
||||
goto err;
|
||||
/* Basic migration functionality must be supported */
|
||||
if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
vbasedev->migration = g_new0(VFIOMigration, 1);
|
||||
migration = vbasedev->migration;
|
||||
migration->vbasedev = vbasedev;
|
||||
migration->device_state = VFIO_DEVICE_STATE_RUNNING;
|
||||
migration->data_fd = -1;
|
||||
|
||||
oid = vmstate_if_get_id(VMSTATE_IF(DEVICE(obj)));
|
||||
if (oid) {
|
||||
@ -840,11 +605,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
|
||||
vbasedev);
|
||||
migration->migration_state.notify = vfio_migration_state_notifier;
|
||||
add_migration_state_change_notifier(&migration->migration_state);
|
||||
return 0;
|
||||
|
||||
err:
|
||||
vfio_migration_exit(vbasedev);
|
||||
return ret;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* ---------------------------------------------------------------------- */
|
||||
@ -856,35 +618,28 @@ int64_t vfio_mig_bytes_transferred(void)
|
||||
|
||||
int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
|
||||
{
|
||||
VFIOContainer *container = vbasedev->group->container;
|
||||
struct vfio_region_info *info = NULL;
|
||||
int ret = -ENOTSUP;
|
||||
|
||||
if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
|
||||
if (!vbasedev->enable_migration) {
|
||||
goto add_blocker;
|
||||
}
|
||||
|
||||
ret = vfio_get_dev_region_info(vbasedev,
|
||||
VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
|
||||
VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
|
||||
&info);
|
||||
ret = vfio_migration_init(vbasedev);
|
||||
if (ret) {
|
||||
goto add_blocker;
|
||||
}
|
||||
|
||||
ret = vfio_migration_init(vbasedev, info);
|
||||
ret = vfio_block_multiple_devices_migration(errp);
|
||||
if (ret) {
|
||||
goto add_blocker;
|
||||
return ret;
|
||||
}
|
||||
|
||||
trace_vfio_migration_probe(vbasedev->name, info->index);
|
||||
g_free(info);
|
||||
trace_vfio_migration_probe(vbasedev->name);
|
||||
return 0;
|
||||
|
||||
add_blocker:
|
||||
error_setg(&vbasedev->migration_blocker,
|
||||
"VFIO device doesn't support migration");
|
||||
g_free(info);
|
||||
|
||||
ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
|
||||
if (ret < 0) {
|
||||
@ -903,6 +658,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
|
||||
qemu_del_vm_change_state_handler(migration->vm_state);
|
||||
unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
|
||||
vfio_migration_exit(vbasedev);
|
||||
vfio_unblock_multiple_devices_migration();
|
||||
}
|
||||
|
||||
if (vbasedev->migration_blocker) {
|
||||
|
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
@@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
vfio_dma_unmap_overflow_workaround(void) ""
vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64

# platform.c
vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
@@ -148,21 +150,17 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
vfio_display_edid_write_error(void) ""

# migration.c
vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
vfio_save_setup(const char *name) " (%s)"
vfio_save_cleanup(const char *name) " (%s)"
vfio_save_buffer(const char *name, uint64_t data_offset, uint64_t data_size, uint64_t pending) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64" pending 0x%"PRIx64
vfio_update_pending(const char *name, uint64_t pending) " (%s) pending 0x%"PRIx64
vfio_save_device_config_state(const char *name) " (%s)"
vfio_state_pending(const char *name, uint64_t precopy, uint64_t postcopy) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64
vfio_save_iterate(const char *name, int data_size) " (%s) data_size %d"
vfio_save_complete_precopy(const char *name) " (%s)"
vfio_load_cleanup(const char *name) " (%s)"
vfio_load_device_config_state(const char *name) " (%s)"
vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
vfio_load_state_device_data(const char *name, uint64_t data_offset, uint64_t data_size) " (%s) Offset 0x%"PRIx64" size 0x%"PRIx64
vfio_load_cleanup(const char *name) " (%s)"
vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
vfio_migration_probe(const char *name) " (%s)"
vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
vfio_save_cleanup(const char *name) " (%s)"
vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
vfio_save_device_config_state(const char *name) " (%s)"
vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
@@ -61,11 +61,11 @@ typedef struct VFIORegion {
typedef struct VFIOMigration {
    struct VFIODevice *vbasedev;
    VMChangeStateEntry *vm_state;
    VFIORegion region;
    uint32_t device_state;
    int vm_running;
    Notifier migration_state;
    uint64_t pending_bytes;
    uint32_t device_state;
    int data_fd;
    void *data_buffer;
    size_t data_buffer_size;
} VFIOMigration;

typedef struct VFIOAddressSpace {
@@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
extern VFIOGroupList vfio_group_list;

bool vfio_mig_active(void);
int vfio_block_multiple_devices_migration(Error **errp);
void vfio_unblock_multiple_devices_migration(void);
int64_t vfio_mig_bytes_transferred(void);

#ifdef CONFIG_LINUX
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
@@ -743,6 +743,35 @@ extern "C" {
 */
#define DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED fourcc_mod_code(VIVANTE, 4)

/*
 * Vivante TS (tile-status) buffer modifiers. They can be combined with all of
 * the color buffer tiling modifiers defined above. When TS is present it's a
 * separate buffer containing the clear/compression status of each tile. The
 * modifiers are defined as VIVANTE_MOD_TS_c_s, where c is the color buffer
 * tile size in bytes covered by one entry in the status buffer and s is the
 * number of status bits per entry.
 * We reserve the top 8 bits of the Vivante modifier space for tile status
 * clear/compression modifiers, as future cores might add some more TS layout
 * variations.
 */
#define VIVANTE_MOD_TS_64_4 (1ULL << 48)
#define VIVANTE_MOD_TS_64_2 (2ULL << 48)
#define VIVANTE_MOD_TS_128_4 (3ULL << 48)
#define VIVANTE_MOD_TS_256_4 (4ULL << 48)
#define VIVANTE_MOD_TS_MASK (0xfULL << 48)

/*
 * Vivante compression modifiers. Those depend on a TS modifier being present
 * as the TS bits get reinterpreted as compression tags instead of simple
 * clear markers when compression is enabled.
 */
#define VIVANTE_MOD_COMP_DEC400 (1ULL << 52)
#define VIVANTE_MOD_COMP_MASK (0xfULL << 52)

/* Masking out the extension bits will yield the base modifier. */
#define VIVANTE_MOD_EXT_MASK (VIVANTE_MOD_TS_MASK | \
                              VIVANTE_MOD_COMP_MASK)

/* NVIDIA frame buffer modifiers */

/*
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
@@ -159,8 +159,10 @@ static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
 * in its bus driver structure (e.g. pci_driver::name). Must
 * not be an empty string.
 * @version: Driver version string; may be an empty string
 * @fw_version: Firmware version string; may be an empty string
 * @erom_version: Expansion ROM version string; may be an empty string
 * @fw_version: Firmware version string; driver defined; may be an
 *   empty string
 * @erom_version: Expansion ROM version string; driver defined; may be
 *   an empty string
 * @bus_info: Device bus address. This should match the dev_name()
 *   string for the underlying bus device, if there is one. May be
 *   an empty string.
@@ -179,10 +181,6 @@ static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
 *
 * Users can use the %ETHTOOL_GSSET_INFO command to get the number of
 * strings in any string set (from Linux 2.6.34).
 *
 * Drivers should set at most @driver, @version, @fw_version and
 * @bus_info in their get_drvinfo() implementation. The ethtool
 * core fills in the other fields using other driver operations.
 */
struct ethtool_drvinfo {
        uint32_t cmd;
@@ -1737,6 +1735,13 @@ enum ethtool_link_mode_bit_indices {
        ETHTOOL_LINK_MODE_100baseFX_Half_BIT = 90,
        ETHTOOL_LINK_MODE_100baseFX_Full_BIT = 91,
        ETHTOOL_LINK_MODE_10baseT1L_Full_BIT = 92,
        ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT = 93,
        ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT = 94,
        ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT = 95,
        ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT = 96,
        ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT = 97,
        ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT = 98,

        /* must be last entry */
        __ETHTOOL_LINK_MODE_MASK_NBITS
};
@@ -1848,6 +1853,7 @@ enum ethtool_link_mode_bit_indices {
#define SPEED_100000 100000
#define SPEED_200000 200000
#define SPEED_400000 400000
#define SPEED_800000 800000

#define SPEED_UNKNOWN -1
@@ -197,6 +197,10 @@
*
* 7.37
* - add FUSE_TMPFILE
*
* 7.38
* - add FUSE_EXPIRE_ONLY flag to fuse_notify_inval_entry
* - add FOPEN_PARALLEL_DIRECT_WRITES
*/

#ifndef _LINUX_FUSE_H

@@ -228,7 +232,7 @@
#define FUSE_KERNEL_VERSION 7

/** Minor version number of this interface */
#define FUSE_KERNEL_MINOR_VERSION 37
#define FUSE_KERNEL_MINOR_VERSION 38

/** The node ID of the root inode */
#define FUSE_ROOT_ID 1

@@ -300,6 +304,7 @@ struct fuse_file_lock {
* FOPEN_CACHE_DIR: allow caching this directory
* FOPEN_STREAM: the file is stream-like (no file position at all)
* FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
* FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)

@@ -307,6 +312,7 @@ struct fuse_file_lock {
#define FOPEN_CACHE_DIR (1 << 3)
#define FOPEN_STREAM (1 << 4)
#define FOPEN_NOFLUSH (1 << 5)
#define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)

/**
* INIT request/reply flags

@@ -487,6 +493,12 @@ struct fuse_file_lock {
*/
#define FUSE_SETXATTR_ACL_KILL_SGID (1 << 0)

/**
* notify_inval_entry flags
* FUSE_EXPIRE_ONLY
*/
#define FUSE_EXPIRE_ONLY (1 << 0)

enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2, /* no reply */

@@ -915,7 +927,7 @@ struct fuse_notify_inval_inode_out {
struct fuse_notify_inval_entry_out {
uint64_t parent;
uint32_t namelen;
uint32_t padding;
uint32_t flags;
};

struct fuse_notify_delete_out {
@@ -614,6 +614,9 @@
#define KEY_KBD_LAYOUT_NEXT 0x248 /* AC Next Keyboard Layout Select */
#define KEY_EMOJI_PICKER 0x249 /* Show/hide emoji picker (HUTRR101) */
#define KEY_DICTATE 0x24a /* Start or Stop Voice Dictation Session (HUTRR99) */
#define KEY_CAMERA_ACCESS_ENABLE 0x24b /* Enables programmatic access to camera devices. (HUTRR72) */
#define KEY_CAMERA_ACCESS_DISABLE 0x24c /* Disables programmatic access to camera devices. (HUTRR72) */
#define KEY_CAMERA_ACCESS_TOGGLE 0x24d /* Toggles the current state of the camera access control. (HUTRR72) */

#define KEY_BRIGHTNESS_MIN 0x250 /* Set Brightness to Minimum */
#define KEY_BRIGHTNESS_MAX 0x251 /* Set Brightness to Maximum */

@@ -1058,6 +1058,7 @@
/* Precision Time Measurement */
#define PCI_PTM_CAP 0x04 /* PTM Capability */
#define PCI_PTM_CAP_REQ 0x00000001 /* Requester capable */
#define PCI_PTM_CAP_RES 0x00000002 /* Responder capable */
#define PCI_PTM_CAP_ROOT 0x00000004 /* Root capable */
#define PCI_PTM_GRANULARITY_MASK 0x0000FF00 /* Clock granularity */
#define PCI_PTM_CTRL 0x08 /* PTM Control */

@@ -1119,6 +1120,7 @@
#define PCI_DOE_STATUS_DATA_OBJECT_READY 0x80000000 /* Data Object Ready */
#define PCI_DOE_WRITE 0x10 /* DOE Write Data Mailbox Register */
#define PCI_DOE_READ 0x14 /* DOE Read Data Mailbox Register */
#define PCI_DOE_CAP_SIZEOF 0x18 /* Size of DOE register block */

/* DOE Data Object - note not actually registers */
#define PCI_DOE_DATA_OBJECT_HEADER_1_VID 0x0000ffff
@@ -9,6 +9,7 @@
#define VIRTIO_BT_F_VND_HCI 0 /* Indicates vendor command support */
#define VIRTIO_BT_F_MSFT_EXT 1 /* Indicates MSFT vendor support */
#define VIRTIO_BT_F_AOSP_EXT 2 /* Indicates AOSP vendor support */
#define VIRTIO_BT_F_CONFIG_V2 3 /* Use second version configuration */

enum virtio_bt_config_type {
VIRTIO_BT_CONFIG_TYPE_PRIMARY = 0,

@@ -28,4 +29,11 @@ struct virtio_bt_config {
uint16_t msft_opcode;
} QEMU_PACKED;

struct virtio_bt_config_v2 {
uint8_t type;
uint8_t alignment;
uint16_t vendor;
uint16_t msft_opcode;
};

#endif /* _LINUX_VIRTIO_BT_H */

@@ -57,6 +57,9 @@
* Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
#define VIRTIO_NET_F_NOTF_COAL 53 /* Device supports notifications coalescing */
#define VIRTIO_NET_F_GUEST_USO4 54 /* Guest can handle USOv4 in. */
#define VIRTIO_NET_F_GUEST_USO6 55 /* Guest can handle USOv6 in. */
#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO in. */
#define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
#define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
#define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */

@@ -130,6 +133,7 @@ struct virtio_net_hdr_v1 {
#define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
#define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
#define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4& IPv6 UDP (USO) */
#define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
uint8_t gso_type;
__virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
@@ -43,6 +43,7 @@
#define __KVM_HAVE_VCPU_EVENTS

#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
#define KVM_DIRTY_LOG_PAGE_OFFSET 64

#define KVM_REG_SIZE(id) \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))

@@ -49,6 +49,9 @@ struct kvm_sregs {
struct kvm_riscv_config {
unsigned long isa;
unsigned long zicbom_block_size;
unsigned long mvendorid;
unsigned long marchid;
unsigned long mimpid;
};

/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */

@@ -53,14 +53,6 @@
/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256

struct kvm_memory_alias {
__u32 slot; /* this has a different namespace than memory slots */
__u32 flags;
__u64 guest_phys_addr;
__u64 memory_size;
__u64 target_phys_addr;
};

/* for KVM_GET_IRQCHIP and KVM_SET_IRQCHIP */
struct kvm_pic_state {
__u8 last_irr; /* edge detection */

@@ -214,6 +206,8 @@ struct kvm_msr_list {
struct kvm_msr_filter_range {
#define KVM_MSR_FILTER_READ (1 << 0)
#define KVM_MSR_FILTER_WRITE (1 << 1)
#define KVM_MSR_FILTER_RANGE_VALID_MASK (KVM_MSR_FILTER_READ | \
KVM_MSR_FILTER_WRITE)
__u32 flags;
__u32 nmsrs; /* number of msrs in bitmap */
__u32 base; /* MSR index the bitmap starts at */

@@ -224,6 +218,7 @@ struct kvm_msr_filter_range {
struct kvm_msr_filter {
#define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
#define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0)
#define KVM_MSR_FILTER_VALID_MASK (KVM_MSR_FILTER_DEFAULT_DENY)
__u32 flags;
struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
};
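For context only, a rough sketch of how a VMM might fill these structures, assuming the x86 KVM_X86_SET_MSR_FILTER vm ioctl and its bitmap convention (a clear bit denies the access, a set bit allows it); the MSR range used here is purely hypothetical:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Deny guest writes to 16 hypothetical MSRs starting at base 0x100; MSRs not
 * covered by any range fall back to the default (allow) policy. The range
 * flags must stay within KVM_MSR_FILTER_RANGE_VALID_MASK. */
static int install_msr_filter(int vm_fd)
{
    static __u8 bitmap[2];                /* 16 MSRs, all bits clear = deny */
    struct kvm_msr_filter filter;

    memset(&filter, 0, sizeof(filter));
    filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;   /* within KVM_MSR_FILTER_VALID_MASK */
    filter.ranges[0].flags = KVM_MSR_FILTER_WRITE;
    filter.ranges[0].base = 0x100;                 /* hypothetical MSR index */
    filter.ranges[0].nmsrs = 16;
    filter.ranges[0].bitmap = bitmap;

    return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}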
@@ -86,14 +86,6 @@ struct kvm_debug_guest {
/* *** End of deprecated interfaces *** */

/* for KVM_CREATE_MEMORY_REGION */
struct kvm_memory_region {
__u32 slot;
__u32 flags;
__u64 guest_phys_addr;
__u64 memory_size; /* bytes */
};

/* for KVM_SET_USER_MEMORY_REGION */
struct kvm_userspace_memory_region {
__u32 slot;

@@ -104,9 +96,9 @@ struct kvm_userspace_memory_region {
};

/*
* The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
* other bits are reserved for kvm internal use which are defined in
* include/linux/kvm_host.h.
* The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
* userspace, other bits are reserved for kvm internal use which are defined
* in include/linux/kvm_host.h.
*/
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY (1UL << 1)

@@ -483,6 +475,9 @@ struct kvm_run {
#define KVM_MSR_EXIT_REASON_INVAL (1 << 0)
#define KVM_MSR_EXIT_REASON_UNKNOWN (1 << 1)
#define KVM_MSR_EXIT_REASON_FILTER (1 << 2)
#define KVM_MSR_EXIT_REASON_VALID_MASK (KVM_MSR_EXIT_REASON_INVAL | \
KVM_MSR_EXIT_REASON_UNKNOWN | \
KVM_MSR_EXIT_REASON_FILTER)
__u32 reason; /* kernel -> user */
__u32 index; /* kernel -> user */
__u64 data; /* kernel <-> user */

@@ -1176,6 +1171,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_ZPCI_OP 221
#define KVM_CAP_S390_CPU_TOPOLOGY 222
#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
#define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225

#ifdef KVM_CAP_IRQ_ROUTING
@@ -1265,6 +1262,7 @@ struct kvm_x86_mce {
#define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3)
#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6)

struct kvm_xen_hvm_config {
__u32 flags;

@@ -1435,18 +1433,12 @@ struct kvm_vfio_spapr_tce {
__s32 tablefd;
};

/*
* ioctls for VM fds
*/
#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
/*
* KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
* a vcpu fd.
*/
#define KVM_CREATE_VCPU _IO(KVMIO, 0x41)
#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
/* KVM_SET_MEMORY_ALIAS is obsolete: */
#define KVM_SET_MEMORY_ALIAS _IOW(KVMIO, 0x43, struct kvm_memory_alias)
#define KVM_SET_NR_MMU_PAGES _IO(KVMIO, 0x44)
#define KVM_GET_NR_MMU_PAGES _IO(KVMIO, 0x45)
#define KVM_SET_USER_MEMORY_REGION _IOW(KVMIO, 0x46, \

@@ -1738,6 +1730,8 @@ enum pv_cmd_id {
KVM_PV_UNSHARE_ALL,
KVM_PV_INFO,
KVM_PV_DUMP,
KVM_PV_ASYNC_CLEANUP_PREPARE,
KVM_PV_ASYNC_CLEANUP_PERFORM,
};

struct kvm_pv_cmd {

@@ -1768,8 +1762,10 @@ struct kvm_xen_hvm_attr {
union {
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
struct {
__u64 gfn;
#define KVM_XEN_INVALID_GFN ((__u64)-1)
} shared_info;
struct {
__u32 send_port;

@@ -1801,6 +1797,7 @@ struct kvm_xen_hvm_attr {
} u;
};

/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
#define KVM_XEN_ATTR_TYPE_LONG_MODE 0x0
#define KVM_XEN_ATTR_TYPE_SHARED_INFO 0x1

@@ -1808,6 +1805,8 @@ struct kvm_xen_hvm_attr {
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
#define KVM_XEN_ATTR_TYPE_EVTCHN 0x3
#define KVM_XEN_ATTR_TYPE_XEN_VERSION 0x4
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG 0x5

/* Per-vCPU Xen attributes */
#define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)

@@ -1824,6 +1823,7 @@ struct kvm_xen_vcpu_attr {
__u16 pad[3];
union {
__u64 gpa;
#define KVM_XEN_INVALID_GPA ((__u64)-1)
__u64 pad[8];
struct {
__u64 state;
@@ -58,7 +58,7 @@

#define PSCI_1_1_FN_SYSTEM_RESET2 PSCI_0_2_FN(18)
#define PSCI_1_1_FN_MEM_PROTECT PSCI_0_2_FN(19)
#define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN(19)
#define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN(20)

#define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND PSCI_0_2_FN64(12)
#define PSCI_1_0_FN64_NODE_HW_STATE PSCI_0_2_FN64(13)

@@ -67,7 +67,7 @@
#define PSCI_1_0_FN64_STAT_COUNT PSCI_0_2_FN64(17)

#define PSCI_1_1_FN64_SYSTEM_RESET2 PSCI_0_2_FN64(18)
#define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(19)
#define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(20)

/* PSCI v0.2 power state encoding for CPU_SUSPEND function */
#define PSCI_0_2_POWER_STATE_ID_MASK 0xffff
@@ -819,12 +819,20 @@ struct vfio_device_feature {
* VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
* is supported in addition to the STOP_COPY states.
*
* VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
* PRE_COPY is supported in addition to the STOP_COPY states.
*
* VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
* means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
* in addition to the STOP_COPY states.
*
* Other combinations of flags have behavior to be defined in the future.
*/
struct vfio_device_feature_migration {
__aligned_u64 flags;
#define VFIO_MIGRATION_STOP_COPY (1 << 0)
#define VFIO_MIGRATION_P2P (1 << 1)
#define VFIO_MIGRATION_PRE_COPY (1 << 2)
};
#define VFIO_DEVICE_FEATURE_MIGRATION 1
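By way of illustration (not part of the diff), userspace would normally probe these flags through the generic VFIO_DEVICE_FEATURE ioctl on the device FD; a minimal sketch, assuming the usual vfio_device_feature header-plus-payload layout:

#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Returns the VFIO_MIGRATION_* flag mask, or 0 if v2 migration is not
 * supported by this device. */
static __u64 query_migration_flags(int device_fd)
{
    __u8 buf[sizeof(struct vfio_device_feature) +
             sizeof(struct vfio_device_feature_migration)]
        __attribute__((aligned(8))) = { 0 };
    struct vfio_device_feature *feature = (void *)buf;
    struct vfio_device_feature_migration *mig = (void *)feature->data;

    feature->argsz = sizeof(buf);
    feature->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_MIGRATION;

    if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature))
        return 0;

    return mig->flags;   /* e.g. test for VFIO_MIGRATION_PRE_COPY support */
}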
@@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
* RESUMING - The device is stopped and is loading a new internal state
* ERROR - The device has failed and must be reset
*
* And 1 optional state to support VFIO_MIGRATION_P2P:
* And optional states to support VFIO_MIGRATION_P2P:
* RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
* And VFIO_MIGRATION_PRE_COPY:
* PRE_COPY - The device is running normally but tracking internal state
* changes
* And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
* PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
*
* The FSM takes actions on the arcs between FSM states. The driver implements
* the following behavior for the FSM arcs:
@@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
*
* To abort a RESUMING session the device must be reset.
*
* PRE_COPY -> RUNNING
* RUNNING_P2P -> RUNNING
* While in RUNNING the device is fully operational, the device may generate
* interrupts, DMA, respond to MMIO, all vfio device regions are functional,
* and the device may advance its internal state.
*
* The PRE_COPY arc will terminate a data transfer session.
*
* PRE_COPY_P2P -> RUNNING_P2P
* RUNNING -> RUNNING_P2P
* STOP -> RUNNING_P2P
* While in RUNNING_P2P the device is partially running in the P2P quiescent
* state defined below.
*
* STOP -> STOP_COPY
* This arc begin the process of saving the device state and will return a
* new data_fd.
* The PRE_COPY_P2P arc will terminate a data transfer session.
*
* RUNNING -> PRE_COPY
* RUNNING_P2P -> PRE_COPY_P2P
* STOP -> STOP_COPY
* PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
* which share a data transfer session. Moving between these states alters
* what is streamed in session, but does not terminate or otherwise affect
* the associated fd.
*
* These arcs begin the process of saving the device state and will return a
* new data_fd. The migration driver may perform actions such as enabling
* dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
*
* Each arc does not change the device operation, the device remains
* RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
* in PRE_COPY_P2P -> STOP_COPY.
*
* PRE_COPY -> PRE_COPY_P2P
* Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
* However, while in the PRE_COPY_P2P state, the device is partially running
* in the P2P quiescent state defined below, like RUNNING_P2P.
*
* PRE_COPY_P2P -> PRE_COPY
* This arc allows returning the device to a full RUNNING behavior while
* continuing all the behaviors of PRE_COPY.
*
* PRE_COPY_P2P -> STOP_COPY
* While in the STOP_COPY state the device has the same behavior as STOP
* with the addition that the data transfers session continues to stream the
* migration state. End of stream on the FD indicates the entire device
@@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
* device state for this arc if required to prepare the device to receive the
* migration data.
*
* STOP_COPY -> PRE_COPY
* STOP_COPY -> PRE_COPY_P2P
* These arcs are not permitted and return error if requested. Future
* revisions of this API may define behaviors for these arcs, in this case
* support will be discoverable by a new flag in
* VFIO_DEVICE_FEATURE_MIGRATION.
*
* any -> ERROR
* ERROR cannot be specified as a device state, however any transition request
* can be failed with an errno return and may then move the device_state into

@@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
* The optional peer to peer (P2P) quiescent state is intended to be a quiescent
* state for the device for the purposes of managing multiple devices within a
* user context where peer-to-peer DMA between devices may be active. The
* RUNNING_P2P states must prevent the device from initiating
* RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
* any new P2P DMA transactions. If the device can identify P2P transactions
* then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
* driver must complete any such outstanding operations prior to completing the

@@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
* above FSM arcs. As there are multiple paths through the FSM arcs the path
* should be selected based on the following rules:
* - Select the shortest path.
* - The path cannot have saving group states as interior arcs, only
* starting/end states.
* Refer to vfio_mig_get_next_state() for the result of the algorithm.
*
* The automatic transit through the FSM arcs that make up the combination

@@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
* support them. The user can discover if these states are supported by using
* VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
* avoid knowing about these optional states if the kernel driver supports them.
*
* Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
* is not present.
*/
enum vfio_device_mig_state {
VFIO_DEVICE_STATE_ERROR = 0,

@@ -984,8 +1037,70 @@ enum vfio_device_mig_state {
VFIO_DEVICE_STATE_STOP_COPY = 3,
VFIO_DEVICE_STATE_RESUMING = 4,
VFIO_DEVICE_STATE_RUNNING_P2P = 5,
VFIO_DEVICE_STATE_PRE_COPY = 6,
VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
};
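A minimal sketch of how userspace drives these arcs (not part of this diff), assuming the VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE feature and its vfio_device_feature_mig_state payload from the same header:

#include <linux/vfio.h>
#include <sys/ioctl.h>

/* Request a new migration state; returns the data_fd handed back for the
 * saving-group states (PRE_COPY, PRE_COPY_P2P, STOP_COPY) and RESUMING,
 * or -1 on error / when no data stream is associated with the new state. */
static int set_mig_state(int device_fd, enum vfio_device_mig_state new_state)
{
    __u8 buf[sizeof(struct vfio_device_feature) +
             sizeof(struct vfio_device_feature_mig_state)]
        __attribute__((aligned(8))) = { 0 };
    struct vfio_device_feature *feature = (void *)buf;
    struct vfio_device_feature_mig_state *mig = (void *)feature->data;

    feature->argsz = sizeof(buf);
    feature->flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
    mig->device_state = new_state;

    if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature))
        return -1;      /* on failure the device may have moved to ERROR */

    return mig->data_fd;
}

For example, set_mig_state(device_fd, VFIO_DEVICE_STATE_PRE_COPY) opens the pre-copy session whose data FD the VFIO_MIG_GET_PRECOPY_INFO ioctl described below operates on.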
/**
* VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
*
* This ioctl is used on the migration data FD in the precopy phase of the
* migration data transfer. It returns an estimate of the current data sizes
* remaining to be transferred. It allows the user to judge when it is
* appropriate to leave PRE_COPY for STOP_COPY.
*
* This ioctl is valid only in PRE_COPY states and kernel driver should
* return -EINVAL from any other migration state.
*
* The vfio_precopy_info data structure returned by this ioctl provides
* estimates of data available from the device during the PRE_COPY states.
* This estimate is split into two categories, initial_bytes and
* dirty_bytes.
*
* The initial_bytes field indicates the amount of initial precopy
* data available from the device. This field should have a non-zero initial
* value and decrease as migration data is read from the device.
* It is recommended to leave PRE_COPY for STOP_COPY only after this field
* reaches zero. Leaving PRE_COPY earlier might make things slower.
*
* The dirty_bytes field tracks device state changes relative to data
* previously retrieved. This field starts at zero and may increase as
* the internal device state is modified or decrease as that modified
* state is read from the device.
*
* Userspace may use the combination of these fields to estimate the
* potential data size available during the PRE_COPY phases, as well as
* trends relative to the rate the device is dirtying its internal
* state, but these fields are not required to have any bearing relative
* to the data size available during the STOP_COPY phase.
*
* Drivers have a lot of flexibility in when and what they transfer during the
* PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
*
* During pre-copy the migration data FD has a temporary "end of stream" that is
* reached when both initial_bytes and dirty_byte are zero. For instance, this
* may indicate that the device is idle and not currently dirtying any internal
* state. When read() is done on this temporary end of stream the kernel driver
* should return ENOMSG from read(). Userspace can wait for more data (which may
* never come) by using poll.
*
* Once in STOP_COPY the migration data FD has a permanent end of stream
* signaled in the usual way by read() always returning 0 and poll always
* returning readable. ENOMSG may not be returned in STOP_COPY.
* Support for this ioctl is mandatory if a driver claims to support
* VFIO_MIGRATION_PRE_COPY.
*
* Return: 0 on success, -1 and errno set on failure.
*/
struct vfio_precopy_info {
__u32 argsz;
__u32 flags;
__aligned_u64 initial_bytes;
__aligned_u64 dirty_bytes;
};

#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
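To make the pre-copy flow concrete, a hedged userspace sketch (error handling trimmed) of consuming these estimates and the temporary end of stream on the data FD returned when entering PRE_COPY:

#include <errno.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Query how much pre-copy data the device currently estimates is pending;
 * the kernel returns -EINVAL outside of the PRE_COPY states. */
static int precopy_remaining(int data_fd, __u64 *remaining)
{
    struct vfio_precopy_info info = { .argsz = sizeof(info) };

    if (ioctl(data_fd, VFIO_MIG_GET_PRECOPY_INFO, &info))
        return -1;

    *remaining = info.initial_bytes + info.dirty_bytes;
    return 0;
}

/* read() during PRE_COPY: ENOMSG marks the temporary end of stream, so the
 * caller may poll() and retry; in STOP_COPY a plain 0 marks the real end. */
static ssize_t read_precopy_chunk(int data_fd, void *to, size_t len)
{
    ssize_t n = read(data_fd, to, len);

    if (n < 0 && errno == ENOMSG)
        return 0;   /* nothing buffered right now */
    return n;
}

A typical loop would stay in PRE_COPY until initial_bytes reaches zero (or a downtime budget is met) and only then transition to STOP_COPY.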
/*
* Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
* state with the platform-based power management. Device use of lower power

@@ -1128,6 +1243,19 @@ struct vfio_device_feature_dma_logging_report {

#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8

/*
* Upon VFIO_DEVICE_FEATURE_GET read back the estimated data length that will
* be required to complete stop copy.
*
* Note: Can be called on each device state.
*/

struct vfio_device_feature_mig_data_size {
__aligned_u64 stop_copy_length;
};

#define VFIO_DEVICE_FEATURE_MIG_DATA_SIZE 9

/* -------- API for Type1 VFIO IOMMU -------- */

/**