ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem

ivshmem can be configured with and without interrupt capability
(a.k.a. "doorbell").  The two configurations have largely disjoint
options, which makes for a confusing (and badly checked) user
interface.  Moreover, the device can't tell the guest whether its
doorbell is enabled.

Create two new device models ivshmem-plain and ivshmem-doorbell, and
deprecate the old one.

Changes from ivshmem:

* PCI revision is 1 instead of 0.  The new revision is fully backwards
  compatible for guests.  Guests may elect to require at least
  revision 1 to make sure they're not exposed to the funny "no shared
  memory, yet" state.

* Property "role" replaced by "master".  role=master becomes
  master=on, role=peer becomes master=off.  Default is off instead of
  auto.

* Property "use64" is gone.  The new devices always have 64 bit BARs.

Changes from ivshmem to ivshmem-plain:

* The Interrupt Pin register in PCI config space is zero (does not use
  an interrupt pin) instead of one (uses INTA).

* Property "x-memdev" is renamed to "memdev".

* Properties "shm" and "size" are gone.  Use property "memdev"
  instead.

* Property "msi" is gone.  The new device can't have MSI-X capability.
  It can't interrupt anyway.

* Properties "ioeventfd" and "vectors" are gone.  They're meaningless
  without interrupts anyway.

Changes from ivshmem to ivshmem-doorbell:

* Property "msi" is gone.  The new device always has MSI-X capability.

* Property "ioeventfd" defaults to on instead of off.

* Property "size" is gone.  The new device can only map all the shared
  memory received from the server.

Guests can easily find out whether the device is configured for
interrupts by checking for MSI-X capability.

Note: some code added in sub-optimal places to make the diff easier to
review.  The next commit will move it to more sensible places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-37-git-send-email-armbru@redhat.com>
This commit is contained in:
Markus Armbruster 2016-03-15 19:34:51 +01:00
parent 2a845da736
commit 5400c02b90
4 changed files with 308 additions and 140 deletions

View File

@ -17,9 +17,10 @@ get interrupted by its peers.
There are two basic configurations: There are two basic configurations:
- Just shared memory: -device ivshmem,shm=NAME,... - Just shared memory: -device ivshmem-plain,memdev=HMB,...
This uses shared memory object NAME. This uses host memory backend HMB. It should have option "share"
set.
- Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,... - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...
@ -30,9 +31,8 @@ There are two basic configurations:
Each peer gets assigned a unique ID by the server. IDs must be Each peer gets assigned a unique ID by the server. IDs must be
between 0 and 65535. between 0 and 65535.
Interrupts are message-signaled by default (MSI-X). With msi=off Interrupts are message-signaled (MSI-X). vectors=N configures the
the device has no MSI-X capability, and uses legacy INTx instead. number of vectors to use.
vectors=N configures the number of vectors to use.
For more details on ivshmem device properties, see The QEMU Emulator For more details on ivshmem device properties, see The QEMU Emulator
User Documentation (qemu-doc.*). User Documentation (qemu-doc.*).
@ -40,14 +40,15 @@ User Documentation (qemu-doc.*).
== The ivshmem PCI device's guest interface == == The ivshmem PCI device's guest interface ==
The device has vendor ID 1af4, device ID 1110, revision 0. The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.
=== PCI BARs === === PCI BARs ===
The ivshmem PCI device has two or three BARs: The ivshmem PCI device has two or three BARs:
- BAR0 holds device registers (256 Byte MMIO) - BAR0 holds device registers (256 Byte MMIO)
- BAR1 holds MSI-X table and PBA (only when using MSI-X) - BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object - BAR2 maps the shared memory object
There are two ways to use this device: There are two ways to use this device:
@ -58,18 +59,19 @@ There are two ways to use this device:
user space (see http://dpdk.org/browse/memnic). user space (see http://dpdk.org/browse/memnic).
- If you additionally need the capability for peers to interrupt each - If you additionally need the capability for peers to interrupt each
other, you need BAR0 and, if using MSI-X, BAR1. You will most other, you need BAR0 and BAR1. You will most likely want to write a
likely want to write a kernel driver to handle interrupts. Requires kernel driver to handle interrupts. Requires the device to be
the device to be configured for interrupts, obviously. configured for interrupts, obviously.
Before QEMU 2.6.0, BAR2 can initially be invalid if the device is Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
configured for interrupts. It becomes safely accessible only after configured for interrupts. It becomes safely accessible only after
the ivshmem server provided the shared memory. Guest software should the ivshmem server provided the shared memory. These devices have PCI
wait for the IVPosition register (described below) to become revision 0 rather than 1. Guest software should wait for the
non-negative before accessing BAR2. IVPosition register (described below) to become non-negative before
accessing BAR2.
The device is not capable to tell guest software whether it is Revision 0 of the device is not capable to tell guest software whether
configured for interrupts. it is configured for interrupts.
=== PCI device registers === === PCI device registers ===
@ -77,10 +79,12 @@ BAR 0 contains the following registers:
Offset Size Access On reset Function Offset Size Access On reset Function
0 4 read/write 0 Interrupt Mask 0 4 read/write 0 Interrupt Mask
bit 0: peer interrupt bit 0: peer interrupt (rev 0)
reserved (rev 1)
bit 1..31: reserved bit 1..31: reserved
4 4 read/write 0 Interrupt Status 4 4 read/write 0 Interrupt Status
bit 0: peer interrupt bit 0: peer interrupt (rev 0)
reserved (rev 1)
bit 1..31: reserved bit 1..31: reserved
8 4 read-only 0 or ID IVPosition 8 4 read-only 0 or ID IVPosition
12 4 write-only N/A Doorbell 12 4 write-only N/A Doorbell
@ -92,18 +96,18 @@ Software should only access the registers as specified in column
"Access". Reserved bits should be ignored on read, and preserved on "Access". Reserved bits should be ignored on read, and preserved on
write. write.
Interrupt Status and Mask Register together control the legacy INTx In revision 0 of the device, Interrupt Status and Mask Register
interrupt when the device has no MSI-X capability: INTx is asserted together control the legacy INTx interrupt when the device has no
when the bit-wise AND of Status and Mask is non-zero and the device MSI-X capability: INTx is asserted when the bit-wise AND of Status and
has no MSI-X capability. Interrupt Status Register bit 0 becomes 1 Mask is non-zero and the device has no MSI-X capability. Interrupt
when an interrupt request from a peer is received. Reading the Status Register bit 0 becomes 1 when an interrupt request from a peer
register clears it. is received. Reading the register clears it.
IVPosition Register: if the device is not configured for interrupts, IVPosition Register: if the device is not configured for interrupts,
this is zero. Else, it is the device's ID (between 0 and 65535). this is zero. Else, it is the device's ID (between 0 and 65535).
Before QEMU 2.6.0, the register may read -1 for a short while after Before QEMU 2.6.0, the register may read -1 for a short while after
reset. reset. These devices have PCI revision 0 rather than 1.
There is no good way for software to find out whether the device is There is no good way for software to find out whether the device is
configured for interrupts. A positive IVPosition means interrupts, configured for interrupts. A positive IVPosition means interrupts,
@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored. The device is not
capable to tell guest software what peers are connected, or how many capable to tell guest software what peers are connected, or how many
interrupt vectors are connected. interrupt vectors are connected.
If the peer doesn't use MSI-X, its Interrupt Status register is set to The peer's interrupt for this vector then becomes pending. There is
1. This asserts INTx unless masked by the Interrupt Mask register. no way for software to clear the pending bit, and a polling mode of
The device is not capable to communicate the interrupt vector to guest operation is therefore impossible.
software then.
If the peer uses MSI-X, the interrupt for this vector becomes pending. If the peer is a revision 0 device without MSI-X capability, its
There is no way for software to clear the pending bit, and a polling Interrupt Status register is set to 1. This asserts INTx unless
mode of operation is therefore impossible with MSI-X. masked by the Interrupt Mask register. The device is not capable to
communicate the interrupt vector to guest software then.
With multiple MSI-X vectors, different vectors can be used to indicate With multiple MSI-X vectors, different vectors can be used to indicate
different events have occurred. The semantics of interrupt vectors different events have occurred. The semantics of interrupt vectors

View File

@ -29,6 +29,7 @@
#include "qom/object_interfaces.h" #include "qom/object_interfaces.h"
#include "sysemu/char.h" #include "sysemu/char.h"
#include "sysemu/hostmem.h" #include "sysemu/hostmem.h"
#include "sysemu/qtest.h"
#include "qapi/visitor.h" #include "qapi/visitor.h"
#include "exec/ram_addr.h" #include "exec/ram_addr.h"
@ -53,6 +54,18 @@
} \ } \
} while (0) } while (0)
#define TYPE_IVSHMEM_COMMON "ivshmem-common"
#define IVSHMEM_COMMON(obj) \
OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_COMMON)
#define TYPE_IVSHMEM_PLAIN "ivshmem-plain"
#define IVSHMEM_PLAIN(obj) \
OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_PLAIN)
#define TYPE_IVSHMEM_DOORBELL "ivshmem-doorbell"
#define IVSHMEM_DOORBELL(obj) \
OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_DOORBELL)
#define TYPE_IVSHMEM "ivshmem" #define TYPE_IVSHMEM "ivshmem"
#define IVSHMEM(obj) \ #define IVSHMEM(obj) \
OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM) OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
@ -81,8 +94,6 @@ typedef struct IVShmemState {
MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */ MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
MemoryRegion server_bar2; /* used with server_chr */ MemoryRegion server_bar2; /* used with server_chr */
size_t ivshmem_size; /* size of shared memory region */
uint32_t ivshmem_64bit;
Peer *peers; Peer *peers;
int nb_peers; /* space in @peers[] */ int nb_peers; /* space in @peers[] */
@ -97,9 +108,12 @@ typedef struct IVShmemState {
OnOffAuto master; OnOffAuto master;
Error *migration_blocker; Error *migration_blocker;
char * shmobj; /* legacy cruft */
char * sizearg; char *role;
char * role; char *shmobj;
char *sizearg;
size_t legacy_size;
uint32_t not_legacy_32bit;
} IVShmemState; } IVShmemState;
/* registers for the Inter-VM shared memory device */ /* registers for the Inter-VM shared memory device */
@ -126,8 +140,21 @@ static void ivshmem_update_irq(IVShmemState *s)
PCIDevice *d = PCI_DEVICE(s); PCIDevice *d = PCI_DEVICE(s);
uint32_t isr = s->intrstatus & s->intrmask; uint32_t isr = s->intrstatus & s->intrmask;
/* No INTx with msi=on, whether the guest enabled MSI-X or not */ /*
if (ivshmem_has_feature(s, IVSHMEM_MSI)) { * Do nothing unless the device actually uses INTx. Here's how
* the device variants signal interrupts, what they put in PCI
* config space:
* Device variant Interrupt Interrupt Pin MSI-X cap.
* ivshmem-plain none 0 no
* ivshmem-doorbell MSI-X 1 yes(1)
* ivshmem,msi=off INTx 1 no
* ivshmem,msi=on MSI-X 1(2) yes(1)
* (1) if guest enabled MSI-X
* (2) the device lies
* Leads to the condition for doing nothing:
*/
if (ivshmem_has_feature(s, IVSHMEM_MSI)
|| !d->config[PCI_INTERRUPT_PIN]) {
return; return;
} }
@ -259,7 +286,7 @@ static void ivshmem_vector_notify(void *opaque)
{ {
MSIVector *entry = opaque; MSIVector *entry = opaque;
PCIDevice *pdev = entry->pdev; PCIDevice *pdev = entry->pdev;
IVShmemState *s = IVSHMEM(pdev); IVShmemState *s = IVSHMEM_COMMON(pdev);
int vector = entry - s->msi_vectors; int vector = entry - s->msi_vectors;
EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
@ -280,7 +307,7 @@ static void ivshmem_vector_notify(void *opaque)
static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector, static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
MSIMessage msg) MSIMessage msg)
{ {
IVShmemState *s = IVSHMEM(dev); IVShmemState *s = IVSHMEM_COMMON(dev);
EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
MSIVector *v = &s->msi_vectors[vector]; MSIVector *v = &s->msi_vectors[vector];
int ret; int ret;
@ -297,7 +324,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector) static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector)
{ {
IVShmemState *s = IVSHMEM(dev); IVShmemState *s = IVSHMEM_COMMON(dev);
EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
int ret; int ret;
@ -314,7 +341,7 @@ static void ivshmem_vector_poll(PCIDevice *dev,
unsigned int vector_start, unsigned int vector_start,
unsigned int vector_end) unsigned int vector_end)
{ {
IVShmemState *s = IVSHMEM(dev); IVShmemState *s = IVSHMEM_COMMON(dev);
unsigned int vector; unsigned int vector;
IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end); IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
@ -461,6 +488,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
{ {
struct stat buf; struct stat buf;
size_t size;
void *ptr; void *ptr;
if (s->ivshmem_bar2) { if (s->ivshmem_bar2) {
@ -476,22 +504,28 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
return; return;
} }
if (s->ivshmem_size > buf.st_size) { size = buf.st_size;
error_setg(errp, "server sent only %zd bytes of shared memory",
(size_t)buf.st_size); /* Legacy cruft */
close(fd); if (s->legacy_size != SIZE_MAX) {
return; if (size < s->legacy_size) {
error_setg(errp, "server sent only %zd bytes of shared memory",
(size_t)buf.st_size);
close(fd);
return;
}
size = s->legacy_size;
} }
/* mmap the region and map into the BAR2 */ /* mmap the region and map into the BAR2 */
ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) { if (ptr == MAP_FAILED) {
error_setg_errno(errp, errno, "Failed to mmap shared memory"); error_setg_errno(errp, errno, "Failed to mmap shared memory");
close(fd); close(fd);
return; return;
} }
memory_region_init_ram_ptr(&s->server_bar2, OBJECT(s), memory_region_init_ram_ptr(&s->server_bar2, OBJECT(s),
"ivshmem.bar2", s->ivshmem_size, ptr); "ivshmem.bar2", size, ptr);
qemu_set_ram_fd(memory_region_get_ram_addr(&s->server_bar2), fd); qemu_set_ram_fd(memory_region_get_ram_addr(&s->server_bar2), fd);
s->ivshmem_bar2 = &s->server_bar2; s->ivshmem_bar2 = &s->server_bar2;
} }
@ -703,7 +737,7 @@ static void ivshmem_msix_vector_use(IVShmemState *s)
static void ivshmem_reset(DeviceState *d) static void ivshmem_reset(DeviceState *d)
{ {
IVShmemState *s = IVSHMEM(d); IVShmemState *s = IVSHMEM_COMMON(d);
s->intrstatus = 0; s->intrstatus = 0;
s->intrmask = 0; s->intrmask = 0;
@ -781,7 +815,7 @@ static void ivshmem_disable_irqfd(IVShmemState *s)
static void ivshmem_write_config(PCIDevice *pdev, uint32_t address, static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
uint32_t val, int len) uint32_t val, int len)
{ {
IVShmemState *s = IVSHMEM(pdev); IVShmemState *s = IVSHMEM_COMMON(pdev);
int is_enabled, was_enabled = msix_enabled(pdev); int is_enabled, was_enabled = msix_enabled(pdev);
pci_default_write_config(pdev, address, val, len); pci_default_write_config(pdev, address, val, len);
@ -805,7 +839,7 @@ static void desugar_shm(IVShmemState *s)
path = g_strdup_printf("/dev/shm/%s", s->shmobj); path = g_strdup_printf("/dev/shm/%s", s->shmobj);
object_property_set_str(obj, path, "mem-path", &error_abort); object_property_set_str(obj, path, "mem-path", &error_abort);
g_free(path); g_free(path);
object_property_set_int(obj, s->ivshmem_size, "size", &error_abort); object_property_set_int(obj, s->legacy_size, "size", &error_abort);
object_property_set_bool(obj, true, "share", &error_abort); object_property_set_bool(obj, true, "share", &error_abort);
object_property_add_child(OBJECT(s), "internal-shm-backend", obj, object_property_add_child(OBJECT(s), "internal-shm-backend", obj,
&error_abort); &error_abort);
@ -813,42 +847,14 @@ static void desugar_shm(IVShmemState *s)
s->hostmem = MEMORY_BACKEND(obj); s->hostmem = MEMORY_BACKEND(obj);
} }
static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
{ {
IVShmemState *s = IVSHMEM(dev); IVShmemState *s = IVSHMEM_COMMON(dev);
Error *err = NULL; Error *err = NULL;
uint8_t *pci_conf; uint8_t *pci_conf;
uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY | uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
PCI_BASE_ADDRESS_MEM_PREFETCH; PCI_BASE_ADDRESS_MEM_PREFETCH;
if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
error_setg(errp,
"You must specify either 'shm', 'chardev' or 'x-memdev'");
return;
}
if (s->hostmem) {
MemoryRegion *mr;
if (s->sizearg) {
g_warning("size argument ignored with hostmem");
}
mr = host_memory_backend_get_memory(s->hostmem, &error_abort);
s->ivshmem_size = memory_region_size(mr);
} else if (s->sizearg == NULL) {
s->ivshmem_size = 4 << 20; /* 4 MB default */
} else {
char *end;
int64_t size = qemu_strtosz(s->sizearg, &end);
if (size < 0 || (size_t)size != size || *end != '\0'
|| !is_power_of_2(size)) {
error_setg(errp, "Invalid size %s", s->sizearg);
return;
}
s->ivshmem_size = size;
}
/* IRQFD requires MSI */ /* IRQFD requires MSI */
if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) && if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
!ivshmem_has_feature(s, IVSHMEM_MSI)) { !ivshmem_has_feature(s, IVSHMEM_MSI)) {
@ -856,29 +862,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
return; return;
} }
/* check that role is reasonable */
if (s->role) {
if (strncmp(s->role, "peer", 5) == 0) {
s->master = ON_OFF_AUTO_OFF;
} else if (strncmp(s->role, "master", 7) == 0) {
s->master = ON_OFF_AUTO_ON;
} else {
error_setg(errp, "'role' must be 'peer' or 'master'");
return;
}
} else {
s->master = ON_OFF_AUTO_AUTO;
}
pci_conf = dev->config; pci_conf = dev->config;
pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY; pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
/*
* Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
* bald-faced lie then. But it's a backwards compatible lie.
*/
pci_config_set_interrupt_pin(pci_conf, 1);
memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s, memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
"ivshmem-mmio", IVSHMEM_REG_BAR_SIZE); "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE);
@ -886,14 +872,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
&s->ivshmem_mmio); &s->ivshmem_mmio);
if (s->ivshmem_64bit) { if (!s->not_legacy_32bit) {
attr |= PCI_BASE_ADDRESS_MEM_TYPE_64; attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
} }
if (s->shmobj) {
desugar_shm(s);
}
if (s->hostmem != NULL) { if (s->hostmem != NULL) {
IVSHMEM_DPRINTF("using hostmem\n"); IVSHMEM_DPRINTF("using hostmem\n");
@ -940,9 +922,68 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
} }
} }
static void pci_ivshmem_exit(PCIDevice *dev) static void ivshmem_realize(PCIDevice *dev, Error **errp)
{ {
IVShmemState *s = IVSHMEM(dev); IVShmemState *s = IVSHMEM_COMMON(dev);
if (!qtest_enabled()) {
error_report("ivshmem is deprecated, please use ivshmem-plain"
" or ivshmem-doorbell instead");
}
if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
error_setg(errp,
"You must specify either 'shm', 'chardev' or 'x-memdev'");
return;
}
if (s->hostmem) {
if (s->sizearg) {
g_warning("size argument ignored with hostmem");
}
} else if (s->sizearg == NULL) {
s->legacy_size = 4 << 20; /* 4 MB default */
} else {
char *end;
int64_t size = qemu_strtosz(s->sizearg, &end);
if (size < 0 || (size_t)size != size || *end != '\0'
|| !is_power_of_2(size)) {
error_setg(errp, "Invalid size %s", s->sizearg);
return;
}
s->legacy_size = size;
}
/* check that role is reasonable */
if (s->role) {
if (strncmp(s->role, "peer", 5) == 0) {
s->master = ON_OFF_AUTO_OFF;
} else if (strncmp(s->role, "master", 7) == 0) {
s->master = ON_OFF_AUTO_ON;
} else {
error_setg(errp, "'role' must be 'peer' or 'master'");
return;
}
} else {
s->master = ON_OFF_AUTO_AUTO;
}
if (s->shmobj) {
desugar_shm(s);
}
/*
* Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
* bald-faced lie then. But it's a backwards compatible lie.
*/
pci_config_set_interrupt_pin(dev->config, 1);
ivshmem_common_realize(dev, errp);
}
static void ivshmem_exit(PCIDevice *dev)
{
IVShmemState *s = IVSHMEM_COMMON(dev);
int i; int i;
if (s->migration_blocker) { if (s->migration_blocker) {
@ -955,7 +996,7 @@ static void pci_ivshmem_exit(PCIDevice *dev)
void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2); void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2);
int fd; int fd;
if (munmap(addr, s->ivshmem_size) == -1) { if (munmap(addr, memory_region_size(s->ivshmem_bar2) == -1)) {
error_report("Failed to munmap shared memory %s", error_report("Failed to munmap shared memory %s",
strerror(errno)); strerror(errno));
} }
@ -1075,26 +1116,37 @@ static Property ivshmem_properties[] = {
DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true), DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
DEFINE_PROP_STRING("shm", IVShmemState, shmobj), DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
DEFINE_PROP_STRING("role", IVShmemState, role), DEFINE_PROP_STRING("role", IVShmemState, role),
DEFINE_PROP_UINT32("use64", IVShmemState, ivshmem_64bit, 1), DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
DEFINE_PROP_END_OF_LIST(), DEFINE_PROP_END_OF_LIST(),
}; };
static void ivshmem_common_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
k->realize = ivshmem_common_realize;
k->exit = ivshmem_exit;
k->config_write = ivshmem_write_config;
k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
k->device_id = PCI_DEVICE_ID_IVSHMEM;
k->class_id = PCI_CLASS_MEMORY_RAM;
k->revision = 1;
dc->reset = ivshmem_reset;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
dc->desc = "Inter-VM shared memory";
}
static void ivshmem_class_init(ObjectClass *klass, void *data) static void ivshmem_class_init(ObjectClass *klass, void *data)
{ {
DeviceClass *dc = DEVICE_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass);
PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
k->realize = pci_ivshmem_realize; k->realize = ivshmem_realize;
k->exit = pci_ivshmem_exit; k->revision = 0;
k->config_write = ivshmem_write_config; dc->desc = "Inter-VM shared memory (legacy)";
k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
k->device_id = PCI_DEVICE_ID_IVSHMEM;
k->class_id = PCI_CLASS_MEMORY_RAM;
dc->reset = ivshmem_reset;
dc->props = ivshmem_properties; dc->props = ivshmem_properties;
dc->vmsd = &ivshmem_vmsd; dc->vmsd = &ivshmem_vmsd;
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
dc->desc = "Inter-VM shared memory";
} }
static void ivshmem_check_memdev_is_busy(Object *obj, const char *name, static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
@ -1123,16 +1175,121 @@ static void ivshmem_init(Object *obj)
&error_abort); &error_abort);
} }
static const TypeInfo ivshmem_common_info = {
.name = TYPE_IVSHMEM_COMMON,
.parent = TYPE_PCI_DEVICE,
.instance_size = sizeof(IVShmemState),
.abstract = true,
.class_init = ivshmem_common_class_init,
};
static const TypeInfo ivshmem_info = { static const TypeInfo ivshmem_info = {
.name = TYPE_IVSHMEM, .name = TYPE_IVSHMEM,
.parent = TYPE_PCI_DEVICE, .parent = TYPE_IVSHMEM_COMMON,
.instance_size = sizeof(IVShmemState), .instance_size = sizeof(IVShmemState),
.instance_init = ivshmem_init, .instance_init = ivshmem_init,
.class_init = ivshmem_class_init, .class_init = ivshmem_class_init,
}; };
static const VMStateDescription ivshmem_plain_vmsd = {
.name = TYPE_IVSHMEM_PLAIN,
.version_id = 0,
.minimum_version_id = 0,
.pre_load = ivshmem_pre_load,
.post_load = ivshmem_post_load,
.fields = (VMStateField[]) {
VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
VMSTATE_UINT32(intrstatus, IVShmemState),
VMSTATE_UINT32(intrmask, IVShmemState),
VMSTATE_END_OF_LIST()
},
};
static Property ivshmem_plain_properties[] = {
DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
DEFINE_PROP_END_OF_LIST(),
};
static void ivshmem_plain_init(Object *obj)
{
IVShmemState *s = IVSHMEM_PLAIN(obj);
object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
(Object **)&s->hostmem,
ivshmem_check_memdev_is_busy,
OBJ_PROP_LINK_UNREF_ON_RELEASE,
&error_abort);
}
static void ivshmem_plain_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->props = ivshmem_plain_properties;
dc->vmsd = &ivshmem_plain_vmsd;
}
static const TypeInfo ivshmem_plain_info = {
.name = TYPE_IVSHMEM_PLAIN,
.parent = TYPE_IVSHMEM_COMMON,
.instance_size = sizeof(IVShmemState),
.instance_init = ivshmem_plain_init,
.class_init = ivshmem_plain_class_init,
};
static const VMStateDescription ivshmem_doorbell_vmsd = {
.name = TYPE_IVSHMEM_DOORBELL,
.version_id = 0,
.minimum_version_id = 0,
.pre_load = ivshmem_pre_load,
.post_load = ivshmem_post_load,
.fields = (VMStateField[]) {
VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
VMSTATE_MSIX(parent_obj, IVShmemState),
VMSTATE_UINT32(intrstatus, IVShmemState),
VMSTATE_UINT32(intrmask, IVShmemState),
VMSTATE_END_OF_LIST()
},
};
static Property ivshmem_doorbell_properties[] = {
DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
true),
DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
DEFINE_PROP_END_OF_LIST(),
};
static void ivshmem_doorbell_init(Object *obj)
{
IVShmemState *s = IVSHMEM_DOORBELL(obj);
s->features |= (1 << IVSHMEM_MSI);
s->legacy_size = SIZE_MAX; /* whatever the server sends */
}
static void ivshmem_doorbell_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->props = ivshmem_doorbell_properties;
dc->vmsd = &ivshmem_doorbell_vmsd;
}
static const TypeInfo ivshmem_doorbell_info = {
.name = TYPE_IVSHMEM_DOORBELL,
.parent = TYPE_IVSHMEM_COMMON,
.instance_size = sizeof(IVShmemState),
.instance_init = ivshmem_doorbell_init,
.class_init = ivshmem_doorbell_class_init,
};
static void ivshmem_register_types(void) static void ivshmem_register_types(void)
{ {
type_register_static(&ivshmem_common_info);
type_register_static(&ivshmem_plain_info);
type_register_static(&ivshmem_doorbell_info);
type_register_static(&ivshmem_info); type_register_static(&ivshmem_info);
} }

View File

@ -1262,13 +1262,18 @@ basic example.
@subsection Inter-VM Shared Memory device @subsection Inter-VM Shared Memory device
With KVM enabled on a Linux host, a shared memory device is available. Guests On Linux hosts, a shared memory device is available. The basic syntax
map a POSIX shared memory region into the guest as a PCI device that enables is:
zero-copy communication to the application level of the guests. The basic
syntax is:
@example @example
qemu-system-i386 -device ivshmem,size=@var{size},shm=@var{shm-name} qemu-system-x86_64 -device ivshmem-plain,memdev=@var{hostmem}
@end example
where @var{hostmem} names a host memory backend. For a POSIX shared
memory backend, use something like
@example
-object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=@var{hostmem}
@end example @end example
If desired, interrupts can be sent between guest VMs accessing the same shared If desired, interrupts can be sent between guest VMs accessing the same shared
@ -1282,8 +1287,7 @@ memory server is:
ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors} ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors}
# Then start your qemu instances with matching arguments # Then start your qemu instances with matching arguments
qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chardev=@var{id} qemu-system-x86_64 -device ivshmem-doorbell,vectors=@var{vectors},chardev=@var{id}
[,msi=on][,ioeventfd=on][,role=peer|master]
-chardev socket,path=@var{path},id=@var{id} -chardev socket,path=@var{path},id=@var{id}
@end example @end example
@ -1291,12 +1295,11 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues
using the same server to communicate via interrupts. Guests can read their using the same server to communicate via interrupts. Guests can read their
VM ID from a device register (see ivshmem-spec.txt). VM ID from a device register (see ivshmem-spec.txt).
The @option{role} argument can be set to either master or peer and will affect With device property @option{master=on}, the guest will copy the shared
how the shared memory is migrated. With @option{role=master}, the guest will memory on migration to the destination host. With @option{master=off},
copy the shared memory on migration to the destination host. With the guest will not be able to migrate with the device attached. In the
@option{role=peer}, the guest will not be able to migrate with the device attached. latter case, the device should be detached and then reattached after
With the @option{peer} case, the device should be detached and then reattached migration using the PCI hotplug support.
after migration using the PCI hotplug support.
@subsubsection ivshmem and hugepages @subsubsection ivshmem and hugepages
@ -1304,8 +1307,8 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
a memory backend that has hugepage support: a memory backend that has hugepage support:
@example @example
qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 qemu-system-x86_64 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
-device ivshmem,x-memdev=mb1 -device ivshmem-plain,memdev=mb1
@end example @end example
ivshmem-server also supports hugepages mount points with the ivshmem-server also supports hugepages mount points with the

View File

@ -127,7 +127,9 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
static void setup_vm(IVState *s) static void setup_vm(IVState *s)
{ {
char *cmd = g_strdup_printf("-device ivshmem,shm=%s,size=1M", tmpshm); char *cmd = g_strdup_printf("-object memory-backend-file"
",id=mb1,size=1M,share,mem-path=/dev/shm%s"
" -device ivshmem-plain,memdev=mb1", tmpshm);
setup_vm_cmd(s, cmd, false); setup_vm_cmd(s, cmd, false);
@ -284,8 +286,10 @@ static void *server_thread(void *data)
static void setup_vm_with_server(IVState *s, int nvectors, bool msi) static void setup_vm_with_server(IVState *s, int nvectors, bool msi)
{ {
char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait " char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait "
"-device ivshmem,size=1M,chardev=chr0,vectors=%d,msi=%s", "-device ivshmem%s,chardev=chr0,vectors=%d",
tmpserver, nvectors, msi ? "true" : "false"); tmpserver,
msi ? "-doorbell" : ",size=1M,msi=off",
nvectors);
setup_vm_cmd(s, cmd, msi); setup_vm_cmd(s, cmd, msi);
@ -412,7 +416,7 @@ static void test_ivshmem_memdev(void)
/* just for the sake of checking memory-backend property */ /* just for the sake of checking memory-backend property */
setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1" setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1"
" -device ivshmem,x-memdev=mb1", false); " -device ivshmem-plain,memdev=mb1", false);
cleanup_vm(&state); cleanup_vm(&state);
} }