The virtio-blk-data-plane feature is easy to integrate into
hw/virtio-blk.c. The data plane can be started and stopped similar to
vhost-net.
Users can take advantage of the virtio-blk-data-plane feature using the
new -device virtio-blk-pci,x-data-plane=on property.
The x-data-plane name was chosen because at this stage the feature is
experimental and likely to see changes in the future.
If the VM configuration does not support virtio-blk-data-plane an error
message is printed. Although we could fall back to regular virtio-blk,
I prefer the explicit approach since it prompts the user to fix their
configuration if they want the performance benefit of
virtio-blk-data-plane.
Limitations:
* Only format=raw is supported
* Live migration is not supported
* Block jobs, hot unplug, and other operations fail with -EBUSY
* I/O throttling limits are ignored
* Only Linux hosts are supported due to Linux AIO usage
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Two slightly different versions of a patch to conditionally set
VIRTIO_BLK_F_CONFIG_WCE through the "config-wce" qdev property have been
applied (ea776abca and eec7f96c2). David Gibson
<david@gibson.dropbear.id.au> noticed that the "config-wce"
property is broken as a result and fixed it recently.
The fix sets the host_features VIRTIO_BLK_F_CONFIG_WCE bit from a qdev
property. Unfortunately, the virtio device then has no chance to test
for the presence of the feature bit during virtio_blk_init().
Therefore, reinstate the VirtIOBlkConf->config_wce flag. Drop the
duplicate qdev property to set the host_features bit. The
VirtIOBlkConf->config_wce flag will be used by virtio-blk-data-plane in
a later patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
At the moment, when irqfd is in use but a vector is masked,
qemu will poll it and handle vector masks in userspace.
Since almost no one ever looks at the pending bits,
it is better to defer this until pending bits
are actually read.
Implement this optimization using the new poll notifier.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move bindings from opaque to DeviceState.
This gives us better type safety with no performance cost.
Add macros to make future QOM work easier.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For the virtio-blk device (via virtio-pci) the property "config-wce" is
defined in two places. First, it's defined from the
DEFINE_VIRTIO_BLK_FEATURES macro, second it's defined directly in
virtio-pci, just two lines above the call to that macro.
The direct definition in virtio-pci.c is broken, since it operates on the
'config_wce' field of VirtIOBlkConf, which is never used anywhere else.
Therefore, this patch removes both the extra property definition and the
redundant field it works on.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Paul 'Rusty' Russell <rusty@rustcorp.com.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This allows you to specify:
$ qemu -device virtio-rng-pci
And things will Just Work with a reasonable default.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds parameters to virtio-rng-pci to allow rate limiting the entropy a
guest receives. An example command line:
$ qemu -device virtio-rng-pci,max-bytes=1024,period=1000
Would limit entropy collection to 1Kb/s.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The Linux kernel already has a virtio-rng driver, this is the device
implementation.
When the guest asks for entropy from the virtio hwrng, it puts a buffer
in the vq. We then put entropy into that buffer, and push it back to
the guest.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
aliguori: converted to new RngBackend interface
aliguori: remove entropy needed event
aliguori: fix migration
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
No need to expose the fd-based interface, everyone will already be fine
with the more handy EventNotifier variant. Rename the latter to clarify
that we are still talking about irqfds here.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The facility to use/unuse vectors dynamically is helpful
for virtio but little else: everyone just seems to use
vectors in their init function.
Avoid clearing msix vector use info on reset and load.
For virtio, clear it explicitly.
This should fix regressions reported with ivshmem - though
I didn't test this, I verified that virtio keeps
working like it did.
Tested-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
QEMU has a policy of keeping a stable guest device ABI. When new guest device
features are introduced they must not change hardware info seen by existing
guests. This is important because operating systems or applications may
"fingerprint" the hardware and refuse to run when the hardware changes. To
always get the latest guest device ABI, run with x86 machine type "pc".
This patch hides the new VIRTIO_BLK_F_CONFIG_WCE virtio feature bit from
existing machine types. Only pc-1.2 and later will expose this feature
by default.
For more info on the VIRTIO_BLK_F_CONFIG_WCE feature bit, see:
commit 13e3dce068
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu Aug 9 16:07:19 2012 +0200
virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
Also rename VIRTIO_BLK_F_WCACHE to VIRTIO_BLK_F_WCE for consistency with
the spec.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Anthony Liguori <aliguori@us.ibm.com> reported:
This broke qemu-test because it changed the pc-1.0 machine type:
Setting guest RANDOM seed to 47167
*** Running tests ***
Running test /tests/finger-print.sh... OK
--- fingerprints/pc-1.0.x86_64 2011-12-18 13:08:40.000000000 -0600
+++ fingerprint.txt 2012-08-12 13:30:48.000000000 -0500
@@ -55,7 +55,7 @@
/sys/bus/pci/devices/0000:00:06.0/subsystem_device=0x0002
/sys/bus/pci/devices/0000:00:06.0/class=0x010000
/sys/bus/pci/devices/0000:00:06.0/revision=0x00
-/sys/bus/pci/devices/0000:00:06.0/virtio/host-features=0x710006d4
+/sys/bus/pci/devices/0000:00:06.0/virtio/host-features=0x71000ed4
/sys/class/dmi/id/bios_vendor=Bochs
/sys/class/dmi/id/bios_date=01/01/2007
/sys/class/dmi/id/bios_version=Bochs
Guest fingerprint changed for pc-1.0!
Reported-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Decouple another x86-specific assumption about what irqchips imply.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit b1f416aa8d breaks vhost_net
because it always registers the virtio_pci_host_notifier_read() handler
function on the ioeventfd, even when vhost_net.ko is using the ioeventfd.
The result is both QEMU and vhost_net.ko polling on the same eventfd
and the virtio_net.ko guest driver seeing inconsistent results:
# ifconfig eth0 192.168.0.1 netmask 255.255.255.0
virtio_net virtio0: output:id 0 is not a head!
To fix this, proceed the same as we do for irqfd: add a parameter to
virtio_queue_set_host_notifier_fd_handler and in that case only set
the notifier, not the handler.
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Tested-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
While virtio-scsi does support multiqueue, the default number of
interrupt vectors is not enough to actually enable usage of
multiple queues in the driver; this is because with only 2
vectors the driver will not be able to use a separate
interrupt for each request queue. Derive the desired number
of vectors from the number of request queues.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Probably due to bad merge months ago, virtio-scsi-pci did not have
ioeventfd support. Fix this and enable it by default, as is the
case for other virtio-pci devices.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kwolf/for-anthony: (41 commits)
fdc-test: Clean up a bit
fdc-test: introduce test_relative_seek
fdc: fix relative seek
qemu-iotests: Valgrind support
coroutine-ucontext: Help valgrind understand coroutines
qemu-io: Fix memory leaks
hw/block-common: Factor out fall back to legacy -drive cyls=...
blockdev: Don't limit DriveInfo serial to 20 characters
hw/block-common: Factor out fall back to legacy -drive serial=...
hw/block-common: Move BlockConf & friends from block.h
Relax IDE CHS limits from 16383,16,63 to 65535,16,255
blockdev: Drop redundant CHS validation for if=ide
hd-geometry: Compute BIOS CHS translation in one place
qtest: Test we don't put hard disk info into CMOS for a CD-ROM
ide pc: Put hard disk info into CMOS only for hard disks
block: Geometry and translation hints are now useless, purge them
qtest: Cover qdev property for BIOS CHS translation
ide: qdev property for BIOS CHS translation
qdev: New property type chs-translation
qdev: Collect private helpers in one place
...
Geometry needs to be qdev properties, because it belongs to the
disk's guest part.
Maintain backward compatibility exactly like for serial: fall back to
DriveInfo's geometry, set with -drive cyls=...
Bonus: info qtree now shows the geometry.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All transports can use the same event handler for the irqfd, though the
exact mechanics of the assignment will be specific. Note that there
are three states: handled by the kernel, handled in userspace, disabled.
This also lets virtio use event_notifier_set_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
All transports can use the same event handler for the ioeventfd, though
the exact setup (address/memory region) will be specific.
This lets virtio use event_notifier_set_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Under Win32, EventNotifiers will not have event_notifier_get_fd, so we
cannot call it in common code such as hw/virtio-pci.c. Pass a pointer to
the notifier, and only retrieve the file descriptor in kvm-specific code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Not a single driver has any possibility of failure on their
exit function, let's keep it that way.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Also this functions is better invoked by the core than by each and every
device. This allows to drop the config_write callbacks from ich and
intel-hda.
CC: Alexander Graf <agraf@suse.de>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is no point in pushing this burden to the devices, they tend to
forget to call them (like intel-hda, ahci, xhci did). Instead, reset
functions are now called from pci_device_reset. They do nothing if
MSI/MSI-X is not in use.
CC: Alexander Graf <agraf@suse.de>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* qemu-kvm/uq/master:
virtio/vhost: Add support for KVM in-kernel MSI injection
msix: Add msix_nr_vectors_allocated
kvm: Enable use of kvm_irqchip_in_kernel in hwlib code
kvm: Introduce kvm_irqchip_add/remove_irqfd
kvm: Make kvm_irqchip_commit_routes an internal service
kvm: Publicize kvm_irqchip_release_virq
kvm: Introduce kvm_irqchip_add_msi_route
kvm: Rename kvm_irqchip_add_route to kvm_irqchip_add_irq_route
msix: Introduce vector notifiers
msix: Invoke msix_handle_mask_update on msix_mask_all
msix: Factor out msix_get_message
kvm: update vmxcap for EPT A/D, INVPCID, RDRAND, VMFUNC
kvm: Enable in-kernel irqchip support by default
kvm: Add support for direct MSI injections
kvm: Update kernel headers
kvm: x86: Wire up MSI support for in-kernel irqchip
pc: Enable MSI support at APIC level
kvm: Introduce basic MSI support for in-kernel irqchips
Introduce MSIMessage structure
kvm: Refactor KVMState::max_gsi to gsi_count
VIRTIO_BLK_F_SCSI is supposed to mean whether the host can *parse*
SCSI requests, not *execute* them. You could run QEMU with scsi=on
and a file-backed disk, and QEMU would fail all SCSI requests even
though it advertises VIRTIO_BLK_F_SCSI.
Because we need to do this to fix a migration compatibility problem
related to how QEMU is invoked by management, we must do this
unconditionally even on older machine types. This more or less assumes
that no one ever invoked QEMU with scsi=off.
Here is how testing goes:
- old QEMU, scsi=on -> new QEMU, scsi=on
- new QEMU, scsi=on -> old QEMU, scsi=on
- old QEMU, scsi=off -> new QEMU, scsi=on
- new QEMU, scsi=off -> old QEMU, scsi=on
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)
- old QEMU, scsi=off -> new QEMU, scsi=off
ok (new QEMU has VIRTIO_BLK_F_SCSI, adding host features is fine)
- old QEMU, scsi=on -> new QEMU, scsi=off
ok, bug fixed
- new QEMU, scsi=on -> old QEMU, scsi=off
doesn't work (same as: old QEMU, scsi=on -> old QEMU, scsi=off)
- new QEMU, scsi=off -> old QEMU, scsi=off
broken by the patch
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
We will have to add another field to the virtio-blk configuration in
the next patch. Avoid a proliferation of arguments to virtio_blk_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move it from virtio_blk_exit_pci to virtio_blk_exit.
This is included here because the next patch removes proxy->block.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Make use of the new vector notifier to track changes of the MSI-X
configuration of virtio PCI devices. On enabling events, we establish
the required virtual IRQ to MSI-X message route and link the signaling
eventfd file descriptor to this vIRQ line. That way, vhost-generated
interrupts can be directly delivered to an in-kernel MSI-X consumer like
the x86 APIC.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
There are no outside references to virtio_portio.
Add missing 'static' specifier.
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Currently the virtio balloon device, when using the virtio-pci interface
advertises itself with PCI class code MEMORY_RAM. This is wrong; the
balloon is vaguely related to memory, but is nothing like a PCI memory
device in the meaning of the class code, and this code is not required
or suggested by the virtio PCI specification.
Worse, this patch causes problems on the pseries machine, because the
firmware, seeing this class code, advertises the device as memory in the
device tree, and then a guest kernel bug causes it to see this "memory"
before the real system memory, leading to a crash in early boot.
This patch fixes the problem by removing the bogus PCI class code on the
balloon device. The backwards compatibility PC machines get new compat
properties so that they don't change.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replace device_init() with generalized type_init().
While at it, unify naming convention: type_init([$prefix_]register_types)
Also, type_init() is a function, so add preceding blank line where
necessary and don't put a semicolon after the closing brace.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: malc <av1474@comtv.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Limit them to the device_add functionality. Device aliases were a hack based
on the fact that virtio was modeled the wrong way. The mechanism for aliasing
is very limited in that only one alias can exist for any device.
We have to support it for the purposes of compatibility but we only need to
support it in device_add so restrict it to that piece of code.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
v1 -> v2
- Use a table for aliases (Paolo)
This was done in a mostly automated fashion. I did it in three steps and then
rebased it into a single step which avoids repeatedly touching every file in
the tree.
The first step was a sed-based addition of the parent type to the subclass
registration functions.
The second step was another sed-based removal of subclass registration functions
while also adding virtual functions from the base class into a class_init
function as appropriate.
Finally, a python script was used to convert the DeviceInfo structures and
qdev_register_subclass functions to TypeInfo structures, class_init functions,
and type_register_static calls.
We are almost fully converted to QOM after this commit.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
These are various small stylistic changes which help make things more
consistent such that the automated conversion script can be simpler.
It's not necessary to agree or disagree with these style changes because all
of this code is going to be rewritten by the patch monkey script anyway.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The virtio config area in PIO space is a bit special. The initial
header is little endian but the rest (device specific) is guest
native endian.
The PIO accessors for PCI on machines that don't have native IO ports
assume that all PIO is little endian, which works fine for everything
except the above.
A complicated way to fix it would be to split the BAR into two memory
regions with different endianess settings, but this isn't practical
to do, besides, the PIO code doesn't honor region endianness anyway
(I have a patch for that too but it isn't necessary at this stage).
So I decided to go for the quick fix instead which consists of
reverting the swap in virtio-pci in selected places, hoping that when
we eventually do a "v2" of the virtio protocols, we sort that out once
and for all using a fixed endian setting for everything.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf: keep virtio in libhw and determine endianness through a
helper function in exec.c]
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>