* kraxel/usb.75: (32 commits)
uhci: stop using portio lists
usbredir: Add support for buffered bulk input (v2)
exynos4210: Add EHCI support
usb/ehci: Add SysBus EHCI device for Exynos4210
usb/ehci: Move capsbase and opregbase into SysBus EHCI class
usb/ehci: Clean up SysBus and PCI EHCI split
xhci: call set-address with dummy usbpacket
usb-redir: Add debugging to bufpq save / restore
usbredir: Add usbredir_init_endpoints() helper
usbredir: Verify we have 32 bits bulk length cap when redirecting to xhci
usbredir: Add ep_stopped USBDevice method
usbredir: Add USBEP2I and I2USBEP helper macros
usbredir: Add an usbredir_stop_ep helper function
usb: Add an usb_device_ep_stopped USBDevice method
usb: Fix usb_ep_find_packet_by_id
hid: Change idle handling to use a timer
uhci: Maximize how many frames we catch up when behind
uhci: Limit amount of frames processed in one go
uhci: Add a QH_VALID define
uhci: Fix pending interrupts getting lost on migration
...
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Buffered bulk mode is intended for bulk *input* endpoints, where the data is
of a streaming nature (not part of a command-response protocol). These
endpoints' input buffer may overflow if data is not read quickly enough.
So in buffered bulk mode the usb-host takes care of the submitting and
re-submitting of bulk transfers.
Buffered bulk mode is necessary for reliable operation with the bulk in
endpoints of usb to serial convertors. Unfortunatelty buffered bulk input
mode will only work with certain devices, therefor this patch also adds a
usb-id table to enable it for devices which need it, while leaving the
bulk ep handling for other devices unmodified.
Note that the bumping of the required usbredir from 0.5.3 to 0.6 does
not mean that we will now need a newer usbredir release then qemu-1.3,
.pc files reporting 0.5.3 have only ever existed in usbredir builds directly
from git, so qemu-1.3 needs the 0.6 release too.
Changes in v2:
-Split of quirk handling into quirks.c
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Support backend guest notifier masking in vhost-net:
create eventfd at device init, when masked,
make vhost use that as eventfd instead of
sending an interrupt.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This makes it possible to use started flag for sanity checking
of callbacks that happen during start/stop.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As vhost started is cleared last thing on stop,
set it first things on start. This makes it
possible to use vhost_started while start is in
progress which is used by follow-up patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
some backends (notably vhost) can mask events
at their source in a way that is more efficient
than masking through kvm.
Specifically
- masking in kvm uses rcu write side so it has high latency
- in kvm on unmask we always send an interrupt
masking at source does not have these issues.
Add such support in virtio.h and use in virtio-pci.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some guests mask a vector then unmask without changing it.
Store vectors to avoid kvm system calls in this case.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pass nvqs to set_guest_notifiers. This makes it possible to
save on irqfds by not allocating one for the control vq
for virtio-net.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We already used to support the external proxy facility of FSL MPICs,
but only implemented it halfway correctly.
This patch adds support for
* dynamic enablement of the EPR facility
* interrupt acknowledgement only when the interrupt is delivered
This way the implementation now is closer to real hardware.
Signed-off-by: Alexander Graf <agraf@suse.de>
On e500mc, the platform doesn't provide a way for the CPU to go idle.
To still not uselessly burn CPU time, expose an idle hypercall to the guest
if kvm supports it.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: adjust for current code base, add patch description, fix non-kvm case]
Signed-off-by: Alexander Graf <agraf@suse.de>
Properly implement level-triggered interrupts by withdrawing an
interrupt from the raised queue if the interrupt source de-asserts.
Also withdraw from the raised queue if the interrupt becomes masked.
When CTPR is written, check whether we need to raise or lower the
interrupt output.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Besides making the code cleaner, we will need a separate way to access
IACK in order to implement EPR (external proxy) interrupt delivery.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Search the queue more efficiently by first looking for a non-zero word,
and then using the common bit-searching function to find the bit within
the word. It would be even nicer if bitops_ffsl() could be hooked up
to the compiler intrinsic so that bit-searching instructions could be
used, but that's another matter.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Previously, the sense and priority bits were masked off when writing
to IVPR, and all interrupts were treated as edge-triggered (despite
the existence of code for handling level-triggered interrupts).
Polarity is implemented only as storage. We don't simulate the
bad effects that you'd get on real hardware if you set this incorrectly,
but at least the guest sees the right thing when it reads back the register.
Sense now controls level/edge on FSL external interrupts (and all
interrupts on non-FSL MPIC). FSL internal interrupts do not have a sense
bit (reads as zero), but are level. FSL timers and IPIs do not have
sense or polarity bits (read as zero), and are edge-triggered. To
accommodate FSL internal interrupts, QEMU's internal notion of whether an
interrupt is level-triggered is separated from the IVPR bit.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The two checks with abort() guard against potential QEMU-internal
problems, but the EOI check stops the guest from causing updates to queue
position -1 and other havoc if it writes EOI with no interrupt in
service.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove hunk in code that didn't get applied yet]
Signed-off-by: Alexander Graf <agraf@suse.de>
Besides the private implementation being redundant, namespace collisions
prevented the use of other things in bitops.h.
Serialization does get a bit more awkward, unfortunately, since the
standard bitmap operations are "unsigned long" rather than "uint32_t",
though in exchange we will get faster queue lookups on 64-bit hosts once
we search a word at a time.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This reverts commit a9bd83f4c65de0058659ede009fa1a241f379edd.
This counting approach is not robust against setting a bit that
was already set, or clearing a bit that was already clear. Perhaps
that is considered a bug, but besides the lack of any documentation
for that restriction, it's a pretty unpleasant way for the problem
to manifest itself.
It could be made more robust by testing the current value of the
bit before changing the count, but a later patch speeds up IRQ_check
in all cases, not just when there's nothing pending. Hopefully that
should be adequate to address performance concerns.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Previously the code relied on the queue's "next" field getting
set to -1 sometime between an update to the bitmap, and the next
call to IRQ_get_next. Sometimes this happened after the update.
Sometimes it happened before the check. Sometimes it didn't happen
at all.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Other priorities are signed, so avoid comparisons between
signed and unsigned.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Critical interrupts on FSL MPIC are not supposed to pay
attention to priority, IACK, EOI, etc. On the currently modeled
version it's not supposed to pay attention to the mask bit either.
Also reorganize to make it easier to implement newer FSL MPIC models,
which encode interrupt level information differently and support
mcheck as well as crit, and to reduce problems for later patches
in this set.
Still missing is the ability to lower the CINT signal to the core,
as IACK/EOI is not used. This will come with general IRQ-source-driven
lowering in the next patch.
New state is added which is not serialized, but instead is recomputed
in openpic_load() by calling the appropriate write_IRQreg function.
This should have the side effect of causing the IRQ outputs to be
raised appropriately on load, which was missing.
The serialization format is altered by swapping ivpr and idr (we'd like
IDR to be restored before we run the IVPR logic), and moving interrupts
to the end (so that other state has been restored by the time we run the
IDR/IVPR logic. Serialization for this driver is not yet in a state
where backwards compatibility is reasonable (assuming it works at all),
and the current serialization format was not built for extensibility.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: fix for current code state]
Signed-off-by: Alexander Graf <agraf@suse.de>
The base openpic specification doesn't provide abbreviated register
names, so it's somewhat understandable that the QEMU code made up
its own, except that most of the names that QEMU used didn't correspond
to the terminology used by any implementation I could find.
In some cases, like PCTP, the phrase "processor current task priority"
could be found in the openpic spec when describing the concept, but
the register itself was labelled "current task priority register"
and every implementation seems to use either CTPR or the full phrase.
In other cases, individual implementations disagree on what to call
the register. The implementations I have documentation for are
Freescale, Raven (MCP750), and IBM. The Raven docs tend to not use
abbreviations at all. The IBM MPIC isn't implemented in QEMU. Thus,
where there's disagreement I chose to use the Freescale abbreviations.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: rebase on current state of the code]
Signed-off-by: Alexander Graf <agraf@suse.de>
This will stop things from breaking once it's properly treated as a
level-triggered interrupt. Note that it's the MPIC's MSI cascade
interrupts that are level-triggered; the individual MSIs are
edge-triggered.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Fix various format errors when debug prints are enabled. Also
cause error checking to happen even when debug prints are not
enabled, and consistently use 0x for hex output.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: adjust for more recent code base, prettify DPRINTF macro]
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch install the timer reset handler. This will be called when
the guest is reset.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: adjust for QOM'ification]
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch fixes the following coding style violations:
- structs have to be typedef and be CamelCase
- if()s are always surrounded by curly braces
Signed-off-by: Alexander Graf <agraf@suse.de>
If we access a register via the QEMU memory inspection commands (e.g.
"xp") rather than from guest code, we won't have a CPU context.
Gracefully fail to access the register in that case, rather than
crashing.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
"opp->nb_irqs-1" would have been a minor coding style error,
but putting in one space but not the other makes it look
confusingly like a numeric literal "-1".
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
It's in the address range that normally contains a magic redirection
to the CPU-specific region of the curretn CPU, but it isn't actually
a per-CPU register. On real hardware BRR1 shows up only at 0x40000,
not at 0x60000 or other non-magic per-CPU areas. Plus, this makes
it possible to read the register on the QEMU command line with "xp".
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Previously only the spurious vector was sized appropriately
to the openpic model.
Also, instances of "IPVP_VECTOR(opp->spve)" were replace with
just "opp->spve", as opp->spve is already just a vector and not
an IVPR.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
I could not find this register in any spec (FSL, IBM, or OpenPIC)
and the code doesn't do anything with it but initialize, save,
or restore it.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Deefine symbolic names for some register bits, and use some that
have already been defined.
Also convert some register values from hex to decimal when it improves
readability.
IPVP_PRIORITY_MASK is corrected from (0x1F << 16) to (0xF << 16), in
conjunction with making wider use of the symbolic name. I looked at
Freescale and IBM MPIC docs and at the base OpenPIC spec, and all three
had priority as 4 bits rather than 5. Plus, the magic nubmer that is
being replaced with symbolic values treated the field as 4 bits wide.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add EHCI USB host controller to exynos4210.
Signed-off-by: Liming Wang <walimisdev@gmail.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Reviewed-by: Igor Mitsyanko <i.mitsyanko@samsung.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
It uses a different capsbase and opregbase than the Xilinx device.
Signed-off-by: Liming Wang <walimisdev@gmail.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Igor Mitsyanko <i.mitsyanko@samsung.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This allows specific derived models to use different values.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
SysBus EHCI was introduced in a hurry before 1.3 Soft Freeze.
To use QOM casts in place of DO_UPCAST() / FROM_SYSBUS(), we need an
identifying type. Introduce generic abstract base types for PCI and
SysBus EHCI to allow multiple types to access the shared fields.
While at it, move the state structs being amended with macros to the
header file so that they can be embedded.
The VMSTATE_PCI_DEVICE() macro does not play nice with the QOM
parent_obj naming convention, so defer that cleanup.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Due to the way devices are addressed with xhci (done by hardware, not
the guest os) there is no packet when invoking the set-address control
request. Create a dummy packet in that case to avoid null pointer
dereferences.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The xhci-hcd may submit bulk transfers > 65535 bytes even when not using
bulk-in pipeling, so usbredir can only be used in combination with an xhci
hcd if the client has the 32 bits bulk length capability.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
To ensure that interrupt receiving is properly stopped when the guest is
no longer interested in an interrupt endpoint.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Some usb devices (host or network redirection) can benefit from knowing when
the guest stops using an endpoint. Redirection may involve submitting packets
independently from the guest (in combination with a fifo buffer between the
redirection code and the guest), to ensure that buffers of the real usb device
are timely emptied. This is done for example for isoc traffic and for interrupt
input endpoints. But when the (re)submission of packets is done by the device
code, then how does it know when to stop this?
For isoc endpoints this is handled by detecting a set interface (change alt
setting) command, which works well for isoc endpoints. But for interrupt
endpoints currently the redirection code never stops receiving data from
the device, which is less then ideal.
However the controller emulation is aware when a guest looses interest, as
then the qh for the endpoint gets unlinked (ehci, ohci, uhci) or the endpoint
is explicitly stopped (xhci). This patch adds a new ep_stopped USBDevice
method and modifies the hcd code to call this on queue unlink / ep stop.
This makes it possible for the redirection code to properly stop receiving
interrupt input (*) data when the guest no longer has interest in it.
*) And in the future also buffered bulk input.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
usb_ep_find_packet_by_id mistakenly only checks the first packet and if that
is not a match, keeps trying the first packet! This patch fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This leads to cleaner code in usb-hid, and removes up to a 1000 calls / sec to
qemu_get_clock_ns(vm_clock) if idle-time is set to its default value of 0.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
If somehow we've gotten behind a lot, simply skip ahead, like the ehci code
does.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Before this patch uhci would process an unlimited amount of frames when
behind on schedule, by setting the timer to a time already past, causing the
timer subsys to immediately recall the frame_timer function gain.
This would cause invalid cancellations of bulk queues when the catching up
processed more then 32 frames at a moment when the bulk qh was temporarily
unlinked (which the Linux uhci driver does).
This patch fixes this by processing maximum 16 frames in one go, and always
setting the timer one ms later, making the code behave more like the ehci
code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Rather then using the magic 32 value in various places.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Re-arrange how we process frames / increase frnum / report pending interrupts,
to avoid a 1 ms delay in interrupt reporting to the guest. This increases
the packet throughput for cases where the guest submits a single packet,
then waits for its completion then re-submits from 500 pkts / sec to
1000 pkts / sec. This impacts for example the use of redirected / virtual
usb to serial convertors.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
ehci_raise_irq(s, USBSTS_PCD), gets applied immediately so there is no need
to call commit_irq after it.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
I tried lowering the time between raising an interrupt and rescanning the
async schedule to see if the guest has queued a new transfer before, but
that did not have any positive effect. I now believe the cause for this is
that lowering this time made it more likely to hit the 1 ms interrupt
threshold penalty for the next packet, as described in my
"ehci: Use uframe precision for interrupt threshold checking" commit.
Now that we do interrupt threshold handling with uframe precision, futher
lowering this time from .5 to .25 ms gives an extra 15% improvement in speed
(MB/s) reading from a simple USB-2.0 thumb-drive.
While at it also properly set the int_req_by_async flag for short packet
completions.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Before this patch, the following could happen:
1) Transfer completes, raises interrupt
2) .5 ms later we check if the guest has queued up any new transfers
3) We find and execute a new transfer
4) .2 ms later the new transfer completes
5) We re-run our frame_timer to write back the completion, but less then
1 ms has passed since our last run, so frindex is not changed, so the
interrupt threshold code delays the interrupt
6) 1 ms from the re-run our frame-timer runs again and finally delivers
the interrupt
This leads to unnecessary large delays of interrupts, this code fixes this
by changing frindex to uframe precision and using that for interrupt threshold
control, making the interrupt fire at step 5 for guest which have low interrupt
threshold settings (like Linux).
Note that the guest still sees the frindex move in steps of 8 for migration
compatibility.
This boosts Linux read speed of a simple cheap USB thumb drive by 6 %.
Changes in v2:
-Make the guest see frindex move in steps of 8 by modifying ehci_opreg_read,
rather then using a shadow variable
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
ehci_fill_queue assumes that there is a one on one relationship between an ep
and a qh, this patch adds a check to ensure this.
Note I don't expect this to ever trigger, this is just something I noticed
the guest might do while working on other stuff. The only way this check can
trigger is if a guest mixes in and out qtd-s in a single qh for a non
control ep.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Remove the short-circuiting of fetchqtd in fetchqh, so that the
qtd gets properly verified before completing the transaction.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This is not allowed, except for clearing active on cancellation, so don't
warn when the new token does not have its active bit set.
This unifies the cancellation path for modified qtd-s, and prepares
ehci_verify_qtd to be used ad an extra check inside
ehci_writeback_async_complete_packet().
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Also drop the warning printf, which was there mainly because this was an
untested code path (as the previous bug fixes to it show), but that no
longer is the case now :)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
A device reset does not affect the link state, only set_link does.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit b9d03e352c added link
auto-negotiation emulation, it would always set link up by
callback function. Problem exists if original link status
was down, link status should not be changed in auto-negotiation.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Discard packets longer than 16384 when !SBP to match the hardware behavior.
Signed-off-by: Michael Contreras <michael@inetric.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
pc-testdev.c cannot be compiled with MinGW (and other non POSIX hosts):
CC i386-softmmu/hw/i386/../pc-testdev.o
qemu/hw/i386/../pc-testdev.c:38:22: warning: sys/mman.h: file not found
qemu/hw/i386/../pc-testdev.c: In function ‘test_flush_page’:
qemu/hw/i386/../pc-testdev.c:103: warning: implicit declaration of function ‘mprotect’
...
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This typically reduces the size from 512 bytes to 128 bytes.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
sys/mman.h is not needed (tested on Linux) and unavailable for MinGW,
so remove it.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
pc_fw_add_pflash_drv() ignores qemu_find_file() failure, and happily
creates a drive without a medium.
When pc_system_flash_init() asks for its size, bdrv_getlength() fails
with -ENOMEDIUM, which isn't checked either. It fails relatively
cleanly only because -ENOMEDIUM isn't a multiple of 4096:
$ qemu-system-x86_64 -S -vnc :0 -bios nonexistant
qemu: PC system firmware (pflash) must be a multiple of 0x1000
[Exit 1 ]
Fix by handling the qemu_find_file() failure.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Prehistoric leftover, zap it. We poweroff via acpi these days.
And having a port (0x501,0x502) where any random guest write will make
qemu exit -- with no way to turn it off -- is a bad joke anyway.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add a test device which supports the kvmctl ioports,
so one can run the KVM unittest suite.
Intended Usage:
qemu-system-x86_64 -nographic \
-device pc-testdev \
-device isa-debug-exit,iobase=0xf4,iosize=0x04 \
-kernel /path/to/kvm/unittests/msr.flat
Where msr.flat is one of the KVM unittests, present on a
separate repo,
git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
[ kraxel: more memory api + qom fixes ]
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When present it makes qemu exit on any write.
Mapped to port 0x501 by default.
Without this patch Anthony doesn't allow me to
remove the bochs bios debug ports because his
test suite uses this.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The hw/dataplane/vring.c code includes linux/virtio_ring.h. Ensure that
we use linux-headers/ instead of the system-wide headers, which may be
out-of-date on older distros.
This resolves the following build error on Debian 6:
CC hw/dataplane/vring.o
cc1: warnings being treated as errors
hw/dataplane/vring.c: In function 'vring_enable_notification':
hw/dataplane/vring.c:71: error: implicit declaration of function 'vring_avail_event'
hw/dataplane/vring.c:71: error: nested extern declaration of 'vring_avail_event'
hw/dataplane/vring.c:71: error: lvalue required as left operand of assignment
Note that we now build dataplane/ for each target instead of only once.
There is no way around this since linux-headers/ is only available for
per-target objects - and it's how virtio, vfio, kvm, and friends are
built.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently, all unknown requests are treated as VIRTIO_BLK_T_IN
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The virtio-blk-data-plane feature is easy to integrate into
hw/virtio-blk.c. The data plane can be started and stopped similar to
vhost-net.
Users can take advantage of the virtio-blk-data-plane feature using the
new -device virtio-blk-pci,x-data-plane=on property.
The x-data-plane name was chosen because at this stage the feature is
experimental and likely to see changes in the future.
If the VM configuration does not support virtio-blk-data-plane an error
message is printed. Although we could fall back to regular virtio-blk,
I prefer the explicit approach since it prompts the user to fix their
configuration if they want the performance benefit of
virtio-blk-data-plane.
Limitations:
* Only format=raw is supported
* Live migration is not supported
* Block jobs, hot unplug, and other operations fail with -EBUSY
* I/O throttling limits are ignored
* Only Linux hosts are supported due to Linux AIO usage
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
virtio-blk-data-plane is a subset implementation of virtio-blk. It only
handles read, write, and flush requests. It does this using a dedicated
thread that executes an epoll(2)-based event loop and processes I/O
using Linux AIO.
This approach performs very well but can be used for raw image files
only. The number of IOPS achieved has been reported to be several times
higher than the existing virtio-blk implementation.
Eventually it should be possible to unify virtio-blk-data-plane with the
main body of QEMU code once the block layer and hardware emulation is
able to run outside the global mutex.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Two slightly different versions of a patch to conditionally set
VIRTIO_BLK_F_CONFIG_WCE through the "config-wce" qdev property have been
applied (ea776abca and eec7f96c2). David Gibson
<david@gibson.dropbear.id.au> noticed that the "config-wce"
property is broken as a result and fixed it recently.
The fix sets the host_features VIRTIO_BLK_F_CONFIG_WCE bit from a qdev
property. Unfortunately, the virtio device then has no chance to test
for the presence of the feature bit during virtio_blk_init().
Therefore, reinstate the VirtIOBlkConf->config_wce flag. Drop the
duplicate qdev property to set the host_features bit. The
VirtIOBlkConf->config_wce flag will be used by virtio-blk-data-plane in
a later patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The IOQueue has a pool of iocb structs and a function to add new
read/write requests. Multiple requests can be added before calling the
submit function to actually tell the host kernel to begin I/O. This
allows callers to batch requests and submit them in one go.
The actual I/O is performed using Linux AIO.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Outside the safety of the global mutex we need to poll on file
descriptors. I found epoll(2) is a convenient way to do that, although
other options could replace this module in the future (such as an
AioContext-based loop or glib's GMainLoop).
One important feature of this small event loop implementation is that
the loop can be terminated in a thread-safe way. This allows QEMU to
stop the data plane thread cleanly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The virtio-blk-data-plane cannot access memory using the usual QEMU
functions since it executes outside the global mutex and the memory APIs
are this time are not thread-safe.
This patch introduces a virtqueue module based on the kernel's vhost
vring code. The trick is that we map guest memory ahead of time and
access it cheaply outside the global mutex.
Once the hardware emulation code can execute outside the global mutex it
will be possible to drop this code.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The data plane thread needs to map guest physical addresses to host
pointers. Normally this is done with cpu_physical_memory_map() but the
function assumes the global mutex is held. The data plane thread does
not touch the global mutex and therefore needs a thread-safe memory
mapping mechanism.
Hostmem registers a MemoryListener similar to how vhost collects and
pushes memory region information into the kernel. There is a
fine-grained lock on the regions list which is held during lookup and
when installing a new regions list.
When the physical memory map changes the MemoryListener callbacks are
invoked. They build up a new list of memory regions which is finally
installed when the list has been completed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This optimizes MSIX handling in virtio-pci.
Also included is pci express capability bugfix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQ2tD2AAoJECgfDbjSjVRpUNcIAKN2c+3iiutUWFBBII2TWppc
QAQ4Q5HK7gCtAnwNrlQMAIXcUzHBd5s6BW74BaFBZYymf/tqe4CsvmIH15qQyvm0
McdJAba3FLk0+TELG/Fmf4+faM/kr3gl5Cve3YJC69NHpcq3gi8V4696sP8cGfUt
atA+NR8AITBJDmQlcq6Vwfp+t+B1MY9D9SROT/BmfO+/kY3krkhlPL2pdcoinBa2
zKJLz+jE0tjz7kZ99bmbb2uzKImvtFwxCVZjhD0UINjDOWd9k6ao2pWQIEftv56z
zwz/L8TKCFdM2350XXPg99f4WbrvBqmg3Slb4vrsIYEuAWvArI8sUSYG3rC4fS4=
=8Jun
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pci,virtio
This optimizes MSIX handling in virtio-pci.
Also included is pci express capability bugfix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* mst/tags/for_anthony:
virtio-pci: don't poll masked vectors
msix: expose access to masked/pending state
msi: add API to get notified about pending bit poll
pcie: Fix bug in pcie_ext_cap_set_next
virtio: make bindings typesafe
There are several ARM and MIPS boards which are manufactured with
either Intel (pflash_cfi01.c) or AMD (pflash_cfi02.c) flash memory.
The Linux kernel supports both and first probes for AMD flash which
resulted in one or two warnings from the Intel flash emulation:
pflash_write: Unimplemented flash cmd sequence (offset 0000000000000000, wcycle 0x0 cmd 0x0 value 0xf000f0)
pflash_write: Unimplemented flash cmd sequence (offset 0000000000000000, wcycle 0x0 cmd 0x0 value 0xf0)
These warnings confuse users, so suppress them.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* 'qom-cpu' of git://repo.or.cz/qemu/afaerber:
MAINTAINERS: Include X86CPU in CPU maintenance area
cpu: Move kvm_run into CPUState
cpu: Move kvm_state field into CPUState
ppc_booke: Pass PowerPCCPU to ppc_booke_timers_init()
ppc4xx_devs: Return PowerPCCPU from ppc4xx_init()
ppc_booke: Pass PowerPCCPU to {decr,fit,wdt} timer callbacks
ppc: Pass PowerPCCPU to [h]decr timer callbacks
ppc: Pass PowerPCCPU to [h]decr callbacks
ppc: Pass PowerPCCPU to ppc_set_irq()
kvm: Pass CPUState to kvm_vcpu_ioctl()
kvm: Pass CPUState to kvm_arch_*
cpu: Move kvm_fd into CPUState
qdev-properties.c: Separate core from the code used only by qemu-system-*
qdev: Coding style fixes
cpu: Introduce CPUListState struct
target-alpha: Add support for -cpu ?
target-alpha: Turn CPU definitions into subclasses
target-alpha: Avoid leaking the alarm timer over reset
alpha: Pass AlphaCPU array to Typhoon
target-alpha: Let cpu_alpha_init() return AlphaCPU
At the moment, when irqfd is in use but a vector is masked,
qemu will poll it and handle vector masks in userspace.
Since almost no one ever looks at the pending bits,
it is better to defer this until pending bits
are actually read.
Implement this optimization using the new poll notifier.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>