Currently, when KVM is enabled, the pseries machine checks if the host
CPU supports VMX, VSX and/or DFP instructions and advertises
accordingly in the guest device tree. It does this regardless of what
CPU is selected on the command line. On the other hand, when in TCG
mode, it never advertises any of these facilities, even basic VMX
(Altivec) which is supported in TCG.
Now that we have a -cpu host option for ppc, it is fairly
straightforward to fix both problems. This patch changes the -cpu
host code to override the basic cpu spec derived from the PVR with
information queried from the host avout VMX, VSX and DFP capability.
The pseries code then uses the instruction availability advertised in
the cpu state to set the guest device tree correctly for both the KVM
and TCG cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
We have several targets in the PPC tree now that basically require libfdt
to function properly, namely the pseries and the e500 targets. This dependency
will rather increase than decrease in the future, so I want to make sure
that people building shiny new 1.0 actually have libfdt installed to get
rid of a few ifdefs in the code.
Warning: This patch will likely make configure fail for people who don't
select their own --target-list, but don't have libfdt development packages
installed. However, we really need this new dependency to move on.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
v1 -> v2:
- no paranthesis
- no fdt check for config_pseries
- add . in error message
In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
decrementer interrupts. This is necessary to implement the decrementer
properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
and Book3S-HV KVM variants), because the kernel handles generating and
delivering decrementer exceptions.
Under kvm, in fact, the timer causes expensive and unnecessary exits from
kvm to qemu. This patch, therefore, disables setting the timer when kvm
is in use.
Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The sole reason we have the ppcemb target is to support MMUs that have
less than the usual 4k possible page size. There are very few of these
chips and I don't want to add additional QA and testing burden to everyone
to ensure that code still works when TARGET_PAGE_SIZE is not 4k.
So this patch disables all CPUs except for MMU_BOOKE capable ones from
the ppcemb target.
Signed-off-by: Alexander Graf <agraf@suse.de>
Some 32-bit PPC CPUs can use up to 36 bit of physical address space.
Treat them accordingly in the qemu-system-ppc binary type.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we've implemented -cpu host for ppc, this patch updates the
pseries machine to use the host cpu as the guest cpu by default when
running under KVM. This is important because under KVM Book3S-HV the guest
cpu _cannot_ be of a different type to the host cpu (at the moment
KVM Book3S-HV will silently virtualize the host cpu instead of whatever was
requested, but in future it is likely to simply refuse to run the VM if
a cpu model other than the host's is requested).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds cpu specs to the table for POWER7 revisions 2.1 and 2.3.
This allows -cpu host to be used on these host cpus.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
For convenience with kvm, x86 allows the user to specify -cpu host on the
qemu command line, which means make the guest cpu the same as the host
cpu. This patch implements the same option for ppc targets.
For now, this just read the host PVR (Processor Version Register) and
selects one of our existing CPU specs based on it. This means that the
option will not work if the host cpu is not supported by TCG, even if that
wouldn't matter for use under kvm.
In future, we can extend this in future to override parts of the cpu spec
based on information obtained from the host (via /proc/cpuinfo, the host
device tree, or explicit KVM calls). That will let us handle cases where
the real kvm-virtualized CPU doesn't behave exactly like the TCG-emulated
CPU. With appropriate annotation of the CPU specs we'll also then be able
to use host cpus under kvm even when there isn't a matching full TCG model.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The ppc target contains a ppc_find_by_pvr() function, which looks up a
CPU spec based on a PVR (that is, based on the value in the target cpu's
Processor Version Register). PVR values contain information on both the
cpu model (upper 16 bits, usually) and on the precise revision (low 16
bits, usually).
ppc_find_by_pvr, as well as making exact PVR matches, attempts to find
"close" PVR matches, when we don't have a CPU spec for the exact revision
specified. This sounds like a good idea, execpt that the current logic
is completely nonsensical.
It seems to assume CPU families are subdivided bit by bit in the PVR in a
way they just aren't. Specifically, it requires a match on all bits of the
specified pvr up to the last non-zero bit. This has the bizarre effect
that when the low bits are simply a sequential revision number (a common
though not universal pattern), then odd specified revisions must be matched
exactly, whereas even specified revisions will also match the next odd
revision, likewise for powers of 4, 8 and so forth.
To correctly do inexact matching we'd need to re-organize the table of CPU
specs to include a mask showing what PVR range the spec is compatible with
(similar to the cputable code in the Linux kernel).
For now, just remove the bogosity by only permitting exact PVR matches.
That at least makes the matching simple and consistent. If we need inexact
matching we can add the necessary per-subfamily masks later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch is a general update to the SLOF firmware image used on the
pseries machine. This doesn't contain updates for specific features but
contains a number of bugfixes and enhancements in the main SLOF tree from
Thomas Huth.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Sufficiently recent PAPR specifications define properties "ibm,vmx"
and "ibm,dfp" on the CPU node which advertise whether the VMX vector
extensions (or the later VSX version) and/or the Decimal Floating
Point operations from IBM's recent POWER CPUs are available.
Currently we do not put these in the guest device tree and the guest
kernel will consequently assume they are not available. This is good,
because they are not supported under TCG. VMX is similar enough to
Altivec that it might be trivial to support, but VSX and DFP would
both require significant work to support in TCG.
However, when running under kvm on a host which supports these
instructions, there's no reason not to let the guest use them. This
patch, therefore, checks for the relevant support on the host CPU
and, if present, advertises them to the guest as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the kvmppc_get_clockfreq() function reads the host's clock
frequency from /proc/device-tree, which is useful to past to the guest
in KVM setups. However, there are some other host properties
advertised in the device tree which can also be relevant to the
guests.
This patch, therefore, replaces kvmppc_get_clockfreq() which can
retrieve any named, single integer property from the host device
tree's CPU node.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
SPE instructions are defined by pairs. Currently, the invalid-bits mask is set
for the first instruction, but the second one can have a different mask.
example:
GEN_SPE(efdcmpeq, efdcfs, 0x17, 0x0B, 0x00600000, 0x00180000, PPC_SPE_DOUBLE),
Signed-off-by: Fabien Chouteau <chouteau@adacore.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch updates the SLOF submodule and precompiled image. The new
SLOF versions contains two changes of note:
* The previous SLOF has a bug in SCSI condition handling that was
exposed by recent updates to qemu's SCSI emulation. This update
fixes the bug.
* The previous SLOF has a bug in its addressing of SCSI devices,
which can be exposed under certain conditions. The new SLOF also
fixes this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries machine of qemu implements the TCE mechanism used as a
virtual IOMMU for the PAPR defined virtual IO devices. Because the
PAPR spec only defines a small DMA address space, the guest VIO
drivers need to update TCE mappings very frequently - the virtual
network device is particularly bad. This means many slow exits to
qemu to emulate the H_PUT_TCE hypercall.
Sufficiently recent kernels allow this to be mitigated by implementing
H_PUT_TCE in the host kernel. To make use of this, however, qemu
needs to initialize the necessary TCE tables, and map them into itself
so that the VIO device implementations can retrieve the mappings when
they access guest memory (which is treated as a virtual DMA
operation).
This patch adds the necessary calls to use the KVM TCE acceleration.
If the kernel does not support acceleration, or there is some other
error creating the accelerated TCE table, then it will still fall back
to full userspace TCE implementation.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At present, using the hypervisor aware Book3S-HV KVM will only work
with qemu on POWER7 CPUs. PPC970 CPUs also have hypervisor
capability, but they lack the VRMA feature which makes assigning guest
memory easier.
In order to allow KVM Book3S-HV on PPC970, we need to specially
allocate the first chunk of guest memory (the "Real Mode Area" or
RMA), so that it is physically contiguous.
Sufficiently recent host kernels allow such contiguous RMAs to be
allocated, with a kvm capability advertising whether the feature is
available and/or necessary on this hardware. This patch enables qemu
to use this support, thus allowing kvm acceleration of pseries qemu
machines on PPC970 hardware.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
agraf: fix to use memory api
Alex Graf has already made qemu support KVM for the pseries machine
when using the Book3S-PR KVM variant (which runs the guest in
usermode, emulating supervisor operations). This code allows gets us
very close to also working with KVM Book3S-HV (using the hypervisor
capabilities of recent POWER CPUs).
This patch moves us another step towards Book3S-HV support by
correctly handling SMT (multithreaded) POWER CPUs. There are two
parts to this:
* Querying KVM to check SMT capability, and if present, adjusting the
cpu numbers that qemu assigns to cause KVM to assign guest threads
to cores in the right way (this isn't automatic, because the POWER
HV support has a limitation that different threads on a single core
cannot be in different guests at the same time).
* Correctly informing the guest OS of the SMT thread to core mappings
via the device tree.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When access PPCE500_PCI_IW1 the previous index get overflow.
The patch fix the issue and update all to keep consistent style.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Put trailing statements on next line.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Reviewed-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
If the deposit replaces the entire word, optimize to a move.
If we're inserting to the top of the word, avoid the mask of arg2
as we'll be shifting out all of the garbage and shifting in zeros.
If the host is 32-bit, reduce a 64-bit deposit to a 32-bit deposit
when possible.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
this patch fix multiple issues with VirtFS tracing.
a) Add tracepoint to the correct code path. We handle error in complete_pdu
b) Fix indentation in python script
c) Fix variable naming issue in python script
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Adding an offset to a void pointer works with gcc but is not allowed
by the current C standards. With -pedantic, gcc complains:
exec-all.h:344: error: pointer of type ‘void *’ used in arithmetic
Fix this, and also replace (unsigned long) by (uintptr_t) in the same
statement.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
linux-headers/asm is a symlink generated during configure. It should not,
therefore be committed to git, nor show up in git diffs and the like.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
scsi-block is a new device that supports device passthrough of Linux
block devices (i.e. /dev/sda, not /dev/sg0). It uses SG_IO for commands
other than I/O commands, and regular AIO read/writes for I/O commands.
Besides being simpler to configure (no mapping required to scsi-generic
device names), this removes the need for a large bounce buffer and,
in the future, will get scatter/gather support for free from scsi-disk.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The request restart mechanism is generic and could be reused for
scsi-generic. In the meanwhile, pushing it to SCSIDevice avoids
that scsi_dma_restart_bh looks at SCSIGenericReqs when working on
a scsi-block device.
The code is the same that is already in hw/scsi-disk.c, with
the type flags replaced by req->cmd.mode and a more generic way to
requeue SCSI_XFER_NONE commands.
I also added a missing call to qemu_del_vm_change_state_handler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In some cases a request may be canceled before the completion callback
runs. Keep a reference to the request between starting an AIO operation
and the corresponding scsi_req_cancel or scsi_*_complete.
When a request has to be retried, the request can be dropped because
scsi_dma_restart_bh only looks at requests that are enqueued. As such,
they always have at least a reference.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Otherwise, if cancellation is "faked" by the AIO layer and goes
through qemu_aio_flush, the whole request is completed synchronously
during scsi_req_cancel.
Using the enqueued flag would work here, but not in the next patches,
so I'm introducing a new io_canceled flag. That's because scsi_req_data
is a synchronous callback and the enqueued flag might be reset by the
time it returns. scsi-disk cannot unref the request until after calling
scsi_req_data.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will let scsi-block choose between passthrough and emulation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Also delete a stale occurrence of SCSIReqOps inside SCSIDeviceInfo.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The field is only in scsi-disk for now. Moving it up to SCSIDevice makes
it easier to reuse the scsi-generic reqops elsewhere.
At the same time, make scsi-generic get max_lba from snooped READ CAPACITY
commands as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set s->removable, s->qdev.blocksize and s->qdev.type in the callers
of scsi_initfn.
With this in place, s->qdev.type is allowed, and we can just reuse it
as the first byte in VPD data (just like we do in standard INQUIRY data).
Also set s->removable is set consistently and we can use it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This field is redundant, and having it makes it more complicated
to share reqops between the upcoming scsi-block and scsi-generic.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Same as for scsi-generic, avoid duplication even if it causes longer
lines.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of "guessing" the block size when there is no medium in the
drive, wait for the guest to send a READ CAPACITY command and snoop
it from there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pass down the host status so that failing transport can be detected
by the guest. Similar treatment of host status could be done in
virtio-blk, too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A succeeding ioctl does not imply that the SCSI command succeeded.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is not needed anymore, since asynchronous ioctls were introduced
by commit 221f715 (new scsi-generic abstraction, use SG_IO, 2009-03-28).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It is not needed, because s->bs is already stored in SCSIDevice, and
can be reached from the conf.bs member.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Flush does not go anymore through scsi_disk_emulate_command.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested by the Windows Logo Kit SCSI Compliance test. From SBC-3, paragraph
5.25: "The LOGICAL BLOCK ADDRESS field shall be set to zero if the PMI
bit is set to zero. If the PMI bit is set to zero and the LOGICAL BLOCK
ADDRESS field is not set to zero, then the device server shall terminate
the command with CHECK CONDITION status with the sense key set to ILLEGAL
REQUEST and the additional sense code set to INVALID FIELD IN CDB".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This also requires little more than adding the new argument to
scsi_device_find, and the qdev property. All devices by default
end up on channel 0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This only requires changes in two places: in SCSIBus, we need to look
for a free LUN if somebody creates a device with a pre-existing scsi-id
but the default LUN (-1, meaning "search for a free spot"); in vSCSI,
we need to actually parse the LUN according to the SCSI spec.
For vSCSI, max_target/max_lun are set according to the logical unit
addressing format in SAM.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change the devs array into a linked list, and add a scsi_device_find
function to navigate the children list instead. This lets the SCSI
bus use more complex addressing, and HBAs can talk to the correct device
when there are multiple LUNs per target.
scsi_device_find may return another LUN on the same target if none is
found that matches exactly.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
SCSI buses will need to read the children list first-to-last. This
requires using a QTAILQ, because hell breaks loose if you just try
inserting at the tail (thus reversing the order of all existing
visits from last-to-first to first-to-tail).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds support for media change notification via the GET EVENT STATUS
NOTIFICATION command, used by Linux versions 2.6.38 and newer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>