Commit Graph

559 Commits

Author SHA1 Message Date
Stefan Hajnoczi
1351d1ec89 qdev: drop iothread property type
The iothread property type is no longer used and can be removed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01 09:15:02 +02:00
Stefan Hajnoczi
32a877e405 virtio-blk: move qdev properties into virtio-blk.c
There is no need to make DEFINE_VIRTIO_BLK_PROPERTIES() public.  Inline
it into virtio-blk.c so it cannot be used by mistake from other source
files.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
2014-07-01 09:15:02 +02:00
Stefan Hajnoczi
f7fedda84a virtio-blk: drop virtio_blk_set_conf()
This function is no longer used since parent objects now use child
aliases to set the VirtIOBlkConf directly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
2014-07-01 09:15:02 +02:00
Stefan Hajnoczi
67cc7e0aac qdev: add qdev_alias_all_properties()
The qdev_alias_all_properties() function creates QOM alias properties
for each qdev property on a DeviceState.  This is useful for parent
objects that wish to forward property accesses to their children.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
2014-07-01 09:15:02 +02:00
Stefan Hajnoczi
ee512c6f21 virtio-blk: move x-data-plane qdev property to virtio-blk.h
Move the x-data-plane property.  Originally it was outside since not
every transport may wish to support dataplane.  But that makes little
sense when we have a dedicated CONFIG_VIRTIO_BLK_DATA_PLANE ifdef
already.

This move makes it easier to switch to property aliases in the next
patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
2014-07-01 09:15:02 +02:00
Stefan Hajnoczi
dc80ca6cd6 virtio-blk: avoid qdev property definition duplication
It becomes unwiedly to duplicate all virtio-blk qdev property
definitions due to an #ifdef.  The C preprocessor syntax makes it a
little hard to resolve this cleanly but we can extract the #ifdef and
call a macro it defines later.

Avoiding duplication is important since it will only get worse when we
move the x-data-plane qdev property here too.  We'd have a combinatorial
explosion since x-data-plane has its own #ifdef.

Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
2014-07-01 09:15:02 +02:00
Greg Kurz
0f5d1d2a49 virtio: memory accessors for endian-ambivalent targets
This is the virtio-access.h header file taken from Rusty's "endian-ambivalent
targets using legacy virtio" patch. It introduces helpers that should be used
when accessing vring data or by drivers for data that contains headers.
The virtio config space is also target endian, but the current code already
handles that with the virtio_is_big_endian() helper. There is no obvious
benefit at using the virtio accessors in this case.

Now we have two distinct paths: a fast inline one for fixed endian targets,
and a slow out-of-line one for targets that define the new TARGET_IS_BIENDIAN
macro.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[ relicensed virtio-access.h to GPLv2+ on Rusty's request,
  pass &address_space_memory to physical memory accessors,
  per-device endianness,
  virtio tswap16 and tswap64 helpers,
  faspath for fixed endian targets,
  Greg Kurz <gkurz@linux.vnet.ibm.com> ]
Cc: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 19:39:42 +03:00
Greg Kurz
616a655219 virtio: add endian-ambivalent support to VirtIODevice
Some CPU families can dynamically change their endianness. This means we
can have little endian ppc or big endian arm guests for example. This has
an impact on legacy virtio data structures since they are target endian.
We hence introduce a new property to track the endianness of each virtio
device. It is reasonnably assumed that endianness won't change while the
device is in use : we hence capture the device endianness when it gets
reset.

We migrate this property in a subsection, after the device descriptor. This
means the load code must not rely on it until it is restored. As a consequence,
the vring sanity checks had to be moved after the call to vmstate_load_state().
We enforce paranoia by poisoning the property at the begining of virtio_load().

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 19:39:42 +03:00
Greg Kurz
98ed8ecfc9 exec: introduce target_words_bigendian() helper
We currently have a virtio_is_big_endian() helper that provides the target
endianness to the virtio code. As of today, the helper returns a fixed
compile-time value. Of course, this will have to change if we want to
support target endianness changes at run-time.

Let's move the TARGET_WORDS_BIGENDIAN bits out to a new helper and have
virtio_is_big_endian() implemented on top of it.

This patch doesn't change any functionality.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 19:39:42 +03:00
Greg Kurz
1b5fc0dea4 virtio: introduce device specific migration calls
In order to migrate virtio subsections, they should be streamed after
the device itself. We need the device specific code to be called from
the common migration code to achieve this. This patch introduces load
and save methods for this purpose.

Suggested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 19:39:41 +03:00
Eduardo Habkost
fa118d1f8b pc: Fix "prog_if" typo on PC_COMPAT_2_0
The property name is "prog_if", not "prof_if".

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reported-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 18:59:07 +03:00
Eduardo Habkost
b8f5cfd682 pc: Move q35 compat props to PC_COMPAT_*
For each compat property on PC_Q35_COMPAT_*, there are only two
possibilities:

 * If the device is never instantiated when using a machine other than
   pc-q35, then the compat property can be safely added to
   PC_COMPAT_*;
 * If the device can be instantiated when using a machine other than
   pc-q35, that means the other machines also need the compat property
   to be set.

That means we don't need separate PC_Q35_COMPAT_* macros at all, today.

The hpet.hpet-intcap case is interesting: piix and q35 do have something
that emulates different defaults, but the machine-specific default is
applied _after_ compat_props are applied, by simply checking if the
property is zero (which is the real default on the hpet code).

The hpet.hpet-intcap=0x4 compat property can (should?) be applied to
piix too, because 0x4 was the default on both piix and q35 before the
hpet-intcap property was introduced.

Now, if one day we change the default HPET intcap on one of the PC
machine-types again, we may want to introduce PC_{Q35,I440FX}_COMPAT
macros. But while we don't need that, we can keep the code simple.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-29 18:59:06 +03:00
Peter Maydell
2d40fa6987 Block patches for 2.1.0-rc0
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTrbz4AAoJEH8JsnLIjy/WlIkP/RepIwS29f19i3B/idGzUdYW
 9XJnVowRvpkUzDqUprrr7lPHMW/CwAswLNis9B1hZ59rx+tx4Hm/rZGARlqhSOSO
 ZMdW32GFW0SyC5PglFBwGQAk4U0FxwW5cJD6US7h3L4pACIdCkzFSNxehyfCMyU/
 oJkjuAH4a2IQoQf/M7WMm5kPkrdpRp6ZgbQvJGHaR63cuulZDb7rbHMyG66MWH8P
 wahhFFPY1wOeMBiISxPbmcTus+AlfCffG5qPqq83OtaIuWzINTmWlpiFmtx+Aqwy
 HSvGnFJ4Rf7J6Fw8sdTsABdqUTc/gxDYmhAuftm/hsjD9MvPeuFSLPMPLfGg6aPR
 umKaeBOw8NoMTPgbxg403gxFTrHar+TidBu8KgZw5T189/oJSSpT2J53uHWazmd9
 8USkcYQ7VHdFUQVXluLEzHMIWc7kf87ylQ8c9S1yCkNeWYxRZDZGgHEU49ov7FFU
 FnA0w+ZFyDkU8d5gryG+vxOeBDlmXD4UHa676gGlaYhs7YC/BY/JaMgqY4Fd6MMW
 dS5ibPjdtbxEZTh29eWEByMWpzuitr+iPPzsJEdC29LeIIj3XRQq/4FyiQ6EMAAO
 iOlcqE3tws0Ty8GEp78xsAYjaLuH3zmvOTa4aHUQ+K9kwpMPFSJKEcLkwPWWYRbs
 qR2ZL6M+95oQTYkYzv8i
 =Wwqx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.1.0-rc0

# gpg: Signature made Fri 27 Jun 2014 19:50:32 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (47 commits)
  iotests: Fix 083 for out-of-tree builds
  iotests: Drop Python version from 065's Shebang
  iotests: Use $PYTHON for Python scripts
  iotests: Source common.env
  configure: Enable out-of-tree iotests
  iotests: Allow out-of-tree run
  block.c: Don't return success for bdrv_append_temp_snapshot() failure
  qemu-iotests: Add TestRepairQuorum to 041 to test drive-mirror node-name mode.
  block: Add replaces argument to drive-mirror
  blockjob: Fix recent BLOCK_JOB_ERROR regression
  blockjob: Fix recent BLOCK_JOB_READY regression
  virtio-blk: Rename complete_request_early to complete_request_vring
  virtio-blk: Unify {non-,}dataplane's request handlings
  virtio-blk: Schedule BH in the right context
  virtio-blk: Export request handling functions to dataplane
  virtio-blk: Make request completion function virtual
  block: acquire AioContext in qmp_query_blockstats()
  block: make bdrv_query_stats() static
  virtio-blk: Fix and clean up the in_sg and out_sg check
  virtio-blk: Fill in VirtIOBlockReq.out in dataplane code
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-06-29 15:24:54 +01:00
Fam Zheng
fee65db771 virtio-blk: Export request handling functions to dataplane
So that dataplane can use virtio_blk_handle_request and
virtio_submit_multiwrite.

Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:20:35 +02:00
Fam Zheng
bf4bd461b4 virtio-blk: Make request completion function virtual
virtio_blk_req_complete will call VirtIOBlock.complete_request() to push
data and notify guest. No functional change.

Later, this will allow dataplane to provide it's own (vring_) version.

Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:20:32 +02:00
Fam Zheng
827805a249 virtio-blk: Convert VirtIOBlockReq.out to structrue
The virtio code currently assumes that the outhdr is in its own iovec.
This is not guaranteed by the spec, so we should relax this assumption.

Convert the VirtIOBlockReq.out field to structrue so that we can use
iov_to_buf and then discard the header from the beginning of iovec.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:18:25 +02:00
Fam Zheng
eddb102e86 virtio-blk: Use VirtIOBlockReq.in to drop VirtIOBlockReq.inhdr
In current virtio spec, inhdr is a single byte, and is unlikely to
change for both functionality and compatibility considerations.
Non-dataplane uses .in, and we are on the way to converge them. So
let's unify it to get cleaner code.

Remove .inhdr and use .in.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:18:23 +02:00
Fam Zheng
04af2d70c5 virtio-blk: Replace VirtIOBlockRequest with VirtIOBlockReq
Field "inhdr" is added temporarily for a more mechanical change, and
will be dropped in the next commit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:18:20 +02:00
Fam Zheng
671ec3f056 virtio-blk: Convert VirtIOBlockReq.elem to pointer
This will make converging with dataplane code easier.

Add virtio_blk_free_request to handle the freeing of request internal
fields.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:18:13 +02:00
Fam Zheng
09f6458770 virtio-blk: Move VirtIOBlockReq to header
For later reusing by dataplane code.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:17:59 +02:00
Alexey Kardashevskiy
9a321e9234 spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.

This makes use of new XICS ability to reuse interrupts.

This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.

This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.

This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.

This fixed traces to be more informative.

This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:27 +02:00
Alexey Kardashevskiy
51bba713fe xics: Implement xics_ics_free()
This implements interrupt release function so IRQs can be returned back
to the pool for reuse in cases such as PCI hot plug.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Alexey Kardashevskiy
bee763dbfb spapr: Move interrupt allocator to xics
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.

This moves an allocator from SPAPR to XICS.

This switches IRQ users to use new API.

This uses LSI/MSI flags to know if interrupt is allocated.

The interrupt release function will be posted as a separate patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Alexey Kardashevskiy
4af88944d0 xics: Add flags for interrupts
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:

1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.

2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts.

This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not allocated and not in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff
3b50d8974b spapr: Add RTAS sysparm SPLPAR Characteristics
Add support for the SPLPAR Characteristics parameter to the emulated
RTAS call ibm,get-system-parameter.

The support provides just enough information to allow "cat
/proc/powerpc/lparcfg" to succeed without generating a kernel error
message.

Without this patch the above command will produce the following kernel
message: arch/powerpc/platforms/pseries/lparcfg.c \
parse_system_parameter_string Error calling get-system-parameter \
(0xfffffffd)

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff
b907d7b0fd spapr: Add RTAS sysparm UUID
Add support for the UUID parameter to the emulated RTAS call
ibm,get-system-parameter.

Return the guest's UUID as the value for the RTAS UUID system
parameter, or null (a zero length result) if it is not set.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:26 +02:00
Sam bobroff
3052d95190 spapr: Fix RTAS sysparm DIAGNOSTICS_RUN_MODE
This allows the ibm,get-system-parameter RTAS call to succeed for the
DIAGNOSTICS_RUN_MODE system parameter.

The problem can be seen with "ppc64_cpu --run-mode" from the
powerpc-utils package which fails before this patch with "Machine does
not support diagnostic run mode".

This is corrected by using the rtas_st_buffer() function to write to
the buffer.

The RTAS constants are also moved out into a header file, some new
constants added and the surrounding code slightly simplified.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[agraf: remove some commentary]
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:25 +02:00
Sam bobroff
ce3fa1eca2 spapr: Add rtas_st_buffer utility function
Add a function to write lengh + data into a buffer as required for the
emulation of the RTAS ibm,get-system-parameter call.

If the destination is smaller than the source, the write is truncated
and success is returned. This matches the behaviour of pHyp.

This will be used in following patches.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:25 +02:00
Alexey Kardashevskiy
9fc34ada7e spapr_pci_vfio: Add spapr-pci-vfio-host-bridge to support vfio
The patch adds a spapr-pci-vfio-host-bridge device type
which is a PCI Host Bridge with VFIO support. The new device
inherits from the spapr-pci-host-bridge device and adds an "iommu"
property which is an IOMMU id. This ID represents a minimal entity
for which IOMMU isolation can be guaranteed. In SPAPR architecture IOMMU
group is called a Partitionable Endpoint (PE).

Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
SPAPR allows multiple PHB for no extra cost, this does not seem to
be a problem. This limitation may change in the future though.

Example of use:
Configure and Add 3 functions of a multifunctional device to QEMU:
(the NEC PCI USB card is used as an example here):
-device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
-device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
-device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
-device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB

where:
* index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
offset);
* iommu=4 is an IOMMU id which can be found in sysfs:
[aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
[aik@vpl2 0004:00:00.0]$ ls -l iommu_group
lrwxrwxrwx 1 root root 0 Jun  5 12:49 iommu_group -> ../../../kernel/iommu_groups/4

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:23 +02:00
Alexey Kardashevskiy
6d8be4c343 vfio: Add vfio_container_ioctl()
While most operations with VFIO IOMMU driver are generic and used inside
vfio.c, there are still some operations which only specific VFIO IOMMU
drivers implement. The first example of it will be reading a DMA window
start from the host.

This adds a helper which passes an ioctl request to the container's fd.

The helper will check if @req is known. For this, stub is added. This return
-1 on any requests for now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:23 +02:00
Alexey Kardashevskiy
9bb62a0702 spapr_iommu: Make in-kernel TCE table optional
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating
TCE tables in the host kernel memory and handle H_PUT_TCE requests
targeted to specific LIOBN (logical bus number) right in the host without
switching to QEMU. At the moment this is used for emulated devices only
and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
handler finds a LIOBN and corresponding table, it will put a TCE to
the table and complete hypercall execution. The user space will not be
notified.

Upcoming VFIO support is going to use the same sPAPRTCETable device class
so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
tables for VFIO are going to be allocated in the host as well.
However VFIO operates with real IOMMU tables and simple copying of
a TCE to the real hardware TCE table will not work as guest physical
to host physical address translation is requited.

So until the host kernel gets VFIO support for H_PUT_TCE, we better not
to register VFIO's TCE in the host.

This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not
in upstream yet and being discussed so now it is always false which means
that in-kernel VFIO acceleration is not supported.

This adds a bool @vfio_accel flag to the sPAPRTCETable device telling
that sPAPRTCETable should not try allocating TCE table in the host kernel
for VFIO. The flag is false now as at the moment there is no VFIO.

This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic
is the same. Since there is only emulated PCI and VIO now, the flag is set
to false. Upcoming VFIO support will set it to true.

This is a preparation patch so no change in behaviour is expected

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:23 +02:00
Alexey Kardashevskiy
3a3b8502e6 spapr: Fix RTAS token numbers
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.

When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.

This defines global contant values for every RTAS token which QEMU
is using today.

This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().

This changes XICS-KVM not to cache tokens registered with KVM as they
constant now.

This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.

This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27 13:48:22 +02:00
Don Slutz
c87b152072 pc & q35: Add new machine opt max-ram-below-4g
This is a pc & q35 only machine opt.

If you add enough PCI devices then all mmio for them will not fit
below 4G which may not be the layout the user wanted. This allows
you to increase the below 4G address space that PCI devices can use
(aka decrease ram below 4G) and therefore in more cases not have any
mmio that is above 4G.

For example using "-machine pc,max-ram-below-4g=2G" on the command
line will limit the amount of ram that is below 4G to 2G.

Note: this machine option cannot be used to increase the amount
of ram below 4G.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

MST: fix 32 bit
2014-06-23 18:02:41 +03:00
Don Slutz
3c2a96699e xen-hvm: Fix xen_hvm_init() to adjust pc memory layout
This is just below_4g_mem_size and above_4g_mem_size which is used later in QEMU.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:50:04 +03:00
Marcel Apfelbaum
f23b6bdc3c hw/pcie: implement power controller functionality
It is needed by hot-unplug in order to get an indication
from the OS when the device can be physically detached.

Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:48:42 +03:00
Michael S. Tsirkin
7145872ed3 vhost: block migration if backend does not log memory
vhost user does not support LOG_ALL feature bit.
Generally, we should not try to set this bit without
checking that backend can support it first.

Detect and block migration.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-23 17:37:59 +03:00
Peter Maydell
d70a319b8d Merge remote-tracking branch 'remotes/kvm/uq/master' into staging
* remotes/kvm/uq/master:
  hw/mips: malta: Don't boot from flash with KVM T&E
  MAINTAINERS: Add entry for MIPS KVM
  target-mips: Enable KVM support in build system
  hw/mips: malta: Add KVM support
  hw/mips: In KVM mode, inject IRQ2 (I/O) interrupts via ioctls
  target-mips: Call kvm_mips_reset_vcpu() from mips_cpu_reset()
  target-mips: kvm: Add main KVM support for MIPS
  kvm: Allow arch to set sigmask length
  target-mips: get_physical_address: Add KVM awareness
  target-mips: get_physical_address: Add defines for segment bases
  hw/mips: Add API to convert KVM guest KSEG0 <-> GPA
  hw/mips/cputimer: Don't start periodic timer in KVM mode
  target-mips: Reset CPU timer consistently
  KVM: Fix GSI number space limit

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-06-20 19:25:18 +01:00
Peter Maydell
0a99aae5fa pc,pci,virtio,hotplug fixes, enhancements
numa work by Hu Tao and others
 memory hotplug by Igor
 vhost-user by Nikolay, Antonios and others
 guest virtio announcements by Jason
 qtest fixes by Sergey
 qdev hotplug fixes by Paolo
 misc other fixes mostly by myself
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTowW+AAoJECgfDbjSjVRpnMMH/jp3sKGzJumYLbi5ihjmYyND
 jYd6ySXoVAjUTgaCvdje5srisOap8pbc783kQvQS4CeWsjZ5Vvh+PZjkBPIqF1pD
 celxGQ43CY7QSUWq+02Dg9VIUwLwZqdKlxNsV01FligQn+ZBQ6sQ6ksWx7oGzqRt
 5/HMZykbwUvSk/4xGUaMn2+/4uhQ0Wz5EsCkv9L/u8kS72k6ldc/tCGZMzBUNHTM
 rW5FPYwMQP0MXgGTXnlLEQjJ7Lozc66IaMZoHw/a/aGSIxdag9Otj0ADuXq6yZaV
 Xi4O/EOJWd1JpSG7w8LOyIZNakpHkU43fmJCLzBjDAupHeRp57TcW5ox4PJYAtg=
 =Oxdt
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc,pci,virtio,hotplug fixes, enhancements

numa work by Hu Tao and others
memory hotplug by Igor
vhost-user by Nikolay, Antonios and others
guest virtio announcements by Jason
qtest fixes by Sergey
qdev hotplug fixes by Paolo
misc other fixes mostly by myself

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* remotes/mst/tags/for_upstream: (109 commits)
  numa: use RAM_ADDR_FMT with ram_addr_t
  qapi/string-output-visitor: fix bugs
  tests: simplify code
  qapi: fix input visitor bugs
  acpi: rephrase comment
  qmp: add ACPI_DEVICE_OST event handling
  qmp: add query-acpi-ospm-status command
  acpi: implement ospm_status() method for PIIX4/ICH9_LPC devices
  acpi: introduce TYPE_ACPI_DEVICE_IF interface
  qmp: add query-memory-devices command
  numa: handle mmaped memory allocation failure correctly
  pc: acpi: do not hardcode preprocessor
  qmp: clean out whitespace
  qdev: recursively unrealize devices when unrealizing bus
  qdev: reorganize error reporting in bus_set_realized
  qapi: fix build on glib < 2.28
  qapi: make string output visitor parse int list
  qapi: make string input visitor parse int list
  tests: fix memory leak in test of string input visitor
  hmp: add info memdev
  ...

Conflicts:
	include/hw/i386/pc.h
[PMM: fixed minor conflict in pc.h]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-06-20 18:01:24 +01:00
Michael S. Tsirkin
b4acfbcd95 acpi: rephrase comment
"only upto" is not proper English.
Say "up to" and drop "only".

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 18:44:22 +03:00
Igor Mammedov
43f5041008 acpi: implement ospm_status() method for PIIX4/ICH9_LPC devices
... using TYPE_ACPI_DEVICE_IF interface.
Which provides status reporting of ACPI declared memory devices

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 18:44:22 +03:00
Igor Mammedov
521b3673ac acpi: introduce TYPE_ACPI_DEVICE_IF interface
... it will be used to abstract generic ACPI bits from
device that implements ACPI interface.

ACPIOSTInfo type is used for passing-through raw _OST
event/status codes reported by guest OS to a management
layer. It lets management tools interpret values
as specified by ACPI spec if it is interested in it.

QEMU doesn't encode these values as enum, since it
doesn't need to handle them and it allows interface
to scale well without any changes in QEMU while guest
OS and management evolves in time.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 18:44:22 +03:00
Igor Mammedov
6f2e27301d qmp: add query-memory-devices command
... allowing to get state of present memory devices.
Currently implemented only for PCDIMMDevice.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 18:44:22 +03:00
Paolo Bonzini
9521d42b54 pc: pass MachineState to pc_memory_init
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 18:44:20 +03:00
Paolo Bonzini
dfabb8b916 numa: introduce memory_region_allocate_system_memory
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

MST: resolve conflicts
2014-06-19 18:44:19 +03:00
Nikolay Nikolaev
1a1bfac9ee Add vhost-backend and VhostBackendType
Use vhost_set_backend_type to initialise a proper vhost_ops structure.
In vhost_net_init and vhost_net_start_one call conditionally TAP related
initialisation depending on the vhost backend type.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:56 +03:00
Nikolay Nikolaev
24d1eb33eb Add vhost_ops to vhost_dev struct and replace all relevant ioctls
Decouple vhost from the Linux kernel by introducing vhost_ops. The
intention is to provide different backends - a 'kernel' backend based on
the ioctl interface, and an 'user' backend based on a UNIX domain socket
and shared memory interface.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:56 +03:00
Nikolay Nikolaev
81647a655f vhost_net_init will use VhostNetOptions to get all its arguments
vhost_dev_init will replace devfd and devpath with a single opaque argument.
This is initialised with a file descriptor. When TAP is used (through
vhost_net), open /dev/vhost-net and pass the fd as an opaque parameter in
VhostNetOptions. The same applies to vhost-scsi - open /dev/vhost-scsi and
pass the fd.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:56 +03:00
Nikolay Nikolaev
2e6d46d77e vhost: add vhost_get_features and vhost_ack_features
Generalize the features get/ack to be used for both vhost-net and vhost-scsi.
In vhost-net add vhost_net_get_feature_bits to select the feature bit set
depending on the NetClient kind.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:55 +03:00
Jason Wang
f57fcf7063 virtio-net: announce self by guest
It's hard to track all mac addresses and their configurations (e.g
vlan or ipv6) in qemu. Without this information, it's impossible to
build proper garp packet after migration. The only possible solution
to this is let guest (who knows all configurations) to do this.

So, this patch introduces a new readonly config status bit of virtio-net,
VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
presence of its link through config update interrupt.When guest has
done the announcement, it should ack the notification through
VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
Linux guest).

During load, a counter of announcing rounds is set so that after the vm is
running it can trigger rounds of config interrupts to notify the guest to build
and send the correct garps.

Cc: Liuyongan <liuyongan@huawei.com>
Cc: Amos Kong <akong@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:54 +03:00
Michael S. Tsirkin
292b1634d0 ich: get rid of spaces in type name
Names with spaces in them are nasty, let's not go there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-06-19 16:41:53 +03:00