Enable transitional virtio devices by default.
Enable virtio-1.0 for devices plugged into
PCIe ports (Root ports or Downstream ports).
Using the virtio-1 mode will remove the limitation
of the number of devices that can be attached to a machine
by removing the need for the IO BAR.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Convert a device model where initialization obviously can't fail,
make it implement realize() rather than init().
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Firstly, convert pxb_dev_init_common() to Error and rename
it to pxb_dev_realize_common().
Actually, pxb_register_bus() is converted as well.
And then,
convert pxb_dev_initfn() and pxb_pcie_dev_initfn() to Error,
rename them to pxb_dev_realize() and pxb_pcie_dev_realize()
respectively.
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In build_crs(), the calculation and merging of the ranges already happens
in 64-bit, but the entry boundaries are silently truncated to 32-bit in the
call to aml_dword_memory(). Fix it by handling the 64-bit MMIO ranges separately.
This fixes 64-bit BARs behind PXBs.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of always passing both IO and MEM ranges when
computing CRS ranges, define a new CrsRangeSet structure
that include them both.
This is done before introducing a third type of range,
64-bit MEM, so it will be easier to pass them all around.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PXBs do not support hotplug so they don't have a PCNT function.
Since the PXB's PCI root-bus is a child bus of bus 0, the
build_dsdt code will add a call to the corresponding PCNT function.
Fix this by skipping the PCNT call for the above case.
While at it skip also PCIe child buses.
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Prevent future issues when hotplug will work for devices
attached to pxbs.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Specify the root port interrupt pin as part of the init
process for cases when msi/msix are not enabled.
Fixes "hw/pci/pci.c:196:23: runtime error: shift exponent -1 is negative"
warning from clang's sanitizer.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We changed link status register in pci express endpoint capability
over time. Specifically,
commit b2101eae63 ("pcie: Set the "link
active" in the link status register") set data link layer link active
bit in this register without adding compatibility to old machine types.
When migrating from qemu 2.3 and older this affects xhci devices which
under machine type 2.0 and older have a pci express endpoint capability
even if they are on a pci bus.
Add compatibility flags to make this bit value match what it was under
2.3.
Additionally, to avoid breaking migration from qemu 2.3 and up,
suppress checking link status during migration: this seems sane
since hardware can change link status at any time.
https://bugzilla.redhat.com/show_bug.cgi?id=1352860
Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes: b2101eae63
("pcie: Set the "link active" in the link status register")
Cc: qemu-stable@nongnu.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
While implementing TLB invalidation feature we forgot to modify
part of code responsible for updating EntryHi during TLB exception.
Consequently EntryHi.EHINV is unexpectedly cleared on the exception.
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
The print routine provided as part of the in-built bootloader had a bug
in that it attempted to use a jump instruction as part of a loop, but
the target has its upper bits zeroed leading to control flow
transferring to 0xb0000814 rather than the intended 0xbfc00814. Fix this
by using a branch instruction instead, which seems more fit for purpose.
A simple way to test this is to build a Linux kernel with EVA enabled &
attempt to boot it in QEMU. It will attempt to print a message
indicating the configuration mismatch but QEMU would previously
incorrectly jump & wind up printing a continuous stream of the letter E.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Highlights:
* Fixes to allow CPU hotplug/unplug in any order;
* Exit QEMU on invalid global properties.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJXmMUlAAoJECgHk2+YTcWmLooP/ioPaesxxcz4u7SMS2h6c12z
shC2ERzGNeB+d6KZvscYG/fbR5SV3P1JAtOE5+BbiVvnZRbyvv0NaYhPlnLV/P5C
KKIK3uimaSNyxaDF7zZiImZWLWq8vl51SXITIQenrCjPYPE6q32m3pvlbGJvVi8P
x4lW9LMyRG1a2yOqF9nt+eTqkcigSj6z9cL8MQRoSnDhkXAh/q3ySofKfJ9RAhWQ
tGg5xIa+B/0mjZQwS2e0vnDnVxeYRgQ0o7sdZJD9zjgog8HTKX2cei0b/+FINAKM
bzDAQI97ECsWQ8MxXdcXsu9dHr7mwovmpBvecsU0DZtWnMv0vmd0OxmWJXoI4uh2
4rcg576z9rQU7eA4CbU7Wh4PTeKcKYvjocIQlMN2mgrVNZK7FXsRDdUdyusE8JQC
HHc/990kboC2Ui1MPi/QJm1RvA8+ofAk3py8DK5ixYvBPfeLxtcgHM8d3MpgeLbp
jG1DgnhEcO5lg29vWUsPM8XujUdPSJ8sgJtjCQIiACKgpZhuYKHirrnq+TNtd0Ly
qVrlVNWUuT0zWb9/kRgtcczgXFzP2+DQHH4jRc0csyFtxolt5SmA2tZ3vnlXnoUn
SIOH4V16aY3tndt/uAJh7+Qo1ZsGoeTqzrFWYLdpynDxpNwjWcaLenJ9MogUmbWh
FK+cKsazPWn3MOCxjhgj
=2Abo
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
x86 and machine queue, 2016-07-27
Highlights:
* Fixes to allow CPU hotplug/unplug in any order;
* Exit QEMU on invalid global properties.
# gpg: Signature made Wed 27 Jul 2016 15:28:53 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-pull-request:
vl: exit if a bad property value is passed to -global
qdev: ignore GlobalProperty.errp for hotplugged devices
machine: Add comment to abort path in machine_set_kernel_irqchip
Revert "pc: Enforce adding CPUs contiguously and removing them in opposite order"
pc: Init CPUState->cpu_index with index in possible_cpus[]
qdev: Fix object reference leak in case device.realize() fails
exec: Set cpu_index only if it's not been explictly set
exec: Don't use cpu_index to detect if cpu_exec_init()'s been called
exec: Reduce CONFIG_USER_ONLY ifdeffenery
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When passing '-global driver=host-powerpc64-cpu,property=compat,value=foo'
on the command line, without this patch, we get the following warning per
device (which means many lines if the guests has many cpus):
qemu-system-ppc64: Warning: can't apply global host-powerpc64-cpu.compat=foo:
Invalid compatibility mode "foo"
... and QEMU continues execution, ignoring the property.
With this patch, we get a single line:
qemu-system-ppc64: can't apply global host-powerpc64-cpu.compat=foo:
Invalid compatibility mode "foo"
... and QEMU exits.
The previous behavior is kept for hotplugged devices since we don't want
QEMU to exit when doing device_add.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch ensures QEMU won't terminate while hotplugging a device if the
global property cannot be set and errp points to error_fatal or error_abort.
While here, it also fixes indentation of the typename argument.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We're not supposed to abort when the user passes a bogus value.
Since the checking is done in visit_type_OnOffSplit(), the call
to abort() is legitimate. Let's add a comment to make it
explicit.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
A broken or malicious guest can submit more requests than the virtqueue
size permits, causing unbounded memory allocation in QEMU.
The guest can submit requests without bothering to wait for completion
and is therefore not bound by virtqueue size. This requires reusing
vring descriptors in more than one request, which is not allowed by the
VIRTIO 1.0 specification.
In "3.2.1 Supplying Buffers to The Device", the VIRTIO 1.0 specification
says:
1. The driver places the buffer into free descriptor(s) in the
descriptor table, chaining as necessary
and
Note that the above code does not take precautions against the
available ring buffer wrapping around: this is not possible since the
ring buffer is the same size as the descriptor table, so step (1) will
prevent such a condition.
This implies that placing more buffers into the virtqueue than the
descriptor table size is not allowed.
QEMU is missing the check to prevent this case. Processing a request
allocates a VirtQueueElement leading to unbounded memory allocation
controlled by the guest.
Exit with an error if the guest provides more requests than the
virtqueue size permits. This bounds memory allocation and makes the
buggy guest visible to the user.
This patch fixes CVE-2016-5403 and was reported by Zhenhao Hong from 360
Marvel Team, China.
Reported-by: Zhenhao Hong <hongzhenhao@360.cn>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Mirror can do up to 16 in-flight requests, but actually on full copy
(the whole source disk is non-zero) in-flight is always 1. This happens
as the request is not limited in size: the data occupies maximum available
capacity of s->buf.
The patch limits the size of the request to some artificial constant
(1 Mb here), which is not that big or small. This effectively enables
back parallelism in mirror code as it was designed.
The result is important: the time to migrate 10 Gb disk is reduced from
~350 sec to 170 sec.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
This reverts commit 4da7faaeb0.
Since commit:
pc: init CPUState->cpu_index with index in possible_cpus[]
cpu_index is stable regardless of the order cpus were created
and QEMU instance stays migratable always so limitation added
by 4da7faaeb could be safely removed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
It will enshure that cpu_index for a given cpu stays the same
regardless of the order cpus has been created/deleted.
No compat code is needed as for initial cpus index in
possible_cpus[] matches cpu_index that's been auto-allocated
in cpu_exec_init().
Tha same applies for hotplug with cpu-add command if cpus are
added sequentially in increasing order as 'id' matches cpu_index.
If cpu-add had been used for creating out-of-order cpus,
that created unmigratable instance since it were not possible
to start target with the same cpu_index using old way
of migrating instance with hotplugged cpus:
* source QEMU with CLI (-smp 1,maxcpus=3 and cpu-add id=2)
following set of cpu_index is allocated [0, 1] with
apics set [0, 2] respectivelly
* target QEMU is started with CLI -smp 2,maxcpus=3
resulting in set of cpu_index [0, 1] but with
set of apics [0, 1] wich doesn't match source.
So we don't need compat code in this case as it's never worked
and newelly added device_add support would use stable cpu_index
set by machine to begin with, so it won't have above limitation
and source QEMU could be migrated to destination regardless
of the order cpus were created.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
If device doesn't have parent assined before its realize
is called, device_set_realized() will implicitly set parent
to '/machine/unattached'.
However device_set_realized() may fail after that point at
several other points leaving not realized object dangling
in '/machine/unattached' and as result caller of
obj = object_new()
obj->ref == 1
object_property_set_bool(obj,..., true, "realized",...)
obj->ref == 2
if (fail)
object_unref(obj);
obj->ref == 1
will get object leak instead of expected object destruction.
Fix it by making device_set_realized() to cleanup after itself
in case of failure.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
It keeps the legacy behavior for all users that doesn't care
about stable cpu_index value, but would allow boards that
would support device_add/device_del to set stable cpu_index
that won't depend on order in which cpus are created/destroyed.
While at that simplify cpu_get_free_index() as cpu_index
generated by USER_ONLY and softmmu variants is the same
since none of the users support cpu-remove so far, except
of not yet released spapr/x86 device_add/delr, which
will be altered by follow up patches to set stable
cpu_index manually.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead use QTAIL's tqe_prev field to detect if cpu's been
placed in list by cpu_exec_init() which is always set if
QTAIL element is in list.
Fixes SIGSEGV on failure path in case cpu_index is assigned
by board and cpu.relalize() fails before cpu_exec_init() is called.
In follow up patches, cpu_index will be assigned by boards that
support cpu hot(un)plug and need stable cpu_index that doesn't
depend on order cpus are created/removed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The previous commit refactoring iotests.py:
commit 6661397446
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Wed Jul 20 14:23:10 2016 +0100
scripts: refactor the VM class in iotests for reuse
was not properly tested and included a number of broken
bits.
- The 'event_match' method was not moved into qemu.py
- The 'self._args' list parameter in QEMUMachine needs
to be copied otherwise modifications will affect the
global 'qemu_opts' variable in iotests.py
- The QEMUQtestMachine class methods had inverted
parameter order for the super() calls
- The QEMUQtestMachine class forgot to add
'-machine accel=qtest'
- The QEMUQtestMachine class constructor needs to set
a default 'name' value before using it as it may
be None
- The QEMUQtestMachine class constructor needs to use
named parameters when calling the super constructor
as it is leaving out some positional parameters.
- The 'qemu_prog' variable should be a string not a
list in iotests.py
- The VM classs constructor needs to use named
parameters when calling the super constructor
as it is leaving out some positional parameters.
- The path to the socket-scm-helper needs to be
passed into the QEMUMachine class
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469549767-27249-1-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The qemu-img info command has the ability to expose format
specific metadata about volumes. Wire up this facility for
the LUKS driver to report on cipher configuration and key
slot usage.
$ qemu-img info ~/VirtualMachines/demo.luks
image: /home/berrange/VirtualMachines/demo.luks
file format: luks
virtual size: 98M (102760448 bytes)
disk size: 100M
encrypted: yes
Format specific information:
ivgen alg: plain64
hash alg: sha1
cipher alg: aes-128
uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
cipher mode: xts
slots:
[0]:
active: true
iters: 572706
key offset: 4096
stripes: 4000
[1]:
active: false
key offset: 135168
[2]:
active: false
key offset: 266240
[3]:
active: false
key offset: 397312
[4]:
active: false
key offset: 528384
[5]:
active: false
key offset: 659456
[6]:
active: false
key offset: 790528
[7]:
active: false
key offset: 921600
payload offset: 2097152
master key iters: 142375
One somewhat undesirable artifact is that the data fields are
printed out in (apparently) random order. This will be addressed
later by changing the way the block layer pretty-prints the
image specific data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-3-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When creating new block encryption volumes, we accept a list of
parameters to control the formatting process. It is useful to
be able to query what those parameters were for existing block
devices. Add a qcrypto_block_get_info() method which returns a
QCryptoBlockInfo instance to report this data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-2-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Correct comments of field notify_me
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468575858-22975-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
There are no needs to allocate more than one cluster, as we set
avail_out for deflate to one cluster.
Zlib docs (http://www.zlib.net/manual.html) says:
"deflate compresses as much data as possible, and stops when the input
buffer becomes empty or the output buffer becomes full."
So, deflate will not write more than avail_out to output buffer. If
there is not enough space in output buffer for compressed data (it may
be larger than input data) deflate just returns Z_OK. (if all data is
compressed and written to output buffer deflate returns Z_STREAM_END).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1468515565-81313-1-git-send-email-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Here's the current batch of ppc and spapr related patches intended for
qemu-2.7. Given the late stage in 2.7 development, these are all
bugfixes with one exception:
The "spapr: disintricate core-id from DT semantics" changes the way
ids are assigned in the new core-based hotplug infrastructure. This
isn't strictly a bugfix, but we've determined that the current way of
assigning core-ids will cause considerable grief with future plans for
cpu hotplug. Therefore it's better to fix this now, late in 2.7,
before we have a released version with the problematic numbering.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXltNZAAoJEGw4ysog2bOSKgUQAJ8Mm3JZG2kBbKixIFWiqC4y
lQJdyHcCY5K6xFZmjGlsrMr8vbXejmqZM8LqrFdnNWb+a7Lmegmg2PCnbFZjLxIB
Wjov2s9r4z2Zmlcz/mL05mZIcvUtcXXdstSRwFgExYOWKaT9RYQZNg06Jpejf7eg
/L0SvOiGGa/ZPm+dQKiC+deyR2HJ2dSbGb7ltNzhHsyUK3Iemonro8hCbggFLsXh
/lTpAHPAlki5U2jcjNPaXWlEFuJQYzRmkmEdDSNT+y88YDpM/t4khsOUchjY00cv
SLBfpmZckb+rrvwaw2fkSmDSTj1ZgxXknanpk3V4B5Wlehb9Bgcun47jzzxIHm0k
Yv55lDHVgFnogq852wk9bjxWD4QJJAvXYUXq1Sk4VLNI4o3/7y3gfG+LJ7J2FMvN
TOQHWTsRkCBUHVzllvr29gRGjtNFQBhOoeDyJoM02bohYfr/0Rx5SbQiQwJQdrdy
oqenrbgb3IFnEJMd2codh04dYm8e7YlHwlywTArsgc7ycfzrMHR93/ZBARbYWHpc
lO+5QMJ4iasAJaZXwVyCgPNcxIcmsoI9pASwMAWphFeXaJ9liAPwjYYq55NcTcyt
3nXHTrDinX3ve6QbStFeQ5RlRDMEA/mF3zzGy6kM23qWWY8qJ7A2W5UXBDMPlzBF
P/2Q3dkL92QXP/w8UGTs
=Nznp
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160726' into staging
ppc patch queue 2016-07-26
Here's the current batch of ppc and spapr related patches intended for
qemu-2.7. Given the late stage in 2.7 development, these are all
bugfixes with one exception:
The "spapr: disintricate core-id from DT semantics" changes the way
ids are assigned in the new core-based hotplug infrastructure. This
isn't strictly a bugfix, but we've determined that the current way of
assigning core-ids will cause considerable grief with future plans for
cpu hotplug. Therefore it's better to fix this now, late in 2.7,
before we have a released version with the problematic numbering.
# gpg: Signature made Tue 26 Jul 2016 04:04:57 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160726:
spapr: disintricate core-id from DT semantics
target-ppc: add PPC_MFTB flag to e500mc and e5500
spapr: fix spapr-nvram migration
hw/ppc/spapr: Make sure to close the htab_fd when migration is canceled
ppc: Huge page detection mechanism fixes - Episode III
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
As of e4650c81, we do w32 builds with -Werror enabled. Unfortunately
for cases where we enable VSS support in qemu-ga, we still have
warnings generated by VSS includes that ship as part of the Microsoft
VSS SDK.
We can selectively address a number of these warnings using
#pragma GCC diagnostic ignored ...
but at least one of these:
warning: ‘typedef’ was ignored in this declaration
resulting from declarations of the form:
typedef struct Blah { ... };
does not provide a specific command-line/pragma option to disable
warnings of the sort.
To allow VSS builds to succeed, the next-best option is disabling
these warnings on a per-file basis. pragmas like #pragma GCC
system_header can be used to declare subsequent includes/declarations
as being exempt from normal warnings, but this must be done within
a header file.
Since we don't control the VSS SDK, we'd need to rely on a
intermediate header include to accomplish this, and
since different objects in the VSS link target rely on different
headers from the VSS SDK, this would become somewhat of a rat's nest
(though not totally unmanageable).
The next step up in granularity is just marking the entire VSS
SDK include path as system headers via -isystem. This is a bit more
heavy-handed, but since this SDK hasn't changed since 2005, there's
likely little to be gained from selectively disabling warnings
anyway, so we implement that approach here.
This fixes the -Werror failures in both the configure test and the
qga build due to shared reliance on $vss_win32_include. For the
same reason, this also enforces a new dependency on -isystem support
in the C/C++ compiler when building QGA with VSS enabled.
Cc: Thomas Huth <thuth@redhat.com>
Cc: Stefan Weil <sw@weilnetz.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Do not create a leaking temporary file, but use a static file instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Link a common tests data directory to the build directory.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The goal of this patch is to have a stable core-id which does not depend
on any DT related semantics, which involve non-obvious computations on
modern PowerPC server cpus.
With this patch, the DT core id is computed on-demand as:
(core-id / smp_threads) * smt
where smt is the number of threads per core in the host.
This formula should be consolidated in a helper since it is needed in
several places.
Other uses for core-id includes: compute a stable cpu_index (which
allows random order hotplug/unplug without breaking migration) and
NUMA.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to the e500mc and e5500 core reference manual they have support
for the mftb instruction.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When spapr-nvram is backed by a file using pflash interface,
migration fails on the destination guest with assert:
bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
This avoids the problem by delaying the pflash update until after
the device loads complete.
This fix is similar to the one for the pflash_cfi01 migration:
90c647d Fix pflash migration
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When canceling a migration process, we currently do not close the
HTAB migration file descriptor since htab_save_complete() is never
called in that case. So we leave the migration process with a
dangling htab_fd value around, and this causes any further migration
attempts to fail. To fix this issue, simply make sure that the
htab_fd is closed during the migration cleanup stage. And since the
cleanup() function is also called when migration succeeds, we can
also remove the call to close_htab_fd() from the htab_save_complete()
function.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354341
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
After already fixing two issues with the huge page detection mechanism
(see commit 159d2e39a8 and 86b50f2e1b), Greg Kurz noticed another
case that caused the guest to crash where QEMU announces huge pages
though they should not be available for the guest:
qemu-system-ppc64 -enable-kvm ... -mem-path /dev/hugepages \
-m 1G,slots=4,maxmem=32G
-object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
-numa node,nodeid=0 -numa node,nodeid=1
That means if there is a global mem-path option, we still have
to look at the memory-backend objects that have been specified
additionally and return their minimum page size if that value
is smaller than the page size of the main memory.
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Coverity spots that helper_movcal() calls malloc() but doesn't
check for failure. Fix this by switching to the glib allocation
functions, which abort on allocation failure.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1468327859-21385-1-git-send-email-peter.maydell@linaro.org
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
This introduces a moderately general purpose framework for
testing performance of migration.
The initial guest workload is provided by the included 'stress'
program, which is configured to spawn one thread per guest CPU
and run a maximally memory intensive workload. It will loop
over GB of memory, xor'ing each byte with data from a 4k array
of random bytes. This ensures heavy read and write load across
all of guest memory to stress the migration performance. While
running the 'stress' program will record how long it takes to
xor each GB of memory and print this data for later reporting.
The test engine will spawn a pair of QEMU processes, either on
the same host, or with the target on a remote host via ssh,
using the host kernel and a custom initrd built with 'stress'
as the /init binary. Kernel command line args are set to ensure
a fast kernel boot time (< 1 second) between launching QEMU and
the stress program starting execution.
None the less, the test engine will initially wait N seconds for
the guest workload to stablize, before starting the migration
operation. When migration is running, the engine will use pause,
post-copy, autoconverge, xbzrle compression and multithread
compression features, as well as downtime & bandwidth tuning
to encourage completion. If migration completes, the test engine
will wait N seconds again for the guest workooad to stablize on
the target host. If migration does not complete after a preset
number of iterations, it will be aborted.
While the QEMU process is running on the source host, the test
engine will sample the host CPU usage of QEMU as a whole, and
each vCPU thread. While migration is running, it will record
all the stats reported by 'query-migration'. Finally, it will
capture the output of the stress program running in the guest.
All the data produced from a single test execution is recorded
in a structured JSON file. A separate program is then able to
create interactive charts using the "plotly" python + javascript
libraries, showing the characteristics of the migration.
The data output provides visualization of the effect on guest
vCPU workloads from the migration process, the corresponding
vCPU utilization on the host, and the overall CPU hit from
QEMU on the host. This is correlated from statistics from the
migration process, such as downtime, vCPU throttling and iteration
number.
While the tests can be run individually with arbitrary parameters,
there is also a facility for producing batch reports for a number
of pre-defined scenarios / comparisons, in order to be able to
get standardized results across different hardware configurations
(eg TCP vs RDMA, or comparing different VCPU counts / memory
sizes, etc).
To use this, first you must build the initrd image
$ make tests/migration/initrd-stress.img
To run a a one-shot test with all default parameters
$ ./tests/migration/guestperf.py > result.json
This has many command line args for varying its behaviour.
For example, to increase the RAM size and CPU count and
bind it to specific host NUMA nodes
$ ./tests/migration/guestperf.py \
--mem 4 --cpus 2 \
--src-mem-bind 0 --src-cpu-bind 0,1 \
--dst-mem-bind 1 --dst-cpu-bind 2,3 \
> result.json
Using mem + cpu binding is strongly recommended on NUMA
machines, otherwise the guest performance results will
vary wildly between runs of the test due to lucky/unlucky
NUMA placement, making sensible data analysis impossible.
To make it run across separate hosts:
$ ./tests/migration/guestperf.py \
--dst-host somehostname > result.json
To request that post-copy is enabled, with switchover
after 5 iterations
$ ./tests/migration/guestperf.py \
--post-copy --post-copy-iters 5 > result.json
Once a result.json file is created, a graph of the data
can be generated, showing guest workload performance per
thread and the migration iteration points:
$ ./tests/migration/guestperf-plot.py --output result.html \
--migration-iters --split-guest-cpu result.json
To further include host vCPU utilization and overall QEMU
utilization
$ ./tests/migration/guestperf-plot.py --output result.html \
--migration-iters --split-guest-cpu \
--qemu-cpu --vcpu-cpu result.json
NB, the 'guestperf-plot.py' command requires that you have
the plotly python library installed. eg you must do
$ pip install --user plotly
Viewing the result.html file requires that you have the
plotly.min.js file in the same directory as the HTML
output. This js file is installed as part of the plotly
python library, so can be found in
$HOME/.local/lib/python2.7/site-packages/plotly/offline/plotly.min.js
The guestperf-plot.py program can accept multiple json files
to plot, enabling results from different configurations to
be compared.
Finally, to run the entire standardized set of comparisons
$ ./tests/migration/guestperf-batch.py \
--dst-host somehost \
--mem 4 --cpus 2 \
--src-mem-bind 0 --src-cpu-bind 0,1 \
--dst-mem-bind 1 --dst-cpu-bind 2,3
--output tcp-somehost-4gb-2cpu
will store JSON files from all scenarios in the directory
named tcp-somehost-4gb-2cpu
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1469020993-29426-7-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
If tests use a TCP based monitor socket, the connection will
go into a TIMED_WAIT state when the test exits. This will
randomly prevent the test from being re-run without a certain
time period. Set the SO_REUSEADDR flag on the socket to ensure
we can immediately re-run the tests
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1469020993-29426-6-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
If QEMU fails to launch for some reason, the QEMUMonitorProtocol
class accept() method will wait forever in a socket accept call.
Set a timeout of 15 seconds so that we fail more gracefully
instead of hanging the test script forever
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1469020993-29426-5-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
The iotests module has a python class for controlling QEMU
processes. Pull the generic functionality out of this file
and create a scripts/qemu.py module containing a QEMUMachine
class. Put the QTest integration support into a subclass
QEMUQtestMachine.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1469020993-29426-4-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Add a 'debug' parameter to the QEMUMonitorProtocol class
which will cause it to print out all JSON strings on
sys.stderr
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1469020993-29426-3-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>