docs: Introducing pseries documentation.

The purpose of this document is to substitute the content currently
available in the QEMU wiki at [0]. This initial version does contain
some additional content as well. Whenever this documentation gets
upstream and is reflected in [1], the QEMU wiki will be edited to point
to this documentation, so that we only need to keep it updated in one
place.

0. https://wiki.qemu.org/Documentation/Platforms/POWER
1. https://qemu.readthedocs.io/en/latest/system/ppc/pseries.html

Signed-off-by: Leonardo Garcia <lagarcia@br.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <66b6fdde52062fdf4f4b4dc35a9f06a899c88293.1638981899.git.lagarcia@br.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This commit is contained in:
Leonardo Garcia 2021-12-17 17:57:13 +01:00 committed by Cédric Le Goater
parent bbfbbff5fc
commit 88581cc43b

View File

@ -1,12 +1,238 @@
pSeries family boards (``pseries``)
===================================
The Power machine para-virtualized environment described by the `Linux on Power
Architecture Reference document (LoPAR)
<https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_
is called pSeries. This environment is also known as sPAPR, System p guests, or
simply Power Linux guests (although it is capable of running other operating
systems, such as AIX).
Even though pSeries is designed to behave as a guest environment, it is also
capable of acting as a hypervisor OS, providing, on that role, nested
virtualization capabilities.
Supported devices
-----------------
* Multi processor support for many Power processors generations: POWER7,
POWER7+, POWER8, POWER8NVL, POWER9, and Power10. Support for POWER5+ exists,
but its state is unknown.
* Interrupt Controller, XICS (POWER8) and XIVE (POWER9 and Power10)
* vPHB PCIe Host bridge.
* vscsi and vnet devices, compatible with the same devices available on a
PowerVM hypervisor with VIOS managing LPARs.
* Virtio based devices.
* PCIe device pass through.
Missing devices
---------------
* SPICE support.
Firmware
--------
`SLOF <https://github.com/aik/SLOF>`_ (Slimline Open Firmware) is an
implementation of the `IEEE 1275-1994, Standard for Boot (Initialization
Configuration) Firmware: Core Requirements and Practices
<https://standards.ieee.org/standard/1275-1994.html>`_.
QEMU includes a prebuilt image of SLOF which is updated when a more recent
version is required.
Build directions
----------------
.. code-block:: bash
./configure --target-list=ppc64-softmmu && make
Running instructions
--------------------
Someone can select the pSeries machine type by running QEMU with the following
options:
.. code-block:: bash
qemu-system-ppc64 -M pseries <other QEMU arguments>
sPAPR devices
-------------
The sPAPR specification defines a set of para-virtualized devices, which are
also supported by the pSeries machine in QEMU and can be instantiated with the
``-device`` option:
* ``spapr-vlan`` : a virtual network interface.
* ``spapr-vscsi`` : a virtual SCSI disk interface.
* ``spapr-rng`` : a pseudo-device for passing random number generator data to the
guest (see the `H_RANDOM hypercall feature
<https://wiki.qemu.org/Features/HRandomHypercall>`_ for details).
* ``spapr-vty``: a virtual teletype.
* ``spapr-pci-host-bridge``: a PCI host bridge.
* ``tpm-spapr``: a Trusted Platform Module (TPM).
* ``spapr-tpm-proxy``: a TPM proxy.
These are compatible with the devices historically available for use when
running the IBM PowerVM hypervisor with LPARs.
However, since these devices have originally been specified with another
hypervisor and non-Linux guests in mind, you should use the virtio counterparts
(virtio-net, virtio-blk/scsi and virtio-rng for instance) if possible instead,
since they will most probably give you better performance with Linux guests in a
QEMU environment.
The pSeries machine in QEMU is always instantiated with the following devices:
* A NVRAM device (``spapr-nvram``).
* A virtual teletype (``spapr-vty``).
* A PCI host bridge (``spapr-pci-host-bridge``).
Hence, it is not needed to add them manually, unless you use the ``-nodefaults``
command line option in QEMU.
In the case of the default ``spapr-nvram`` device, if someone wants to make the
contents of the NVRAM device persistent, they will need to specify a PFLASH
device when starting QEMU, i.e. either use
``-drive if=pflash,file=<filename>,format=raw`` to set the default PFLASH
device, or specify one with an ID
(``-drive if=none,file=<filename>,format=raw,id=pfid``) and pass that ID to the
NVRAM device with ``-global spapr-nvram.drive=pfid``.
sPAPR specification
^^^^^^^^^^^^^^^^^^^
The main source of documentation on the sPAPR standard is the `Linux on Power
Architecture Reference document (LoPAR)
<https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_.
However, documentation specific to QEMU's implementation of the specification
can also be found in QEMU documentation:
.. toctree::
:maxdepth: 1
../../specs/ppc-spapr-numa.rst
../../specs/ppc-spapr-xive.rst
Other documentation available in QEMU docs directory:
* Hypervisor calls (a.k.a. hcalls) (``docs/specs/ppc-spapr-hcalls.txt``).
* Hot plug (``/docs/specs/ppc-spapr-hotplug.txt``).
* Hypervisor calls needed by the Ultravisor
(``/docs/specs/ppc-spapr-uv-hcalls.txt``).
Switching between the KVM-PR and KVM-HV kernel module
-----------------------------------------------------
Currently, there are two implementations of KVM on Power, ``kvm_hv.ko`` and
``kvm_pr.ko``.
If a host supports both KVM modes, and both KVM kernel modules are loaded, it is
possible to switch between the two modes with the ``kvm-type`` parameter:
* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR`` to use the
``kvm_pr.ko`` kernel module.
* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV`` to use ``kvm_hv.ko``
instead.
KVM-PR
^^^^^^
KVM-PR uses the so-called **PR**\ oblem state of the PPC CPUs to run the guests,
i.e. the virtual machine is run in user mode and all privileged instructions
trap and have to be emulated by the host. That means you can run KVM-PR inside
a pSeries guest (or a PowerVM LPAR for that matter), and that is where it has
originated, as historically (prior to POWER7) it was not possible to run Linux
on hypervisor mode on a Power processor (this function was restricted to
PowerVM, the IBM proprietary hypervisor).
Because all privileged instructions are trapped, guests that use a lot of
privileged instructions run quite slow with KVM-PR. On the other hand, because
of that, this kernel module can run on pretty much every PPC hardware, and is
able to emulate a lot of guests CPUs. This module can even be used to run other
PowerPC guests like an emulated PowerMac.
As KVM-PR can be run inside a pSeries guest, it can also provide nested
virtualization capabilities (i.e. running a guest from within a guest).
It is important to notice that, as KVM-HV provides a much better execution
performance, maintenance work has been much more focused on it in the past
years. Maintenance for KVM-PR has been minimal.
In order to run KVM-PR guests with POWER9 processors, someone will need to start
QEMU with ``kernel_irqchip=off`` command line option.
KVM-HV
^^^^^^
KVM-HV uses the hypervisor mode of more recent Power processors, that allow
access to the bare metal hardware directly. Although POWER7 had this capability,
it was only starting with POWER8 that this was officially supported by IBM.
Originally, KVM-HV was only available when running on a PowerNV platform (a.k.a.
Power bare metal). Although it runs on a PowerNV platform, it can only be used
to start pSeries guests. As the pSeries guest doesn't have access to the
hypervisor mode of the Power CPU, it wasn't possible to run KVM-HV on a guest.
This limitation has been lifted, and now it is possible to run KVM-HV inside
pSeries guests as well, making nested virtualization possible with KVM-HV.
As KVM-HV has access to privileged instructions, guests that use a lot of these
can run much faster than with KVM-PR. On the other hand, the guest CPU has to be
of the same type as the host CPU this way, e.g. it is not possible to specify an
embedded PPC CPU for the guest with KVM-HV. However, there is at least the
possibility to run the guest in a backward-compatibility mode of the previous
CPUs generations, e.g. you can run a POWER7 guest on a POWER8 host by using
``-cpu POWER8,compat=power7`` as parameter to QEMU.
Modules support
---------------
As noticed in the sections above, each module can run in a different
environment. The following table shows with which environment each module can
run. As long as you are in a supported environment, you can run KVM-PR or KVM-HV
nested. Combinations not shown in the table are not available.
+--------------+------------+------+-------------------+----------+--------+
| Platform | Host type | Bits | Page table format | KVM-HV | KVM-PR |
+==============+============+======+===================+==========+========+
| PowerNV | bare metal | 32 | hash | no | yes |
| | | +-------------------+----------+--------+
| | | | radix | N/A | N/A |
| | +------+-------------------+----------+--------+
| | | 64 | hash | yes | yes |
| | | +-------------------+----------+--------+
| | | | radix | yes | no |
+--------------+------------+------+-------------------+----------+--------+
| pSeries [1]_ | PowerNV | 32 | hash | no | yes |
| | | +-------------------+----------+--------+
| | | | radix | N/A | N/A |
| | +------+-------------------+----------+--------+
| | | 64 | hash | no | yes |
| | | +-------------------+----------+--------+
| | | | radix | yes [2]_ | no |
| +------------+------+-------------------+----------+--------+
| | PowerVM | 32 | hash | no | yes |
| | | +-------------------+----------+--------+
| | | | radix | N/A | N/A |
| | +------+-------------------+----------+--------+
| | | 64 | hash | no | yes |
| | | +-------------------+----------+--------+
| | | | radix [3]_ | no | yes |
+--------------+------------+------+-------------------+----------+--------+
.. [1] On POWER9 DD2.1 processors, the page table format on the host and guest
must be the same.
.. [2] KVM-HV cannot run nested on POWER8 machines.
.. [3] Introduced on Power10 machines.
Maintainer contact information
------------------------------
Cédric Le Goater <clg@kaod.org>
Daniel Henrique Barboza <danielhb413@gmail.com>