PowerNV family boards (``powernv8``, ``powernv9``)
==================================================================

PowerNV (for Non-Virtualized) is the "bare metal" platform that uses
the OPAL firmware. It runs Linux on IBM and OpenPOWER systems and can
be used as a hypervisor OS, running KVM guests, or simply as a host
OS.

The PowerNV QEMU machine tries to emulate a PowerNV system at the
level of the skiboot firmware, which loads the OS and provides some
runtime services. Power Systems have a lower-level firmware (HostBoot)
that does low-level system initialization, such as DRAM training. This
is beyond the scope of what QEMU addresses today.

Supported devices
-----------------

* Multi processor support for POWER8, POWER8NVL and POWER9
* XSCOM, a serial communication sideband bus used to configure chiplets
* Simple LPC Controller
* Processor Service Interface (PSI) Controller
* Interrupt Controller, XICS (POWER8) and XIVE (POWER9)
* POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge
* Simple OCC, an on-chip microcontroller used for power management
  tasks
* iBT device to handle BMC communication, with the internal BMC
  simulator provided by QEMU or an external BMC such as an Aspeed
  QEMU machine
* PNOR containing the different firmware partitions

Missing devices
---------------

Many devices are not modeled yet, including:

* POWER10 processor
* XIVE2 (POWER10) interrupt controller
* I2C controllers (yet to be merged)
* NPU/NPU2/NPU3 controllers
* EEH support for PCIe Host bridge controllers
* NX controller
* VAS controller
* chipTOD (Time Of Day)
* Self Boot Engine (SBE)
* FSI bus

Firmware
--------

The OPAL firmware (OpenPOWER Abstraction Layer) for OpenPOWER systems
includes the runtime services ``skiboot`` and the bootloader kernel and
initramfs ``skiroot``. Source code can be found on GitHub:

  https://github.com/open-power.

Prebuilt images of ``skiboot`` and ``skiroot`` are made available on the
`OpenPOWER <https://openpower.xyz/job/openpower/job/openpower-op-build/>`__
site. To boot a POWER9 machine, use the
`witherspoon <https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/>`__
images. For POWER8, use the
`palmetto <https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=palmetto/lastSuccessfulBuild/>`__
images.

QEMU includes a prebuilt image of ``skiboot`` which is updated when a
more recent version is required by the models.

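When testing a different firmware build, the bundled image can be
overridden with ``-bios``, as the full PCIe example in this document
does. A minimal sketch, assuming a ``skiboot.lid`` file in the current
directory (the kernel and initramfs names are placeholders as well):

.. code-block:: bash

  # ./skiboot.lid is an assumed path to a locally built or downloaded image
  $ qemu-system-ppc64 -m 2G -machine powernv9 \
        -bios ./skiboot.lid \
        -kernel ./zImage.epapr \
        -initrd ./rootfs.cpio.xz \
        -nographic
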
Boot options
------------

Here is a simple setup with one e1000e NIC:

.. code-block:: bash

  $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \
        -accel tcg,thread=single \
        -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \
        -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \
        -kernel ./zImage.epapr \
        -initrd ./rootfs.cpio.xz \
        -nographic

To add a SATA disk, append:

.. code-block:: bash

        -device ich9-ahci,id=sata0,bus=pcie.1,addr=0x0 \
        -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
        -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1

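The ``hostfwd=::20022-:22`` setting of the user-mode network in the
simple setup forwards TCP port 20022 on the host to port 22 in the
guest. Assuming the guest image runs an SSH server and has a ``root``
account (both are properties of the guest image, not something QEMU
provides), a session could be opened from the host like this:

.. code-block:: bash

  # "root" and a running sshd in the guest are assumptions
  $ ssh -p 20022 root@localhost
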
Complex PCIe configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

Six PHBs are defined per chip (POWER9) but no default PCI layout is
provided (to be compatible with libvirt). One PCI device can be added
on any of the available PCIe slots using command line options such as:

.. code-block:: bash

  # a NIC on the first PHB
  -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0
  -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0

  # or a SAS controller with one disk
  -device megasas,id=scsi0,bus=pcie.0,addr=0x0
  -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2

Here is a full example with two different storage controllers, each
with a disk, on different PHBs. The second PHB hosts a PCIe-to-PCI
bridge, behind which the AHCI controller, the NIC and the USB
controller are plugged:

.. code-block:: bash

  $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 -accel tcg,thread=single \
        -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \
        \
        -device megasas,id=scsi0,bus=pcie.0,addr=0x0 \
        -drive file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
        -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \
        \
        -device pcie-pci-bridge,id=bridge1,bus=pcie.1,addr=0x0 \
        \
        -device ich9-ahci,id=sata0,bus=bridge1,addr=0x1 \
        -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \
        -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \
        -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=bridge1,addr=0x2 \
        -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \
        -device nec-usb-xhci,bus=bridge1,addr=0x7 \
        \
        -serial mon:stdio -nographic

VIRTIO devices can be used as well:

.. code-block:: bash

        -drive file=./fedora-ppc64le.qcow2,if=none,snapshot=on,id=drive0 \
        -device virtio-blk-pci,drive=drive0,id=blk0,bus=pcie.0 \
        \
        -netdev tap,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,id=netdev0 \
        -device virtio-net-pci,netdev=netdev0,id=net0,bus=pcie.1 \
        \
        -fsdev local,id=fsdev0,path=$HOME,security_model=passthrough \
        -device virtio-9p-pci,fsdev=fsdev0,mount_tag=host,bus=pcie.2

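The ``virtio-9p-pci`` device exports the host directory under the
``mount_tag`` value, here ``host``. Inside the guest, the share could
then be mounted along these lines. This is a sketch that assumes the
guest kernel has 9p over virtio support, and ``/mnt`` is an arbitrary
mount point:

.. code-block:: bash

  # run inside the guest; "host" must match mount_tag on the QEMU side
  $ mount -t 9p -o trans=virtio host /mnt
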
Multi sockets
~~~~~~~~~~~~~

The number of sockets is deduced from the number of CPUs and the
number of cores. ``-smp 2,cores=1`` will define a machine with 2
sockets of 1 core, whereas ``-smp 2,cores=2`` will define a machine
with 1 socket of 2 cores, and ``-smp 8,cores=2`` a machine with
4 sockets of 2 cores.

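The three cases above follow from an integer division of the CPU count
by the number of cores per socket. The small helper below reproduces
that arithmetic; ``pnv_sockets`` is a hypothetical name used for
illustration, not a QEMU tool, and it assumes one thread per core as in
the examples of this document:

.. code-block:: bash

  # Hypothetical helper: sockets = cpus / cores (one thread per core assumed)
  pnv_sockets() {
      echo $(( $1 / $2 ))
  }

  pnv_sockets 2 1   # -smp 2,cores=1 -> 2
  pnv_sockets 2 2   # -smp 2,cores=2 -> 1
  pnv_sockets 8 2   # -smp 8,cores=2 -> 4
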
BMC configuration
~~~~~~~~~~~~~~~~~

OpenPOWER systems negotiate the shutdown and reboot with their
BMC. The QEMU PowerNV machine embeds an IPMI BMC simulator using the
iBT interface and should offer the same power features.

If you want to define your own BMC, use ``-nodefaults`` and specify
one on the command line:

.. code-block:: bash

  -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10

The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__
and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__
define a Sensor Data Record repository and a Field Replaceable Unit
inventory for a palmetto BMC. They can be used to extend the QEMU BMC
simulator.

.. code-block:: bash

  -device ipmi-bmc-sim,sdrfile=./palmetto-SDR.bin,fruareasize=256,frudatafile=./palmetto-FRU.bin,id=bmc0 \
  -device isa-ipmi-bt,bmc=bmc0,irq=10

The PowerNV machine can also be run with an external IPMI BMC device
connected to a remote QEMU machine acting as BMC, using these options:

.. code-block:: bash

  -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
  -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
  -device isa-ipmi-bt,bmc=bmc0,irq=10 \
  -nodefaults

NVRAM
~~~~~

Use an MTD drive to add a PNOR to the machine and get an NVRAM:

.. code-block:: bash

  -drive file=./witherspoon.pnor,format=raw,if=mtd

Caveats
-------

* Multiple hardware threads are not supported (SMT=1). The pseries
  machine has the same limitation.
* The CPU can hang under intensive I/O. Use ``-append powersave=off``
  in that case.
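
``-append`` passes its argument on the kernel command line, so it is
used together with a kernel loaded through ``-kernel``. A minimal
sketch, reusing the kernel and initramfs file names from the boot
example in this document:

.. code-block:: bash

  # powersave=off is handed to the kernel given with -kernel
  $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \
        -accel tcg,thread=single \
        -kernel ./zImage.epapr \
        -initrd ./rootfs.cpio.xz \
        -append powersave=off \
        -nographic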