Shared-processor partitions are those where a CPU is time-sliced between
partitions, rather than being permanently dedicated to a single
partition. qemu emulated partitions, since they are just scheduled with
the qemu user process, behave mostly like shared processor partitions.
In order to better support shared processor partitions (splpar), PAPR
defines the "VPA" (Virtual Processor Area), a shared memory communication
channel between the hypervisor and partitions. There are also two
additional shared memory communication areas for specialized purposes
associated with the VPA.
A VPA is not essential for operating an splpar, though it can be necessary
for obtaining accurate performance measurements in the presence of
runtime partition switching.
Most importantly, however, the VPA is a prerequisite for PAPR's H_CEDE,
hypercall, which allows a partition OS to give up it's shared processor
timeslices to other partitions when idle.
This patch implements the VPA and H_CEDE hypercalls in qemu. We don't
implement any of the more advanced statistics which can be communicated
through the VPA. However, this is enough to make normal pSeries kernels
do an effective power-save idle on an emulated pSeries, significantly
reducing the host load of a qemu emulated pSeries running an idle guest OS.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Usually, PAPR virtual IO devices use a virtual IOMMU mechanism, TCEs,
to mediate all DMA transfers. While this is necessary for some sorts of
operation, it can be complex to program and slow for others.
This patch implements a mechanism for bypassing TCE translation, treating
"IO" addresses as plain (guest) physical memory addresses. This has two
main uses:
* Simple, but 64-bit aware programs like firmwares can use the VIO devices
without the complexity of TCE setup.
* The guest OS can optionally use the TCE bypass to improve performance in
suitable situations.
The mechanism used is a per-device flag which disables TCE translation.
The flag is toggled with some (hypervisor-implemented) RTAS methods.
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the infrastructure and hypercalls necessary for
the PAPR specified Virtual SCSI interface. This is the normal method
for providing (virtual) disks to PAPR partitions.
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the infrastructure and hypercalls necessary for the
PAPR specified CRQ (Command Request Queue) mechanism. This general
request queueing system is used by many of the PAPR virtual IO devices,
including the virtual scsi adapter.
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the PAPR specified Inter Virtual Machine Logical
LAN; that is the virtual hardware used by the Linux ibmveth driver.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch implements the necessary infrastructure and hypercalls for
sPAPR's TCE (Translation Control Entry) IOMMU mechanism. This is necessary
for all virtual IO devices which do DMA (i.e. nearly all of them).
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have implemented the PAPR "xics" virtualized interrupt
controller, we can add interrupts in PAPR VIO devices. This patch adds
interrupt support to the PAPR virtual tty/console device.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds infrastructure to support interrupts from PAPR virtual IO
devices. This includes correctly advertising those interrupts in the
device tree, and implementing the H_VIO_SIGNAL hypercall, used to
enable and disable individual device interrupts.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR defines an interrupt control architecture which is logically divided
into ICS (Interrupt Control Presentation, each unit is responsible for
presenting interrupts to a particular "interrupt server", i.e. CPU) and
ICS (Interrupt Control Source, each unit responsible for one or more
hardware interrupts as numbered globally across the system). All PAPR
virtual IO devices expect to deliver interrupts via this mechanism. In
Linux, this interrupt controller system is handled by the "xics" driver.
On pSeries systems, access to the interrupt controller is virtualized via
hypercalls and RTAS methods. However, the virtualized interface is very
similar to the underlying interrupt controller hardware, and similar PICs
exist un-virtualized in some other systems.
This patch implements both the ICP and ICS sides of the PAPR interrupt
controller. For now, only the hypercall virtualized interface is provided,
however it would be relatively straightforward to graft an emulated
register interface onto the underlying interrupt logic if we want to add
a machine with a hardware ICS/ICP system in the future.
There are some limitations in this implementation: it is assumed for now
that only one instance of the ICS exists, although a full xics system can
have several, each responsible for a different group of hardware irqs.
ICP/ICS can handle both level-sensitve (LSI) and message signalled (MSI)
interrupt inputs. For now, this implementation supports only MSI
interrupts, since that is used by PAPR virtual IO devices.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds several small utility hypercalls and RTAS methods to
the pSeries platform emulation. Specifically:
* 'display-character' rtas call
This just prints a character to the console, it's occasionally used
for early debug of the OS. The support includes a hack to make this
RTAS call respond on the normal token value present on real hardware,
since some early debugging tools just assume this value without
checking the device tree.
* 'get-time-of-day' rtas call
This one just takes the host real time, converts to the PAPR described
format and returns it to the guest.
* 'power-off' rtas call
This one shuts down the emulated system.
* H_DABR hypercall
On pSeries, the DABR debug register is usually a hypervisor resource
and virtualized through this hypercall. If the hypercall is not
present, Linux will under some circumstances attempt to manipulate the
DABR directly which will fail on this emulated machine.
This stub implementation is enough to stop that behaviour, although it
doesn't actually implement the requested DABR operations as yet.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On pSeries machines, operating systems can instantiate "RTAS" (Run-Time
Abstraction Services), a runtime component of the firmware which implements
a number of low-level, infrequently used operations. On logical partitions
under a hypervisor, many of the RTAS functions require hypervisor
privilege. For simplicity, therefore, hypervisor systems typically
implement the in-partition RTAS as just a tiny wrapper around a hypercall
which actually implements the various RTAS functions.
This patch implements such a hypercall based RTAS for our emulated pSeries
machine. A tiny in-partition "firmware" calls a new hypercall, which
looks up available RTAS services in a table.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On pSeries logical partitions, excepting the old POWER4-style full system
partitions, the guest does not have direct access to the hardware page
table. Instead, the pagetable exists in hypervisor memory, and the guest
must manipulate it with hypercalls.
However, our current pSeries emulation more closely resembles the old
style where the guest must set up and handle the pagetables itself. This
patch converts it to act like a modern partition.
This involves two things: first, the hash translation path is modified to
permit the has table to be stored externally to the emulated machine's
RAM. The pSeries machine init code configures the CPUs to use this mode.
Secondly, we emulate the PAPR hypercalls for manipulating the external
hashed page table.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This extends the "pseries" (PAPR) machine to include a virtual IO bus
supporting the PAPR defined hypercall based virtual IO mechanisms.
So far only one VIO device is provided, the vty / vterm, providing
a full console (polled only, for now).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds a "pseries" machine to qemu. This aims to emulate a
logical partition on an IBM pSeries machine, compliant to the
"PowerPC Architecture Platform Requirements" (PAPR) document.
This initial version is quite limited, it implements a basic machine
and PAPR hypercall emulation. So far only one hypercall is present -
H_PUT_TERM_CHAR - so that a (write-only) console is available.
Multiple CPUs are permitted, with SMP entry handled kexec() style.
The machine so far more resembles an old POWER4 style "full system
partition" rather than a modern LPAR, in that the guest manages the
page tables directly, rather than via hypercalls.
The machine requires qemu to be configured with --enable-fdt. The
machine can (so far) only be booted with -kernel - i.e. no partition
firmware is provided.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds emulation support for the recent POWER7 cpu to qemu. It's far
from perfect - it's missing a number of POWER7 features so far, including
any support for VSX or decimal floating point instructions. However, it's
close enough to boot a kernel with the POWER7 PVR.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
vhost was passing a physical address to cpu_physical_memory_set_dirty,
which is wrong: we need to translate to ram address first.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Note: this lead to crashes during migration, so the patch
is needed on the stable branch too.
Reduce spurious packet drops on RX ring empty
by verifying that we have at least 1 buffer
ahead of the time.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit c81131db15
detects old guests by comparing virtio and
PCI status. It attempts to do this on load,
as well, but load_config callback in a binding
is invoked too early and so the virtio status
isn't set yet.
We could add yet another callback to the
binding, to invoke after load, but it
seems easier to reuse the existing vmstate
callback.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
(slot, fn) pair is somewhat confusing because of ARI.
So use devfn for pci_find_device() instead of (slot, fn).
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce symbol PCI_SLOT_MAX for the # of slots,
and replace the magic, 256.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add support to the emulated hardware to insert vlan tags in packets
going from the guest to the network.
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Cc: Igor V. Kovalenko <igor.v.kovalenko@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add support to the emulated hardware to extract vlan tags in packets
going from the network to the guest.
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Cc: Igor V. Kovalenko <igor.v.kovalenko@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Blue Swirl <blauwirbel@gmail.com>
--
AFAIK, extraction is optional to get vlans working. The driver
requests rx detagging but should not assume that it was done. Under
Linux, the mac layer will catch the vlan ethertype. I only added this
part for completeness (to emulate the hardware more truthfully...)
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
clean out ifdef's around ethernet checksum calculation
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Acked-by: Igor V. Kovalenko <igor.v.kovalenko@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Latest refactorings left vmmouse nonfunctional behind. Fix it by adding
the required device initialization.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The PCI/PCI-X Family of Gigabit Ethernet Controllers Software
Developer’s Manual states the following about the POPTS field:
Provides a number of options which control the handling of this
packet. This field is ignored except on the first data descriptor of
a packet.
The current implementation always loads the field and its checksum
offload flags. This patch uses only the first descriptor's POPTS field
in order to comply with the specification.
When Solaris sends multi-descriptor packets it fills in POPTS for the
first descriptor only. Therefore this patch is necessary in order to
perform checksum offload correctly for multi-descriptor packets.
Reported-by: Daniel Pecka <dpecka@techniservit.cz>
Reported-by: Gabriele A. Trombetti <gabriele.trombetti@itb.cnr.it>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* 'for-anthony' of git://github.com/bonzini/qemu:
remove qemu_get_clock
add a generic scaling mechanism for timers
change all other clock references to use nanosecond resolution accessors
change all rt_clock references to use millisecond resolution accessors
add more helper functions with explicit milli/nanosecond resolution
* 'for-anthony' of git://repo.or.cz/qemu/kevin:
Add qcow2 documentation
hw/xen_disk: aio_inflight not released in handling ioreq when nr_segments==0
Improve error handling in do_snapshot_blkdev()
Fix ATA SMART and CHECK POWER MODE
Don't allow multiwrites against a block device without underlying medium
tools: Use real async.c instead of stubs
Add error message for loading snapshot without VM state
block/qcow: Don't ignore immediate read/write and other failures
block/vdi: Don't ignore immediate read/write failures
Add support for the Versatile Express SYS_CFG registers, which provide
a generic means of reading or writing configuration information from
various parts of the board. We only implement shutdown and reset.
Also make the RESETCTL register RAZ/WI on Versatile Express rather
than reset the board. Other system registers are generally the same
as Versatile and Realview.
This includes a VMState version number bump for arm_sysctl,
since we have new register state to preserve. It also adds
sys_mci to the VMState while we're bumping the version number
(an accidental omission from commit b50ff6f5).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Prevent:
-chardev socket,path=/tmp/foo,server,nowait,id=c0 \
-device virtserialport,chardev=c0,id=vs0 \
-device virtserialport,chardev=c0,id=vs1
Reported-by: Mike Cao <bcao@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
After a hot-unplug operation, the previous behaviour was to close the
chardev. That meant the chardev couldn't be re-used. Also, since
chardev hot-plug isn't possible so far, this means virtio-console
hot-plug isn't feasible as well.
With this change, the chardev is kept around. A new virtio-console
channel can then be hot-plugged with the same chardev and things will
continue to work.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
After a port unplug operation, the port->info->have_data() pointer was
set to NULL. The problem is, the ->info struct is shared by all ports,
effectively disabling writes to other ports.
Reported-by: juzhang <juzhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
There's no code change, just re-arrangement to simplify the function
after recent modifications.
Reported-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Enable ioeventfd for virtio-serial devices by default. Commit
25db9ebe15 lists the benefits of using
ioeventfd.
Copying a file from guest to host over a virtio-serial channel didn't
show much difference in time or io_exit rate.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Port 0 is reserved for virtconsole devices for backward compatibility
with the old -virtioconsole (from qemu 0.12) device type.
libvirt prior to commit 8e28c5d40200b4c5d483bd585d237b9d870372e5 used
port 0 for generic ports. libvirt will no longer do that, but disallow
instantiating generic ports at id 0 from qemu as well.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Instead of using a single variable to pass to the virtio_serial_init
function, use a struct so that expanding the number of variables to be
passed on later is easier.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
This was done with:
sed -i 's/qemu_get_clock\>/qemu_get_clock_ns/' \
$(git grep -l 'qemu_get_clock\>' )
sed -i 's/qemu_new_timer\>/qemu_new_timer_ns/' \
$(git grep -l 'qemu_new_timer\>' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
There was exactly one false positive in qemu_run_timers:
- current_time = qemu_get_clock (clock);
+ current_time = qemu_get_clock_ns (clock);
which is of course not in this patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was done with:
sed -i '/get_clock\>.*rt_clock/s/get_clock\>/get_clock_ms/' \
$(git grep -l 'get_clock\>.*rt_clock' )
sed -i '/new_timer\>.*rt_clock/s/new_timer\>/new_timer_ms/' \
$(git grep -l 'new_timer\>.*rt_clock' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove a write-only variable, spotted by GCC 4.6.0:
/src/qemu/hw/petalogix_ml605_mmu.c: In function 'petalogix_ml605_init':
/src/qemu/hw/petalogix_ml605_mmu.c:153:11: error: variable 'serial' set but not used [-Werror=unused-but-set-variable]
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
qdev conversion broke migration as the previous version used vmstate
instance IDs derived from the iobase. Fix it by registering a legacy
alias.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Add the first Microblaze little endian platform.
Platform uses uart16550, axi ethernet, timer, intc.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@petalogix.com>
In hw/xen_disk.c, async writing ioreq is leaked when
ioreq->req.nr_segments==0, because `aio_inflight` flag is not released
properly (skipped by misplaced "break").
Signed-off-by: Feiran Zheng <famcool@gmail.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch fixes two things:
1) CHECK POWER MODE
The error return value wasn't always zero, so it would show up as
offline. Error is now explicitly set to zero.
2) SMART
The smart values that were returned were invalid and tools like skdump
would not recognize that the smart data was actually valid and would
dump weird output. The data has been fixed up and raw value support
was added. Tools like skdump and palimpsest work as expected.
Signed-off-by: Brian Wheeler <bdwheele@indiana.edu>
Acked-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This ensures env->halt_cond is broadcast, and the loop in
qemu_tcg_wait_io_event and qemu_kvm_wait_io_event is exited
naturally rather than through a timeout.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>