In addition to the performance monitor registers found on nearly all
6xx chips, the POWER7 has two additional counters (PMC5 & PMC6) and an
extra control register (MMCRA). This patch adds stub support for them to
qemu - the registers won't do anything, but with this change won't cause
illegal instruction traps accessing them. They're also registered with
their ONE_REG ids, so their value will be kept in sync with KVM where
appropriate.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR requires that the device tree's CPU nodes have several properties
with information about the L1 cache. We already create two of these
properties, but with incorrect names - "[id]cache-block-size" instead
of "[id]-cache-block-size" (note the extra hyphen).
We were also missing some of the required cache properties. This
patch adds the [id]-cache-line-size properties (which have the same
values as the block size properties in all current cases). We also
add the [id]-cache-size properties.
Adding the cache sizes requires some extra infrastructure in the
general target-ppc code to (optionally) set the cache sizes for
various CPUs. The CPU family descriptions in translate_init.c can set
these sizes - this patch adds correct information for POWER7, I'm
leaving other CPU types to people who have a physical example to
verify against. In addition, for -cpu host we take the values
advertised by the host (if available) and use those to override the
information based on PVR.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
For the pseries machine, we need to advertise to the guest the size of its
RMA - that is the amount of memory it can access with the MMU off. For HV
KVM, this is constrained by the hardware limitations on the virtual RMA of
one hash PTE per PTE group in the hash page table. We already had code to
calculate this, but it was assuming the VRMA page size was the same as the
(host) backing page size for guest RAM.
In the case of a host kernel configured for 64k base page size, but running
on hardware (or firmware) which only allows 4k pages, the hose will do all
its allocations with a 64k page size, but still use 4k hardware pages for
actual mappings. Usually that's transparent to things running under the
host, but in the case of the maximum VRMA size it's not.
This patch refines the RMA size calculation to instead use the largest
available hardware page size (as reported by the SMMU_INFO call) which is
less than or equal to the backing page size. This now gives the correct
RMA size in all cases I've tested.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When using profiling, we rely on profile_getclock() being available
at our disposal. Somehow that function got moved from an indirect
include we used to have in translate-init.c, so that we were now
left not properly compiling anymore.
Add an explicit include to timer.h which defines profile_getclock,
so that we can compile again.
Signed-off-by: Alexander Graf <agraf@suse.de>
On -M mac99, we can run 970 CPUs. However, these CPUs define the initial
instruction pointer they start execution at as part of their bootup protocol,
so effectively it's up to the board to decide where they start.
This went unnoticed, because they used to boot at the same location our flash
was mapped to, but due to the recent reset changes our 970 CPUs want to reset
to 0x100 now, which is always a 0 instruction.
Set the initial IP to something reasonable for -M mac99.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Fabien Chouteau <chouteau@adacore.com>
Enable the KVM emulated watchdog if KVM supports (use the
capability enablement in watchdog handler). Also watchdog exit
(KVM_EXIT_WATCHDOG) handling is added.
Watchdog state machine is cleared whenever VM state changes to running.
This is to handle the cases like return from debug halt etc.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: rebase to current code base, fix non-kvm cases]
Signed-off-by: Alexander Graf <agraf@suse.de>
Broken in b5a73f8d8a, the carry itself was
fixed in 79482e5ab3. But we still need to
produce the full 64-bit addition.
Simplify the conditions at the top of the functions for when we need a
new temporary. Only plain addition is important enough to warrent avoiding
the temporary, and the extra tcg move op that would come with it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
According to the different user's manuals, the vector offset for system
reset (both /HRESET and /SRESET) is 0x00100.
This patch may break support of some executables, as the power-on start
address may change. For a specific board, if the power-on start address
is different than HRESET vector (i.e. 0x00000100 or 0xfff00100), this
should be fixed in board's initialization code.
Signed-off-by: Fabien Chouteau <chouteau@adacore.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The overflow computation of nego and subf*o instructions has been broken
in commit ffe30937. Contrary to other targets, the instruction is subtract
from an not subtract on PowerPC.
This patch fixes the issue by using the correct argument in the xor
computation. Thanks to Peter Maydell for the hint.
With this change the PPC emulation passes the Gwenole Beauchesne
testsuite again.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
This value is not needed if we use correctly the MSR[IP] bit.
excp_prefix is always 0x00000000, except when the MSR[IP] bit is
implemented and set to 1, in that case excp_prefix is 0xfff00000.
The handling of MSR[IP] was already implemented but not used at reset
because the value of env->msr was changed "manually".
The patch uses the function hreg_store_msr() to set env->msr, this
ensures a good handling of MSR[IP] at reset, and therefore a good value
for excp_prefix.
Signed-off-by: Fabien Chouteau <chouteau@adacore.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Older KVM versions don't support EPR which breaks guests when we announce
MPIC variants that support EPR.
Catch that case and expose only MPIC version 2.0 which tells the guest that
we don't support the EPR capability yet.
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: Add comment, route cap check through kvm_ppc.c]
Signed-off-by: Alexander Graf <agraf@suse.de>
ISEL is a Power ISA 2.06 instruction and thus is available on POWER7.
Given this is trapped and emulated by the Linux kernel, I guess it went
unnoticed.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>
I don't see any reason why these properties are missing.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Note: Need to apply virtio-rng-refactoring first!
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Make use of new kvm_s390_get_registers_partial() for kvm_handle_css_inst() and
handle_hypercall() since they only need registers from the partial set and they
are called quite frequently.
Signed-off-by: Jason J. Herne <jjherne@us.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We want to avoid expensive register synchronization IOCTL's on the hot path so
a new kvm_s390_get_registers_partial() is introduced as a compliment to
kvm_arch_get_registers(). The new function is called on the hot path, and
kvm_arch_get_registers() is called when we need the complete runtime register
state.
kvm_arch_put_registers() is updated to only sync the partial runtime set when
we've only dirtied the partial runtime set. This is to avoid sending bad data
back to KVM if we've only partially synced the runtime register set.
Signed-off-by: Jason J. Herne <jjherne@us.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Instead of manually parsing the boot_list as character stream,
we can access the nth boot device, specified by the position in the
boot order.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Since we now have working firmware for s390-ccw in the tree, we can
default to it on our s390-ccw machine, rendering it more useful.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have all the source code ready, add a compiled blob into
the QEMU source tree, so that people without access to an s390 compiler
can run the s390-ccw firmware.
Signed-off-by: Alexander Graf <agraf@suse.de>
Ask the host about the configuration instead of guessing it.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Try to handle at least some of the errors that may happen.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
- Use tpi + tsch to get interrupts.
- Return an error if the irb indicates problems.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
stsch is the canonical way to detect devices. As a bonus, we can
abort the loop if we get cc 3, and we need to check only the valid
devices (dnv set).
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Lets fix this gcc warning:
virtio.c: In function ‘vring_send_buf’:
virtio.c:125:35: error: operation on ‘vr->next_idx’ may be undefined
[-Werror=sequence-point]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We have to call strip with s390-ccw.elf as input and
s390-ccw.img as output
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Lets build the s390-ccw rom if on s390. Also fix the separate build
folder case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
dont waste cpu power on an error condition. Lets stop the guest
with a disabled wait.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds a makefile, so we can build our ccw firmware. Also
add the resulting binaries to .gitignore, so that nobody is annoyed
they might be in the tree.
Signed-off-by: Alexander Graf <agraf@suse.de>
On s390, there is an architected boot map format that we can read to
boot a certain entry off the disk. Implement a simple reader for this
that always boots the first (default) entry.
Signed-off-by: Alexander Graf <agraf@suse.de>
Like all great programs, we have to call between different functions in
different object files. And all of them need a common ground of defines.
Provide a file that provides these defines.
Signed-off-by: Alexander Graf <agraf@suse.de>
In order to boot, we need to be able to access a virtio-blk device through
the CCW bus. Implement support for this.
Signed-off-by: Alexander Graf <agraf@suse.de>
In order to communicate with the user, we need an I/O mechanism that he
can read. Implement SCLP ASCII support, which happens to be the default
in the s390 ccw machine.
This file is missing read support for now. It can only print messages.
Signed-off-by: Alexander Graf <agraf@suse.de>
This C file is the main driving piece of the s390 ccw firmware. It
provides a search for a workable block device, sets it as the default
to boot off of and boots from it.
Signed-off-by: Alexander Graf <agraf@suse.de>
We want to write most of our code in C, so add a small assembly
stub that jumps straight into C code for us to continue booting.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a virtio-s390 and a virtio-ccw machine in QEMU. Both use vastly
different ways to do I/O. Having the same firmware blob for both doesn't
really make any sense.
Instead, let's parametrize the firmware file name, so that we can have
different blobs for different machines.
Signed-off-by: Alexander Graf <agraf@suse.de>
Our firmware blob is always a raw file that we load at a fixed address today.
Support loading an ELF blob instead that we can map high up in memory.
This way we don't have to be so conscious about size constraints.
Signed-off-by: Alexander Graf <agraf@suse.de>
We can have different load addresses for different blobs we boot with.
Make the reset IP dynamic, so that we can handle things more flexibly.
Signed-off-by: Alexander Graf <agraf@suse.de>
# By Liu Yuan (4) and others
# Via Stefan Hajnoczi
* stefanha/block:
sheepdog: fix loadvm operation
sheepdog: resend write requests when SD_RES_READONLY is received
sheepdog: add helper function to reload inode
sheepdog: add SD_RES_READONLY result code
sheepdog: cleanup find_vdi_name
rbd: Fix use after free in rbd_open()
block: Disable driver-specific options for 1.5
sheepdog: implement .bdrv_co_is_allocated()
sheepdog: use BDRV_SECTOR_SIZE
sheepdog: add discard/trim support for sheepdog
block/ssh: Require libssh2 >= 1.2.8.
Message-id: 1366976682-10251-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
# By Ed Maste (2) and others
# Via Stefan Hajnoczi
* stefanha/trivial-patches:
bsd-user: Track change in FreeBSD SYSCTL(9) types
virtio: Fix compilation without CONFIG_VHOST_SCSI
qemu-doc: Option -ignore-environment removed.
s390x: use CONFIG_INT128 to detect __uint128_t
linux-user: fix compile error due to stray colon at end of #ifdef line
Message-id: 1366975563-16216-1-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently the 'loadvm' opertaion works as following:
1. switch to the snapshot
2. mark current working VDI as a snapshot
3. rely on sd_create_branch to create a new working VDI based on the snapshot
This works not the same as other format as QCOW2. For e.g,
qemu > savevm # get a live snapshot snap1
qemu > savevm # snap2
qemu > loadvm 1 # This will steally create snap3 of the working VDI
Which will result in following snapshot chain:
base <-- snap1 <-- snap2 <-- snap3
^
|
working VDI
snap3 was unnecessarily created and might be annoying users.
This patch discard the unnecessary 'snap3' creation. and implement
rollback(loadvm) operation to the specified snapshot by
1. switch to the snapshot
2. delete working VDI
3. rely on sd_create_branch to create a new working VDI based on the snapshot
The snapshot chain for above example will be:
base <-- snap1 <-- snap2
^
|
working VDI
Cc: qemu-devel@nongnu.org
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When a snapshot is taken from out side of qemu (e.g. qemu-img
snapshot), write requests to the current vdi return SD_RES_READONLY.
In this case, the sheepdog block driver needs to update the current
inode to the latest one and resend the write requests.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a helper function to update the current inode state with the
specified vdi object.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Sheepdog returns SD_RES_READONLY when qemu sends write requests to the
snapshot vdi. This adds the result code and makes sd_strerror() print
its error reason.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This makes 'filename' and 'tag' constant variables, and renames
'for_snapshot' to 'lock' to clear how it works.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit a9ccedc3 frees the QemuOpts for the driver-specific options
immediately, even though it still needs the filename string that is
contained there. This doesn't work. Move the deletion of the QemuOpts to
the end of the function where its content isn't needed any more.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We don't want to commit to the API yet before everything is worked out.
Disable it for the 1.5 release. This commit is meant to be reverted
after the 1.5 release.
The disabling of the driver-specific options is achieved by applying the
old checks while parsing the command line.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>