This patch implements MSI and MSI-X support for the pseries PCI host
bridge. To do this it adds:
* A "config_space_address to msi_table" map, since the MSI RTAS calls
take a PCI config space address as an identifier.
* A MSIX memory region to catch msi_notify()/msix_notiry() from
virtio-pci and pass them to the guest via qemu_irq_pulse().
* RTAS call "ibm,change-msi" which sets up MSI vectors for a
device. Note that this call may configure and return lesser number of
vectors than requested.
* RTAS call "ibm,query-interrupt-source-number" which translates MSI
vector to interrupt controller (XICS) IRQ number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: fix error case ndev < 0]
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a trace event in the pseries PCI specific set_irq() function to
assist in debugging.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: add trace.h include]
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries PCI code makes use of an internal find_dev() function which
locates a PCIDevice * given a (platform specific) bus ID and device
address. Internally this needs to first locate the host bridge on which
the device resides based on the bus ID. This patch exposes that host
bridge lookup as a separate function, which we will need later in the MSI
and VFIO code.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: drop trace.h inclusion]
Signed-off-by: Alexander Graf <agraf@suse.de>
The patch adds a simple helper which allocates a consecutive sequence
of IRQs calling spapr_allocate_irq for each and checks that allocated
IRQs go consequently.
The patch is required for upcoming support of MSI/MSIX on POWER.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the RTAS functions for handling PCI are registered from the
class init code for the PCI host bridge. That sort of makes sense
now, but will break in the future when vfio gives us multiple types of
host bridge for pseries (emulated and pass-through, at least). The
RTAS functions will be common across all host bridge types (and will
call out to different places internally depending on the type).
So, this patch moves the RTAS registration into its own function
called direct from the machine setup code.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently, the interfaces in the pseries machine code for assignment
and setup of interrupts pass around qemu_irq objects. That was done
in an attempt not to be too closely linked to the specific XICS
interrupt controller. However interactions with the device tree setup
made that attempt rather futile, and XICS is part of the PAPR spec
anyway, so this really just meant we had to carry both the qemu_irq
pointers and the XICS irq numbers around.
This mess will just get worse when we add upcoming PCI MSI support,
since that will require tracking a bunch more interrupt. Therefore,
this patch reworks the spapr code to just use XICS irq numbers
(roughly equivalent to GSIs on x86) and only retrieve the qemu_irq
pointers from the XICS code when we need them (a trivial lookup).
This is a reworked and generalized version of an earlier spapr_pci
specific patch from Alexey Kardashevskiy.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: fix checkpath warning]
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries machine prints several messages to stderr whenever it starts up
and another whenever the vm is reset. It's not normal for qemu machines to
do this though, so this patch removes them. We can put them back
conditional on a DEBUG symbol if we really need them in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch updates the SLOF version, introducing a number of fixes:
* add proper graphics support
* fix bugs with graphical terminal under grub2
* fix bugs in handling of 64-bit unit addresses
* fix VSCSI representation to be closer to PowerVM
* fix bugs which caused grub2 to crash
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
When selecting our VGA adapter, we want to:
* fail completely when we can't satisfy the user's request
* support -nographic where no VGA adapter should be spawned
This patch reworks the logic so we fulfill the two conditions above.
Signed-off-by: Alexander Graf <agraf@suse.de>
When compiling the xbzrle code on my ppc32 user space, I hit the following
gcc compiler warning (treated as an error):
cc1: warnings being treated as errors
savevm.c: In function ‘xbzrle_encode_buffer’:
savevm.c:2476: error: overflow in implicit constant conversion
Fix this by making the cast explicit, rather than implicit.
Signed-off-by: Alexander Graf <agraf@suse.de>
Also instanciate the USB keyboard and mouse when that option is used
(you can still use -device to create individual devices without all
the defaults)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
[agraf: remove USB bits]
Signed-off-by: Alexander Graf <agraf@suse.de>
Functions pci_vga_init() and pci_cirrus_vga_init() are declared
in pc.h. That prevents other platforms (e.g. sPAPR) to use them.
This patch is to create one new file vga-pci.h and move the
declarations to vga-pci.h, so that they can be shared by
all platforms. This patch also cleans up on all platforms.
Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This reverts commit 518c7fb44f. It breaks
new Linux guests with SMP, because IPIs get mapped to large vectors which
our MPIC emulation does not implement.
Conflicts:
hw/ppc/e500.c
Currently for powerpc, kvm_arch_handle_exit() always returns 1, meaning
that its caller - kvm_cpu_exec() - will always exit immediately afterwards
to the loop in qemu_kvm_cpu_thread_fn().
There's no need to do this. Once we've handled the hypercall there's no
reason we can't go straight around and KVM_RUN again, which is what ret = 0
will signal. The only exception might be for hypercalls which affect the
state of cpu_can_run(), however the only one that might do this is H_CEDE
and for kvm that is always handled in the kernel, not qemu.
Furtherm setting ret = 0 means that when exit_requested is set from a
hypercall, we will enter KVM_RUN once more with a signal which lets the
the kernel do its internal logic to complete the hypercall with out
actually executing any more guest code. This is important if our hypercall
also triggered a reset, which previously would re-initialize everything
without completing the hypercall. This caused the kernel to get confused
because it thought the guest was still in the middle of a hypercall when
it has actually been reset.
This patch therefore changes to ret = 0, which is both a bugfix and a small
optimization.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This gives the kernel a paravirtualized machine to target, without
requiring both sides to pretend to be targeting a specific board
that likely has little to do with the host in KVM scenarios. This
avoids the need to add new boards to QEMU, just to be able to
run KVM on new CPUs.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the only mpc8544ds-ism that is factored out is
toplevel compatible and model. In the future the generic e500
code is expected to become more generic.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
No functional changes -- machine is still outwardly mpc8544ds.
The references that are not changed contain mpc8544 hardware details that
need to be parameterized if/when a different e500 platform wants to
change them.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Rename the file (with no changes other than fixing up the header paths)
in preparation for refactoring into a generic e500 platform. Also move
it into the newly created ppc/ directory.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr_populate_pci_devices() populates the device tree only with bus
properties and has nothing to do with the devices on it as PCI BAR
allocation is done by the system firmware (SLOF).
New name - spapr_populate_pci_dt() - describes the functionality better.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PCIHostState struct already contains SysBusDevice so
the one in sPAPRPHBState has to go.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
* qemu-kvm/uq/master:
update-linux-headers.sh: Pull in asm-generic/kvm_para.h
kvmvapic: Disable if there is insufficient memory
kvm: i8254: Finish time conversion fix
kvm: i8254: Cache kernel clock offset in KVMPITState
This patch creates interrupt.c. The first user is a callback for hw/*
code to trigger an service interrupt for a given sccb value. Several
interrupt types for s390 are floating (can be delivered to all CPUs).
so this code does not belong to a specific CPU.
Other interrupts (like the virtio one) are also floating and can be
moved here later on.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Invalid sccb addresses will cause specification or addressing exception.
Lets add those checks. Furthermore, the good case (cc=0) was incorrect
for KVM, we did not set the CC at all. We now use return codes < 0
as program checks and return codes > 0 as condition code values.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that the QERR_ macros no longer contain a json dictionary,
the order of some parameters needs to be fixed for them to appear
correctly.
Signed-off-by: Alberto Garcia <agarcia@igalia.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
POSIX allows sendmsg() and recvmsg() to fail EMSGSIZE if passed a zero
msg.msg_iovlen (in particular the MacOS X implementation will do this).
Handle the case where iov_send_recv() is passed a zero byte count
explicitly, to avoid accidentally depending on the OS to treat zero
msg_iovlen as a no-op.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The comment about the return address from get_page_addr_code() was
well out of date as phys_ram_base has not existed for some time.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Avoid having an explicit list of directories in the 'clean'
target by using 'find' to remove all .o and .d files instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Commit 29cdb251 already added a comment that no unnecessary flushes to
disk will occur, this patch makes the code even get to the point of the
comment. This is mostly theoretical because in practice we only stack
one format on top of one protocol, the former implementing flush_to_os
and the latter only flush_to_disk. It starts to matter when drivers that
are not on top implement flush_to_os.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Fd sets are shared by all monitor connections. Fd sets are considered
to be in use while at least one monitor is connected. When the last
monitor disconnects, all fds that are members of an fd set with no
outstanding dup references are closed. This prevents any fd leakage
associated with a client disconnect prior to using a passed fd.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When qemu_open is passed a filename of the "/dev/fdset/nnn"
format (where nnn is the fdset ID), an fd with matching access
mode flags will be searched for within the specified monitor
fd set. If the fd is found, a dup of the fd will be returned
from qemu_open.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch converts all block layer close calls, that correspond
to qemu_open calls, to qemu_close.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch converts all block layer open calls to qemu_open.
Note that this adds the O_CLOEXEC flag to the changed open paths
when the O_CLOEXEC macro is defined.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds support that enables passing of file descriptors
to the QEMU monitor where they will be stored in specified file
descriptor sets.
A file descriptor set can be used by a client like libvirt to
store file descriptors for the same file. This allows the
client to open a file with different access modes (O_RDWR,
O_WRONLY, O_RDONLY) and add/remove the passed fds to/from an fd
set as needed. This will allow QEMU to (in a later patch in this
series) "open" and "reopen" the same file by dup()ing the fd in
the fd set that corresponds to the file, where the fd has the
matching access mode flag that QEMU requests.
The new QMP commands are:
add-fd: Add a file descriptor to an fd set
remove-fd: Remove a file descriptor from an fd set
query-fdsets: Return information describing all fd sets
Note: These commands are not compatible with the existing getfd
and closefd QMP commands.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set the close-on-exec flag for the file descriptor received
via SCM_RIGHTS.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add asm-generic/kvm_para.h to the set of non-architecture specific
KVM kernel headers we copy into QEMU. This header may be included
by an architecture's kvm_para.h header.
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We need at least 1M of RAM to map the option ROM. Otherwise, we will
corrupt host memory or even crash:
$ qemu-system-x86_64 -nodefaults --enable-kvm -vnc :0 -m 640k
Segmentation fault (core dumped)
Reported-and-tested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
0cdd3d1444 fixed reading back the counter load time from the kernel
while assuming the kernel would always update its load time on writing
the state. That is only true for channel 1, and so pit_get_channel_info
returned wrong output pin states for high counter values.
Fix this by applying the offset also on kvm_pit_put. Now we also need to
update the offset when we write the state while the VM is stopped as it
keeps on changing in that state.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
To prepare the final fix for clock calibration issues with the in-kernel
PIT, we want to cache the offset between vmclock and the clock used by
the in-kernel PIT. So far, we only need to update it when the VM state
changes between running and stopped because we only read the in-kernel
PIT state while the VM is running.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* origin/master:
linux-user: ARM: Ignore immediate value for svc in thumb mode
linux-user: Use init_guest_space when -R and -B are specified
linux-user: Factor out guest space probing into a function
flatload: fix bss clearing
linux-user: make host_to_target_cmsg support SO_TIMESTAMP cmsg_type
linux-user: make do_setsockopt support SOL_RAW ICMP_FILTER socket option
linux-user: pass sockaddr from host to target
x86: switch to AREG0 free mode
x86: avoid AREG0 in segmentation helpers
x86: avoid AREG0 for misc helpers
x86: use wrappers for memory access helpers
x86: avoid AREG0 for SMM helpers
x86: avoid AREG0 for SVM helpers
x86: avoid AREG0 for integer helpers
x86: avoid AREG0 for condition code helpers
x86: avoid AREG0 for FPU helpers
linux-user: Move target_to_host_errno_table[] setup out of ioctl loop
linux-user: Fix SNDCTL_DSP_MAP{IN, OUT}BUF ioctl definitions
linux-user: Fix incorrect TARGET_BLKBSZGET, TARGET_BLKBSZSET
* 'linux-user.next' of git://git.linaro.org/people/pmaydell/qemu-arm:
linux-user: ARM: Ignore immediate value for svc in thumb mode
linux-user: Use init_guest_space when -R and -B are specified
linux-user: Factor out guest space probing into a function
flatload: fix bss clearing
linux-user: make host_to_target_cmsg support SO_TIMESTAMP cmsg_type
linux-user: make do_setsockopt support SOL_RAW ICMP_FILTER socket option
linux-user: pass sockaddr from host to target
linux-user: Move target_to_host_errno_table[] setup out of ioctl loop
linux-user: Fix SNDCTL_DSP_MAP{IN, OUT}BUF ioctl definitions
linux-user: Fix incorrect TARGET_BLKBSZGET, TARGET_BLKBSZSET