cpu.c contains the code that will check if all requested CPU features
are available, so the filtering of KVM features must be there, so we can
implement "check" and "enforce" properly.
The only point where kvm_arch_init_vcpu() is called on i386 is:
- cpu_x86_init()
- x86_cpu_realize() (after cpu_x86_register() is called)
- qemu_init_vcpu()
- qemu_kvm_start_vcpu()
- qemu_kvm_thread_fn() (on a new thread)
- kvm_init_vcpu()
- kvm_arch_init_vcpu()
With this patch, the filtering will be done earlier, at:
- cpu_x86_init()
- cpu_x86_register() (before x86_cpu_realize() is called)
Also, the KVM CPUID filtering will now be done at the same place where
the TCG CPUID feature filtering is done. Later, the code can be changed
to use the same filtering code for the "check" and "enforce" modes, as
now the cpu.c code knows exactly which CPU features are going to be
exposed to the guest (and much earlier).
One thing I was worrying about when doing this is that
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(), and
maybe the 'kvm_kernel_irqchip' global variable wasn't initialized yet at
CPU creation time. But kvm_kernel_irqchip is initialized during
kvm_init(), that is called very early (much earlier than the machine
init function), and kvm_init() is already a requirement to run the
GET_SUPPORTED_CPUID ioctl() (as kvm_init() initializes the kvm_state
global variable).
Side note: it would be nice to keep KVM-specific code inside kvm.c. The
problem is that properly implementing -cpu check/enforce code (that's
inside cpu.c) depends directly on the feature bit filtering done using
kvm_arch_get_supported_cpuid(). Currently -cpu check/enforce is broken
because it simply uses the host CPU feature bits instead of
GET_SUPPORTED_CPUID, and we need to fix that.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This way all the filtering by GET_SUPPORTED_CPUID is being done at the
same place in the code.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of masking the KVM feature bits very late (while building the
KVM_SET_CPUID2 data), mask it out on env->cpuid_kvm_features, at the
same point where the other feature words are masked out.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This is necessary so that x2apic is not improperly enabled when the
in-kernel irqchip is disabled.
This won't generate a warning with "-cpu ...,check" because the current
check/enforce code is broken (it checks the host CPU data directly,
instead of using kvm_arch_get_supported_cpuid()), but it will be
eventually fixed to properly report the missing x2apic flag.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This moves the CPUID_EXT_TSC_DEADLINE_TIMER CPUID flag hacking from
kvm_arch_init_vcpu() to kvm_arch_get_supported_cpuid().
Full git grep for kvm_arch_get_supported_cpuid:
kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
target-i386/cpu.c: x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
target-i386/cpu.c: *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
target-i386/cpu.c: *ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
target-i386/cpu.c: *ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
target-i386/cpu.c: *edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
target-i386/cpu.c: *eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
target-i386/cpu.c: *ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
target-i386/cpu.c: *ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX);
target-i386/cpu.c: *edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX);
target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
target-i386/kvm.c: cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
target-i386/kvm.c: env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
* target-i386/kvm.c: env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
target-i386/kvm.c: env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
target-i386/kvm.c: env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
target-i386/kvm.c: env->cpuid_svm_features &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
target-i386/kvm.c: kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
target-i386/kvm.c: kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_TSC_DEADLINE_TIMER, so we
can simply make kvm_arch_get_supported_cpuid() set it, to let the rest
of the code know the flag can be safely set by QEMU.
One thing I was worrying about when doing this is that now
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(). But
the 'kvm_kernel_irqchip' global variable is initialized during
kvm_init(), that is called very early, and kvm_init() is already a
requirement to run the GET_SUPPORTED_CPUID ioctl() (as kvm_init() is the
function that initializes the 'kvm_state' global variable).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Additional fixups will be added, and making them a single 'if/else if'
chain makes it clearer than two nested switch statements.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The reg switch will be moved to a separate function, so store the entry
pointer in a variable.
No behavior change, just code movement.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of a function-specific has_kvm_features variable, simply use a
"found" variable that will be checked in case we have to use the legacy
get_para_features() interface.
No behavior change, just code cleanup.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The for loop will become a separate function, so clean it up so it can
become independent from the bit hacking for R_EDX.
No behavior change[1], just code movement.
[1] Well, only if the kernel returned CPUID leafs 1 or 0x80000001 as
unsupported, but there's no kernel version that does that.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Add missing stubs to win32 to fix link failure.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
With i386-linux-user target on x86_64 host, this does not introduce any new test
failures.
Signed-off-by: Catalin Patulea <catalinp@google.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
gcc will silently accept unrecognized -Wno-wombat warning suppression
options (it only mentions them if it has to print a compiler warning
for some other reason). Since we already run a check for whether gcc
recognizes the warning options we use, we can easily make this use
the positive sense of the option when checking for support for the
suppression option. This doesn't have any effect except that it avoids
gcc emitting extra messages about unrecognized command line options
when it is printing other warning messages.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Igor Mitsyanko <i.mitsyanko@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
There is some read-after-write error within the OP=2 insns which
prevents setting cpu_dst to the real output register. Until this
is found and fixed, always write to a temporary first.
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* 'qspi.2' of git://developer.petalogix.com/public/qemu:
xilinx_zynq: added QSPI controller
xilinx_spips: Generalised to model QSPI
m25p80: Support for Quad SPI
* 's390-for-upstream' of git://repo.or.cz/qemu/agraf:
s390: sclp ascii console support
s390: sclp signal quiesce support
s390: sclp event support
s390: sclp base support
s390: use sync regs for register transfer
s390/kvm_stat: correct sys_perf_event_open syscall number
s390x: fix -initrd in virtio machine
MIPS32 and later instruction sets have a multiplication instruction
directly operating on GPRs. It only produces a 32-bit result but
it is exactly what is needed by QEMU.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The memory core drops regions that are hidden by another region (for example,
during BAR sizing), but it doesn't do so correctly if the lower address of the
existing range is below the lower address of the new range.
Example (qemu-system-mips -M malta -kernel vmlinux-2.6.32-5-4kc-malta
-append "console=ttyS0" -nographic -vga cirrus):
Existing range: 10000000-107fffff
New range: 100a0000-100bffff
Correct behaviour: drop new range
Incorrect behaviour: add new range
Fix by taking this case into account (previously we only considered
equal lower boundaries).
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This includes infrastructure patches that don't do much by themselves
but should help vfio and q35 make progress.
Also included is rework of virtio-net to use iovec APIs
for vector access - helpful to make it more secure
and in preparation for a new feature that will allow
arbitrary s/g layout for guests.
Also included is a pci bridge bugfix by Avi.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQjrGtAAoJECgfDbjSjVRpoRUH/iW2wYfnPtgjIF2gSBmdlmLK
m5uLSuayHdiNXfjXeNJRySQDu4HZU3K9fDnU+HIMUUmvx/jgaMIbSQZ1gayqQhf8
T1WH21wkkSTEnW0y9t+C3NtRzdK6AbQ1Jy/TqgP++tjT0w54QkoGKEnX3mvvNoDv
xdT+dYcpzMFLZ2KaimP73EwILMMR45iRTPhrjSN2wlwCPPnNcg1Jt0Atb6pYveja
1/46ZY+VxNgRtVSNhnV8xeyyJgwPOrUReKlrqD67YQAeMbzqrel9XDBJuZ3kScWt
vv2AQ0Fe8GMkMkbNaQ0mFh6pXQ2arTAYb8sb6PColCLOn5yNqFz95YGebG36gxk=
=HP23
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
virtio,pci infrastructure
This includes infrastructure patches that don't do much by themselves
but should help vfio and q35 make progress.
Also included is rework of virtio-net to use iovec APIs
for vector access - helpful to make it more secure
and in preparation for a new feature that will allow
arbitrary s/g layout for guests.
Also included is a pci bridge bugfix by Avi.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* mst/tags/for_anthony: (25 commits)
pci: avoid destroying bridge address space windows in a transaction
virtio-net: enable mrg buf header in tap on linux
virtio-net: test peer header support at init time
virtio-net: minor code simplification
virtio-net: simplify rx code
virtio-net: switch tx to safe iov functions
virtio-net: first s/g is always at start of buf
virtio-net: refactor receive_hdr
virtio-net: use safe iov operations for rx
virtio-net: avoid sg copy
iov: add iov_cpy
virtio-net: track host/guest header length
pcie: Convert PCIExpressHost to use the QOM.
pcie: pass pcie window size to pcie_host_mmcfg_update()
pci: Add class 0xc05 as 'SMBus'
pci: introduce pci_swizzle_map_irq_fn() for standardized interrupt pin swizzle
pci_ids: add intel 82801BA pci-to-pci bridge id
pci: pci capability must be in PCI space
pci: make each capability DWORD aligned
qemu: enable PV EOI for qemu 1.3
...
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This code adds console support by implementing SCLP's ASCII Console
Data event. This is the same console as LPARs ASCII console or z/VMs
sysascii.
The console can be specified manually with something like
-chardev stdio,id=charconsole0 -device sclpconsole,chardev=charconsole0,id=console0
Newer kernels will autodetect that console and prefer that over virtio
console.
When data is received from the character layer it creates a service
interrupt to trigger a Read Event Data command from the guest that will
pick up the received character byte-stream.
When characters are echo'ed by the linux guest a Write Event Data occurs
which is forwarded by the Event Facility to the console that supports
a corresponding mask value.
Console resizing is not supported.
The character layer byte-stream is buffered using a fixed size iov
buffer.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This implements the sclp signal quiesce event via the SCLP Event
Facility.
This allows to gracefully shutdown a guest by using system_powerdown
notifiers. It creates a service interrupt that will trigger a
Read Event Data command from the guest. This code will then add an
event that is interpreted by linux guests as ctrl-alt-del.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Several SCLP features are considered to be events. Those events don't
provide SCLP commands on their own, instead they are all based on
Read Event Data, Write Event Data, Write Event Mask and the service
interrupt. Follow-on patches will provide SCLP's Signal Quiesce (via
system_powerdown) and the ASCII console.
Further down the road the sclp line mode console and configuration
change events (e.g. cpu hotplug) can be implemented.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a more generic infrastructure for handling Service-Call
requests on s390. Currently we only support a small subset of Read
SCP Info directly in target-s390x. This patch provides the base
infrastructure for supporting more commands and moves Read SCP
Info.
In the future we could add additional commands for hotplug, call
home and event handling.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Newer kernels provide the guest registers in kvm_run. Lets use
those if available (i.e. the capability is set). This avoids
ioctls on cpu_synchronize_state making intercepts faster.
In addition, we have now the prefix register, the access registers
the control registers up to date. This helps in certain cases,
e.g. for resolving kernel module addresses with gdb on a guest.
On return, we update the registers according to the level statement,
i.e. we put all registers for KVM_PUT_FULL_STATE and _RESET_STATE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Correct sys_perf_event_open syscall number for s390 architecture
- the hardcoded syscall number 298 is for x86 but should
be different for other architectures.
In case we figure out via /proc/cpuinfo that we are running
on s390 the appropriate syscall number is used from map
syscall_numbers; other architectures can extend this.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When using -initrd in the virtio machine, we need to indicate the initrd
start and size inside the kernel image. These parameters need to be stored
in native endianness.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Richard Henderson <rth@twiddle.net>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Calling memory_region_destroy() in a transaction is illegal (and aborts),
as until the transaction is committed, the region remains live.
Fix by moving destruction until after the transaction commits. This requires
having an extra set of regions, so the new and old regions can coexist.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Modern linux supports arbitrary header size,
which makes it possible to pass mrg buf header
to tap directly without iovec mangling.
Use this capability when it is there.
This removes the need to deal with it in
vhost-net as we do now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There's no reason to query header support at random
times: at load or feature query.
Driver also might not query functions.
Cleaner to do it at device init.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Remove code duplication using guest header length that we track.
Drop specific layout requirement for rx buffers: things work
using generic iovec functions in any case.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that we know host hdr length, we don't need to
duplicate the logic in receive_hdr: caller can
figure out the offset itself.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Avoid magling iov manually: use safe iov operations
for processing packets incoming to guest.
This also removes the requirement for virtio header to
fit the first s/g entry exactly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Avoid tweaking iovec during receive. This removes
the need to copy the vector.
Note: we currently have an evil cast in work_around_broken_dhclient
and unfortunately this patch does not fix it - just
pushes the evil cast to another place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Calling memory_region_destroy() in a transaction is illegal (and aborts),
as until the transaction is committed, the region remains live.
Fix by moving destruction until after the transaction commits. This requires
having an extra set of regions, so the new and old regions can coexist.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Let's use PCIExpressHost with QOM.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This allows q35 to pass/set the size of the pcie window in its update routine.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[jbaron@redhat.com: add PCI_CLASS_SERIAL_SMBUS definition]
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce pci_swizzle_map_irq_fn() for interrupt pin swizzle which is
standardized. PCI bridge swizzle is common logic, by introducing
this function duplicated swizzle logic will be avoided later.
[jbaron@redhat.com: drop opaque argument]
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Adds pci id constants which will be used by q35.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
pci capability must be in PCI space.
It can't lay in PCIe extended config space.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI spec (see e.g. 6.7 Capabilities List in spec rev 3.0)
requires that each capability is DWORD aligned.
Ensure this when allocating space by rounding size up to 4.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Enable KVM PV EOI by default. You can still disable it with
-kvm_pv_eoi cpu flag. To avoid breaking cross-version migration,
enable only for qemu 1.3 (or in the future, newer) machine type.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>