Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:
static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
......
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
......
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
......
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
......
}
In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.
For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
No-L3 With-L3(applied this patch)
cpu0: 363890 44582
cpu1: 373405 43109
cpu2: 340783 43797
cpu3: 333854 43409
cpu4: 327170 40038
cpu5: 325491 39922
cpu6: 319129 42391
cpu7: 306480 41035
cpu8: 161139 32188
cpu9: 164649 31024
cpu10: 149823 30398
cpu11: 149823 32455
cpu12: 164830 35143
cpu13: 172269 35805
cpu14: 179979 33898
cpu15: 194505 32754
avg: 268963.6 40129.8
The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.
For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
First remove VCPU from exec loop and only then remove lapic.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
object_property_add_child() silently fails with error that it can't
create duplicate propery 'apic' as we already have 'apic' property
registered for 'apic' feature. As result generic device_realize puts
apic into unattached container.
As it's programming error, abort if name collision happens in future
and fix property name for apic_state to 'lapic', this way apic is
a child of cpu instance.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
These are both stored in CPUID[EAX=7,EBX=0].ECX. KVM is going to
be able to emulate both (albeit with a performance loss in the case
of RDPID, which therefore will be in KVM_GET_EMULATED_CPUID rather
than KVM_GET_SUPPORTED_CPUID).
It's also possible to implement both in TCG, but this is for 2.8.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
These properties will be used by as address where to plug
CPU with help -device/device_add commands.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Custom apic-id setter/getter doesn't do any property specific
checks anymore, so clean it up and use more compact static
property DEFINE_PROP_UINT32 instead.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Machine code knows about all possible APIC IDs so use that
instead of hack which does O(n^2) complexity duplicate
checks, interating over global CPUs list.
As result duplicate check is done only once with O(log n) complexity.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Add the host-phys-bits boolean property, if true, take phys-bits
from the hosts physical bits value, overriding either the default
or the user specified value.
We can also use the value we read from the host to check the users
explicitly set value and warn them if it doesn't match.
Note:
a) We only read the hosts value in KVM mode (because on non-x86
we get an abort if we try)
b) We don't warn about trying to use host-phys-bits in TCG mode,
we just fall back to the TCG default. This allows the machine
type to set the host-phys-bits flag if it wants and then to
work in both TCG and KVM.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Redo 9886e834 (target-i386: Require APIC ID to be explicitly set before
CPU realize) in another way that doesn't use int64_t to detect
if apic-id property has been set.
Use the fact that 0xFFFFFFFF is the broadcast
value that a CPU can't have and set default
uint32_t apic_id to it instead of using int64_t.
Later uint32_t apic_id will be used to drop custom
property setter/getter in favor of static property.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Fill the bits between 51..number-of-physical-address-bits in the
MTRR_PHYSMASKn variable range mtrr masks so that they're consistent
in the migration stream irrespective of the physical address space
of the source VM in a migration.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Currently QEMU sets the x86 number of physical address bits to the
magic number 40. This is only correct on some small AMD systems;
Intel systems tend to have 36, 39, 46 bits, and large AMD systems
tend to have 48.
Having the value different from your actual hardware is detectable
by the guest and in principal can cause problems;
The current limit of 40 stops TB VMs being created by those lucky
enough to have that much.
This patch lets you set the physical bits by a cpu property but
defaults to the same 40bits which matches TCGs setup.
I've removed the ancient warning about the 42 bit limit in exec.c;
I can't find that limit in there and no one else seems to know where
it is.
We use a magic value of 0 as the property default so that we can
later distinguish between the default and a user set value.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
'HF_SOFTMMU_MASK' is only set when 'CONFIG_SOFTMMU' is defined. So
there's no need in this flag: test 'CONFIG_SOFTMMU' instead.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20160715175852.30749-6-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If -cpu host is used, LMCE will be automatically enabled when it's
supported by host.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch adds the support to inject SRAR and SRAO as LMCE, i.e. they
are injected to only one VCPU rather than broadcast to all VCPUs. As KVM
reports LMCE support on Intel platforms, this features is only available
on Intel platforms.
LMCE is disabled by default and can be enabled/disabled by cpu option
'lmce=on/off'.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[Haozhong: Enable LMCE only on Intel platforms
Disable LMCE by default and add a cpu option 'lmce'
Handle the error if LMCE is enabled w/o host support
Remove MCG_LMCE_P from MCE_CAP_DEF
Add migration support for LMCE
Minor code style changes]
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This change adds hyperv feature words report through qom rpc.
When VM is configured with hyperv features enabled
libvirt will check that required feature words are set
in cpuid leaf 40000003 through qom request.
Currently qemu does not report hyperv feature words
which prevents windows guests from starting with libvirt.
To avoid conflicting with current hyperv properties all added feature
words cannot be set directly with -cpu +feature yet.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Considering that features are converted to global properties and
global properties are automatically applied to every new instance
of created CPU (at object_new() time), there is no point in
parsing cpu_model string every time a CPU created. So move
parsing outside CPU creation loop and do it only once.
Parsing also should be done before any CPU is created so that
features would affect the first CPU a well.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Currently CPUClass->parse_features() is used to parse -cpu
features string and set properties on created CPU instances.
But considering that features specified by -cpu apply to every
created CPU instance, it doesn't make sense to parse the same
features string for every CPU created. It also makes every target
that cares about parsing features string explicitly call
CPUClass->parse_features() parser, which gets in a way if we
consider using generic device_add for CPU hotplug as device_add
has not a clue about CPU specific hooks.
Turns out we can use global properties mechanism to set
properties on every created CPU instance for a given type. That
way it's possible to convert CPU features into a set of global
properties for CPU type specified by -cpu cpu_model and common
Device.device_post_init() will apply them to CPU of given type
automatically regardless whether it's manually created CPU or CPU
created with help of device_add.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
x86_cpu_parse_featurestr has a "val = num;" assignment just before num
goes out of scope. Push num up to fix the issue.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
ERMS just says "rep movsb" and "rep stosb" are fast. It does not
imply any new instruction, so we can support it easily.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch simplifies code that uses a local_err variable just to
immediately use it for an error_propagate() call.
Coccinelle patch used to perform the changes added to
scripts/coccinelle/remove_local_err.cocci.
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1465855078-19435-3-git-send-email-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Blank line in s390-virtio-ccw.c restored]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Now cpu_x86_init() does nothing more or less
than duplicating cpu_generic_init() logic.
So simplify it by using cpu_generic_init().
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
It will allow to drop custom cpu_x86_init() and use
cpu_generic_init() instead, reducing cpu_x86_create()
to a simple 3-liner.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The code will be changed to allow creation of the CPU object and
report kvm_required errors only at realizefn, so we need to make
the instance_init function more flexible.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Making x86_cpu_parse_featurestr() a pure convertor
of legacy feature string into global properties, needs
it to be called before a CPU instance is created so
parser shouldn't modify CPUState directly or access
it at all. Hence move current hack that directly pokes
into CPUState, to set/unset +-feats, from parser to
CPU's realize method.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The "fixup will be removed in future versions" warnings are
present since QEMU 1.7.0, at least, so users should have fixed
their scripts and configurations, already.
In the case of libvirt users, libvirt doesn't use the "xlevel"
option, and already rejects HyperV spinlock retry count < 0xFFF.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
I looked at a dozen Intel CPU that have this CPUID and all of them
always had Core offset as 1 (a wasted bit when hyperthreading is
disabled) and Package offset at least 4 (wasted bits at <= 4 cores).
QEMU uses more compact IDs and it doesn't make much sense to change it
now. I keep the SMT and Core sub-leaves even if there is just one
thread/core; it makes the code simpler and there should be no harm.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Introduce Skylake-Client cpu mode which inherits the features from
Broadwell and supports some additional features that are: MPX,
XSAVEC, and XGETBV1.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
QOM instance_init functions are not supposed to have any side-effects,
as new objects may be created at any moment for querying property
information (see qmp_device_list_properties()).
Calling cpu_exec_init() also affects QEMU's ability to handle errors
during CPU creation, as some actions done by cpu_exec_init() can't be
reverted.
Move cpu_exec_init() call to realize so a simple object_new() won't
trigger it, and so that it is called after some basic validation of CPU
parameters.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
QOM instance_init functions are not supposed to have any side-effects,
as new objects may be created at any moment for querying property
information (see qmp_device_list_properties()).
Move TCG initialization to realize time so it won't be called when just
doing object_new() on a X86CPU subclass.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of requiring cpu.c to check if TCG was already initialized,
simply let the function be called multiple times.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
x86_cpudef_init() doesn't do anything anymore, cpudef_init(),
cpudef_setup(), and x86_cpudef_init() can be finally removed.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Newer PC machines don't set hw_version, and older machines set
model-id on compat_props explicitly, so we don't need the
x86_cpudef_setup() code that sets model_id using
qemu_hw_version() anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This doesn't introduce any change in the code, as the offsets and
struct sizes match what was present in the table. This can be
validated by the QEMU_BUILD_BUG_ON lines on target-i386/cpu.h,
which ensures the struct sizes and offsets match the existing
values in ext_save_area.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
exec-all.h contains TCG-specific definitions. It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.
One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All qdev definitions are available from other headers, user-mode
emulation does not need hw/hw.h.
By considering system emulation only, it is simpler to disentangle
hw/hw.h from NEED_CPU_H.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU complains about -cpu host on an AMD machine:
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
For bits 0,1,3,4,5,6,7,8,9,12,13,14,15,16,17,23,24.
KVM_GET_SUPPORTED_CPUID and and x86_cpu_get_migratable_flags()
don't handle the AMD CPUID aliases bits, making
x86_cpu_filter_features() print warnings and clear those CPUID
bits incorrectly.
To avoid hacking x86_cpu_get_migratable_flags() to handle
CPUID_EXT2_AMD_ALIASES (just like the existing hack inside
kvm_arch_get_supported_cpuid()), simply move the
CPUID_EXT2_AMD_ALIASES code in x86_cpu_realizefn() after the
x86_cpu_filter_features() call.
This will probably make the CPUID_EXT2_AMD_ALIASES hack in
kvm_arch_get_supported_cpuid() unnecessary, too. The hack will be
removed in a follow-up patch after v2.6.0.
Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x2apic feature is in the kvm_default_props and automatically added to all
CPU models when KVM is enabled. But userspace devices don't support x2apic
which can't be enabled without the in-kernel irqchip. It will trigger
warning of "host doesn't support requested feature: CPUID.01H:ECX.x2apic
[bit 21]" when kernel_irqchip is off. This patch is to fix it via removing
x2apic feature when kernel_irqchip is off.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The xsave and xrstor helpers are accessing the x86_ext_save_areas array
using a bit mask instead of a bit position. Provide two sets of XSTATE_*
definitions and use XSTATE_*_BIT when a bit position is requested.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>