This fixes potential runtime crashes and two warnings from Coverity.
The new error message does not add a prefix "qemu:" because that is
already done in function hw_error. It also starts with an uppercase
letter because that seems to be the mostly used form.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
[agraf: fix typo]
Signed-off-by: Alexander Graf <agraf@suse.de>
The changelog is:
> virtio: Fix vring allocation
> helpers: Fix SLOF_alloc_mem_aligned to meet callers expectation
> Set default palette according to "16-color Text Extension" document
> Fix rectangle drawing functions to work also with higher bit depths
> Fix the x86emu patch file
> Silence compiler warning when building the biosemu
> Use device-type Forth word to set up the corresponding property
> Improve /openprom node
> pci-properties: Remove redundant call to device-type
> cas: reconfigure memory nodes
> pci: use 64bit bar ranges
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 0b183fc871:"memory: move mem_path handling to
memory_region_allocate_system_memory" split memory_region_init_ram and
memory_region_init_ram_from_file. Also it moved mem-path handling a step
up from memory_region_init_ram to memory_region_allocate_system_memory.
Therefore for any board that uses memory_region_init_ram directly,
-mem-path is not supported.
Fix this by replacing memory_region_init_ram with
memory_region_allocate_system_memory.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Message-Id: <CAL5wTH7o8uA59Ep0n41i0M19VFWa73n9m172j2W3fjz6=PSVBA@mail.gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 0b183fc871:"memory: move mem_path handling to
memory_region_allocate_system_memory" split memory_region_init_ram and
memory_region_init_ram_from_file. Also it moved mem-path handling a step
up from memory_region_init_ram to memory_region_allocate_system_memory.
Therefore for any board that uses memory_region_init_ram directly,
-mem-path is not supported.
Fix this by replacing memory_region_init_ram with
memory_region_allocate_system_memory.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Message-Id: <CAL5wTH4-=HJUvwBu+2o6jGanJesJOyNf3sL8-5+d_-6C3cWBfA@mail.gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 0b183fc871:"memory: move mem_path handling to
memory_region_allocate_system_memory" split memory_region_init_ram and
memory_region_init_ram_from_file. Also it moved mem-path handling a step
up from memory_region_init_ram to memory_region_allocate_system_memory.
Therefore for any board that uses memory_region_init_ram directly,
-mem-path is not supported.
Fix this by replacing memory_region_init_ram with
memory_region_allocate_system_memory.
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Message-Id: <CAL5wTH6X-GsT1AA8kEtP_e7oZWGZgi=fCcDfSs3wLgJN30DbUw@mail.gmail.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We don't validate the backend queue numbers against bus limitation,
this will easily crash qemu if it exceeds the limitation which will
hit the abort() in virtio_del_queue(). An example is trying to
starting a virtio-net device with 256 queues. E.g:
./qemu-system-x86_64 -netdev tap,id=hn0,queues=256 -device
virtio-net-pci,netdev=hn0
Fixing this by doing the validation and fail early.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
commit 9b70c1790a
virtio-serial: switch to standard-headers
changes virtio_console_config size from 8 to 12 bytes:
it adds an optional 4 byte emerg_wr field.
As this crosses a power of two boundary, this changes the PCI BAR size,
which breaks migration compatibility with old qemu machine types.
It's probably a problem for other transports as well.
As a temporary fix, as we don't yet support this new field anyway,
simply make the config size smaller at init time.
Long terms we probably want something along the lines
of virtio_net_set_config_size.
Reported-by: Cole Robinson <crobinso@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
This fixes these gcc warnings (not enabled in default build):
hw/acpi/aml-build.c:83:5: warning:
function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
hw/acpi/aml-build.c:88:5: warning:
function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1427271528-11624-1-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is wrong to use address_space_memory directly, because there could be an
IOMMU in the middle. Passing the entire PVSCSIRingInfo to RS_GET_FIELD
and RS_SET_FIELD makes it easy to go back to the PVSCSIState.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
following a464982499, it's now possible for
there to be attempts to take the BQL before CPUs have been realized in
cases where a machine model inits peripherals before the first CPU.
BQL lock aquisition kicks the first_cpu, leading to a segfault if this
happens pre-realize. Guard the CPU kick routine to perform no action for
a CPU that doesn't exist or doesn't have a thread yet.
There was a fix to this with commit
6b49809c59, but the check there misses
the case where the CPU has been inited and not realized. Strengthen the
check to make sure that the first_cpu has a thread (i.e. it is
realized) before allowing the kick.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-Id: <1427107689-6946-1-git-send-email-peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the guest programs a sufficiently large timeout value an integer
overflow can occur in i6300esb_restart_timer(). e.g. if the maximum
possible timer preload value of 0xfffff is programmed then we end up with
the calculation:
timeout = get_ticks_per_sec() * (0xfffff << 15) / 33000000;
get_ticks_per_sec() returns 1000000000 (10^9) giving:
10^9 * (0xfffff * 2^15) == 0x1dcd632329b000000 (65 bits)
Obviously the division by 33MHz brings it back under 64-bits, but the
overflow has already occurred.
Since signed integer overflow has undefined behaviour in C, in theory this
could be arbitrarily bad. In practice, the overflowed value wraps around
to something negative, causing the watchdog to immediately expire, killing
the guest, which is still fairly bad.
The bug can be triggered by running a Linux guest, loading the i6300esb
driver with parameter "heartbeat=2046" and opening /dev/watchdog. The
watchdog will trigger as soon as the device is opened.
This patch corrects the problem by using muldiv64(), which effectively
allows a 128-bit intermediate value between the multiplication and
division.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1427075508-12099-3-git-send-email-david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The IO operations for the i6300esb watchdog timer are marked as
DEVICE_NATIVE_ENDIAN. This is not correct, and - as a PCI device - should
be DEVICE_LITTLE_ENDIAN.
This allows i6300esb to work on ppc targets (yes, using an Intel ICH
derived device on ppc is a bit odd, but the driver exists on the guest
and there's no more obviously suitable watchdog device).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1427075508-12099-2-git-send-email-david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The fw_cfg documentation says this of the revision key (0x0001, FW_CFG_ID):
> A 32-bit little-endian unsigned int, this item is used as an interface
> revision number, and is currently set to 1 by all QEMU architectures
> which expose a fw_cfg device.
arm/virt doesn't. It could be argued that that's an error in
"hw/arm/virt.c"; on the other hand, all of the other fw_cfg providing
boards set the interface version to 1 manually, despite the device
coming from the same, shared implementation. Therefore, instead of
adding
fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
to arm/virt, consolidate all such existing calls in the fw_cfg
initialization code.
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Message-Id: <1426789244-26318-1-git-send-email-somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
32-bit PPC cannot do atomic operations on long long. Inside the loops,
we are already using local counters that are summed at the end of
the run---with some exceptions (rcu_stress_count for rcutorture,
n_nodes for test-rcu-list): fix them to use the same technique.
For test-rcu-list, remove the mostly unused member "val" from the
list. Then, use a mutex to protect the global counts.
Performance does not matter there because every thread will only enter
the critical section once.
Remaining uses of atomic instructions are for ints or pointers.
Reported-by: Andreas Faerber <afaerber@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Libseccomp version updated to 2.2.0 and arch restriction to x86/x86_64
is now removed. It's supposed to work on armv7l as well.
Related bug: https://bugs.launchpad.net/qemu/+bug/1363641
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
The TriCore documentation was wrong on how to calculate ovf bits for those two
instructions, which I confirmed with real hardware (TC1796 chip). An ovf
actually happens, if the result (without remainder) does not fit into 8/16 bits.
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
RRPW_DEXTR used r1 for the low part and r2 for the high part. It should be the
other way round. This also fixes that the result of the first shift was not
saved in a temp and could overwrite registers that were needed for the second
shift.
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
dvinit_hu/bu for ISA v1.3 calculate the higher part of the result, that is needed
for the overflow bits, after calculating the overflow bits.
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
clang report:
target-tricore/op_helper.c:1247:24: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
target-tricore/op_helper.c:1248:25: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
target-tricore/op_helper.c:1249:19: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
target-tricore/op_helper.c:1297:24: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
target-tricore/op_helper.c:1298:25: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
target-tricore/op_helper.c:1299:19: warning:
taking the absolute value of unsigned type 'uint32_t' (aka 'unsigned int')
has no effect [-Wabsolute-value]
Fix also the divisor which was taken from the wrong register
(thanks to Peter Maydell for this hint).
Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <1425739412-8144-1-git-send-email-sw@weilnetz.de>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
My pattern was cyclical every 256 bytes, so it missed a fairly obvious
failure case. Add some rand() pepper into the test pattern, and for large
patterns that exceed 256 sectors, start writing an ID per-sector so that
we never generate identical sector patterns.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1426811056-2202-5-git-send-email-jsnow@redhat.com
This does not bother DMA, because DMA generally transfers
the entire SGList in one shot if it can.
PIO, on the other hand, tries to transfer just one sector
at a time, and will make multiple visits to the sglist
to fetch memory addresses.
Fix the memory address calculaton when we have an offset
by moving the offset addition OUTSIDE of the le64_to_cpu
calculation.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1426811056-2202-4-git-send-email-jsnow@redhat.com
Similar to the cmd_write_pio fix, update the nsector count and
ide sector before we invoke ide_transfer_start.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1426811056-2202-3-git-send-email-jsnow@redhat.com
We need to adjust the sector being written to
prior to calling ide_transfer_start, otherwise
we'll write to the same sector again.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Message-id: 1426811056-2202-2-git-send-email-jsnow@redhat.com
New threads always point at the same env which is incorrect and usually
leads to a crash.
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The second and fourth argument are in/out parameters, store them back
after the syscall. Also, the fourth argument was mishandled, and EFAULT
handling was missing.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
start/end_exclusive() need be pairs, except the start_exclusive() in
stop_all_tasks() which is only used by force_sig(), which will be abort.
So at present, start_exclusive() in stop_all_task() need not be paired.
queue_signal() may call force_sig(), or return after kill pid (or queue
signal). If could return from queue_signal(), stop_all_task() would not
be called in time, the next end_exclusive() would be issue.
So in arm_kernel_cmpxchg64_helper() for ARM, need remove end_exclusive()
after queue_signal(). The related commit: "97cc756 linux-user: Implement
new ARM 64 bit cmpxchg kernel helper".
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJVCyYhAAoJECgHk2+YTcWmKl0P+wSBnfMCIMhKLmrqI6N8NKWT
RJV7EJaaY/ek/Ca4/fs6PZxGsfRDZ2KyGgP3rfM2tArW954Slw4eIpg/TlUFfcR1
M0kMoeEjh1tOeZjkMzSWuQoTjslSazMU+EeNzzykc2GR5uQthjJPqiJXqpsXoMmN
HvvwkoyWc3IE1qE2jdlRjfZg3k7HDEGUP92oR5/wrHuE2wK7QHNCrzt1Ej+aF1eW
s2neC//Q3CJXpbsAOMG7SbL/5C1k7tlmLGKs6LID5Q/kozxZUedOC7UIi43H3KbY
TsXPhggyETnnZ7b4hK3zbJEOjrcgBWq7houcERrDzFG5XK+4tOBOYN7KL8mL+MbJ
G3GkitC2Jdng/bmrSpT9GtPhDEmhx1JUNqSRoUt6YN1UxSsCATgLi0TBbKouFBGI
IY0aM1/WHV8m9ksOPWAcz0OG9kywt53jKbWaKIouDADGI9BEOsoK7+py8Kq3bbTV
92M4vCkX752z1wJgEwE7SIXvM3kCfm0C1tjvIipNlgMm9M+yM/FEX9xSmW3q36Sq
Vmn+2oMk+x6yntmdaYVWJKhqFWSriJQyfobgvk30enmvCmKKtBXo9i/WcjxvldVh
Zi6guaz8AG6bAigrZja7sGqnld7rZ7qgbJe8bNATGrgjT8Nrr2D3+DsdwTK/76nJ
pG9uGb/rtc5ZfDzDf+qd
=BNKx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue 2015-03-19
# gpg: Signature made Thu Mar 19 19:40:17 2015 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/x86-pull-request:
target-i386: Haswell-noTSX and Broadwell-noTSX
Revert "target-i386: Disable HLE and RTM on Haswell & Broadwell"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJVCyLBAAoJECgHk2+YTcWm0kQP/09B1s3b32dABR6YsFFfeziS
I5GWzvwOh1HMMVx1sCrGbC/fvqavLqgfG/87Q+1ijGM6vAGMspGgxgFoCs50Tdrm
0jxrDODHFzuHZDhxk38rj5B8L6wEyrLzEpJYDZ6OzM0zuZU7/uoNjT9IqkzlFHFi
1BHxrttFLiGe9sRaVu8f60tmxpixgxRaj2o5ru3Cgt38YIVQ3Wl0XjZxXlAZI0Pu
C59HaBipPVYRNiNpasYPA9C5grjT01EtHgs3bUfKUz9tsrjvDaVFiQHIb7g+bUIV
J/0av8UwLtTnDovKqv4VxO31xugzTAPWzZU37bwhhGAgG8DlHyENzrrAfihy9TUQ
YQNm6y+0K+Mjtf5nDAoSmkAEBxLMyNo1SCbAFinL6xy1QaH/KYuCKS/RN7GuNgZZ
SQW3EFIoXacVPL0AQiBccOGZG03EOk9AJh4EdC2+QI73H/ZzZgQpLDuw+wWLxwip
s3piz9CUUXEKuqWaCWqJk3W1lmM35gwpzHOMRKbxCzHuX2aOPkO5pKeyliG7cmFI
sdmt6bB+lKL4EAm0bB2xeOzPC4wFgZa3ePu1wBrS7euWoUWuvkStm2C0QvLy7vpq
to7IPlBrhf/5QSLi4HIMYJ68kbf9fxDWW291LzKowDppAHtxLXkgYpnFHHOSk5YI
51cc/NuCGt9k+i7+Dtvm
=gCrW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/work/numa-verify-cpus-pull-request' into staging
NUMA queue 2015-03-19
# gpg: Signature made Thu Mar 19 19:25:53 2015 GMT using RSA key ID 984DC5A6
# gpg: Can't check signature: public key not found
* remotes/ehabkost/tags/work/numa-verify-cpus-pull-request:
numa: Print warning if no node is assigned to a CPU
pc: fix default VCPU to NUMA node mapping
numa: introduce machine callback for VCPU to node mapping
numa: Reject configuration if CPU appears on multiple nodes
numa: Reject CPU indexes > max_cpus
numa: Fix off-by-one error at MAX_CPUMASK_BITS check
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When hot-unplugging the usb controllers (ehci/uhci),
we have to clean all resouce of these devices,
involved registered reset handler. Otherwise, it
may cause NULL pointer access and/or segmentation fault
if we reboot the guest os after hot-unplugging.
Let's hook up reset via DeviceClass->reset() and drop
the qemu_register_reset() call. Then Qemu will register
and unregister the reset handler automatically.
Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Lidonglin <lidonglin@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When hot-unplugging the usb controllers (ehci/uhci),
we have to clean all resouce of these devices,
involved registered reset handler. Otherwise, it
may cause NULL pointer access and/or segmentation fault
if we reboot the guest os after hot-unplugging.
Let's hook up reset via DeviceClass->reset() and drop
the qemu_register_reset() call. Then Qemu will register
and unregister the reset handler automatically.
Ohci does't support hotplugging/hotunplugging yet, but
existing resource cleanup leak logic likes ehci/uhci.
Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When hot-unplugging the usb controllers (ehci/uhci),
we have to clean all resouce of these devices,
involved registered reset handler. Otherwise, it
may cause NULL pointer access and/or segmentation fault
if we reboot the guest os after hot-unplugging.
Let's hook up reset via DeviceClass->reset() and drop
the qemu_register_reset() call. Then Qemu will register
and unregister the reset handler automatically.
Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Lidonglin <lidonglin@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
With the Intel microcode update that removed HLE and RTM, there will be
different kinds of Haswell and Broadwell CPUs out there: some that still
have the HLE and RTM features, and some that don't have the HLE and RTM
features. On both cases people may be willing to use the pc-*-2.3
machine-types.
So, to cover both cases, introduce Haswell-noTSX and Broadwell-noTSX CPU
models, for hosts that have Haswell and Broadwell CPUs without TSX support.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This reverts commit 13704e4c45.
With the Intel microcode update that removed HLE and RTM, there will be
different kinds of Haswell and Broadwell CPUs out there: some that still
have the HLE and RTM features, and some that don't have the HLE and RTM
features. On both cases people may be willing to use the pc-*-2.3
machine-types.
So instead of making the CPU model results confusing by making it depend
on the machine-type, keep HLE and RTM on the existing Haswell and
Broadwell CPU models. The plan is to introduce "Haswell-noTSX" and
"Broadwell-noTSX" CPU models later, for people who have CPUs that don't
have TSX feature available.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We need all possible CPUs (including hotplug ones) to be present in the
SRAT when QEMU starts. QEMU already does that correctly today, the only
problem is that when a CPU is omitted from the NUMA configuration, it is
silently assigned to node 0.
Check if all CPUs up to max_cpus are present in the NUMA configuration
and warn about missing CPUs.
Make it just a warning, to allow management software to be updated if
necessary. In the future we may make it a fatal error instead.
Command-line examples:
* Correct, no warning:
$ qemu-system-x86_64 -smp 2,maxcpus=4
$ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-3
* Incomplete, with warnings:
$ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0
qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 1 2 3
qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config
$ qemu-system-x86_64 -smp 2,maxcpus=4 -numa node,cpus=0-2
qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: 3
qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be described in NUMA config
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
v1 -> v2: (no changes)
v2 -> v3:
* Use enumerate_cpus() and error_report() for error message
* Simplify logic using bitmap_full()
v3 -> v4:
* Clarify error message, mention that all CPUs up to
maxcpus need to be described in NUMA config
v4 -> v5:
* Commit log update, to make problem description clearer
Since commit
dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
Linux kernel actually tries to use CPU to Node mapping from
QEMU provided SRAT table instead of discarding it, and that
in some cases breaks build_sched_domains() which expects
sane mapping where cores/threads belonging to the same socket
are on the same NUMA node.
With current default round-robin mapping of VCPUs to nodes
guest ends-up with cores/threads belonging to the same socket
being on different NUMA nodes.
For example with following CLI:
qemu-system-x86_64 -m 4G \
-cpu Opteron_G3,vendor=AuthenticAMD \
-smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
-numa node,nodeid=0 -numa node,nodeid=1
2.6.32 based kernels will hang on boot due to incorrectly built
sched_group-s list in update_sd_lb_stats()
Replacing default mapping with a manual, where VCPUs belonging to
the same socket are on the same NUMA node, fixes the issue for
guests which can't handle nonsense topology i.e. changing CLI to:
-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7
So instead of simply scattering VCPUs around nodes, provide
callback to map the same socket VCPUs to the same NUMA node,
which is what guests would expect from a sane hardware/BIOS.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Current default round-robin way of distributing VCPUs among
NUMA nodes might be wrong in case on multi-core/threads
CPUs. Making guests confused wrt topology where cores from
the same socket are on different nodes.
Allow a machine to override default mapping by providing
MachineClass::cpu_index_to_socket_id()
callback which would allow it group VCPUs from a socket
on the same NUMA node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Each CPU can appear in only one NUMA node on the NUMA config. Reject
configuration if a CPU appears in multiple nodes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
CPU index is always less than max_cpus, as documented at sysemu.h:
> The following shall be true for all CPUs:
> cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
Reject configuration which uses invalid CPU indexes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Fix the CPU index check to ensure we don't go beyond the size of the
node_cpu bitmap.
CPU index is always less than MAX_CPUMASK_BITS, as documented at
sysemu.h:
> The following shall be true for all CPUs:
> cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Error classes are a leftover from the days of "rich" error objects.
New code should always use ERROR_CLASS_GENERIC_ERROR. Commit
b7b9d39..7c6a4ab added uses of ERROR_CLASS_DEVICE_NOT_FOUND. Replace
them.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>