qemu/include/hw
Greg Kurz 30499fdd98 spapr: Fix buffer overflow in spapr_numa_associativity_init()
Running a guest with 128 NUMA nodes crashes QEMU:

../../util/error.c:59: error_setv: Assertion `*errp == NULL' failed.

The crash happens when setting the FWNMI migration blocker:

    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI) == SPAPR_CAP_ON) {
        /* Create the error string for live migration blocker */
        error_setg(&spapr->fwnmi_migration_blocker,
            "A machine check is being handled during migration. The handler"
            "may run and log hardware error on the destination");
    }

Inspection reveals that spapr->fwnmi_migration_blocker isn't NULL:

(gdb) p spapr->fwnmi_migration_blocker
$1 = (Error *) 0x8000000004000000

Since this is the only place where spapr->fwnmi_migration_blocker is
set, this means someone wrote there behind our back. Further analysis
points to spapr_numa_associativity_init(), especially the part
that initializes the associativity arrays for NVLink GPUs:

    max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;

i.e. max_nodes_with_gpus = 128 + 6, but the array isn't sized to
accommodate the 6 extra nodes:

struct SpaprMachineState {
    .
    .
    .
    uint32_t numa_assoc_array[MAX_NODES][NUMA_ASSOC_SIZE];

    Error *fwnmi_migration_blocker;
};

and the following loops happily overwrite spapr->fwnmi_migration_blocker,
and probably more:

    for (i = nb_numa_nodes; i < max_nodes_with_gpus; i++) {
        spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);

        for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
            uint32_t gpu_assoc = smc->pre_5_1_assoc_refpoints ?
                                 SPAPR_GPU_NUMA_ID : cpu_to_be32(i);
            spapr->numa_assoc_array[i][j] = gpu_assoc;
        }

        spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
    }
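
The arithmetic behind the overflow can be sketched as follows. This is
an illustration only, not code from the tree: MAX_NODES = 128 is implied
by the report above, NVGPU_MAX_NUM = 6 is taken from it, and
MAX_DISTANCE_REF_POINTS = 4 (hence NUMA_ASSOC_SIZE = 5) is an assumption.
The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES               128  /* implied by the 128-node report */
#define NVGPU_MAX_NUM           6    /* the "6 extra nodes" above */
#define MAX_DISTANCE_REF_POINTS 4    /* assumed value */
#define NUMA_ASSOC_SIZE         (MAX_DISTANCE_REF_POINTS + 1)

/* Hypothetical helper: exclusive upper bound of the GPU loop above. */
static inline size_t gpu_rows_end(size_t nb_numa_nodes)
{
    assert(nb_numa_nodes <= MAX_NODES);
    return nb_numa_nodes + NVGPU_MAX_NUM;
}

/* Rows written past the end of an array that has only MAX_NODES rows. */
static inline size_t overflow_rows(size_t nb_numa_nodes)
{
    size_t end = gpu_rows_end(nb_numa_nodes);
    return end > MAX_NODES ? end - MAX_NODES : 0;
}

/* The same overrun expressed in bytes. */
static inline size_t overflow_bytes(size_t nb_numa_nodes)
{
    return overflow_rows(nb_numa_nodes) * NUMA_ASSOC_SIZE * sizeof(uint32_t);
}
```

Under these assumptions a 128-node guest writes 6 rows beyond the array,
and any guest with more than 122 nodes overruns it by at least one row.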

Fix the size of the array. This requires "hw/ppc/spapr.h" to see
NVGPU_MAX_NUM. Including "hw/pci-host/spapr.h" introduces a
circular dependency that breaks the build, so this moves the
definition of NVGPU_MAX_NUM to "hw/ppc/spapr.h" instead.
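
A minimal sketch of the resized array, assuming the fix amounts to
adding NVGPU_MAX_NUM rows. SketchMachineState, ASSOC_ROWS() and
init_gpu_rows() are hypothetical stand-ins, MAX_DISTANCE_REF_POINTS = 4
is an assumed value, and the cpu_to_be32() conversions and the pre-5.1
special case are left out for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES               128  /* implied by the 128-node report */
#define NVGPU_MAX_NUM           6    /* now visible next to the struct */
#define MAX_DISTANCE_REF_POINTS 4    /* assumed value */
#define NUMA_ASSOC_SIZE         (MAX_DISTANCE_REF_POINTS + 1)

/* Hypothetical stand-in for the relevant tail of SpaprMachineState. */
typedef struct SketchMachineState {
    /* Room for every NUMA node plus every possible NVLink GPU node. */
    uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
    void *fwnmi_migration_blocker;
} SketchMachineState;

/* Number of rows in the associativity array. */
#define ASSOC_ROWS(s) \
    (sizeof((s)->numa_assoc_array) / sizeof((s)->numa_assoc_array[0]))

/* Simplified version of the GPU loop, with an explicit bounds check. */
static inline void init_gpu_rows(SketchMachineState *s, size_t nb_numa_nodes)
{
    size_t max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;

    assert(max_nodes_with_gpus <= ASSOC_ROWS(s));

    for (size_t i = nb_numa_nodes; i < max_nodes_with_gpus; i++) {
        s->numa_assoc_array[i][0] = MAX_DISTANCE_REF_POINTS;
        for (size_t j = 1; j <= MAX_DISTANCE_REF_POINTS; j++) {
            s->numa_assoc_array[i][j] = (uint32_t)i;
        }
    }
}
```

With the extra rows, the 128-node case stays inside the array and the
field that follows it is left untouched.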

Reported-by: Min Deng <mdeng@redhat.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1908693
Fixes: dd7e1d7ae4 ("spapr_numa: move NVLink2 associativity handling to spapr_numa.c")
Cc: danielhb413@gmail.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160829960428.734871.12634150161215429514.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-06 11:09:59 +11:00
..
acpi x86: acpi: let the firmware handle pending "CPU remove" events in SMM 2020-12-09 13:04:17 -05:00
adc Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
arm arm: xlnx-versal: Connect usb to virt-versal 2020-12-15 12:04:30 +00:00
audio qom: Put name parameter before value / visitor parameter 2020-07-10 15:18:08 +02:00
block qdev: Move softmmu properties to qdev-properties-system.h 2020-12-18 15:20:17 -05:00
char hw/char/pl011: add a clock input 2020-10-27 11:10:44 +00:00
core cpu: Move cpu_common_props to hw/core/cpu.c 2020-12-15 10:02:07 -05:00
cpu Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
cris sysbus: Convert to sysbus_realize() etc. with Coccinelle 2020-06-15 22:05:28 +02:00
display Clean up includes 2020-12-10 17:16:44 +01:00
dma Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
firmware
gpio hw/gpio: Add GPIO model for Nuvoton NPCM7xx 2020-10-27 11:10:32 +00:00
hyperv Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
i2c Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
i386 i386: remove bios_name 2020-12-10 12:15:05 -05:00
ide nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
input input: tsc2xxx fix. 2020-09-22 21:11:10 +01:00
intc ppc: Convert PPC UIC to a QOM device 2021-01-06 11:09:59 +11:00
ipack Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
ipmi Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
isa Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
kvm target/i386: always create kvmclock device 2020-09-30 19:11:36 +02:00
lm32
m68k Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
mem memory-device: Add get_min_alignment() callback 2020-11-03 07:19:26 -05:00
mips hw/mips: Move address translation helpers to target/mips/ 2020-12-13 19:58:54 +01:00
misc hw/ssi: Rename SSI 'slave' as 'peripheral' 2020-12-10 12:15:03 -05:00
net hw/net/can: Introduce Xilinx ZynqMP CAN controller 2020-12-10 11:30:44 +00:00
nubus Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
nvram fw_cfg: Refactor extra pci roots addition 2020-12-08 13:48:57 -05:00
pci pci: Let pci_dma_write() propagate MemTxResult 2020-12-10 12:15:02 -05:00
pci-bridge Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
pci-host spapr: Fix buffer overflow in spapr_numa_associativity_init() 2021-01-06 11:09:59 +11:00
ppc spapr: Fix buffer overflow in spapr_numa_associativity_init() 2021-01-06 11:09:59 +11:00
rdma Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
riscv riscv/opentitan: Update the OpenTitan memory layout 2020-12-17 21:56:44 -08:00
rtc m48t59: remove legacy m48t59_init() function 2020-10-18 16:21:42 +01:00
rx Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
s390x s390x/pci: fix endianness issues 2020-11-18 16:59:29 +01:00
scsi scsi/scsi_bus: Add scsi_device_get 2020-10-12 11:50:51 -04:00
sd Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
semihosting semihosting: Fix Lesser GPL version number 2020-11-15 16:38:03 +01:00
sh4 hw/sh4: Extract timer definitions to 'hw/timer/tmu012.h' 2020-06-22 18:37:12 +02:00
southbridge Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
sparc sparc32-espdma: use object_initialize_child() for esp child object 2020-10-28 07:59:25 +00:00
ssi hw/core/stream: Rename StreamSlave as StreamSink 2020-12-10 12:15:04 -05:00
timer hw/timer/armv7m_systick: Rewrite to use ptimers 2020-10-27 11:15:31 +00:00
tricore
unicore32
usb usb: xlnx-usb-subsystem: Add xilinx usb subsystem 2020-12-15 12:04:30 +00:00
vfio vfio: Change default dirty pages tracking behavior during migration 2020-11-23 10:05:58 -07:00
virtio failover: Remove primary_dev member 2020-12-08 13:48:58 -05:00
watchdog hw/watchdog: Implement SBSA watchdog device 2020-10-27 11:10:44 +00:00
xen xen: remove GNUC check 2020-12-15 12:53:13 -05:00
xtensa
boards.h vl: clean up -boot variables 2020-12-10 12:15:19 -05:00
clock.h hw/core/clock: provide the VMSTATE_ARRAY_CLOCK macro 2020-10-27 11:10:44 +00:00
elf_ops.h elf_ops.h: Be more verbose with ROM blob names 2020-12-15 12:04:30 +00:00
fw-path-provider.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
hotplug.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
hw.h
ide.h hw/ide: Move MAX_IDE_DEVS define to hw/ide/internal.h 2020-03-17 12:22:36 -04:00
irq.h include/hw/irq.h: New function qemu_irq_is_connected() 2020-08-03 17:55:03 +01:00
loader-fit.h nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
loader.h hw/core/loader: Let load_elf() populate a field with CPU-specific flags 2020-01-29 19:28:52 +01:00
nmi.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
or-irq.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
pcmcia.h Use OBJECT_DECLARE_TYPE when possible 2020-09-18 14:12:32 -04:00
platform-bus.h nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
ptimer.h
qdev-clock.h hw/qdev-clock: Avoid calling qdev_connect_clock_in after DeviceRealize 2020-08-28 10:02:46 +01:00
qdev-core.h machine: introduce MachineInitPhase 2020-12-15 12:51:52 -05:00
qdev-dma.h
qdev-properties-system.h qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros 2020-12-18 15:20:17 -05:00
qdev-properties.h qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr() 2020-12-18 15:20:18 -05:00
register.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
registerfields.h hw/registerfields: Prefix local variables with underscore in macros 2020-05-27 11:23:07 -07:00
resettable.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00
stream.h hw/core/stream: Rename StreamSlave as StreamSink 2020-12-10 12:15:04 -05:00
sysbus.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
usb.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
vmstate-if.h Use DECLARE_*CHECKER* macros 2020-09-09 09:27:09 -04:00