211a7784b9
Bitwise AND with kvm_run->flags to evaluate if we recovered from MCE or not is not correct, As disposition in kvm_run->flags is a two-bit integer value and not a bit map, So check for equality instead of bitwise AND. Without the fix qemu treats any unrecoverable mce error as recoverable and ends up in a mce loop inside the guest, Below are the MCE logs before and after the fix. Before fix: [ 66.775757] MCE: CPU0: Initiator CPU [ 66.775891] MCE: CPU0: Unknown [ 66.776587] MCE: CPU0: machine check (Harmless) Host UE Indeterminate [Recovered] [ 66.776857] MCE: CPU0: NIP: [c0080000000e00b8] mcetest_tlbie+0xb0/0x128 [mcetest_tlbie] After fix: [ 20.650577] CPU: 0 PID: 1415 Comm: insmod Tainted: G M O 5.6.0-fwnmi-arv+ #11 [ 20.650618] NIP: c0080000023a00e8 LR: c0080000023a00d8 CTR: c000000000021fe0 [ 20.650660] REGS: c0000001fffd3d70 TRAP: 0200 Tainted: G M O (5.6.0-fwnmi-arv+) [ 20.650708] MSR: 8000000002a0b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 42000222 XER: 20040000 [ 20.650758] CFAR: c00000000000b940 DAR: c0080000025e00e0 DSISR: 00000200 IRQMASK: 0 [ 20.650758] GPR00: c0080000023a00d8 c0000001fddd79a0 c0080000023a8500 0000000000000039 [ 20.650758] GPR04: 0000000000000001 0000000000000000 0000000000000000 0000000000000007 [ 20.650758] GPR08: 0000000000000007 c0080000025e00e0 0000000000000000 00000000000000f7 [ 20.650758] GPR12: 0000000000000000 c000000001900000 c00000000101f398 c0080000025c052f [ 20.650758] GPR16: 00000000000003a8 c0080000025c0000 c0000001fddd7d70 c0000000015b7940 [ 20.650758] GPR20: 000000000000fff1 c000000000f72c28 c0080000025a0988 0000000000000000 [ 20.650758] GPR24: 0000000000000100 c0080000023a05d0 c0000000001f1d70 0000000000000000 [ 20.650758] GPR28: c0000001fde20000 c0000001fd02b2e0 c0080000023a0000 c0080000025e0000 [ 20.651178] NIP [c0080000023a00e8] mcetest_tlbie+0xe8/0xf0 [mcetest_tlbie] [ 20.651220] LR [c0080000023a00d8] mcetest_tlbie+0xd8/0xf0 [mcetest_tlbie] [ 20.651262] Call Trace: [ 20.651280] [c0000001fddd79a0] [c0080000023a00d8] mcetest_tlbie+0xd8/0xf0 [mcetest_tlbie] (unreliable) [ 20.651340] [c0000001fddd7a10] [c00000000001091c] do_one_initcall+0x6c/0x2c0 [ 20.651390] [c0000001fddd7af0] [c0000000001f7998] do_init_module+0x90/0x298 [ 20.651433] [c0000001fddd7b80] [c0000000001f61a8] load_module+0x1f58/0x27a0 [ 20.651476] [c0000001fddd7d40] [c0000000001f6c70] __do_sys_finit_module+0xe0/0x100 [ 20.651526] [c0000001fddd7e20] [c00000000000b9d0] system_call+0x5c/0x68 [ 20.651567] Instruction dump: [ 20.651594] e8410018 3c620000 e8638020 480000cd e8410018 3c620000 e8638028 480000bd [ 20.651646] e8410018 7be904e4 39400000 612900e0 <7d434a64> 4bffff74 3c4c0001 38428410 [ 20.651699] ---[ end trace 4c40897f016b4340 ]--- [ 20.653310] Bus error [ 20.655575] MCE: CPU0: machine check (Harmless) Host UE Indeterminate [Not recovered] [ 20.655575] MCE: CPU0: NIP: [c0080000023a00e8] mcetest_tlbie+0xe8/0xf0 [mcetest_tlbie] [ 20.655576] MCE: CPU0: Initiator CPU [ 20.655576] MCE: CPU0: Unknown Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200408170944.16003-1-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> |
||
---|---|---|
.. | ||
translate | ||
arch_dump.c | ||
compat.c | ||
cpu-models.c | ||
cpu-models.h | ||
cpu-param.h | ||
cpu-qom.h | ||
cpu.c | ||
cpu.h | ||
dfp_helper.c | ||
excp_helper.c | ||
fpu_helper.c | ||
gdbstub.c | ||
helper_regs.h | ||
helper.h | ||
int_helper.c | ||
internal.h | ||
kvm_ppc.h | ||
kvm-stub.c | ||
kvm.c | ||
machine.c | ||
Makefile.objs | ||
mem_helper.c | ||
mfrom_table_gen.c | ||
mfrom_table.inc.c | ||
misc_helper.c | ||
mmu_helper.c | ||
mmu-book3s-v3.c | ||
mmu-book3s-v3.h | ||
mmu-hash32.c | ||
mmu-hash32.h | ||
mmu-hash64.c | ||
mmu-hash64.h | ||
mmu-radix64.c | ||
mmu-radix64.h | ||
monitor.c | ||
timebase_helper.c | ||
trace-events | ||
translate_init.inc.c | ||
translate.c | ||
user_only_helper.c |