qemu/hw
Jose Ricardo Ziviani b12a4efb76 Fix a deadlock case in the CPU hotplug flow
We need to set cs->halted to 1 before calling ppc_set_compat. The reason
is that ppc_set_compat kicks up the new thread created to manage the
hotplugged KVM virtual CPU and the code drives directly to KVM_RUN
ioctl. When cs->halted is 1, the code:

int kvm_cpu_exec(CPUState *cpu)
...
     if (kvm_arch_process_async_events(cpu)) {
         atomic_set(&cpu->exit_request, 0);
         return EXCP_HLT;
     }
...

returns before it reaches KVM_RUN, giving time to the main thread to
finish its job. Otherwise we can fall in a deadlock because the KVM
thread will issue the KVM_RUN ioctl while the main thread is setting up
KVM registers. Depending on how these jobs are scheduled we'll end up
freezing QEMU.

The following output shows kvm_vcpu_ioctl sleeping because it cannot get
the mutex and never will.
PS: kvm_vcpu_ioctl was triggered kvm_set_one_reg - compat_pvr.

STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL

PID: 61564  TASK: c000003e981e0780  CPU: 48  COMMAND: "qemu-system-ppc"
 #0 [c000003e982679a0] __schedule at c000000000b10a44
 #1 [c000003e98267a60] schedule at c000000000b113a8
 #2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910
 #3 [c000003e98267ab0] __mutex_lock at c000000000b132ec
 #4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm]
 #5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30
 #6 [c000003e98267dc0] ksys_ioctl at c000000000408674
 #7 [c000003e98267e10] sys_ioctl at c0000000004086f8
 #8 [c000003e98267e30] system_call at c00000000000b488

crash> struct -x kvm.vcpus 0xc000003da0000000
vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...}

crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000
  mutex.owner = {
    counter = 0xc000003a23a5c881 <- flag 1: waiters
  },

crash> bt 0xc000003a23a5c880
PID: 61579  TASK: c000003a23a5c880  CPU: 9   COMMAND: "CPU 4/KVM"
(active)

crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000
  mutex.wait_list = {
    next = 0xc000003e98267b10,
    prev = 0xc000003e98267b10
  },

crash> struct -x mutex_waiter.task 0xc000003e98267b10
  task = 0xc000003e981e0780

The following command-line was used to reproduce the problem (note: gdb
and trace can change the results).

 $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \
     -enable-kvm -m 4096 \
     -smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \
     -display none -nographic \
     -drive file=disk1.qcow2,format=qcow2
 ...
 (qemu) device_add host-spapr-cpu-core,core-id=4
[no interaction is possible after it, only SIGKILL to take the terminal
back]

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-03 11:46:43 +10:00
..
9pfs 9p: darwin: Explicitly cast comparisons of mode_t with -1 2018-06-29 12:32:10 +02:00
acpi i2c: pm_smbus: Add the ability to force block transfer enable 2018-08-23 18:46:25 +02:00
adc Include qapi/error.h exactly where needed 2018-02-09 13:50:17 +01:00
alpha hw/alpha: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
arm hw/arm/mps2: Fix ID register errors on AN511 and AN385 2018-08-24 13:17:50 +01:00
audio fix "Missing break in switch" coverity reports 2018-08-23 13:32:50 +02:00
block block: Remove deprecated -drive option serial 2018-08-15 12:50:39 +02:00
bt hw/bt: Replace fprintf(stderr, "*\n" with error_report() 2018-01-22 09:51:00 +01:00
char imx_serial: Generate interrupt on receive data ready if enabled 2018-08-20 11:24:31 +01:00
core sysbus: always allow explicit_ofw_unit_address() to override address generation 2018-08-16 22:27:43 -03:00
cpu hw/cpu/a15mpcore: If CPU has EL2, enable it on the GIC and wire it up 2018-08-24 13:17:34 +01:00
cris hw/cris: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
display hw/display/bcm2835_fb: Validate bcm2835_fb_mbox_push() config 2018-08-24 13:17:50 +01:00
dma hw/dma/pl080: Remove hw_error() if DMA is enabled 2018-08-20 11:24:33 +01:00
gpio hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init 2018-06-01 15:14:31 +02:00
hppa hw/hppa: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
i2c i2c: pm_smbus: Add the ability to force block transfer enable 2018-08-23 18:46:25 +02:00
i386 intel-iommu: replace more vtd_err_* traces 2018-08-27 15:09:20 +02:00
ide macio: add addr property to macio IDE object 2018-08-30 10:42:18 +10:00
input hw/input/tsc2005: Convert a fprintf() call to trace events 2018-06-29 15:04:18 +01:00
intc hw/intc/arm_gic: Make per-cpu GICH memory regions 0x200 bytes large 2018-08-24 13:17:31 +01:00
ipack hw/ipack: Use the IEC binary prefix definitions 2018-07-02 15:41:12 +02:00
ipmi ipmi: Use proper struct reference for BT vmstate 2018-08-23 18:46:25 +02:00
isa i2c: pm_smbus: Add the ability to force block transfer enable 2018-08-23 18:46:25 +02:00
lm32 hw/lm32: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
m68k hw/m68k: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
mem pc-dimm: assign and verify the "addr" property during pre_plug 2018-08-23 18:46:25 +02:00
microblaze hw/microblaze/xlnx-zynqmp-pmu: Fix introspection problem in 'xlnx, zynqmp-pmu-soc' 2018-07-23 15:21:25 +01:00
mips mips_malta: Fix semihosting argument passing for nanoMIPS bare metal 2018-08-24 17:51:59 +02:00
misc macio: add addr property to macio IDE object 2018-08-30 10:42:18 +10:00
moxie Change references to serial_hds[] to serial_hd() 2018-04-26 13:57:00 +01:00
net e1000e: Prevent MSI/MSI-X storms 2018-07-20 08:30:48 +08:00
nios2 hw/nios2: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
nvram fw_cfg: ignore suffixes in the bootdevice list dependent on machine class 2018-08-16 22:27:43 -03:00
openrisc Change references to serial_hds[] to serial_hd() 2018-04-26 13:57:00 +01:00
pci virtio,vhost,pci,pc: features, cleanups 2018-03-20 15:48:34 +00:00
pci-bridge virtio,vhost,pci,pc: features, fixes and cleanups 2018-02-13 16:33:31 +00:00
pci-host uninorth: add ofw-addr property to allow correct fw path generation 2018-08-30 10:42:18 +10:00
pcmcia
ppc Fix a deadlock case in the CPU hotplug flow 2018-09-03 11:46:43 +10:00
rdma config: split PVRDMA from RDMA 2018-08-18 18:01:34 +03:00
riscv spike: Fix crash when introspecting the device 2018-07-19 09:05:48 -07:00
s390x s390x: remove 's390-squash-mcss' option 2018-08-20 14:18:49 +02:00
scsi vhost-scsi: expose 't10_pi' property for VIRTIO_SCSI_F_T10_PI 2018-08-23 18:46:25 +02:00
sd sdhci: add i.MX SD Stable Clock bit 2018-08-20 11:24:32 +01:00
sh4 hw/sh4: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
smbios hw/smbios: Use the IEC binary prefix definitions 2018-07-02 15:41:12 +02:00
sparc sun4m: don't use legacy fw_cfg_init_mem() function 2018-08-20 19:18:31 +01:00
sparc64 sun4u: ensure kernel_top is always initialised 2018-08-20 19:18:31 +01:00
ssi hw/ssi/pl022: Correct wrong DMACR and ICR handling 2018-08-24 13:17:46 +01:00
timer hw/timer/cmsdk-apb-dualtimer: Implement CMSDK dual timer module 2018-08-24 13:17:41 +01:00
tpm tpm: extend TPM TIS with state migration support 2018-05-24 12:07:04 -04:00
tricore hw/tricore: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
unicore32 hw/input/i8042: Extract declarations from i386/pc.h into input/i8042.h 2018-03-12 16:12:48 +01:00
usb dev-mtp: rename x-root to rootdir 2018-08-21 10:27:59 +02:00
vfio vfio/pci: Fix failure to close file descriptor on error 2018-08-23 10:45:58 -06:00
virtio kvm: Use inhibit to prevent ballooning without synchronous mmu 2018-08-17 09:27:15 -06:00
watchdog hw/watchdog/cmsdk_apb_watchdog: Implement CMSDK APB watchdog module 2018-08-20 11:24:33 +01:00
xen xen: Don't use memory_region_init_ram_nomigrate() in pci_assign_dev_load_option_rom() 2018-06-22 13:28:42 +01:00
xenpv hw/xen: Use the IEC binary prefix definitions 2018-07-02 15:41:13 +02:00
xtensa hw/xtensa: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
Makefile.objs hw: allow compiling out SCSI 2018-06-01 15:14:31 +02:00