From 83a99212b0b2aaea35250f93a1acb760de542275 Mon Sep 17 00:00:00 2001
From: manu
Date: Thu, 27 Sep 2018 01:03:40 +0000
Subject: [PATCH] Work around deadlock between fstchg and fstcnt

When suspending a filesystem in fstrans_setstate(), we wait on
fstcnt for threads to finish transactions. While we do this, any
thread trying to start a filesystem transaction will wait on fstchg
in fstrans_start(), a situation which can deadlock.

The wait for fstcnt in fstrans_setstate() can be interrupted by a
signal, but the wait for fstchg in fstrans_start() cannot. Once most
processes are stuck in fstchg, it is impossible to send a signal to
the thread that waits on fstcnt, because no process responds to
user input anymore.

We fix that by adding a timeout to the wait on fstcnt in
fstrans_setstate(). This means suspending a filesystem may fail,
but it was already the case when the sleep was interrupted by a
signal, hence the calling function must already handle a possible
failure.

Fixes kern/53624
---
 sys/kern/vfs_trans.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sys/kern/vfs_trans.c b/sys/kern/vfs_trans.c
index c38e0d3fb80e..d9a7efbc07c1 100644
--- a/sys/kern/vfs_trans.c
+++ b/sys/kern/vfs_trans.c
@@ -1,4 +1,4 @@
-/* $NetBSD: vfs_trans.c,v 1.48 2017/06/18 14:00:17 hannken Exp $ */
+/* $NetBSD: vfs_trans.c,v 1.49 2018/09/27 01:03:40 manu Exp $ */
 
 /*-
  * Copyright (c) 2007 The NetBSD Foundation, Inc.
@@ -30,7 +30,7 @@
  */
 
 #include
-__KERNEL_RCSID(0, "$NetBSD: vfs_trans.c,v 1.48 2017/06/18 14:00:17 hannken Exp $");
+__KERNEL_RCSID(0, "$NetBSD: vfs_trans.c,v 1.49 2018/09/27 01:03:40 manu Exp $");
 
 /*
  * File system transaction operations.
@@ -42,6 +42,7 @@ __KERNEL_RCSID(0, "$NetBSD: vfs_trans.c,v 1.48 2017/06/18 14:00:17 hannken Exp $
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -532,10 +533,14 @@ fstrans_setstate(struct mount *mp, enum fstrans_state new_state)
 	/*
 	 * All threads see the new state now.
 	 * Wait for transactions invalid at this state to leave.
+	 * We cannot wait forever because many processes would
+	 * get stuck waiting for fstcnt in fstrans_start(). This
+	 * is acute when suspending the root filesystem.
 	 */
 	error = 0;
 	while (! state_change_done(mp)) {
-		error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+		error = cv_timedwait_sig(&fstrans_count_cv,
+		    &fstrans_lock, hz / 4);
 		if (error) {
 			new_state = fmi->fmi_state = FSTRANS_NORMAL;
 			break;