As noted by Thierry Deval in a posting to misc/at/openbsd.org,

rf_DecrAccessesCountState wasn't in the correct spot in
RF_AccessState_e.  Following up on that has resulted in one other
correction.  Changing orderings of these states is tricky, and
shouldn't be attempted without some thorough analysis.  For the
changes committed, the following analysis is offerred:

1) RAIDframe uses a little state machine to take care of building,
executing, and processing the DAGs used to direct IO.

2) The rf_DecrAccessesCountState state is handled by the function
rf_State_DecrAccessCount().  The purpose of this state is to
decrement the number of "accesses-in-flight".

3) rf_Cleanup_State is handled by rf_State_Cleanup().  Its job is to
do general cleanup of DAG arrays and any stripe locks.

4) DefaultStates[] in rf_layout.c indicates that the right spot
for rf_DecrAccessesCountState is just before rf_Cleanup_State.
Analysis of code for both states indicates that the order doesn't
matter too much, although rf_State_DecrAccessCount() should probably
take place *after* rf_State_Cleanup() to be more correct.

5) Comments in rf_State_ProcessDAG() indicates that the next state
should be rf_Cleanup_State.  However: it attempts to get there by using

 desc->state++;

which actually takes it to just rf_DecrAccessesCountState! This turned
out to be OK before, since rf_Cleanup_State would follow right after,
and all would be taken careof (albeit in arguably the "less correct"
order).

6) With the current ordering, if we head directly to rf_Cleanup_State
(as we do, for example, if multiple components fail in a RAID 5 set),
then we'll actually miss going trough rf_DecrAccessesCountState), and
could end up never being able to reach quiescence!  Perhaps not too
big of a deal, given that the RAID set is pretty much toast by that
point at which such a drastic state change happens, but might as well
have this correct.

The changes made are:
1) Since having rf_State_DecrAccessCount() come after
rf_State_Cleanup() is just fine, change rf_layout.c to reflect that
rf_DecrAccessesCountState comes after rf_Cleanup_State (i.e. they swap
positions in the state list).  This means that going to
rf_Cleanup_State after bailing on a failed DAG access will do all the
right things -- the state will get cleaned up, and then the access
counts will get decremented properly.  The comment in
rf_State_ProcessDAG() is now actually correct -- the next state *will*
be rf_Cleanup_State.

2) Move rf_DecrAccessesCountState in RF_AccessState_e to just after
rf_CleanupState.  This puts RF_AccessState_e in sync with
DefaultStates[].  Fortunately, these states are rarely referred to by
name, and so this change ends up being mostly cosmetic -- it really
only fixes cleanup behaviour for the recent "Failed to create a DAG"
changes.
This commit is contained in:
oster 2004-01-04 06:37:16 +00:00
parent 232d108953
commit 87e6ffb752
2 changed files with 5 additions and 5 deletions

View File

@ -1,4 +1,4 @@
/* $NetBSD: raidframevar.h,v 1.5 2003/12/29 02:38:17 oster Exp $ */ /* $NetBSD: raidframevar.h,v 1.6 2004/01/04 06:37:16 oster Exp $ */
/*- /*-
* Copyright (c) 1996, 1997, 1998 The NetBSD Foundation, Inc. * Copyright (c) 1996, 1997, 1998 The NetBSD Foundation, Inc.
* All rights reserved. * All rights reserved.
@ -281,7 +281,6 @@ typedef enum RF_AccessState_e {
/* original states */ /* original states */
rf_QuiesceState, /* handles queisence for reconstruction */ rf_QuiesceState, /* handles queisence for reconstruction */
rf_IncrAccessesCountState, /* count accesses in flight */ rf_IncrAccessesCountState, /* count accesses in flight */
rf_DecrAccessesCountState,
rf_MapState, /* map access to disk addresses */ rf_MapState, /* map access to disk addresses */
rf_LockState, /* take stripe locks */ rf_LockState, /* take stripe locks */
rf_CreateDAGState, /* create DAGs */ rf_CreateDAGState, /* create DAGs */
@ -289,6 +288,7 @@ typedef enum RF_AccessState_e {
rf_ProcessDAGState, /* DAGs are completing- check if correct, rf_ProcessDAGState, /* DAGs are completing- check if correct,
* or if we need to retry */ * or if we need to retry */
rf_CleanupState, /* release stripe locks, clean up */ rf_CleanupState, /* release stripe locks, clean up */
rf_DecrAccessesCountState,
rf_LastState /* must be the last state */ rf_LastState /* must be the last state */
} RF_AccessState_t; } RF_AccessState_t;

View File

@ -1,4 +1,4 @@
/* $NetBSD: rf_layout.c,v 1.15 2003/12/30 21:59:03 oster Exp $ */ /* $NetBSD: rf_layout.c,v 1.16 2004/01/04 06:37:16 oster Exp $ */
/* /*
* Copyright (c) 1995 Carnegie-Mellon University. * Copyright (c) 1995 Carnegie-Mellon University.
* All rights reserved. * All rights reserved.
@ -30,7 +30,7 @@
*/ */
#include <sys/cdefs.h> #include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD: rf_layout.c,v 1.15 2003/12/30 21:59:03 oster Exp $"); __KERNEL_RCSID(0, "$NetBSD: rf_layout.c,v 1.16 2004/01/04 06:37:16 oster Exp $");
#include <dev/raidframe/raidframevar.h> #include <dev/raidframe/raidframevar.h>
@ -85,8 +85,8 @@ static const RF_AccessState_t DefaultStates[] = {
rf_CreateDAGState, rf_CreateDAGState,
rf_ExecuteDAGState, rf_ExecuteDAGState,
rf_ProcessDAGState, rf_ProcessDAGState,
rf_DecrAccessesCountState,
rf_CleanupState, rf_CleanupState,
rf_DecrAccessesCountState,
rf_LastState}; rf_LastState};
#define RF_NU(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p) a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p #define RF_NU(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p) a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p