Reproducer:
A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600);
rename("c/d/e", "c"); }
Deadlock:
- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.
The problem with these is that genfs_rename_enter_separate in B or C
tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C,
tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order
.->c->d->e.
The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do
tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be
the same (hard links), and we can't detect that until after we've
looked them both up -- and in some file systems (I'm looking at you,
ufs), there is no mere lookup operation, only lookup-and-lock, so we
can't even hold the lock on one of tvp or fvp when we look up the
other one if there's a chance they might be the same.
Fortunately the cases
(a) tvp = fvp
(b) tvp or fvp is a directory
are mutually exclusive as long as directories cannot be hard-linked.
In case (a) we can just defer locking {tvp, fvp} until the end, because
it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In
case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or
tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory,
because it can't possibly coincide with the second one of {fvp, tvp}.
With this change, we can now prove that the locking order is consistent
with the ancestor->descendant partial ordering. Where two nodes are
incommensurate under that partial ordering, they are only ever locked
by rename and there is only ever one rename at a time.
Proof:
- For same-directory renames, genfs_rename_enter_common locks the
directory first and then the children. The order
directory->child[i] is consistent with ancestor->descendant and
child[0]/child[1] are incommensurate.
- For cross-directory renames:
. While a rename is in progress and the fs-wide rename lock is held,
directories can be created or removed but not changed, so the
outcome of gro_genealogy -- which, given fdvp and tdvp, returns
the node N relating fdvp/N/.../tdvp or null if there is none --
can only transition from finding N to not finding N, if one of
the directories is removed while any of the vnodes are unlocked.
Merely creating directories cannot change the ancestry of tdvp,
and concurrent renames are not possible.
Thus, if a gro_genealogy determined the operation to have the
form fdvp/N/.../tdvp, then it might cease to have that form, but
only because tdvp was removed which will harmlessly cause the
rename to fail later on. Similarly, if gro_genealogy determined
the operation _not_ to have the form fdvp/N/.../tdvp then it
can't begin to have that form until after the rename has
completed.
The lock order is,
=> for fdvp/.../tdvp:
1. lock fdvp
2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
3. lock fvp if a directory (consistent with fdvp->fvp)
4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)
=> for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
1. lock tdvp
2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
3. lock tvp if a directory (consistent with tdvp->tvp)
4. lock fdvp (either incommensurate with tdvp and/or tvp, or
consistent with tdvp(->tvp)->fdvp)
5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)
Deadlocks found by hannken@; resolution worked out with dholland@.
XXX I think we could improve concurrency somewhat -- with a likely
big win for applications like tar and rsync that create many files
with temporary names and then rename them to the permanent one in the
same directory -- by making vfs_renamelock a reader/writer lock: any
number of same-directory renames, or exactly one cross-directory
rename, at any one time.
It had no effect because RUMP_SOCKETS_LIST is not set in the shell
running the cleanup phase. Even if RUMP_SOCKETS_LIST had been set,
the code would still not have worked correctly because it ran
rump.halt via "atf_check -s exit:1", which would cause the first
successful halting of a rump processes to be treated as a failure
and abort the cleanup without halting any other rump processes still
running.
fifo_vnodeop_opv_desc symbols.
Many filesystems ffs, lfs, ulfs, chfs, ext2fs etc. use fifofs
internally for their fifo vnops. NFS does too, but it also needs
networking anyway. Unfortunately fifofs brings in a lot of the
networking code so that the rumpkernel is not well partition. In
addition the fifo code is rarely used.
The existing hack depended on duplicating the above symbols and
adding minimal functionality for the majority of the the tests
(except the ffs and the puffs one). In these two cases both symbols
were loaded and the symbol sizes clashed which broke the sanitizers.
While this can be fixed with weak symbols and other kinds of
indirection, it is more straight forward to select between the
minimal and the full fifofs implementation by introducing a new
shared library librumpvfs_nofifofs.
Makefiles so that we can make changes to it centrally as needed and have
less mess. Fixes the sun2 build that needs rumpvfs after librump after
the latest changes.
GCC_NO_FORMAT_TRUNCATION -Wno-format-truncation (GCC 7/8)
GCC_NO_STRINGOP_TRUNCATION -Wno-stringop-truncation (GCC 8)
GCC_NO_STRINGOP_OVERFLOW -Wno-stringop-overflow (GCC 8)
GCC_NO_CAST_FUNCTION_TYPE -Wno-cast-function-type (GCC 8)
use these to turn off warnings for most GCC-8 complaints. many
of these are false positives, most of the real bugs are already
commited, or are yet to come.
we plan to introduce versions of (some?) of these that use the
"-Wno-error=" form, which still displays the warnings but does
not make it an error, and all of the above will be re-considered
as either being "fix me" (warning still displayed) or "warning
is wrong."
validate that utimes() cannot update the times of a file on a read only
filesystem. The values are never actually used, but since
src/sys/kern/vfs_syscalls.c 1.535
they are validated for sanity, and the syscall returns EINVAL if the
values passed are invalid (tv_usec <0 or >= 1000000). If that happens
we don't get as far as the test which produces the EROFS that is expected
from this test (these tests - one for each filesystem type).
So, init the timeval structs (just to 0, the values will still not be
used) so that the EINVAL doesn't bite us before we're eaten by the EROFS
which is the way we're supposed to die.
If the syscall API args were labelled as "const" the compiler probably
would have caught the use of uninit'd vars and complained much sooner.
filesystem tests. Use the new -J option to pass the raw device into
the cleaner. This avoids the not rump safe getdiskrawname call and
makes sure we use an internal rump device name for cleaning. This
should fix bin/54488.
expected failure referencing PR kern/53865, and force failure to avoid
reports of unexpected success as it does not realiably fail under
qemu. This makes the treatment of udf_renamerace_dirs the same as
that of udf_renamerace, only with a different PR. Also, make
whitespace consistent between the two.
Use user flag UF_NODUMP instead of UF_IMMUTABLE for the test as it
is the only user flag supported by all tested file systems.
PR kern/47656 test zfs_flags.