Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.
As a consequence, most of the file systems can now be loaded as new style
modules.
Quick sanity check by ad@.
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
loops where vnodes can get removed or added during the loops. This could
lead to panic's on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.
An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.
In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
that callers are not responsible for initializing the fields. Store the name
inside the struct instead of maintaining a pointer to external storage, or
leaked memory (nfs case).
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.
Discussed on tech-kern, with lots of help from yamt (thanks!).
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html