Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn'it (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
workaround for the problem analyzed more deeply in kern/30831. In
short, the problem is keeping the vnode on the mount point vnode
list during reclaim. If reclaim happens to sleep (as is a possibility
with smbfs due to calling vrele() and therefore possibly VOP_INACTIVE),
code going through the entire mountpoint vnode list will hit
half-reclaimed vnodes.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
reference to the parent directory's vnode instead of its smbnode to
avoid a use-after-free bug causing a panic when a smbfs mount is
forcefully unmounted.
Keep trying to flush the vnode list for the mount while some are still
busy and we are making progress towards making them not busy. This
stops attempts to unmount idle smbfs mounts failing with EBUSY.
The easiest way to reproduce the above problem, from what I have seen is:
1) Assume /s is a smbfs mount point.
2) mount /s
3) stat /s/foo/1
4) umount /s
Returns error because the file system is busy.
5) Shutdown the machine: panic in smbfs_reclaim because vrele
accesses already-released memory.
loops where vnodes can get removed or added during the loops. This could
lead to panic's on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.
An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.
In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
If you perform this request on a directory with exactly 50 files
(plus '.' and '..' which brings the total to 52 objects), the first
reply for the SMB server completely satisfies the query (server
side is Windows 2000 Professional).
The smbfs client then performs a TRANS2_FIND_NEXT2 using the last
file name as the resume key. The response returns a SearchCount
of zero (ctx->f_ecnt == 0) and an EndOfSearch code of zero.
Any attempt to get more entries with calls to TRANS2_FIND_NEXT2
result in Badfid (bad file descriptor). I suspect the return of
SearchCount of zero means that end-of-search has been reached and
the Sid is now closed.
The solution is to set "SMB_RDD_EOF | SMB_RDD_NOCLOSE" after getting
back a zero SearchCount, I've tested this in the field on a quite
a few systems, aggressively accessing Windows shares over smbfs
and it appears flawless.
I was initially concerned about the possibility of resource exhaustion
on the Windows server. I was afraid by not officially closing the
search, it would leave a resource hung-up and over time, exhaust
some sort of "open search table" limit. I've since convinced myself
this is NOT the case.
Windows needs to be able to handle clients that come and go over
time. If the search is not closed, Windows will close it if it
finds it needs more resources. I've testing this on directory
searches descending into 10's of thousands of folders, with 100's
of thousands of files.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
- make sure that kernel only files don't compile in userland using #error
- XXX: some kernel only files still get installed.
- XXX: some files used in userland, don't get installed.