nfs errors are chosen to be the same as errno, some of them are not and
it is better for portability to do the conversion anyway. Also a server
can return a bad error number that can cause the server to crash, because
it can have the high bits that are used internally set. This was the case
with amd. Finally nfs_request() should return a valid errno, because we
can return a bogus value to userland. Thanks to rpaulo for debugging this.
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.
Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
of curproc (where uio->uio_procp should be used?). Don't do this
for nfs_commit(), because yamt says it is possibly wrong.
2. nfs_doio() does not use struct proc; remove it and the code to compute it.
3. use copyin_proc() and copyout_proc() instead of copyin() and copyout().
4. check return of copyout_proc(). and mark return from copyin_proc() XXX
5. Eliminate check p == curproc assertion check from nfs_write;
nfs_read does not have it and we might be called in a different
process context anyway (PR 20138).
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
for i/o requests which are expected not to fail due to permission
to mimic unix file open semantics (READ, WRITE, COMMIT),
try two credentials. namely, the file owner's one and open time one.
remember which credential worked in per-file basis and try it first
next time to minimize number of retries.
ideas from Chuck Silvers. PR/23716 and PR/24987.
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.
Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.
Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86
Clean up some crappy pointer frobnication.
- catchup to in_ifaddr -> in_ifaddrhead rename.
XXX the address on the top of in_ifaddrhead is likely 127.0.0.1.
using it to construct the verifier doesn't make much sense.
maybe it's better to use some uuid or ip_randomid-like method.
- "intrusive" dirops now have more chances to get benefits from dnlc.
- fixes a deadlock due to vnode locking order inversion.
nfs_create and others: purge stale dnlc entries
as nfs_lookup() no longer does it automatically.
for consistency with M_FREE() and m_freem(). Affected files:
sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.
Welcome to 2.0F.
Approved by: Jason R. Thorpe <thorpej@netbsd.org>
warning is triggered pervasively, so print it only once per boot.
(The callers who pass NULL r_procps should soon be fixed to pass a
valid struct proc* ).
otherwise we can be stuck in soreceive forever.
the problem is pointed by Minoura Makoto. PR/25662
- clear r_rexmit on reconnect and clear r_rtt and R_TIMING on retransmit
so that the above (and soft mounts) happy.
Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.
Changes to soreceive() and (*dom->dom_exernalize() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
((pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.
Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.
Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.
no longer use and/or need it
- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argumet from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change
Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.
Convert struct session, ucred and lockf to pools.
in many cases, GETATTR RPCs here is redundant because the caller has
postop_attr. instead, make sure the resulted vnode have a valid
attribute in nfs_lookup().
a driver selectable callback function. This is used in the Xen port to
allow controlling the domain's network setup from the domain building
environment at domain creation (vs. having to maintain/change this on a
dhcp server). The Xen network driver parses a command line passed in
from the domain builder.
already being locked by our thread. VOP_INACTIVATE() makes no
statement as to the lock state of the parent, yet this code assumed
we had it unlocked.
With this change, we let vn_lock() fail with EDEADLK if we already
have the parent locked. We then handle the rename cleanup, and on
the way out just vrele() the parent vnode, not vput() it.
Fixes a case seen by Steve Woodford at Wasabisystems dot com where
we'd panic while running a pkgsrc configure test that verified
fork() functionality. I expect the problem is a result of the recent
exit() changes and the performance of the machines he tested on.
Specifically we would crash during an nfs_remove(). As best I can
tell, when nfs_remove() tested to see if we should rename or we
should remove, v_usecount was > 1 and vattr.va_nlink was 1. Thus
we did the sillyrename in nfs_remove(). However by the time we got
down to the vput(vp), v_usecount had dropped to one and thus vput()
triggered the VOP_INACTIVATE() code path. nfs_inactive() tries to
lock the parent to undo the sillyrename, and deadlocks as we still
have it locked.
is opened. An open file can always be read from and/or written to,
depending on how it was opened.
Therefore, the read/write/commit RPCs should never return EACCESS,
as they are only performed on files that have been successfully opened
already.
This change improves the current situation and works in most cases.
It simply always uses the most recently known owner/group of the file,
iff the authentication mechanism is AUTH_UNIX (in other cases, the
creds for a succesful open are used, but note that no other cases
are currently implemented).
A retry mechanism can be used to catch a few more cases, but this is
a good improvement for now.
Increase NFS_MAXRAHEAD to 32. With 32k read or write requests, that
amounts to 1 Mbyte of read-ahead, enough to cover about 10 ms latency
at gigabit Ethernet speeds. Increase the table of nfsiod kthreads
(NFS_MAXASYNCDAEMON) from 20 to 128, to match the raised value of
NFS_MAXRAHEAD. (Making the limit dynamic requires replacing the
compile-time array with a dynamic structure.)
Add a comment explaining that each read-ahead requires an I/O thread.
Wrap both parameters with an #ifdef <parameter>/#endif, to allow
hand-tuned values or (later) a kernel config-file option override.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
local-loopback (lo0). As posted for review on tech-kern 2003-18-09,
with a long comment explaining (one of) the deadlock scenarios.
I've used this since shortly after 2002-09-12-, without noticing
performance degradataion or instability for non-loopback mounts.
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.
From Darrin B. Jewell.
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.
From FreeBSD with slight modifications.
Approved by: Frank van der Linden <fvdl@netbsd.org>
it has a bug in the backoff calculation. so,
- clip it to 1-60 sec. (suggested by Rick Macklem)
- use a constant multiplier instead of nfs_backoff, which
is already exponential.
- move some related constant definations to nfs.h from nqnfs.h and
prefix with NFS_ instead of NQ_ because they are not nqnfs-specific.
past the end of the file. This can happen when two clients are writting to
the same file.
Close PR 21696 by myself, discussed on tech-net in 2003/05 and 2003/06.
Issue raised by Chuck Silvers (commit and truncate ops needs to be serialised)
still unadressed.
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
- for READ procedure, don't send back more bytes than requested.
- don't have doubtful assumptions on mbuf chain structure.
- rename a function (nfsm_adj -> nfs_zeropad) to avoid confusion as
the semantics of the function was changed.
retransmitted mbufs can survive even after requests themselves
finished. so, before unbusy pages, make sure that mbufs referring them
go away.
pointed by enami tsugutomo on port-mips.
while our nfsd announces MAXBSIZE as wtmax for tcp,
VOP_GETPAGES of filesystems that uses genfs_getpages can't
handle >= MAX_READ_AHEAD(16) pages at once.
therefore, depending on PAGE_SIZE of the machine and file offset of
a read request, we can't VOP_GETPAGES the range at once.
instead of "struct vnode". This saves a number of pointer dereferences;
it sums up to about half a kB for me. And it paves the way for future
fixes.
While cleaning up, eliminate a write-only member of "struct nfsreq"
and a pointless assignment in the NFS_V2_ONLY case.
- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
belong to us. otherwise, data will be lost on server crash.
- use b_bcount instead of b_bufsize to determine
how many pages we should deal with.
based on a patch from Chuck Silvers.
discussed on tech-kern.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:
* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.
* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.
* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.
And a few that are not strictly necessary:
* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."
* Unified GOP_ALLOC between FFS and LFS.
* Update LFS copyright headers to correct values.
* Actually cast to unsigned in lfs_shellsort, like the comment says.
* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
It will never get back... it will not be found in nfs_nget, a new
nfsnode+vnode is allocated instead, which causes a node leak, and
also makes the mountpointness of the vnode to be forgotten, breaking
filesystem crossing lookups through this vnode.
into nfs_inactive, this is a better place for it.
This doesn't actually solve the actual problem, which appears to be a race
condition with unmounting and vnode recycling somewhere, but it fixes
it in the sense that nfs_reclaim will not reference a bad v_mount anymore.
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe