kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
add a flag that specify if the file can be truncated safely or not
to nfsm_loadattr and friends. when it isn't safe, just mark the nfsnode
as "should be truncated later".
ok'ed by Frank van der Linden and Chuck Silvers.
close kern/18036.
routers dropping the packet
(seems to be a problem with Cisco and its "helper-address" feature;
a Cabletron SSR I tested with didn't have this problem)
PGO_LOCKED getpages request. So, just make the lock fail and tell
the caller that there is no pages available if we can't acquire it.
The caller will call us again soon without PGO_LOCKED. Reviewed by chuq.
set via NFSV3SATTRTIME_TOSERVER and not NFSV3SATTRTIME_TOCLIENT,
add VA_UTIMES_NULL to the va_vflags. This reflects our policy
where we're much more liberal about who can set a & m times to 'now'
than we are about who can set them to a specific time.
Should close PR 15597 from Martin Husemann. Patch is based on the
one Matthias Drochner gave in the PR.
clean and without writable mappings. if we try to flush dirty pages past
EOF to the server when NMODIFIED is clear, we'll update the attrcache before
doing the write, which will try to free the pages past EOF and deadlock.
to deal with this, we write-protect pages before we send them to the server,
and restrict ourselves to creating read-only mappings if NMODIFIED isn't set.
score another one for enami.
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.
From art@openbsd.org.
rather than using home-grown code to find a free reserved socket.
this also results in nfs pcb's having the INP_ANONPORT and INP_LOWPORT flags
set, which is useful for netstat(1) to know.
frank's scheme, with one new twist: don't wait until we've totally run
out of free pages before committing, but instead notice when we've built
up a largish range of uncommitted pages and commit only the older half of
the range, which is likely to already be on disk on the server.
struct nfssvc_sock.
Affected only when a recordmark of RPC over TCP is fragmented to
multiple mbufs. I do not know whether this code has ever been executed :)
and confusion about the actual filesize. From Matt Dillon's
similar change in FreeBSD.
XXX n_size is really redundant in -current and must die. This commit
XXX is more of a placeholder for a pullup into the 1.5 branch.
uint32_t namei_hash(const char *p, const char **ep)
which determines the equivalent MI hash32_str() hash for p.
If *ep != NULL, calculate the hash to the character before ep.
If *ep == NULL, calculate the has to the first / or NUL found, and
point *ep to that location.
- Use namei_hash() to calculate cn_hash in lookup() and relookup().
Hash distribution goes from 35-40% to 55-70%, with similar profiled
time spent in cache_lookup() and cache_enter() on my P3-600.
- Use namei_hash() to calculate cn_hash in nfs_readdirplusrpc(),
insetad of homegrown code (that differed from that in lookup() !)
namei_hash() has better spread and is faster than previous code
(which used a non-constant multiplication).
hash32_buf() to obtain a 32 bit hash. On some tests I ran I obtained
a 30x improvement in hash distribution and a 6x reduction in time spent
in nfs_nget().
- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.
The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.
convert various file systems to use the <sys/queue.h> macros for
their hash tables.
of nfs_niothreads instead of hard-coding 4.
This change has the advantage that the default can be specified
at compile time. If the root filesystem is mounted over NFS
we don't have an opportunity to use the syscall to limit the
number of threads. Useful on small-memory machines.
the value of "next-server" from the DHCP (or BOOTP) reply. This is
not the DHCP server's IP address (except by chance), so instead of
"server" make it print "next-server".
prevents us losing the locked state of the old vnode.
fvdl thinks the old vnode is certain to be locked at this point. I've put in
a KASSERT to be on the safe side.
This seems to fix PR kern/12661.
vfs_busy'ing just before the dounmount() call. This is to avoid
sleeping with the mountlist_slock held -- but we must acquire
syncer_lock before vfs_busy because the syncer itself uses
syncer_lock -> vfs_busy locking order.
the mapping is:
VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART
for async i/o requests, it used to be possible for the request to
be convert to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
- in the cases where we skip over the i/o loop, increment npages by ridx
so that when the cleanup code starts processing the pgs array at index 0
it'll actually process all of the pages.
- process the PG_RELEASED flag when unbusying pages.
- add some missing MP locking.
- use MIN() and MAX() instead of min() and max() since the latter are
functions which take arguments of type "int" but we call them with
values of type "off_t", so the values could be truncated.
problem reported by msaitoh@netbsd.org. NOTE: These are marked XXXUBC
since the code that allocates the bufs is new with UBC, but it may be
the case that bp->b_proc needs to be intialized to curproc (it's used
in a call to nfs_sigintr()).
- fix math when skipping writing pages that just need a commit.
- clear the needcommit stuff and PG_RDONLY flags on pages returned for
overwrite requests as well as for normal write faults.
- bail out of nfs_write() if we get an error.
- remove a bogus attempt to clean up after failed uiomove()s.
- bring over a workaround for a lock-ordering problem from the genfs code.
- add some missing MP locking.
passed it down to the appropriate usrreq function, and this
allows usage for contexts that need to be explicitly different
from curproc (like in the NFS code when binding to a reserved port).
if we do this for VBLK vnodes which are in use by softdep mounts,
brelse() will mark the buffer B_INVAL as well, which makes the
softdep code very unhappy.
havoc if the server erroneously uses the same filehandle for
different files. This changes back revision 1.28; the PR that
that revision fixed doesn't apply anymore, it has been verified
not to be a problem with this change.
for them are actually done asynchronously. Idea taken from FreeBSD.
Do away with nfs_writebp completely, it's not needed anymore.
Keep an eye on the range of a file that needs to be committed, and
do it in heaps.
that required to support NFSv2 mounts. Not finished yet, but already
provides some 44k of saving in code size on arm26. More savings, and some
documentation, are still to come.
int lf_advlock __P((struct lockf **,
off_t, caddr_t, int, struct flock *, int));
to
int lf_advlock __P((struct vop_advlock_args *, struct lockf **, off_t));
This matches common usage and is also compatible with similar change
in FreeBSD (though they use u_quad_t as last arg).
case that write verf is changed. Suggested by mycroft@netbsd.org.
- Reset wcred to NULL (i.e., write credential isn't decieded) everytime
before gathering buffer for new commit, so that there is a chance to
the commit request is merged.
filesystem, if the number of threads is "-1", meaning it's never been
set, then set it to 4. You can override by setting this to some other
number (including 0) before or after mounting, of course.
Thanks to whoever it was that suggested this on ICB... sorry I don't
remember who.
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.
The old timeout()/untimeout() API has been removed from the kernel.
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.
For each leaf filesystem, add appropriate vfs_done routine.
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.
Bump version number to 1.4O
With nfsv2, the nfsm_reply() macro always causes the service routine
to return if error was nonzero.
With nfsv3, we can keep going after nfsm_reply() without returning,
but nqnfsrv_getlease() didn't take this into account, so add a
return(0) after each error-case nfsm_reply(0).
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().
Reviewed by: thorpej
Tested by: wrstuden
shutdown:
During an unmount, wake up all the processes which are waiting to lock
the socket for receive, and wait for them (and the process blocked in
soreceive, if any) to go away before blowing away the socket and the
mount structure.
-arrange gateway code to fall back to the old method if the new "getfile"
is not answered (and both are enabled -- allow to switch off the new
method for symmetry)
-handle error if setting the netmask fails
count is 0, wait for use count to drain before finishing the close.
This is necessary in order for multiple processes to safely share file
descriptor tables.
* Add prototype to libkern.h.
* Remove the almost-identical-copy from libsa/net.[ch].
* Change its type back to the (wrong, but harmless) historical one. (u_long)
* Kill the XXX local prototype in nfs_bootparam.c
* We shouldn't truncate the file.
* We were leaving the vnode locked (unless the truncate happened to fail).
Solaris clients may cause this under some conditions.
Problem reported by chopps, analysis and fix by me.
interface and the address allocated, to roll everything back if the
mount fails:
-put an interface pointer into "struct nfs_diskless" to have it
available for cleanup, don't pass it around anymore where the
"struct nfs_diskless" is already passed
-add a "cleanup" function which shuts the interface down
-in the protocol-specific parts, either return with "everything
ready" or "completely shut down"
-use common functions for interface initialization and shutdown
-add a function to delete all routes associate to an interface
(why is this necessary and not done by ~IFF_UP?)
g/c diskless swap stuff
general cleanup
* Increase the size of sigset_t to accomodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids uses
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.
Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.
not in the kernel, genfs_lease_check() is simply a no-op. This allows
LKM'd file systems to be exported (previously did not work properly
due to a compile-time decision based on -DNFSSERVER).
- defopt NFSSERVER
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be build before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.
this is the rest of the MI portion changes.
this will be KNF'd shortly. :-)
conditional on a particular configuration method.
The global flags "nfs_boot_rfc951" and "nfs_boot_bootparam" control
independantly if the functions are actually called. (Previous meaning
of "nfs_boot_rfc951" was "either-or".)
gateway=server:255.255.255.0 because that is the perferred format,
and the sys/libsa code already knows how to parse that format.
(Copied ip_convert here from the libsa code.)
whether we get it off the wire. An nfsiod might have been busy with
it, and finished while we were waiting for it in nfs_getcacheblk, so
we need to check for EOF again no matter what.
the directory cache as translation table. See nfs_subs.c for comments.
Makes the code a bit more complex to look at than I would have liked,
but doesn't affect the speed of the default behavior.
* Optimize caching behavior a bit when buffers are invalidated.
* Save some RPCs in readdir operations by not bothering if there is
a small amount left to do to fill the buffer. It'll be done in the
next RPC with a larger chunk anyway. Wastes a bit of buffer space
but is faster.
* Make n_vattr an allocated vattr struct. This avoids nfsnode bloat,
and is friendlier to the malloc routines.