file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.
From FreeBSD with slight modifications.
Approved by: Frank van der Linden <fvdl@netbsd.org>
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.
Update all fdalloc() callers to deal with the need-to-fdexpand() case.
Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they\'re doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)
Change finishdup() to close the descriptor in the `new\' slot if
one exists, and change sys_dup2() accordingly.
Closes a race condition when using kernel-assisted user threads.
While here, garbage-collect UF_MAPPED -- it is not used anywhere.
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
count is 0, wait for use count to drain before finishing the close.
This is necessary in order for multiple processes to safely share file
descriptor tables.
directories which aren't under the recipient's root.
Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.
Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).
Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.