struct socket so_state field to decide if we need to send asynchronous
notifications. This makes it possible to request notification on write but
not on read, and vice versa.
This is used in the Linux emulation code because, when async I/O is requested,
Linux does not send SIGIO to the write end of sockets, and it never sends any
SIGIO to either end of pipes. In the Linux emulation code, we therefore set
SB_ASYNC only on the read end of sockets, and on neither end for pipes.
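A rough sketch of the per-direction check this enables (so_rcv/so_snd and
sb_flags as in sys/socketvar.h; deliver_sigio() is a made-up placeholder for
the real SIGIO delivery done in sowakeup()):

    /* Notification is now decided per sockbuf, not per socket. */
    if (sb->sb_flags & SB_ASYNC)
            deliver_sigio(so);          /* placeholder for the real upcall */

    /* Linux emulation, when async I/O is requested on a socket (sketch): */
    so->so_rcv.sb_flags |= SB_ASYNC;    /* SIGIO for the read end only */
    so->so_snd.sb_flags &= ~SB_ASYNC;   /* never for the write end */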
for the FreeBSD project. Besides a huge speed boost compared with socketpair-based
pipes, this implementation also uses pageable kernel memory instead of mbufs.
Significant differences from the FreeBSD version:
* uses the uvm_loan() facility for direct writes
* async/SIGIO handling is correct also for a synchronous writer with an
  asynchronous reader
* limits are settable via sysctl; amountpipekva and nbigpipes are available
  via sysctl
* pipes are unidirectional - for now this is enforced only at the file
  descriptor level; the code will eventually be updated to take advantage
  of it
* uses lockmgr(9)-based locks instead of a home-brew variant
* scatter-gather writes are handled correctly in the direct write case; data
  is transferred in chunks of at most PIPE_DIRECT_CHUNK bytes, to avoid
  running out of kva
All FreeBSD/NetBSD-specific code is within appropriate #ifdefs, in preparation
for feeding changes back to the FreeBSD tree.
This pipe implementation is optional for now; add 'options NEW_PIPE'
to your kernel config to use it.
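For example, in config(8) syntax:

    # kernel configuration file
    options NEW_PIPE        # optional new pipe implementation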
MNT_NOSUID, just check MNT_NOSUID to clear the S{U,G}ID bits
in the attributes for the vnode we're about to exec.
We now check P_TRACED right before we would actually perform
the s{u,g}id function in the exec code.
This closes a race condition between exec of a setuid binary
and ptrace(2).
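A sketch of the intent, using the usual vnode/exec structure names (not the
exact committed code):

    /* nosuid mount: strip the set-id bits from the exec'd vnode's attributes */
    if (epp->ep_vp->v_mount->mnt_flag & MNT_NOSUID)
            attr.va_mode &= ~(S_ISUID | S_ISGID);

    /* ... much later, immediately before the credentials would change ... */
    if ((p->p_flag & P_TRACED) == 0 &&
        (attr.va_mode & (S_ISUID | S_ISGID)) != 0) {
            /* perform the s{u,g}id transition here */
    }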
between creation of a file descriptor and close(2) when using
kernel-assisted threads. What we do is stick descriptors in the table,
but mark them as "larval". This causes essentially everything to treat
such a descriptor as non-existent, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.
While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
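What fd_getfile() amounts to, roughly (FIF_LARVAL is used here as an
illustrative name for the "larval" flag):

    struct file *
    fd_getfile(struct filedesc *fdp, int fd)
    {
            struct file *fp;

            if ((u_int)fd >= fdp->fd_nfiles ||
                (fp = fdp->fd_ofiles[fd]) == NULL)
                    return (NULL);
            if (fp->f_iflags & FIF_LARVAL)  /* not fully constructed yet */
                    return (NULL);
            return (fp);
    }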
fdexpand(). The former will return ENOSPC if there is no space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.
Update all fdalloc() callers to deal with the need-to-fdexpand() case.
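The resulting caller pattern looks roughly like this (a sketch assuming the
fdalloc(p, want, &fd) and fdexpand(p) signatures; error handling trimmed):

    int
    example_alloc_fd(struct proc *p, int *fd)
    {
            int error;

            for (;;) {
                    error = fdalloc(p, 0, fd);
                    if (error != ENOSPC)
                            return (error); /* 0 on success, or a real error */
                    /* drop anything that must not be held across a sleep,
                     * then grow the table and retry */
                    fdexpand(p);
            }
    }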
Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
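In outline, the slot-assignment loop described above looks something like
this (a sketch, not the committed code; fdnums/rp are illustrative names and
fdremove() stands in for whatever releases a reserved slot):

    int *fdnums;        /* temporary array of new fd numbers (allocation elided) */
    struct file **rp;   /* file *'s carried in the control message */
    int i, j;

 restart:
    for (i = 0; i < nfds; i++) {
            if (fdalloc(p, 0, &fdnums[i]) != 0) {
                    /* no space: undo the slots assigned so far,
                     * expand the table, and start over */
                    for (j = 0; j < i; j++)
                            fdremove(p->p_fd, fdnums[j]);
                    fdexpand(p);
                    goto restart;
            }
            p->p_fd->fd_ofiles[fdnums[i]] = rp[i];
    }
    for (i = 0; i < nfds; i++) {
            rp[i]->f_msgcount--;            /* no longer in flight */
            unp_rights--;
    }
    /* finally, overwrite the file *'s in the message buffer with fdnums[]
     * and trim the length, as before */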
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)
Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.
Closes a race condition when using kernel-assisted user threads.
While here, garbage-collect UF_MAPPED -- it is not used anywhere.
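A sketch of the resulting sys_dup2() shape (the finishdup() signature shown
here is illustrative):

    int
    sys_dup2_sketch(struct proc *p, int old, int new, register_t *retval)
    {
            /* ... validate old, make sure new is within the fd limits ... */
            if (old == new) {
                    *retval = new;
                    return (0);
            }
            /* finishdup() installs the file in slot `new' and closes any
             * file already there, so `new' never appears free to another
             * thread in between. */
            return (finishdup(p, old, new, retval));
    }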
The ISO C standard says in 6.10.3.3 that if the result of using the
'##' operator "is not a valid preprocessing token, the behaviour is
undefined." Gcc 3.0 warns about this.
unions `union_elem: ...', and use C99 syntax `.union_elem = ...' only
where necessary.
In this case, there's no need to tag elf_probe_func because that's the
first union element and is therefore the implicit case. Only specifically
mention ecoff_probe_func where necessary.
If we decide not to use this C99 feature for now, at least there's now
less stuff to rip out.
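For illustration (not the kernel's actual execsw declaration):

    union probe {
            int (*elf_probe_func)(void);
            int (*ecoff_probe_func)(void);
    };

    static int my_elf_probe(void) { return 0; }
    static int my_ecoff_probe(void) { return 0; }

    /* old gcc-specific tag syntax:  { ecoff_probe_func: my_ecoff_probe }
     * C99 designated initializer:   { .ecoff_probe_func = my_ecoff_probe }
     * first member needs no tag:    { my_elf_probe }
     */
    static union probe p_elf   = { my_elf_probe };
    static union probe p_ecoff = { .ecoff_probe_func = my_ecoff_probe };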
Previously, we passed __FILE__ and __LINE__ on all pool_get/pool_set calls.
This change results in a measured 1.2% performance improvement in
ping-flood packets-per-second as reported by ping(8).
that the caller allocate the pool_item_header when it allocates the
pool page, so we can avoid a locking pitfall (sleeping with a simple
lock held).
Also revive pool_prime(), as there are some legitimate uses of it,
but in doing so, eliminate some of the bogosities of the old version
(i.e. don't do an implicit "setlowat"; just prime the pool, increment
the minpages for each additional page we add, and compute the number
of pages to prime in a way that callers would expect).
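A hedged usage sketch, assuming the pool_prime(pp, nitems) and
pool_setlowat(pp, n) interfaces and an already-initialized pool:

    /* Pre-allocate roughly nitems worth of pages up front; this no longer
     * implies a low-water mark, which stays a separate, explicit knob. */
    if (pool_prime(&examplepool, 64) != 0)
            printf("examplepool: could not prime fully\n");
    pool_setlowat(&examplepool, 32);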
to 512. Apparently, there are ELF binaries with more than 128 section
headers - an example is one of the Linux WordPerfect 8 utilities.
This fixes kern/12455 by Mark Davies.
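Conceptually the check being relaxed is just a sanity bound on e_shnum
(constant and variable names here are illustrative, not necessarily those in
the tree):

    #define ELF_MAXSHNUM    512     /* raised from 128 */

    if (ehdr->e_shnum > ELF_MAXSHNUM)
            return (ENOEXEC);       /* refuse implausible section counts */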