and seems like generally sensible (more sensible than not doing so), so done
in generic code rather than compat glue only
Change proposed in PR kern/18767 by Emmanuel Dreyfus.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
Changes:
* MP locking changes (mostly FreeBSD specific)
XXXSMP the MP locking macros are noops on NetBSD for now
* kevent fix (FreeBSD rev. 1.87): when the last reader/writer
disconnects, ensure that anybody who is waiting for the kevent
on the other end of the pipe gets EV_EOF
* kill __P
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.
From art@openbsd.org.
expecting pmap_kenter_pa() to be used to replace an existing mapping,
plus it just seems like a bad idea to keep around mappings of pages
that may be freed and reused.
pages loaned to the kernel. this implies that we also need to
call pmap_kremove() before uvm_km_free().
other general cleanup: remove argument names from prototypes,
rename some variables, etc.
THAT accurate and microtime(9) is painlessly slow on i386 currently.
This speeds up small transfers much. The gain for large transfers
is less significant, but notable too.
Bottleneck was found by Andreas Persson (Re: kern/14246).
Performance improvement with PIII on 661 Mhz according to hbench (with
PIPE_MINDIRECT=8192):
buffersize before after
512 17 49
1024 33 110
2048 52 143
4096 77 163
8192 142 190
64K 577 662
128K 372 392
(not just EPIPE), so that the higher-level code would note partial
write has happened and DTRT if the write was interrupted due to
e.g. delivery of signal.
This fixes kern/14087 by Frank van der Linden.
Much thanks to Frank for extensive help with debugging this, and review
of the fix.
Note: EPIPE/SIGPIPE delivery behaviour was retained - they're delivered
even if the write was partially successful.
not do short writes unless when using non-blocking I/O.
This fixes kern/13744 by Geoff C. Wing.
Note this partially undoes rev. 1.5 change. Upon closer examination,
it's been apparent that hbench-OS expectations were not actually justified.
are only wired if this flag is present (i.e. they are not wired by default now)
loaned pages are unloaned via new uvm_unloan(), uvm_unloananon() and
uvm_unloanpage() are no longer exported
adjust uvm_unloanpage() to unwire the pages if UVM_LOAN_WIRED is specified
mark uvm_loanuobj() and uvm_loanzero() static also in function implementation
kern/sys_pipe.c: uvm_unloanpage() --> uvm_unloan()
ALWAYS call uvm_unloanpage() in cleanup - it's necessary even
in pipe_loan_free() case, since uvm_km_free() doesn't seem
to implicitly unloan the loaned pages
of some selective pieces. This fixes problem with NEW_PIPE in kernels
with DEBUG option, reported via e-mail by Chuck Silvers.
sys_pipe(): g/c fdp, provide it at the chunk of FreeBSD code where it's used
disabled loans for writes (a.k.a "direct write"), oops; use uio->uio_resid
for the check instead
don't bother updating uio->uio_offset in pipe_direct_write(), it's not used
by upper layers anyway
than PIPE_CHUNK_SIZE, just transfer first PIPE_CHUNK_SIZE and return short
write, expecting the caller to call us again later (if they need). Previous
behaviour (besides being wrong for O_NONBLOCK reads) hung hbench under some
circumstances and other applications may have similar expectations as hbench.
This might also fix port-vax/13333 by Manuel Bowyer.
Other changes to pipe_direct_write() include:
* return short write (and success) on EOF if any data were already read;
we return EPIPE on next write(2) call
* simplify error handling, actually handle uvm_loan() failure correctly,
call pipe_loan_free() on error explicitly and only call uvm_unloan()
if the address space was _not_ already freed by pipe_loan_free()
Thanks Chuck Silvers for uvm_unloan() hints :)
Fallthough to common write in pipe_write() if pipe_direct_write()
returns ENOMEM, otherwise always break out immediatelly.
Use uvm_km_valloc_wait() instead uvm_km_valloc() in pipe_loan_alloc().
The end we want to do selwakeup() on is not necessarily same as the one
we send SIGIO to. Make pipeselwakeup() accept two parameters and update
callers accordingly. This change fixes behaviour for code, which does
select(2)s on the write end waiting for reader (watched on gv, the problem
manifestated itself as a too long delay before the document was displayed).
Clearly separate the resource free code for FreeBSD
and NetBSD case in pipeclose(), so that it's a bit clearer what's going on.
Also LK_DRAIN the lock before the memory is returned to pipe_pool.
Add missing wakeup() in pipe_write() for PIPE_WANTCLOSE case.
for FreeBSD project. Besides huge speed boost compared with socketpair-based
pipes, this implementation also uses pagable kernel memory instead of mbufs.
Significant differences to FreeBSD version:
* uses uvm_loan() facility for direct write
* async/SIGIO handling correct also for sync writer, async reader
* limits settable via sysctl, amountpipekva and nbigpipes available via sysctl
* pipes are unidirectional - this is enforced on file descriptor level
for now only, the code would be updated to take advantage of it
eventually
* uses lockmgr(9)-based locks instead of home brew variant
* scatter-gather write is handled correctly for direct write case, data
is transferred by PIPE_DIRECT_CHUNK bytes maximum, to avoid running out of kva
All FreeBSD/NetBSD specific code is within appropriate #ifdef, in preparation
to feed changes back to FreeBSD tree.
This pipe implementation is optional for now, add 'options NEW_PIPE'
to your kernel config to use it.