uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.
Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.
Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.
All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.
PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
copyin() or copyout().
uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
(1) split the single list of pages allocated to a pool into three lists:
completely full, partially full, and completely empty.
there is no longer any need to traverse any list looking for a
certain type of page.
(2) replace the 8-element hash table for out-of-page page headers
with a splay tree.
these two changes (together with the recent enhancements to the wait code)
give us linear scaling for a fork+exit microbenchmark.
to improve scalability of operations on the map.
originally done by Niels Provos for OpenBSD.
tweaked for NetBSD by me with some advices from enami tsugutomo.
discussed on tech-kern@ and tech-perform@.
- after sleeping for memory, re-check if we have a page.
- put the allocated page to pageq to appease UVM_PAGE_TRKOWN.
- dequeue the page when doing ->K loan.
The case where l_stat == LSONPROC and l_cpu == curcpu cannot happen
because the pagedaemon is the LWP on curcpu and the pagedaemon is a
kernel thread and the code is only used by the pagedaemon.
See also updated patch in PR kern/23095, which I ment to checkin
originally.
uvm_swapout_threads will swapout LWPs which are running on another CPU:
- uvm_swapout_threads considers LWPs running on another CPU for swapout
if their l_swtime is high
- uvm_swapout_threads considers LWPs on the runqueue for swapout if their
l_swtime is high but these LWPs might be running by the time uvm_swapout
is called
symptoms of failure: panic in setrunqueue
fixes PR kern/23095
with the KVA of the newly-wired uarea.
This is useful on some architectures (e.g. xscale) where the uarea mapping
can be tweaked to use the mini-data cache instead of the main cache.
it may return space already in use as free space under some condition.
The symptom of the bug is that exec fails if stack is unlimited on
topdown VM kernel.
uvm_swap_free() being called with a zero slot; this might have been
the reason for crashes with sysvshm and heavy swapping.
(PR kern/22752 by Tom Spindler)
Confirmed by Chuck Silvers.
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.
So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
and make the stack and heap non-executable by default. the changes
fall into two basic catagories:
- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5
- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.
originally from openbsd, adapted for netbsd by me.