idle pool pages to be returned to the system immediately upon becoming
de-fragmented.
Also, in pool_do_put(), don't free back an idle page unless we are over
our minimum page claim.
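A minimal sketch of that check, in terms of the struct pool bookkeeping
fields (the exact test in pool_do_put() may be arranged differently):

    /*
     * Sketch only: the page we just freed into has become idle; give
     * it back only if we hold more pages than our minimum claim.
     */
    if (pp->pr_nidle != 0 && pp->pr_npages > pp->pr_minpages)
            pr_rmpage(pp, ph, &pq);        /* release the idle page */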
- return EPERM when unlocking a lock which isn't held
=> prevents the failure in PR 24023, where the citrus code had a deadlocking
code path
- remove deadlock check in pthread_rwlock_tryrdlock, return EBUSY instead
=> makes pthread_rwlock_tryrdlock standards compliant
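For illustration, a tiny test program (hypothetical, not from the tree)
showing the POSIX-mandated return values; it assumes the EPERM change
covers rwlock unlocks as well:

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    int
    main(void)
    {
            pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;

            pthread_rwlock_wrlock(&rw);
            /* Write-held: tryrdlock must fail with EBUSY rather
             * than tripping a deadlock check. */
            if (pthread_rwlock_tryrdlock(&rw) == EBUSY)
                    printf("tryrdlock: EBUSY\n");
            pthread_rwlock_unlock(&rw);

            /* No longer held: a second unlock now reports EPERM
             * instead of quietly succeeding. */
            if (pthread_rwlock_unlock(&rw) == EPERM)
                    printf("unlock of unheld lock: EPERM\n");
            return 0;
    }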
XXX we may want to not install it at all, but this is better than
having it spread across xbase for some machines and xserver
for others in various md.<mach> files.
this usually isn't necessary because we freed it earlier in
pmap_remove_all(), but since uvmspace_free() is now called in the context
of the exiting process, a new context might be allocated if
uvm_unmap_detach() decides to sleep (cpu_switch() will allocate a new
context when it switches back to the exiting process).
cache) from 30% to 20%. This seems to significantly smooth the oscillation
between "almost no memory available" and "UVM free target available" caused
by the current sudden, heavy backpressure on the metadata cache. We should
revisit this once the backpressure mechanism is better tuned; ideally,
the hard limit should almost never come into play, because the metadata
cache should gradually give back pages as buffers hit the AGE list and as
the page cache demands them, rather than giving back a big slug of pages
all at once when UVM decides it's in a hurry and fires off the page daemon.
Just how well this adjustment works is likely to vary significantly from
machine to machine depending on I/O mix, filesystem frag size, and total
memory (on a 256MB machine, for example, the hard limit drops from about
77MB to about 51MB). However, 20% seems to be quite a bit better than 30%
on several
systems I've tested and is, coincidentally, more than enough to cache
the entire metadata working set of the AnonCVS server with 100 clients,
which is a useful worst-case stake in the ground...
0.5%, based on some quick measurements on a number of workstations and
small fileservers (including my home fileserver running simultaneous
builds of the NetBSD source tree and several NetBSD kernels). This
brings the hit rate on my machines from below 70% to above 90%. We
should be able to tune this as we run, by tracking the hit rate and
increasing the size of the cache if memory permits.
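Such on-line tuning might look roughly like the following (entirely
hypothetical: the hit/lookup counters and the growth policy are made up
for illustration; desiredvnodes is the existing knob):

    /*
     * Hypothetical periodic tuning hook: if the hit rate has fallen
     * below 90% and free memory is comfortably above the pagedaemon's
     * target, grow the vnode cache by 10%.
     */
    void
    vcache_tune(u_long hits, u_long lookups)
    {
            u_long hitrate = lookups ? (100 * hits) / lookups : 100;

            if (hitrate < 90 && uvmexp.free > 4 * uvmexp.freetarg)
                    desiredvnodes += desiredvnodes / 10;
    }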
Some systems will still require significantly larger cache sizes. Some
ports -- notably the 64-bit ones -- probably should use more than 1% of
physmem as the default due to the larger size of struct vnode.
buf_mrelease(). Without this, though the pages are returned to the
relevant *pool*, they are never available for any other use in the
system.
Now the backpressure on the physical size of the buffer cache through
the buf_drain() call in the pagedaemon works correctly. If anything,
it may be a bit more aggressive than intended. On my 256MB system,
with vm.bufcache set to the default 30% of physmem, a kernel with this
fix can do 5 simultaneous config/makedep/builds of different NetBSD
kernels in 1313 seconds; with the "traditional" buffer cache code it
requires 1320 seconds. Running "find / -type d -exec ls -l {} \;" while
the build is going demonstrates that the backpressure is working
correctly: free memory oscillates slowly between close to none and
the UVM free target, and vmstat -m shows a large number of releases
for the buffer pools.
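The shape of that call path, much simplified (buf_drain() and the
uvmexp counters are real; the surrounding pagedaemon logic is
abbreviated here):

    /*
     * Simplified: when the pagedaemon finds itself below the free
     * target, it first asks the buffer cache to give back roughly
     * the deficit (in bytes) before reclaiming other pages.
     */
    if (uvmexp.free < uvmexp.freetarg) {
            int deficit = uvmexp.freetarg - uvmexp.free;

            buf_drain(deficit << PAGE_SHIFT);
    }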
For future work: how is "bufpl" memory returned to the system? This
is not obvious to me (I must be looking in the wrong place). Also,
buf_mrelease() is called from brelse() in some cases. Would it
be better to add a pool flag causing automatic release of full pages
as they become available (not fragmented)? Jason Thorpe proposed this
and it seems more elegant than cleaning the _entire_ pool only upon
memory pressure.
Greg Oster did a lot of the work of figuring this out. Jason proposed
the use of pool_reclaim as a way to fix it.
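In pool terms the fix boils down to something like this inside
buf_drain() (a sketch; the pool array name and count stand in for
whatever the buffer code actually uses):

    /*
     * Sketch: walk the buffer memory pools and let pool_reclaim()
     * return fully idle pages to the system instead of leaving them
     * claimed by the pool.
     */
    for (i = 0; i < NMEMPOOLS; i++)
            pool_reclaim(&bmempools[i]);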
uninit function can call close again, which will try to acquire a lock
that is already held. Unlock the lock before calling the actual close
function, since we already disassociated cm from the rest of the data
structures.
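Schematically, with hypothetical lock and callback names (only the
ordering matters):

    /*
     * cm is already off the shared lists, so nothing else can reach
     * it; drop the lock first, because the close function may run
     * the uninit path and try to take the same lock again.
     */
    simple_unlock(&cm_lock);
    error = (*cm->cm_close)(cm);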