idle pool pages to be returned to the system immediately upon becoming
de-fragmented.
Also, in pool_do_put(), don't free back an idle page unless we are over
our minimum page claim.
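
For illustration, a minimal sketch of the pool_do_put() policy described
above, using the struct pool bookkeeping fields as I recall them
(pr_npages, pr_minpages, ph_nmissing) and the internal pr_rmpage()
helper; the real code differs in detail:

/*
 * Sketch only: when a put leaves a page completely idle, hand it back
 * to the backend allocator right away, but never drop below the pool's
 * minimum page claim.
 */
static void
pool_do_put_idle_sketch(struct pool *pp, struct pool_item_header *ph)
{
	if (ph->ph_nmissing == 0 &&		/* page is now fully idle */
	    pp->pr_npages > pp->pr_minpages)	/* above our minimum claim */
		pr_rmpage(pp, ph, NULL);	/* unmap/free the idle page */
}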
This usually isn't necessary, since we freed it earlier in
pmap_remove_all(); but because uvmspace_free() is now called in the
context of the exiting process, a new context might be allocated if
uvm_unmap_detach() decides to sleep (cpu_switch() will allocate a new
context when it switches back to the exiting process).
cache) from 30% to 20%. This seems to significantly smooth the oscillation
between "almost no memory available" and "UVM free target available" caused
by the current sudden, heavy backpressure on the metadata cache. We should
revisit this again once the backpressure mechanism is better tuned; ideally,
the hard limit should almost never come into play, because the metadata
cache should gradually give back pages as buffers hit the AGE list and as
the page cache demands them, rather than giving back a big slug of pages
all at once when UVM decides it's in a hurry and fires off the page daemon.
Just how well this adjustment works is likely to vary significantly from
machine to machine depending on I/O mix, filesystem frag size, and total
memory. However, 20% seems to be quite a bit better than 30% on several
systems I've tested and is, coincidentally, more than enough to cache
the entire metadata working set of the AnonCVS server with 100 clients,
which is a useful worst-case stake in the ground...
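
For concreteness, the hard limit here is just a fixed percentage of
physical memory; a small illustrative helper (hypothetical names, not
the kernel's actual symbols) shows what the 30% -> 20% change means on
a 256MB machine:

#include <stdint.h>

/*
 * Illustrative only: the hard ceiling on metadata-cache pages as a
 * percentage of physmem.  With 256MB of RAM and 4KB pages (65536
 * pages), 30% permits 19660 pages (~77MB) while 20% permits 13107
 * pages (~51MB).
 */
static uint64_t
metadata_cache_hard_limit(uint64_t physmem_pages, unsigned pct)
{
	return physmem_pages * pct / 100;
}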
0.5%, based on some quick measurements on a number of workstations and
small fileservers (including my home fileserver running simultaneous
builds of the NetBSD source tree and several NetBSD kernels). This
brings the hit rate on my machines from below 70% to above 90%. We
should be able to tune this as we run, by tracking the hit rate and
increasing the size of the cache if memory permits.
Some systems will still require significantly larger cache sizes. Some
ports -- notably the 64-bit ones -- probably should use more than 1% of
physmem as the default due to the larger size of struct vnode.
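
A rough sketch of the sizing and self-tuning idea above, with invented
names (this is not the actual kernel code): start the cache at a
fraction of physmem, then grow it at run time when the observed hit
rate is below target and memory permits.

#include <stdint.h>

/* Illustrative default: 0.5% of physmem, i.e. 1/200th. */
#define CACHE_PHYSMEM_DIVISOR	200

/* Initial cache size in bytes, given physmem in pages. */
static uint64_t
cache_default_size(uint64_t physmem_pages, uint64_t page_size)
{
	return physmem_pages * page_size / CACHE_PHYSMEM_DIVISOR;
}

/*
 * Hypothetical run-time tuning: if the hit rate falls below a target
 * (say 90%) and free memory allows, grow the cache by 25%.
 */
static uint64_t
cache_retune(uint64_t cursize, uint64_t hits, uint64_t lookups,
    int memory_permits)
{
	if (lookups != 0 && hits * 100 / lookups < 90 && memory_permits)
		return cursize + cursize / 4;
	return cursize;
}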
buf_mrelease(). Without this, though the pages are returned to the
relevant *pool*, they are never available for any other use in the
system.
Now the backpressure on the physical size of the buffer cache through
the buf_drain() call in the pagedaemon works correctly. If anything,
it may be a bit more aggressive than intended. On my 256MB system,
with vm.bufcache set to the default 30% of physmem, a kernel with this
fix can do 5 simultaneous config/makedep/builds of different NetBSD
kernels in 1313 seconds; with the "traditional" buffer cache code it
requires 1320 seconds. Running "find / -type d -exec ls -l {} \;" while
the build is going demonstrates that the backpressure is working
correctly: free memory oscillates slowly between close to none and
the UVM target free, and vmstat -m shows a large number of releases
for the buffer pools.
For future work: how is "bufpl" memory returned to the system? This
is not obvious to me (I must be looking in the wrong place). Also,
buf_mrelease() is called from brelse() in some cases. Would it
be better to add a pool flag causing automatic release of full pages
as they become available (not fragmented)? Jason Thorpe proposed this
and it seems more elegant than cleaning the _entire_ pool only upon
memory pressure.
Greg Oster did a lot of the work of figuring this out. Jason proposed
the use of pool_reclaim as a way to fix it.
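
A condensed sketch of the release path, with hypothetical pool names
(the real buffer-memory pools and their bookkeeping are more involved):
once buffers have been released back into their pools, a drain hook
driven by the pagedaemon calls pool_reclaim(9) so that fully idle pool
pages are actually handed back to UVM rather than staying cached in the
pool.

#include <sys/pool.h>

/* Hypothetical array of pools backing buffer memory (not the real names). */
#define NBUFPOOLS	8
static struct pool buf_mem_pools[NBUFPOOLS];

/*
 * Sketch of a buf_drain()-style hook: reclaim completely idle pages
 * from each buffer pool and return them to the system.  Without this
 * step the pages remain inside the pools, unavailable for other use.
 */
static void
buf_drain_sketch(void)
{
	int i;

	for (i = 0; i < NBUFPOOLS; i++)
		(void)pool_reclaim(&buf_mem_pools[i]);
}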
Cut SYSCTL_DEFSIZE in half, which results in roughly a 40% reduction
in the memory used by sysctlnodes (on my laptop), along with a dozen
extra calls to sysctl_realloc() during kernel bootstrap (which no one
should notice anyway).
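
The tradeoff is the usual preallocate-versus-grow one; a purely
illustrative user-space sketch (assuming a doubling growth policy,
which need not match the real sysctl_realloc()) shows why a smaller
default costs only a few extra reallocations:

#include <stdio.h>

/* Count how many doublings it takes to grow from `defsize' slots to
 * at least `nchildren' slots. */
static int
reallocs_needed(int defsize, int nchildren)
{
	int size = defsize, n = 0;

	while (size < nchildren) {
		size *= 2;
		n++;
	}
	return n;
}

int
main(void)
{
	/* A node that ends up with 48 children: */
	printf("defsize 8: %d grows\n", reallocs_needed(8, 48));	/* 3 */
	printf("defsize 4: %d grows\n", reallocs_needed(4, 48));	/* 4 */
	return 0;
}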
This makes it possible to define header files on the command line that
might include ${MACHINE} somewhere in the path. In evbppc, for example,
this might be used when defining PPC_PCI_MACHDEP_IMPL as:
PPC_PCI_MACHDEP_IMPL="<arch/evbppc/sandpoint/pci_machdep.h>"
which will be included as
#include PPC_PCI_MACHDEP_IMPL
Prior to this change, the compile would fail trying to include
<arch/evbppc/1/pci_machdep.h>
Split the sysctl setup routine into two routines, one for each
"subtree". Perhaps it's a little pedantic, but it's cleaner. Also,
assert that the "kern" and "vm" nodes exist.
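
A rough sketch of the resulting shape, assuming the dynamic
sysctl_createv(9)/SYSCTL_SETUP interface; the node names, leaf
variables, and exact argument lists below are approximations for
illustration, and the sketch omits the assertions on the existing
"kern" and "vm" parents:

#include <sys/param.h>
#include <sys/sysctl.h>

/* Hypothetical leaf variables, one per subtree. */
static int example_kern_var;
static int example_vm_var;

/* One setup routine for the "kern" subtree... */
SYSCTL_SETUP(sysctl_example_kern_setup, "example kern subtree setup")
{
	sysctl_createv(clog, 0, NULL, NULL,
	    CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
	    CTLTYPE_INT, "example", NULL,
	    NULL, 0, &example_kern_var, 0,
	    CTL_KERN, CTL_CREATE, CTL_EOL);
}

/* ...and a separate one for the "vm" subtree. */
SYSCTL_SETUP(sysctl_example_vm_setup, "example vm subtree setup")
{
	sysctl_createv(clog, 0, NULL, NULL,
	    CTLFLAG_PERMANENT|CTLFLAG_READWRITE,
	    CTLTYPE_INT, "example", NULL,
	    NULL, 0, &example_vm_var, 0,
	    CTL_VM, CTL_CREATE, CTL_EOL);
}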