Makoto Fujiwara <makoto@ki.nu> and Manuel Bouyer <bouyer@netbsd.org>.
Help from Allen Briggs, Jason Thorpe, and Matt Thomas.
We need to call cpu_cache_probe() early in boot (machdep.c).
Add 603 info for completeness, and use NBPG not PAGESIZE, as the
latter relies on uvm being set up (cpu_subr.c).
Let uvm_page_recolor() be called before uvm has been set up; just
note the page coloring value (uvm_page.c).
Look at a page's uobject and uanon pointers rather than at the PQ_ANON
flag to determine which lock to hold, since PQ_ANON can be clear even
when the anon's lock is the one which we should hold (if the page was
loaned from an object and then freed by the object).
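As an illustrative sketch of that rule (the field and lock names below
follow old UVM conventions, but the helper itself is hypothetical):

    /*
     * Sketch only: return the lock that protects this page.  Check the
     * uobject pointer first; a page loaned from an object and then freed
     * by the object has uobject == NULL but uanon != NULL even though
     * PQ_ANON is clear, so the PQ_ANON flag cannot be trusted here.
     */
    struct simplelock *
    page_owner_lock(struct vm_page *pg)     /* hypothetical helper */
    {
            if (pg->uobject != NULL)
                    return (&pg->uobject->vmobjlock);
            if (pg->uanon != NULL)
                    return (&pg->uanon->an_lock);
            return (NULL);                  /* unowned page */
    }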
will be allocated for the respective usage types when there is contention
for memory.
replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to, e.g., an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages() (see the sketch after
this list).
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use genfs_{get,put}pages()
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.
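To illustrate the struct genfs_node requirement mentioned in the list
above, a filesystem's per-vnode data would be laid out roughly as
follows ("examplefs" and its fields are made up for illustration):

    /*
     * Sketch only: a filesystem that uses genfs_{get,put}pages() must
     * place a struct genfs_node at the start of its per-vnode data, so
     * the genfs code can treat v_data as a struct genfs_node pointer.
     */
    struct examplefs_node {
            struct genfs_node ex_gnode;     /* must be first */
            /* filesystem-specific fields follow */
            ino_t             ex_ino;
            uint32_t          ex_flags;
    };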
The sum of all these changes brings write throughput on my
DECstation 5000/200 to within 1% of the NetBSD 1.5 rate, and makes
the elapsed time for a "make release" of a NetBSD 1.5 source tree
on my 128MB PC 10% shorter than it was with a 1.5 kernel.
This will allow improvements to the pmaps so that they can more easily
defer expensive operations, e.g. TLB/cache flushes, until the last
possible moment.
Currently this is a no-op on most platforms, so they should see no difference.
Reviewed by Jason.
Add support for dynamically re-coloring pages: as machine-dependent code
discovers the size of the system's caches, it may call uvm_page_recolor()
with the new number of colors to use.  If the new number of colors is
smaller than (or equal to) the current number of colors, then
uvm_page_recolor() is a no-op.
The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.
Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
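For example, the machine-dependent cache probe could recolor the free
lists once the cache geometry is known; a rough sketch, assuming a
direct-mapped physically indexed cache (the probed size is invented):

    /*
     * Sketch only: compute the number of page colors from the cache
     * size and tell UVM.  uvm_page_recolor() is a no-op if the new
     * number of colors is not larger than the current one, so calling
     * it again with a smaller value is harmless.
     */
    void
    cpu_cache_probe(void)                   /* simplified illustration */
    {
            size_t l2size = 1024 * 1024;    /* assume a probed 1MB L2 */

            uvm_page_recolor(l2size / NBPG);
    }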
Add a VM_MDPAGE_MEMBERS macro that lets a port add machine-dependent
data to each vm_page structure, and a VM_MDPAGE_INIT() macro to init this
data when pages are initialized by UVM.  These macros are mandatory,
but ports may #define them to nothing if they are not needed/used.
This deprecates struct pmap_physseg. As a transitional measure,
allow a port to #define PMAP_PHYSSEG so that it can continue to
use it until its pmap is converted to use VM_MDPAGE_MEMBERS.
Use all this stuff to eliminate a lot of extra work in the Alpha
pmap module (it's smaller and faster now). Changes to other pmap
modules will follow.
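A port would provide something along these lines (the member names here
are invented; a port that needs nothing defines both macros as empty):

    /*
     * Sketch only: machine-dependent data carried in each struct
     * vm_page, plus the initializer UVM runs when it sets the page up.
     */
    #define VM_MDPAGE_MEMBERS \
            struct { \
                    int mdpg_attrs;         /* MD page attribute bits */ \
            } mdpage;

    #define VM_MDPAGE_INIT(pg) \
    do { \
            (pg)->mdpage.mdpg_attrs = 0; \
    } while (/* CONSTCOND */ 0)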
Allocate free pages from color buckets using a round-robin selection
algorithm (Solaris calls this "Bin Hopping").
This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
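A minimal sketch of the idea, not the committed code (VM_NFREEBUCKETS
stands in for the MD-defined bucket-count constant mentioned above):

    /*
     * Sketch only: "bin hopping" bucket selection.  Each allocating
     * entity keeps a cursor and simply advances to the next bucket on
     * every allocation, so consecutive pages land in different cache
     * bins.
     */
    static u_int
    bucket_select(u_int *cursor)
    {
            u_int bucket = *cursor;

            *cursor = (bucket + 1) % VM_NFREEBUCKETS;
            return (bucket);
    }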
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).
These calls are relatively conservative. It may be possible to
optimize these a little more.
Make pmap_virtual_space() a required pmap interface, even on platforms
which have pmap_steal_memory().  This is to reduce the API differences
between pmaps that implement pmap_steal_memory() and pmaps which do
not.
Note that pmap_steal_memory() needs to adjust *vstartp and/or
*vendp only if it used addresses within the range provided to UVM
via the pmap_virtual_space() call. I.e. it is not necessary to do
so in any current pmap_steal_memory() implementation.
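In other words, an implementation that carves its pages straight out of
the physical segments and returns direct-mapped addresses never touches
the managed virtual range; a hedged sketch (the helper and macro names
are hypothetical):

    /*
     * Sketch only: steal wired memory before UVM is initialized.  The
     * pages come from the physical segment list and are reached through
     * a direct mapping, so *vstartp and *vendp (the range later handed
     * to UVM by pmap_virtual_space()) are left untouched.
     */
    vaddr_t
    pmap_steal_memory(vsize_t size, vaddr_t *vstartp, vaddr_t *vendp)
    {
            paddr_t pa;

            size = round_page(size);
            pa = example_physseg_alloc(size);       /* hypothetical */

            return (EXAMPLE_PHYS_TO_DIRECT(pa));    /* hypothetical */
    }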
Track the number of pages of each of the basic types (anonymous data,
executable image, cached files) and prevent the pagedaemon from reusing
a given page if that would reduce the count of that type of page below
a sysctl-settable minimum threshold.  The thresholds are controlled via
three new sysctl tunables: vm.anonmin, vm.vnodemin, and vm.vtextmin.
These tunables are the percentages of pageable memory reserved for each
usage type, and we do not allow the sum of the minimums to be more than
95% so that there is always some memory that can be reused.
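The check the pagedaemon makes is conceptually as simple as the
following sketch (the helper and its parameters are illustrative; the
real counters and percentages live in uvmexp):

    /*
     * Sketch only: a page of a given usage type is protected from reuse
     * if reclaiming it would drop that type below its vm.*min share of
     * pageable memory.
     */
    static int
    usage_type_is_protected(u_int type_pages, u_int min_pct, u_int pageable)
    {
            return (type_pages <= pageable * min_pct / 100);
    }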
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked.
And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
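The assertions themselves look roughly like this (shown with the
simple-lock primitives of the time; the amap lock member is called am_l
here, which may not match the tree exactly):

    /*
     * Sketch only: the kind of assertion now placed at the top of the
     * amap and anon routines listed above.
     */
    void
    amap_wipeout(struct vm_amap *amap)
    {
            LOCK_ASSERT(simple_lock_held(&amap->am_l));
            /* ... existing body ... */
    }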
Allow the pmap module's idle-time page zeroing hook to indicate
that the page being zero'd was not completed and that page zeroing
should be aborted. This may be used by machine-dependent code doing
slow page access to reduce the latency of running a process that has
become runnable while in the middle of doing a slow page zero.
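A hedged sketch of how a machine-dependent implementation might use
that escape hatch (the hook name, CHUNKSIZE, and slow_zero_chunk() are
invented for illustration):

    /*
     * Sketch only: zero a page a chunk at a time through a slow (e.g.
     * uncached) mapping, and abort as soon as a process becomes
     * runnable, reporting that the page was not completely zeroed.
     */
    boolean_t
    example_pageidlezero(paddr_t pa)        /* hypothetical MD hook */
    {
            int i;

            for (i = 0; i < NBPG / CHUNKSIZE; i++) {
                    if (sched_whichqs != 0)
                            return (FALSE); /* not finished; abort */
                    slow_zero_chunk(pa, i); /* hypothetical */
            }
            return (TRUE);
    }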
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
Split the scheduler state into global and per-CPU scheduler state:
- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.
- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).
- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.
- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.
Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
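A rough picture of the per-CPU structure described above, showing only
the fields named in this change (types are approximate):

    /*
     * Sketch only: per-CPU scheduler state, reached via curcpu(),
     * e.g. curcpu()->ci_schedstate.spc_flags.
     */
    struct schedstate_percpu {
            struct timeval spc_runtime;   /* time curproc started running here */
            int            spc_flags;     /* replaces struct proc's p_schedflags */
            u_char         spc_curpriority; /* usrpri of curproc on this CPU */
    };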
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.
In order to use this feature, each platform must add the appropriate
glue in their idle loop.
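The idle-loop glue amounts to a few lines; a sketch of its shape (the
uvm.page_idle_zero flag used here as the "more zeroing wanted"
indicator is an assumption):

    /*
     * Sketch only: from the MD idle loop, zero free pages while nothing
     * is runnable.  uvm_pageidlezero() itself returns as soon as
     * whichqs becomes non-zero.
     */
    static void
    example_idle(void)
    {
            while (sched_whichqs == 0) {
                    if (uvm.page_idle_zero)
                            uvm_pageidlezero();
            }
    }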
Add a boolean uvm.page_init_done that records whether uvm_page_init()
has completed, and test against that.  Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.
This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.
Fix tested by Havard Eidnes.
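The resulting test in a pmap is just a boolean check; for instance (a
schematic fragment, not taken from any particular pmap):

    if (uvm.page_init_done == FALSE) {
            /* called from uvm_page_init(): grow kernel page tables directly */
    } else {
            /* normal operation: allocate page table pages via UVM */
    }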
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.
Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
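In effect the new type is nothing more than the following (shown as a
sketch; the defining header is not spelled out above):

    /* voff_t: an offset into a uvm object; a synonym for off_t */
    typedef off_t voff_t;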