the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.
So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.
The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.
Currently this is a no-op on most platforms, so they should see no difference.
Reviewed by Jason.
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).
These calls are relatively conservative. It may be possible to
optimize these a little more.
the mapping is:
VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART
for async i/o requests, it used to be possible for the request to
be convert to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
failed because we failed to acquire some resource needed to initiate
the pageout (such as failing to lock an indirect buffer) rather than
a hard i/o error. in this case we just want to reactivate the page(s)
so that we'll try to write them again later.
while I'm here, clean up some DIAGNOSTIC code.
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"refrenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
now works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)
These changes greatly improve interactive performance during
moderate to high memory and I/O load.
routine. Works similarly fto pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redudancy
with <vm/vm.h>), and a scattering of other similar headers.
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.
Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:
PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail
If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.
Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.
ensure we don't take mod/ref emulation faults in an interrupt context
(e.g. during the i/o operation). This is safe because:
- For a pageout operation, the page is already known to be
modified, and the pagedaemon will pmap_clear_modify() after
the pageout has completed.
- For a pagein operation, pagers must already pmap_clear_modify()
after the pagein operation is complete, because the i/o may have
been done with e.g. programmed i/o.
XXX It would be nice to know the i/o direction so that we can call
XXX pmap_enter() with only the protection and access_type necessary.
are still owned by the object which is paging, and so the test for a kernel
object in uvm_unmap_remove() will cause pmap_remove() to be used instead
of pmap_kremove().
This was a MAJOR source of pmap_remove() vs pmap_kremove() inconsistency
(which caused the busted kernel pmap statistics, and a cause of much
locking hair on MP systems).
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().