the page is still loaned to an anon, we should put the page back on a
paging queue. this is because while pages loaned to the kernel really
do need to stay resident (since the kernel is accessing the physical
memory directly), pages loaned to anons can be paged out just fine.
(the page will be paged out twice, first to the object and then again
to the anon, but after that the page can be reused.)
-pass vm_physseg* instead of a physseg index, and a PFN (int) instead
of a physical address (this conversion could be taken further)
-simplify detection of boundary crossing and behave more intelligently
in this case
-take stuff out of the inner loops, or put it under "#ifdef DEBUG"
(because we move along physsegs we don't need to check that the
pages are physically contiguous; see the sketch after this list)
-make the "simple" and "contigous" branches look more uniform; at
least the outer loops might coalesce one day
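a sketch of the debug-only contiguity check from the list above; the
loop is simplified from uvm_pglistalloc() and the variable names are
illustrative:

    /* walk the candidate pages within a single physseg */
#ifdef DEBUG
    paddr_t lastaddr = VM_PAGE_TO_PHYS(&pgs[0]);
#endif
    for (i = 1; i < num; i++) {
#ifdef DEBUG
        /*
         * pages within one physseg are physically contiguous
         * by construction, so only verify this when debugging.
         */
        KASSERT(VM_PAGE_TO_PHYS(&pgs[i]) == lastaddr + PAGE_SIZE);
        lastaddr = VM_PAGE_TO_PHYS(&pgs[i]);
#endif
        /* ... collect the page onto the result list ... */
    }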
Makoto Fujiwara <makoto@ki.nu> and Manuel Bouyer <bouyer@netbsd.org>.
Help from Allen Briggs, Jason Thorpe, and Matt Thomas.
We need to call cpu_cache_probe() early in boot (machdep.c).
Add 603 info for completeness, and use NBPG not PAGESIZE, as the
latter relies on uvm being set up (cpu_subr.c).
Let uvm_page_recolor() be called before uvm has been set up; just
note the page coloring value (uvm_page.c).
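A sketch of the early-call guard in uvm_page_recolor();
uvm.page_init_done and uvmexp.ncolors are real uvm names, though the
committed test may differ:

    void
    uvm_page_recolor(int newncolors)
    {
        if (uvm.page_init_done == FALSE) {
            /*
             * called before uvm_page_init(): just note the
             * desired number of colors; the actual bucket
             * rearrangement happens once uvm is set up.
             */
            uvmexp.ncolors = newncolors;
            return;
        }
        /* ... normal case: reallocate the page buckets ... */
    }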
obey the preferences expressed by freelist assignment,
to avoid wasting valuable "low memory" on devices which
don't really need it. (a sketch of the search order follows
the comments below.)
comments:
-I'm not sure searching the physsegs within a freelist
beginning with the biggest is the right thing. This is
what the "memory steal" code in uvm_page.c does, so
keep it consistent.
-There seems to be some confusion about whether the upper
address limit passed is inclusive or not. The code stays on
the safe side, possibly leaving one page out.
-The boundary/pagemask check can be simplified, also some
arguments passed are only used for diagnostic checks.
-Integration with UVM_PAGE_TRKOWN???
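A sketch of the search order; VM_NFREELIST, vm_nphysseg and
vm_physmem[] are real names of that era, the two helper functions are
made up, and the "biggest physseg first" ordering within a freelist is
elided:

    /* hypothetical helper: allocate npages within [low, high) */
    int
    pglistalloc_by_freelist(paddr_t low, paddr_t high, int npages,
        struct pglist *rlist)
    {
        int fl, ps;

        /*
         * try the freelists in preference order, so that pages on
         * valuable "low memory" freelists are used only when the
         * caller's address constraints actually require them.
         */
        for (fl = 0; fl < VM_NFREELIST; fl++) {
            for (ps = 0; ps < vm_nphysseg; ps++) {
                if (vm_physmem[ps].free_list != fl)
                    continue;
                /* try_physseg() is a hypothetical per-seg search */
                if (try_physseg(ps, low, high, npages, rlist) == 0)
                    return (0);
            }
        }
        return (ENOMEM);
    }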
no alignment / boundary / nsegs restrictions apply.
This one doesn't insist on a contiguous range, and it honours the "waitok"
flag, so it succeeds in situations which were hopeless with the existing one.
(A solution which searches for a minimum number of contiguous ranges using
some best-fit or similar algorithm would be expensive to implement; I believe
the "either-or" done here reflects the current use by bus_dma quite well.)
Now agp memory allocation is robust for me. (tested on i810)
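A sketch of the non-contiguous path; uvm_pagealloc() and uvm_wait()
are real, the helper name is made up, and unwinding of a partial
allocation on failure is elided:

    /* hypothetical: allocate npages pages, not necessarily contiguous */
    int
    pglistalloc_simple(int npages, struct pglist *rlist, boolean_t waitok)
    {
        struct vm_page *pg;

        while (npages > 0) {
            pg = uvm_pagealloc(NULL, 0, NULL, 0);
            if (pg == NULL) {
                if (!waitok)
                    return (ENOMEM);
                /* honour "waitok": sleep until pages are freed */
                uvm_wait("pglsimple");
                continue;
            }
            TAILQ_INSERT_TAIL(rlist, pg, pageq);
            npages--;
        }
        return (0);
    }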
we can't simply reuse the pointer to the page. Instead, we need to
acquire it again. So, rearrange the loop like genfs_putpages() does.
Reviewed by chuq.
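a sketch of the rearranged loop, in the style of genfs_putpages():
never cache the page pointer across a possible sleep, look it up
again by offset instead (locking is simplified here):

    for (off = startoff; off < endoff; off += PAGE_SIZE) {
        /*
         * a sleep may have freed or replaced the page, so
         * re-acquire the pointer from the object every time.
         */
        pg = uvm_pagelookup(uobj, off);
        if (pg == NULL)
            continue;
        if (pg->flags & PG_BUSY) {
            /* wait for the page, then retry this offset */
            pg->flags |= PG_WANTED;
            UVM_UNLOCK_AND_WAIT(pg, &uobj->vmobjlock, 0,
                "pgwait", 0);
            off -= PAGE_SIZE;
            continue;
        }
        /* ... operate on pg ... */
    }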
This makes `tail -<N> <FILE> | cat > file' work correctly, where <FILE> is
a regular file larger than 10 Mbytes (which makes tail map part of the file)
and <N> is big enough to produce more than 8 Kbytes of output (which makes
the pipe use the page loan facility). Problem reported by FUKAUMI Naoki on
a Japanese local mailing list.
uvm_swap_stats(). This is done in order to allow COMPAT_* swapctl()
emulation to use it directly without going through sys_swapctl().
The problem with using sys_swapctl() there is that it involves
copying the swapent array to the stackgap, and this array's size
is not known at build time. Hence it would not be possible to
ensure it would fit in the stackgap in all cases.
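A sketch of a COMPAT_* consumer, assuming a signature along the lines
of uvm_swap_stats(cmd, sep, count, retval); the function name and the
conversion step are made up:

    /* hypothetical compat handler for an emulated SWAP_STATS */
    int
    compat_swapctl_stats(int count, register_t *retval)
    {
        struct swapent *sep;

        /* kernel-side buffer sized at run time, not build time */
        sep = malloc(count * sizeof(*sep), M_TEMP, M_WAITOK);
        uvm_swap_stats(SWAP_STATS, sep, count, retval);
        /*
         * ... convert each swapent to the emulation's layout and
         * copyout() the result to the user's buffer ...
         */
        free(sep, M_TEMP);
        return (0);
    }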
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a
separate structure (sketched below). The pool references this
structure, rather than the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.
From art@openbsd.org.
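A sketch of the grouping from the first item above; the idea follows
the pool_allocator structure, but the field names here are
illustrative:

    struct pool_allocator {
        void    *(*pa_alloc)(struct pool *, int);  /* get a page */
        void    (*pa_free)(struct pool *, void *); /* put a page */
        unsigned int pa_pagesz;                    /* its page size */
        /*
         * all pools sharing this allocator, linked together so
         * they can be carefully drained when KVA allocation fails.
         */
        TAILQ_HEAD(, pool) pa_list;
    };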
just skip that page. this situation can arise legitimately when a file
with a wired mapping is truncated so that a wired page is no longer
part of the file.
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.
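a sketch of the call in uvm_map_pageable(); uvm_fault_wire() and the
map entry fields are real uvm names, but the exact argument list of
the time may differ:

    /*
     * fault the pages in at the maximum protection allowed for
     * the mapping instead of its current protection, using the
     * new fault type.
     */
    error = uvm_fault_wire(map, entry->start, entry->end,
        VM_FAULT_WIREMAX, entry->max_protection);
    /* previously: VM_FAULT_WIRE with entry->protection */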
this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation; it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or due to
this weird situation, and it has to remove the weird mapping as well.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.
so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.
the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.
this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.
discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
we need to make sure that vnode pages are written to disk at least once,
otherwise processes could gain access to whatever data was previously stored
in disk blocks which are freshly allocated to a file.
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
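a sketch of the lock choice, using the simple_lock interface of the
time; the helper function is made up:

    /* hypothetical: pick the lock protecting a (possibly loaned) page */
    struct simplelock *
    page_lock_of(struct vm_page *pg)
    {
        /*
         * look at the uobject/uanon pointers, not at PQ_ANON:
         * PQ_ANON can be clear even when the anon's lock is the
         * one we need (page loaned from an object, then freed
         * by the object).
         */
        if (pg->uobject != NULL)
            return (&pg->uobject->vmobjlock);
        return (&pg->uanon->an_lock);
    }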