Commit Graph

149 Commits

Author SHA1 Message Date
yamt b479cef701 don't move hint backward. 2003-11-05 15:34:50 +00:00
yamt 171053e863 - fix a reversed comparison.
- fix "nextgap" case.
- make sure don't get addresses behind hint.
- deal with integer wraparounds better.
- assertions.
2003-11-05 15:09:09 +00:00
yamt c6d9c8814d fix a wrong assertion. pointed by Christian Limpach. 2003-11-02 07:58:52 +00:00
yamt 142a2d4058 - update uvm_map::size fewer places.
- add related assertions.
2003-11-01 19:56:09 +00:00
yamt c45bf442f2 commit rest of the previous (rbtree).
(i should check .rej files before commit, sorry)
2003-11-01 19:45:13 +00:00
yamt 57e554da69 track map entries and free spaces using red-black tree
to improve scalability of operations on the map.

originally done by Niels Provos for OpenBSD.
tweaked for NetBSD by me with some advices from enami tsugutomo.
discussed on tech-kern@ and tech-perform@.
2003-11-01 11:09:02 +00:00
junyoung b28a286e6a KNF. 2003-10-25 23:05:45 +00:00
enami 57a6593f52 Fix indent. 2003-10-09 03:12:29 +00:00
atatat d4de28f890 When pulling back an amap to cover the new allocation along with the
previous entry, don't add the size to the extension -- it's already
been added to the end of the previous entry.
2003-10-09 02:44:54 +00:00
enami ae9b5cba84 Rewrite uvm_map_findspace() to improve readability and to fix a bug that
it may return space already in use as free space under some condition.
The symptom of the bug is that exec fails if stack is unlimited on
topdown VM kernel.
2003-10-02 00:02:10 +00:00
enami 0ca733e759 Some whitespace fixes. 2003-10-01 23:08:32 +00:00
enami aa87bee0c5 ansi'fy. 2003-10-01 22:50:15 +00:00
yamt 91161caf3c use VM_PAGE_TO_PHYS macro instead of using phys_addr directly. 2003-08-26 15:12:18 +00:00
thorpej 03befad98b In uvm_map_clean(), only call pgo_put if the object has one.
From Quentin Garnier <quatriemek.com!netbsd>.
2003-04-09 21:39:29 +00:00
matt 76dd2c90fa In uvm_map_space, if the current entry is above the new space use the
previous entry.  (not if the current entry starts at the end of the new
space; that case doesn't take into account if the new space had a specified
alignment).
2003-03-02 08:57:49 +00:00
matt d6729b1f53 When finding an aligned block, we need to truncate in topdown, not roundup. 2003-03-02 02:55:03 +00:00
simonb 0b2b1cc0cc Remove assigned-to but not used variable. 2003-02-23 04:53:51 +00:00
matt 23b48be61f fix a tpyo in a comment. 2003-02-21 16:38:44 +00:00
atatat df0a9badc6 Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before.  Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow).  Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly.  Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
2003-02-20 22:16:05 +00:00
thorpej b193480908 Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant.  Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
2003-02-01 06:23:35 +00:00
christos 5c729d909f finally: step 5: disable a KASSERT() if we are doing_shutdown.
now sync from ddb should work as badly as before the nathanw_sa merge.
2003-01-21 00:03:07 +00:00
thorpej b78f59b443 Merge the nathanw_sa branch. 2003-01-18 08:51:40 +00:00
thorpej 130e5c278b UVM_KMF_NOWAIT -> UVM_FLAG_NOWAIT 2002-12-11 07:14:28 +00:00
bouyer d986226518 Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honnor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.
2002-11-30 18:28:04 +00:00
atatat 42c2fe641b Implement backwards extension of amaps. There are three cases to deal
with:

Case #1 -- adjust offset: The slot offset in the aref can be
decremented to cover the required size addition.

Case #2 -- move pages and adjust offset: The slot offset is not large
enough, but the amap contains enough inactive space *after* the mapped
pages to make up the difference, so active slots are slid to the "end"
of the amap, and the slot offset is, again, adjusted to cover the
required size addition.  This optimizes for hitting case #1 again on
the next small extension.

Case #3 -- reallocate, move pages, and adjust offset: There is not
enough inactive space in the amap, so the arrays are reallocated, and
the active pages are copied again to the "end" of the amap, and the
slot offset is adjusted to cover the required size.  This also
optimizes for hitting case #1 on the next backwards extension.

This provides the missing piece in the "forward extension of
vm_map_entries" logic, so the merge failure counters have been
removed.

Not many applications will make any use of this at this time (except
for jvms and perhaps gcc3), but a "top-down" memory allocator will use
it extensively.
2002-11-14 17:58:48 +00:00
perry bbad42171f /*CONTCOND*/ while (0)'ed macros 2002-11-02 07:40:47 +00:00
atatat 68277bb301 In the case of a double amap_extend() (during a forward merge after a
back merge), don't abort the allocation if the second extend fails,
just abort the forward merge and finish the allocation.

Code reviewed by thorpej.
2002-10-24 22:22:28 +00:00
atatat 2d6863ada3 Call amap_extend() a second time in the case of a bimerge (both
backwards and forwards) if the previous entry was backed by an amap.

Fixes pr kern/18789, where netscape 7 + a java applet actually manage
to incur forward and bimerges in userspace.

Code reviewed by fvdl and thorpej.
2002-10-24 20:37:59 +00:00
atatat 94ef8e0795 Add an implementation of forward merging of new map entries. Most new
allocations can be merged either forwards or backwards, meaning no new
entries will be added to the list, and some can even be merged in both
directions, resulting in a surplus entry.

This code typically reduces the number of map entries in the
kernel_map by an order of magnitude or more.  It also makes possible
recovery from the pathological case of "5000 processes created and
then killed", which leaves behind a large number of map entries.

The only forward merge case not covered is the instance of an amap
that has to be extended backwards (WIP).  Note that this only affects
processes, not the kernel (the kernel doesn't use amaps), and that
merge opportunities like this come up *very* rarely, if at all.  Eg,
after being up for eight days, I see only three failures in this
regard, and even those are most likely due to programs I'm developing
to exercise this case.

Code reviewed by thorpej, matt, christos, mrg, chuq, chuck, perry,
tls, and probably others.  I'd like to thank my mother, the Hollywood
Foreign Press...
2002-10-18 13:18:42 +00:00
chs 94a62d45d6 add a new flag VM_MAP_DYING, which is set before we start
tearing down a vm_map.  use this to skip the pmap_update()
at the end of all the removes, which allows pmaps to optimize
pmap tear-down.  also, use the new pmap_remove_all() hook to
let the pmap implemenation know what we're up to.
2002-09-22 07:21:29 +00:00
chs 9672ac098f add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
2002-09-15 16:54:26 +00:00
thorpej a180cee23b Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map).  Try to deal with this:

* Group all information about the backend allocator for a pool in a
  separate structure.  The pool references this structure, rather than
  the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
  to become available, but will still fail if it cannot callocate KVA
  space for the pages.  If this happens, carefully drain all pools using
  the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
  some pages, and use that information to make draining easier and more
  efficient.
* Get rid of PR_URGENT.  There was only one use of it, and it could be
  dealt with by the caller.

From art@openbsd.org.
2002-03-08 20:48:27 +00:00
chs 43973be0c5 introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection.  use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created.  then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes.  then when the page being modified
is actually faulted on, the object pages (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this.  the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well.  removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated.  so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation.  so the pmap has to remove the weird mapping.
the process being ptrace()d goes away and life continues.  then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong?  the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario.  we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept.  one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed.  note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
2001-12-31 22:34:39 +00:00
chs 23c75a9a98 in uvm_map_clean(), add PGO_CLEANIT to the flags passed to an object's pager.
we need to make sure that vnode pages are written to disk at least once,
otherwise processes could gain access to whatever data was previously stored
in disk blocks which are freshly allocated to a file.
2001-12-31 20:34:01 +00:00
chs ef57a67ca1 fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
2001-12-31 19:21:36 +00:00
lukem b616d1ca1d add RCSIDs, and in some cases, slightly cleanup #include order 2001-11-10 07:36:59 +00:00
chs 07d2ec83fe don't call pmap_copy() from uvmspace_fork().
a new process is very likely to call execve() immediately after fork(),
so most of the time copying the pmap mappings is wasted effort.
2001-11-06 05:27:17 +00:00
thorpej f67e15c839 uvm_map_protect(): Don't allow VM_PROT_EXECUTE to be set on entries
(either the current protection or the max protection) that reference
vnodes associated with a file system mounted with the NOEXEC option.

uvm_mmap(): Don't allow PROT_EXEC mappings to be established of vnodes
which are associated with a file system mounted with the NOEXEC option.
2001-10-30 19:05:26 +00:00
thorpej a2cd7623d4 Correct a comment. 2001-10-30 18:52:17 +00:00
thorpej e8ee04475d - Add a new vnode flag VEXECMAP, which indicates that a vnode has
executable mappings.  Stop overloading VTEXT for this purpose (VTEXT
  also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
  mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
  than making a function call to do this (it no longer makes sense to
  use a function call, since we no longer overload VTEXT with VEXECMAP's
  meaning).

VEXECMAP suggested by Chuq Silvers.
2001-10-30 15:32:01 +00:00
thorpej 7285b2c290 uvm_mmap(): If a vnode mapping is established with PROT_EXEC, mark the
vnode as VTEXT.

uvm_map_protect(): When VM_PROT_EXECUTE is added to a VA range, mark
all the vnodes mapped by the range as VTEXT.
2001-10-29 23:06:03 +00:00
chs 2adcba997b make pmap_resident_count() non-optional. 2001-09-23 06:35:30 +00:00
chs a548bfb584 add an assert. 2001-09-21 07:57:35 +00:00
chs 64c6d1d2dc a whole bunch of changes to improve performance and robustness under load:
- remove special treatment of pager_map mappings in pmaps.  this is
   required now, since I've removed the globals that expose the address range.
   pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
   no longer any need to special-case it.
 - eliminate struct uvm_vnode by moving its fields into struct vnode.
 - rewrite the pageout path.  the pager is now responsible for handling the
   high-level requests instead of only getting control after a bunch of work
   has already been done on its behalf.  this will allow us to UBCify LFS,
   which needs tighter control over its pages than other filesystems do.
   writing a page to disk no longer requires making it read-only, which
   allows us to write wired pages without causing all kinds of havoc.
 - use a new PG_PAGEOUT flag to indicate that a page should be freed
   on behalf of the pagedaemon when it's unlocked.  this flag is very similar
   to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
   pageout fails due to eg. an indirect-block buffer being locked.
   this allows us to remove the "version" field from struct vm_page,
   and together with shrinking "loan_count" from 32 bits to 16,
   struct vm_page is now 4 bytes smaller.
 - no longer use PG_RELEASED for swap-backed pages.  if the page is busy
   because it's being paged out, we can't release the swap slot to be
   reallocated until that write is complete, but unlike with vnodes we
   don't keep a count of in-progress writes so there's no good way to
   know when the write is done.  instead, when we need to free a busy
   swap-backed page, just sleep until we can get it busy ourselves.
 - implement a fast-path for extending writes which allows us to avoid
   zeroing new pages.  this substantially reduces cpu usage.
 - encapsulate the data used by the genfs code in a struct genfs_node,
   which must be the first element of the filesystem-specific vnode data
   for filesystems which use genfs_{get,put}pages().
 - eliminate many of the UVM pagerops, since they aren't needed anymore
   now that the pager "put" operation is a higher-level operation.
 - enhance the genfs code to allow NFS to use the genfs_{get,put}pages
   instead of a modified copy.
 - clean up struct vnode by removing all the fields that used to be used by
   the vfs_cluster.c code (which we don't use anymore with UBC).
 - remove kmem_object and mb_object since they were useless.
   instead of allocating pages to these objects, we now just allocate
   pages with no object.  such pages are mapped in the kernel until they
   are freed, so we can use the mapping to find the page to free it.
   this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
2001-09-15 20:36:31 +00:00
chris 0e7661f023 Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
2001-09-10 21:19:08 +00:00
chs 2133049a7c create a new pool for map entries, allocated from kmem_map instead of
kernel_map.  use this instead of the static map entries when allocating
map entries for kernel_map.  this greatly reduces the number of static
map entries used and should eliminate the problems with running out.
2001-09-09 19:38:22 +00:00
lukem 53156d96d0 let user know current value of MAX_KMAPENT in panic 2001-09-07 00:50:54 +00:00
wiz c52d355d71 "wierd" is weird. 2001-08-20 12:20:01 +00:00
chs e9fbc91f95 user maps are always pageable. 2001-08-16 01:37:50 +00:00
wiz a9356936b4 seperate -> separate 2001-07-22 13:33:58 +00:00