NetBSD

Author	SHA1	Message	Date
chs	709a3b4e52	two changes to improve scalability: (1) split the single list of pages allocated to a pool into three lists: completely full, partially full, and completely empty. there is no longer any need to traverse any list looking for a certain type of page. (2) replace the 8-element hash table for out-of-page page headers with a splay tree. these two changes (together with the recent enhancements to the wait code) give us linear scaling for a fork+exit microbenchmark.	2003-11-13 02:44:01 +00:00
yamt	8b91732e18	fix wrong assertions. they can be false due to alignment requirements (and PMAP_PREFER).	2003-11-06 12:45:26 +00:00
yamt	b479cef701	don't move hint backward.	2003-11-05 15:34:50 +00:00
yamt	171053e863	- fix a reversed comparison. - fix "nextgap" case. - make sure don't get addresses behind hint. - deal with integer wraparounds better. - assertions.	2003-11-05 15:09:09 +00:00
yamt	c6d9c8814d	fix a wrong assertion. pointed by Christian Limpach.	2003-11-02 07:58:52 +00:00
yamt	142a2d4058	- update uvm_map::size fewer places. - add related assertions.	2003-11-01 19:56:09 +00:00
yamt	c45bf442f2	commit rest of the previous (rbtree). (i should check .rej files before commit, sorry)	2003-11-01 19:45:13 +00:00
yamt	57e554da69	track map entries and free spaces using red-black tree to improve scalability of operations on the map. originally done by Niels Provos for OpenBSD. tweaked for NetBSD by me with some advice from enami tsugutomo. discussed on tech-kern@ and tech-perform@.	2003-11-01 11:09:02 +00:00
junyoung	b28a286e6a	KNF.	2003-10-25 23:05:45 +00:00
enami	57a6593f52	Fix indent.	2003-10-09 03:12:29 +00:00
atatat	d4de28f890	When pulling back an amap to cover the new allocation along with the previous entry, don't add the size to the extension -- it's already been added to the end of the previous entry.	2003-10-09 02:44:54 +00:00
enami	ae9b5cba84	Rewrite uvm_map_findspace() to improve readability and to fix a bug that it may return space already in use as free space under some condition. The symptom of the bug is that exec fails if stack is unlimited on topdown VM kernel.	2003-10-02 00:02:10 +00:00
enami	0ca733e759	Some whitespace fixes.	2003-10-01 23:08:32 +00:00
enami	aa87bee0c5	ansi'fy.	2003-10-01 22:50:15 +00:00
yamt	91161caf3c	use VM_PAGE_TO_PHYS macro instead of using phys_addr directly.	2003-08-26 15:12:18 +00:00
thorpej	03befad98b	In uvm_map_clean(), only call pgo_put if the object has one. From Quentin Garnier <quatriemek.com!netbsd>.	2003-04-09 21:39:29 +00:00
matt	76dd2c90fa	In uvm_map_space, if the current entry is above the new space use the previous entry. (not if the current entry starts at the end of the new space; that case doesn't take into account if the new space had a specified alignment).	2003-03-02 08:57:49 +00:00
matt	d6729b1f53	When finding an aligned block, we need to truncate in topdown, not roundup.	2003-03-02 02:55:03 +00:00
simonb	0b2b1cc0cc	Remove assigned-to but not used variable.	2003-02-23 04:53:51 +00:00
matt	23b48be61f	fix a tpyo in a comment.	2003-02-21 16:38:44 +00:00
atatat	df0a9badc6	Introduce "top down" memory management for mmap()ed allocations. This means that the dynamic linker gets mapped in at the top of available user virtual memory (typically just below the stack), shared libraries get mapped downwards from that point, and calls to mmap() that don't specify a preferred address will get mapped in below those. This means that the heap and the mmap()ed allocations will grow towards each other, allowing one or the other to grow larger than before. Previously, the heap was limited to MAXDSIZ by the placement of the dynamic linker (and the process's rlimits) and the space available to mmap was hobbled by this reservation. This is currently only enabled via an option for the i386 platform (though other platforms are expected to follow). Add "options USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild your kernel to take advantage of this. Note that the pmap_prefer() interface has not yet been modified to play nicely with this, so those platforms require a bit more work (most notably the sparc) before they can use this new memory arrangement. This change also introduces a VM_DEFAULT_ADDRESS() macro that picks the appropriate default address based on the size of the allocation or the size of the process's text segment accordingly. Several drivers and the SYSV SHM address assignment were changed to use this instead of each one picking their own "default".	2003-02-20 22:16:05 +00:00
thorpej	b193480908	Add extensible malloc types, adapted from FreeBSD. This turns malloc types into a structure, a pointer to which is passed around, instead of an int constant. Allow the limit to be adjusted when the malloc type is defined, or with a function call, as suggested by Jonathan Stone.	2003-02-01 06:23:35 +00:00
christos	5c729d909f	finally: step 5: disable a KASSERT() if we are doing_shutdown. now sync from ddb should work as badly as before the nathanw_sa merge.	2003-01-21 00:03:07 +00:00
thorpej	b78f59b443	Merge the nathanw_sa branch.	2003-01-18 08:51:40 +00:00
thorpej	130e5c278b	UVM_KMF_NOWAIT -> UVM_FLAG_NOWAIT	2002-12-11 07:14:28 +00:00
bouyer	d986226518	Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to uvm_map(). Change uvm_map() to honor UVM_KMF_NOWAIT. For this, change amap_extend() to take a flags parameter instead of just boolean for direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags (AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to read). Add a flag parameter to uvm_mapent_alloc(). This solves a problem where a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK) in uvm_mapent_alloc(). Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe for feedback.	2002-11-30 18:28:04 +00:00
atatat	42c2fe641b	Implement backwards extension of amaps. There are three cases to deal with: Case #1 -- adjust offset: The slot offset in the aref can be decremented to cover the required size addition. Case #2 -- move pages and adjust offset: The slot offset is not large enough, but the amap contains enough inactive space after the mapped pages to make up the difference, so active slots are slid to the "end" of the amap, and the slot offset is, again, adjusted to cover the required size addition. This optimizes for hitting case #1 again on the next small extension. Case #3 -- reallocate, move pages, and adjust offset: There is not enough inactive space in the amap, so the arrays are reallocated, and the active pages are copied again to the "end" of the amap, and the slot offset is adjusted to cover the required size. This also optimizes for hitting case #1 on the next backwards extension. This provides the missing piece in the "forward extension of vm_map_entries" logic, so the merge failure counters have been removed. Not many applications will make any use of this at this time (except for jvms and perhaps gcc3), but a "top-down" memory allocator will use it extensively.	2002-11-14 17:58:48 +00:00
perry	bbad42171f	/CONTCOND/ while (0)'ed macros	2002-11-02 07:40:47 +00:00
atatat	68277bb301	In the case of a double amap_extend() (during a forward merge after a back merge), don't abort the allocation if the second extend fails, just abort the forward merge and finish the allocation. Code reviewed by thorpej.	2002-10-24 22:22:28 +00:00
atatat	2d6863ada3	Call amap_extend() a second time in the case of a bimerge (both backwards and forwards) if the previous entry was backed by an amap. Fixes pr kern/18789, where netscape 7 + a java applet actually manage to incur forward and bimerges in userspace. Code reviewed by fvdl and thorpej.	2002-10-24 20:37:59 +00:00
atatat	94ef8e0795	Add an implementation of forward merging of new map entries. Most new allocations can be merged either forwards or backwards, meaning no new entries will be added to the list, and some can even be merged in both directions, resulting in a surplus entry. This code typically reduces the number of map entries in the kernel_map by an order of magnitude or more. It also makes possible recovery from the pathological case of "5000 processes created and then killed", which leaves behind a large number of map entries. The only forward merge case not covered is the instance of an amap that has to be extended backwards (WIP). Note that this only affects processes, not the kernel (the kernel doesn't use amaps), and that merge opportunities like this come up very rarely, if at all. Eg, after being up for eight days, I see only three failures in this regard, and even those are most likely due to programs I'm developing to exercise this case. Code reviewed by thorpej, matt, christos, mrg, chuq, chuck, perry, tls, and probably others. I'd like to thank my mother, the Hollywood Foreign Press...	2002-10-18 13:18:42 +00:00
chs	94a62d45d6	add a new flag VM_MAP_DYING, which is set before we start tearing down a vm_map. use this to skip the pmap_update() at the end of all the removes, which allows pmaps to optimize pmap tear-down. also, use the new pmap_remove_all() hook to let the pmap implementation know what we're up to.	2002-09-22 07:21:29 +00:00
chs	9672ac098f	add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to return failure if swap is full and there are no free physical pages. have malloc() use this flag if M_CANFAIL is passed to it. use M_CANFAIL to allow amap_extend() to fail when memory is scarce. this should prevent most of the remaining hangs in low-memory situations.	2002-09-15 16:54:26 +00:00
thorpej	a180cee23b	Pool deals fairly well with physical memory shortage, but it doesn't deal with shortages of the VM maps where the backing pages are mapped (usually kmem_map). Try to deal with this: * Group all information about the backend allocator for a pool in a separate structure. The pool references this structure, rather than the individual fields. * Change the pool_init() API accordingly, and adjust all callers. * Link all pools using the same backend allocator on a list. * The backend allocator is responsible for waiting for physical memory to become available, but will still fail if it cannot allocate KVA space for the pages. If this happens, carefully drain all pools using the same backend allocator, so that some KVA space can be freed. * Change pool_reclaim() to indicate if it actually succeeded in freeing some pages, and use that information to make draining easier and more efficient. * Get rid of PR_URGENT. There was only one use of it, and it could be dealt with by the caller. From art@openbsd.org.	2002-03-08 20:48:27 +00:00
chs	43973be0c5	introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different from VM_FAULT_WIRE in that when the pages being wired are faulted in, the simulated fault is at the maximum protection allowed for the mapping instead of the current protection. use this in uvm_map_pageable{,_all}() to fix the problem where writing via ptrace() to shared libraries that are also mapped with wired mappings in another process causes a diagnostic panic when the wired mapping is removed. this is a really obscure problem so it deserves some more explanation. ptrace() writing to another process ends up down in uvm_map_extract(), which for MAP_PRIVATE mappings (such as shared libraries) will cause the amap to be copied or created. then the amap is made shared (ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d process so that the kernel can modify pages in the amap and have the ptrace()d process see the changes. then when the page being modified is actually faulted on, the object pages (from the shared library vnode) is copied to a new anon page and inserted into the shared amap. to make all the processes sharing the amap actually see the new anon page instead of the vnode page that was there before, we need to invalidate all the pmap-level mappings of the vnode page in the pmaps of the processes sharing the amap, but we don't have a good way of doing this. the amap doesn't keep track of the vm_maps which map it. so all we can do at this point is to remove all the mappings of the page with pmap_page_protect(), but this has the unfortunate side-effect of removing wired mappings as well. removing wired mappings with pmap_page_protect() is a legitimate operation, it can happen when a file with a wired mapping is truncated. so the pmap has no way of knowing whether a request to remove a wired mapping is normal or when it's due to this weird situation. so the pmap has to remove the weird mapping. the process being ptrace()d goes away and life continues. 
then, much later when we go to unwire or remove the wired vm_map mapping, we discover that the pmap mapping has been removed when it should still be there, and we panic. so where did we go wrong? the problem is that we don't have any way to update just the pmap mappings that need to be updated in this scenario. we could invent a mechanism to do this, but that is much more complicated than this change and it doesn't seem like the right way to go in the long run either. the real underlying problem here is that wired pmap mappings just aren't a good concept. one of the original properties of the pmap design was supposed to be that all the information in the pmap could be thrown away at any time and the VM system could regenerate it all through fault processing, but wired pmap mappings don't allow that. a better design for UVM would not require wired pmap mappings, and Chuck C. and I are talking about this, but it won't be done anytime soon, so this change will do for now. this change has the effect of causing MAP_PRIVATE mappings to be copied to anonymous memory when they are mlock()d, so that uvm_fault() doesn't need to copy these pages later when called from ptrace(), thus avoiding the call to pmap_page_protect() and the panic that results from this when the mlock()d region is unlocked or freed. note that this change doesn't help the case where the wired mapping is MAP_SHARED. discussed at great length with Chuck Cranor. fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.	2001-12-31 22:34:39 +00:00
chs	23c75a9a98	in uvm_map_clean(), add PGO_CLEANIT to the flags passed to an object's pager. we need to make sure that vnode pages are written to disk at least once, otherwise processes could gain access to whatever data was previously stored in disk blocks which are freshly allocated to a file.	2001-12-31 20:34:01 +00:00
chs	ef57a67ca1	fix locking for loaning. in general we should be looking at the page's uobject and uanon pointers rather than at the PQ_ANON flag to determine which lock to hold, since PQ_ANON can be clear even when the anon's lock is the one which we should hold (if the page was loaned from an object and then freed by the object).	2001-12-31 19:21:36 +00:00
lukem	b616d1ca1d	add RCSIDs, and in some cases, slightly cleanup #include order	2001-11-10 07:36:59 +00:00
chs	07d2ec83fe	don't call pmap_copy() from uvmspace_fork(). a new process is very likely to call execve() immediately after fork(), so most of the time copying the pmap mappings is wasted effort.	2001-11-06 05:27:17 +00:00
thorpej	f67e15c839	uvm_map_protect(): Don't allow VM_PROT_EXECUTE to be set on entries (either the current protection or the max protection) that reference vnodes associated with a file system mounted with the NOEXEC option. uvm_mmap(): Don't allow PROT_EXEC mappings to be established of vnodes which are associated with a file system mounted with the NOEXEC option.	2001-10-30 19:05:26 +00:00
thorpej	a2cd7623d4	Correct a comment.	2001-10-30 18:52:17 +00:00
thorpej	e8ee04475d	- Add a new vnode flag VEXECMAP, which indicates that a vnode has executable mappings. Stop overloading VTEXT for this purpose (VTEXT also has another meaning). - Rename vn_marktext() to vn_markexec(), and use it when executable mappings of a vnode are established. - In places where we want to set VTEXT, set it in v_flag directly, rather than making a function call to do this (it no longer makes sense to use a function call, since we no longer overload VTEXT with VEXECMAP's meaning). VEXECMAP suggested by Chuq Silvers.	2001-10-30 15:32:01 +00:00
thorpej	7285b2c290	uvm_mmap(): If a vnode mapping is established with PROT_EXEC, mark the vnode as VTEXT. uvm_map_protect(): When VM_PROT_EXECUTE is added to a VA range, mark all the vnodes mapped by the range as VTEXT.	2001-10-29 23:06:03 +00:00
chs	2adcba997b	make pmap_resident_count() non-optional.	2001-09-23 06:35:30 +00:00
chs	a548bfb584	add an assert.	2001-09-21 07:57:35 +00:00
chs	64c6d1d2dc	a whole bunch of changes to improve performance and robustness under load: - remove special treatment of pager_map mappings in pmaps. this is required now, since I've removed the globals that expose the address range. pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's no longer any need to special-case it. - eliminate struct uvm_vnode by moving its fields into struct vnode. - rewrite the pageout path. the pager is now responsible for handling the high-level requests instead of only getting control after a bunch of work has already been done on its behalf. this will allow us to UBCify LFS, which needs tighter control over its pages than other filesystems do. writing a page to disk no longer requires making it read-only, which allows us to write wired pages without causing all kinds of havoc. - use a new PG_PAGEOUT flag to indicate that a page should be freed on behalf of the pagedaemon when it's unlocked. this flag is very similar to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the pageout fails due to eg. an indirect-block buffer being locked. this allows us to remove the "version" field from struct vm_page, and together with shrinking "loan_count" from 32 bits to 16, struct vm_page is now 4 bytes smaller. - no longer use PG_RELEASED for swap-backed pages. if the page is busy because it's being paged out, we can't release the swap slot to be reallocated until that write is complete, but unlike with vnodes we don't keep a count of in-progress writes so there's no good way to know when the write is done. instead, when we need to free a busy swap-backed page, just sleep until we can get it busy ourselves. - implement a fast-path for extending writes which allows us to avoid zeroing new pages. this substantially reduces cpu usage. 
- encapsulate the data used by the genfs code in a struct genfs_node, which must be the first element of the filesystem-specific vnode data for filesystems which use genfs_{get,put}pages(). - eliminate many of the UVM pagerops, since they aren't needed anymore now that the pager "put" operation is a higher-level operation. - enhance the genfs code to allow NFS to use the genfs_{get,put}pages instead of a modified copy. - clean up struct vnode by removing all the fields that used to be used by the vfs_cluster.c code (which we don't use anymore with UBC). - remove kmem_object and mb_object since they were useless. instead of allocating pages to these objects, we now just allocate pages with no object. such pages are mapped in the kernel until they are freed, so we can use the mapping to find the page to free it. this allows us to remove splvm() protection in several places. The sum of all these changes improves write throughput on my decstation 5000/200 to within 1% of the rate of NetBSD 1.5 and reduces the elapsed time for "make release" of a NetBSD 1.5 source tree on my 128MB pc to 10% less than a 1.5 kernel took.	2001-09-15 20:36:31 +00:00
chris	0e7661f023	Update pmap_update to now take the updated pmap as an argument. This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment. Currently this is a no-op on most platforms, so they should see no difference. Reviewed by Jason.	2001-09-10 21:19:08 +00:00
chs	2133049a7c	create a new pool for map entries, allocated from kmem_map instead of kernel_map. use this instead of the static map entries when allocating map entries for kernel_map. this greatly reduces the number of static map entries used and should eliminate the problems with running out.	2001-09-09 19:38:22 +00:00
lukem	53156d96d0	let user know current value of MAX_KMAPENT in panic	2001-09-07 00:50:54 +00:00
wiz	c52d355d71	"wierd" is weird.	2001-08-20 12:20:01 +00:00

151 Commits