NetBSD/sys/vm/TODO

#	$NetBSD: TODO,v 1.3 1998/01/05 19:40:40 perry Exp $

A random assortment of things that I have thought about from time to time.
The biggie is:

0. Merge the page and buffer caches.
   This has been bandied about for a long time.  First we need to decide
   whether to use VFS routines to do pagein/pageout or VM routines to do
   I/O.  Lots of other things to worry about: mismatches in page/FS-block
   sizes, how to balance their memory needs, how anon memory is represented,
   how to get file meta-data, etc.

or more modestly:

1. Use the multi-page pager interface to implement clustered pageins.
   Probably can't be as aggressive (w.r.t. cluster size) as in clustered
   pageout.  Maybe keep some kind of window a la the vfs_cluster routine
   or maybe always just be conservative.

2. vm_object_page_clean() needs work.
   For one, it uses a worst-case O(N**2) algorithm.  Since we might block
   in the pageout routine, it has to start over again afterward as things
   may have changed in the meantime.  Someone else actively writing pages
   in the object could keep this routine going forever also.  Note that
   just holding the object lock would be insufficient (even if it was safe)
   since these locks compile away on non-MP machines (i.e. always).
   Maybe we need an OBJ_BUSY flag to be checked by anyone attempting to
   insert, modify or delete pages in the object.  This routine should also
   use clustering like vm_pageout to speed things along.
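
   The restart behaviour described above can be illustrated with a toy
   model.  This is only a sketch, not the vm_object_page_clean() code: the
   array of dirty flags stands in for the object's page list, and
   clean_with_restart() is a made-up name.  It counts how many pages are
   examined when every clean forces the scan to start over.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): dirty[i] != 0 means page i
 * needs cleaning.  After each clean we assume we may have blocked, so the
 * scan restarts from the first page.  Returns the number of pages examined.
 */
static unsigned long
clean_with_restart(unsigned char *dirty, size_t n)
{
	unsigned long visits = 0;
	size_t i;

	assert(dirty != NULL || n == 0);

restart:
	for (i = 0; i < n; i++) {
		visits++;
		if (dirty[i]) {
			dirty[i] = 0;	/* stands in for a pageout that may block */
			goto restart;	/* things may have changed; start over */
		}
	}
	/* Only a full pass that finds nothing dirty gets here. */
	return (visits);
}
```

   With all n pages dirty, the scan examines n(n+1)/2 pages while cleaning
   plus n more on the final pass, which is where the O(N**2) comes from.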

3. Do aggressive swapout.
   Right now the swapper just unwires the u-area allowing a process to be
   paged into oblivion.  We could use vm_map_clean() to force a process out
   in a hurry though this should probably only be done for "private" objects
   (i.e. refcount == 1).

4. Rethink sharing maps.
   Right now they are used inconsistently: related (via fork) processes
   sharing memory have one, unrelated (via mmap) processes don't.  Mach
   eliminated these a while back; I'm not sure what the right thing to do
   here is.

5. Use fictitious pages in vm_fault.
   Right now a real page is allocated in the top level object to prevent
   other faults from simultaneously going down the shadow chain.  Later,
   a second real page may be allocated.  Current Mach allocates a fictitious
   page in the top object and replaces it with a real one as necessary.

6. Improve the pageout daemon.
   It suffers from the same problem the old (4.2 vintage?) BSD one did.
   With large physical memories, cleaned pages may not be freed for a long
   time.  In the meantime, the daemon will continue cleaning more pages in
   an attempt to free memory.  This can lead to bursts of paging activity
   and erratic levels in the free list.

7. Nuke MAP_COPY.
   It isn't true anyway.  You can still get data modified after the virtual
   copy for pages that aren't present in memory at the time of the copy.
   The only concern with getting rid of it is that exec uses it for mapping
   the text of an executable (to deal with the modified text problem).
   MAP_COPY could probably be fixed but I don't think it is worth it.  If
   you want true copy semantics, use read().

8. Try harder to collapse objects. [ DONE - vm_object_collapse() ]
   Can wind up with a lot of wasted swap space in needlessly long shadow
   chains.  The problem is that you cannot collapse an object's backing
   object if the first object has a pager.  Since all our pagers have
   relatively inexpensive routines to determine if a pager object has a
   particular page, we could do a better job.  Probably don't want to go
   as far as bringing pages in from the backing object's pager just to move
   them to the primary object.

9. Implement madvise (Sun style).
   MADV_RANDOM: don't do clustered pageins. (like now!)
   MADV_SEQUENTIAL: in vm_fault, deactivate cached pages with lower
    offsets than the desired page.  Also only do forward read-ahead.
   MADV_WILLNEED: vm_fault the range, maybe deactivate to avoid conspicuous
    consumption.
   MADV_DONTNEED: clean and free the range.  Is this identical to msync
    with MS_INVALIDATE?
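
   A rough sketch of how the advice might steer the pagein cluster window.
   Everything here is an assumption for illustration: the enum, the name
   pagein_window(), and in particular the "normal" case (a window centered
   on the fault), which the notes above do not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the Sun-style advice values; not the real constants. */
enum advice { ADV_NORMAL, ADV_RANDOM, ADV_SEQUENTIAL };

/*
 * Hypothetical policy helper: given the index of the faulting page, the
 * object size in pages and a maximum cluster size in pages, pick the
 * inclusive range [*first, *last] of pages to ask the pager for.
 */
static void
pagein_window(enum advice adv, size_t fault, size_t npages,
    size_t maxcluster, size_t *first, size_t *last)
{
	size_t behind = 0, ahead = 0;

	assert(fault < npages && maxcluster >= 1);

	switch (adv) {
	case ADV_RANDOM:
		/* No clustering: just the faulting page. */
		break;
	case ADV_SEQUENTIAL:
		/* Forward read-ahead only. */
		ahead = maxcluster - 1;
		break;
	case ADV_NORMAL:
		/* Assumed default: split the cluster around the fault. */
		behind = (maxcluster - 1) / 2;
		ahead = maxcluster - 1 - behind;
		break;
	}

	/* Clamp the window to the object's pages [0, npages - 1]. */
	*first = (behind > fault) ? 0 : fault - behind;
	*last = (ahead > npages - 1 - fault) ? npages - 1 : fault + ahead;
}
```

   Deactivating the pages below the fault for the sequential case would
   be a separate step and is not modelled here.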

10. Machine dependent hook for virtual memory allocation. [ DONE -
    PMAP_PREFER() ]
   When the system gets to choose where something is placed in an address
   space, it should call a pmap routine to choose a desired location.
   This is useful for virtually-indexed cache machines where there are magic
   alignments that can prevent aliasing problems.
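
   A minimal sketch of the kind of adjustment such a hook might make,
   assuming a virtually-indexed cache whose aliasing distance is a power of
   two.  The name prefer_va() and the cache_period parameter are invented
   for illustration; this is not the actual PMAP_PREFER() code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: move the hint address *va forward to the nearest
 * address whose low-order bits (modulo cache_period) match those of the
 * object offset being mapped, so that every mapping of a given page
 * indexes the same cache lines.  Never moves the hint backward.
 */
static void
prefer_va(uintptr_t offset, uintptr_t *va, uintptr_t cache_period)
{
	uintptr_t mask, delta;

	/* The mask arithmetic below requires a power-of-two period. */
	assert(cache_period != 0 && (cache_period & (cache_period - 1)) == 0);

	mask = cache_period - 1;
	/* Distance from the hint's alignment up to the offset's alignment. */
	delta = (offset - *va) & mask;
	*va += delta;
}
```

   For example, with a 64KB period, a hint of 0x10000 for offset 0x3000
   becomes 0x13000.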

11. Allow vnode pager to be the default pager.
   Mostly interface (how to configure a swap file) and policy (what objects
   are backed in which files) needed.

12. Keep page/buffer caches coherent. [ A LITTLE BETTER - we now do a
    vnode_pager_sync() in sys_sync() ]
   Assuming #0 is not done.  Right now, very little is done.  The VM does
   track file size changes (vnode_pager_setsize) so that mapped accesses
   to truncated files give the correct response (SIGBUS).  It also purges
   unmapped cached objects whenever the corresponding file is changed
   (vnode_pager_uncache) but it doesn't maintain coherency of mapped objects
   that are changed via read/write (or vice versa).  Reasonable explicit
   coherency can be maintained with msync but that is pretty feeble.
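
   For reference, the explicit msync approach looks roughly like this from
   user level.  A sketch only: the scratch file path, MAPLEN and the name
   msync_demo() are made up, and error handling is minimal.  A change made
   through a shared mapping is pushed out with msync() before being read
   back with pread().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPLEN	4096		/* arbitrary size for the demo mapping */

static const char msg[] = "hello";

/* Returns 0 if data written via the mapping is seen by pread(), else -1. */
static int
msync_demo(void)
{
	char path[] = "/tmp/msyncXXXXXX";	/* hypothetical scratch file */
	char buf[sizeof(msg)];
	char *p;
	int fd, rv = -1;

	assert(sizeof(msg) <= MAPLEN);

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	(void)unlink(path);		/* file goes away when fd is closed */
	if (ftruncate(fd, MAPLEN) == -1)
		goto out;
	p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* Modify the file through the mapping, then force it out. */
	memcpy(p, msg, sizeof(msg));
	if (msync(p, MAPLEN, MS_SYNC) == 0 &&
	    pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
	    memcmp(buf, msg, sizeof(msg)) == 0)
		rv = 0;

	(void)munmap(p, MAPLEN);
out:
	(void)close(fd);
	return (rv);
}
```

   The burden is entirely on the application to know when to call msync,
   which is why this is feeble as a coherency mechanism.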

13. Properly handle sharing in the presence of wired pages.
   Right now it is possible to remove wired pages via pmap_page_protect.
   This has become an issue with the addition of the mlock() call which allows
   the situation where there are multiple mappings for a phys page and one or
   more of them are wired.  It is then possible that pmap_page_protect() with
   VM_PROT_NONE will be invoked.  Most implementations will go ahead and
   remove the wired mapping along with all other mappings, violating the
   assumption of wired-ness and potentially causing a panic later on when
   an attempt is made to unwire the page and the mapping doesn't exist.
   A workaround of not removing wired mappings in pmap_page_protect is
   implemented in the hp300 pmap but leads to a condition that may be just
   as bad, "detached mappings" that exist at the pmap level but are unknown
   to the higher level VM.
----
Mike Hibler
University of Utah CSS group
mike@cs.utah.edu