any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.
PR/28372, PR/32665 (Alan Barrett).
use pmap_is_referenced rather than pmap_clear_reference.
we don't need to clear the bit here as we'll do so when
moving pages back to inactive queue again. pointed by Chuck Silvers.
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. fixes PR 25392, PR 30257, PR 31924.
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is nolonger random access and is suitable for shipping over a stream.
protection does not include VM_PROT_READ, so that the core dumping
doesn't error out with EFAULT when trying to write that region.
Addresses PR kern/30143; approach suggested by chs@.
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
fix "warning: resource shortage: %d pages of swap lost".
extent(9) has some undesirable characteristics for swap allocation:
- it involves alloc-to-free.
- its operational cost is O(n*n) where n is number of entries.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
to four (adding size and direction).
In order for topdown uvm to be an option on ports using PMAP_PREFER,
they will need to "prefer" lower addresses if topdown is being used.
Additionally, at least one port also needs to know the size.
(namely exec_map and phys_map) becuase:
- normal vmmpepl is fine for them.
- some of them are tightly sized. eg. size of exec_map on vax is just NCARGS.
should fix vax boot failure reported by Johnny Billquist on current-users@.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
- allocate kva for vm_map_entry from the map itsself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
- decrement uvmexp.anonpages as it's no longer an anon page.
- null out anon->u.an_page as the anon no longer own the page.
uvm_anfree: add related assertions.
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.
Welcome to 2.0F.
Approved by: Jason R. Thorpe <thorpej@netbsd.org>
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
who unbusy the page should check the PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
to pool_init. Untouched pools are ones that either in arch-specific
code, or aren't initialiased during initial system startup.
Convert struct session, ucred and lockf to pools.
insert the replacement page into the same position
as the original page on the object memq so that
genfs_putpages (and lfs) won't be confused.
noted by Stephan Uphoff (PR/24328)
the latter is not a appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
pmap_activate/deactivate.
which is zero by default.
perform rbtree sanity checks only when it isn't zero
because the check is very heavy weight especially when
there're many entries.
in the case that there's no cached entries,
if kmem_map is already up, allocate a entry from it
so that we won't try to vm_map_lock recursively.
XXX assuming usage pattern of kmem_map.
and uncontrolled growth.
The key fix is from Dan Carasone, who noticed that buf_canfree() was
counting in _bytes_ but freeing in _buffers_, which caused the instant
drop to lowater observed by some users.
We now control the rate of growth; the probability of getting a new
allocation is inversely proportional to the current size of the
cache. This idea is from a long-ago conversation with Kirk McKusick
and, if memory serves, was used for the file-system cache in some
other BSD variant at some point in history.
With growth and shrinkage more or less dealt with, we return the
default maximum cache size to 15%. The default _minimum_ cache size
is raised from 1/16 of the maximum cache size to 1/8, since 1/16 was
chosen when the maximum size was 30% of memory.
Finally, after observing the behaviour of the pagedaemon and the
buffer cache drainer under pathological workloads (e.g. a benchmark
that steps through 75% of available memory backwards) I have moved
the call to buf_drain() to the beginning of the pagedaemon from the
end; if the pagedaemon bogs down, it still won't get run as often
as it should, but at least this way it will see the state of the
free count and free target _before_ the scan step does its thing.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itsself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
VOP_STRATEGY(bp) is replaced by one of two new functions:
- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.
DEV_STRATEGY(bp) is used only for block-to-block device situations.
This avoids problems with the kernel grovelling vmstat -u/-U when
using LOCKDEBUG, which changes the size of struct simplelock.
Replaced the original location of the simplelock with "int unused"
so that binary compatibility will be retained with old vmstat.