so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.
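A minimal sketch of the matching step (the helper name and its caller are
hypothetical, not the committed code; the allocator is assumed to keep free
pages bucketed by color):

    #include <sys/param.h>

    /*
     * Hypothetical helper: the cache color of a virtual address,
     * assuming ncolors is a power of two.  The fault path would ask
     * the free-page allocator for a physical page of this color.
     */
    static inline u_int
    va_to_color(vaddr_t va, u_int ncolors)
    {

            return (u_int)(atop(va) & (ncolors - 1));
    }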
When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), have all kernel memory come from below 4GB to reduce the
amount of bounce buffering needed with 32-bit DMA devices.
to update any CPU flags due to a change between a 64-bit and a 32-bit address
space). This can set the state needed for copyout/copyin before setregs
is invoked.
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.
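A sketch of the new calling convention (error paths abbreviated; the helper
and its arguments are invented for illustration):

    #include <sys/namei.h>

    /* look up a user-supplied path under the pathbuf convention */
    static int
    lookup_user_path(const char *upath, struct vnode **vpp)
    {
            struct pathbuf *pb;
            struct nameidata nd;
            int error;

            error = pathbuf_copyin(upath, &pb); /* pathbuf_create() for kernel strings */
            if (error)
                    return error;

            NDINIT(&nd, LOOKUP, FOLLOW, pb);    /* pathbuf instead of string + uio_seg */
            error = namei(&nd);
            if (error == 0)
                    *vpp = nd.ni_vp;

            pathbuf_destroy(pb);    /* destroy once the namei session is complete */
            return error;
    }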
Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).
The pathbuf interface now also appears in a few additional places that were
passing string/uio_seg pairs later fed into NDINIT. Update those call sites
accordingly.
Instead, check whether a process is waiting uninterruptibly, as ps does,
so that the second column (`b') of the default vmstat output prints
a useful value (-t is still broken though).
Maintain an array of pointers to struct vm_physseg, instead of an array of
structs, so that the VM subsystem can safely keep a pointer to a segment.
Pointers to this struct will replace raw paddr_t usage in the future.
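An illustrative sketch of the data-structure change (the "after" array name
is invented):

    /*
     * Before: segments stored by value; entries are shifted around on
     * plug/unplug, so holding a pointer into the array is unsafe.
     */
    struct vm_physseg vm_physmem[VM_PHYSSEG_MAX];

    /*
     * After: an array of pointers; a pointer to a given segment stays valid.
     */
    struct vm_physseg *vm_physmem_ptrs[VM_PHYSSEG_MAX];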
Dynamic removal is not supported yet.
Only MD data structure changes, no kernel bump needed.
Tested on i386, amd64, powerpc/ibm40x, arm11.
vm_page *) "reverse" lookup code from uvm_page.h to uvm_page.c, to
help the eventual migration away from doing that.
Likewise move the per-page metadata (struct vm_page *) -> physical
address "forward" conversion code into the *.c file too. This is called
only by low-level VM and MD code.
lookup code from uvm_page.h to uvm_page.c.
This code is used by some pmaps to look up per-page state (PV) from
per-segment metadata (struct vm_physseg). It is not needed if UVM looks
up the physical segment once in the fault handler and then passes it
directly to the pmap. This change helps the transition to that model.
The only users of vm_physseg_find() are pmap_motorola.c and
powerpc/ibm4xx/pmap.c.
Tested By: Compiling and running powerpc/ibm4xx/pmap.c
(evbppc/conf/OPENBLOCKS266)
in the UVM external API, uvm_extern.h, because most users care only about
virtual memory.
Device drivers use bus_dma(9) to manage physical memory. Device
drivers pull in bus_dma(9) API, bus_dma.h. bus_dma(9) implementations
pull in UVM internal API, uvm.h.
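A rough sketch of the resulting include layering (the interesting part is
uvm_extern.h versus uvm.h; exact bus headers vary by port and age of the
tree):

    /* a device driver or other MI consumer: virtual memory API only */
    #include <uvm/uvm_extern.h>

    /* a bus_dma(9) implementation (MD code): also needs the internal API */
    #include <uvm/uvm.h>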
Tested By: Compiling i386 ALL kernel
1. Fix the inverted node order, so that a negative value from the comparison
operator represents the lower (left) node and a positive value the higher
(right) node.
2. Add an argument (i.e. a "context"), passed to the comparison operators.
3. Change rb_tree_insert_node() to return a node: either the newly inserted
one or the already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way to the Patricia-tree interface); see the sketch
after this list.
5. Update all RB-tree users accordingly.
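A sketch of what the amended interface looks like to a user. The example
node struct and its fields are invented; the ops member names are my reading
of the reworked header (still <sys/rb.h>, see the XXX below):

    #include <sys/rb.h>
    #include <sys/systm.h>          /* offsetof */

    struct mynode {
            struct rb_node  mn_rbnode;      /* embedded rb_node */
            int             mn_key;
    };

    static signed int
    mynode_compare_nodes(void *ctx, const void *a, const void *b)
    {
            const struct mynode *na = a, *nb = b;

            /* negative: lower (left) node; positive: higher (right) node */
            if (na->mn_key < nb->mn_key)
                    return -1;
            return na->mn_key > nb->mn_key;
    }

    static signed int
    mynode_compare_key(void *ctx, const void *n, const void *key)
    {
            const struct mynode *na = n;
            const int k = *(const int *)key;

            if (na->mn_key < k)
                    return -1;
            return na->mn_key > k;
    }

    static const rb_tree_ops_t mynode_tree_ops = {
            .rbto_compare_nodes = mynode_compare_nodes,
            .rbto_compare_key = mynode_compare_key,
            .rbto_node_offset = offsetof(struct mynode, mn_rbnode),
            .rbto_context = NULL,           /* the new "context" argument */
    };

    static rb_tree_t mytree;

    static void
    mynode_tree_init(void)
    {

            rb_tree_init(&mytree, &mynode_tree_ops);
    }

    static struct mynode *
    mynode_enter(struct mynode *n)
    {

            /* returns the inserted node, or the existing one on duplicate key */
            return rb_tree_insert_node(&mytree, n);
    }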
XXX: Perhaps rename rb.h to rbtree.h while we are cleaning up.
Items 1-3 address PR/43488 by Jeremy Huddleston.
Passes RB-tree regression tests.
Reviewed by: matt@, christos@
The earlier change caused data corruption by freeing pages
without invalidating their mappings. Instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Add assertions in the places where these marker pages must not appear.
Ok: YAMAMOTO Takashi <yamt@netbsd.org>
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.
hppa: Remove the MD PMAP_NOCACHE flag as it now exists as an MI flag.
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.
x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE; see the sketch below.
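A hedged usage sketch (driver name, softc fields and addresses are invented;
the bus header is shown as <sys/bus.h>, older code uses <machine/bus.h>):

    #include <sys/bus.h>

    struct mydrv_softc {
            bus_space_tag_t         sc_memt;
            bus_space_handle_t      sc_memh;
    };

    /* map a framebuffer linear and prefetchable (write-combined if supported) */
    static int
    mydrv_map_fb(struct mydrv_softc *sc, bus_addr_t addr, bus_size_t size)
    {

            return bus_space_map(sc->sc_memt, addr, size,
                BUS_SPACE_MAP_LINEAR | BUS_SPACE_MAP_PREFETCHABLE, &sc->sc_memh);
    }

At the pmap level, the same effect should be requestable by passing
PMAP_WRITE_COMBINE (or the other new flags) in the flags argument of
pmap_enter()/pmap_kenter_pa(), if I read the updated pmap(9) correctly.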
Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html
No comments on this last version.
Forgot to commit this in previous.
in ubc_fault(), rework the logic to "remember" the last page's object and
reduce locking overhead, since in the common case the pages belong to one
and the same UVM object (but not always, therefore add a comment).
Unlocking before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, the VA space might become
globally visible before the IPIs are sent or while they are still in flight.
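A generic sketch of the ordering this implies (not a specific diff; lock
handling elided, function name invented):

    #include <uvm/uvm_extern.h>

    static void
    unmap_range(struct pmap *pm, vaddr_t sva, vaddr_t eva)
    {

            pmap_remove(pm, sva, eva);      /* may only queue invalidations */
            pmap_update(pm);                /* IPIs are sent and completed here */
            /* only now release locks / allow the VA range to be reused */
    }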
OK ad@.
problems with large mappings. I've seen my system hang for a total
of 45 seconds when radeondrm is opened by X11, and it is the checks
in this function that take so long.
through all LWPs and duplicate locking overhead.
- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.
Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.
to the total free memory available to the system, use the smaller of
VM_MAXUSER_ADDRESS and total free memory (an RSS limit bigger than
VM_MAXUSER_ADDRESS has no real meaning).
Fix a possible int overflow when ptoa(uvmexp.free) is bigger than 4GB
with a 32-bit vaddr_t.
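A hedged sketch of the clamped computation (helper name invented; widening
to 64 bits before scaling is the overflow fix):

    #include <sys/param.h>
    #include <uvm/uvm_extern.h>

    /* hypothetical helper: default RSS limit without ptoa()'s 32-bit overflow */
    static uint64_t
    default_rss_limit(void)
    {
            const uint64_t freebytes = (uint64_t)uvmexp.free * PAGE_SIZE;

            return MIN((uint64_t)VM_MAXUSER_ADDRESS, freebytes);
    }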
Reviewed by bouyer@.
See also http://mail-index.netbsd.org/tech-kern/2010/02/24/msg007395.html
pmap_update() without calling pmap_enter().
(Probably calling it only once after the loop (as done in
uvm_fault_lower_lookup()) is enough. If done so, other threads see the
entered neighbor pages reflected a little later.)
numbers. Using ptoa() will cast to vaddr_t, which might not be adequate
for architectures where sizeof(paddr_t) > sizeof(vaddr_t) (like i386 PAE).
- small fix inside the AGP heuristics to avoid masking high-order bits on
systems with more than 4GB of memory.
Reviewed by bouyer@.
See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
This blocks an easy exploit of kernel bugs that lead to a dereference
of a NULL pointer on some architectures (e.g. i386).
The check can be disabled in various ways:
- by CPP definitions in machine/types.h (portmaster's choice)
- by the kernel config option USER_VA0_DISABLED_DEFAULT=0
- at runtime by the sysctl vm.user_va0_disabled (cannot be cleared
  at securelevel>0)
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.
The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
refers to another "uobj" used to call pgo_get. Revert the wrong assertion
I made. My bad.
(This and the check of pgo_get's possible ERESTART return value are the
only two behavioral changes I made.)
Reported by drochner@, thanks.
- Lower fault routines don't care about the vm_anon array found in the
upper lookup. Don't pass the pointer down.
- The flag "shadowed" is known when we look up the upper layer. There is
no need to keep it in the fault context struct.
the original values. Pointed out by rmind@, thanks.
In the lower fault case, whether (*pgo_get)() can return ERESTART and
whether we should re-fault on it remains a question. The original code just
returned the error, so keep that behaviour for now. In case (*pgo_get)()
really returns ERESTART, pass EIO to tell the uvm_fault caller that
(*pgo_get)() failed.
(As far as I can tell from grepping, callers don't check whether the return
value is ERESTART, so assuming (*pgo_get)() never returns ERESTART should
be a safe bet.)
Move local variables around to isolate contexts. Note that the remaining
variables are global in that function, and some hold state across re-faults.
Silently clean up the "eoff" mess.
(Superfluous braces will go once things settle down.)
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n'match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all call sites
to lowercase.
no functional change
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.
Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).
Discussed on <tech-kern>, reviewed by <ad>.
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split into several modules,
but that'll be the day).
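A sketch of the replacement pattern (pool, struct and init-function names
are invented):

    #include <sys/pool.h>
    #include <sys/intr.h>

    struct foo {
            int     f_dummy;
    };

    /* before: POOL_INIT(...) static initialization via a link set (removed) */
    static struct pool foo_pool;

    /* after: explicit initialization from the subsystem's constructor */
    void
    foo_subsystem_init(void)
    {

            pool_init(&foo_pool, sizeof(struct foo), 0, 0, 0,
                "foopl", NULL, IPL_NONE);
    }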
tested by booting a kernel in qemu and compile-testing i386/ALL