and let pmap_map_chunk() lookup the correct one to use for the
current VA. Eliminate the "l2table" argument to pmap_map_chunk().
Add a second L2 table for mapping kernel text/data/bss on the
IQ80310 (fixes booting kernels with ramdisks).
enabled by definining __ARM_FIQ_INDIRECT in <machine/types.h>.
This is needed for OpenFirmware systems (like the Shark), where
the OFW vector page is used, and kernel entries merely patched
into it.
interrupt code for the IQ80310 board support package.
XXX The Integrator board support package still uses the old-style
arm32 interrupt code, so some compatibility hacks have been added
for it. When the Integrator uses new-style interrupts, those hacks
can go away.
Those for the SA-1100 and SA-1110 are from Intel's documentation.
The mapping for the SA-110 is from various sources on the net, since Intel
don't seem to document it.
Also, change the layout of the maps to have four steppings per line,
so they aren't quite so unwieldy.
* Track which process (XXX really, vmspace) owns the mapping. When
we sync the map, if the mapping doesn't belong to the kernel or to
the current process (XXX really, vmspace), then no cache fobbing
is necessary, since the cache is Wb-Inv'd on context switch (XXX need
to revisit this when we support FCSE).
* Be smarter about which cache operation we do when sync'ing the map:
- PREREAD -- Invalidate D$ (XXX right now, we actually do Wb-Inv)
- PREWRITE -- Write-back D$ (note, we do NOT invalidate here)
- PREREAD|PREWRITE -- Wb-Inv D$
More work is needed here. In particular, a version for CPUs
with write-through caches should be provided, to eliminate
the write-back steps (which are noops on such CPUs, but skipping
two branches would be nice).
pass. Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:
icache_sync_all Synchronize I-cache
icache_sync_range Synchronize I-cache range
dcache_wbinv_all Write-back and Invalidate D-cache
dcache_wbinv_range Write-back and Invalidate D-cache range
dcache_inv_range Invalidate D-cache range
dcache_wb_range Write-back D-cache range
idcache_wbinv_all Write-back and Invalidate D-cache,
Invalidate I-cache
idcache_wbinv_range Write-back and Invalidate D-cache,
Invalidate I-cache range
Note: This does not yet include an overhaul of the actual asm files
that implement the primitives. Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
swi_handler() does stuff that all SWIs will need, then calls
curproc->p_emul->e_syscall.
syscall() handles native NetBSD system calls.
linux_syscall() handles Linux system calls.
Also update the funcs in arm32_machdep.c that create the entries so that on cats they expect the 2 pagetables to be contiguous, note this means that for now cats is special cased in lots of funcs. I'll tidy this up to something a bit more sane soon, to avoid the multitude of #ifndef cats that I had to sprinkle in.
Note that this leaves a few inconsistencies (no more than we already had though) eg initarm is now prototyped in arm32/machdep.h, however only cats currently makes use of that header.
Original i386 log message:
Optimize the case of writing to /dev/zero, and clean up the
surrounding code a bit. Partly suggested by gwr.
I think this needs to be applied to arm26 as well.
* Use a common set of exception handlers for all arm32 platforms.
* New FIQ framework based on discussions with Ben Harris, shared
between arm26 and arm32.
Note that someone needs to tidy this up, we've got 92 block devices, which just ain't true. Also appears we're actually missing some, eg the ld block device.
switch_exit only needs to take 1 parameter, it loads the value of proc0 into R1 itself
Fixup some comments to reflect the real state of things.
Tweak a couple of bits of asm to avoid a load delay.
remove excess code for setting curpcb and curproc.
On platforms which load the kernel sans symbols directly from firmware
(possibly in e.g. S-Record format), call ddb_init() with empty arguments,
so that it will search any compiled in SYMTAB_SPACE. On all other platforms,
if __ELF__, also call ddb_init() with empty arguments until ELF bootloaders
which pass symbol information are ready.
we need later in the code. This fixes a fatal kernel fault in
pmap_modified_emulation if a user application tries to access a kernel
address that is section-mapped.
Add a diagnostic that detects attempts to call pmap_kenter_pa with a
va that is section-mapped.
caching on for a page just because we are clearing the writable bit in
the PTE: this is incompatible with the way pmap_vac_me_harder works,
and the code in the modified emulation handler doesn't know about
recalculating the cachable attributes (nor should it, IMO).
Also, if we are invalidating a page, flush its TLB entry; for some
reason we were only doing this when clearing the Write or modified
bits.
These patches together seem to solve the random seg-faults that were
still occuring occasionally under heavy paging.
is higher than SPL_HIGH (maybe we should be fixing SPL_HIGH).
If IPL_STATCLOCK is defined, initialize spl_masks[_SPL_STATCLOCK] from
it; otherwise initialize use IPL_CLOCK.
(eg ARM920), the mode in which the processor operates is governed by
the use of both the PT_C and PT_B bits:
PT_C=1,PT_B=1 -> Write-back
PT_C=1,PT_B=0 -> Write-through
To support this define pte_cache_mode (initialized to PT_C|PT_B) and
use that when enabling cacheing for a page.
to allocate a L1 pt is often enough to bring the system to its knees:
so make the messages PDEBUG(0,...).
However, even with this step having more than a small number of
processes searching for a L1 pt can still be enough to bring the system
down, since they all run at high priority and sleep for very little time,
thus blocking out user code from completing. So implement an exponential
backoff when waiting for a page table, so that we don't hog the CPU when
memory is scarce.
Tested by running a make of the C compiler with "gnumake -j30" (and plenty
of swap space).
is shared with another process (as can happen if vfork is being used),
then that other process will end up not having a page 0, which is bad
news indeed, since then there is no way back into the kernel.
Found this using a multi-ice box, so they are useful after all!
This seems to fix pr port-arm32/11921 and (possibly) kern/9859.
- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.
The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
Remove some overzealous locking of HEAD_TO_MAP
Remove a potential deadlock in pmap_copy_page
Change alloc and free l1pt to use kenter/kremove.
Update pmap_map to use kenter (only actually used by dumpsys, so no matching kremove)
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.
Currently this is a no-op on most platforms, so they should see no difference.
Reviewed by Jason.
Improved locking (not that we actually use it on a uniprocessor, but one day :)
Removed unneeded splvm's
tweaked pmap_clean_page code to only flush the cache if the page is mapped in the current pmap (based on diff from richard E)
Adopted pv entry allocation mechanism from i386.
Laid framework for returning ptp's when we've finished with them rather than holding onto them till the process exits.
ptp's are now allocated with a uvm object for the pmap, means that we can walk a list to free them off in pmap_release, until they get freed off by pmap_remove.
Also implemented a page zeroing function when the processor is idling. Note that hpcarm may wish to disable this.
I believe this code to be stable, if anyone has any problems please shout up.
Note that I've some ideas in the works on how to improve the pv handling, so the slow down is short term only.
Also added non-advertising licence and copyright to myself and richard.
Update some of the functions that use pmap_pte to pmap_map_ptes.
Note that there's a dummy macro for pmap_unmap_ptes, this is because at some point locking will be needed, so we need to be able to unlock them.
Performance gain seems to be minimal, however long term it should help improve things.
This is similair to the i386 pmap_map_ptes, however it's based on a version from Richard Earnshaw.
in arm/arm. This version is based on the arm26 version, and includes dumping
the contents of stack frames, with automatic determination of the save code
pointer offset.
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).
These calls are relatively conservative. It may be possible to
optimize these a little more.
always points to the undefined instruction in question. It's up to the
handler to advance it to the next instruction if it wants execution to
continue there. This is how things have always worked on arm26.
co-processor. This is necessary so we can have several handlers for
CP0 (used as a catch-all for non-CP instructions).
Handlers are now removed using remove_coproc_handler(), rather than by calling
install_coproc_handler() with a NULL handler.
Because install_coproc_handler() can now allocate memory, there's a version
for use at boot time that doesn't.