cache, as well. The mini-data cache is 2-way, so src and dst won't
clobber each other, and the smallness of the cache doesn't matter,
since we access each page once sequentially.
While we still have to do the initial clean of the source page, this
saves another 4K of main D$ pollution, and also means we don't have
to do 2 cache passes after the copy is complete (i.e. we can skip the
invalidation of the source page in the main cache, since it's no longer
there).
vs. XScale. Use the mini-data cache for the destination on XScale,
thus saving tossing out 4K of possible-useful data from the main
data cache each time.
This significantly improves every test in lmbench.
and pte_l2_s_cache_mode. The cache-meaningful bits are different
for these descriptor types on some processor models.
* Add pte_*_cache_mask, corresponding to each above, which has a mask
of the cache-meangful bits, and define those for generic and XScale
MMU classes. Note, the L2_S_CACHE_MASK_xscale definition requires
use of the Extended Small Page L2 descriptor (the "X" bit overlaps
with AP bits otherwise).
L1_C_PROTO_xscale; while they are supposed to be set to 1 on generic
ARM MMUs (according to the SA-110 and ARM920T manuals), they are listed
as "should be zero" in the i80200 manual.
1. Generic (compatible with ARM6)
1. XScale (can be used as generic, but also has certainly nifty extensions).
Define abstract PTE bit defintions for each MMU class. If only one MMU
class is configured into the kernel (based on CPU_* options), then we
get the constants for that MMU class. Otherwise we indirect through
varaibles set up via set_cpufuncs().
XXX The XScale bits are currently the same as the generic bits. Baby steps.
Significant cleanup, here, including better PTE bit names.
* Add XScale PTE extensions (ECC enable, write-allocate cache mode).
* Mechanical changes everywhere else to update for new pte.h. While
doing this, two bugs (as a result of typos) were fixed in
arm/arm32/bus_dma.c
evbarm/integrator/int_bus_dma.c
the lower bits; UVM provides us page-aligned addresses for
everything. For the paranoid, we'll leave KDASSERT()'s in
that check for this if the kernel is built with DEBUG.
Low-hanging fruit that shaves some cycles.
pmap.h and give them more descriptive names and better comments:
* PT_M -> PVF_MOD (page is modified)
* PT_H -> PVF_REF (page is referenced)
* PT_W -> PVF_WIRED (mapping is wired)
* PT_Wr -> PVF_WRITE (mapping is writable)
* PT_NC -> PVF_NC (mapping is non-cacheable; multiple mappings)
* Don't refer to VA 0, instead refer to a new variable: vector_page
* Delete the old zero_page_*() functions, replacing them with a new
one: vector_page_setprot().
* When manipulating vector page mappings in user pmaps, only do so if
the vector page is below KERNEL_BASE (if it's above KERNEL_BASE, the
vector page is mapped by the kernel pmap).
* Add a new function, arm32_vector_init(), which takes the virtual
address of the vector page (which MUST be valid when the function
is called) and a bitmask of vectors the kernel is going to take
over, and performs all vector page initialization, including setting
the V bit in the CPU Control register ("relocate vectors to high
address"), if necessary.
when the part being quiried was mapped with a section (!) giving weird
results and had become a mess of goto's.
Complete rewrite and cleaned up the `goto'-jungle entirely ... ripped all
goto's. The resulting code is much better to read and might even have a
small performance gain.
I/O processors:
* The i80200 and the i80321 have the same CPU ID, so split the
CPU_XSCALE option into CPU_XSCALE_80200 and CPU_XSCALE_80321
options, and don't let them both be defined at the same time.
XXX May want to revisit this in the future.
* Split some registers common between the i80200 and i80321 into
<arm/xscale/xscalereg.h>.
* Rename a few existing functions.
pmap_copy_page() will never have any mappings. Therefore, it
is unnecessary to do a cache clean for that page.
Add assertions in #ifdef DEBUG that assert this invariant.
This shaves some cycles off the frequently-called pmap_zero_page()
and pmap_copy_page() (no need to look up the dst page's vm_page
structure, and one less function call to clean the page).
incorporate write back support for processors not having a write through
cache.
The current fb_devconfig structure now really holds the device's
configuration and the softc really only holds the attachment information.
This used to be mixed giving rise to weird stuctures and cross references.
The number of vertical syncs before the video memory writeback is triggered
is configurable ... default is to wait for 5 Vsync .. aprox minumum 10
times a second, but more likely in the order of 12,5 times a second. When
printing is in progress no write back is performed... only after the
waiting time. The reasoning behind this is that as long as the screen is
printed too the cache will be purged of dirty data anyway due to the
processing and new screen memory useage.
on the port-acorn32's TODO list for quite some time :
- when the serial console is selected, don't exclude the screen
alltogether; currently the keyboard is still not attached but that might be
a configuration problem in the GENERIC console or a failure to explicitly
connect to a wsmux. This needs further investigation.
- create a framework for the display memory writeback on vsync for
StrongARM processors since they don't have a write-trough cache. This is to
solve the lazy screen update that is very evident in single user mode on
these processors; the cache isn't flushed/written back that often and parts
of the screen can thus be resident in the cache but not written out to
memory yet.
- clean up some loose ends in the code.
indicating an unhandled "command". ERESTART is -1, which can lead to
confusion. ERESTART has been moved to -3 and EPASSTHROUGH has been
placed at -4. No ioctl code should now return -1 anywhere. The
ioctl() system call is now properly restartable.
of the cache in a static table. Note that the table isn't complete --
contributions of cache details for CPUs whose data sheets I haven't got are
welcome.
become ippp (ISDN ppp) and irip (ISDN raw IP). The character device now
are called: /dev/isdn (isdnd <-> kernel communication), /dev/isdnctl (dialing
and other control), /dev/isdntrc* (tracing), /dev/isdnbchan* (raw B channel
access, i.e. for user land PPP) and /dev/isdntel* (telephone devices, i.e.
for answering machines).
issue an instruction that caused the late abort handler to be called for
wich the kernel had no support build in for.
It now only panics when it happends in kernel but otherwise signals the
process a SEGV signal.
but not used resulting in a compiler error. By splitting the declaration
and the initialisation this is solved.
Better would be to not even declare the flag when ARMFPE isnt enabled but
that would just add to the #ifdef jungle.
even bother probing for an FPA. If ARMFPE is configured, always use it,
even if there's an FPA (since it provides the FPA support code). Move all
printfs about FPAs into armfpe_init.c.
This means I can delete the last two elements from struct _cpu, so that the
structure, and the whole of <arm/cpus.h> is redundant and can be deleted.
in identify_arm_cpu(), since it's almost unused elsewhere.
Change the detection of bugged StrongARMs to use the cpu ID rather than the
class. This turns "almost" into "entirely".
struct cpu_info. Also kill the cpuctrl global while we're here, and make
identify_arm_cpu() take a struct cpu_info * as an argument alongside the CPU
number.
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot callocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.
From art@openbsd.org.
Note that this has been compiled on some systems, cats, IQ80310, IPAQ, netwinder and shark (note that shark's build is currently broken due to other reasons), but only actually run on cats.
Shark doesn't make use of the functionality as I believe there has to be a correlation between OFW and the kernel tables so that calls into OFW work.
Be consistant in the way that MSIZE, MCLSHIFT, MCLBYTES and NMBCLUSTERS
are defined.
Remove old VM constants from cesfic port.
Bump MSIZE to 256 on mipsco (the only one that wasn't already 256).
to the L1 table and a virtual address, and no pointer to the L2 table.
The L2 table will be looked up by pmap_map_entry(), which will panic
if the there is no L2 table for the requested VA.
NOTE: IT IS EXTREMELY IMPORTANT THAT THE CORRECT VIRTUAL ADDRESS
BE PROVIDED TO pmap_map_entry()! Notably, the code that mapped
the kernel L2 tables into the kernel PT mapping L2 table were not
passing actual virtual addresses, but rather offsets into the range
mapped by the L2 table. I have fixed up all of these call sites,
and tested the resulting kernel on both an IQ80310 and a Shark.
Other portmasters should examine their pmap_map_entry() calls if
their new kernels fail.
and let pmap_map_chunk() lookup the correct one to use for the
current VA. Eliminate the "l2table" argument to pmap_map_chunk().
Add a second L2 table for mapping kernel text/data/bss on the
IQ80310 (fixes booting kernels with ramdisks).
- Add alignment-safe double and float unions.
- Use the above for the __infinity and __nan constants on all
architectures that use the standard ieee754 representation of
those constants.
- Add a single copy of various ieee754 math functions (frexp, isinf,
isnan, ldexp and modf) that had numerous duplicates among the
arch-specific directories.
- Use the above functions on all architectures where the generic C
versions where used. Architectures that had local assembly
routines are untouched (for those functions only).
MACHINE_ARCH since <arm/param.h> already sets it correctly to "arm".
* For platforms which are not yet ELF, defined MACHINE_ARCH to "arm32"
if __ELF__ is not defined by the C preprocessor.
* In <arm/param.h>, clarify the rules about when MACHINE and
MACHINE_ARCH are defined, and to what. Also, for ELF platforms,
int the non-_KERNEL case, force both MACHINE and MACHINE_ARCH to "arm",
rather than allowing platform-specifc code to define either.
enabled by definining __ARM_FIQ_INDIRECT in <machine/types.h>.
This is needed for OpenFirmware systems (like the Shark), where
the OFW vector page is used, and kernel entries merely patched
into it.
interrupt code for the IQ80310 board support package.
XXX The Integrator board support package still uses the old-style
arm32 interrupt code, so some compatibility hacks have been added
for it. When the Integrator uses new-style interrupts, those hacks
can go away.
Those for the SA-1100 and SA-1110 are from Intel's documentation.
The mapping for the SA-110 is from various sources on the net, since Intel
don't seem to document it.
Also, change the layout of the maps to have four steppings per line,
so they aren't quite so unwieldy.
cores.
* For i80200 Step-A0 and Step-A1, set dcache_inv_range to
xscale_cache_purgeD_rng to work around a bug where a D$
"invalidate by address" doesn't properly clear the dirty
bits on the cache block (i80200 errata item #25).
* Track which process (XXX really, vmspace) owns the mapping. When
we sync the map, if the mapping doesn't belong to the kernel or to
the current process (XXX really, vmspace), then no cache fobbing
is necessary, since the cache is Wb-Inv'd on context switch (XXX need
to revisit this when we support FCSE).
* Be smarter about which cache operation we do when sync'ing the map:
- PREREAD -- Invalidate D$ (XXX right now, we actually do Wb-Inv)
- PREWRITE -- Write-back D$ (note, we do NOT invalidate here)
- PREREAD|PREWRITE -- Wb-Inv D$
More work is needed here. In particular, a version for CPUs
with write-through caches should be provided, to eliminate
the write-back steps (which are noops on such CPUs, but skipping
two branches would be nice).
pass. Rather than providing a whole slew of cache operations that
aren't ever used, distill them down to some useful primitives:
icache_sync_all Synchronize I-cache
icache_sync_range Synchronize I-cache range
dcache_wbinv_all Write-back and Invalidate D-cache
dcache_wbinv_range Write-back and Invalidate D-cache range
dcache_inv_range Invalidate D-cache range
dcache_wb_range Write-back D-cache range
idcache_wbinv_all Write-back and Invalidate D-cache,
Invalidate I-cache
idcache_wbinv_range Write-back and Invalidate D-cache,
Invalidate I-cache range
Note: This does not yet include an overhaul of the actual asm files
that implement the primitives. Instead, we've provided a safe default
for each CPU type, and the individual CPU types can now be optimized
one at a time.
use 2 adjacent cache-size areas for global cache clean, alternating
between the two of them on each call. Without this, D-cache blocks
aren't evicted properly, and no one seems to know why.
1) Add defparam XSCALE_CCLKCFG to define a parameter for the
CCLKCFG register. Default it to '9' on the IQ80310.
2) Add a sleep call to the xscale CPU function vector (replacing
the nullop) which should drop the CPU into "idle" mode when
cpu_switch finds nothing on the run queues.
by SWI numbers above 0x9f0000, but we re-map them down to somewhere just
after the end of the usual syscall range, since NetBSD doesn't handle
sparse syscall arrays well.
The only syscall I've actually implemented in this range is cacheflush(),
which was previously being mapped to fork(), causing ... interesting results.
swi_handler() does stuff that all SWIs will need, then calls
curproc->p_emul->e_syscall.
syscall() handles native NetBSD system calls.
linux_syscall() handles Linux system calls.
Also update the funcs in arm32_machdep.c that create the entries so that on cats they expect the 2 pagetables to be contiguous, note this means that for now cats is special cased in lots of funcs. I'll tidy this up to something a bit more sane soon, to avoid the multitude of #ifndef cats that I had to sprinkle in.
Note that this leaves a few inconsistencies (no more than we already had though) eg initarm is now prototyped in arm32/machdep.h, however only cats currently makes use of that header.
on a page boundary and can be mapped straight into zero page. This means it
has to be in MD_SFILES on arm26, and not in SFILES.
This probably leaves kernel_text in the wrong place, but it at least leaves the system bootable.
Original i386 log message:
Optimize the case of writing to /dev/zero, and clean up the
surrounding code a bit. Partly suggested by gwr.
I think this needs to be applied to arm26 as well.
* Use a common set of exception handlers for all arm32 platforms.
* New FIQ framework based on discussions with Ben Harris, shared
between arm26 and arm32.
parameters from first principles rather than using a static table for some
rates. This makes it work correctly on ARM7500, for which the table was
bogus (ARM7500 has a different refclk from VIDC20).
Note that someone needs to tidy this up, we've got 92 block devices, which just ain't true. Also appears we're actually missing some, eg the ld block device.
Any problems reported by testers have been fixed, and massive
cross-compiling of kernels has shown that any problems that remain
with actually building kernels are not related to this.
The tmpx registers are now outputs, this makes them all unique.
Add the fact that cc is changed by the asm (not believed to be used but rather be correct)
Correctly specify w as an input and output register, I think this was hiding the bug below!
Allow sum to be in a different input and output register.
Correct bug in psuedo header handling for in4_cksum. Seems that the new macros turned up a latent bug in the psuedo header handling, the code was moving a pointer forward 16 bytes twice, not found before as the ADD16 macro wasn't 100% accurate, as it didn't output w, even though it modified it.
only the Secondary PCI bus (it's the only bus which can have a
device space hidden from any PCI host on the Primary bus).
Also, use the bus number from the PPB businfo register seecondary bus
field rather than hard-coding "1".