Also in the ARM32_PMAP_NEW case, reclaim the USPACE-bytes of wasted space
at the top of the user address space that hasn't been needed for a very,
very long time.
for L2 allocation. This avoids potential recursive calls into
uvm_km_kmemalloc() via the pool allocator.
Bug spotted by Allen Briggs while trying to boot on a machine with 512MB
of memory.
breakpoint address before it's used. Currently a no-op on all but sh5.
This is useful on sh5, for example, to mask off the instruction
type encoding in the bottom two address bits, and makes it possible
to do "db> break $rXX" instead of manually munging the address.
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defines _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.
This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place
(illustrated just below).
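An illustrative before/after (the prototype shown is just an example, not
a specific header change from this commit):

    /* Before: checks built from negations of the stricter macros. */
    #if !defined(_ANSI_SOURCE) || !defined(_POSIX_C_SOURCE) || \
        !defined(_XOPEN_SOURCE)
    int getloadavg(double [], int);
    #endif

    /* After: a simple OR of the macros that should expose it. */
    #if defined(_NETBSD_SOURCE) || defined(_XOPEN_SOURCE)
    int getloadavg(double [], int);
    #endif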
I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.
Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
* Define a new "MMU type", ARM_MMU_SA1. While the SA-1's MMU is basically
compatible with the generic, the SA-1 cache does not have a write-through
mode, and it is useful to have an indication of this.
* Add a new PMAP_NEEDS_PTE_SYNC indicator, and try to evaluate it at
compile time (see the sketch after this list). We evaluate it like so:
- If SA-1-style MMU is the only type configured -> 1
- If SA-1-style MMU is not configured -> 0
- Otherwise, defer to a run-time variable.
If PMAP_NEEDS_PTE_SYNC might evaluate to true (SA-1 only or run-time
check), then we also define PMAP_INCLUDE_PTE_SYNC so that e.g. assembly
code can include the necessary run-time support. PMAP_INCLUDE_PTE_SYNC
largely replaces the ARM32_PMAP_NEEDS_PTE_SYNC manual setting Steve
included with the original new pmap.
* In the new pmap, make pmap_pte_init_generic() check to see if the CPU
has a write-back cache. If so, init the PT cache mode to C=1,B=0 to get
write-through mode. Otherwise, init the PT cache mode to C=1,B=1.
* Add a new pmap_pte_init_arm8(). Old pmap, same as generic. New pmap,
sets page table cacheability to 0 (ARM8 has a write-back cache, but
flushing it is quite expensive).
* In the new pmap, make pmap_pte_init_arm9() reset the PT cache mode to
C=1,B=0; the write-back check in generic gets it wrong for ARM9, since
we use write-through mode all the time on ARM9 right now. (What this
really tells me is that the test for a write-through cache is less than
perfect, but we can fix that later.)
* Add a new pmap_pte_init_sa1(). Old pmap, same as generic. New pmap,
does generic initialization, then resets page table cache mode to
C=1,B=1, since C=1,B=0 does not produce write-through on the SA-1.
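A sketch of the compile-time evaluation described above (an illustration
of the scheme, not the literal header text; only the generic, XScale and
SA-1 MMU classes from <arm/cpuconf.h> are shown):

    #if (ARM_MMU_SA1 == 1) && (ARM_MMU_GENERIC + ARM_MMU_XSCALE) == 0
    /* SA-1-style MMU is the only type configured. */
    #define PMAP_NEEDS_PTE_SYNC   1
    #define PMAP_INCLUDE_PTE_SYNC
    #elif (ARM_MMU_SA1 == 0)
    /* No SA-1-style MMU configured. */
    #define PMAP_NEEDS_PTE_SYNC   0
    #else
    /* Mixed configuration: defer to a run-time variable. */
    #define PMAP_NEEDS_PTE_SYNC   pmap_needs_pte_sync
    #define PMAP_INCLUDE_PTE_SYNC
    #endif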
was checked in:
* It was not actually disabling the MMU, and so jumping to the
reset vector would happily cause a panic(), since it would be
the kernel's reset vector, not the ROM's.
* In the event the system was using high vectors, VECRELOC was not
getting cleared, which has the potential to wreak havoc when re-entering
the ROM.
* It was totally broken for CPUs < ARMv4; you still need to disable
the MMU on those, you just need to skip the ARMv4 TLB flush.
* The code that was checked in would only work if the kernel is mapped
VA==PA. For systems where the kernel is NOT mapped VA==PA, you only
get the prefetch depth # of insns (2) after the MMU is turned off before
you have to fix the PC.
Backing out the change fixes rebooting on several evbarm platforms.
requires that the CPU control register be properly readable. I believe that
all CPUs that have high vector support have a readable CPU control register,
but if we ever encounter one that does not, then we'll have to adjust this
code.
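For illustration, the kind of check this relies on (the helper below is
hypothetical; CPU_CONTROL_VECRELOC is the vector-relocation bit as defined
in <arm/armreg.h>):

    static int
    using_high_vectors(void)
    {
            u_int ctrl;

            /* Read the CP15 control register. */
            __asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r" (ctrl));
            return (ctrl & CPU_CONTROL_VECRELOC) != 0;
    }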
Some features of the new pmap are:
- It allows L1 descriptor tables to be shared efficiently between
multiple processes. A typical "maxusers 32" kernel, where NPROC is set
to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
with just 4 L1s. This completely solves the problem of running out
of contiguous physical memory for allocating new L1s at runtime on a
busy system.
- Much improved cache/TLB management "smarts". This change ripples
out to encompass the low-level context switch code, which is also
much smarter about when to flush the cache/TLB, and when not to.
- Faster allocation of L2 page tables and associated metadata thanks,
in part, to the pool_cache enhancements recently contributed to
NetBSD by Wasabi Systems.
- Faster VM space teardown due to accurate referenced tracking of L2
page tables.
- Better/faster cache-alias tracking.
The new pmap is enabled by adding options ARM32_PMAP_NEW to the kernel
config file, and making the necessary changes to the port-specific
initarm() function. Several ports have already been converted and will
be committed shortly.
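For example, a port's kernel configuration simply adds (the option name is
from this change; everything else about the config is port-specific):

    options     ARM32_PMAP_NEW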
page: -1 for error, 0 for EOF, 1 otherwise. Inspired by an OpenBSD commit
message, pointed out by Miod Vallat in private mail.
vax/mba/hp.c: check return value <= 0, not < 0, to be consistent with how
other places handle return values from bounds_check_with_label().
to extract the physical address from the virtual.
On the ARM, also use the "read-only at MMU" indication to avoid a
redundant cache clean operation.
Other platforms should use these two as examples of how to use these
new pool/mbuf features to improve network performance. Note this requires
a platform to provide a working POOL_VTOPHYS().
Part 3 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
objects. Clients of the pool_cache API must consistently use
the "paddr" variants or not, otherwise behavior is undefined.
Enable this on Alpha, ARM, MIPS, and x86. Other platforms must
define POOL_VTOPHYS() in the appropriate manner in order to enable
the feature.
Part 1 of a series of simple patches contributed by Wasabi Systems
to improve network performance.
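A minimal usage sketch of the paddr variants (the pool cache name here is
illustrative):

    paddr_t pa;
    void *obj;

    obj = pool_cache_get_paddr(&some_cache, PR_NOWAIT, &pa);
    if (obj != NULL) {
            /* ... pa holds the physical address of obj ... */
            pool_cache_put_paddr(&some_cache, obj, pa);
    }

On a direct-mapped platform, POOL_VTOPHYS() can be a trivial macro that
converts a direct-mapped virtual address to its physical address.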
cache based on CPU id: write-through on PXA2[15]0 B2 stepping and
earlier, write-back on C0 and C1 stepping (a.k.a. PXA2[15]5 A0).
The options XSCALE_CACHE_WRITE_{THROUGH,BACK} can override this.
For XScale CPUs other than PXA2xx, XSCALE_CACHE_WRITE_THROUGH works
the same as before.
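For example, to force write-back behaviour regardless of stepping, a
kernel config could contain:

    options     XSCALE_CACHE_WRITE_BACK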
* The offset format was wrong.
* There is no post-increment or index register update.
* It wasn't even matching because the mask was wrong.
Also touch up ldr/str disassembly slightly.
too many bits (including some reserved ones) and was writing the wrong value
for the TLB flush.
Also, if the flag is off, don't write the control register!
indirect configuration (because the BECC is a soft-core, it could
have a variety of peripherals in the FPGA). Also add support for
local untranslated DMA.
of cycles off the syscall overhead.
Since all COMPAT_LINUX platforms now support __HAVE_SYSCALL_INTERN,
garbage-collect the LINUX_SYSCALL_FUNCTION stuff.
- to identify the device instance, using the hardware address.
- when the console accesses the device, using the statically mapped address.
- when the tty accesses the device, using the handle given by bus_space_map().
ints to unsigned int/u_int where they shouldn't go negative.
ints to boolean_t where they're being used as bools.
No real functional change (in the produced asm a few condition codes changed)
No need for device ixpcom in evbarm/conf/files.evbarm; move it to
arm/ixp12x0/files.ixp12x0.
ixp12x0_com.c:
some fixes around address handling:
1. Do not call bus_space_map() in ixpcominit(). Calling bus_space_map()
is not safe here, because bus_space_map() calls uvm_km_valloc() but
uvm is not yet initialized.
2. Use dv_unit to determine the console instead of comparing iobase.
Now you can attach ixpcom0 with physical address like this:
ixpcom* at ixpsip? addr 0x90000000 size 0x4000
Statically mapped address (0xf0000000) is still usable.
ixp12x0_clk:
1. Access the PLL_CFG register via bus_space.
2. Make delay() work correctly (bug fix).
3. Start the timer device without interrupts at attach time.
Now delay() called before cpu_initclocks() works fine.
ixp12x0_pci:
1. Map PCI type 0/1 configuration space to the upper address region.
2. Map "PCI I/O Cycle Access" space to the same virtual address (VA==PA),
   but increase the size of this mapping to 1MByte, because a smaller
   mapping failed since the L2 table couldn't be set up.
3. Use bus_space address handling in ixp12x0_pci.c.
Export i80321_local_dma_init().
Make !sc->sc_is_host configuration a little more friendly.
Go back to using IABAR2 instead of IABAR3 for inbound SDRAM access.
* Define an ARM_INTR_IMPL option, which specifies a header file
describing the interrupt implementation for the platform. Use
this instead of the list of EVBARM_BOARDTYPE checks.
* Make the s3c2xx0 interrupt dispatch code a bit more generic, and move
it to a generic location so that other platforms can use it.
This eliminates all uses of the EVBARM_BOARDTYPE stuff, so delete it.
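For example, an XScale-based board's config might select its interrupt
implementation like this (the header path shown is illustrative):

    options     ARM_INTR_IMPL="<arch/arm/xscale/i80321_intr.h>"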
possible to use alternate system call tables. This is useful for
correctly displaying the arguments in traces of Mach binaries.
If NULL is given, then the regular system call table for the process is used.
Our PC keyboard driver (sys/dev/pckbc/pckbd.c) works only with the 8042
keyboard controller driver (sys/dev/ic/pckbc.c). So this file
provides the same functions as those of the 8042 driver.
XXX: we need a cleaner interface between the keyboard driver and
keyboard controller drivers.
XXX: PS/2 mice are not supported yet.
backed by physical pages (i.e. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
This fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (e.g. 7% on my pc, 1% on my ultra2) when we get a cache hit.
original system call number, which can be negative for a Mach trap.
We cannot just replace code by realcode, because ktrsyscall uses it as
an index in the system call table, thus crashing the kernel when the
value is negative.
when looking to re-enable caching, only do so if the pages aren't all
already cached.
Convert some ints to unsigned int. (scarily this actually shows the biggest
decrease in timing for my benchmark, I guess the compiler can optimise better)
Use pmap_free_pvs in pmap_remove; this should save on the overhead of
freeing each pv on its own.
Correctly set ptp when calling pmap_enter_pv, this adds more overhead, but
the effect is minimal. Timings show that it increases gmake's make configure
step from 2:07.90 to 2:08.90. I've more optimisations planned that should
negate this increase.
This is based upon Jason's work on xscale.
Most of the interrupt handling code is now written in C using an asm stub to
call into the C code.
spl* now only updates a software mask and does not update the hardware;
this should be much faster.
The new code works well on cats, it's untested on netwinder, but should work.
The code implements generic soft interrupts.
More work is still required to bring the isa interrupt handling code up to
scratch; currently all isa interrupts are handled at IPL_BIO on the
footbridge. This may cause isa interrupts to be handled later than they
should be.
I plan to fix this in the near future.
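A rough sketch of the software-mask idea (not the actual footbridge code;
the helper at the end is hypothetical):

    static volatile int current_spl;       /* software IPL, no hw access */
    static volatile u_int pending_irqs;    /* IRQs held back by the mask */

    int
    splraise(int newlevel)
    {
            int old = current_spl;

            if (newlevel > old)
                    current_spl = newlevel;  /* hardware mask untouched */
            return old;
    }

    void
    splx(int oldlevel)
    {
            current_spl = oldlevel;
            if (pending_irqs != 0)
                    dispatch_pending_irqs(); /* hypothetical: replay now */
    }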
kqueue provides a stateful and efficient event notification framework.
Currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.
kqueue is supported by all writable filesystems in the NetBSD tree
(with the exception of Coda) and by all device drivers supporting poll(2).
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
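A minimal userland example of the interface (watches stdin for
readability):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <err.h>

    int
    main(void)
    {
            struct kevent ev;
            int kq = kqueue();

            if (kq == -1)
                    err(1, "kqueue");

            /* Register interest in stdin becoming readable. */
            EV_SET(&ev, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, 0);
            if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
                    err(1, "kevent");

            /* Block until the event fires, then report it. */
            if (kevent(kq, NULL, 0, &ev, 1, NULL) == 1)
                    warnx("%ld bytes ready on fd %ld",
                        (long)ev.data, (long)ev.ident);

            close(kq);
            return 0;
    }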
In particular, use r8 to hold the old process, and r7 for medium-term
scratch, saving r0-r3 for things we don't need saved over function
calls. This gets rid of five register-to-register MOVs.
and hence save fewer into the PCB. This should give me enough free
registers in cpu_switch to tidy things up and support MULTIPROCESSOR
properly. While we're here, make the stacked registers into an
APCS stack frame, so that DDB backtraces through cpu_switch() will
work.
This also affects cpu_fork(), which has to fabricate a switchframe and
PCB for the new process.
GCC produces almost exactly the same instructions as the hand-assembled
versions, albeit in a different order. It even found one place where it
could shave one off. Its insistence on creating a stack frame might slow
things down marginally, but not, I think, enough to matter.
add rd, pc, #foo - . - 8 -> adr rd, foo
ldr rd, [pc, #foo - . - 8] -> ldr rd, foo
Also, when saving the return address for a function pointer call, use
"mov lr, pc" just before the call unless the return address is somewhere
other than just after the call site.
Finally, a few obvious little micro-optimisations like using LDR directly
rather than ADR followed by LDR, and loading directly into PC rather than
bouncing via R0.
in question, whereas the ARM code was using it to hold the model
identification. To fix this, rename:
ci_cpuid -> ci_arm_cpuid
ci_cputype -> ci_arm_cputype (for consistency)
ci_cpurev -> ci_arm_cpurev (ditto)
ci_cpunum -> ci_cpuid
This makes top(1) give correct CPU numbers in its "STATE" column (all 0 for
now).
joins other machdep files)
Saves maintaining multiple copies of the same thing; the only differences
were:
- the IRQ line used on the footbridge (made that a define in
  include/isa_machdep.h)
- the name of a dma_ranges variable contained the arch name, so it was just
  made generic.
When the read value is 0, reset the timer (don't wait till the next loop
round to reset it).
Add a bit of debug to the calibration stuff to make sure it's working ok.
in its own little inline function, and this allows us to get rid of all the
automatic variables elsewhere. This subtly changes the semantics of
__cpu_simple_lock() such that the loop ends up one instruction longer, but
I'm not sure that's a particularly bad thing.
and boot multi-user on a single-processor machine. Many of these changes
are wildly inappropriate for actual multi-processor operation, and correcting
this will be my next task.
Currently statclock runs at 64Hz; maybe it should be faster or slower. I
did try making it the same as hz, but that just made it look like we spent
10% of our time handling interrupts, rather than the 3% that this gives.
Also fix the IPL_LEVELS for netwinder.
This merge changes the device switch tables from static arrays to tables
dynamically generated by config(8).
- All device switches are defined as constant structures in device drivers.
- The new grammar ``device-major'' is introduced to ``files'':
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed in the port-dependent
majors.<arch> using this grammar.
- Added a new naming convention: the name of the device switch must be
<prefix>_[bc]devsw for auto-generation of device switch tables. (See the
illustrative example after this list.)
- Backward compatibility for loading a block/character device switch via
the LKM framework is broken. This is necessary to convert from a
block/character device major to a device name at runtime and vice versa.
- The restriction on assigning device majors via LKM is completely removed.
We no longer need to reserve LKM entries for dynamic loading of a device
switch.
- At compile time, the list of device major numbers is packed into the
kernel, and the LKM framework will refer to it to assign device major
numbers dynamically.
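An illustrative majors.<arch> fragment (device name and major numbers are
made up); the corresponding driver would provide foo_bdevsw and foo_cdevsw
per the naming convention above:

    device-major    foo     char 123 block 24       foo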
/dev/ttyS0 crashed the kernel. This is because sacom_filltx uses some
uninitialized static variables. Pulling the values from the softc instead
fixes the problem (this is what was done before the driver was moved
from /sys/arch/hpcarm to /sys/arch/arm, anyway).
the 4M super-section that the PTP will map, not some random 1M
chunk of it. This gives the PTP hint code a much better chance
of working properly, and allows us to tidy up the code that
flushes a PTP from the cache in pmap_destroy().
to do uncached memory access during VM operations (which can be
quite expensive on some CPUs).
We currently write-back PTEs as soon as they're modified; there is
some room for optimization (to write them back in larger chunks).
For PTEs in the APTE space (i.e. PTEs for pmaps that describe another
process's address space), PTEs must also be evicted from the cache
completely (PTEs in PTE space will be evicted during a context switch).
Change the bus_dmamap_sync() macro to test the ops argument against pre-
and post- constants. The compiler will optimize out dead code because
of the constants. Since post- operations are not needed on ARM (except
for ISA bounce buffers), this eliminates a large number of function calls
which are no-ops, each of which costs at least 6 cycles just in the call
and return overhead (not to mention whatever other useless work the
compiler decides to do in the callee).
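An illustrative sketch of the technique (not the literal <machine/bus.h>
text; _bus_dmamap_sync() stands in for the real back-end call):

    #define bus_dmamap_sync(t, m, o, l, ops)                             \
            do {                                                         \
                    if (((ops) & (BUS_DMASYNC_PREREAD |                  \
                        BUS_DMASYNC_PREWRITE)) != 0)                     \
                            _bus_dmamap_sync((t), (m), (o), (l), (ops)); \
            } while (/*CONSTCOND*/ 0)

When "ops" is a compile-time constant such as BUS_DMASYNC_POSTREAD, the
condition folds to zero and the compiler drops the call entirely.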