42 lines
2.1 KiB
Plaintext
42 lines
2.1 KiB
Plaintext
|
Following are some observations about the the BSD hp300 pmap module that
|
||
|
may prove useful for other pmap modules:
|
||
|
|
||
|
1. pmap_remove should be efficient with large, sparsely populated ranges.
|
||
|
|
||
|
Profiling of exec/exit intensive work loads showed that much time was
|
||
|
being spent in pmap_remove. This was primarily due to calls from exec
|
||
|
when deallocating the stack segment. Since the current implementation
|
||
|
of the stack is to "lazy allocate" the maximum possible stack size
|
||
|
(typically 16-32mb) when the process is created, pmap_remove will be
|
||
|
called with a large chunk of largely empty address space. It is
|
||
|
important that this routine be able to quickly skip over large chunks
|
||
|
of allocated but unpopulated VA space. The hp300 pmap module did check
|
||
|
for unpopulated "segments" (which map 4mb chunks) and skipped them fairly
|
||
|
efficiently but once it found a valid segment descriptor (STE), it rather
|
||
|
clumsily moved forward over the PTEs mapping that segment. Particularly
|
||
|
bad was that for every PTE it would recheck that the STE was valid even
|
||
|
though we should already know that.
|
||
|
|
||
|
pmap_protect can benefit from similar optimizations though it is
|
||
|
(currently) not called with large regions.
|
||
|
|
||
|
Another solution would be to change the way stack allocation is done
|
||
|
(i.e. don't preallocate the entire address range) but I think it is
|
||
|
important to be able to efficiently support such large, spare ranges
|
||
|
that might show up in other applications (e.g. a randomly accessed
|
||
|
large mapped file).
|
||
|
|
||
|
2. Bit operations (i.e. ~,&,|) are more efficient than bitfields.
|
||
|
|
||
|
This is a 68k/gcc issue, but if you are trying to squeeze out maximum
|
||
|
performance...
|
||
|
|
||
|
3. Don't flush TLB/caches for inactive mappings.
|
||
|
|
||
|
On the hp300 the TLBs are either designed as, or used in such a way that,
|
||
|
they are flushed on every context switch (i.e. there are no "process
|
||
|
tags") Hence, doing TLB flushes on mappings that aren't associated with
|
||
|
either the kernel or the currently running process are a waste. Seems
|
||
|
pretty obvious but I missed it for many years. An analogous argument
|
||
|
applies to flushing untagged virtually addressed caches (ala the 320/350).
|