The recommended workaround is a 5-10ms delay before and after accesses.
Therefore, move the affected bus_space_* operations from bus.h to bus.c
and special-case MACE accesses.
CRIME accesses are not affected, so introduce SGIMIPS_BUS_SPACE_CRIME and
use it as the CRIME tag.
My ip32 seems a little bit happier with this change, and my ip22 didn't
notice the change.
- Document the kernel thread.
- Rename some functions and variables.
- Return EROFS where appropriate.
- Use shifts instead of 64-bit divide.
- Use a simple_lock to make it MP-safe.
- Add M_CANFAIL to malloc to avoid panic on large cluster size.
- Allow sparse file for backing store and use VOP_BALLOC() to allocate
space. Default size of backing store is the size of the file system.
8-bit VIDC audio. Both Richard Earnshaw and I had guessed that a set
bit was positive (the same as normal mu-law), but the AudioWorks
manual, and Sound_SoundLog on RISC OS, seem to disagree. Change
MULAW_TO_VIDC to match Sound_SoundLog, since the latter is probably
definitive.
suspending.
Move vfs_write_suspend() and vfs_write_resume() from kern/vfs_vnops.c
to kern/vfs_subr.c.
Change vnode write gating in ufs/ffs/ffs_softdep.c (from FreeBSD).
When vnodes are throttled in softdep_trackbufs() check for
file system suspension every 10 msecs to avoid a deadlock.
Collapse the related variables down to zero. That means 'flags' is 0
as well. Nuke the extraction macros, a bunch of the variables, and replace
'flags' as well.
it can be used to clear the work queue.
Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.
Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)
From FreeBSD.
when the link condition changes by returning.
Note that hostap mode still does not work in atw, and ADMtek has
told me that the hardware will not support it, but I remain hopeful.
The compiler already knew that these chunks of code
could never be reached (since lu_flag was always 0), so it
already ignored them.
No functional changes.
rf_enableAtomicRMW changes.]
Cleanup rf_enableAtomicRMW and its use. According to the comments, we
can't set this to anything other than zero anyway. Shaves off another
900 bytes. lu_flag's days are numbered now, as are the middle
parameters of RF_CREATE_PARAM3.
can't set this to anything other than zero anyway. Shaves off another
900 bytes. lu_flag's days are numbered now, as are the middle
parameters of RF_CREATE_PARAM3.
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
idle pool pages to be returned to the system immediately upon becoming
de-fragmented.
Also, in pool_do_put(), don't free back an idle page unless we are over
our minimum page claim.
this usually isn't necessary since we freed it earlier in pmap_remove_all(),
but since uvmspace_free() is now called in the context of the exiting
process, a new context might be allocated if uvm_unmap_detach() decides
to sleep (since cpu_switch() will allocate a new context when it switches
back to the exiting process).
cache) from 30% to 20%. This seems to significantly smooth the oscillation
between "almost no memory available" and "UVM free target available" caused
by the current sudden, heavy backpressure on the metadata cache. We should
revisit this again once the backpressure mechanism is better tuned; ideally,
the hard limit should almost never come into play, because the metadata
cache should gradually give back pages as buffers hit the AGE list and as
the page cache demands them, rather than giving back a big slug of pages
all at once when UVM decides it's in a hurry and fires off the page daemon.
Just how well this adjustment works is likely to vary significantly from
machine to machine depending on I/O mix, filesystem frag size, and total
memory. However, 20% seems to be quite a bit better than 30% on several
systems I've tested and is, coincidentally, more than enough to cache
the entire metadata working set of the AnonCVS server with 100 clients,
which is a useful worst-case stake in the ground...
0.5%, based on some quick measurements on a number of workstations and
small fileservers (including my home fileserver running simultaneous
builds of the NetBSD source tree and several NetBSD kernels). This
brings the hit rate on my machines from below 70% to above 90%. We
should be able to tune this as we run, by tracking the hit rate and
increasing the size of the cache if memory permits.
Some systems will still require significantly larger cache sizes. Some
ports -- notably the 64-bit ones -- probably should use more than 1% of
physmem as the default due to the larger size of struct vnode.
buf_mrelease(). Without this, though the pages are returned to the
relevant *pool*, they are never available for any other use in the
system.
Now the backpressure on the physical size of the buffer cache through
the buf_drain() call in the pagedaemon works correctly. If anything,
it may be a bit more aggressive than intended. On my 256MB system,
with vm.bufcache set to the default 30% of physmem, a kernel with this
fix can do 5 simultaneous config/makedep/builds of different NetBSD
kernels in 1313 seconds; with the "traditional" buffer cache code it
requires 1320 seconds. Running "find / -type d -exec ls -l {}" while
the build is going demonstrates that the backpressure is working
correctly: free memory oscillates slowly between close to none and
the UVM target free, and vmstat -m shows a large number of releases
for the buffer pools.
For future work: how is "bufpl" memory returned to the system? This
is not obvious to me (I must be looking in the wrong place). Also,
buf_mrelease() is also called from brelse() in some cases. Would it
be better to add a pool flag causing automatic release of full pages
as they become available (not fragmented)? Jason Thorpe proposed this
and it seems more elegant than cleaning the _entire_ pool only upon
memory pressure.
Greg Oster did a lot of the work of figuring this out. Jason proposed
the use of pool_reclaim as a way to fix it.
are.
Cut SYSCTL_DEFSIZE in half, which results in roughly a 40% improvement
in memory usage by sysctlnodes (on my laptop), along with a dozen
extra calls to sysctl_realloc() during kernel bootstrap (which no one
should notice anyway).
This makes it possible to define header files on the command line that
might include ${MACHINE} somewhere in the path. This might be used in
evbppc, for example, when defining PPC_PCI_MACHDEP_IMPL as, for example:
PPC_PCI_MACHDEP_IMPL="<arch/evbppc/sandpoint/pci_machdep.h>"
which will be included as
#include PPC_PCI_MACHDEP_IMPL
Prior to this change, the compile would fail trying to include
<arch/evbppc/1/pci_machdep.h>
Split the sysctl setup routine into two routines, one for each
"subtree". Perhaps it's a little pedantic, but it's cleaner. Also,
assert that the "kern" and "vm" nodes exist.
exit path always calls lwp_exit2()
pointed out Martin Husemann, change reviewed by Chuck Silvers
also update comment with switch_exit() prototype while here
use this in mbus_dmamem_map() to fix corruption of DMA memory.
note that this TLB bit is ignored on some CPUs (PA7100 and probably
others of that era), so this doesn't fix the problem in general,
but it does work on newer models and will make things easier later.
debugging printf, and in rf_netbsdkintf.c. We can do the calculations
inside of RF_DEBUG_RECON for the one debugging printf, and only
perform the percentCompleted calculation "on demand" in the
rf_netbsdkintf.c case. Shaves a few more bytes off an i386 GENERIC
kernel, and ever-so-slightly decreases the amount of work performed
during a reconstruct.
- us/de keymaps based on ite keymaps, converted by Gunther Nikl
(with slight adjustments by me)
- es/fr/sv keymaps similarly based on ite keymaps
- removed the commented out keymaps
- some small fixes to the rest of the existing keymaps
Get rid of the static bus tag, instead move it into the softc.
Update to ThorpeJ's recent variable renaming for ATA things.
De-__P and KNF prototypes, also make attach and probe static.
Add RCSID.
Add a copyright for myself.
is empty besides calling switch_exit(). So, rename switch_exit() to
cpu_exit() and modify the routine to call lwp_exit2() direct.
This saves couple cycles on the exit path.
process context ('reaper').
From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediatelly marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit
uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.
MD process/lwp exit code now always calls lwp_exit2() immediatelly after
switching away from the exiting lwp.
g/c now unneeded routines and variables, including the reaper kernel thread
rf_DecrAccessesCountState wasn't in the correct spot in
RF_AccessState_e. Following up on that has resulted in one other
correction. Changing orderings of these states is tricky, and
shouldn't be attempted without some thorough analysis. For the
changes committed, the following analysis is offerred:
1) RAIDframe uses a little state machine to take care of building,
executing, and processing the DAGs used to direct IO.
2) The rf_DecrAccessesCountState state is handled by the function
rf_State_DecrAccessCount(). The purpose of this state is to
decrement the number of "accesses-in-flight".
3) rf_Cleanup_State is handled by rf_State_Cleanup(). Its job is to
do general cleanup of DAG arrays and any stripe locks.
4) DefaultStates[] in rf_layout.c indicates that the right spot
for rf_DecrAccessesCountState is just before rf_Cleanup_State.
Analysis of code for both states indicates that the order doesn't
matter too much, although rf_State_DecrAccessCount() should probably
take place *after* rf_State_Cleanup() to be more correct.
5) Comments in rf_State_ProcessDAG() indicates that the next state
should be rf_Cleanup_State. However: it attempts to get there by using
desc->state++;
which actually takes it to just rf_DecrAccessesCountState! This turned
out to be OK before, since rf_Cleanup_State would follow right after,
and all would be taken careof (albeit in arguably the "less correct"
order).
6) With the current ordering, if we head directly to rf_Cleanup_State
(as we do, for example, if multiple components fail in a RAID 5 set),
then we'll actually miss going trough rf_DecrAccessesCountState), and
could end up never being able to reach quiescence! Perhaps not too
big of a deal, given that the RAID set is pretty much toast by that
point at which such a drastic state change happens, but might as well
have this correct.
The changes made are:
1) Since having rf_State_DecrAccessCount() come after
rf_State_Cleanup() is just fine, change rf_layout.c to reflect that
rf_DecrAccessesCountState comes after rf_Cleanup_State (i.e. they swap
positions in the state list). This means that going to
rf_Cleanup_State after bailing on a failed DAG access will do all the
right things -- the state will get cleaned up, and then the access
counts will get decremented properly. The comment in
rf_State_ProcessDAG() is now actually correct -- the next state *will*
be rf_Cleanup_State.
2) Move rf_DecrAccessesCountState in RF_AccessState_e to just after
rf_CleanupState. This puts RF_AccessState_e in sync with
DefaultStates[]. Fortunately, these states are rarely referred to by
name, and so this change ends up being mostly cosmetic -- it really
only fixes cleanup behaviour for the recent "Failed to create a DAG"
changes.
with flow control being applied. It is simpler and no more problematic to
accept the data and drop it if we hit a resource limit than to expect the
Bluetooth device to do anything about it (which it won't).
acccesses with addresses shifted by the amount specified in the cookie.
Also make the inclusion of the wscons file the resposibility of whoever
includes files.iomd. (found while attempting to checking riscstation
support into evbarm)
http://www.simtec.co.uk/products/EB7500ATX/
also available with RISC-OS as a RiscStation:
http://www.riscstation.co.uk/html/products.html
This is basic bootstrap with support for ide and networking, currently only
tested with booting from ABLE, and not RISC-OS.
I would have placed it into evbarm, but iomd doesn't appear to use the same
interrupt files as evbarm. I'll check it into here for now, until iomd
uses the common interrupt code.
fit inside one memory chunk.
Leave 1 page before kernel untouched as that's where our initial kernel
stack before we switch to proc0 stack is (fixes boot problems on my Indy
with small kernels).
~forever. This requires a number of things:
1) If we can't create a DAG, set desc->numStripes to 0 in
rf_SelectAlgorithm. This will ensure that we don't attempt to free
any dagArray[] elements in rf_StateCleanup.
2) Modify rf_State_CreateDAG() to not panic in the event of a DAG
failure. Instead, set the bp->b_flags and bp->b_error, and set things
up to skip to rf_State_Cleanup().
3) Need to mark desc->status as "bad" so that we actually stop looking
for a different DAG. (which we won't find... no matter how many times
we try).
4) rf_State_LastState() will then do the biodone(), and return EIO for
the IO in question.
5) Remove some " || 1 "'s from ProcessNode(). These were for
debugging, and we don't need the failure notices spewing
over and over again as the failing DAGs are processed.
6) Needed to change
if (asmap->numDataFailed + asmap->numParityFailed > 1)
to
if ((asmap->numDataFailed + asmap->numParityFailed > 1) ||
(raidPtr->numFailures > 1)){
in rf_raid5.c so that it doesn't try to return
rf_CreateNonRedundantWriteDAG as the creation function.
7) Note that we can't apply the above change to the RAID 1 code as
with the silly "fake 2-D" RAID 1 sets, it is possible to have 2 failed
components in the RAID 1 set, and that would stop them from working.
(I really don't know why/how those "fake 2-D" RAID 1 sets even work
with all the "single-fault" assumptions present in the rest of the
code.)
8) Needed to protect rf_RAID0DagSelect() in a similar way -- it should
return NULL as the createFunc.
9) No point printing out "Multiple disks failed..." a zillion times.
an offset between ss_sp and struct sa_stackinfo_t (located in struct
__pthread_st) when calling sa_register. The kernel increments the
sast_gen counter in struct sastack when an upcall stack is used.
libpthread increments the sasi_stackgen counter in struct
sa_stackinfo_t when an upcall stack is freed. The kernel compares the
two counters to decide if a stack is free or in use.
- add struct sa_stackinfo_t with sasi_stackgen to count stack use in
userland
- add sast_gen to struct sastack to count stack use in kernel
- add SA_FLAG_STACKINFO to enable the stackinfo_offset argument in the
sa_register syscall
- add sa_stackinfo_offset to struct sadata for offset between ss_sp
and struct sa_stackinfo_t
- add ssize_t stackinfo_offset argument to sa_register, initialize
struct sadata's sa_stackinfo_offset from it if SA_FLAG_STACKINFO is
set
- add sa_getstack, sa_getstack0, sa_stackused and sa_setstackfree
functions to find/use/free upcall stacks and use these where
appropriate
- don't record stack for upcall in sa_upcall0
- pass sau to sa_switchcall instead of l2 (l2 = curlwp in sa_switchcall)
- add sa_vp_blocker to struct sadata to pass recently blocked lwp to
sa_switchcall
- delay finding a stack for blocked upcalls to sa_switchcall
- add sa_stacknext to struct sadata pointing to next most likely free
upcall stack; also g/c sa_stackslist in struct sadata and sast_list
in struct sastack
- add L_SA_WOKEN flag: LWP is on sa_woken queue
- add L_SA_RECYCLE flag: LWP should be recycled in sa_setwoken
- replace l_upcallstack with L_SA_WOKEN/L_SA_RECYCLE/L_SA_BLOCKING
flags
- g/c now unused sast_blocker in struct sastack
- make sa_switchcall, sa_upcall0 and sa_upcall_getstate static in
kern_sa.c
- call sa_upcall_userret only once in userret
- split sa_makeupcalls out of sa_upcall_userret and use to process
the sa_upcalls queue
- on process exit: mark LWPs sleeping in saunblock interruptible; also
there are no LWPs sleeping on l->l_upcallstack anymore; also clear
sa_wokenq_head to prevent unblocked upcalls
additional changes:
- cleanup timerupcall sa_vp == curlwp check
- add check in sa_yield if we didn't block on our way here and we
wouldn't any longer be the LWP on the VP
- invalidate sa_vp_ofaultaddr after resolving pagefault