it's not on a free list.
Also change buf_init() to not automatically mark buffers `busy' since this
only makes sense for bufcache buffers.
Mark all buf_init'd buffers 'busy' in the places where they ought to be
flagged as such, so as not to confuse the buffer cache.
Fixes PR 38923.
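A minimal sketch of the call-site pattern this implies (BC_BUSY and the
b_cflags field are from the buffer cache of that era; treat the fragment as
illustrative, not a copy of the commit):

    /* A buffer cache call site now marks its buffer busy itself,
     * since buf_init() no longer does so. */
    buf_init(bp);
    bp->b_cflags |= BC_BUSY;    /* only buffer cache buffers want this */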
Assume that a vnode (and associated data structures) costs 2kB in the
worst imaginable case. Don't allow sysctl to set desiredvnodes to a
value that would use more than 75% of KVA or 75% of physical memory.
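A minimal sketch of the implied limit check (function and parameter names
are illustrative, not the real sysctl handler):

    #include <sys/types.h>
    #include <sys/errno.h>

    #define VNODE_COST 2048     /* assumed worst case: 2kB per vnode */

    static int
    desiredvnodes_check(u_long newval, u_long kva_bytes, u_long phys_bytes)
    {
        u_long kva_max  = kva_bytes  / 4 * 3 / VNODE_COST;
        u_long phys_max = phys_bytes / 4 * 3 / VNODE_COST;

        if (newval > kva_max || newval > phys_max)
            return EINVAL;      /* would exceed 75% of KVA or RAM */
        return 0;
    }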
- Back out the previous workaround now that the sleep queue code has
been changed to never let the queue become empty if there are valid
waiters.
- Use sleepq_hashlock() to improve clarity.
- Sprinkle some assertions.
sleepq_changepri, sleepq_lendpri: don't let an active sleep queue head become
empty. The condvar code inspects the queue head without holding the sleep
queue lock and needs to see a non-empty queue if there are waiters.
PR kern/38761.
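A minimal sketch of one way to keep the invariant (the sole-waiter test is
an assumption about the fix, not a copy of it):

    /* If the LWP is the only waiter its position cannot change, so skip
     * the unlink/re-insert; the queue head then never passes through an
     * empty state that an unlocked reader could observe. */
    void
    sleepq_changepri(lwp_t *l, pri_t pri)
    {
        sleepq_t *sq = l->l_sleepq;

        KASSERT(lwp_locked(l, NULL));
        l->l_priority = pri;
        if (TAILQ_FIRST(sq) == l && TAILQ_NEXT(l, l_sleepchain) == NULL)
            return;             /* sole waiter: nothing to re-sort */
        TAILQ_REMOVE(sq, l, l_sleepchain);
        sleepq_insert(sq, l, l->l_syncobj);
    }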
The condvar must access the sleepq with the sleepq lock held; not doing
so causes inconsistent sleepq state to be read.
This is because some accesses to the sleepq don't come via the cv code,
but call directly into sleepq_changepri and sleepq_lendpri, which take
the sleepq lock and remove then re-insert lwps into the sleepq.
Running a build.sh with -j8 now completes on my quad-core, also tested by
Simon@ on an 8-core server and matt@ on a quad-core.
I believe there is room to be more efficient with this, as we now take the
sleepq lock for all cv_broadcast and cv_signal calls. I'll look into this
and post a diff to tech-kern.
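A minimal sketch of the locked access pattern (CV_SLEEPQ() is an
illustrative accessor; sleepq_hashlock() and sleepq_wake() are used as in
the sleepq API of that era, though the exact body is a guess):

    void
    cv_broadcast(kcondvar_t *cv)
    {
        sleepq_t *sq = CV_SLEEPQ(cv);       /* illustrative accessor */
        kmutex_t *mp = sleepq_hashlock(cv); /* returns the lock held */

        sleepq_wake(sq, cv, (u_int)-1, mp); /* wakes all waiters, drops mp */
    }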
reference to the PMF private data until the private data has no
more waiters. This protects against a NULL dereference.
In device_pmf_lock1(), test a device_t for PMF registration before
dereferencing its PMF private data.
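A minimal sketch of the test (device_pmf_is_registered() is the existing
predicate; the function body here is illustrative):

    static bool
    device_pmf_lock1(device_t dev)
    {
        if (!device_pmf_is_registered(dev))
            return false;   /* no PMF state: nothing to dereference */
        /* ... safe to use the PMF private data from here on ... */
        return true;
    }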
use both types of list.
- Make page coloring and idle zero state per-CPU.
- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@. (A sketch follows this list.)
- Don't use goto in critical paths; it can confuse the compiler.
- Sprinkle some branch hints.
- Make namecache stats per-CPU and collate once per second.
- Use vtryget().
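A minimal sketch of the per-CPU freelist policy from the second item above
(the structure member and fallback function are illustrative):

    struct vm_page *
    percpu_pagealloc(struct uvm_cpu *ucpu)
    {
        struct vm_page *pg;

        /* Prefer the local CPU's free pages... */
        if ((pg = LIST_FIRST(&ucpu->pagefree)) != NULL) {
            LIST_REMOVE(pg, pageq);
            return pg;
        }
        /* ...fall back to the global freelist, as done before. */
        return global_pagealloc();
    }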
to scale more gracefully when there are thousands of active threads.
Proposed on tech-kern@.
- Use LOCKDEBUG to catch some errors in the use of condition variables:
freeing an active CV
re-initializing an active CV
using multiple distinct mutexes during concurrent waits
not holding the interlocking mutex when calling cv_broadcast/cv_signal
waking waiters and destroying the CV before they run and exit it
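For example, a LOCKDEBUG kernel can now flag the fourth error in the list
above (illustrative fragment):

    /* Waiter: correct, the interlock is held across the wait. */
    mutex_enter(&mtx);
    while (!condition)
        cv_wait(&cv, &mtx);
    mutex_exit(&mtx);

    /* Waker: bug -- the interlocking mutex is not held, so the wakeup
     * can race with the waiter's check.  LOCKDEBUG now catches this. */
    cv_broadcast(&cv);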
- Tweak it so it can also catch common errors with condition variables.
The change to kern_condvar.c is not included in this commit and will
come later.
- Don't call kmem_alloc() if operating in interrupt context, just fail
the allocation and disable debugging for the object. Makes it safe
to do mutex_init/rw_init/cv_init in interrupt context, when running
a LOCKDEBUG kernel.
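A minimal sketch of the guard (the helper is illustrative, not the real
lockdebug bookkeeping; cpu_intr_p() is the standard interrupt-context test):

    static struct lockdebug *
    lockdebug_info_alloc(void)
    {
        if (cpu_intr_p())
            return NULL;    /* no kmem_alloc() in interrupt context;
                             * caller disables debugging for the object */
        return kmem_alloc(sizeof(struct lockdebug), KM_SLEEP);
    }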
* NOTE: if you get a panic in this code block, it is likely that
* a lock has been destroyed or corrupted while still in use. Try
* compiling a kernel with LOCKDEBUG to pinpoint the problem.
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.
Closes PR/38169, idea of migration via idle loop by Andrew Doran.
NetBSD's shutdown behavior of more than 6 years before rev 1.176.
Ok joerg@.
It is essential that we restore some hardware to initial conditions
before rebooting, in order to avoid interfering with the BIOS
bootstrap. For example, if NetBSD gives control back to the Soekris
comBIOS while the kernel text is write-protected, the BIOS bootstrap
hangs during the power-on self-test, "POST: 0123456789bcefghip".
In principle, bus masters can also interfere with BIOS boot.
- Fix performance regression introduced by the workaround by making job
stealing a lot simpler: if the local run queue is empty, let the CPU enter
the idle loop. In the idle loop, try to steal a job from another CPU's run
queue. If we succeed, re-enter mi_switch() immediately to dispatch the job.
- When stealing jobs, consider a remote CPU to have one less job in its
queue if it's currently in the idle loop. It will dispatch the job soon,
so there's no point sloshing it about.
- Introduce a few event counters to monitor what's happening with the run
queues.
- Revert the idle CPU bitmap change. It's pointless considering NUMA.
Fail sched_catchlwp() if mutex_tryenter() on the remote CPU's state fails.
Seems to work around the issue described in this PR.
XXX Stealing jobs from remote CPUs could probably be moved into the idle
loop, making the locking quite a bit simpler.
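A minimal sketch combining both points: the idle loop calls in to steal,
and sched_catchlwp() backs off rather than spinning on the remote CPU's
lock (the body is illustrative):

    lwp_t *
    sched_catchlwp(struct cpu_info *ci)
    {
        lwp_t *l = NULL;

        if (!mutex_tryenter(ci->ci_schedstate.spc_mutex))
            return NULL;    /* remote run queue busy: stay idle */
        /* ... detach a migratable LWP from ci's run queue into l ... */
        mutex_exit(ci->ci_schedstate.spc_mutex);
        return l;           /* non-NULL: re-enter mi_switch() to run it */
    }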
- Do timeslicing for SCHED_RR threads. At ~16Hz it's too slow but better
than nothing. XXX
- If a SCHED_OTHER thread has hogged the CPU for 1/8s without taking a
trip through mi_switch(), try to force a kernel preemption to give other
threads a chance.
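A minimal sketch of the hog check in the clock tick path (the constants
match the text above; field and flag names are illustrative):

    void
    sched_tick(struct cpu_info *ci)
    {
        struct schedstate_percpu *spc = &ci->ci_schedstate;
        lwp_t *l = ci->ci_data.cpu_onproc;

        /* SCHED_OTHER LWP on the CPU for ~1/8s without an mi_switch()? */
        if (l->l_class == SCHED_OTHER && ++spc->spc_ticks >= hz / 8)
            cpu_need_resched(ci, RESCHED_KPREEMPT); /* illustrative flag */
    }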
run through copy-on-write. Call fscow_run() with valid data where possible.
The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.
- Add a flag B_MODIFY to bread(), breada() and breadn(). If set, the caller
intends to modify the buffer returned.
- Always run copy-on-write on buffers returned from ffs_balloc().
- Add a new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer, and runs copy-on-write. Possible errors from
getblk() or fscow_run() are processed. Part of PR kern/38664.
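A usage sketch of the new flag (assuming the bread() signature of that era,
with a cred argument; details may have shifted since):

    /* Announce the intent to modify, so copy-on-write runs before the
     * caller dirties the buffer. */
    error = bread(vp, lbn, bsize, NOCRED, B_MODIFY, &bp);
    if (error != 0)
        return error;
    /* ... modify bp->b_data, then bwrite(bp) or bdwrite(bp) ... */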
Welcome to 4.99.63
Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.
Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.
As a consequence, most of the file systems can now be loaded as new style
modules.
Quick sanity check by ad@.
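A minimal sketch of the new-style declaration that replaces the linkset
(using ffs as the example; the default error code is illustrative):

    MODULE(MODULE_CLASS_VFS, ffs, NULL);

    static int
    ffs_modcmd(modcmd_t cmd, void *arg)
    {
        switch (cmd) {
        case MODULE_CMD_INIT:
            return vfs_attach(&ffs_vfsops);  /* hooks registered dynamically */
        case MODULE_CMD_FINI:
            return vfs_detach(&ffs_vfsops);
        default:
            return ENOTTY;
        }
    }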
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
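A minimal sketch of the update-side ordering just described (the helper
names are illustrative):

    static int
    mount_update_enter(struct mount *mp)
    {
        /* Read-hold mnt_unmounting first... */
        if (!rw_tryenter(&mp->mnt_unmounting, RW_READER))
            return EBUSY;   /* unmount in progress */
        /* ...then serialize against other updaters. */
        mutex_enter(&mp->mnt_updating);
        return 0;
    }

    static void
    mount_update_exit(struct mount *mp)
    {
        mutex_exit(&mp->mnt_updating);
        rw_exit(&mp->mnt_unmounting);
    }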
One effect of this change: previously if an unmount failed, we would make a
half-hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now, while an unmount is in progress (even one that
will ultimately be aborted), new file operations within the mount will fail
instead of being delayed. That is unlikely to be a problem though, because if
the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
- sys_sync: acquire a write lock on the mount since the operation modifies
the mount structure.
- sys_fchdir: use vfs_trybusy(). If an unmount is in progress, just fail it.
- vnode and cwd locks were being taken with proc_lock held, which is bad
because proc_lock can only be held for a short period of time.
- Processes could have continually forked and escaped notice, keeping
a reference to the old directory on top of which a new mount exists.