put in the right spot. The 'next' link in the new entry must become globally
visible before the list head is updated. This could have affected systems
with weak memory ordering like the alpha.
mandatory. Remove the 4BSD run queue code. Effects:
- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant peformance boost on some workloads.
Discussed on tech-kern@.
- Use atomic ops directly, since rwlocks work the same way on all platforms.
- Try to make it a bit more cache efficient, and use branch hints.
- Fix a bug in rw_downgrade() where the turnstile lock was not released.
- Remove a couple of redundant assertions.
- Use atomic_swap instead of atomic_cas where it's safe to do so.
- After acquiring the turnstile lock in rw_vector_enter, check if the
owner is running again and spin if so.
- Introduce and use rw_onproc() instead of abusing mutex_onproc().
- Change the handoff/release algorithm to reduce the window when a rwlock
can held, but the owner not on a CPU.
- percpu_getptr() is now called percpu_getref() and implicitly disables
preemption (via crit_enter()) when it is called.
- Added percpu_putref() which implicitly reenables preemption (via
crit_exit()).
PRI_KTHREAD range. This is kind of ugly, but needed because of direct handoff
with rwlocks, and because threads that block holding a mutex regularly hold
other locks/resources.
Problem addressed: priority lending works well where a thread blocking on a
turnstile has a high priority level (eg realtime). For timeshared threads
(low priority) it's unlikely to have much effect. In the latter case threads
awoken from a turnstile can and do compete for CPU time with regular waits
like disk I/O. On MP systems this can result in a feedback loop where
threads cannot quickly get access to a resource held by a thread waking from
a turnstile. The waking thread eventually runs when enough of the other
threads block waiting for it, freeing up the CPU. The end result is a lot of
idle time during builds.
there are no waiters. This gives a major boost to build.sh on larger
systems as directory vnode locks are exclusive for lookup, but are often
only held for a very short period of time.
This change has the potential to more readily expose lock order reversals
and other types of deadlock.
tech-kern last year. Instead of modifying callout_stop, add a new
routine (callout_halt) which will sleep if the callout is already in
flight. Note that if a callout can take locks, the caller of callout_halt
must not hold any of those locks - otherwise the two could deadlock.
we have unlocked the kqueue to check its state, leave it queued and
re-check later.
- knote_dequeue: fold into knote_detach since nothing else uses it.
- Note a couple more problems.
buf_memcalc is used by bootstrap as well. fix NULL dereference for them.
- limit kva usage for each cache to 20% of vm_map. XXX a bit arbitrary.
- add a comment.
testing by rmind@ and myself.
Which approach to use is still being discussed, but I would like to get
this out of my working tree. If we decide to use a different approach
there is no problem with revisiting this.
- Redo reference counting to be sane. LWPs accessing files take a short
term reference on the local file descriptor. This is the most common
case. While a file is in a process descriptor table, a reference is
held to the file. The file reference count only changes during control
operations like open() or close(). Code that comes at files from an
unusual direction (i.e. foreign to the process) like procfs or sysctl
takes a reference on the file (f_count), and not on a descriptor.
- Remove knowledge of reference counting and locking from most code that
deals with files.
- Make the usual case of file descriptor lookup lockless.
- Make kqueue MP and MT safe. PR kern/38098, PR kern/38137.
- Fix numerous file handling bugs, and bugs in the descriptor code that
affected multithreaded processes.
- Split descriptor system calls out into sys_descrip.c.
- A few stylistic changes: KNF, remove unused casts now that caddr_t is
gone. Replace dumb gotos with loop control in a few places.
- Don't do redundant pointer passing (struct proc, lwp, filedesc *) unless
the routine is likely to be inlined. Most of the time it's about the
current process.
This is for netsmb which wants to poll sockets directly.
- When polling a socket, first check for pending I/O without acquring any
locks. If no I/O seems to be pending, acquire locks/spl and check again
doing selrecord() if necessary.
value, which can be overwritten with syscalls.conf defines (just like
sys_nosys).
This fix a problem where rump awk variables are set to 0 value,
leading to the creation of an unexpected file with that name.
ok by pooka.
This make uid_find(), chgproccnt(), chgsbsize() and lf_alloc(), lf_free()
functions lock-less.
- Increase the size of uihashtbl in case of MP system, as suggested by <ad>.
- Add HASH_SLIST type for hashinit().
Reviewed by <ad>.
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.
Improve PMF-ability.
Add a 'flags' argument to suspend/resume handlers and
callers such as pmf_system_suspend().
Define a flag, PMF_F_SELF, which indicates to PMF that a
device is suspending/resuming itself. Add helper routines,
pmf_device_suspend_self(dev) and pmf_device_resume_self(dev),
that call pmf_device_suspend(dev, PMF_F_SELF) and
pmf_device_resume(dev, PMF_F_SELF), respectively. Use
PMF_F_SELF to suspend/resume self in ath(4), audio(4),
rtw(4), and sip(4).
In ath(4) and in rtw(4), replace the icky sc_enable/sc_disable
callbacks, provided by the bus front-end, with
self-suspension/resumption. Also, clean up the bus
front-ends. Make sure that the interrupt handler is
disestablished during suspension. Get rid of driver-private
flags (e.g., RTW_F_ENABLED, ath_softc->sc_invalid); use
device_is_active()/device_has_power() calls, instead.
In the network-class suspend handler, call if_stop(, 0)
instead of if_stop(, 1), because the latter is superfluous
(bus- and driver-suspension hooks will 'disable' the NIC),
and it may cause recursion.
In the network-class resume handler, prevent infinite
recursion through if_init() by getting out early if we are
self-suspending (PMF_F_SELF).
rtw(4) improvements:
Destroy rtw(4) callouts when we detach it. Make rtw at
pci detachable. Print some more information with the "rx
frame too long" warning.
Remove activate() methods:
Get rid of rtw_activate() and ath_activate(). The device
activate() methods are not good for much these days.
Make ath at cardbus resume with crypto functions intact:
Introduce a boolean device property, "pmf-powerdown". If
pmf-powerdown is present and false, it indicates that a
bus back-end should not remove power from a device.
Honor this property in cardbus_child_suspend().
Set this property to 'false' in ath_attach(), since removing
power from an ath at cardbus seems to lobotomize the WPA
crypto engine. XXX Should the pmf-powerdown property
propagate toward the root of the device tree?
Miscellaneous ath(4) changes:
Warn if ath(4) tries to write crypto keys to suspended
hardware.
Reduce differences between FreeBSD and NetBSD in ath(4)
multicast filter setup.
Make ath_printrxbuf() print an rx descriptor's status &
key index, to help debug crypto errors.
Shorten a staircase in ath_ioctl(). Don't check for
ieee80211_ioctl() return code ERESTART, it never happens.
going through a syscall trap. These are currently useful for rumps.
As all the standard syscalls are not compiled into librump, mark
relevant ones with RUMP in syscalls.master. To do e.g. a mkdir
"system call" from a rump, one would call
rump_sys_mkdir("/dir", mode, &eval);
where the last value represents something to store errno into.
never sleep. Should fix PR/37245 by <yamt>.
- Fix a regression - dissalow catching of bound threads. Also, allow
migration of non-bound kthreads, this restriction seems pointless.
- Few micro-optimisations, misc.
Add a device iterator object, deviter_t, and methods deviter_init(),
deviter_first(), and deviter_next() for visiting each device in
the device tree.
Take care not to re-shutdown a device in the event that the machine
panics during reboot and the operator types 'reboot' at the kernel
debugger prompt.
While I'm here, sprinkle PMF_FN_ARGS, PMF_FN_PROTO, et cetera.
device_private(NULL). That will ease the conversion of drivers to splitted
softc/device_t which is mandatory for cube-autoconf and will be done in
HEAD.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
Make the FreeBSD and Linux compat code convert the parameters to their
native representation and call the native routines.
Remove KAUTH_PROCESS_SCHEDULER_GET/SET.
Update documentation and examples.
XXX: For now, only the Linux compat code does the priority conversion
XXX: right.
Linux priority conversion code from yamt@, thanks!
Okay yamt@.
can register a shutdown handler explicitely.
Install a pci bus shutdown handler which disables bus master accesses
for all childs, so the drivers don't need to care.
This will hopefully be sufficient to replace the shutdownhooks
(together with the powerhooks). (It has been suggested to use some
general event notification framework for shutdown handlers, but there
might be cases where shutdown handlers must be run in an order following
the device hierarchy, which wouldn't be easy with event handlers
not tied to drivers.)
approved by David Young
instead of doing it in syscall_intern().
Note that syscall_intern() must still be called when the flags change
since many ports use a different copy of the syscall entry code when
tracing is enabled.
too much in the RB_HALT case, making the "press a key to reboot" prompt
a bad joke. doshutdownhooks() should do shutdownhooks, not more.
Since it is md code which decides about halt/poweroff etc,
pmf_system_shutdown() should be called from there if appropriate.
respect the kernel device tree. (It is arguably ugly to special-case
wscons here, but as long as there is only one driver to be dealt with
it is not worth to introduce another set of hooks.)
Resume the X server at the end of resume, if everything went well.
Acquire the big KERNEL_LOCK before the device tree is walked on
suspend, until after the walk on resume. This is needed to avoid
device accesses by secondary CPUs, and it effectively keeps user
programs from interfering with the suspend process. This might be
revisited when all drivers are using private locks for MP-safeness
(but FreeBSD still does the same afaics).
It should be unnecessary now to switch secondary CPUs offline in
the powerd suspend script.
When returning a "close" symbol, make sure the value being searched for is
within the symtab. This prevents ddb matching addresses beyond the end of
the kernel.
words, don't pass an action and a request, and just use a single action to
indicate what is the operation in question.
This is the first step in fixing PR/37986, which calls for policy/priority
checking in the secmodel code. Right now we're lacking room for another
parameter required to make a decision, and this change makes room for such.
- revert struct sched_param changes to restore ABI.
- instead, add 'policy' arguments to _sched_{get,set}param syscalls.
(this is an API/ABI change.)
- correct kauth_authorize_process arguments.
while i'm here,
- don't bother to kmem_alloc for 4-byte structure.
Instead of passing the (un)real system call code and syscall table pointer,
just pass the number of arguments - which is what ktrace really wants.
Ride forthcoming 4.99.53
by config_detach_children(), because the latter can work recursively
and remove any number of devices, so rewrite config_detach_children()
to restart list traversal after each call of config_detach(), and since
only one user of device_foreach_child() is left (in kern_drvctl.c),
and it is simpler to open-code the loop than to deal with callbacks,
just remove it.
Fixes the problems with systems having > 2GB of memory;
From <drochner>, thanks for catching this!
- Convert pool to pool-cache;
- Adjust copyright while here;
- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
in-kernel priority is used. Reported by <drochner>.
Minor fixes for scheduling calls to conform the POSIX:
- If pid is equal to zero, use the calling process;
- In case of permission problem, return EPERM instead of EACESS;
- sched_setscheduler() should return previously used policy;
- pthread_* calls should return the error code or zero;
Should fix the namespace problems (and builds of some packages):
- Move cpuset_t defintion from pset.h to sched.h;
- Remove the #include of pset.h in pthread.h;
described in PR kern/37808. The ideal solution here is to kill vnode
lock recursion, which should not be hard once it is understood what
the two remaining callers of vn_setrecurse() are doing.
shutdown). There are still problems with device access and a PR will be
filed.
- Kill checkalias(). Allow multiple vnodes to reference a single device.
- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.
- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.
- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.
- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.
- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).
- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.
- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.
- Make bsd44 secmodel code handle the newly added rqeuests appropriately.
All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.
- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.
Discussed with christos@ and yamt@.