The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
tech-kern:
- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
VOP_STRATEGY() no longer gets called from interrupt context via
biodone() -> sw_reg_iodone() -> sw_reg_start().
Removes a deadlock condition reported in PR 37109.
Ok: YAMAMOTO Takashi <yamt@netbsd.org>
use UVM_UNKNOWN_OFFSET in the call to uvm_map_prepare.
This fixes a '"panic: malloc: out of space in kmem_map" when it's not
really' testcase of mine, and one reported to me by chuq. This is likely
to fix PR/35587 as well.
Looks/seems fine to me from chuq and yamt. Thanks.
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.
Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.
reviewed by tech-kern & wrstuden
WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue()
to assign a CPU might be used. Notes:
- For now, the list is used for workqueue_queue, which is non-optimal,
and will be changed with array, where index would be CPU ID.
- The data structures should be changed to be cache-friendly.
Reviewed by: <yamt>, <tech-kern>
from doc/BRANCHES:
idle lwp, and some changes depending on it.
1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
Bug fixes:
- Fix crash reported by Scott Ellis on current-users@.
- Fix race conditions in enforcing the Veriexec rename and remove
policies. These are NOT security issues.
- Fix memory leak in rename handling when overwriting a monitored
file.
- Fix table deletion logic.
- Don't prevent query requests if not in learning mode.
KPI updates:
- fileassoc_table_run() now takes a cookie to pass to the callback.
- veriexec_table_add() was removed, it is now done internally. As a
result, there's no longer a need for VERIEXEC_TABLESIZE.
- veriexec_report() was removed, it is now internal.
- Perform sanity checks on the entry type, and enforce default type
in veriexec_file_add() rather than in veriexecctl.
- Add veriexec_flush(), used to delete all Veriexec tables, and
veriexec_dump(), used to fill an array with all Veriexec entries.
New features:
- Add a '-k' flag to veriexecctl, to keep the filenames in the kernel
database. This allows Veriexec to produce slightly more accurate
logs under certain circumstances. In the future, this can be either
replaced by vnode->pathname translation, or combined with it.
- Add a VERIEXEC_DUMP ioctl, to dump the entire Veriexec database.
This can be used to recover a database if the file was lost.
Example usage:
# veriexecctl dump > /etc/signatures
Note that only entries with the filename kept (that is, were loaded
with the '-k' flag) will be dumped.
Idea from Brett Lymn.
- Add a VERIEXEC_FLUSH ioctl, to delete all Veriexec entries. Sample
usage:
# veriexecctl flush
- Add a 'veriexec_flags' rc(8) variable, and make its default have
the '-k' flag. On systems using the default signatures file
(generaetd from running 'veriexecgen' with no arguments), this will
use additional 32kb of kernel memory on average.
- Add a '-e' flag to veriexecctl, to evaluate the fingerprint during
load. This is done automatically for files marked as 'untrusted'.
Misc. stuff:
- The code for veriexecctl was massively simplified as a result of
eliminating the need for VERIEXEC_TABLESIZE, and now uses a single
pass of the signatures file, making the loading somewhat faster.
- Lots of minor fixes found using the (still under development)
Veriexec regression testsuite.
- Some of the messages Veriexec prints were improved.
- Various documentation fixes.
All relevant man-pages were updated to reflect the above changes.
Binary compatibility with existing veriexecctl binaries is maintained.
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a processes emulation root is saved in the cwdi structure
during process exec.
If the emulation root the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root dircetory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
with newlock2 merge:
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.
Restores source compatibility with pre-newlock2 tools like ps or top.
Reviewed by Andrew Doran.
- Don't consider kernel threads when calculating the load average. Their
priorities are no longer adjusted by the scheduler, and their level of
activity is dependent upon running user processes.
- Change the (l->l_priority > PZERO) check in uvm_meter() to (l->l_flag &
L_SINTR). I think this check was originally intended to weed out
processes sleeping interruptably.
by Slava Semushin <slava.semushin@gmail.com>.
To verify that no nasty side effects of duplicate includes (or their
removal) have an effect here, I've compiled an i386/ALL kernel with
and without the patch, and the only difference in the resulting .o
files was in shifted line numbers in some assert() calls.
The comparison of the .o files was based on the output of "objdump -D".
Thanks to martin@ for the input on testing.
- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
dumpdev. this occurs when we try to set the dumpdev to a device
with no driver loaded. this fixes PR#34872.
in sys_swapctl, if bdevsw_lookup() fails, set dumpdev = NODEV
before calling cpu_dumpconf(). (this also fixes PR#34872.)
XXX: cpu_dumpconf() should probably be changed to take a dumpdev
XXX: and return an error in such cases, but that is a much more
XXX: intrusive change.
XXX2: this is only run-tested on sparc64 and compile tested on a
XXX2: couple of platforms.
- Add a few scopes to the kernel: system, network, and machdep.
- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.
- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.
- Update all relevant documentation.
- Add some code and docs to help folks who want to actually use this stuff:
* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.
This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...
* And of course, documentation describing how to do the above for quick
reference, including code samples.
All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:
http://kauth.linbsd.org/kauthwiki
NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:
- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.
(or if you feel you have to, contact me first)
This is still work in progress; It's far from being done, but now it'll
be a lot easier.
Relevant mailing list threads:
http://mail-index.netbsd.org/tech-security/2006/01/25/0011.htmlhttp://mail-index.netbsd.org/tech-security/2006/03/24/0001.htmlhttp://mail-index.netbsd.org/tech-security/2006/04/18/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/05/15/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/01/0000.htmlhttp://mail-index.netbsd.org/tech-security/2006/08/25/0000.html
Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stablizing kauth(9).
Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.
Happy birthday Randi! :)
intervened by truncation.
it also fixes a deadlock. (g_glock vs pages locking order)
- uvm_vnp_setsize: modify v_size while holding v_interlock.
reviewed by Chuck Silvers.
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to lazyness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
code and not trying to use temporary solutions.
Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)
Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().
Next time I suggest to commit a temporary solution just revoke my
commit bit.
* uvm_loanzero may call uvm_analloc which will return with anon->an_lock
locked. This lock is never dropped by uvm_loanzero and AFAICS the caller
doesn't drop it either.
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.
PR/28372, PR/32665 (Alan Barrett).