Commit Graph

7582 Commits

Author SHA1 Message Date
hubertf 739e259054 Let kernel build when MALLOCLOG is defined but DIAGNOSTIC is not.
Else, hitmlog() is defined but not used, which triggers a warning.
2010-01-22 08:32:05 +00:00
pgoyette 17d5113226 Remove unnecessary call to kauth_cred_free().
This resolves an occasional crash I'd been experiencing as reported on
current-users@

Fix suggested by and OK elad@
2010-01-21 04:40:22 +00:00
rmind f6d80c92e0 pool_cache_invalidate: comment out invalidation of per-CPU caches (nobody depends
on it, at the moment) until we decide how to fix it (xcall(9) cannot be used from
interrupt context).  XXX: Perhaps implement XC_HIGHPRI.
2010-01-20 23:40:42 +00:00
pooka 654415b2b7 Get rid of last "easy" kernel symbols starting with __:
__assert -> kern_assert
__sigtimedwait1 -> sigtimedwait1
__wdstart -> wdstart1

The rest are MD and/or shared with userspace, so they will require
a little more involvement than what is available for this quick
"ride the 5.99.24 bump" action.
2010-01-19 22:28:30 +00:00
pooka f32c83c1bd Rename a few routines from _file() to _vfs() for consistency.
Ride 5.99.24 bump.
2010-01-19 22:17:44 +00:00
pooka 10fe49d72c Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client.  This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached.  However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff.  ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
2010-01-19 22:06:18 +00:00
dyoung 71080992ef A new survey of the code indicates that the very highest interrupt
priority level where the kernel accesses alldevs is IPL_VM, where
some hardware interrupt handlers call config_deactivate(9).  Lower
the IPL of alldevs_mtx from IPL_HIGH to IPL_VM, accordingly.
2010-01-19 21:54:53 +00:00
dyoung 2905b5fc8d Refactor: as suggested by rmind@, extract duplicate code into
subroutines config_alldevs_enter() and config_alldevs_exit().  This
change amounts to textual substitution.  No functional change intended.

We do not collect garbage in device_lookup(), so there is no use dumping
it: get rid of the garbage list.  Do not call config_dump_garbage().

In device_lookup_private(), call device_lookup() instead of duplicating
the code from device_lookup().
2010-01-19 21:24:36 +00:00
pooka 27d8901688 Update comment: unloaded modules which were pumped up by the
bootloader are not freed at the end of bootstrap (there should be
none, although this is not asserted.  maybe it should be?).
2010-01-19 15:23:14 +00:00
bouyer 85e9e8e2b4 Revert previous. The KASSERT() is right and my analysis is wrong,
as pointed out by pooka@.
2010-01-15 19:28:26 +00:00
pooka 07df6e2689 Fix reference counting for vfsops in mount. Otherwise it's possible
(for an unprivileged user) to force vfs modules to remain loaded
forever.  Also, it's possible for an admin with fat fingers to have
to curse out loud (a lot) and reboot.

.. or at least fix things as much as seems to be possible without
involving 1000 zorkmids.  do_sys_mount() takes either struct vfsops
(which hopefully came properly referenced) or a userspace string
for file system type.  The standard in-kernel calling convention
of "do_sys_mount(l, vfs_getopsbyname("nfs"), NULL," is not to be
considered healthy, kosher, or even tasty (although if vfs_getopsbyname()
fails the whole thing *currently* fails without the program counter
pointing to hyperspace).
2010-01-15 01:00:46 +00:00
bouyer 7ffaf66ccb Remove KASSERT(vp->v_usecount == 1) in getnewvnode() and ungetnewvnode().
Another process could be vget()ing the vnode and bump v_usecount while
getcleanvnode() is vclean()ing it (as vclean drops the interlock).
vget() will then wait for VI_XLOCK or VI_FREEING to clear; and we could test
this assertion while the other process is still sleeping. We could even
end up in ungetnewvnode() before this other process got a chance to run.
2010-01-14 22:41:52 +00:00
mrg efc854cf68 introduce a new function that returns a unique string for each cpu:
char *cpu_name(struct cpu_info *);

and use it when setting up the runq event counters, avoiding an 8 byte
kmem(4) allocation for each cpu.  there are more places the cpuname is
used that can be converted to using this new interface, but that can
and will be done as future work.

as discussed with rmind.
2010-01-13 01:57:17 +00:00
pooka 065afcb61a Minimize unnecessary differences in rump. 2010-01-13 01:53:38 +00:00
rmind 17990e0041 Revert 1.194 rev. 2010-01-12 22:11:13 +00:00
martin 693845d2c3 Add a new optional function device_register_post_config(), symmetric to
device register, called after config is done with a device.
Only used if an arch defines  __HAVE_DEVICE_REGISTER_POSTCONFIG.
2010-01-10 13:42:34 +00:00
rmind a4c32a06f6 softint_overlay: disable kernel preemption before curlwp->l_cpu use. 2010-01-09 19:02:17 +00:00
dyoung cd6e1fbf91 Expand PMF_FN_* macros. 2010-01-08 19:53:10 +00:00
pooka 113544b039 vcount() lost its purpose when opening multiple block devices was
made impossible, oh, two years ago.  nuke it (yes, the interface
name is overgeneric).
2010-01-08 13:07:26 +00:00
rmind 8431ea0b5e softint_execute: release/re-acquire kernel-lock depending on SOFTINT_MPSAFE
flag.  Keeping it held for MP-safe cases breaks the lock order assumptions.
Per discussion with <martin>.
2010-01-08 12:10:46 +00:00
rmind 97bb57c79f Simplify device G/C: use global list and config_alldevs_unlock_gc(). 2010-01-08 12:07:08 +00:00
pooka c3183f3251 The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase).  Plenty of mix'n match upper/lowercase has crept
into the tree since then.  Nuke the macros and convert all callsites
to lowercase.

no functional change
2010-01-08 11:35:07 +00:00
dyoung e94f23b742 Move all copies of ifattr_match() to sys/kern/subr_autoconf.c. 2010-01-08 00:09:44 +00:00
dyoung bb333bec6f Add a do-nothing child-detachment hook, null_childdetached(device_t,
device_t).
2010-01-07 22:39:52 +00:00
pooka 8797d86fd0 Make sure struct vattr contains no random bits of kernel memory
after vattr_null().  This is especially nice considering things
like puffs, where the contents are copied to userspace.
2010-01-07 19:54:40 +00:00
dyoung 3fec0e6fa3 Call device_lookup() from device_lookup_private() instead of
duplicating code.

Per suggestions by rmind@:

Simplify some code that used "empty statements," ";".

Don't collect garbage in device_lookup{,_private}(), since they
are called in interrupt context from certain drivers.

Make config_collect_garbage() KASSERT() that it does not run in
interrupt or software-interrupt context.
2010-01-05 22:42:16 +00:00
skrll 7fe4e16803 Regen. 2010-01-05 15:25:32 +00:00
skrll e359b65038 Check for dev_t and time_t arguments and mark them as 64bit. 2010-01-05 15:23:32 +00:00
uebayasi 80d41370e7 Use CTASSERT() for constant only assertions. 2010-01-04 16:01:42 +00:00
mlelstv c0a2fae3f5 drop __predict micro optimization in pool_init for cleaner code. 2010-01-03 09:42:22 +00:00
mlelstv 0ca557be77 Pools are created way before the pool subsystem mutexes are
initialized.

Ignore also pool_allocator_lock while the system is in cold state.

When the system has left cold state, uvm_init() should have
also initialized the pool subsystem and the mutexes are
ready to use.
2010-01-03 01:07:19 +00:00
mlelstv d5c1a554d8 Move initialization of pool_allocator_lock before its first use.
This failed on archs where a mutex isn't initialized to a zero
value.

Defer allocation of pool log to the logging action, if allocation
fails, it will be retried the next time something is logged.

Clear pool log on allocation so that ddb doesn't crash when showing
so far unused log entries.
2010-01-02 15:20:39 +00:00
tsutsui 9d6449710b Update default TOD value to 2010/01/01 12:00:00. 2010-01-02 10:57:35 +00:00
dholland a4ce70f1ad typo in comment 2010-01-01 03:22:13 +00:00
elad a0c694197e Tiny cosmetics... 2009-12-31 02:20:36 +00:00
rmind ac4dea4ab5 - nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.
2009-12-30 23:54:30 +00:00
rmind 65265dedb7 sched_catchlwp: fix the case when other CPU might see curlwp->l_cpu != curcpu()
while LWP is finishing context switch.  Should fix PR/42539, tested by martin@.
2009-12-30 23:49:59 +00:00
rmind ffb9a7ee3c sigactsunshare(): set reference count in a case of new sigacts allocation.
Bug (e.g. memory leak) can happen when using clone(2) call.
2009-12-30 23:31:56 +00:00
elad 097059fb23 Don't bother caching egid. It'll be removed soon. 2009-12-30 22:12:12 +00:00
elad d4b368687f Turn PA_INITIALIZED to a reference count for the pool allocator, and once
it drops to zero destroy the mutex we initialize. This fixes the problem
mentioned in

	http://mail-index.netbsd.org/tech-kern/2009/12/28/msg006727.html

Also remove pa_flags now that it's no longer needed.

Idea from matt@, okay matt@.
2009-12-30 18:57:16 +00:00
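
The reference-counting idea in the commit above can be sketched roughly as follows. This is only an illustration: `struct allocator`, `allocator_ref()`, `allocator_unref()` and the `lock_ready` flag are hypothetical names standing in for the pool allocator and its mutex, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pool allocator shared by several pools. */
struct allocator {
	unsigned refcnt;	/* number of pools using this allocator */
	bool lock_ready;	/* stand-in for "mutex has been initialized" */
};

/* Take a reference; the first user sets up the lock. */
void
allocator_ref(struct allocator *pa)
{
	if (pa->refcnt++ == 0)
		pa->lock_ready = true;	/* where mutex_init() would go */
}

/* Drop a reference; the last user tears the lock down. */
void
allocator_unref(struct allocator *pa)
{
	assert(pa->refcnt > 0);
	if (--pa->refcnt == 0)
		pa->lock_ready = false;	/* where mutex_destroy() would go */
}
```

With a counter instead of a single "initialized" flag, the lock is only destroyed once the last user has gone away.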
elad 149888f85d Always use resource limits from the process, as proposed in
http://mail-index.netbsd.org/tech-kern/2009/12/30/msg006756.html

okay christos@.
2009-12-30 18:33:53 +00:00
elad 7bbc644a97 Use credentials from the socket. 2009-12-30 06:58:50 +00:00
elad 4046eb056d Move the listener plugging to module_init(), as it runs after kauth_init()
now. (Leaving only the module kthread creation in module_init2().)
2009-12-29 17:49:21 +00:00
elad 841ec82ba2 Add credentials to sockets.
We don't need any deferred free etc. because we no longer free the
credentials in interrupt context.

Tons of help from matt@, thanks!
2009-12-29 04:23:43 +00:00
elad 34ce871d58 Remove commented-out code that should not have gone in. 2009-12-29 03:48:18 +00:00
elad ac90530da8 In veriexec_file_verify(), always check 'lockstate' before unlocking
'veriexec_op_lock'. Triggering a panic is possible in the path from
veriexec_openchk() (easily repeatable). The two switch cases at the
bottom of the function are going to panic anyway, but they might as well
panic as they're intended to as opposed to tripping over a locking
violation...
2009-12-28 07:16:41 +00:00
elad fa8206aeb0 Our error paths can call veriexec_file_free(), which in turn will try to
rw_destroy() the vfe lock. The easiest way to fix it for now is simply to
initialize the lock right after allocating the vfe...
2009-12-28 02:35:20 +00:00
elad 3bd7842cba Put a space after ':'... 2009-12-26 21:41:14 +00:00
elad d67e78d45a Only kmem_free() the filename if we have one. 2009-12-25 22:57:54 +00:00
elad 066035a515 Oops - unintentional locking bit that's not yet ready. 2009-12-25 20:07:18 +00:00
elad 471d0b3079 This subsystem had leftovers from the time it was part of Veriexec, and then
from when I first implemented it as "metahook."

Clean up a lot of the mess by unifying variable names, add struct member
prefixes, adjust comments, etc.

No functional change intended.
2009-12-25 20:05:43 +00:00
elad c2d2f61cc2 No need for these prototypes here. 2009-12-25 18:51:41 +00:00
elad 36ec4b320c When reporting open files using sysctl, don't use 'filehead' to fetch files,
as we don't have a process context to authorize on. Instead, traverse the
file descriptor table of each process -- as we already do in one case.

Introduce a "marker" we can use to mark files we've seen in an iteration, as
the same file can be referenced more than once.

Hopefully this availability of filtering by process also makes life easier
for those who are interested in implementing process "containers" etc.
2009-12-24 19:01:12 +00:00
mbalmer 1ce3f76abb Fix typo, no code change. 2009-12-23 09:23:53 +00:00
pooka 3142d3ac31 Define namei flag INRENAME and set it if a lookup operation is part
of rename.  This helps with building better asserts for rename in
the DELETE lookup ... the RENAME lookup is quite obviously a part
of rename.
2009-12-23 01:09:24 +00:00
elad 4f2529fdb9 Including sysctl.h once is enough. 2009-12-23 00:21:38 +00:00
dsl 668acfeeca Use sizeof correct type, not pointer to wrong type.
Fixes PR/42498.
This has been wrong since the initial import!
2009-12-22 20:50:46 +00:00
rmind 4fff15550a Add comment about locking. 2009-12-20 23:00:59 +00:00
mrg 9a7ae38999 remove dated and wrong comments about curlwp being NULL.
_kernel_{,un}lock() always assume it is valid now.
2009-12-20 20:42:23 +00:00
pooka f015d3c5a1 Add a pointer to an explanation of why we have #ifdef pmax stuff in here. 2009-12-20 19:06:44 +00:00
dsl 2a54322c7b If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
2009-12-20 09:36:05 +00:00
rmind 3c74cdf150 signal(9) code: add some comments, improve/fix wrong ones. While here, kill
trailing whitespaces, wrap long lines, etc.  No functional changes intended.
2009-12-20 04:49:09 +00:00
martin cecef5e6d5 Use the kernel space version of the vfs name, not the original userspace
pointer. Avoids crashes on archs with completely separate userspace VA.
2009-12-19 20:28:27 +00:00
rmind ebd0ab14ab sigtimedwait: fix a memory leak (which happens since newlock2 times).
Allocate ksiginfo on stack since it is safe and sigget() assumes that it is
not allocated from pool (pending signals via sigput()/sigget() "mill" should
be dynamically allocated, however).  Might be useful to revisit later.

Likely the cause of PR/40750 and indirect cause of PR/39283.
2009-12-19 18:25:54 +00:00
rmind 1069745866 Replace few USER_TO_UAREA/UAREA_TO_USER uses, reduce sys/user.h inclusions. 2009-12-17 01:25:10 +00:00
dsl bc86c9b425 Don't ERESTART write() calls for now.
I suspect some programs don't allow for the partial transfer.
2009-12-15 18:35:18 +00:00
dyoung 62f43df82a Per rmind@'s suggestion, avoid an acquire/release-mutex dance by
collecting garbage in two phases:  in the first stage, with
alldevs_mtx held, gather all of the objects to be freed onto a
list.  Drop alldevs_mtx, and in the second stage, free all the
collected objects.

Also per rmind@'s suggestion, remove KASSERT(!mutex_owned(&alldevs_mtx))
throughout, it is not useful.

Find a free unit number and allocate it for a new device_t atomically.
Before, two threads would sometimes find the same free unit number
and race to allocate it.  The loser panicked.  Now there is no
race.

In support of the changes above, extract some new subroutines that
are private to this module: config_unit_nextfree(), config_unit_alloc(),
config_devfree(), config_dump_garbage().

Delete all of the #ifdef __BROKEN_CONFIG_UNIT_USAGE code.  Only
the sun3 port still depends on __BROKEN_CONFIG_UNIT_USAGE, it's
not hard for the port to do without, and port-sun3@ had fair warning
that it was going away (>1 week, or a few years' warning, depending
how far back you look!).
2009-12-15 03:02:24 +00:00
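
The two-phase collection described in the commit above might look roughly like this. It is a minimal user-space sketch, not the subr_autoconf.c code: `struct node`, `live_list`, `collect_garbage()` and `add_node()` are hypothetical, and the boolean `list_lock` merely stands in for alldevs_mtx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
	bool dead;		/* marked for deletion */
};

struct node *live_list;		/* protected by list_lock */
bool list_lock;			/* stand-in for a real mutex */
int nodes_freed;		/* counts frees, for illustration only */

/*
 * Phase 1 (lock held): unlink dead nodes onto a private garbage list.
 * Phase 2 (lock dropped): free everything on the garbage list.
 */
void
collect_garbage(void)
{
	struct node *garbage = NULL, **pp, *n;

	list_lock = true;			/* "mutex_enter" */
	for (pp = &live_list; (n = *pp) != NULL; ) {
		if (n->dead) {
			*pp = n->next;		/* unlink from live list */
			n->next = garbage;	/* push onto garbage list */
			garbage = n;
		} else
			pp = &n->next;
	}
	list_lock = false;			/* "mutex_exit" */

	while ((n = garbage) != NULL) {
		assert(!list_lock);		/* freeing happens unlocked */
		garbage = n->next;
		free(n);
		nodes_freed++;
	}
}

/* Helper for the example: push a new node onto the live list. */
struct node *
add_node(bool dead)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	n->dead = dead;
	n->next = live_list;
	live_list = n;
	return n;
}
```

The lock is taken and released once per collection, rather than once per freed object.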
matt 15aa4c53c9 Regen (new makesyscalls.sh) 2009-12-14 00:53:32 +00:00
matt e110dba586 Merge from matt-nb5-mips64 2009-12-14 00:47:10 +00:00
dsl 723a159171 Another, better, fix for PR/26567.
Only sleep once within each pipe_read/pipe_write call.
If there is no data/space available after we wakeup return ERESTART so
then the 'fd' number is validated again.
A simple broadcast of the cvs is then enough to evict the correct threads
when close() is called from an active thread.
2009-12-13 20:02:23 +00:00
dsl e19cad8fcc Revert most of the previous change.
Only one fd needs clobbering, not all fds that reference the pipe.
This may be what ad@ realised when he tried to add the same code to
sockets. Unfixes part of PR/26567.
2009-12-13 18:27:02 +00:00
matt dfa7467a6e Pullup from matt-nb5-mips64.
For each syscall, add a flag for the return value or an argument indicating
that it is a 64-bit argument.  Also include the number of 64-bit arguments.
In theory this could get most of the code in compat/netbsd32/netbsd32_netbsd.c
but not at the moment due to multiply defined structures.
2009-12-13 04:47:45 +00:00
dsl c7517e0921 Add support for unblocking read/write when close called.
Fixes PR/26567 for pipes.
(NB ad backed out the fix for sockets)
2009-12-12 21:28:04 +00:00
dsl 9987412565 Fix comment for arg types of sys_profil(). 2009-12-12 17:48:54 +00:00
dsl ef379fcb95 Bounding the 'nfds' arg to poll() at the current process limit for actual
open files is rather gross - the poll map isn't required to be dense.
Instead limit to a much larger value (1000 + dt_nfiles) so that user
programs cannot allocate indefinite sized blocks of kvm.
If the limit is exceeded, then return EINVAL instead of silently truncating
the list.
(The silent truncation in select isn't quite as bad - although even there
any high bits that are set ought to generate an EBADF response.)
Move the code that converts ERESTART and EWOULDBLOCK into common code.
Effectively fixes PR/17507 since the new limit is unlikely to be detected.
2009-12-12 17:47:05 +00:00
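
As a rough sketch of the bound described in the commit above: `check_poll_nfds()` is a hypothetical helper, and whether the real comparison is strict or inclusive is an assumption here.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical check: `nfds` is the count passed by the caller,
 * `dt_nfiles` the size of the process's descriptor table.
 */
int
check_poll_nfds(unsigned nfds, unsigned dt_nfiles)
{
	if (nfds > 1000 + dt_nfiles)
		return EINVAL;	/* reject instead of silently truncating */
	return 0;
}
```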
dsl 17a42f25f1 Report L_INMEM in the lwp info as well. 2009-12-12 17:29:34 +00:00
dsl f537a9ce5f Always set L_INMEM to maintain binary compatibility. 2009-12-12 17:03:19 +00:00
tsutsui 428585a7d8 Remove `volatile' qualifier from argument types of
struct timeval passed to todr_gettime(9) and todr_settime(9).
We no longer have an ancient and volatile struct timeval `time'
global since we have switched to MI timercounter(9) on all ports.

XXX1: some of these RTC drivers still assume 32bit time_t
XXX2: some of these should be rewritten to use todr_[gs]ettime_ymdhms()
XXX3: todr(9) man page doesn't mention todr_[gs]ettime_ymdhms()
2009-12-12 15:10:34 +00:00
tsutsui a49264523b Use bool where appropriate. 2009-12-12 11:35:16 +00:00
tsutsui efd28fda6a Don't use int to get delta of time_t values. 2009-12-12 11:28:40 +00:00
dsl eff3e2124a Avoid leaking a mutex_obj when pipe_create() fails for the read pipe.
Remove the unused argument from pipeclose().
2009-12-10 20:55:17 +00:00
matt 6a9e4e8eeb Change u_long to vaddr_t/vsize_t in exec code where appropriate (mostly
involves setregs and vmcmds).  Should result in no code differences.
2009-12-10 14:13:48 +00:00
drochner a1a04dd1be If a struct sigevent with SIGEV_SIGNAL is passed to timer_create(2),
check the signal number to be in the allowed range. An invalid
signal number could crash the kernel by overflowing the sigset_t
array.
More checks would be good, and SIGEV_THREAD shouldn't be dropped
silently, but this fixes at least the local DoS vulnerability.
2009-12-10 12:39:12 +00:00
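
The kind of range check the commit above describes could be sketched like this. `EX_NSIG` and `check_signo()` are made-up names for illustration; the kernel's own limit and helper may differ.

```c
#include <assert.h>
#include <errno.h>

#define EX_NSIG 64	/* hypothetical: valid signals are 1 .. EX_NSIG - 1 */

/* Validate a user-supplied signal number before it indexes a signal set. */
int
check_signo(int signo)
{
	if (signo <= 0 || signo >= EX_NSIG)
		return EINVAL;
	return 0;
}
```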
drochner fe1db36da9 fix some security critical bugs:
-an invalid signal number passed to mq_notify(2) could crash the kernel
 on delivery -- add a boundary check
-mq_receive(2) from an empty queue crashed the kernel by NULL dereference
 in timeout calculation -- handle the NULL case
-likewise for mq_send(2) to a full queue
-a user could set mq_maxmsg (the maximal number of messages in a queue)
 to a huge value on mq_open(O_CREAT) and later use up all kernel
 memory by mq_send(2) -- add a sysctl'able limit which defaults
 to 16*mq_def_maxmsg

(mq_notify(2) should get some more checks, and SIGEV_* values other
than SIGEV_SIGNAL should be handled somehow, but this doesn't look
security critical)
2009-12-10 12:22:48 +00:00
dsl 7a42c833db Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
2009-12-09 21:32:58 +00:00
dsl 43bac9730d Correct comment, pipelock() no longer releases the mutex. 2009-12-06 20:26:55 +00:00
pooka d2445bdd09 tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.
2009-12-05 22:38:19 +00:00
pooka faa8e1b3e3 Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal.  I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
2009-12-05 22:34:43 +00:00
pooka debaf78619 explicitly initialize static boolean 2009-11-30 15:37:56 +00:00
pooka 051b421f3f Create CTL_HW before creating nodes on top of it (sysctl constructors
run in "random" order).
2009-11-30 11:28:35 +00:00
pooka 0fb0ab1101 Fix kernel build on platforms which define __BROKEN_CONFIG_UNIT_USAGE
and therefore don't take config_alldevs_lock() in config_devalloc().
2009-11-29 15:17:30 +00:00
dsl 454df0687b When truncating a request in bounds_check_with_mediasize() multiply
by the provided sector size instead of 512.
Fixes last bit of PR/31565
2009-11-28 22:38:07 +00:00
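
To illustrate why the multiplier matters, here is a simplified sketch of clamping a request to the end of the medium. `clamp_transfer()` and its parameters are hypothetical and do not mirror the real bounds_check_with_mediasize() interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: clamp a transfer of `bcount` bytes starting at sector
 * `blkno` so it does not run past a medium of `mediasize` sectors,
 * each `secsize` bytes long.  Returns the (possibly truncated) byte
 * count, or 0 when the request starts at or beyond the end.
 */
uint64_t
clamp_transfer(uint64_t blkno, uint64_t bcount, uint64_t mediasize,
    uint32_t secsize)
{
	uint64_t nsect = (bcount + secsize - 1) / secsize;

	if (blkno >= mediasize)
		return 0;
	if (blkno + nsect > mediasize)
		return (mediasize - blkno) * secsize;	/* not * 512 */
	return bcount;
}
```

On a medium with 2048-byte sectors, multiplying the remaining sector count by 512 would shorten the request to a quarter of what actually fits.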
bouyer 8c392da154 Previous did cause a deadlock with layered FS: the vrele thread
can sleep on the vnode lock, while vget is sleeping on the
VI_INACTNOW flag (or the vget caller is looping on vget returning failure
because of the VI_INACTNOW flag). With layered FSes, the upper and lower
vnodes share the same lock, so the vget() caller above can be already
holding the vnode lock.

Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
vrelel(), and check the ref count again once we have the lock. If the
vnode has more than one reference, don't VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
Matthias Scheler.
2009-11-28 10:10:17 +00:00
pooka bbc50ef41d Due to the schizophrenic nature of kobj (mem + vfs source),
split the module in twain to subr_kobj.c (master + mem) and
subr_kobj_vfs.c (vfs).
2009-11-27 17:54:11 +00:00
pooka 8102fe7341 Move rootfs-related init from init_main() to vfs_mountroot().
Reduces code re-written in rump.
2009-11-27 16:43:51 +00:00
pooka 8257134a74 Make this work on some m68k ports which like putting the disklabel
in the third sector (or have copypasted disklabel.h from a port
which likes doing that ;).
2009-11-27 13:29:33 +00:00
tsutsui c48b085654 u_short -> uint16_t, some KNF. 2009-11-27 11:23:50 +00:00
pooka 1798957738 Add DV_VIRTUAL for non-backed virtual devices and allow to mount
root from a DV_VIRTUAL device.
2009-11-26 20:52:19 +00:00
pooka baffc0cbae typo in comment (it actually breaks the script totally. i wish
more typos in comments were as effective)
2009-11-26 17:23:48 +00:00
pooka 91ac00ac3a pipe +RUMP 2009-11-26 17:20:20 +00:00
pooka 67ff6315cd Add rump support for the special handling required by pipe(2). 2009-11-26 17:19:54 +00:00
pooka a91020162b Instead of a single register_t as the retval of rump syscalls,
use an array of two.  No functional change ... yet.
2009-11-26 16:34:24 +00:00
pooka 024c040316 modctl +RUMP 2009-11-26 09:00:45 +00:00
matt 11af2f9cfa Kill proc0paddr. Use lwp0.l_addr instead. 2009-11-26 00:19:11 +00:00
pooka 64ab232858 make WAPBL_DEBUG_PRINT compile 2009-11-25 14:43:31 +00:00
pooka 5fc3d70195 Remove highly questionable assert which demands that the kernel symbol
table is in memory at a lower address than the string table.
2009-11-25 13:16:55 +00:00
rmind 606b1d9782 Add assert that ce->ce_func is not NULL. 2009-11-24 20:11:50 +00:00
dyoung c8fed843e1 Address some of the concerns that SPLDEBUG is not machine-independent,
Part 1 of N:

        There is not an MI ordering of interrupt priority levels,
        so use == IPL_HIGH and != IPL_HIGH instead of >= IPL_HIGH
        and < IPL_HIGH.  Ignore 'cold' and always use curcpu(),
        since cpu_info_primary is MD.

Other changes:

        There is no need to create symbols named _spldebug_* and
        strong aliases to them.  Just use symbols spldebug_*,
        instead.  Use a temporary variable instead of repeat
        cpu_index(9) calls.  KASSERT() that cpu_index(9) is <
        MAXCPUS.
2009-11-24 17:28:32 +00:00
pooka 09dbb89b44 If cpu_disklabel includes struct dkbad, define __HAVE_DISKLABEL_DKBAD.
This allows use of subr_disk_mbr on all archs.  Default to it for
the rump disk component.  No functional change for regular kernels.
(The other option would've been to include dkbad in disklabels
everywhere, but arguably this approach has less possible side-effects,
especially given that wedges and related magic will take over the
world any second now).
2009-11-23 13:40:08 +00:00
mbalmer 0ae57f90dd more s/the the/the/ 2009-11-22 19:09:15 +00:00
enami 07ab814664 Fix indentation, wrap long line and remove unused variable. 2009-11-19 03:01:05 +00:00
enami 9f91c09ebc Add missing vfs_unbusy() call in error path of sysctl_kern_vnode().
This allows us to reboot machine successfully even if pstat -v fails once.
2009-11-19 02:59:33 +00:00
pooka a8ed404de6 * make it possible to include kern_module in a kernel without vfs
support, i.e. move vfs functionality to a separate module
  (kern_module_vfs.c)
* make module proplist size an MI constant (now 8k) instead of PAGE_SIZE
* change some error values to something else than the karmic EINVAL
2009-11-18 17:40:45 +00:00
yamt d8b340409c turnstile_block: reduce code duplication. 2009-11-18 12:26:22 +00:00
yamt e8ed984955 turnstile_block: turn a comment into KASSERTs. 2009-11-18 12:25:15 +00:00
bouyer e3c6fd050a Fix getcleanvnode() in previous: in the if (vp->v_usecount != 0)
case we didn't bump the refcount, so don't decrease it through vrelel().
call mutex_exit() on v_interlock directly instead.
2009-11-17 22:20:14 +00:00
pooka 1d8a950195 Add a comment saying "name" to pool_init() is never freed (fixing
requires touching pool implementation).  No biggie, though, since
the pools themselves are never freed.
2009-11-17 14:38:31 +00:00
elad 903af42390 Include miscfs/specfs/specdev.h for spec_init(). 2009-11-15 02:37:13 +00:00
rmind 16347a5be7 kpsignal2: do not make the signal pending twice when tracing the process,
also update a comment and add an assert.  Fixes PR/42309 by Nicolas Joly.
2009-11-14 19:06:54 +00:00
elad 1570e68c40 - Move kauth_init() a little bit higher.
- Add spec_init() to authorize special device actions (and passthru too for
  the time being). Move policy out of secmodel_suser.
2009-11-14 18:36:56 +00:00
dsl e6a11930a4 Christos was worried about clrbits() being called with a length of zero.
This can't happen, but rework so it doesn't matter.
Remove 'optimisation' for length 1, that doesn't happen often enough.
2009-11-14 13:18:41 +00:00
dsl f3583ee6ce Fix clrbits() so that it doesn't mask no bits out of the byte after the
range (when the last bit to be cleared is the msb of a byte).
Fixes PR/42312 in a slightly better way than proposed.
2009-11-13 19:15:24 +00:00
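
The boundary case mentioned in the commit above (the last cleared bit being the most significant bit of a byte) can be exercised with a small stand-alone routine. `clear_bit_range()` is a hypothetical helper written for this note, assuming 8-bit bytes with bit 0 as the least significant bit of the first byte; it is not the kernel's clrbits().

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: clear `len` bits of `map`, starting at bit
 * `start`.  Bit 0 is the least significant bit of map[0].
 */
void
clear_bit_range(unsigned char *map, unsigned start, unsigned len)
{
	unsigned end = start + len;	/* one past the last bit to clear */

	while (start < end) {
		unsigned byte = start / CHAR_BIT;
		unsigned lo = start % CHAR_BIT;
		/* Bits of this byte covered by the range: [lo, hi). */
		unsigned hi = (end - byte * CHAR_BIT < CHAR_BIT) ?
		    end - byte * CHAR_BIT : CHAR_BIT;
		unsigned mask = ((1u << hi) - 1u) & ~((1u << lo) - 1u);

		map[byte] &= (unsigned char)~mask;
		start = byte * CHAR_BIT + hi;
	}
}
```

A zero-length range never enters the loop, and a range ending exactly on a byte boundary leaves the following byte untouched.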
dsl be258d919e Change args to clrbits() to be unsigned for efficiency. 2009-11-13 19:00:15 +00:00
dyoung 3ea78c91dc Use TAILQ_FOREACH() instead of open-coding it.
I applied this patch with Coccinelle's semantic patch tool, spatch(1).
I installed Coccinelle from pkgsrc: devel/coccinelle/.  I wrote
tailq.spatch and kdefs.h (see below) and ran this command,

spatch -debug -macro_file_builtins ./kdefs.h -outplace \
    -sp_file sys/kern/tailq.spatch sys/kern/subr_autoconf.c

which wrote the transformed source file to /tmp/subr_autoconf.c.  Then I
used indent(1) to fix the indentation.

::::::::::::::::::::
::: tailq.spatch :::
::::::::::::::::::::

@@
identifier I, N;
expression H;
statement S;
iterator name TAILQ_FOREACH;
@@

- for (I = TAILQ_FIRST(H); I != NULL; I = TAILQ_NEXT(I, N)) S
+ TAILQ_FOREACH(I, H, N) S

:::::::::::::::
::: kdefs.h :::
:::::::::::::::

#define MAXUSERS 64
#define _KERNEL
#define _KERNEL_OPT
#define i386

/*
 * Tail queue definitions.
 */
#define	_TAILQ_HEAD(name, type, qual)					\
struct name {								\
	qual type *tqh_first;		/* first element */		\
	qual type *qual *tqh_last;	/* addr of last next element */	\
}
#define TAILQ_HEAD(name, type)	_TAILQ_HEAD(name, struct type,)

#define	TAILQ_HEAD_INITIALIZER(head)					\
	{ NULL, &(head).tqh_first }

#define	_TAILQ_ENTRY(type, qual)					\
struct {								\
	qual type *tqe_next;		/* next element */		\
	qual type *qual *tqe_prev;	/* address of previous next element */\
}
#define TAILQ_ENTRY(type)	_TAILQ_ENTRY(struct type,)

#define	PMF_FN_PROTO1	pmf_qual_t
#define	PMF_FN_ARGS1	pmf_qual_t qual
#define	PMF_FN_CALL1	qual

#define	PMF_FN_PROTO	, pmf_qual_t
#define	PMF_FN_ARGS	, pmf_qual_t qual
#define	PMF_FN_CALL	, qual

#define __KERNEL_RCSID(a, b)
2009-11-12 23:16:28 +00:00
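
For readers unfamiliar with the macro, the before and after forms of the semantic patch above are equivalent traversals. The sketch below assumes a BSD-style <sys/queue.h> is available (it is on the BSDs and with glibc); `struct item`, `sum_open_coded()` and `sum_foreach()` are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int value;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(itemlist, item);

/* Open-coded traversal, as on the "-" line of the patch. */
int
sum_open_coded(struct itemlist *head)
{
	struct item *it;
	int sum = 0;

	for (it = TAILQ_FIRST(head); it != NULL; it = TAILQ_NEXT(it, link))
		sum += it->value;
	return sum;
}

/* The same traversal with the iterator macro, as on the "+" line. */
int
sum_foreach(struct itemlist *head)
{
	struct item *it;
	int sum = 0;

	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return sum;
}
```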
dyoung 972989f5e3 Move a device-deactivation pattern that is replicated throughout
the system into config_deactivate(dev): deactivate dev and all of
its descendants.  Block all interrupts while calling each device's
activation hook, ca_activate.  Now it is possible to simplify or
to delete several device-activation hooks throughout the system.

Do not deactivate a driver while detaching it!  If the driver was
already deactivated (because of accidental/emergency removal), let
the driver cope with the knowledge that DVF_ACTIVE has been cleared.
Otherwise, let the driver access the underlying hardware (so that
it can flush caches, restore original register settings, et cetera)
until it exits its device-detachment hook.

Let multiple readers and writers simultaneously access the system's
device_t list, alldevs, from either interrupt or thread context:
postpone changing alldevs linkages and freeing autoconf device
structures until a garbage-collection phase that runs after all
readers & writers have left the list.

Give device iterators (deviter(9)) a consistent view of alldevs no
matter whether device_t's are added and deleted during iteration:
keep a global alldevs generation number.  When an iterator enters
alldevs, record the current generation number in the iterator and
increase the global number.  When a device_t is created, label it
with the current global generation number.  When a device_t is
deleted, add a second label, the current global generation number.
During iteration, compare a device_t's added- and deleted-generation
with the iterator's generation and skip a device_t that was deleted
before the iterator entered the list or added after the iterator
entered the list.

The alldevs generation number is never 0.  The garbage collector
reaps device_t's whose delete-generation number is non-zero.

Make alldevs private to sys/kern/subr_autoconf.c.  Use deviter(9)
to access it.
2009-11-12 19:10:30 +00:00
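
The generation-number rule described in the commit above can be modelled in a few lines. This is a simplified sketch: `struct dev_label` and `dev_visible()` are hypothetical, and the exact comparisons are an assumption derived from the description (the iterator records the current generation, then the global number is increased).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the labels described above. */
struct dev_label {
	unsigned add_gen;	/* global generation when the device was created */
	unsigned del_gen;	/* generation when deleted; 0 means still live */
};

/*
 * Should an iterator that entered the list at generation `iter_gen`
 * visit this device?
 */
bool
dev_visible(const struct dev_label *d, unsigned iter_gen)
{
	if (d->add_gen > iter_gen)
		return false;	/* added after the iterator entered */
	if (d->del_gen != 0 && d->del_gen <= iter_gen)
		return false;	/* deleted before the iterator entered */
	return true;
}
```

Because the global number is never 0, a zero delete-generation can safely mean "not deleted", which is also what lets the garbage collector pick out reapable entries.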
rmind ad4f42d499 workqueue_finiqueue: remove unused variable. 2009-11-11 14:54:40 +00:00
rmind 1283950019 - selcommon/pollcommon: drop redundant l argument.
- Use cached curlwp->l_fd, instead of p->p_fd.
- Inline selscan/pollscan.
2009-11-11 09:48:50 +00:00
rmind e6f025f1da Add a small comment on buffer cache locking, fix mark letter b_objlock. 2009-11-11 09:15:42 +00:00
rmind 484f70316c G/C unused breada() and bdirty(). 2009-11-11 07:22:33 +00:00
cegger 9480c51b04 Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
2009-11-07 07:27:40 +00:00
pooka 1dac1a8cbc g/c M_SOFTINTR 2009-11-06 13:32:41 +00:00
dyoung fbe2bb0ace Use deviter(9) instead of accessing alldevs directly. 2009-11-05 18:07:19 +00:00
pooka 11b02a2b55 Excommunicate comment not abiding to the 80col dogma.
(well, turns out it was no longer valid either)
2009-11-05 16:15:51 +00:00
pooka 35a75982e4 expose module_{lookup,enqueue}() 2009-11-05 14:09:14 +00:00
bouyer 6b8161200e getcleanvnode(): don't vclean() the vnode if it has gained another
reference while we were getting the v_interlock.
vget(): attempt to prevent it from returning a clean vnode:
  if the vnode is being inactivated (by vrelel()), wait for
  vrelel() to complete (or return EBUSY if we can't wait), and return
  ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
2009-11-05 08:18:02 +00:00
rmind 4c1098f541 do_sys_wait(): fix previous by checking for ru != NULL. Noticed by
Onno van der Linden.  Also, remove redundant arguments (seems that
was_zombie was not used since rev 1.177 ?).
2009-11-04 21:23:02 +00:00
pooka fcc20a4ba1 Split uiomove() and high-level copy routines out of the crowded
kern_subr and into their own cozy home in subr_copy.
2009-11-04 16:54:00 +00:00
pooka ab72032a6c nuke unused local variable 2009-11-04 15:35:09 +00:00
pooka 83685e650c Heave-ho mutex/rwlock object routines into separate modules -- they
don't have anything to do with the lock internals.
2009-11-04 13:29:45 +00:00
dyoung e48f8429d1 Add a kernel configuration flag, SPLDEBUG, that activates a per-CPU log
of transitions to IPL_HIGH from lower IPLs.  SPLDEBUG is only available
on i386 and Xen kernels, today.

'options SPLDEBUG' adds instrumentation to spllower() and splraise() as
well as routines to start/stop debugging and to record IPL transitions:
spldebug_start(), spldebug_stop(), spldebug_raise(), spldebug_lower().
2009-11-03 05:23:27 +00:00
dyoung 648f423c6f Make lockdebug_lock_print(NULL, ...) dump all locks. Now, in ddb,
'show lock 0x0' dumps all of the locks.

XXX I still need to fix 'show all lock'.
2009-11-03 00:29:11 +00:00
rmind b9a294cf04 - Move inittimeleft() and gettimeleft() to subr_time.c, where they belong.
- Move abstimeout2timo() there too and export.  Use it in lwp_park().
2009-11-01 21:46:09 +00:00
rmind 1ceff942e5 Move common logic in selcommon() and pollcommon() into sel_do_scan().
Avoids code duplication.  XXX: pollsock() should be converted too, except
it's a bit ugly.
2009-11-01 21:14:21 +00:00
rmind 1ff7612225 do_sys_wait: clear rusage, instead of returning garbage. Patch from
dholland@ via PR/40717, with minor change by me.
2009-11-01 21:05:30 +00:00
rmind 5ccbe1e208 orphanpg: remove no longer used variable. 2009-11-01 20:59:24 +00:00
njoly b83467c466 Make flock(2) more robust to invalid operation, such as
(LOCK_EX|LOCK_SH).
2009-10-28 18:24:44 +00:00
rmind e4be2748a3 - Amend fd_hold() to take an argument and add assert (reflects two cases,
fork1() and the rest, e.g. kthread_create(), when creating from lwp0).

- lwp_create(): do not touch filedesc internals, use fd_hold().
2009-10-27 02:58:28 +00:00
rmind 0ca6708c13 - Use pool(9) for pmf_event_workitem_t, instead of pool_cache(9). Still,
meta-data of this pool takes more space than the actual data..

- Reduce lowat/hiwat to 1..8, since intensity is very low.

- Remove unused pew_next_free from pmf_event_workitem_t.
2009-10-27 02:55:07 +00:00
rmind c32b625d4c Update comment about proc0_init(). 2009-10-26 19:03:17 +00:00
rmind 554a0142dc Initialise struct emul members by name (it is readable now and one can search
them in the tree).
2009-10-25 01:14:03 +00:00
rmind 33963b1448 Avoid #ifndef __NO_CPU_LWP_FREE, only ia64 is missing cpu_lwp_free
routines and it can/should provide stubs.
2009-10-22 22:28:57 +00:00
rmind 30d0b02e57 Make lwp_park_sobj and lwp_park_tab static.
Wrap long lines while here.
2009-10-22 13:12:47 +00:00
rmind 40cf6f3659 Remove uarea swap-out functionality:
- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code.  Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
2009-10-21 21:11:57 +00:00
jym de3d6f78cf Fix a bug where on MP systems, pool_cache_invalidate(9) could be called
early during boot, just after CPUs are attached but before they are marked
as running.

This will result in a list of CPUs without the SPCF_RUNNING flag set, and
will trigger the 'KASSERT(xc_tailp < xc_headp)' in xc_lowpri() as no cross
call is issued.

Bug reported and patch tested by tron@.

See also http://mail-index.netbsd.org/tech-kern/2009/10/19/msg006293.html
2009-10-20 17:24:22 +00:00
snj 07ce40632e Follow upstream's lead and remove third and fourth clauses (except
in usr.sbin/mopd/common/pf.c, where only the ad clause is removed,
because it has a shared UCB copyright) on Mats O Jansson's files.

thorpej OK'd usr.sbin/rpc.yppasswdd/yppasswdd_mkpw.c, where he shares
copyright.
2009-10-20 00:51:13 +00:00
snj 4968c04d96 Move Eduardo Horvath's license to 2 clause. OK eeh@. 2009-10-19 18:12:37 +00:00
jnemeth 30d0592bd3 allow passing a NULL proplib dictionary to modctl(MODCTL_LOAD, ...) 2009-10-16 00:27:07 +00:00
thorpej 1f59a448f4 - pool_cache_invalidate(): broadcast a cross-call to drain the per-CPU
caches before draining the global cache.
- pool_cache_invalidate_local(): remove.
2009-10-15 20:50:12 +00:00
pooka 624234c0c5 Generate scheduling points around rump vnode operations. 2009-10-15 00:29:40 +00:00
dsl 931ac5949a Error out of ptcread() if the uio length supplied is zero before the code
has a chance to panic in ureadc().
2009-10-14 19:25:39 +00:00
pooka ddc943db02 regen: fix rump varargs syscalls prototypes 2009-10-13 21:57:52 +00:00
pooka 0d8bdf6131 For varargs syscalls, create rump prototypes which match the regular
system call counterparts, e.g.:
open(const char *, int, mode_t) -> open(const char *, int, ...)
2009-10-13 21:54:29 +00:00
yamt e894729250 sys___aio_suspend50, sys_lio_listio:
- fix the buffer sizes.
- use kmem_alloc instead of kmem_zalloc for buffers which we will
  overwrite soon.
2009-10-12 23:43:13 +00:00
yamt 29e552b036 wrap long lines. no functional changes. 2009-10-12 23:38:08 +00:00
yamt b8562be527 make aio_worker static. 2009-10-12 23:36:56 +00:00
yamt 5873138145 constify 2009-10-12 23:36:02 +00:00
yamt 28bf72b353 fix KMEM_SIZE vs KMEM_GUARD 2009-10-12 23:35:09 +00:00
yamt de25ce6a4c remove no longer necessary include of drvctl.h 2009-10-12 23:33:02 +00:00
yamt 199e4526f3 aio_suspend1: fix a double free bug. 2009-10-12 23:31:59 +00:00
dsl 65dd100015 Check for zero length read here - and return zero.
Most times we've come through spec_read() which has already done the test,
but not always (eg pty with ptsfs mounted).
Without this there is a simple local-user panic in ureadc().
Noted by Matthew Mondor on tech-kern.
2009-10-11 17:20:48 +00:00
dsl 270307174b Fix locking when collecting pt_read and pt_ucntl. 2009-10-11 08:08:32 +00:00
jym 31629a1342 Add pool_cache_invalidate_local() to the pool_cache(9) API, to permit
per-CPU objects invalidation when cached in the pool cache.

See http://mail-index.netbsd.org/tech-kern/2009/10/05/msg006206.html .

Reviewed by bouyer@. Thanks!
2009-10-08 21:54:45 +00:00
elad 2cb56be586 Add a (weak aliased) machdep_init() as a place to do machdep initialization
that can't happen as early as the other init functions as called from
cpu_startup() -- for example, register kauth(9) listeners.

Put unprivileged policy in the x86 code; used by i386, amd64, and xen.
2009-10-06 21:07:05 +00:00
elad 756638cf95 Factor out a block of code that appears in three places (Veriexec, keylock,
and securelevel) so that others can use it as well.
2009-10-06 04:28:10 +00:00
rmind c9a5a18df3 mq_timedsend/mq_timedreceive: timeout value is absolute, not relative.
While here, drop unnecessary (since fdesc API changes) lwp_t arguments.

Bug reported by Stathis Kamperis, thanks!
2009-10-05 23:49:46 +00:00
rmind 5503429772 shmexit: simplify a lot by avoiding unnecessary memory allocations, since
it is a last reference, just re-lock and check mapping list again.  Often
there won't be re-locks at all, moreover, shm_lock is not contended at all.
2009-10-05 23:47:04 +00:00
rmind c3a98b4c87 semu_alloc: simplify a little. 2009-10-05 23:46:02 +00:00
rmind ac8f63538a Convert cpu_number(), which can be sparse, to cpu_index(), which is MI. 2009-10-05 23:39:27 +00:00
elad 4c9fcb77c3 - Add usermount_common_policy() that implements some common (everything
but access control) user mounting policies: enforced MNT_NOSUID and
  MNT_NODEV, no MNT_EXPORT, MNT_EXEC propagation. This can be useful for
  secmodels that are interested in simply adding finer grained user mount
  support.

- Add a mount subsystem listener for KAUTH_REQ_SYSTEM_MOUNT_GET.
2009-10-05 04:20:13 +00:00
elad fa69dc186a Install floppies (haha) don't get built with ktrace/ptrace, so they don't
include kern/sys_process.c. Move proc_uidmatch() to kern/kern_proc.c which
always gets built instead.

Pointed out by Kurt Schreiner on current-users@:

    http://mail-index.netbsd.org/current-users/2009/10/03/msg010745.html
2009-10-04 03:15:08 +00:00
elad b2f3768346 - Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
  in sys_sched.c where we (a) initialize the sysctl node (no more
  link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.
2009-10-03 22:32:56 +00:00
elad 458410e7b5 Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks! 2009-10-03 21:21:56 +00:00
elad 54d08ac134 Update a comment. No functional change. 2009-10-03 21:03:55 +00:00
elad a39251ecc2 Introduce time_wraps() to check if setting the time will wrap it (or
close to it). Useful for secmodels.

Replace open-coded form with it in secmodel code (securelevel, keylock).

Note: I need to find a way to make secmodel_keylock.c ~<100 lines.
2009-10-03 20:48:42 +00:00
elad 7f720ad562 KAUTH_GENERIC_CANSEE -> KAUTH_REQ_NETWORK_SOCKET_CANSEE.
Not quite the same semantics but it's okay. Once our sockets have
credentials (and they will) it's all the same.
2009-10-03 20:24:39 +00:00
elad 5b3a96a24d Move KAUTH_NETWORK_BIND::KAUTH_REQ_NETWORK_BIND_PORT policy back to the
subsystem (or close to it).

Note: Revisit KAUTH_REQ_NETWORK_BIND_PRIVPORT.
2009-10-03 03:59:39 +00:00
elad 82ce55ed44 Move policies for KAUTH_PROCESS_{CANSEE,CORENAME,STOPFLAG,FORK} back to
the subsystem.

Note: Consider killing the signal listener and sticking
      KAUTH_PROCESS_SIGNAL here as well.
2009-10-03 03:38:31 +00:00
elad 111de3833c Finish moving socket policy to the subsystem. 2009-10-03 01:41:39 +00:00
elad 452ced03bd Move sched policy back to the subsystem. 2009-10-03 01:30:25 +00:00
elad 212f5fa214 Move kevent policy back to the subsystem. 2009-10-03 00:14:07 +00:00
elad abc7a4290b Put module loading policy back in the subsystem.
Revisit: consider moving kauth_init() above module_init() in main().
2009-10-03 00:06:37 +00:00
elad 1f98cab201 Put the tty opening policy back in the subsystem.
Remove include we don't need from the secmodel code.
2009-10-02 23:58:53 +00:00
elad 510083464f Move some of the socket policy back to the subsystem.
Remove include we don't need in the secmodel code.
2009-10-02 23:50:16 +00:00
elad 8751f894d8 Put signal delivery policy back in the subsystem. 2009-10-02 23:24:15 +00:00
elad 09f3ac9e2f Stick nice policy in its own subsystem and call the listener "resource"
rather than "rlimit"...
2009-10-02 22:46:18 +00:00
elad bcc5014bd0 Move rlimit policy back to the subsystem.
For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.
2009-10-02 22:38:45 +00:00
elad 2ae3a70827 Move ptrace's security policy back to the subsystem itself.
Add a ptrace_init() so we have a place to register the listener; called
next to ktrinit().
2009-10-02 22:18:56 +00:00
elad 40cc528a28 Move psets security policy back to the subsystem and keep suser logic only
in the suser secmodel code.
2009-10-02 21:56:28 +00:00
elad 932cd15f91 Move ktrace's subsystem security policy to the subsystem itself, and keep
just the suser-related logic in the suser secmodel.
2009-10-02 21:47:35 +00:00
elad 53ca19a3b3 First part of secmodel cleanup and other misc. changes:
- Separate the suser part of the bsd44 secmodel into its own secmodel
    and directory, pending even more cleanups. For revision history
    purposes, the original location of the files was

        src/sys/secmodel/bsd44/secmodel_bsd44_suser.c
        src/sys/secmodel/bsd44/suser.h

  - Add a man-page for secmodel_suser(9) and update the one for
    secmodel_bsd44(9).

  - Add a "secmodel" module class and use it. Userland program and
    documentation updated.

  - Manage secmodel count (nsecmodels) through the module framework.
    This eliminates the need for secmodel_{,de}register() calls in
    secmodel code.

  - Prepare for secmodel modularization by adding relevant module bits.
    The secmodels don't allow auto unload. The bsd44 secmodel depends
    on the suser and securelevel secmodels. The overlay secmodel depends
    on the bsd44 secmodel. As the module class is only cosmetic, and to
    prevent ambiguity, the bsd44 and overlay secmodels are prefixed with
    "secmodel_".

  - Adapt the overlay secmodel to recent changes (mainly vnode scope).

  - Stop using link-sets for the sysctl node(s) creation.

  - Keep sysctl variables under nodes of their relevant secmodels. In
    other words, don't create duplicates for the suser/securelevel
    secmodels under the bsd44 secmodel, as the latter is merely used
    for "grouping".

  - For the suser and securelevel secmodels, "advertise presence" in
    relevant sysctl nodes (sysctl.security.models.{suser,securelevel}).

  - Get rid of the LKM preprocessor stuff.

  - As secmodels are now modules, there's no need for an explicit call
    to secmodel_start(); it's handled by the module framework. That
    said, the module framework was adjusted to properly load secmodels
    early during system startup.

  - Adapt rump to changes: Instead of using empty stubs for securelevel,
    simply use the suser secmodel. Also replace secmodel_start() with a
    call to secmodel_suser_start().

  - 5.99.20.

Testing was done on i386 ("release" build). Separated module_init()
changes were tested on sparc and sparc64 as well by martin@ (thanks!).

Mailing list reference:

	http://mail-index.netbsd.org/tech-kern/2009/09/25/msg006135.html
2009-10-02 18:50:12 +00:00
pooka 68f37adaa6 Give humanize_number & format_bytes their own spots in the sun and move
from kern_subr to subr_humanize.
2009-10-02 15:48:41 +00:00
pooka bea18fb702 Add dealloccnt to list of things to be considered in the stetson-harrison
decision making algorithm for flushing a wapbl transaction.
2009-10-01 12:28:34 +00:00
pooka 5b19885537 Turn a KASSERT into a panic. I don't want us to be randomly
overwriting memory on non-DIAGNOSTIC kernels if resource estimation
fails.
2009-10-01 07:42:45 +00:00
dyoung e533051d0f #include "drvctl.h" for the NDRVCTL definition. Without the NDRVCTL
definition, drvctl_init() is not called, the drvctl_eventq is not
initialized, and the kernel will panic in devmon_insert() when a
device is detached.

Thanks to Jared McNeill for pointing out the panic.
2009-09-29 22:40:15 +00:00
pooka 8de13bd4c6 regen: remove VNODE_LOCKDEBUG 2009-09-29 11:54:52 +00:00
pooka ab3237b942 Add a switch on whether to create VNODE_LOCKDEBUG checks or not.
Since VNODE_LOCKDEBUG has never been generally useful, default to
off.  However, the checks can still be generated by flipping the
switch for the isolated cases where this form of dynamic analysis
is useful and the person using it knows what she is doing.
2009-09-29 11:51:02 +00:00
dholland 8d36057243 Move a big wodge of symlink-following code from nfsd to inside
lookup_for_nfsd(). This code is, or at least should be, the same as
the regular symlink-following code plus an extra flag nfsd needs.

The two lots of code can/will be merged in the future.
2009-09-27 17:23:53 +00:00
dholland fb458255a3 Rename lookup() to lookup_for_nfsd(), to make it clear just whose
private backdoor entry point this is.

Also, clone the lookup_for_nfsd() entry point as
lookup_for_nfsd_index(), for use by a different call site in nfsd that
does different unclean things with nameidata.
2009-09-27 17:19:07 +00:00
dyoung 7e8a3f8dc1 Replace 'struct device *' with 'device_t', throughout. No functional
change intended.
2009-09-25 19:21:09 +00:00
yamt d571330722 cwdinit: whitespace fix. no functional changes. 2009-09-24 06:14:22 +00:00
pooka 9b040bc3a9 Split config_init() into config_init() and config_init_mi() to help
platforms which want to call config_init() very early in the boot.
2009-09-21 12:14:46 +00:00
jmcneill ae17b8bef2 If vfs_mountroot fails, print a list of supported file systems. If no
file systems are supported by the kernel, print a big fat warning instead.
2009-09-19 16:20:41 +00:00
pooka 26e4989d18 Provide unwind log for bufq sysctls, since (theoretically) bufq might
not be initialized during kernel bootstrap and therefore "permanent"
nodes can be created only with an unwind log.
2009-09-17 09:54:27 +00:00
pooka 8a9910b608 Can't use CTLFLAG_PERMANENT here without providing a rollback log,
since accept filters aren't (necessarily) added during kernel boot
phase.

pointed out & tested by Geoff Wing
2009-09-17 08:09:49 +00:00
dyoung 8497597988 Nothing calls config_activate(9) any longer, so delete it. 2009-09-16 22:45:23 +00:00
dyoung 36fffd8d02 In pmf(9), improve the implementation of device self-suspension
and make suspension by self, by drvctl(8), and by ACPI system sleep
play nice together.  Start solidifying some temporary API changes.

1. Extract a new header file, <sys/device_if.h>, from <sys/device.h> and
   #include it from <sys/pmf.h> instead of <sys/device.h> to break the
   circular dependency between <sys/device.h> and <sys/pmf.h>.

2. Introduce pmf_qual_t, an aggregate of qualifications on a PMF
   suspend/resume call.  Start to replace instances of PMF_FN_PROTO,
   PMF_FN_ARGS, et cetera, with a pmf_qual_t.

3. Introduce the notion of a "suspensor," an entity that holds a
   device in suspension.  More than one suspensor may hold a device
   at once.  A device stays suspended as long as at least one
   suspensor holds it.  A device resumes when the last suspensor
   releases it.

   Currently, the kernel defines three suspensors,

   3a the system-suspensor: for system suspension, initiated
      by 'sysctl -w machdep.sleep_state=3', by lid closure, by
      power-button press, et cetera,

   3b the drvctl-suspensor: for device suspension by /dev/drvctl
      ioctl, e.g., drvctl -S sip0.

   3c the system self-suspensor: for device drivers that suspend
      themselves and their children.  Several drivers for network
      interfaces put the network device to sleep while it is not
      administratively up, that is, after the kernel calls if_stop(,
      1).  The self-suspensor should not be used directly.  See
      the description of suspensor delegates, below.

   A suspensor can have one or more "delegates".  A suspensor can
   release devices that its delegates hold suspended.  Right now,
   only the system self-suspensor has delegates.  For each device
   that a self-suspending driver attaches, it creates the device's
   self-suspensor, a delegate of the system self-suspensor.

   Suspensors stop a system-wide suspend/resume cycle from waking
   devices that the operator put to sleep with drvctl before the cycle.
   They also help self-suspension to work more simply, safely, and in
   accord with expectations.

4. Add the notion of device activation level, devact_level_t,
   and a routine for checking the current activation level,
   device_activation().  Current activation levels are DEVACT_LEVEL_BUS,
   DEVACT_LEVEL_DRIVER, and DEVACT_LEVEL_CLASS, which respectively
   indicate that the device's bus is active, that the bus and device are
   active, and that the bus, device, and the functions of the device's
   class (network, audio) are active.

   Suspend/resume calls can be qualified with a devact_level_t.
   The power-management framework treats a devact_level_t that
   qualifies a device suspension as the device's current activation
   level; it only runs hooks to reduce the activation level from
   the presumed current level to the fully suspended state.  The
   framework treats a devact_level_t qualifying device resumption
   as the target activation level; it only runs hooks to raise the
   activation level to the target.

5. Use pmf_qual_t, devact_level_t, and self-suspensors in several
   drivers.

6. Temporarily add an unused power-management workqueue that I will
   remove or replace, soon.
2009-09-16 16:34:49 +00:00
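The suspensor rule from item 3 of the commit message above (a device stays suspended while at least one suspensor holds it, and resumes when the last one releases it) can be illustrated with a small bitmask. This is a sketch under stated assumptions, not the pmf(9) implementation: the types and functions below (toy_suspensor, toy_pmf_dev, toy_suspend, toy_resume) are hypothetical, and suspensor delegates and activation levels are not modelled.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the three suspensors named in the message. */
enum toy_suspensor {
	TOY_SUSP_SYSTEM = 1u << 0,	/* system sleep */
	TOY_SUSP_DRVCTL = 1u << 1,	/* operator used drvctl -S */
	TOY_SUSP_SELF   = 1u << 2	/* driver suspended itself */
};

struct toy_pmf_dev {
	unsigned holders;	/* set of suspensors currently holding the device */
};

static bool
toy_is_suspended(const struct toy_pmf_dev *d)
{
	return d->holders != 0;
}

/* A suspensor takes hold of the device. */
static void
toy_suspend(struct toy_pmf_dev *d, enum toy_suspensor who)
{
	d->holders |= (unsigned)who;
}

/*
 * A suspensor releases the device.  Returns true only if this was the
 * last holder, i.e. the device actually resumes.
 */
static bool
toy_resume(struct toy_pmf_dev *d, enum toy_suspensor who)
{
	d->holders &= ~(unsigned)who;
	return d->holders == 0;
}
```

With this model, a system-wide resume does not wake a device that the operator had already put to sleep with drvctl, because the drvctl holder bit is still set.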
pooka 11281f01a0 Replace a large number of link set based sysctl node creations with
calls from subsystem constructors.  Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
2009-09-16 15:23:04 +00:00
pooka 41c00db98c Chop init_sysctl into base nodes (init_sysctl_base.c) and the
kitchen sink (init_sysctl.c).  Further surgery may be needed down
the line.
2009-09-16 15:03:56 +00:00
pooka fbd53556dc Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor.  For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation.  This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
2009-09-13 18:45:10 +00:00
bouyer b21564d63d PR kern/41923: assertion "cur != owner" failed
In the for(;;) loop of turnstile_block(), the lock owner can change while
cur's lock is released (cur's lock is also the tschain_t's mutex).
Remove the KASSERT about owner being invariant and try to deal with the
fact that the owner can change instead.
http://mail-index.netbsd.org/tech-kern/2009/08/24/msg005957.html
and followups.
2009-09-13 14:38:20 +00:00
dyoung c5d5f7697a Make ifconfig(8) set and display preference numbers for IPv6
addresses.  Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr.  Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
  provide an implementation for IPv6.  Expect more work in this area: it
  may be more proper to say that the IPv6 implementation "internalizes"
  a sockaddr.  Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
  family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
  sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
  ifconfig(8).
2009-09-11 22:06:29 +00:00
apb 7ab65de0a9 Expose the kernel's boothowto(9) variable through the sysctl
kern.boothowto variable.

Part of the /etc/rc silent changes requested in PR 41946
and proposed in tech-userlevel.
2009-09-11 18:14:58 +00:00
dyoung 3d4351e682 Delete whitespace at ends of lines. 2009-09-08 18:01:34 +00:00
pooka f926eb58c3 Remove autoconf dependency on vfs and dk:
opendisk() -> kern/subr_disk_open.c
config_handle_wedges -> dev/dkwedge/dk.c
2009-09-06 16:18:55 +00:00
pooka 5e46a7c29a Move configure() and configure2() from subr_autoconf.c to init_main.c,
since they are only peripherally related to the autoconf subsystem
and more related to boot initialization.  Also, apply _KERNEL_OPT
to autoconf where necessary.
2009-09-03 15:20:08 +00:00
jmcneill 56614eff97 In bdev_strategy, return ENXIO instead of panicking if the block device has
disappeared. ok pooka@
2009-09-03 11:42:21 +00:00
elad a162140107 Implement the vnode scope and adapt tmpfs to use it.
Mailing list reference:

	http://mail-index.netbsd.org/tech-kern/2009/07/04/msg005404.html
2009-09-03 04:45:27 +00:00
tls fd671f648a Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write.  From
Ritesh Agrawal at Coyote Point.
2009-09-02 14:56:57 +00:00
pooka 5523d7f5c9 Initialize devsw (lock) early so that subsystems may play with it. 2009-09-02 08:07:05 +00:00
rmind e24f6c0896 Turn off pipe's direct I/O again, it corrupts the data (although build and
various activity survived while testing this).  Corruptions also happen on
sparc64 where emap is not in effect, therefore bugs are in direct I/O code.
2009-08-31 20:48:14 +00:00
rmind 3a8481feb4 Make pool_head static. 2009-08-29 00:09:02 +00:00
rmind 924c9047ea - Re-enable direct I/O with emap for pipe.
- While not used, #ifdef KVA allocation in emap (so it won't burn the space).
2009-08-29 00:06:43 +00:00
bouyer 389f5178ad In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html
2009-08-26 22:34:47 +00:00
dyoung 2d89489416 In sysctl_create(), the first character of sysctl_name is
sysctl_name[0], so write that instead of sysctl_name[sz] (where sz
just happened to be set to 0 in the previous line).

Also in sysctl_create(), give the length of the sysctl_name its
own variable, nsz, and reserve sz for expressing the size of the
node's value.

No functional change intended.
2009-08-24 20:53:00 +00:00
manu dd47ec7336 Back out previous change: do not skip the test on rootspec, but make it
a simple attempt instead of an authoritative answer. The failure of the
rootspec test could be machine-dependent. Thanks to martin@ for pointing
that out.
2009-08-23 12:10:50 +00:00
dyoung 210a227e29 In sysctl_realloc(), don't make 'i' act as both a child-array
iterator and the length of the old child array, but introduce a
new variable, 'olen', for the latter purpose.

In sysctl_alloc(), name a constant.

Introduce sysctl_log_print(), a handy debug routine.

No functional changes intended.
2009-08-21 22:51:00 +00:00
dyoung 5a3627a2a6 Make sure that a sysctlnode's child nodes, even nodes that are not
yet in service, have a correct pointer to their parent, sysctl_parent.
This fixes a bug where sysctl_teardown(9) could not clean up a
network interface's sysctl(9) trees when I detached it, because
the wrong log had been recorded.
2009-08-21 22:43:32 +00:00
manu 61a1c8cdd1 When netbooting, rootspec is now "md0a", and it has no chance to match
an interface name, so do not give it a try.
2009-08-21 09:20:47 +00:00
yamt f97310f398 whitespace fixes. no functional changes. 2009-08-18 02:43:49 +00:00
christos a9d1bfd0c5 provide compatibility for the older variant of kern.consdev, which used
a 32 bit dev_t. Reported by mrg.
2009-08-16 20:28:19 +00:00
yamt 273f17a18a kauth_cred_free: add an assertion. 2009-08-16 11:01:12 +00:00
yamt 77d977dcbc assertion 2009-08-16 11:00:20 +00:00
yamt d59302b0e4 struct lwp -> lwp_t for consistency 2009-08-16 10:59:25 +00:00
haad 5f6671a94a Allow undescribed, direct ioctls as used by Unix. This capability was removed in BSD, presumably because nothing used it any more.
Third party system software written for Unix (like ZFS) requires this to work without significant modifications.

Ok supremeleader@
2009-08-13 08:57:43 +00:00
haad 5200b9b492 Add enum uio_seg argument to do_sys_mknod and do_sys_mkdir so these functions
can be called from kernel, too.

Change needed for zfs device node creation, until we have proper devfs.

Oked by ad@.
2009-08-09 22:49:00 +00:00
dholland f821ac304a Begin splitting lookup() into more tractable pieces too. 2009-08-09 07:27:54 +00:00
dholland 40c09fbf2c Begin splitting up namei into smaller pieces. 2009-08-09 03:28:35 +00:00
dsl 3d8c11d579 ktrace the arguments to script interpreters that come from the script.
Fixes PR/33021
2009-08-06 21:33:54 +00:00
dsl 8b926bc93c Fix ktrace of data from iovec based system calls.
Fixes PR/41819
2009-08-05 19:53:42 +00:00
dsl 8129ef72eb lockf() passes its arguments through to fcntl() but is supposed to
support -ve lengths (lock area before current offset).
Nothing in libc or the kernel allowed for this, so some random part
of the file would get locked (no idea which bits).
Although this could probably be fixed in libc, the stubs for posix file
locks for emulations could easily get into the kernel with -ve lengths.
So fixing in the kernel avoids those problems.
This also fixes PR/41620 (attempting to lock negative offsets) - which
is what I was looking into!
2009-08-05 19:39:50 +00:00
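The negative-length case in the lockf() commit above can be sketched as a normalization step: a region given as (start, len) with len < 0 covers the len bytes before start, so it is rewritten as a region that begins at start + len and has positive length. The function below is a hypothetical user-space illustration of that arithmetic, not the kernel change itself; the name toy_lock_range and the use of long long in place of off_t are assumptions.

```c
#include <stdbool.h>

/*
 * Normalize a lock region.  On success, *out_start and *out_len describe
 * the same bytes with a non-negative length (0 still means "to end of
 * file").  Returns false if the region would begin before offset 0.
 */
static bool
toy_lock_range(long long start, long long len,
    long long *out_start, long long *out_len)
{
	if (start < 0)
		return false;
	if (len < 0) {
		/* start >= 0 and len < 0, so this sum cannot overflow. */
		if (start + len < 0)
			return false;
		*out_start = start + len;
		*out_len = -len;
	} else {
		*out_start = start;
		*out_len = len;
	}
	return true;
}
```

For example, a request at offset 100 with length -30 becomes a lock on 30 bytes starting at offset 70, while a request at offset 10 with length -20 is rejected.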
bad 0152c542e8 Add a note to change_root() that the callers need to authorize the operation.
As requested by elad@.
2009-08-02 20:44:55 +00:00
christos f1cd8c73cb Don't return EWOULDBLOCK on an O_NONBLOCK tty file descriptor that has vmin > 0
and vtime > 0. It should be allowed to go to sleep for the sleep interval
indicated in vtime. Reported by der Mouse a long while ago, and this is what
other unixes do.
2009-08-01 23:07:05 +00:00
bad 02bcf17298 As discussed on tech-kern:
Factor out common code of chroot-like syscalls into change_root() and export
that function for use in other parts of the kernel.
Rename change_dir() to chdir_lookup() as the latter describes better what
the function does.  While there, move the namei_data initialisation into
chdir_lookup(), too.  And export chdir_lookup().
2009-08-01 21:17:11 +00:00
mbalmer 9d8b69b23a Do not attach gpiosim(4) at root, but make it a pseudo device.
With help from Matthias Drochner, thanks!
2009-07-27 17:40:57 +00:00
mbalmer 953ebaaf3d Allow gpiosim(4) to attach if configured in the kernel configuration. 2009-07-25 16:23:39 +00:00
christos 47736ab62e check return code from soreserve() (Sean Boudreau) 2009-07-24 01:09:49 +00:00
pooka 39de73aae0 +fhopen, +fhstatvfs1 RUMP 2009-07-21 23:59:00 +00:00
yamt 0436400c70 set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.
2009-07-19 10:11:55 +00:00
rmind db98cd9499 Regen. 2009-07-19 02:54:21 +00:00
rmind 7512d1e720 Make POSIX message queues a kernel module. 2009-07-19 02:50:44 +00:00
rmind b95f99b9f9 Fix previous, so that it actually works, correctly. 2009-07-19 02:26:49 +00:00
ad 5c5bb856e1 Don't send the quiet banner to the log, since the usual noise gets dumped
there anyway.
2009-07-17 23:31:51 +00:00
dyoung b734bafe0e Fix spelling: situatations -> situations. 2009-07-17 22:17:37 +00:00
dyoung b43b2d186c A definition in aic79xxvar.h somehow shadows pci_attach_args (ctags
bug?), so leave  it out of the tags computation for now.
2009-07-16 23:53:10 +00:00
rmind 569aa0de8b Revert previous: disable direct I/O on pipe, it caught a problem with emap. 2009-07-15 21:09:41 +00:00
apb dfcfba79d8 Convert free text inside #ifdef to a proper comment.
Inspired by PR 41255 from Kurt Lidl.
2009-07-14 20:59:00 +00:00
tsutsui 46133c54ef Add a workaround for some traditional ports (amiga and atari):
- Defer callout_setfunc() call after config_init() call in configure().

Fixes silent hang before consinit() at least on atari.

These traditional ports use config(9) structures and
autoconf(9) functions to detect console devices, and
config_init() is called at very early stage at boot
where mutex(9) is not ready.

Actually config_init() has been split out from configure()
for these ports:
http://cvsweb.NetBSD.org/bsdweb.cgi/src/sys/kern/subr_autoconf.c#rev1.74
while x68k has been fixed properly:
http://mail-index.NetBSD.org/source-changes/2009/01/17/msg215673.html

See also:
http://mail-index.NetBSD.org/port-x68k/2008/12/31/msg000006.html
http://mail-index.NetBSD.org/port-atari/2009/07/03/msg000419.html
2009-07-14 13:24:00 +00:00
rmind f80b636295 Re-enable direct I/O for pipe:
- Larger writes (2 or more pages) will use emap.
- Might help to catch rare hang (some very old bug).
2009-07-13 02:49:08 +00:00
rmind 7e069f82fb - Make insertion into the message queue O(1) by using a bitmap and an
  array.  However, mq_prio_max is dynamic, and a sorted list is used for
  custom setups, when the user manually sets a higher priority range.
- Cache mq->mq_attrib in some places.  Change msg_ptr type to uint8_t.
- Update copyright, misc.
2009-07-13 02:37:12 +00:00
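As a rough illustration of the bitmap-and-array idea in 7e069f82fb (not the actual message queue code), the sketch below keeps one FIFO per priority plus a bitmap of non-empty priorities, so insertion is O(1) and the highest pending priority is found by scanning the bitmap. All names here (prioq, prioq_put, prioq_get, NPRIO) are hypothetical, and the fixed 32-level range is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPRIO 32	/* hypothetical fixed priority range */

struct msg {
	struct msg *next;
	int payload;
};

struct prioq {
	uint32_t bitmap;		/* bit p set <=> FIFO p is non-empty */
	struct msg *head[NPRIO];
	struct msg *tail[NPRIO];
};

/* O(1): append to the FIFO for this priority and mark it non-empty. */
static void
prioq_put(struct prioq *q, struct msg *m, unsigned prio)
{
	assert(prio < NPRIO);
	m->next = NULL;
	if (q->head[prio] == NULL)
		q->head[prio] = m;
	else
		q->tail[prio]->next = m;
	q->tail[prio] = m;
	q->bitmap |= UINT32_C(1) << prio;
}

/* Remove the oldest message of the highest non-empty priority. */
static struct msg *
prioq_get(struct prioq *q)
{
	unsigned prio;
	struct msg *m;

	if (q->bitmap == 0)
		return NULL;
	/* The bitmap is non-zero, so this scan stops before underflow. */
	for (prio = NPRIO - 1; (q->bitmap & (UINT32_C(1) << prio)) == 0; prio--)
		continue;
	m = q->head[prio];
	q->head[prio] = m->next;
	if (q->head[prio] == NULL) {
		q->tail[prio] = NULL;
		q->bitmap &= ~(UINT32_C(1) << prio);
	}
	return m;
}
```

Messages of equal priority come out in insertion order. A priority range wider than the bitmap would need a different structure, which is presumably where the sorted-list fallback mentioned in the commit comes in.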
rmind b83b94a98e mq_send/mq_receive: while permission may allow that, return EBADF if sending
to read-only queue, or receiving from write-only queue.

From Stathis Kamperis, thanks!
2009-07-13 00:41:08 +00:00
dyoung 2261ca8c07 In lwp_create(), take a reference to l2's filedesc_t instead of
taking a reference to curlwp's by calling fd_hold().  If lwp_create()
is called from fork1(), then l2 != curlwp, and it is l2's filedesc_t,
not curlwp's, whose reference we should take.

This change stops the problem I describe in
<http://mail-index.netbsd.org/tech-kern/2009/07/09/msg005422.html>,
where /dev/rsd0a is never properly closed after fsck / runs on it.
This change seems to quiet my USB backup drive, sd0 at scsibus0 at
umass0, which had stopped spinning down when it was not in use:
The unit probably stayed open after mount(8) tried (and failed:
errant fstab entry) to mount it.

I am confident that this change is an improvement, but I doubt that
it is the last word on the matter.  I hate to get under the filedesc_t
abstraction by fiddling with fd_refcnt, and there may be something
I have missed, so somebody with greater understanding of the file
descriptors code should have a look.
2009-07-10 23:07:54 +00:00
dyoung bfd7452af9 pmf_event_inject(9) may be called from interrupt context, so we
must not allocate a pmf_event_workitem_t using kmem_alloc(9).  Use
pool_cache(9), instead, because it is safe in interrupt context.
Thanks, rmind@, for catching the problem and suggesting the solution.
2009-07-08 18:53:36 +00:00
joerg 73df1b22f7 Remove unused include. 2009-07-06 12:37:17 +00:00
elad 518bb3e503 Message queues also use genfs_can_access() to control access. Since the
latter might lose its KAUTH_GENERIC_ISSUSER check soon, add an internal
function, mqueue_access(), and call genfs_can_access() from it instead
so we don't pollute the main code path once we need to add a special
kauth(9) check for message queues.

No functional change, error codes preserved.

Related mailing list thread:

	http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
2009-07-03 21:32:09 +00:00
pooka 1a0b832e88 expose mkdir to in-kernel consumers 2009-07-02 12:53:47 +00:00
martin 53822d1e78 Update fd_freefile when kqueue descriptors are not copied from
parent to child. From Wolfgang Solfrank in PR kern/41651.
Approved by Andrew Doran.
2009-06-30 20:32:49 +00:00
yamt 6d375e715d update a comment 2009-06-29 23:39:00 +00:00
dyoung c53d86bdf2 Fix a typo in last (coda/ exclusion). 2009-06-29 18:03:37 +00:00
dholland effcf1af5c Convert 67 namei call sites to use namei_simple, in these functions:
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
2009-06-29 05:08:15 +00:00
dholland acfecf55d7 Add namei_simple_kernel and namei_simple_user. These provide the common
case functionality of namei in a simple package with only a couple flags.

A substantial majority of the namei call sites in the kernel can use
this interface; this will isolate those areas from the changes arising
as the internals of namei are fumigated.
2009-06-29 05:00:14 +00:00
rmind fe55ad324c panic: use MI cpu_index(), instead of cpu_number(), which could be sparse. 2009-06-28 15:30:30 +00:00
rmind 5c68e5d0ee Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect.  To track that, global and per-CPU generation numbers
are used.  This idea was suggested by Andrew Doran; various improvements to
it by me.  Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
2009-06-28 15:18:50 +00:00
rmind 7b7c187a92 Amend previous. 2009-06-28 14:34:48 +00:00
rmind 39b52425ff - Convert some #ifdefs to KASSERT()s.
- KNF, style, no parameters in function declarations.
- No functional changes.
2009-06-28 14:22:11 +00:00
yamt 85542b11cd wrap a long line. 2009-06-28 11:42:07 +00:00
ad 5b4feac126 idle_loop: explicitly go to spl0() to sidestep potential MD bugs. 2009-06-28 09:25:05 +00:00
dyoung b4f24be356 sys/coda/ rudely re-#defines some kernel constants and such, so
leave it out of the tags for now.
2009-06-26 22:59:25 +00:00
dyoung 9d9978e5a5 Switch to kmem(9).
(void *)pew is one way to get a struct work *, but let's
write &pew->pew_work, instead.  It is more defensive and persuasive.

Make miscellaneous changes in support of tearing down arbitrary
stacks of filesystems and devices during shutdown:

1 Move struct shutdown_state, shutdown_first(), and shutdown_next(),
  from kern_pmf.c to subr_autoconf.c.  Rename detach_all() to
  config_detach_all(), and move it from kern_pmf.c to subr_autoconf.c.
  Export all of those routines.

2 In pmf_system_shutdown(), do not suspend user process scheduling, and
  do not detach all devices: I am going to do that in cpu_reboot(),
  instead.  (Soon I will do it in an MI cpu_reboot() routine.)  Do still
  call PMF shutdown hooks.

3 In config_detach(), add a DIAGNOSTIC assertion: if we're exiting
  config_detach() at the bottom, alldevs_nwrite had better not be 0,
  because config_detach() is a writer of the device list.

4 In deviter_release(), check to see if we're iterating the device list
  for reading, *first*, and if so, decrease the number of readers.  Used
  to be that if we happened to be reading during shutdown, we ran the
  shutdown branch.  Thus the number of writers reached 0, the number
  of readers remained > 0, and no writer could iterate again.  Under
  certain circumstances that would cause a hang during shutdown.
2009-06-26 19:30:45 +00:00
dyoung 57a3ffeae7 Cosmetic: remove #if 1 / #endif. 2009-06-26 18:58:14 +00:00
dyoung 0b429bf76a Keep a generation number, mountgen, that increases every time a
filesystem is mounted.  Synchronize access to the number with a
mutex.  When a struct mount, mp, is allocated, assign the current
generation number to mp->mnt_gen.  Introduce vfs_unmount_forceone()
that forcefully unmounts the most recently mounted filesystem.

Refactor: extract vfs_shutdown1() from vfs_shutdown().  Extract
vfs_sync_all() from vfs_shutdown1().

Print more progress indications while we're unmounting all of the
filesystems during shutdown.

We increase the reference count on mp before calling dounmount(mp),
but we do not decrease it if dounmount(mp) fails, and neither does
dounmount(mp).  So decrease the reference count if dounmount(mp)
fails.

Change the loop terminating condition in vfs_unmountall1() to (mp
!= (void *)&mountlist) from !CIRCLEQ_EMPTY(&mountlist), because we
may not ever empty the list, especially if we're not forcing the
filesystems to unmount.
2009-06-26 18:53:07 +00:00
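The generation-number idea in 0b429bf76a can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the kernel code: struct mnt, mnt_stamp() and mnt_newest() are hypothetical names, a plain singly linked list stands in for mountlist, and the mutex that the commit uses to protect the counter is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mnt {
	struct mnt *next;
	uint64_t gen;		/* generation assigned when stamped */
};

static uint64_t mountgen;	/* the commit guards its counter with a mutex */

/* Stamp a mount with the current generation, then advance the counter. */
static void
mnt_stamp(struct mnt *mp)
{
	assert(mp != NULL);
	mp->gen = mountgen++;
}

/* Return the most recently stamped mount on the list, or NULL if empty. */
static struct mnt *
mnt_newest(struct mnt *list)
{
	struct mnt *mp, *newest = NULL;

	for (mp = list; mp != NULL; mp = mp->next) {
		if (newest == NULL || mp->gen > newest->gen)
			newest = mp;
	}
	return newest;
}
```

Because the choice depends on the stamp rather than on list position, the newest mount is found even if the list order does not match mount order.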
christos 2ee7096547 magic symlink cleanup:
- use size_t for len
- don't call strlen multiple times in macro
- add gid
- off by one in bounds calculation
2009-06-26 15:49:03 +00:00
elad 55f182207a Wow... too much Python.
Fix DIAGNOSTIC build breakage: print -> printf.

Pointed out by Kurt Schreiner on current-users@:

    http://mail-index.netbsd.org/current-users/2009/06/23/msg009815.html
2009-06-23 23:04:11 +00:00
elad 870920260d Move the implementation of vaccess() to genfs_can_access(), in line with
the other routines of the same spirit.

Adjust file-system code to use it.

Keep vaccess() for KPI compatibility and to follow the principle of least
surprise. A "diagnostic" message warning that vaccess() is deprecated will
be printed when it's used (obviously, only in DIAGNOSTIC kernels).

No objections on tech-kern@:

	http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005310.html
2009-06-23 19:36:38 +00:00
cegger 4765113ada Return type of cpu_number(9) is cpuid_t which is effectively unsigned long.
So cast return type to unsigned long.
Fixes build for alpha GENERIC kernel.
2009-06-20 11:10:40 +00:00
mrg 8520c31093 when printing a ddb stack trace when entering ddb, include the cpu number 2009-06-18 06:26:58 +00:00
dyoung 61fa5bb9be Make kobj_stat() return ENOSYS instead of panicking ("not modular")
on non-MODULAR kernels.  Make a few kobj_stat() callers check for
a non-zero return code and deal gracefully.
2009-06-17 21:04:25 +00:00
kardel a888100516 Make PPS work with fast time counters (> 2GHz)
by making the pps count time stamp and the update
time stamp u_int64.
The time delta between two PPS events can now
be correctly calculated avoiding any unaccounted
for wraps with 32-bit counters.
2009-06-14 13:16:32 +00:00
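The wrap problem that a888100516 addresses can be illustrated with plain integer arithmetic. The sketch below is not the kernel's PPS code; the function names are hypothetical, and the 5 GHz figure used in the usage note is only an example of a counter fast enough to advance by more than 2^32 ticks between events.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Tick delta computed from time stamps truncated to 32 bits.
 * Correct only while fewer than 2^32 ticks elapse between events.
 */
static uint32_t
delta_from_32bit_stamps(uint64_t earlier, uint64_t later)
{
	return (uint32_t)later - (uint32_t)earlier;
}

/* Tick delta computed from full 64-bit time stamps. */
static uint64_t
delta_from_64bit_stamps(uint64_t earlier, uint64_t later)
{
	assert(later >= earlier);
	return later - earlier;
}
```

With events 5,000,000,000 ticks apart (one second at a hypothetical 5 GHz), the 64-bit version returns 5,000,000,000, while the 32-bit version silently drops one full wrap and returns 705,032,704.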
plunky 6e74f4625b Writes on the controlling tty were not being awoken from blocks,
use the correct condvar to make this happen.

this fixes PR/41566
2009-06-12 09:26:50 +00:00
yamt 1a7984dbf3 do_posix_fadvise:
- deactivate pages on POSIX_FADV_DONTNEED.
- more sanity checks.  fix a panic in genfs_getpages
  introduced by the previous (rev.1.15).
2009-06-10 23:48:10 +00:00
yamt 724fd50176 don't make F_GETLK or the common case of F_UNLCK fail for per-user limit. 2009-06-10 22:34:35 +00:00
yamt 5216f042b0 lf_split: cv_destroy a condvar before clobbering it. 2009-06-10 22:23:15 +00:00
yamt 1763b7795c do_posix_fadvise: on POSIX_FADV_WILLNEED, start prefetching of object's pages. 2009-06-10 01:56:34 +00:00
jnemeth cbd3656645 Add the MODCTL_NO_PROP flag to tell the kernel to ignore <module>.prop.
Add the '-P' option to modload(8) to set this flag.
2009-06-09 20:35:02 +00:00
jnemeth 32b670979a Add code to merge the modload "command line" with <module>.prop. 2009-06-09 19:09:03 +00:00
yamt 5c0faad4bd fd_free: fix posix advisory locks. PR/41549 from HITOSHI OSADA. 2009-06-08 00:19:56 +00:00
jnemeth 1bdbe18dce Read in a <module>.prop file if it exists, internalize it, and pass it
to the <module> being loaded.

XXX A <module>.prop file will override anything on the "command line".
This will be fixed in the next commit.
2009-06-07 09:47:31 +00:00
yamt 6f174f1311 shut up the following assertion failure and add a comment.
panic: kernel diagnostic assertion "!fd_isused(fdp, fd)" failed: file "/siro/nbsd/src/sys/kern/kern_descrip.c", line 175
2009-06-07 09:39:02 +00:00
jnemeth fa6c059bce add KASSERT(p != NULL); to kmem_free() 2009-06-03 22:54:51 +00:00
pooka 48fb37f153 opt for _KERNEL_OPT 2009-06-03 15:07:30 +00:00
pooka b89c189be7 Declare extern syscallnames in a header. 2009-06-02 23:21:37 +00:00
yamt de0c01fd1d do_posix_fadvise: turn some KASSERTs into CTASSERTs. 2009-05-31 22:15:13 +00:00
yamt 31a7ec7dc7 sched_pstats_hook: fix estcpu decay.
this makes my desktop usable when running "make -j4".
2009-05-31 04:13:33 +00:00
dyoung 2bc3b9efe1 In config_detach(9), if device deactivation fails with EOPNOTSUPP,
don't treat it as an error.  This should stop the kernel from
panicking in config_detach(9) when sd(4)/wd(4) detach.
2009-05-29 23:27:08 +00:00
yamt 75c4e4fde7 fd_free: reset fd_himap/lomap to make fd_checkmaps comfortable. PR/41487. 2009-05-29 00:10:52 +00:00
yamt 4f22237449 wrap a long line. 2009-05-28 22:17:04 +00:00
pooka 8027ac8f1f Make domaininit() take an argument which determines if it should
add the special PF_ROUTE domain or not (if available).
2009-05-27 23:44:35 +00:00
hannken 5b4e527c76 PR kern/39536: bufq related problem when writing DVDR and DVDRWs.
Remove a race where physio_done() may use memory already freed.

Observed by Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>.
2009-05-26 14:59:31 +00:00
jnemeth d73b80a12b Move all namei flags handling into kobj_load_file().
When I originally wrote this, I was going for maximum flexibility.
However, after a private discussion with dholland@, I see how this
will cause problems with the future world order of namei whenever
that might be.  At the moment, I don't need the extra flexibility,
but if something comes up this may have to be revisited.
2009-05-26 08:34:22 +00:00
elad ae660023a4 PR/41489: Stathis Kamperis: setpriority(2) returns EACCES instead of EPERM
Per discussion on the PR's audit trail, put back original checks for now.
2009-05-26 06:57:38 +00:00
ad 0913d2e2f5 PR kern/41487: kern_descrip.c assertion failure
Remove bogus assertion.
2009-05-26 00:42:33 +00:00
rmind 75f55a05eb - Slightly rework the way permissions are checked. Neither mq_receive() nor
  mq_send() should fail due to permissions.  Noted by Stathis Kamperis!
- Check for empty message queue name (POSIX does not allow this for regular
  files, and it's weird), check for DTYPE_MQUEUE, fix permission check in
  mq_unlink(), clean up.
2009-05-26 00:39:14 +00:00
jnemeth a15ece476a Phase 0.5 of my options MODULAR enhancements. As suggested by ad@,
these commits move all path handling into module_do_load() from
kobj_load_file().  This way the final path used to load a module
is available for loading <module>.plist, which will store parameters
for a module.  The end goal of this project is good support for
MODULAR device drivers.
2009-05-25 22:33:00 +00:00
ad d991fcb3b6 More changes to improve kern_descrip.c.
- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
  It was only being used to synchronize close, and in any case we needed
  to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
  that we can eliminate the membar_consumer() call in fd_getfile().  This is
  mostly syntactic sugar; the main functional change is that fd_nfiles now
  lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
2009-05-24 21:41:25 +00:00
ad 193d553767 Split out kobj structures so crash/ddb can get at them. 2009-05-24 15:00:24 +00:00
ad 27695c89cb ddb: don't try to stat builtin modules. 2009-05-24 14:54:17 +00:00
ad 7d20bf2a9e Bus scans can make it appear as if the system has paused, so
twiddle constantly while config_interrupts() jobs are running.
2009-05-24 12:27:50 +00:00
ad 3cb7a24bec Make descriptor access and file allocation cheaper in many cases,
mostly by avoiding a bunch of atomic operations.
2009-05-23 18:28:05 +00:00
ad f0545a5e5b - Add lwp_pctr(), get an LWP's preemption/ctxsw counter.
- Fix a preemption bug in CURCPU_IDLE_P() that can lead to a bogus
  assertion failure on DEBUG kernels.
- Fix MP/preemption races with timecounter detachment.
2009-05-23 18:21:20 +00:00
ad 2fc2b08001 - Add lwp_pctr(), get an LWP's preemption/ctxsw counter.
- Fix a preemption bug in CURCPU_IDLE_P() that can lead to a bogus
  assertion failure on DEBUG kernels.
- Fix MP/preemption races with timecounter detachment.
2009-05-23 17:08:04 +00:00
ad cb95ab6e35 Fix a crash observed when trying to load a corrupted ELF image. 2009-05-23 15:13:57 +00:00
dyoung 85d8d1fcdd On second thought, let's call disk_predetach() disk_begindetach().
Verbs are good.
2009-05-20 03:26:21 +00:00
dyoung a76a7fd159 Encapsulate the checks that I do before detaching a disk(9) provider
in a pre-detachment routine, disk_predetach().
2009-05-19 23:42:05 +00:00
bouyer 8ebd73cde8 Back out rev 1.27 now that MD implementations of spl*() have been fixed
to be a memory barrier.
2009-05-18 21:31:27 +00:00
ad 77e6671be0 - Remove unneeded uvm_lwp_hold(), uvm_lwp_rele().
- Make physio_concurrency tuneable via crash(8).
- Update comments.
2009-05-18 21:12:33 +00:00
ad 92ee1731b0 Updates to f_flag need to be made with atomics. 2009-05-17 10:08:38 +00:00
yamt 7e13bf31c7 remove FILE_LOCK and FILE_UNLOCK. 2009-05-17 05:54:42 +00:00
rmind ba3fa2c82f sys_mq_open: remove broken access flag check.
Noted by Stathis Kamperis.
2009-05-16 23:58:09 +00:00
yamt 5368015c69 sysctl_doeproc:
- simplify.
- KERN_PROC: fix possible stale proc pointer dereference.
- KERN_PROC: don't do copyout with proc_lock held.
2009-05-16 12:02:00 +00:00
yamt 805df27570 rw_vector_exit: remove a redundant condition. 2009-05-16 08:36:32 +00:00
yamt 513f4955a7 put a flag bit into v_usecount to prevent vtryget during getcleanvnode.
this fixes the following deadlock.

	a thread doing getcleanvnode:
	pick a vnode
	acquire v_interlock
	v_usecount++
	call vclean

		now, another thread doing cache_lookup:
		picks the vnode
		vtryget succeeds
		vn_lock succeeds

	now in vclean:
	set VI_XLOCK (too late to be noticed by the competing thread)
	wait on the vnode lock (this might violate locking order)

the use of a flag bit was suggested by Andrew Doran.  PR/41374.
2009-05-16 08:29:53 +00:00
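The flag-bit technique described in 513f4955a7 can be sketched with C11 atomics. This is an illustration of the general idea only, not the vnode code: try_get(), get_for_clean(), UC_LOCKED and UC_MASK are hypothetical names, and the sketch leaves out v_interlock and everything else the kernel does around these operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define UC_LOCKED	0x80000000u	/* hypothetical flag: cleaning in progress */
#define UC_MASK		0x7fffffffu	/* remaining bits: reference count */

/* Take a reference without a lock, unless the object is idle or being cleaned. */
static bool
try_get(atomic_uint *usecount)
{
	unsigned old = atomic_load(usecount);

	for (;;) {
		if ((old & UC_MASK) == 0 || (old & UC_LOCKED) != 0)
			return false;
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(usecount, &old, old + 1))
			return true;
	}
}

/* Cleaner: take a reference and set the flag in one atomic step. */
static void
get_for_clean(atomic_uint *usecount)
{
	unsigned old = atomic_load(usecount);

	assert((old & UC_MASK) != UC_MASK);	/* no room left in the count */
	while (!atomic_compare_exchange_weak(usecount, &old,
	    (old + 1) | UC_LOCKED))
		continue;
}
```

Because the flag lives in the same word as the count, a competing try_get() either completes before the cleaner's update or observes the flag and backs off. There is no window in which the count has been raised but the flag is not yet visible, which is the kind of window the deadlock scenario above depends on.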
pooka e8f5dfa79e regen: pad -> PAD 2009-05-15 15:52:39 +00:00
pooka 6c68c84345 Use argname PAD to signal that an argument is used only for padding
and not part of the C interface.  Use this information for rump
syscalls to generate syscall interfaces without the extra parameter.
2009-05-15 15:51:27 +00:00
pooka 500fdd36a7 In addition to off_t alignment, check for dev_t and time_t too
(we don't currently have any syscalls passing time_t, though)
2009-05-15 14:52:47 +00:00
yamt d4da6c3d2e don't forget to skip marker processes. 2009-05-12 11:42:12 +00:00
yamt bed2400e59 lockdebug fixes for rw_tryupgrade/rw_downgrade. 2009-05-09 03:33:10 +00:00
yamt 9031548af6 exit1: fix a race with do_sys_wait/proc_free. 2009-05-08 13:32:59 +00:00
bouyer f48b5c49cc Declare sh_flags volatile.
Without it, on ports where splhigh() is inline, the compiler will optimise
away the second SOFTINT_PENDING test in softint_schedule(). A disassembly
of softint_schedule() with and without the volatile sh_flags confirms this
on sparc.
Because of this there is a race that could lead to the softhand_t
being enqueued twice on si_q, leading to a corrupted queue and
some handler being SOFTINT_PENDING but never called.

Should fix PR kern/38637
2009-05-05 20:26:36 +00:00
yamt 183ff8793d sysctl_doeproc: fix a bug in rev.1.135.
don't forget to mark our marker process PK_MARKER.
this fixes crashes in sched_pstats, etc.
2009-05-04 14:52:33 +00:00
yamt 6f0983460b when freeing cn_pnbuf, make it NULL if DIAGNOSTIC. 2009-05-04 06:05:19 +00:00
yamt 706e6928e0 tweak some assertions on so_head to make them more meaningful. 2009-05-04 06:02:40 +00:00