Commit Graph

1125 Commits

Author SHA1 Message Date
ad
284c2b9aef Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
2008-04-24 18:39:20 +00:00
ad
6d70f903e6 Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
  be sent from a hardware interrupt handler. Signal activity must be
  deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
  and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
2008-04-24 15:35:27 +00:00
ad
ef9411cb09 Fix locking in the fifo kqueue routines. 2008-04-24 15:18:11 +00:00
ad
15e29e981b Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
2008-04-24 11:38:36 +00:00
hannken
0789b071d1 Remove a race when pages are released while waiting for fstrans_start().
Fixes PR #38460
2008-04-19 11:53:13 +00:00
hannken
dc04f63f5b Remove stale include <sys/fstrans.h>. 2008-04-19 11:49:54 +00:00
ad
a9ca7a3734 Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
2008-03-21 21:54:58 +00:00
yamt
29e0fd1c9e sprinkle KERNEL_LOCK for socket.
a little different version was tested by Matthias Drochner.
2008-02-11 23:53:32 +00:00
ad
d7f6ec471c Don't lock the socket to set/clear FNONBLOCK. Just set it atomically. 2008-02-06 21:57:53 +00:00
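A minimal user-space sketch of the idea in d7f6ec471c, using C11 atomics as a stand-in for the kernel's own atomic primitives. The names demo_file, demo_set_nonblock, demo_flags_after and the DEMO_FNONBLOCK bit value are hypothetical and only illustrate updating one flag bit without taking a lock.

```c
#include <stdatomic.h>

/* Hypothetical bit value; the real FNONBLOCK constant is not shown in the log. */
#define DEMO_FNONBLOCK 0x0004u

/* Hypothetical stand-in for a file structure with an atomic flag word. */
struct demo_file {
	atomic_uint f_flag;
};

/* Set or clear the non-blocking bit with one atomic read-modify-write, no lock held. */
void
demo_set_nonblock(struct demo_file *fp, int on)
{
	if (on)
		atomic_fetch_or(&fp->f_flag, DEMO_FNONBLOCK);
	else
		atomic_fetch_and(&fp->f_flag, ~DEMO_FNONBLOCK);
}

/* Small driver: apply the update to an initial flag word and return the result. */
unsigned
demo_flags_after(unsigned initial, int on)
{
	struct demo_file f;

	atomic_init(&f.f_flag, initial);
	demo_set_nonblock(&f, on);
	return atomic_load(&f.f_flag);
}
```

Other bits in the flag word are left untouched, which is what makes the lock unnecessary for this particular update.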
ad
22c6a20ebd Lock v_knlist with the vnode interlock. PR kern/37881. 2008-02-05 14:19:52 +00:00
ad
25153c3ec9 PR kern/37706 (forced unmount of file systems is unsafe):
- Do reference counting for 'struct mount'. Each vnode associated with a
  mount takes a reference, and in turn the mount takes a reference to the
  vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
  locking inherited from 4.4BSD with a recursable rwlock.
2008-01-30 11:46:59 +00:00
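The reference counting described in 25153c3ec9 can be sketched in user space roughly as follows. This is an illustration only: demo_mount and its helpers are hypothetical, the counter is not atomic as kernel code would require, and the vfsops reference is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct mount' with a plain reference count. */
struct demo_mount {
	unsigned refcnt;
	int *freed;		/* test hook: set to 1 when the mount is freed */
};

/* Each vnode associated with the mount takes one reference. */
void
demo_mount_ref(struct demo_mount *mp)
{
	mp->refcnt++;
}

/* Drop a reference; the structure is freed only when the last one goes away. */
void
demo_mount_rele(struct demo_mount *mp)
{
	assert(mp->refcnt > 0);
	if (--mp->refcnt == 0) {
		*mp->freed = 1;
		free(mp);
	}
}

/*
 * Simulate a forced unmount while 'nvnodes' vnodes still reference the mount.
 * Returns 1 if the mount was freed at unmount time, 0 if it outlived the
 * unmount until the vnodes released it, and -1 on allocation failure.
 */
int
demo_freed_after_unmount(unsigned nvnodes)
{
	int freed = 0, at_unmount;
	unsigned i;
	struct demo_mount *mp = malloc(sizeof(*mp));

	if (mp == NULL)
		return -1;
	mp->refcnt = 1;		/* reference held by the mount list */
	mp->freed = &freed;
	for (i = 0; i < nvnodes; i++)
		demo_mount_ref(mp);
	demo_mount_rele(mp);	/* forced unmount drops the list's reference */
	at_unmount = freed;
	for (i = 0; i < nvnodes; i++)
		demo_mount_rele(mp);	/* vnodes are reclaimed later */
	return at_unmount;
}
```

With outstanding vnodes the structure stays valid after the unmount, which is the property that makes forced unmount safer.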
ad
3490efcc63 Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
2008-01-30 09:50:19 +00:00
dholland
764ffd05f0 Part of the rename patches *doh* 2008-01-28 15:17:54 +00:00
dholland
717e1785a5 Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
2008-01-28 14:31:15 +00:00
hannken
5ab6217754 Spec_open(): clear sd_bdevvp if bdev_open() failed.
Ok: Andrew Doran <ad@netbsd.org>
2008-01-25 16:21:04 +00:00
riz
960857eb6d Since VOP_LEASE is gone, remove genfs_lease_check() too. Now my kernel
builds again.  :)
2008-01-25 15:34:59 +00:00
ad
1997a1e1f4 Remove VOP_LEASE. Discussed on tech-kern. 2008-01-25 14:32:11 +00:00
ad
f9a31c8cd0 spec_fsync: don't assert that 'vp' holds the block device open. If it's
not open, there shouldn't be dirty buffers so vinvalbuf() is harmless.
2008-01-24 21:05:52 +00:00
ad
703069c0e9 specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
  vnode can describe a block device. Instead, prohibit concurrent opens of
  block devices. As a bonus remove the unreliable code that prevents
  multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
  goes away, instead of abusing vnode::v_usecount to tell if the device is
  open.
2008-01-24 17:32:52 +00:00
ad
27c0e63a2a layer_node_find: if we find a node being cleaned out, then ignore it and
continue.  A thread trying to clean out the extant layer vnode needs to
acquire the shared lock (i.e. the lower vnode's lock), which our caller
already holds. To allow the cleaning to succeed the current thread must make
progress.  So, for a brief time more than one vnode in a layered file system
may refer to a single vnode in the lower file system.
2008-01-23 20:11:32 +00:00
elad
c27d5f30b6 Tons of process scope changes.
  - Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
    requests, and add specific requests for set/get scheduler policy and
    set/get scheduler parameters.

  - Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
    requests.

  - Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

  - Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
    process information is being looked at (entry itself, args, env,
    open files).

  - Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

  - Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

  - Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

  - Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
2008-01-23 15:04:38 +00:00
pooka
f7455b20d9 portal_advlock: badop -> eopnotsupp. I guess advlock can be called
for the root vnode and badop panics.

fix in PR kern/25393 by Laurent Sartran
2008-01-19 21:54:47 +00:00
yamt
93a915eb7a genfs_do_putpages: DEBUG checks. 2008-01-18 11:01:23 +00:00
yamt
36c701bcd4 genfs_do_putpages: ensure that we clean the vnode in the case of PGO_RECLAIM. 2008-01-18 11:00:53 +00:00
yamt
2b40f35040 push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
2008-01-18 10:48:23 +00:00
ad
4eb2a42ae6 Fix v_freelisthd assertion failure during call to vdevdone(). No calling
VOPs without a vnode reference!
2008-01-17 17:28:54 +00:00
ad
4a780c9ae2 Merge vmlocking2 to head. 2008-01-02 11:48:20 +00:00
ad
ea3f10f7e0 Merge more changes from vmlocking2, mainly:
- Locking improvements.
- Use pool_cache for more items.
2007-12-26 16:01:34 +00:00
yamt
2294b0bcb6 procfs_douptime: simply use microuptime() instead of a mysterious calculation. 2007-12-22 01:06:54 +00:00
yamt
0d13423925 procfs_docpustat: g/c a write-only variable. 2007-12-22 01:04:55 +00:00
dyoung
6528dd9d56 Bug fix: at the top of layer_bypass(), save a pointer to the mount
point for re-use at the bottom, instead of trying to re-read the
mount point from a potentially vrele()'d vnode.
2007-12-22 00:48:46 +00:00
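The pattern behind the fix in 6528dd9d56 can be sketched as below: read what you need from an object before dropping the reference that keeps it alive. The demo_* types and functions are hypothetical user-space stand-ins, not the layerfs code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a mount point and a reference-counted vnode. */
struct demo_mount {
	int mnt_flag;
};

struct demo_vnode {
	int refcnt;
	struct demo_mount *mount;
};

/* Drop a reference; the vnode is freed when the last reference goes away. */
void
demo_vrele(struct demo_vnode *vp)
{
	if (--vp->refcnt == 0)
		free(vp);
}

int
demo_bypass(struct demo_vnode *vp)
{
	/* Top of the function: save the mount pointer while vp is still valid. */
	struct demo_mount *mp = vp->mount;

	demo_vrele(vp);		/* vp may be gone after this point */

	/* Bottom of the function: use the saved pointer, never vp->mount. */
	return mp->mnt_flag;
}

/* Small driver: build a vnode holding its only reference and run the bypass. */
int
demo_bypass_once(int flag)
{
	struct demo_mount m = { flag };
	struct demo_vnode *vp = malloc(sizeof(*vp));

	if (vp == NULL)
		return -1;
	vp->refcnt = 1;
	vp->mount = &m;
	return demo_bypass(vp);
}
```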
christos
177940c72e use vnode_to_path. 2007-12-15 23:52:00 +00:00
pooka
db06a930e6 Remove cn_lwp from struct componentname. curlwp should be used
from now on.  The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
2007-12-08 19:29:36 +00:00
ad
6ab26a0fa8 Partially merge syncer changes from vmlocking2. 2007-12-08 15:47:32 +00:00
ad
7c9b007bbc Destroy ovm_hashlock before freeing. 2007-12-08 15:12:15 +00:00
ad
0444cfe507 Use kmem_alloc/free. 2007-12-08 15:10:22 +00:00
pooka
4e38160d4d Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
2007-12-05 17:19:46 +00:00
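To see why a bare "return 1" surfaces as EPERM, note that EPERM has the value 1 on NetBSD and on most Unix-like systems. The sketch below contrasts the two styles; the demo_* names and filter identifiers are hypothetical and are not the real kqfilter interface.

```c
#include <errno.h>

/* Hypothetical filter identifiers; not the real EVFILT_* values. */
enum demo_filter { DEMO_FILT_READ, DEMO_FILT_WRITE, DEMO_FILT_UNKNOWN };

/* Before: "1" is meant as a generic failure, but the caller treats it as an errno. */
int
demo_kqfilter_old(enum demo_filter f)
{
	switch (f) {
	case DEMO_FILT_READ:
	case DEMO_FILT_WRITE:
		return 0;
	default:
		return 1;	/* numerically the same as EPERM on common systems */
	}
}

/* After: return an errno value that actually describes the failure. */
int
demo_kqfilter_new(enum demo_filter f)
{
	switch (f) {
	case DEMO_FILT_READ:
	case DEMO_FILT_WRITE:
		return 0;
	default:
		return EINVAL;
	}
}
```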
hannken
d556dc98b0 Fscow_run(): add a flag "bool data_valid" to note still valid data.
Buffers run through copy-on-write are marked B_COWDONE.  This condition
is valid until the buffer has run through bwrite() and gets cleared from
biodone().

Welcome to 4.99.39.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
2007-12-02 13:56:15 +00:00
pooka
61e8303e9d Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start.  In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
2007-11-26 19:01:26 +00:00
ad
ad89ae5a21 Revision 1.42 was lost. Pointed out by Nicolas Joly:
This was using mutex_exit where mutex_enter was required.
2007-11-12 14:11:47 +00:00
christos
dfdca25ef7 report the proper stack size on 32 bit emulations. 2007-11-11 18:29:03 +00:00
christos
26515bc536 make the last argument of procfs_dir size_t 2007-11-09 22:45:49 +00:00
ad
d18c6ca4de Merge from vmlocking:
- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
2007-11-07 00:23:13 +00:00
pooka
735dd21e07 Split I/O-related routines (getpages, putpages, etc.) which are heavily
tied to uvm out of genfs_vnops into genfs_io.c
2007-10-17 16:45:00 +00:00
ad
6b7322f1ed This was using mutex_exit where mutex_enter was required. 2007-10-11 18:46:19 +00:00
ad
3fa279a5ee umapm_hashlock is a mutex. 2007-10-10 22:07:48 +00:00
ad
7dad9f7391 Merge from vmlocking:
- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
2007-10-10 20:42:20 +00:00
ad
36a1712707 Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
2007-10-08 20:06:17 +00:00
ad
9f56dfa520 Merge brelse() changes from the vmlocking branch. 2007-10-08 18:02:53 +00:00
ad
451aacda90 Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
2007-10-08 15:12:05 +00:00
hannken
3856acafe2 Update the file system copy-on-write handler.
- Instead of hooking the handler on the specdev of a mounted file system
  hook directly on the `struct mount'.

- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'.  Use
  `mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.

- Replace the hand-made reader/writer lock with a krwlock.

- Keep `vn_cow_*' functions and mark as obsolete.

- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
2007-10-07 13:38:53 +00:00
pooka
3f3cac88a3 Make bioops a pointer and point it to the softdeps struct in softdep
init.  Decouples "options SOFTDEP" from the main kernel and ffs code.
2007-09-01 23:40:21 +00:00
pooka
ce3dd6b3a6 cleanup unused prototype 2007-08-03 08:50:23 +00:00
pooka
9feac0b35c ANSI-fy 2007-08-03 08:45:36 +00:00
pooka
8d1f899239 * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
  knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
  use VFS_PROTOS() instead of manually prototyping the methods
2007-07-31 21:14:15 +00:00
ad
a0d1fd8d0c It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
2007-07-29 13:31:07 +00:00
ad
66fefd117b It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
2007-07-29 12:15:35 +00:00
pooka
91f15f1760 whoops, forgot to commit this a while back: initialize new vnode size 2007-07-27 08:38:39 +00:00
pooka
c59e414d23 vop_mmap parameter change 2007-07-27 08:32:44 +00:00
pooka
d9970c8066 Use eopnotsupp() instead of vfs_stdsuspendctl() and retire the latter. 2007-07-26 22:57:36 +00:00
pooka
606670f3e8 Initialize size and/or writesize when creating a vnode. 2007-07-23 11:27:45 +00:00
pooka
05ce20f4a0 Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
2007-07-22 19:16:04 +00:00
pooka
0921857772 Don't allow getcwd() on procfs vnodes and provide "/" as the path
instead of the result from getcwd().  This works around locking
panics caused by namei calling VOP_READLINK while holding on to a
directory lock and getcwd() trying to acquire that lock.  The real
fix would be to get rid of getcwd() calls within VOPs (not locking
safe), but that's not a viable option in the netbsd-4 timeframe.

Suggestion for workaround from David Holland.
2007-07-22 13:37:13 +00:00
pooka
a97de7b959 nuke homegrown getcwd_common() decl 2007-07-21 22:47:36 +00:00
pooka
e24b0872a4 Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
2007-07-17 11:19:31 +00:00
dsl
2721ab6c7b Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, and pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
2007-07-12 19:35:32 +00:00
ad
88ab7da936 Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
2007-07-09 20:51:58 +00:00
pooka
b7d4ee5f17 * allow unmount even if rootvp has a usecount > 1 provided that
  MNT_FORCE is given
* decrease cargo cult index by getting rid of commented sections
  with mntflushbuf() in them - AFAICT the call was removed from our
  kernel over 13 years ago with the 4.4BSDlite import
2007-07-08 23:58:53 +00:00
pooka
dbeb9a3eeb I'm all for redundant and failsafe computing, but ...
        vap->va_atime = vap->va_mtime = vap->va_ctime;
        vap->va_atime = vap->va_mtime = vap->va_ctime;

... is missing the point.
2007-07-02 17:55:33 +00:00
pooka
5ac04c46a8 VOP_LOCK() doesn't handle LK_RETRY, call vn_lock() instead 2007-06-30 18:28:15 +00:00
dsl
6319443e37 Update for the changed prototypes of kauth_cred_set/getgroups(). 2007-06-30 15:27:02 +00:00
pooka
835b0326c5 Using POOL_INIT here makes no sense, since file systems always have
an init method.  So get rid of it and #ifdef _LKM and just always
init in the init method.  Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
2007-06-30 09:37:53 +00:00
yamt
da51d139a4 improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
2007-06-05 12:31:30 +00:00
agc
f1a5908695 In /proc/<pid>/statm, avoid leaking buffer space if the attempt to get
vmspace information fails.

Return the nice value properly to userland via the /proc/<pid>/stat entry.

Use vm sizes from vmspace, rather than rusage structs, for the same
reasons as mentioned previously - see the comment in
kvm_proc.c::kvm_getproc2() about rusage values and zombie processes.
2007-05-26 16:21:04 +00:00
agc
12003e8756 Use a bit more common code for the MULTIPROCESSOR and !MULTIPROCESSOR
cases.

Use the lwp's priority when returning the priority value, rather than
returning the nice value.
2007-05-25 22:26:14 +00:00
agc
15a3a67ede Various changes for better Linux emulation:
+ in /proc/<pid>/statm emulation, use the memory values from vmspace,
rather than struct rusage, since the rusage values appear to be 0 for
all processes except zombies.  cf dsl's comment in
kvm_proc.c::kvm_getproc2()

+ in /proc/<pid>/stat, instead of returning the tv_sec value, return the
number of ticks we've had (roughly equivalent to the Linux jiffies).
Calculate these values from the tv_usec values.

Also:

+ enclose CPU_INFO_ITERATOR and CPU_INFO_FOREACH usage in #ifdef
MULTIPROCESSOR, at the request of Nick Hudson

Together, these changes allow htop to work on NetBSD.
2007-05-25 19:20:06 +00:00
dogcow
905b715a4b use PRIu64, not llu, to unbork on 64-bit platforms. 2007-05-24 05:33:08 +00:00
agc
4dbe5ed7e7 Extend the Linux emulation of /proc to include
	/proc/stat
	/proc/loadavg and
	/proc/<pid>/statm.

These are only present when -o linux is specified as a mount option
to procfs.

Factor out some common code so that it can be used by a number of
functions.

XXX The values returned in the statm emulation need to be verified.
2007-05-24 00:37:40 +00:00
hannken
64b7e5637e Fstrans_start() always returns zero, so change its type to void. 2007-05-17 07:26:21 +00:00
yamt
4d3b7e04c8 use a cached value of v_size. no functional changes. 2007-05-13 13:11:53 +00:00
perseant
0569cad0fd Split the VOP interface part of genfs_putpages() from the code. The new
function that does the work, genfs_do_putpages(), now takes as an argument
a pointer to the page that would be waited on, if PGO_BUSYWAIT were not set.
This allows a consumer, e.g. lfs_putpages(), to perform an action outside
the scope of UVM before sleeping on the page in question.
2007-04-24 22:46:03 +00:00
enami
780e071921 Don't expand RCS id of ancestor file. The id itself is actually copied
from null_vnops.c since the log message of rev. 1.1 implies the copy.
2007-04-16 08:10:58 +00:00
chs
aba740b225 define a pager flag PGO_RECLAIM, similar to FSYNC_RECLAIM, and use it
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
2007-04-16 05:14:54 +00:00
hannken
fc6776f366 Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
2007-04-08 11:20:42 +00:00
hannken
e956461048 Remove calls to now obsolete vn_start_write() and vn_finished_write(). 2007-04-07 15:06:53 +00:00
rmind
0a747ea89c Unfortunately, missed procfs_proc_unlock() in previous.
Pointed out by pooka@
2007-04-04 10:50:42 +00:00
rmind
199691e947 procfs_readlink: Handle a possible failure of fd_getfile(), also, we
do not need to check for error again.
CID: 4436
2007-04-04 01:27:32 +00:00
christos
a7761fd2c5 Instead of reading and writing little by little, allocate memory and
write the whole map in one shot so that we don't have to deal with the
map changing under us. Fixes the linux emulated jdk-1.6 where it was
losing the last map entry and could not find the stack on startup.
2007-04-01 03:18:57 +00:00
christos
6a4825167b return a page less than the actual top of stack so that linux-java works. 2007-04-01 03:16:44 +00:00
ad
0b43c20288 Remove useless cast. 2007-03-11 22:07:32 +00:00
ad
c147748d84 - Make the proclist_lock a mutex. The write:read ratio is unfavourable,
  and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
2007-03-09 14:11:22 +00:00
christos
53524e44ef Kill caddr_t; there will be some MI fallout, but it will be fixed shortly. 2007-03-04 05:59:00 +00:00
salo
20af5e4fd5 Don't prepend rootvnode to the path in non-NULL case for exe links.
It breaks procfs in chroot.

from <christos>, tested by me.
2007-03-03 01:18:32 +00:00
ad
b89010bfa3 Destroy the hash locks on final unmount. 2007-02-27 16:11:51 +00:00
thorpej
7cc07e11dc TRUE -> true, FALSE -> false 2007-02-22 06:16:03 +00:00
thorpej
712239e366 Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
2007-02-21 22:59:35 +00:00
ad
4abc9f506a Add genfs_node_destroy(). Fixes a lock "leak" seen when running LOCKDEBUG
kernels.
2007-02-20 16:19:42 +00:00
pooka
76aba343c2 When checking for file validity under pid/, do proper proc->lwp
lookup (fsvo proper) instead of fiddling directly with the lwp
list.
2007-02-19 00:08:18 +00:00
ad
42a7dff463 procfs_map():
- Drop the target's vm_map lock before calling uiomove(). We could
  deadlock if inspecting /proc/curproc/map.
- If the vm_map might have changed, restart the operation, but give
  up after 250 retries if the map keeps changing.  XXX This is not
  ideal.
2007-02-18 20:03:44 +00:00
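The restart logic described for procfs_map() in 42a7dff463 might look roughly like the following in user space. Everything here is an assumption made for illustration: the version counter, the demo_* names, and the EAGAIN error returned when giving up. Only the limit of 250 retries comes from the commit message.

```c
#include <errno.h>
#include <string.h>

#define DEMO_MAX_RETRIES 250	/* retry limit quoted in the commit message */

/* Hypothetical stand-in for a vm_map: a version counter plus some data. */
struct demo_map {
	unsigned version;	/* bumped whenever the map changes */
	char text[32];
	int churn;		/* test hook: changes still to happen while unlocked */
};

/* Stand-in for uiomove(): runs with the map unlocked, so the map may change. */
static void
demo_copyout(struct demo_map *m, char *dst, const char *snap, size_t len)
{
	memcpy(dst, snap, len);
	if (m->churn > 0) {
		m->churn--;
		m->version++;
	}
}

int
demo_read_map(struct demo_map *m, char *dst, size_t len)
{
	char snap[sizeof(m->text)];
	int tries;

	if (len > sizeof(snap))
		len = sizeof(snap);
	/* One initial attempt plus up to DEMO_MAX_RETRIES restarts. */
	for (tries = 0; tries <= DEMO_MAX_RETRIES; tries++) {
		unsigned seen = m->version;	/* read while the map is "locked" */

		memcpy(snap, m->text, sizeof(snap));
		/* The lock would be dropped here, before copying out. */
		demo_copyout(m, dst, snap, len);
		if (m->version == seen)
			return 0;	/* map did not change: result is consistent */
	}
	return EAGAIN;	/* hypothetical choice of error for "map keeps changing" */
}

/* Small driver: the map changes 'churn' times before settling down. */
int
demo_run(int churn)
{
	struct demo_map m = { 0, "map entry", churn };
	char out[sizeof(m.text)];

	return demo_read_map(&m, out, sizeof(out));
}
```

Copying from a private snapshot is what allows the lock to be dropped first, which avoids the self-deadlock the commit mentions for /proc/curproc/map.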
pooka
7b63f0de5d Don't check for validity of p in lookup for root nodes, since it
will always be NULL.  Rather, just call pt_valid with NULL directly
and let it decide if we're a linux mount or not.
2007-02-18 01:55:26 +00:00
pavel
934634a18c Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
2007-02-17 22:31:36 +00:00
pooka
85cb1a4957 In lookup, when checking for procfs process node validity, target the
process we're trying to get information about through procfs, not
the caller of lookup.

fixes 'ls -l /proc/*/file' panic, which would occur when trying to
lookup "file" for a kernel thread, which doesn't have p->p_textvp.
2007-02-16 21:37:56 +00:00
ad
9abeea588a Replace some uses of lockmgr() / simplelocks. 2007-02-15 15:40:50 +00:00
ad
f8fe10ea6a Need to acquire procp->p_mutex for procfs_dir(). 2007-02-15 15:35:45 +00:00
ad
c18c0d2eaa Eliminate a couple of reference count and mutex leaks. 2007-02-11 17:16:08 +00:00
ad
b07ec3fc38 Merge newlock2 to head. 2007-02-09 21:55:00 +00:00
hannken
4d607243ba Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
2007-01-29 15:42:50 +00:00
hannken
1b9c6382e3 New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE.  This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
2007-01-19 14:49:08 +00:00
elad
b2eb9a5389 Consistent usage of KAUTH_GENERIC_ISSUSER. 2007-01-04 19:07:03 +00:00
elad
5d2c44c76f PR/32877: Geoff C. Wing: mount_procfs(8) doesn't null-terminate cmdline
output

Patch applied, thanks!
2006-12-28 09:17:52 +00:00
elad
8e5a82bb94 Revert bogus NULL check introduced in revision 1.96 that generated false
Coverity "bugs".
2006-12-28 09:12:38 +00:00
alc
8f1ebe33c9 revert previous, after inspection `kfs->kfs_kt' could really not be NULL here.
reported/requested by elad@
2006-12-28 05:51:56 +00:00
alc
94d1925ccb fix comment (forgotten in rev 1.19):
 - pfsnode -> kernfs_node
 - procfs -> kernfs
2006-12-28 05:49:05 +00:00
yamt
ccfd2c0df0 remove nqnfs. 2006-12-27 12:10:09 +00:00
alc
8ffa4fbf16 CID-3855: check if 'kfs->kfs_kt != NULL' before dereferencing it 2006-12-26 00:01:48 +00:00
elad
f02f51a039 PR/35226: Johann Franz: Problems with permissions in
/usr/pkg/emul/linux/proc .

Okay mlelstv@
2006-12-25 12:13:54 +00:00
christos
b5fb56163d fix permissions on /proc/<pid> node. From elad. 2006-12-24 17:37:35 +00:00
elad
a687717695 Add two comments. No functional change. 2006-12-24 16:45:23 +00:00
elad
f1a69ab3ea Some changes to get rid of another KAUTH_GENERIC_ISSUSER usage:
  - Make procfs_control() in procfs_ctl.c static,
  - Add an argument to the above, 'pfs', for the pfsnode,
  - Add another request type to KAUTH_PROCESS_CANPROCFS named
    KAUTH_REQ_PROCESS_CANPROCFS_CTL (and update documentation),
  - Use the above combination in a call to kauth_authorize_process().
2006-12-19 09:58:34 +00:00
yamt
fc88d88996 put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
2006-12-15 13:51:30 +00:00
pooka
4013808f45 Teach deadfs about vm object locking for getpages. This avoids
errors resulting from situations where we take a page fault for a
vnode which has been converted to a deadfs vnode.

wrstuden ok
2006-12-10 23:57:33 +00:00
chs
c398ae9734 a smorgasbord of improvements to vnode locking and path lookup:
 - LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
   these now always return the parent vnode locked.  namei() works as before.
   lookup() and various other paths no longer acquire vnode locks in the
   wrong order via vrele().  fixes PR 32535.
   as a nice side effect, path lookup is also up to 25% faster.
 - the above allows us to get rid of PDIRUNLOCK.
 - also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
 - remove an assumption in layer_node_find() that all file systems implement
   a recursive VOP_LOCK() (unionfs doesn't).
 - require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
   fill in eopnotsupp() for file systems that don't support being exported
   and remove the checks for NULL.  (layerfs calls these without checking.)
 - in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
   adjust which vnode is locked.  fixes PR 33374.
 - apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
2006-12-09 16:11:50 +00:00
christos
33b30b1ee3 From Nicolas Joly: restore previous behavior in procfs_validfile_linux, since
readdir passes a NULL lwp.
2006-12-04 18:27:52 +00:00
elad
8a806df7dc Move kauth(9) call to where it belongs. Noticed by Nicolas Joly, thanks! 2006-12-03 13:24:10 +00:00
pooka
5f132bf76c * update comments before putpages(): the vm object is always returned
  unlocked instead of locked.  chuq agrees
* use slock set to &uobj->vmobjlock also for the last simple lock
  operation to be consistent with the rest of the function
2006-11-30 06:11:03 +00:00
elad
8bb202af97 Move ktrace, ptrace, systrace, and procfs to use kauth(9).
First, remove process_checkioperm() calls from MD code. Similar checks
using kauth(9) routines (on the process scope, using appropriate action)
are done in the callers.

Add secmodel back-end to handle each subsystem.
2006-11-28 17:27:09 +00:00
elad
21bc112176 Implement Veriexec's raw disk policy on-top of kauth(9)'s device scope,
using both the rawio_spec and passthru actions to detect raw disk
activity. Same for kernel memory policy.

Update documentation (no longer need to expose veriexec_rawchk()) and
remove all Veriexec-related bits from specfs.
2006-11-26 20:27:27 +00:00
elad
98f4b1ff55 Part of PR/33280: Christian Ehrhardt: If LK_INTERLOCK is set
vp->v_interlock may be unlocked twice: Once explicitly and a second time
implicitly by lockmgr. LK_INTERLOCK is cleared from the variable flags but
not from ap->a_flags which is used with lockmgr. This is not so much of a
problem because there seems to be no call site that actually uses
LK_INTERLOCK with layer_unlock or VOP_UNLOCK.

okay martin@
2006-11-25 22:36:24 +00:00
elad
895827f391 Part of PR/33280: Christian Ehrhardt: In the error path (which probably
can't happen) lmp->layerm_hashlock is not unlocked.
2006-11-25 22:14:38 +00:00
christos
52411aad22 instead of const int, use a #define which most of the time will evaluate
to a compile-time constant.
2006-11-25 21:15:01 +00:00
skrll
45ea587c94 Expose the 'exe' symlink to the process realpath in NetBSD as well. An
example user is gdb.

OK'd by christos.
2006-11-25 09:39:34 +00:00
wiz
09cb1d6f1c s/existance/existence/, from Zafer. 2006-11-24 22:52:16 +00:00
elad
72438de6ef Remove redundant securelevel check; this is already done in procfs_rw()
and we can't get here (procfs_control()) without being there first.

Pointed out by yamt@.
2006-11-22 15:48:11 +00:00
christos
168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
reinoud
dc6975451a Add missing space in comment 2006-11-10 22:31:19 +00:00
jmmv
9d877d347c Use size_t in a couple of places as it makes more sense WRT the places
where the variables are later used.  From PR kern/25277 by Jeff Ito.
2006-11-04 20:51:32 +00:00
elad
fe9e2303fd Change KAUTH_SYSTEM_RAWIO to KAUTH_DEVICE_RAWIO_SPEC (moving the raw i/o
requests to the device scope) and add KAUTH_DEVICE_RAWIO_PASSTHRU.

Expose iskmemdev() through sys/conf.h.

okay yamt@
2006-11-04 09:30:00 +00:00
elad
7e8b842ffa Redo Veriexec raw disk/memory access policies so they hold only if the
request is for write access.
2006-11-02 12:48:35 +00:00
elad
45f88cbee1 Only use blkdev/bvp for the Veriexec case. While here, fix up IPS mode
restrictions on kernel memory.

okay yamt@
2006-11-01 09:37:28 +00:00
elad
ea927d2c6a oops, remove debug printf that slipped in. good catch from yamt@, thanks! 2006-10-30 12:19:23 +00:00
christos
3f792e2267 add an "emul" file node. 2006-10-29 22:35:35 +00:00
christos
26bb1685bd don't allocate large buffers on the stack. 2006-10-27 16:49:01 +00:00
christos
e8926fa3f7 1. fix procfs_validfile{,_linux} to test for NULL pointers properly.
2. make "exe" entry be a symlink to the executable, instead of pointing
   directly to the vnode of the executable.
3. factor out commonly used code.
2006-10-25 18:59:52 +00:00
elad
59e67acd85 kauth_cred_geteuid() is okay for the purposes of these checks. Revert
conversion to kauth_authorize_generic() done some time ago.
2006-10-25 11:59:34 +00:00
elad
af94ee3081 PR/34888: Nicolas Joly: kernel panic while trying to access
/emul/linux/proc/0/stat

Patch applied, thanks for the report!
2006-10-23 18:19:14 +00:00
reinoud
0ce809091d Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
2006-10-20 18:58:12 +00:00
yamt
37c68a65cf add wrapper functions of lockmgr on g_glock. 2006-10-14 09:16:28 +00:00
yamt
eee3695a7f genfs_getpages: use kmem_zalloc. 2006-10-14 09:15:52 +00:00
yamt
c2ae921270 genfs_do_io: iodone handler should be called at splbio. 2006-10-14 08:31:14 +00:00
yamt
35048d7491 genfs_putpages: don't try to deactivate loaned pages.
reported and tested by Nicolas Joly on current-users@.
2006-10-12 10:10:48 +00:00
thorpej
867f8cb239 genfs_lease_check(): Consume the arguments even if NFSSERVER is not defined. 2006-10-12 04:25:43 +00:00
christos
4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
dogcow
62ce183fe4 fix build error in mount_sysvbfs. 2006-10-06 02:17:25 +00:00
chs
33c1fd1917 add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
2006-10-05 14:48:32 +00:00
jld
0ebf778b9d The poll routine needs to return POLLERR on error, not an errno. Sorry
about that.  Pointed out by Juergen Hannken-Illjes in mail.
2006-09-30 21:00:13 +00:00
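The distinction made in 0ebf778b9d is that a poll routine returns a mask of event bits, so an errno value would be misread as events. A small sketch under that assumption follows; demo_poll and its parameters are hypothetical and much simpler than spec_poll().

```c
#include <errno.h>
#include <poll.h>

/*
 * 'error' is an errno from some earlier step (0 on success), 'readable' says
 * whether data is available, and 'events' is the mask the caller asked about.
 */
int
demo_poll(int error, int readable, int events)
{
	if (error != 0)
		return POLLERR;	/* report failure as an event bit, not "return error;" */
	return readable ? (events & POLLIN) : 0;
}
```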
jld
754d606748 Protect spec_poll from racing against revocation and thus dereferencing a
NULL v_specinfo.  Mostly copied (with understanding) from rev 1.83's fix
to spec_ioctl, and needed for the same reason (kern/vfs_subr.c r1.231).
2006-09-21 09:28:37 +00:00
manu
a540ef296e Emulate Linux's /proc/devices 2006-09-20 08:09:05 +00:00
elad
3964702f3a For the VBLK case, we always check vfs_mountedon() and it has nothing
to do with the security model used. Move back the call to spec_open(),
which can now return the real return value from vfs_mountedon() (EBUSY)
and not EPERM, changing semantics.
2006-09-19 16:41:57 +00:00
yamt
9d3e3eab23 merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
2006-09-15 15:51:12 +00:00
elad
bada0c776a Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
2006-09-13 10:07:42 +00:00
elad
5f7169ccb1 First take at security model abstraction.
- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
  opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
  security model, called "bsd44". This is the default (and only) model we
  have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

  * There's a sample overlay model, sitting on-top of "bsd44", for
    fast experimenting with tweaking just a subset of an existing model.

    This is pretty cool because it's *really* straightforward to do stuff
    you had to use ugly hacks for until now...

  * And of course, documentation describing how to do the above for quick
    reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

	http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

  - Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
  - Checks 'securelevel' directly,
  - Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
2006-09-08 20:58:56 +00:00
christos
4e6ffbfcf0 add missing initializers 2006-09-03 22:28:53 +00:00
christos
df1dfef2bc add missing initializers 2006-09-03 21:01:12 +00:00
christos
309d51fb22 add missing initializers 2006-09-03 04:56:33 +00:00
christos
e89033a3e5 add missing initializers 2006-09-03 04:54:24 +00:00
christos
3c95928caf add missing initializers. 2006-09-02 06:37:41 +00:00
cube
bd859bd3de Restore dependency on PTRACE for PROCFS.
Bump required config(1) version.
2006-08-30 13:49:27 +00:00
jnemeth
944592a2ee revert previous as it breaks the build due to invalid syntax 2006-08-30 07:46:37 +00:00
christos
676e77765a fix missing initializers 2006-08-30 01:28:53 +00:00
matt
9e0ec4816e Make PTRACE and COREDUMP optional. Make them the default (status quo) by putting
them in conf/std.
2006-08-29 23:34:48 +00:00
christos
ce0ef6cfc4 Pretending to be Elad's keyboard:
fileassoc.diff adds a fileassoc_table_run() routine that allows you to
pass a callback to be called with every entry on a given mount.

veriexec.diff adds some raw device access policies: if raw disk is
opened at strict level 1, all fingerprints on this disk will be
invalidated as a safety measure. level 2 will not allow opening disk
for raw writing if we monitor it, and prevent raw writes to memory.
level 3 will not allow opening any disk for raw writing.

both update all relevant documentation.

veriexec concept is okay blymn@.
2006-08-11 19:17:47 +00:00
ad
f474dceb13 Use the LWP cached credentials where sane. 2006-07-23 22:06:03 +00:00
yamt
54a9d2b0f7 - genfs_getpages: in the case of PGO_LOCKED, check if we can acquire
g_glock as suggested by Chuck Silvers on tech-kern@.
- genfs_rel_pages: handle PGO_DONTCARE so that it can be used for the above.
2006-07-22 08:49:13 +00:00
yamt
f9458a6ba1 - in genfs_getpages, take g_glock earlier so that truncation can't
intervene.
  it also fixes a deadlock.  (g_glock vs pages locking order)
- uvm_vnp_setsize: modify v_size while holding v_interlock.

reviewed by Chuck Silvers.
2006-07-22 08:47:56 +00:00
martin
a3b5baed42 Fix alignment problems for fhandle_t, exposed by gcc4.1.
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
2006-07-13 12:00:24 +00:00
christos
f36aa0cd37 PR/33815: Nicolas Joly: /emul/linux/proc/#/stat always report current
process status
2006-06-24 16:34:02 +00:00
christos
7173cfeec6 remove useless genop 2006-06-23 20:54:21 +00:00
bouyer
14349e5550 For internal types call kernfs_default_xread() directly, as no entry in
the splay tree has been added for these types. Fix kern/33797 by
Geoff C. Wing.
While here also fix writes the same way (probably broken for 2 years),
and properly implement KERNFS_XREAD.
The IPsec code could probably be moved out now, and use kernfs_alloctype().
2006-06-23 20:30:11 +00:00
bouyer
82722a8d91 Backout previous: of course the change
"Allow optional /kern regular files to have custom read methods..."
works, it's used by Xen.
2006-06-23 16:26:59 +00:00
christos
c8ee2595ab PR/33797: Geoff C. Wing: kernfs files are not supplying information
Roll back the change:
    'Allow optional /kern regular files to have custom read methods...'
which does not work.
2006-06-23 14:59:40 +00:00
yamt
f7c7538921 use KAUTH_PROCESS_CANSEE rather than CURTAIN where appropriate. 2006-06-13 13:57:33 +00:00
yamt
f755e9e9b8 remove unnecessary arguments from kauth_authorize_process.
ie. make it similar to the one found in apple TN.
2006-06-13 13:56:50 +00:00
kardel
de4337ab21 merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
	{get,}{micro,nano,bin}time()
	get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
2006-06-07 22:33:33 +00:00
rpaulo
9f668d8be9 Call to kauth_cred_getgroups() should use kauth_cred_ngroups() result,
not the size of the array.
2006-06-05 13:25:28 +00:00
rpaulo
abd28afaa8 pcr_ngroups should be uint16_t. 2006-06-05 13:24:22 +00:00
elad
fc9422c9d9 integrate kauth. 2006-05-14 21:31:52 +00:00
christos
9ae6310d36 Coverity CID 2851: Check for NULL before freeing. 2006-04-12 01:09:43 +00:00
yamt
4ddfb52ac9 genfs_getpages:
- use "overwrite" variable consistently.
- remove a set-only variable.
no functional changes.
2006-04-11 09:34:58 +00:00
christos
afa610222b Coverity CID 1002: Yes, this could really be NULL, so check against it. 2006-04-04 14:24:15 +00:00
christos
b33df30820 Coverity CID 1087: Clarify NULL test. 2006-04-04 14:21:55 +00:00
christos
e2b3af9d2c Coverity CID 1140: NULL dereference cannot happen, but protect against it. 2006-04-04 14:18:35 +00:00
christos
41a4245aa5 Coverity CID 2413: NULL deref cannot happen, but nevertheless protect against
it.
2006-04-04 14:16:46 +00:00
yamt
c5fcdd1719 some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
  they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
  thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
  otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
2006-03-30 12:40:06 +00:00
bouyer
59b64d6167 Allow optional /kern regular files to have custom read methods, the same
way writes are handled: Add KERNFS_XREAD and KERNFS_FILEOP_WRITE file
operation definitions to kfsfileop, an xread function pointer to
kernfs_fileop, rename kernfs_read to kernfs_default_xread and add a
kernfs_read calling kernfs_try_fileop(KERNFS_FILEOP_READ).

Proposed on tech-kern on Feb 18 2006.
2006-03-14 20:47:52 +00:00
christos
1b2709754a cleanup more SET/CLR/ISSET lossage 2006-03-05 17:33:33 +00:00
yamt
ec5a93183a merge yamt-uio_vmspace branch.
- use vmspace rather than proc or lwp where appropriate.
  the latter is more natural to specify an address space.
  (and less likely to be abused for random purposes.)
- fix a swdmover race.
2006-03-01 12:38:10 +00:00
christos
671d9ecff9 PR/32692: Matthew Mondor: linux compatibility in /proc/self should point
directly to the directory containing the pid instead of pointing to
/proc/curproc, because some programs rely on calling readlink on /proc/self
to get the pid.
2006-02-02 00:29:24 +00:00
reinoud
a024cb9151 Add genfs support for directories and softlinks next to regular files and
block devices.

Discussed on tech-kern and ok'd by Chuck
2006-01-16 19:45:00 +00:00
yamt
58d3c6b6cd use nestiobuf api for genfs. 2006-01-11 00:46:54 +00:00
yamt
690d424f28 - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
2006-01-04 10:13:05 +00:00
perry
0f0296d88a Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete. 2005-12-24 20:45:08 +00:00
yamt
238236815c fix lock/unlock mismatch in rev.1.115.
reported by Chris Tribo on current-users@.
2005-12-15 02:23:38 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
yamt
059ae4967d genfs_compat_getpages: add minimum support of async get. ie. ignore them.
should fix a crash reported by Jukka Salmi on current-users@.
2005-12-03 17:23:25 +00:00
yamt
e66191b30c genfs_gop_write: use devvp directly as genfs_getpages does. 2005-12-02 00:47:54 +00:00
yamt
8afb2e8ad0 genfs_putpages: initialize marker pages only when needed. 2005-12-02 00:43:51 +00:00
yamt
51a339dd4b revert rev.1.111 as it isn't necessary or correct.
- currently no one in tree has a problem with zero b_lblkno, afaik.
- this buf is used for "devvp", so it doesn't make sense to
  use lbn in the "vp".
2005-11-30 03:45:16 +00:00
reinoud
b91433e0fb Teach genfs that (struct buf *)->b_lblkno always needs to point to the
logical block number of the file instead of always being zero.
2005-11-30 01:46:06 +00:00
yamt
221616873d merge yamt-readahead branch. 2005-11-29 22:52:02 +00:00
christos
dcc61c764f Fix 64 bit truncation problem reported by http://www.securitylab.net 2005-11-23 22:00:32 +00:00
yamt
89bc307830 genfs_getpages:
- add an assertion.
- call VOP_STRATEGY of underlying vnode directly, rather than
  through the filesystem vnode.
- no need to set b_dev here because VOP_STRATEGY will take care of it.
2005-11-12 22:29:53 +00:00
yamt
a748ea88dd merge yamt-vop branch. remove following VOPs.
	VOP_BLKATOFF
	VOP_VALLOC
	VOP_BALLOC
	VOP_REALLOCBLKS
	VOP_VFREE
	VOP_TRUNCATE
	VOP_UPDATE
2005-11-02 12:38:58 +00:00
elad
a61a2074a3 Remove Veriexec bits from genfs, don't #if 0 them. 2005-10-07 18:19:14 +00:00
elad
2de72bfe34 Various fixes from blymn@ and myself.
Also, put genfs changes under #if 0, and don't do per-page fingerprints
until this is properly discussed, as requested by yamt@.
2005-10-07 18:07:46 +00:00
elad
8fc0d7a9c3 Introduce per-page fingerprints in Veriexec.
This closes a hole pointed out by Thor Lancelot Simon on tech-kern ~3
years ago.

The problem was with running binaries from remote storage, where our
kernel (and Veriexec) has no control over any changes to files.

An attacker could, after the fingerprint has been verified and
program loaded to memory, inject malicious code into the backing
store on the remote storage, followed by a forced flush, causing
a page-in of the malicious data from backing store, bypassing
integrity checks.

Initial implementation by Brett Lymn.
2005-10-05 13:48:48 +00:00
atatat
fca6393ad4 Add "cwd" and "root" symlinks to each process's directory. The cwd
link points to the process's current working directory, and the root
link points to the process's root directory.  What else would you
expect?

For directories that are out of reach (caller is in a chroot, target
process is in a different chroot, etc), the links point to "/"
instead.
2005-10-01 03:17:37 +00:00
jmmv
9ba32cead7 Follow compat naming tradition: rename compat_export_args to export_args30. 2005-09-25 21:17:05 +00:00
jmmv
2a3e5eeb7c Apply the NFS exports list rototill patch:
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
  function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
  file sys/nfs/nfs_export.c.  The former was becoming large and its code
  is always compiled, regardless of the build options.  Using the latter,
  the code is only compiled in when NFSSERVER is enabled.  While doing this,
  also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
  path and a set of export entries.  At the moment it can only clear the
  exports list or append entries, one by one, but it is done in a way that
  allows setting the whole set of entries atomically in the future (see the
  comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
  that it becomes file system agnostic.  In fact, this whole thing was
  done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
  exports initialization; done internally by the kernel when initializing
  the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
  subsystems can run arbitrary code upon receipt of specific VFS events.
  At the moment, this only provides support for unmount and is used to
  destroy NFS exports lists from the file systems being unmounted, though it
  has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
2005-09-23 12:10:31 +00:00
christos
eefcfba9b0 When readdir() is called from vfs_getcwd, uio->uio_procp is NULL. Deal with
that. Fixes 'cd /dev/fd && pwd'
2005-09-14 14:53:47 +00:00
elad
a894866511 Implement curtain for procfs. 2005-09-11 20:15:53 +00:00
xtraeme
0cbb812de5 Add sysctl options for the syncer:
vfs.sync.delay: max time to delay syncing data
vfs.sync.filedelay: time to delay syncing files
vfs.sync.dirdelay: time to delay syncing directories
vfs.sync.metadelay: time to delay syncing metadata

Note that using a value of 0 is allowed, but it's not
recommended.
2005-09-11 17:55:56 +00:00
chs
0840b7949f in spec_ioctl(), don't dereference v_specinfo if it's NULL.
this is needed due to rev. 1.231 of kern/vfs_subr.c, which now sets
v_specinfo to NULL before changing the vnode's ops vector.
2005-09-11 14:18:54 +00:00
christos
7791a8f18b Also protect the ipsec ioctls from negative offsets to prevent panics
in m_copydata(). Pointed out by Karl Janmar. Move the negative offset
check from kernfs_xread() to kernfs_read().
2005-09-01 06:25:26 +00:00
christos
218f69d99f Don't allow negative offsets when reading the message buffer, because it
can allow reading arbitrary kernel memory.
2005-08-31 09:54:54 +00:00
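The class of bug fixed above is a read routine that trusts a caller-supplied signed offset. A minimal sketch of the check, in userland C, is shown below; bounded_read() and its parameters are hypothetical and do not reproduce the kernfs code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical bounded read: copy up to len bytes from buf[0..bufsize)
 * starting at off into dst.  Returns the number of bytes copied, 0 at or
 * past the end of the buffer, or -EINVAL for a negative offset.
 */
static ssize_t
bounded_read(const char *buf, size_t bufsize, off_t off, char *dst, size_t len)
{
	/* Reject negative offsets before any pointer arithmetic. */
	if (off < 0)
		return -EINVAL;

	/* Compare in a wide unsigned type so a large off is not truncated. */
	if ((uintmax_t)off >= bufsize)
		return 0;

	/* Clamp the length to what remains in the buffer. */
	if (len > bufsize - (size_t)off)
		len = bufsize - (size_t)off;

	memcpy(dst, buf + off, len);
	return (ssize_t)len;
}
```

Without the first check, buf + off with a negative off would point before the start of the buffer, which is how such a read can expose unrelated memory.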
xtraeme
af97f2e875 Remove __P() 2005-08-30 20:08:01 +00:00
christos
50f8955b6e 64 bit inode changes. 2005-08-19 02:04:03 +00:00
yamt
79ff185ac4 don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.
2005-07-26 08:06:29 +00:00
erh
fbd6fe6c7f Provide a sysctl (vfs.layerfs.debug) to control verbose output when
LAYERFS_DIAGNOSTIC is turned on.
2005-07-24 17:33:24 +00:00
yamt
b7bfe82866 update file timestamps for nfsd loaned-read and mmap.
PR/25279.  discussed on tech-kern@.
2005-07-23 12:18:41 +00:00
yamt
01f4919e33 genfs_putpages: don't bother to clean the vnode unless VONWORKLST. 2005-07-17 16:07:19 +00:00
yamt
8af42d8d3c ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
  setting "wasclean" false when encountering them.
  suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
  we're going to take the vnode off the syncer's queue.
  uvm_fault: don't write-map pages unless its vnode is already on
  the syncer's queue.

  fix PR/24596 (3) but in a different way from the suggested fix.
  (to keep our current behaviour, ie. not to require explicit msync.
  discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
  by introducing a generation number in genfs_node.
  genfs_getpages: increment the generation number.
  suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
2005-07-17 12:27:47 +00:00
yamt
2a6dc9d02d - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
  to prevent unnecessary block allocations in the case that
  page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
  VM_PROT_READ.
2005-07-17 09:13:35 +00:00
yamt
e9e22b28eb genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.
2005-07-16 03:54:08 +00:00
yamt
44d128fa8e - constify genfs_ops.
- use member designators.
2005-06-28 09:30:37 +00:00
ws
9d78e0cf36 PR-30566: Poll must not return <sys/errno.h> values.
Start with those places I can easily test.
2005-06-21 14:01:11 +00:00
christos
1979e6e175 rename delay. 2005-05-30 22:13:50 +00:00
christos
c107ef9edc - sprinkle const
- avoid shadowed variables.
2005-05-29 21:55:33 +00:00
chs
448875a34c kernfs does not support mmap(), remove code that pretends that it does. 2005-05-20 13:16:54 +00:00
christos
8f3566ce61 PR/29782: Martin Husemann: procfs can not unmount when some process has its
current directory in curproc. Fix from Pedro Martelletto:
We cannot call vgone() from procfs_inactive() if we are coming from
vclean(). that's what's probably causing the deadlock.
2005-04-02 06:15:09 +00:00
thorpej
e633e8b61b - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
2005-03-29 02:41:05 +00:00
christos
bb48399e9b Remove bogus len setting noted by J. Chapman Flack. 2005-03-01 04:39:59 +00:00
christos
1a63592a9b Give more space for cpu info and allocate it dynamically. 2005-02-27 22:29:50 +00:00
perry
477853c351 nuke trailing whitespace 2005-02-26 22:58:54 +00:00
chs
d67b9b2ff2 undo the part of rev. 1.93 that turned the past-EOF check into an assertion.
read() can't request pages past EOF, but mmap() can.  apparently I had
disengaged the brain when I said that was ok.
2005-02-16 15:25:33 +00:00
wrstuden
e384a44e9d Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
2005-01-25 23:55:20 +00:00
drochner
7d0567768c -in the read-ahead code, avoid issuing read requests at/past EOF
-because no one should request reads past EOF, or writes past EOF which
 are not explicitly marked as file-extending (PGO_PASTEOF), turn
 a boundary check into a KASSERT
approved by Chuck Silvers
2005-01-25 09:50:31 +00:00
thorpej
1c95472d01 Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
2005-01-02 16:08:28 +00:00
dbj
8962229d27 check for _KERNEL_OPT around opt include 2004-12-22 23:29:51 +00:00
christos
31c81b28f5 Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
    - EDUPFD (used to overload ENODEV)
    - EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
   EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
   fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
2004-11-30 04:25:43 +00:00
atatat
e23f0e2a34 Pass the caller's proc* to soreceive() via auio.uio_procp so that
unp_externalize() is called properly.

Addresses PR kern/28194.
2004-11-12 04:15:29 +00:00