NetBSD

Commit Graph

Author	SHA1	Message	Date
dholland	cba87bb8e7	Abolish all the silly indirection macros for initializing vnode ops tables. These are things of the form #define foofs_op genfs_op, or #define foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides obfuscation, and have gotten cutpasted all over everywhere. Part 3; cvs randomly didn't commit all the files the first time, still hunting down the files it skipped.	2021-07-19 01:33:53 +00:00
dholland	4171507047	Abolish all the silly indirection macros for initializing vnode ops tables. These are things of the form #define foofs_op genfs_op, or #define foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides obfuscation, and have gotten cutpasted all over everywhere.	2021-07-18 23:57:13 +00:00
dholland	d819c3614f	Use macros for the canned parts of device and fifo vnode op tables. Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain the portion of the vnode ops table declaration that is (conservatively) the same in every fs. Use these in every fs that supports devices and/or fifos with separate ops tables. Note that ptyfs works differently (it has one type of vnode with open-coded dispatch to the specfs code, which I haven't changed in this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic dispatch that already does more or less the same thing, which I also haven't changed. Also note that this anticipates a few bits in the next changeset here and there, and adds missing but unreachable calls in some cases (e.g. most fses weren't defining whiteout on devices and fifos, but it isn't reachable there), and it changes parsepath on devices and fifos to genfs_badop from genfs_parsepath (but it's not reachable there either). It appears that devices in kernfs were missing kqfilter, so it's possible that if you try to use kqueue on /kern/rootdev that it'll explode. And finally note that the ops declaration tables aren't order-dependent. (Other than vop_default_desc has to come first.) Otherwise this wouldn't work.	2021-07-18 23:56:12 +00:00
dholland	37ce85c5f7	Fix perms on /kern/{r,}rootdev.	2021-07-06 03:23:03 +00:00
dholland	285b06fd69	Add missing VOP_KQFILTER to kernfs. Not sure if lack of it can be used for local DoS or not, but best to fix.	2021-07-06 03:22:44 +00:00
dholland	723d09ce8e	Add containment for the cloning devices hack in vn_open. Cloning devices (and also things like /dev/stderr) work by allocating a struct file, stuffing it in the file table (which is a layer violation), stuffing the file descriptor number for it in a magic field of struct lwp (which is gross), and then "failing" with one of two magic errnos, EDUPFD or EMOVEFD. Before this commit, all callers of vn_open in the kernel (there are quite a few) were expected to check for these errors and handle the situation. Needless to say, none of them except for open() itself did, resulting in internal negative errnos being returned to userspace. This hack is fairly deeply rooted and cannot be eliminated all at once. This commit adds logic to handle the magic errnos inside vn_open; now on success vn_open returns either a vnode or an integer file descriptor, along with a flag that says whether the underlying code requested EDUPFD or EMOVEFD. Callers not prepared to cope with file descriptors can pass NULL for the extra return values, in which case if a file descriptor would be produced vn_open fails with EOPNOTSUPP. Since I'm rearranging vn_open's signature anyway, stop exposing struct nameidata. Instead, take three arguments: an optional vnode to use as the starting point (like openat()), the path, and additional namei flags to use, restricted to NOCHROOT and TRYEMULROOT. (Other namei behavior, e.g. NOFOLLOW, can be requested via the open flags.) This change requires a kernel bump. Ride the one an hour ago. (That was supposed to be coordinated; did not intend to let an hour slip by. My fault.)	2021-06-29 22:40:53 +00:00
dholland	c6c16cd073	- Add a new vnode op: VOP_PARSEPATH. - Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath. - Add a parsepath entry to every vnode ops table. VOP_PARSEPATH takes a directory vnode to be searched and a complete following path and chooses how much of that path to consume. To begin with, all parsepath calls are genfs_parsepath, which locates the first '/' as always. Note that the call doesn't take the whole struct componentname, only the string. The other bits of struct componentname should not be needed and there's no reason to cause potential complications by exposing them.	2021-06-29 22:34:05 +00:00
chs	e3beb37645	VOP_BMAP() may be called via ioctl(FIOGETBMAP) on any vnode that applications can open. Change various pseudo-fs *_bmap methods to return an error instead of panicking. Reported-by: syzbot+8289a3eaf2ba60958c87@syzkaller.appspotmail.com	2021-06-28 17:52:12 +00:00
hannken	9decf88a36	Make sure fdesc_lookup() never returns VNON vnodes. Should fix PR kern/56130 (fdescfs create nodes with wrong major number)	2021-05-01 15:08:14 +00:00
riastradh	a1daa84551	Fix procfs environ node.	2020-12-28 22:36:16 +00:00
mlelstv	1bc006b1c2	When reading from a block device, queue parallel block requests to fill a buffer with breadn.	2020-12-25 09:28:56 +00:00
thorpej	8cda207fa4	Use sel{record,remove}_knote().	2020-12-19 21:54:42 +00:00
riastradh	9fc453562f	Round of uvm.h cleanup. The poorly named uvm.h is generally supposed to be for uvm-internal users only. - Narrow it to files that actually need it -- mostly files that need to query whether curlwp is the pagedaemon, which should maybe be exposed by an external header. - Use uvm_extern.h where feasible and uvm_*.h for things not exposed by it. We should split up uvm_extern.h but this will serve for now to reduce the uvm.h dependencies. - Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use UVMHIST(ubchist), since ubchist is declared in uvm.h but the reference evaporates if UVMHIST is not defined, so we reduce header file dependencies. - Make uvm_device.h and uvm_swap.h independently includable while here. ok chs@	2020-09-05 16:30:10 +00:00
riastradh	44afc3b3f9	genfs_rename: Fix deadlocks in cross-directory cyclic rename. Reproducer: A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rmdir("c/d/e"); rmdir("c/d"); } B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c", "c/d/e"); } C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c/d/e", "c"); } Deadlock: - A holds c and wants to lock d; and either - B holds . and d and wants to lock c, or - C holds . and d and wants to lock c. The problem with these is that genfs_rename_enter_separate in B or C tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C, tdvp->fdvp->tvp->fvp) which violates the ancestor->descendant order .->c->d->e. The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do tdvp->tvp->fdvp->fvp. But there's an edge case: tvp and fvp might be the same (hard links), and we can't detect that until after we've looked them both up -- and in some file systems (I'm looking at you, ufs), there is no mere lookup operation, only lookup-and-lock, so we can't even hold the lock on one of tvp or fvp when we look up the other one if there's a chance they might be the same. Fortunately the cases (a) tvp = fvp (b) tvp or fvp is a directory are mutually exclusive as long as directories cannot be hard-linked. In case (a) we can just defer locking {tvp, fvp} until the end, because it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory, because it can't possibly coincide with the second one of {fvp, tvp}. With this change, we can now prove that the locking order is consistent with the ancestor->descendant partial ordering. Where two nodes are incommensurate under that partial ordering, they are only ever locked by rename and there is only ever one rename at a time. Proof: - For same-directory renames, genfs_rename_enter_common locks the directory first and then the children. The order directory->child[i] is consistent with ancestor->descendant and child[0]/child[1] are incommensurate. - For cross-directory renames: . While a rename is in progress and the fs-wide rename lock is held, directories can be created or removed but not changed, so the outcome of gro_genealogy -- which, given fdvp and tdvp, returns the node N relating fdvp/N/.../tdvp or null if there is none -- can only transition from finding N to not finding N, if one of the directories is removed while any of the vnodes are unlocked. Merely creating directories cannot change the ancestry of tdvp, and concurrent renames are not possible. Thus, if a gro_genealogy determined the operation to have the form fdvp/N/.../tdvp, then it might cease to have that form, but only because tdvp was removed which will harmlessly cause the rename to fail later on. Similarly, if gro_genealogy determined the operation _not_ to have the form fdvp/N/.../tdvp then it can't begin to have that form until after the rename has completed. The lock order is, => for fdvp/.../tdvp: 1. lock fdvp 2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 3. lock fvp if a directory (consistent with fdvp->fvp) 4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp) 5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible) 7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp) => for incommensurate fdvp & tdvp, or for tdvp/.../fdvp: 1. lock tdvp 2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp) 3. lock tvp if a directory (consistent with tdvp->tvp) 4. lock fdvp (either incommensurate with tdvp and/or tvp, or consistent with tdvp(->tvp)->fdvp) 5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp) 6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible) 7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp) Deadlocks found by hannken@; resolution worked out with dholland@. XXX I think we could improve concurrency somewhat -- with a likely big win for applications like tar and rsync that create many files with temporary names and then rename them to the permanent one in the same directory -- by making vfs_renamelock a reader/writer lock: any number of same-directory renames, or exactly one cross-directory rename, at any one time.	2020-09-05 02:47:03 +00:00
simonb	bf74807839	Remove trailing \n from UVMHIST_LOG() format strings.	2020-08-19 07:29:00 +00:00
chs	19303cecfc	centralize calls from UVM to radixtree into a few functions. in those functions, assert that the object lock is held in the correct mode.	2020-08-14 09:06:14 +00:00
rin	3123ec52cf	Output offsets in hex for UVMHIST.	2020-08-10 11:09:15 +00:00
christos	ad1efe3529	accmode should be accmode_t	2020-08-07 18:14:21 +00:00
christos	79e3c74f8e	Introduce genfs_pathconf() and use it for the default case in all filesystems.	2020-06-27 17:29:17 +00:00
ad	2806b3da8b	genfs_putpages(): when building a cluster make use of pages in the existing uvm_page_array.	2020-06-14 00:25:22 +00:00
ad	ba90a6ba38	Counter tweaks: - Don't need to count anonpages+filepages any more; clean+unknown+dirty for each kind of page can be summed to get the totals. - Track the number of free pages with a counter so that it's one less thing for the allocator to do, which opens up further options there. - Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot. For the cheap option, give cpu_count_sync() a boolean parameter indicating that a cached value is okay, and rate limit the updates for cached values to hz.	2020-06-11 22:21:05 +00:00
ad	4b8a875ae2	uvm_availmem(): give it a boolean argument to specify whether a recent cached value will do, or if the very latest total must be fetched. It can be called thousands of times a second and fetching the totals impacts not only the calling LWP but other CPUs doing unrelated activity in the VM system.	2020-06-11 19:20:42 +00:00
rin	46b290a0c3	struct statvfs is too large for stack. Use malloc(9) instead. XXX Switch to kmem(9) for this entire file. Frame size, e.g. for m68k, becomes: 3292 --> 12	2020-05-31 08:38:54 +00:00
bouyer	ecb2afc2be	Add need-flags for kernfs. Compile Xen kernfs support only if kernfs is compiled in the kernel. Should fix MODULAR build.	2020-05-26 10:37:24 +00:00
ad	4bfe043955	- Alter the convention for uvm_page_array slightly, so the basic search parameters can't change part way through a search: move the "uobj" and "flags" arguments over to uvm_page_array_init() and store those with the array. - With that, detect when it's not possible to find any more pages in the tree with the given search parameters, and avoid repeated tree lookups if the caller loops over uvm_page_array_fill_and_peek().	2020-05-25 21:15:10 +00:00
ad	0eaaa024ea	Move proc_lock into the data segment. It was dynamically allocated because at the time we had mutex_obj_alloc() but not __cacheline_aligned.	2020-05-23 23:42:41 +00:00
christos	44ed7d42f1	Fix EPERM vs EACCES on chtimes (thanks @hannken)	2020-05-20 17:06:15 +00:00
christos	10d4c4e928	remove debugging, it is just clutter.	2020-05-18 19:55:42 +00:00
christos	e59d6517f7	Fix EPERM vs EACCES return.	2020-05-18 19:42:16 +00:00
ad	ff872804dc	Start trying to reduce cache misses on vm_page during fault processing. - Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark pages busy only when there's actually I/O to do. - When doing COW on a uvm_object, don't mess with neighbouring pages. In all likelihood they're already entered. - Don't mess with neighbouring VAs that have existing mappings as replacing those mappings with same can be quite costly. - Don't enqueue pages for neighbour faults unless not enqueued already, and don't activate centre pages unless uvmpdpol says it's useful. Also: - Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in the radix tree, and don't allocate new pages. - Fix many assertion failures around faults/loans with tmpfs.	2020-05-17 19:38:16 +00:00
christos	9aa2a9c323	Add ACL support for FFS. From FreeBSD.	2020-05-16 18:31:45 +00:00
riastradh	8f400c1021	Put forward declaration a little further forward to unbreak build.	2020-04-29 07:18:24 +00:00
thorpej	91da5a2e36	If the procfs mount is marked as linux-compat, then allow proc lookup by any LWP ID in the proc, not just the canonical PID.	2020-04-29 01:56:54 +00:00
christos	2f4aa83fe6	Allow root to access and modify system space extended attributes. XXX: this routine should not be using the string, but the attribute namespace. I have fixed this in the ACL code.	2020-04-25 22:28:47 +00:00
ad	e88c11f417	Revert the changes made in February to make cwdinfo use mostly lockless, which relied on taking extra vnode refs. Having benchmarked various experimental changes over the past few months it seems that it's better to avoid vnode refs as much as possible. cwdi_lock as a RW lock already did that to some extent for getcwd() and will permit the same for namei() too.	2020-04-21 21:42:47 +00:00
martin	0a1cb03168	Add missing include of <sys/atomic.h> to fix the build	2020-04-20 13:30:34 +00:00
htodd	7bceb35060	Sort include files.	2020-04-20 05:22:28 +00:00
htodd	6b104688ac	Add missing include to fix build.	2020-04-20 05:11:00 +00:00
thorpej	a29147fa13	- Only increment nprocs when we're creating a new process, not just when allocating a PID. - Per above, proc_free_pid() no longer decrements nprocs. It's now done in proc_free() right after proc_free_pid(). - Ensure nprocs is accessed using atomics everywhere.	2020-04-19 20:31:59 +00:00
jdolecek	5dec3f0781	when determining I/O block size for VBLK device, only use pi_bsize returned by DIOCGPARTINFO if it's bigger than DEV_BSIZE and less than MAXBSIZE (MAXPHYS) fixes panic "buf mem pool index 8" in buf_mempoolidx() when the disklabel contains bsize 128KB and something reads the block device - buffer cache can't allocate bufs bigger than MAXPHYS	2020-04-13 20:02:27 +00:00
ad	23bf88000c	Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function that hides the details and does atomic_load_relaxed(). Signature matches FreeBSD.	2020-04-13 19:23:17 +00:00
jdolecek	8d1e4d9c00	switch to kmem_zalloc() instead of malloc() for struct kernfs_mount	2020-04-07 08:35:49 +00:00
jdolecek	8a3ee72648	switch KERNFS_ALLOCENTRY() to use kmem_zalloc() instead of malloc()	2020-04-07 08:14:42 +00:00
ad	c90f9c8c81	Merge the remaining changes from the ad-namecache branch, affecting namei() and getcwd(): - push vnode locking back as far as possible. - do most lookups directly in the namecache, avoiding vnode locks & refs. - don't block new refs to vnodes across VOP_INACTIVE(). - get shared locks for VOP_LOOKUP() if the file system supports it. - correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places. Possible future enhancements: - make the lookups lockless. - support dotdot lookups by being lockless and inferring absence of chroot. - maybe make it work for layered file systems. - avoid vnode references at the root & cwd.	2020-04-04 20:49:30 +00:00
ad	1d7848ad43	Process concurrent page faults on individual uvm_objects / vm_amaps in parallel, where the relevant pages are already in-core. Proposed on tech-kern. Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until adjustments are made to their pmaps.	2020-03-22 18:32:41 +00:00
pgoyette	3a65820412	Finish the transition to SYSCTL_SETUP by removing local sysctllog in favor of the one provided by the module infrastructure.	2020-03-21 16:30:39 +00:00
ad	1912643ff9	Tweak the March 14th change to make page waits interlocked by pg->interlock. Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude possible bugs.	2020-03-17 18:31:38 +00:00
pgoyette	9120d4511b	Use the module subsystem's ability to process SYSCTL_SETUP() entries to automate installation of sysctl nodes. Note that there are still a number of device and pseudo-device modules that create entries tied to individual device units, rather than to the module itself. These are not changed.	2020-03-16 21:20:09 +00:00
ad	94e054a199	Update a comment.	2020-03-14 21:47:41 +00:00
ad	da3ef92bf6	Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer work list. Proposed on tech-kern@.	2020-03-14 20:45:23 +00:00
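
The VOP_PARSEPATH commit listed above (c6c16cd073) describes genfs_parsepath as choosing how much of the remaining path to consume by locating the first '/'. A minimal userland sketch of that rule is given here for illustration. The function name first_component_len is hypothetical, and this is not the kernel's code, which is reached through the vnode op interface and may perform checks not shown here.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the NetBSD implementation: return how many
 * bytes of `path` form the leading component, i.e. everything before
 * the first '/' or the terminating NUL.
 */
size_t
first_component_len(const char *path)
{
	size_t len = 0;

	while (path[len] != '\0' && path[len] != '/')
		len++;
	return len;
}
```

Given "usr/bin" the sketch consumes the three bytes of "usr". Given a path that begins with '/', it consumes nothing. A file system with different component rules would presumably return a different length from its own parsepath routine.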

1366 Commits