NetBSD

Author	SHA1	Message	Date
riastradh	9fc453562f	Round of uvm.h cleanup. The poorly named uvm.h is generally supposed to be for uvm-internal users only. - Narrow it to files that actually need it -- mostly files that need to query whether curlwp is the pagedaemon, which should maybe be exposed by an external header. - Use uvm_extern.h where feasible and uvm_*.h for things not exposed by it. We should split up uvm_extern.h but this will serve for now to reduce the uvm.h dependencies. - Use uvm_stat.h, plus uvm.h only under #ifdef UVMHIST, for files that use UVMHIST(ubchist), since ubchist is declared in uvm.h but the reference evaporates if UVMHIST is not defined, so we reduce header file dependencies. - Make uvm_device.h and uvm_swap.h independently includable while here. ok chs@	2020-09-05 16:30:10 +00:00
riastradh	44afc3b3f9	genfs_rename: Fix deadlocks in cross-directory cyclic rename.
Reproducer:
A: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rmdir("c/d/e"); rmdir("c/d"); }
B: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c", "c/d/e"); }
C: for (;;) { mkdir("c", 0600); mkdir("c/d", 0600); mkdir("c/d/e", 0600); rename("c/d/e", "c"); }
Deadlock:
- A holds c and wants to lock d; and either
- B holds . and d and wants to lock c, or
- C holds . and d and wants to lock c.
The problem with these is that genfs_rename_enter_separate in B or C tried lock order .->d->c->e (in A/B, fdvp->tdvp->fvp->tvp; in A/C, tdvp->fdvp->tvp->fvp), which violates the ancestor->descendant order .->c->d->e.
The resolution is to change B to do fdvp->fvp->tdvp->tvp and C to do tdvp->tvp->fdvp->fvp.
But there's an edge case: tvp and fvp might be the same (hard links), and we can't detect that until after we've looked them both up -- and in some file systems (I'm looking at you, ufs), there is no mere lookup operation, only lookup-and-lock, so we can't even hold the lock on one of tvp or fvp when we look up the other one if there's a chance they might be the same.
Fortunately the cases (a) tvp = fvp and (b) tvp or fvp is a directory are mutually exclusive as long as directories cannot be hard-linked. In case (a) we can just defer locking {tvp, fvp} until the end, because it can't possibly have {fdvp or fvp, tdvp or tvp} as descendants. In case (b) we can just lock them in the order fdvp->fvp->tdvp->tvp or tdvp->tvp->fdvp->fvp if the first one of {fvp, tvp} is a directory, because it can't possibly coincide with the second one of {fvp, tvp}.
With this change, we can now prove that the locking order is consistent with the ancestor->descendant partial ordering. Where two nodes are incommensurate under that partial ordering, they are only ever locked by rename and there is only ever one rename at a time.
Proof:
- For same-directory renames, genfs_rename_enter_common locks the directory first and then the children. The order directory->child[i] is consistent with ancestor->descendant and child[0]/child[1] are incommensurate.
- For cross-directory renames:
  . While a rename is in progress and the fs-wide rename lock is held, directories can be created or removed but not changed, so the outcome of gro_genealogy -- which, given fdvp and tdvp, returns the node N relating fdvp/N/.../tdvp or null if there is none -- can only transition from finding N to not finding N, if one of the directories is removed while any of the vnodes are unlocked. Merely creating directories cannot change the ancestry of tdvp, and concurrent renames are not possible.
    Thus, if gro_genealogy determined the operation to have the form fdvp/N/.../tdvp, then it might cease to have that form, but only because tdvp was removed, which will harmlessly cause the rename to fail later on. Similarly, if gro_genealogy determined the operation _not_ to have the form fdvp/N/.../tdvp, then it can't begin to have that form until after the rename has completed.
    The lock order is,
    => for fdvp/.../tdvp:
       1. lock fdvp
       2. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
       3. lock fvp if a directory (consistent with fdvp->fvp)
       4. lock tdvp (consistent with fdvp->tdvp and possibly fvp->tdvp)
       5. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
       6. lock fvp if a nondirectory (fvp->t* or fvp->fdvp is impossible)
       7. lock tvp if not fvp (tvp->f* is impossible unless tvp=fvp)
    => for incommensurate fdvp & tdvp, or for tdvp/.../fdvp:
       1. lock tdvp
       2. lookup(/lock/unlock) tvp (consistent with tdvp->tvp)
       3. lock tvp if a directory (consistent with tdvp->tvp)
       4. lock fdvp (either incommensurate with tdvp and/or tvp, or consistent with tdvp(->tvp)->fdvp)
       5. lookup(/lock/unlock) fvp (consistent with fdvp->fvp)
       6. lock tvp if a nondirectory (tvp->f* or tvp->tdvp is impossible)
       7. lock fvp if not tvp (fvp->t* is impossible unless fvp=tvp)
Deadlocks found by hannken@; resolution worked out with dholland@.
XXX I think we could improve concurrency somewhat -- with a likely big win for applications like tar and rsync that create many files with temporary names and then rename them to the permanent one in the same directory -- by making vfs_renamelock a reader/writer lock: any number of same-directory renames, or exactly one cross-directory rename, at any one time.	2020-09-05 02:47:03 +00:00
simonb	bf74807839	Remove trailing \n from UVMHIST_LOG() format strings.	2020-08-19 07:29:00 +00:00
chs	19303cecfc	centralize calls from UVM to radixtree into a few functions. in those functions, assert that the object lock is held in the correct mode.	2020-08-14 09:06:14 +00:00
rin	3123ec52cf	Output offsets in hex for UVMHIST.	2020-08-10 11:09:15 +00:00
christos	ad1efe3529	accmode should be accmode_t	2020-08-07 18:14:21 +00:00
christos	79e3c74f8e	Introduce genfs_pathconf() and use it for the default case in all filesystems.	2020-06-27 17:29:17 +00:00
ad	2806b3da8b	genfs_putpages(): when building a cluster, make use of pages in the existing uvm_page_array.	2020-06-14 00:25:22 +00:00
ad	ba90a6ba38	Counter tweaks: - Don't need to count anonpages+filepages any more; clean+unknown+dirty for each kind of page can be summed to get the totals. - Track the number of free pages with a counter so that it's one less thing for the allocator to do, which opens up further options there. - Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot. For the cheap option, give cpu_count_sync() a boolean parameter indicating that a cached value is okay, and rate limit the updates for cached values to hz.	2020-06-11 22:21:05 +00:00
ad	4b8a875ae2	uvm_availmem(): give it a boolean argument to specify whether a recent cached value will do, or if the very latest total must be fetched. It can be called thousands of times a second and fetching the totals impacts not only the calling LWP but other CPUs doing unrelated activity in the VM system.	2020-06-11 19:20:42 +00:00
rin	46b290a0c3	struct statvfs is too large for the stack. Use malloc(9) instead. XXX Switch to kmem(9) for this entire file. Frame size, e.g. for m68k, becomes: 3292 --> 12	2020-05-31 08:38:54 +00:00
bouyer	ecb2afc2be	Add need-flags for kernfs. Compile Xen kernfs support only if kernfs is compiled in the kernel. Should fix MODULAR build.	2020-05-26 10:37:24 +00:00
ad	4bfe043955	- Alter the convention for uvm_page_array slightly, so the basic search parameters can't change part way through a search: move the "uobj" and "flags" arguments over to uvm_page_array_init() and store those with the array. - With that, detect when it's not possible to find any more pages in the tree with the given search parameters, and avoid repeated tree lookups if the caller loops over uvm_page_array_fill_and_peek().	2020-05-25 21:15:10 +00:00
ad	0eaaa024ea	Move proc_lock into the data segment. It was dynamically allocated because at the time we had mutex_obj_alloc() but not __cacheline_aligned.	2020-05-23 23:42:41 +00:00
christos	44ed7d42f1	Fix EPERM vs EACCES on chtimes (thanks @hannken)	2020-05-20 17:06:15 +00:00
christos	10d4c4e928	remove debugging, it is just clutter.	2020-05-18 19:55:42 +00:00
christos	e59d6517f7	Fix EPERM vs EACCES return.	2020-05-18 19:42:16 +00:00
ad	ff872804dc	Start trying to reduce cache misses on vm_page during fault processing. - Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark pages busy only when there's actually I/O to do. - When doing COW on a uvm_object, don't mess with neighbouring pages. In all likelihood they're already entered. - Don't mess with neighbouring VAs that have existing mappings as replacing those mappings with same can be quite costly. - Don't enqueue pages for neighbour faults unless not enqueued already, and don't activate centre pages unless uvmpdpol says it's useful. Also: - Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in the radix tree, and don't allocate new pages. - Fix many assertion failures around faults/loans with tmpfs.	2020-05-17 19:38:16 +00:00
christos	9aa2a9c323	Add ACL support for FFS. From FreeBSD.	2020-05-16 18:31:45 +00:00
riastradh	8f400c1021	Put forward declaration a little further forward to unbreak build.	2020-04-29 07:18:24 +00:00
thorpej	91da5a2e36	If the procfs mount is marked as linux-compat, then allow proc lookup by any LWP ID in the proc, not just the canonical PID.	2020-04-29 01:56:54 +00:00
christos	2f4aa83fe6	Allow root to access and modify system space extended attributes. XXX: this routine should not be using the string, but the attribute namespace. I have fixed this in the ACL code.	2020-04-25 22:28:47 +00:00
ad	e88c11f417	Revert the changes made in February to make most uses of cwdinfo lockless, which relied on taking extra vnode refs. Having benchmarked various experimental changes over the past few months it seems that it's better to avoid vnode refs as much as possible. cwdi_lock as a RW lock already did that to some extent for getcwd() and will permit the same for namei() too.	2020-04-21 21:42:47 +00:00
martin	0a1cb03168	Add missing include of <sys/atomic.h> to fix the build	2020-04-20 13:30:34 +00:00
htodd	7bceb35060	Sort include files.	2020-04-20 05:22:28 +00:00
htodd	6b104688ac	Add missing include to fix build.	2020-04-20 05:11:00 +00:00
thorpej	a29147fa13	- Only increment nprocs when we're creating a new process, not just when allocating a PID. - Per above, proc_free_pid() no longer decrements nprocs. It's now done in proc_free() right after proc_free_pid(). - Ensure nprocs is accessed using atomics everywhere.	2020-04-19 20:31:59 +00:00
jdolecek	5dec3f0781	when determining I/O block size for a VBLK device, only use pi_bsize returned by DIOCGPARTINFO if it's bigger than DEV_BSIZE and less than MAXBSIZE (MAXPHYS); fixes panic "buf mem pool index 8" in buf_mempoolidx() when the disklabel contains bsize 128KB and something reads the block device, since the buffer cache can't allocate bufs bigger than MAXPHYS	2020-04-13 20:02:27 +00:00
ad	23bf88000c	Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function that hides the details and does atomic_load_relaxed(). Signature matches FreeBSD.	2020-04-13 19:23:17 +00:00
jdolecek	8d1e4d9c00	switch to kmem_zalloc() instead of malloc() for struct kernfs_mount	2020-04-07 08:35:49 +00:00
jdolecek	8a3ee72648	switch KERNFS_ALLOCENTRY() to use kmem_zalloc() instead of malloc()	2020-04-07 08:14:42 +00:00
ad	c90f9c8c81	Merge the remaining changes from the ad-namecache branch, affecting namei() and getcwd(): - push vnode locking back as far as possible. - do most lookups directly in the namecache, avoiding vnode locks & refs. - don't block new refs to vnodes across VOP_INACTIVE(). - get shared locks for VOP_LOOKUP() if the file system supports it. - correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places. Possible future enhancements: - make the lookups lockless. - support dotdot lookups by being lockless and inferring absence of chroot. - maybe make it work for layered file systems. - avoid vnode references at the root & cwd.	2020-04-04 20:49:30 +00:00
ad	1d7848ad43	Process concurrent page faults on individual uvm_objects / vm_amaps in parallel, where the relevant pages are already in-core. Proposed on tech-kern. Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until adjustments are made to their pmaps.	2020-03-22 18:32:41 +00:00
pgoyette	3a65820412	Finish the transition to SYSCTL_SETUP by removing local sysctllog in favor of the one provided by the module infrastructure.	2020-03-21 16:30:39 +00:00
ad	1912643ff9	Tweak the March 14th change to make page waits interlocked by pg->interlock. Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude possible bugs.	2020-03-17 18:31:38 +00:00
pgoyette	9120d4511b	Use the module subsystem's ability to process SYSCTL_SETUP() entries to automate installation of sysctl nodes. Note that there are still a number of device and pseudo-device modules that create entries tied to individual device units, rather than to the module itself. These are not changed.	2020-03-16 21:20:09 +00:00
ad	94e054a199	Update a comment.	2020-03-14 21:47:41 +00:00
ad	da3ef92bf6	Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer work list. Proposed on tech-kern@.	2020-03-14 20:45:23 +00:00
ad	5972ba1600	Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW locks out of the equation for sleep/wakeup, and allows observing+waiting for busy pages when holding only a read lock. Proposed on tech-kern.	2020-03-14 20:23:51 +00:00
ad	6ba8fa570a	Unused variable.	2020-03-14 19:07:22 +00:00
ad	16d4fad635	- Hide the details of SPCF_SHOULDYIELD and related behind a couple of small functions: preempt_point() and preempt_needed(). - preempt(): if the LWP has exceeded its timeslice in kernel, strip it of any priority boost gained earlier from blocking.	2020-03-14 18:08:38 +00:00
ad	01f564d8c8	OR into bp->b_cflags; don't overwrite.	2020-03-14 15:31:29 +00:00
ad	bf79731039	Tighten up the locking around vp->v_iflag a little more after the recent split of vmobjlock & v_interlock.	2020-02-27 22:12:53 +00:00
ad	1316228274	v_interlock -> vmobjlock	2020-02-24 20:49:51 +00:00
ad	bada6a544d	v_interlock -> vmobjlock	2020-02-24 20:44:25 +00:00
ad	926b25e154	Merge from ad-namecache: - Have a stab at clustering the members of vnode_t and vnode_impl_t in a more cache-conscious way. With that done, go back to adjusting v_usecount with atomics and keep vi_lock directly in vnode_impl_t (saves KVA). - Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT(). Make sure LK_UPGRADE always comes with LK_NOWAIT. - Make most uses of cwdinfo lockless.	2020-02-23 22:14:03 +00:00
ad	d2a0ebb67a	UVM locking changes, proposed on tech-kern: - Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock. - Break v_interlock and vmobjlock apart. v_interlock remains a mutex. - Do partial PV list locking in the x86 pmap. Others to follow later.	2020-02-23 15:46:38 +00:00
riastradh	16a0e8da32	Use vn_bwrite, not genfs_nullop, for VOP_BWRITE. VOP_BWRITE is responsible for calling biodone; can't just leave it hanging. XXX pullup	2020-02-20 15:48:05 +00:00
chs	5232c510c9	remove the aiodoned thread. I originally added this to provide a thread context for doing page cache iodone work, but since then biodone() has changed to hand off all iodone work to a softint thread, so we no longer need the special-purpose aiodoned thread.	2020-02-18 20:23:17 +00:00
riastradh	b26fba762e	Use specfs vnops for specnodes in kernfs. While here, don't filter out rootdev and rrootdev merely because they're not cached. Fixes the elusive /kern/rootdev and /kern/rrootdev nodes, which only appeared sometimes when they felt like it, and fixes operations on /kern/rootdev and /kern/rrootdev always returning EOPNOTSUPP. We didn't seem to have a single PR for these issues but the following PRs are all relevant: PR bin/13564 PR kern/38265 PR kern/38778 PR kern/45974 XXX pullup-9, pullup-8, pullup-7, pullup-6, pullup-5, pullup-4, pullup-3, pullup-2, pullup-1.4T...	2020-02-04 04:19:24 +00:00

1354 Commits
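
Sketch for 9fc453562f, using hypothetical names. A history macro that expands to nothing when its option is off leaves no reference to the history object. The header that declares that object is then only needed under the same #ifdef. DEMO_HIST, DEMO_HIST_LOG and demo_hist_entries are illustrative stand-ins for UVMHIST, UVMHIST_LOG and ubchist, not UVM's real definitions.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for UVMHIST, UVMHIST_LOG and ubchist.
 * DEMO_HIST is left undefined here, modelling a kernel without UVMHIST.
 */
#ifdef DEMO_HIST
/* Only this configuration references the history object, so only it
 * needs the header that declares it (uvm.h in the real case). */
extern int demo_hist_entries;
#define DEMO_HIST_LOG(msg)	((void)(msg), (void)demo_hist_entries++)
#else
/* Expands to nothing: no reference to demo_hist_entries survives. */
#define DEMO_HIST_LOG(msg)	((void)0)
#endif

static int
demo_count_pages(int npages)
{
	DEMO_HIST_LOG("counting pages");
	assert(npages >= 0);
	return npages;
}
```

Building with -DDEMO_HIST would additionally need a definition of demo_hist_entries at link time. That dependency is exactly what the commit confines to UVMHIST kernels.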
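
Sketch for 44afc3b3f9. The two seven-step lock orders in the commit message can be condensed into one function that reports which vnodes get locked, and in what order, for a cross-directory rename. This models the ordering only: it takes no real locks and is not genfs_rename code. The parameter names fdvp_first, first_is_dir and same are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Append a vnode name to the space-separated lock order. */
static void
append_lock(char *buf, size_t len, const char *name)
{
	if (buf[0] != '\0')
		strncat(buf, " ", len - strlen(buf) - 1);
	strncat(buf, name, len - strlen(buf) - 1);
}

/*
 * Lock order for a cross-directory rename, following the two step lists
 * in the commit message.  Lookup-only steps (2 and 5) hold no lock
 * afterwards and are omitted.  Assumes tvp exists.
 *   fdvp_first:   gro_genealogy found fdvp/N/.../tdvp
 *   first_is_dir: the first child looked up (fvp if fdvp_first, else tvp)
 *                 is a directory
 *   same:         tvp == fvp (hard links)
 * Returns a static buffer; not reentrant.
 */
static const char *
rename_lock_order(bool fdvp_first, bool first_is_dir, bool same)
{
	static char buf[64];
	const char *d1 = fdvp_first ? "fdvp" : "tdvp";
	const char *c1 = fdvp_first ? "fvp" : "tvp";
	const char *d2 = fdvp_first ? "tdvp" : "fdvp";
	const char *c2 = fdvp_first ? "tvp" : "fvp";

	/* Directories cannot be hard-linked: cases (a) and (b) are exclusive. */
	assert(!(first_is_dir && same));

	buf[0] = '\0';
	append_lock(buf, sizeof(buf), d1);		/* step 1 */
	if (first_is_dir)
		append_lock(buf, sizeof(buf), c1);	/* step 3 */
	append_lock(buf, sizeof(buf), d2);		/* step 4 */
	if (!first_is_dir)
		append_lock(buf, sizeof(buf), c1);	/* step 6 */
	if (!same)
		append_lock(buf, sizeof(buf), c2);	/* step 7 */
	return buf;
}
```

Take B's rename("c", "c/d/e") as an example. Here fdvp = . is an ancestor of tdvp = c/d, and fvp = c is a directory. The sketch therefore yields fdvp fvp tdvp tvp, that is .->c->d->e, which agrees with the ancestor->descendant order.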
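
Sketch for ba90a6ba38 and 4b8a875ae2, assuming a simple single-threaded model. Per-CPU slots are summed on demand. A caller that passes cached = true accepts a total computed earlier in the same clock tick, so full sums happen at most once per tick (hz times per second). The names demo_counter, demo_counter_add and demo_counter_sync are hypothetical; the real cpu_count_sync() and uvm_availmem() are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NCPU	4	/* hypothetical CPU count */

/* Hypothetical model of a per-CPU counter such as the free page count. */
struct demo_counter {
	long percpu[DEMO_NCPU];	/* each CPU adjusts only its own slot */
	long total;		/* last summed value */
	unsigned long synced;	/* tick at which total was computed */
	bool valid;
};

static struct demo_counter demo_free;

static void
demo_counter_add(struct demo_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < DEMO_NCPU);
	c->percpu[cpu] += delta;
}

/*
 * If "cached" is true, a total computed during the current tick is good
 * enough, limiting full sums to one per tick.  Otherwise always re-sum.
 */
static long
demo_counter_sync(struct demo_counter *c, bool cached, unsigned long now)
{
	if (cached && c->valid && c->synced == now)
		return c->total;

	long sum = 0;
	for (int i = 0; i < DEMO_NCPU; i++)
		sum += c->percpu[i];
	c->total = sum;
	c->synced = now;
	c->valid = true;
	return sum;
}
```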
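
Sketch for 46b290a0c3 as a user-space analogue. The fix moves a large struct statvfs from the stack to the heap. The kernel uses malloc(9), with kmem(9) noted as the follow-up. This example uses malloc(3) and POSIX statvfs() instead, and demo_fs_bsize is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/statvfs.h>

/*
 * Return the block size of the file system holding path, or 0 on
 * failure.  The struct statvfs lives on the heap, not in this frame.
 */
static unsigned long
demo_fs_bsize(const char *path)
{
	struct statvfs *sfs = malloc(sizeof(*sfs));
	unsigned long bsize = 0;

	if (sfs == NULL)
		return 0;
	if (statvfs(path, sfs) == 0)
		bsize = sfs->f_bsize;
	free(sfs);
	return bsize;
}
```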
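
Sketch for 4bfe043955, using a hypothetical sorted array in place of the radix tree. The object and flags are fixed when the page array is initialized. So once a lookup finds nothing at or beyond the current offset, that stays true for the rest of a forward scan, and later calls can return at once instead of searching again. demo_obj, demo_page_array and demo_fill_and_peek are illustrative names, not uvm_page_array's real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an object's page tree: sorted page offsets. */
struct demo_obj {
	const long *offsets;
	size_t npages;
	unsigned lookups;	/* number of (costly) tree searches performed */
};

/* Search parameters (obj, flags) are fixed once, at init time. */
struct demo_page_array {
	struct demo_obj *obj;
	int flags;		/* unused in this sketch */
	long last;		/* last offset asked for; scans move forward */
	int done;		/* set once the tree has nothing at or past last */
};

static void
demo_page_array_init(struct demo_page_array *a, struct demo_obj *obj,
    int flags)
{
	a->obj = obj;
	a->flags = flags;
	a->last = 0;
	a->done = 0;
}

/* Return the first page offset >= off, or -1 if there is none. */
static long
demo_fill_and_peek(struct demo_page_array *a, long off)
{
	assert(off >= a->last);	/* forward scans only */
	a->last = off;
	if (a->done)
		return -1;	/* known empty: skip the tree search */
	a->obj->lookups++;
	for (size_t i = 0; i < a->obj->npages; i++) {
		if (a->obj->offsets[i] >= off)
			return a->obj->offsets[i];
	}
	/* Same obj and flags for the rest of the scan: nothing later either. */
	a->done = 1;
	return -1;
}
```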
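
Sketch for 44ed7d42f1 and e59d6517f7. The distinction being fixed is the POSIX one for changing file times:

- Setting the times to "now" (a null times pointer, or UTIME_NOW in both fields) is allowed with mere write permission, and fails with EACCES otherwise.
- Setting explicit times requires ownership or privilege, and fails with EPERM otherwise.

demo_chtimes_check is a hypothetical helper that encodes those rules. It is not NetBSD's actual permission code, and it ignores the both-UTIME_OMIT case, which needs no check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * POSIX rules for changing file timestamps.  "set_to_now" means the
 * caller asked for the current time rather than explicit values.
 */
static int
demo_chtimes_check(bool is_owner, bool is_privileged, bool can_write,
    bool set_to_now)
{
	if (is_owner || is_privileged)
		return 0;
	if (set_to_now)
		/* Write permission is enough to "touch" a file to now. */
		return can_write ? 0 : EACCES;
	/* Arbitrary timestamps need ownership or privilege. */
	return EPERM;
}
```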
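
Sketch for a29147fa13 in user space with C11 atomics; the kernel would use its own primitives, such as those in atomic_ops(3). The counter is incremented only on the path that actually creates a process and decremented in the freeing path. Every access, including plain reads, goes through an atomic operation. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical process count; static storage starts at a valid zero. */
static atomic_uint demo_nprocs;

/* Called only when a new process is created, not on every PID allocation. */
static void
demo_proc_created(void)
{
	atomic_fetch_add_explicit(&demo_nprocs, 1, memory_order_relaxed);
}

/* Called from the process-freeing path, after the PID is released. */
static void
demo_proc_free(void)
{
	unsigned old = atomic_fetch_sub_explicit(&demo_nprocs, 1,
	    memory_order_relaxed);
	assert(old > 0);	/* underflow would mean a mismatched decrement */
	(void)old;
}

/* Reads are atomic too, so no caller sees a torn or cached value. */
static unsigned
demo_nprocs_read(void)
{
	return atomic_load_explicit(&demo_nprocs, memory_order_relaxed);
}
```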
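
Sketch for 5dec3f0781. The reported partition block size is trusted only inside the stated bounds. DEV_BSIZE = 512 and MAXBSIZE = 64 KiB are assumed values. The message does not say which fallback is used when pi_bsize is out of range, so the caller passes it in. The bounds are strict, as the message words them.

```c
#include <assert.h>

/* Assumed values for this sketch. */
#define DEMO_DEV_BSIZE	512
#define DEMO_MAXBSIZE	(64 * 1024)

/*
 * Use the partition's block size only if it is bigger than DEV_BSIZE and
 * less than MAXBSIZE; otherwise keep the caller-supplied fallback, so the
 * buffer cache is never asked for a buffer it cannot allocate.
 */
static unsigned
demo_blk_iosize(unsigned pi_bsize, unsigned fallback)
{
	if (pi_bsize > DEMO_DEV_BSIZE && pi_bsize < DEMO_MAXBSIZE)
		return pi_bsize;
	return fallback;
}
```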
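
Sketch for 23bf88000c in user space. All reads of the use count go through one accessor that performs a relaxed atomic load, as the message describes. demo_vnode, demo_vref and demo_vrefcnt are hypothetical. Relaxed ordering on the increment is a choice made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space model of a vnode's use count. */
struct demo_vnode {
	atomic_uint v_usecount;
};

/* Take a reference (relaxed ordering is this sketch's choice). */
static void
demo_vref(struct demo_vnode *vp)
{
	atomic_fetch_add_explicit(&vp->v_usecount, 1, memory_order_relaxed);
}

/*
 * Read the use count through one accessor instead of touching the field
 * directly.  A relaxed load suffices: the result is only a snapshot and
 * may be stale as soon as it is returned.
 */
static unsigned
demo_vrefcnt(struct demo_vnode *vp)
{
	return atomic_load_explicit(&vp->v_usecount, memory_order_relaxed);
}
```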
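
Sketch for 5972ba1600 (and the PQ_WANTED follow-up 1912643ff9), using POSIX threads. A per-page mutex protects the busy and wanted flags. A thread can therefore wait for a busy page while holding nothing but that interlock, and the owner issues a wakeup only when someone has set wanted. The object's RW lock, which the real code must also manage, is left out, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical page model; the flags echo PG_BUSY and PQ_WANTED. */
struct demo_page {
	pthread_mutex_t interlock;	/* protects busy and wanted */
	pthread_cond_t cv;
	bool busy;
	bool wanted;
};

static void
demo_page_init(struct demo_page *pg)
{
	pthread_mutex_init(&pg->interlock, NULL);
	pthread_cond_init(&pg->cv, NULL);
	pg->busy = false;
	pg->wanted = false;
}

/* Wait until the page is no longer busy, holding only the interlock. */
static void
demo_page_wait_unbusy(struct demo_page *pg)
{
	pthread_mutex_lock(&pg->interlock);
	while (pg->busy) {
		pg->wanted = true;	/* ask the owner for a wakeup */
		pthread_cond_wait(&pg->cv, &pg->interlock);
	}
	pthread_mutex_unlock(&pg->interlock);
}

/* Owner side: clear busy, and wake waiters only if one asked. */
static void
demo_page_unbusy(struct demo_page *pg)
{
	pthread_mutex_lock(&pg->interlock);
	pg->busy = false;
	if (pg->wanted) {
		pg->wanted = false;
		pthread_cond_broadcast(&pg->cv);
	}
	pthread_mutex_unlock(&pg->interlock);
}
```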