Commit Graph

104 Commits

Author SHA1 Message Date
hannken ca4932dc6d Print dangling vnode before panic() to help debug.
PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
2024-01-17 10:17:29 +00:00
hannken a5288eef9a Include "veriexec.h" and <sys/verified_exec.h> to run
veriexec_unmountchk() when NVERIEXEC > 0.
2023-12-28 12:48:08 +00:00
riastradh 7dd29855d3 kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.
I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86.  (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them.  If the procedure call
overhead is measurable, we could just change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
2023-02-24 11:02:27 +00:00
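
A hedged sketch of the macro idea mentioned above, assuming an x86-style TSO
machine and NetBSD's __insn_barrier(); the conditional and placement are
illustrative, not the tree's actual headers:

	/*
	 * Illustrative only: on a TSO machine the hardware already provides
	 * release/acquire ordering for ordinary loads and stores, so these
	 * barriers only need to stop the compiler from reordering.
	 */
	#if defined(__x86_64__) || defined(__i386__)
	#define	membar_release()	__insn_barrier()
	#define	membar_acquire()	__insn_barrier()
	#endif
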
hannken 85cb97f0d7 Harden layered file systems' usage of the "mnt_lower" field against
forced unmounts of the lower layer.

- Don't allow "dead_rootmount" as a lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com
2022-12-09 10:33:18 +00:00
hannken 3652e3db73 If built with DEBUG, limit the depth of the file system stack so kernel sanitizers
may stress mount/unmount without exhausting the kernel stack.
2022-11-10 10:55:00 +00:00
hannken cfe3f2c399 Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
2022-11-04 11:20:39 +00:00
riastradh c0216ccf58 sys/filedesc.h: New home for extern cwdi0. 2022-10-26 23:39:10 +00:00
riastradh be5e920ca4 vflush(9): Insert `involuntary' preemption point at each vnode.
Currently there is a voluntary yield every 100ms, but that's a long
time.  Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.
2022-09-13 09:35:31 +00:00
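
A minimal sketch of the pattern described above, assuming NetBSD's
preempt_point() helper and a hypothetical per-vnode flush function; this is
not the actual vflush() code:

	/*
	 * Give the scheduler a chance to preempt us after each vnode,
	 * instead of only yielding on a coarse 100ms timer.
	 */
	void
	flush_vnodes_sketch(struct vnode **vnodes, size_t nvnodes)
	{
		for (size_t i = 0; i < nvnodes; i++) {
			flush_one(vnodes[i]);	/* hypothetical per-vnode flush */
			preempt_point();	/* yield here if preemption is pending */
		}
	}
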
hannken 5c869fd70e Two defects in vfs_getnewfsid():
- Parallel mounts may get the same fsid.  Always increment "xxxfs_mntid"
  to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
  holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
  wait for a suspended file system while the thread suspending needs
  this vnode lock.
2022-08-26 11:03:53 +00:00
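
A rough sketch of the first fix, with hypothetical names (the real code also
mixes other state into the fsid): bump the counter on every call so parallel
mounts start from different candidates, making a duplicate fsid unlikely
rather than impossible:

	static volatile unsigned int next_mntid;	/* stand-in for "xxxfs_mntid" */

	unsigned int
	getnewfsid_sketch(void)
	{

		/* Every caller gets a fresh counter value, even when racing. */
		return atomic_inc_uint_nv(&next_mntid);
	}
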
hannken 10a8222c49 Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.
2022-08-22 09:14:24 +00:00
hannken ba6f7f8d2a Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.
2022-07-08 07:43:19 +00:00
riastradh ef3476fb57 sys: Use membar_release/acquire around reference drop.
This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
2022-04-09 23:38:31 +00:00
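
The reference-drop idiom this rename targets, as a sketch with a made-up
object type: membar_release before the decrement orders our prior uses of the
object before the drop, and membar_acquire on the last-reference path orders
the free after everyone else's uses:

	struct obj {
		unsigned int	o_refcnt;
		/* ... payload ... */
	};

	void
	obj_rele(struct obj *o)			/* illustrative, not a real kernel API */
	{

		membar_release();
		if (atomic_dec_uint_nv(&o->o_refcnt) != 0)
			return;
		membar_acquire();
		obj_free(o);			/* hypothetical destructor */
	}
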
riastradh a2155d69ea specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.
vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.
2022-03-28 12:37:46 +00:00
riastradh 62e877acb3 vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.
Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.
2022-03-24 12:59:56 +00:00
hannken ce218897d7 Lock vnode across VOP_OPEN. 2022-03-19 13:50:02 +00:00
andvar db54414c68 s/paniced/panicked/ and s/borken/broken/ in comments. 2022-03-16 20:31:01 +00:00
riastradh 122a3e8a60 sys: Membar audit around reference count releases.
If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

	  Thread A			  Thread B
	obj->foo = 42;			obj->baz = 73;
	mumble(&obj->bar);		grumble(&obj->quux);
	/* membar_exit(); */		/* membar_exit(); */
	atomic_dec -- not last		atomic_dec -- last
					/* membar_enter(); */
					KASSERT(invariant(obj->foo,
					    obj->bar));
					free_stuff(obj);

The memory barriers ensure that

	obj->foo = 42;
	mumble(&obj->bar);

in thread A happens before

	KASSERT(invariant(obj->foo, obj->bar));
	free_stuff(obj);

in thread B.  Without them, this ordering is not guaranteed.

So in general it is necessary to do

	membar_exit();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting.  (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these.  Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that.  It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
2022-03-12 15:32:30 +00:00
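
For reference, the conditionalized form of the same idiom looks roughly like
this; a sketch extending the snippet in the message above, not a quote from
the tree:

	#ifndef __HAVE_ATOMIC_AS_MEMBAR
		membar_exit();
	#endif
		if (atomic_dec_uint_nv(&obj->refcnt) != 0)
			return;
	#ifndef __HAVE_ATOMIC_AS_MEMBAR
		membar_enter();
	#endif
		free_stuff(obj);
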
hannken 1eaee3c4bf Stop clearing "v_mountedhere" in mount_domount() error path.
We did not set it and may clear the value from another mount.
2022-02-04 15:33:57 +00:00
hannken dc06ff169e Reorganize uvm_swap_shutdown() a bit: make sure the vnode gets
locked and referenced across the call to swap_off(), and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
2021-02-16 09:56:32 +00:00
hannken 0b5a6352dc We have to ignore interrupts when suspending here, the same way
we do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com
2020-11-19 10:47:47 +00:00
hannken d8a571076b Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com
2020-10-13 13:15:39 +00:00
ad 0eaaa024ea Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
2020-05-23 23:42:41 +00:00
hannken a6d5e17025 Undo Rev. 1.79; it breaks root-on-raid, where it destroys the component
disks before the raid:

  forcefully unmounting / (/dev/raid0a)...
  sd1: detached
  sd0: detached
  raid0: cache flush to component /dev/sd0a failed.
  raid0: cache flush to component /dev/sd1a failed.
  fatal page fault in supervisor mode
  Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown
2020-05-01 08:45:01 +00:00
ad e88c11f417 Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible.  cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.
2020-04-21 21:42:47 +00:00
ad 4e73754f15 Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY.  Removes the dependency on bufhash.
2020-04-20 21:39:05 +00:00
hannken 3c531f4f39 Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown
2020-04-19 13:26:17 +00:00
ad 23bf88000c Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed().  Signature matches
FreeBSD.
2020-04-13 19:23:17 +00:00
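
Roughly the shape one would expect from such a helper, hedged since the exact
body is not quoted here:

	int
	vrefcnt(struct vnode *vp)
	{

		/* Relaxed snapshot of the count; callers stop touching v_usecount. */
		return atomic_load_relaxed(&vp->v_usecount);
	}
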
maxv 983fd9ccfe hardclock_ticks -> getticks() 2020-04-13 15:54:45 +00:00
ad b5adab0e05 vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.
2020-04-10 22:34:36 +00:00
ad 926b25e154 Merge from ad-namecache:
- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
  more cache-conscious way.  With that done, go back to adjusting v_usecount
  with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
  Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.
2020-02-23 22:14:03 +00:00
ad c2e9cb9413 VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them an "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode.  Matches
FreeBSD.
2020-01-17 20:08:06 +00:00
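
An illustrative call with the new argument, asking for a shared lock on the
returned vnode (LK_NONE would request no lock); the surrounding code is
assumed:

	error = VFS_ROOT(mp, LK_SHARED, &vp);
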
ad 7d06f3305f Make mntvnode_lock per-mount, and address false sharing of struct mount. 2019-12-22 19:47:34 +00:00
maxv 1bb344ad1f NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
2019-11-16 10:07:53 +00:00
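
A hypothetical illustration of the pattern being fixed: taking the address of
a member through a NULL pointer is undefined behaviour (which is what kUBSan
reports), so the check moves to the structure pointer itself and a KASSERT
documents the expectation:

	struct widget {
		int	w_first;
		/* ... */
	};

	void
	widget_use(struct widget *w)	/* hypothetical example */
	{

		KASSERT(w != NULL);
		if (w == NULL)		/* was: if (&w->w_first == NULL) */
			return;
		/* ... use w ... */
	}
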
christos 5d96c08a38 If we could not start extattr for some reason, don't advertise extattr in the
mount.
2019-08-19 09:32:42 +00:00
hannken 583a153e11 Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount.  This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Addresses PR kern/53928: modules/t_builtin:disable test case randomly fails.
2019-02-20 10:08:37 +00:00
hannken f421b3668b Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.
2019-02-20 10:07:27 +00:00
hannken 88bcb4546a Allow dounmount() with file system already suspended.
Remove the no longer valid test for layered mounts;
ZFS will unmount snapshots bottom up.
2019-02-05 09:49:44 +00:00
hannken 28650af9eb Change forced unmount to revert open device vnodes to anonymous devices. 2017-08-21 09:00:21 +00:00
hannken 287643b0da Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as the lock type, so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.
2017-06-04 08:05:41 +00:00
chs fd34ea77eb remove checks for failure after memory allocation calls that cannot fail:
  kmem_alloc() with KM_SLEEP
  kmem_zalloc() with KM_SLEEP
  percpu_alloc()
  pserialize_create()
  psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
2017-06-01 02:45:05 +00:00
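
For example, a caller-side NULL check after such an allocation is dead code,
because KM_SLEEP sleeps until memory is available; a sketch with an arbitrary
type:

	struct foo {
		int	f_x;
	};

	struct foo *
	foo_create(void)		/* illustrative only */
	{
		struct foo *f;

		f = kmem_zalloc(sizeof(*f), KM_SLEEP);
		/* No "if (f == NULL)" check needed: KM_SLEEP cannot fail. */
		return f;
	}
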
hannken 69174779b1 With dounmount() working on a suspended file system, remove the no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73
2017-05-24 09:53:55 +00:00
hannken c2c49e1ed2 Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.
2017-05-24 09:52:59 +00:00
hannken 677cf1d8b4 Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.
2017-05-17 12:45:03 +00:00
hannken 4f4cfe27b2 Enter fstrans from _vfs_busy() and leave from vfs_unbusy().
Adapt sched_sync() and do_sys_sync().
2017-05-07 08:26:58 +00:00
hannken c18a56f135 Move fstrans initialization to vfs_mountalloc(). 2017-05-07 08:24:20 +00:00
hannken 853d034c97 Remove now invalid comment. 2017-05-07 08:21:08 +00:00
hannken bd152b56b5 Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer. 2017-04-17 08:34:27 +00:00
hannken eb8533a8b6 No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?
2017-04-17 08:32:55 +00:00
hannken 20bb034f5b Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
2017-04-17 08:32:00 +00:00
hannken ebb8f73b4b Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount.  Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
2017-04-17 08:31:01 +00:00
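
Illustrative call-site change implied by the last point, shown as a fragment:

	/* Before: open-coded reference bump on the mount. */
	mp->mnt_refcnt++;		/* or its atomic equivalent */

	/* After: take and drop references through the helpers. */
	vfs_ref(mp);
	/* ... use mp ... */
	vfs_rele(mp);			/* formerly vfs_destroy(mp) */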