
1446 Commits

Author SHA1 Message Date
riastradh 86535c9941 specfs: KNF. No functional change intended. 2023-04-22 15:32:49 +00:00
hannken 149aa6be82 Remove unused specdev member sd_rdev.
Ride 10.99.4
2023-04-22 14:30:16 +00:00
riastradh ab579ad815 genfs: KASSERT(A && B) -> KASSERT(A); KASSERT(B) 2023-04-09 12:26:36 +00:00
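A userspace sketch of why the split helps. KASSERT(9) reports the stringified expression when it fails, so with one assertion per condition the message names exactly which condition was false. The `KASSERT` macro and `kassert_fail()` below are hypothetical stand-ins that record the message instead of panicking.

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the kernel's assertion failure
 * path: record the failed expression instead of panicking.
 */
static char last_failure[128];

static void
kassert_fail(const char *expr)
{
	snprintf(last_failure, sizeof(last_failure), "%s", expr);
}

/* Like KASSERT(9), report the stringified expression on failure. */
#define KASSERT(e)	((e) ? (void)0 : kassert_fail(#e))

/* Before: one combined assertion. */
static void
check_combined(int a, int b)
{
	KASSERT(a && b);	/* failure reports "a && b" */
}

/* After: one assertion per condition. */
static void
check_split(int a, int b)
{
	KASSERT(a);		/* failure reports "a" */
	KASSERT(b);		/* failure reports "b" */
}
```

With `a = 1, b = 0`, the combined form only reports `a && b`, while the split form reports `b`.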
hannken f8079ac547 Fix genfs_can_chtimes() to also handle the condition:
If the time pointer is null, then write permission
  on the file is also sufficient.

From FreeBSD.

Should fix PR kern/57246 "NFS group permissions regression"
2023-03-03 10:02:51 +00:00
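The permission rule quoted above can be modeled as a small decision function. This is a hypothetical model and does not use the real `genfs_can_chtimes()` signature. It assumes privileged (superuser) callers are handled elsewhere. The error codes follow the POSIX utimes()/utimensat() descriptions: EPERM for explicit times set by a non-owner, and EACCES for "set to now" without write access.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model, not the real genfs_can_chtimes() interface.
 * Superuser privilege is assumed to be checked elsewhere.
 */
static int
can_chtimes(bool is_owner, bool times_null, bool can_write)
{
	if (is_owner)
		return 0;	/* the owner may always set the times */
	if (!times_null)
		return EPERM;	/* explicit times: owner only */
	/* times == NULL means "now": write permission also suffices. */
	return can_write ? 0 : EACCES;
}
```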
hannken 6017299f5f Set IMNT_MPSAFE only if the lower layer has it set. 2023-02-06 10:32:58 +00:00
hannken 85cb97f0d7 Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Don't allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com
2022-12-09 10:33:18 +00:00
hannken cfe3f2c399 Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105
2022-11-04 11:20:39 +00:00
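A minimal sketch of a "set or clear lower mount" helper that always holds a reference on the current lower mount. `struct mount_sketch`, `mount_ref()`, `mount_rele()` and `set_lower_mount()` are hypothetical names for illustration, not the kernel's actual helper or `struct mount` fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down mount: only what the sketch needs. */
struct mount_sketch {
	unsigned		refcnt;
	struct mount_sketch	*lower;
};

static void
mount_ref(struct mount_sketch *mp)
{
	mp->refcnt++;
}

static void
mount_rele(struct mount_sketch *mp)
{
	assert(mp->refcnt > 0);
	mp->refcnt--;
}

/*
 * Hypothetical helper: set (lower != NULL) or clear (lower == NULL)
 * the lower mount, always holding a reference on the current one.
 */
static void
set_lower_mount(struct mount_sketch *upper, struct mount_sketch *lower)
{
	struct mount_sketch *old = upper->lower;

	if (lower != NULL)
		mount_ref(lower);	/* take the new reference first... */
	upper->lower = lower;
	if (old != NULL)
		mount_rele(old);	/* ...then drop the old one */
}
```

Taking the new reference before dropping the old one keeps the count from briefly reaching zero when the same lower mount is set again.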
riastradh f29311a918 miscfs/fifofs/fifo.h: New home for extern fifo_vnodeop_opv_desc.
Add include guard and fix missing includes while here too.
2022-10-26 23:40:20 +00:00
riastradh fa76fa97ef miscfs/specfs/specdev.h: New home for extern spec_vnodeop_opv_desc.
Also use it for extern spec_vnodeop_p, which is already there.
2022-10-26 23:40:08 +00:00
riastradh 26784725ee miscfs/deadfs/deadfs.h: New home for deadfs-related externs.
XXX regen sys/kern/vnode_if.c and the others
2022-10-26 23:39:43 +00:00
riastradh d86da41736 specfs(9): Attribute blame by stack trace for write to r/o medium. 2022-10-15 15:20:46 +00:00
riastradh ae1d5f78a6 specfs(9): XXX comment: what if read downgrades lock? 2022-09-21 10:59:10 +00:00
riastradh 2ab9954344 specfs: Refuse to open a closing-in-progress block device.
We could wait for close to complete, but if this happened ever so
slightly earlier it would lead to EBUSY anyway, so there's no point
in adding logic for that -- either way the caller neglected to wait
for the last close to finish before trying to open the device again.

https://mail-index.netbsd.org/current-users/2022/08/09/msg042800.html

Reported-by: syzbot+4388f20706ec8a4c8db0@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=47c67ab6d3a87514d0707882a9ad6671beaa8642

Reported-by: syzbot+0f1756652dce4cb341ed@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=a632ce762d64241fc82a9bc57230b7b7c7095d1a
2022-08-12 21:25:39 +00:00
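A single-threaded sketch of the open-side check. It assumes the refusal uses the same EBUSY that an already-open block device gets. The state is loosely modeled on `sd_opencnt` and `sd_closing`, but `struct blkdev_sketch` and the `blk_*` functions are hypothetical. The last close is split into two halves to stand in for the window where `d_close` runs after the decision to close has been made.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state, loosely modeled on sd_opencnt/sd_closing. */
struct blkdev_sketch {
	unsigned	opencnt;	/* 0 or 1 for block devices */
	bool		closing;	/* last close still running d_close */
};

static int
blk_open(struct blkdev_sketch *sd)
{
	if (sd->closing)
		return EBUSY;	/* refuse rather than wait for the close */
	if (sd->opencnt != 0)
		return EBUSY;	/* block devices are opened at most once */
	sd->opencnt = 1;
	return 0;
}

/* First half of the last close: commit to closing. */
static void
blk_close_begin(struct blkdev_sketch *sd)
{
	assert(sd->opencnt == 1 && !sd->closing);
	sd->opencnt = 0;
	sd->closing = true;
}

/* Second half: d_close has returned. */
static void
blk_close_end(struct blkdev_sketch *sd)
{
	assert(sd->closing);
	sd->closing = false;
}
```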
riastradh a5fdd92957 specfs: Assert !closing on successful open.
- If there's a prior concurrent close, it must have interrupted this
  open.

- If there's a new concurrent close, it must wait until this open has
  released device_lock before it can revoke.
2022-08-12 17:06:01 +00:00
riastradh 938c8ecc70 specfs: Assert opencnt>0 on successful open. 2022-08-12 17:05:49 +00:00
riastradh 25af48d0b5 specfs: Sprinkle opencnt/opened/closing assertions.
There seems to be a bug here but I'm not sure what it is yet:

https://mail-index.netbsd.org/current-users/2022/08/09/msg042800.html
https://syzkaller.appspot.com/bug?id=47c67ab6d3a87514d0707882a9ad6671beaa8642

The decision to actually invoke d_close is serialized under
device_lock, so it should not be possible for more than one process
to close at the same time, but syzbot and kre found a way for
sd_closing to be false later in spec_close.  Let's make sure it's
false when we're making what should be the exclusive decision to
close.

We can't assert !sd_opened before cancel and spec_io_drain, because
those are necessary to interrupt and wait for pending opens that
might later set sd_opened, but we can assert !sd_opened afterward
because once sd_closing is true nothing should set sd_opened.
2022-08-11 12:52:24 +00:00
thorpej 75d451f371 Make kqueue event status for vnodes shareable, and for stacked file systems
like nullfs, make the upper vnode share that status with the lower vnode.

And, lo, NetBSD 9.99.99.

Fixes PR kern/56713.
2022-07-18 04:30:30 +00:00
hannken 4c6398a93d Make dead vfs ops "vfs_statvfs" and "vfs_vptofh" return EOPNOTSUPP.
Both operations may originate from (possibly dead) vnodes.

Reported-by: syzbot+eceb203d44457742be3b@syzkaller.appspotmail.com
2022-07-08 07:44:17 +00:00
hannken cc261bb6d7 Don't use LK_RETRY as we need an active vnode here. 2022-07-08 07:43:48 +00:00
hannken 4ad9c3956a Handle IMNT_GONE on the file system we want suspended, not on its
lowest mount, which is the one we actually suspend.
2022-07-08 07:42:05 +00:00
shm 49bf48f547 Add missing permission check 2022-06-17 14:30:37 +00:00
andvar 75d2abaeb1 fix various typos in comments and output/log messages. 2022-04-10 09:50:44 +00:00
riastradh c8ea12a8d3 driver(9): New devsw d_cancel op to interrupt I/O before close.
If specified, when revoking a device node or closing its last open
node, specfs will:

1. Call d_cancel, which should return promptly without blocking.
2. Wait for all concurrent d_read/write/ioctl/&c. to drain.
3. Call d_close.

Otherwise, specfs will:

1. Call d_close.
2. Wait for all concurrent d_read/write/ioctl/&c. to drain.

This fallback is problematic because often parts of d_close rely on
concurrent devsw operations to have completed already, so it is up to
each driver to have its own mechanism for waiting, and the extra step
in (2) is almost redundant.  But it is still important to ensure that
devsw operations are not active by the time a module tries to invoke
devsw_detach, because only d_open is protected against that.

The signature of d_cancel matches d_close, mostly so we don't raise
questions about `why is this different?'; the lwp argument is not
useful but we should remove it from open/cancel/close all at the same
time.

The only way d_cancel should fail, if it does at all, is with ENODEV,
meaning the driver doesn't support cancelling outstanding I/O, and
will take responsibility for that in d_close.  I would make it return
void and only have bdev_cancel and cdev_cancel possibly return ENODEV
so specfs can detect whether a driver supports it, but this would
break the pattern around devsw operation types.

Drivers are allowed to omit it from struct bdevsw, struct cdevsw --
if so, it is as if they used a function that just returns ENODEV.

XXX kernel ABI change to struct bdevsw/cdevsw requires bump
2022-03-28 12:39:10 +00:00
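The two sequences described above can be sketched in userspace by recording the order of steps. The devsw here is deliberately simplified and hypothetical. The real `d_cancel`/`d_close` take `(dev_t, int, int, struct lwp *)`, and draining is done by specfs rather than by a function like `io_drain()`. As the message says, a missing `d_cancel` is treated as if it returned ENODEV.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down devsw: only the ops relevant here. */
struct devsw_sketch {
	int	(*d_cancel)(void);	/* may be NULL: acts like ENODEV */
	int	(*d_close)(void);
};

static char trace[64];			/* records the order of steps */

static void
step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-in for waiting until concurrent d_read/d_write/&c. drain. */
static void
io_drain(void)
{
	step("drain;");
}

static int
cancel_op(const struct devsw_sketch *d)
{
	return d->d_cancel != NULL ? d->d_cancel() : ENODEV;
}

/* Revoke or last close, following the two sequences above. */
static int
last_close(const struct devsw_sketch *d)
{
	int error;

	if (cancel_op(d) != ENODEV) {
		io_drain();		/* cancelled I/O should drain promptly */
		return d->d_close();
	}
	error = d->d_close();		/* fallback: close first, then drain */
	io_drain();
	return error;
}

/* Hypothetical driver ops that just record that they ran. */
static int drv_cancel(void) { step("cancel;"); return 0; }
static int drv_close(void) { step("close;"); return 0; }
```

A driver with `d_cancel` yields `cancel;drain;close;`. A driver without it yields `close;drain;`.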
riastradh 24d512b12a specfs: Reorder struct specnode members to save padding.
Shrinks from 40 bytes to 32 bytes on LP64 systems this way.
2022-03-28 12:38:04 +00:00
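An illustration of the padding effect. The members below are hypothetical and are not the real `struct specnode`. The sizes assume LP64 (8-byte, 8-aligned pointers and 4-byte ints). Interleaving ints with pointers leaves a 4-byte hole after each int. Grouping the ints lets them share one 8-byte slot, which happens to give the same 40-to-32 change.

```c
/*
 * Hypothetical members, not the real struct specnode.  Sizes assume
 * LP64 (8-byte, 8-aligned pointers; 4-byte ints).
 */
struct interleaved {
	void	*p1;
	int	 i1;	/* followed by 4 bytes of padding */
	void	*p2;
	int	 i2;	/* followed by 4 more bytes of padding */
	void	*p3;
};			/* 40 bytes on LP64 */

struct grouped {
	void	*p1;
	void	*p2;
	void	*p3;
	int	 i1;	/* the two ints share one 8-byte slot */
	int	 i2;
};			/* 32 bytes on LP64 */
```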
riastradh 51a3f758d3 specfs: Remove specnode from hash table in spec_node_revoke.
Previously, it was possible for spec_node_lookup_by_dev to handle a
specnode that a concurrent spec_node_destroy is about to remove from
the hash table and then free, as soon as spec_node_lookup_by_dev
releases device_lock.

Now, the ordering is:

1. Remove specnode from hash table in spec_node_revoke.  At this
   point, no _new_ vnode references are possible (other than possibly
   one acquired by vcache_vget under v_interlock), but there may be
   existing ones.

2. Mark vnode reclaimed so vcache_vget will fail.

3. The last vrele (or equivalent logic in vcache_vget) will then free
   the specnode in spec_node_destroy.

This way, _if_ a thread in spec_node_lookup_by_dev finds a specnode
in the hash table under device_lock/v_interlock, _then_ it will not
be freed until the thread completes vcache_vget.

This change requires calling spec_node_revoke unconditionally for
device special nodes, not just for active ones.  Might introduce
slightly more contention on device_lock but not much because we
already have to take it in this path anyway a little later in
spec_node_destroy.
2022-03-28 12:37:56 +00:00
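A single-threaded model of the three-step ordering. It shows that once a lookup has taken a reference, the node is not freed until that reference is released, even after revoke. A one-entry table stands in for the hash table, and a `reclaimed` flag stands in for `vcache_vget` failing. All names are hypothetical, and the locking (device_lock, v_interlock) is not modeled.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a specnode plus its vnode's state. */
struct node_sketch {
	unsigned	refcnt;
	bool		reclaimed;
};

static struct node_sketch *table_slot;	/* one-entry "hash table" */
static unsigned nfreed;

/* Like a lookup under device_lock followed by vcache_vget. */
static struct node_sketch *
node_lookup(void)
{
	struct node_sketch *n = table_slot;

	if (n == NULL || n->reclaimed)
		return NULL;
	n->refcnt++;
	return n;
}

/* Caller holds a reference, as revoke operates on a live vnode. */
static void
node_revoke(struct node_sketch *n)
{
	table_slot = NULL;	/* 1. no new lookups can find it */
	n->reclaimed = true;	/* 2. vget on a stale pointer fails */
}

static void
node_release(struct node_sketch *n)
{
	/* 3. the last release of a reclaimed node frees it */
	if (--n->refcnt == 0 && n->reclaimed) {
		free(n);
		nfreed++;
	}
}
```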
riastradh a2155d69ea specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.
vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.
2022-03-28 12:37:46 +00:00
riastradh b89ee9efb4 specfs: Assert opencnt is nonzero before decrementing. 2022-03-28 12:37:35 +00:00
riastradh bd75be3e3d specfs: Take an I/O reference across bdev/cdev_open.
- Revoke is used to invalidate all prior access control checks when
  device permissions are changing, so it must wait for .d_open to exit
  so any new access must go through new access control checks.

- Revoke is used by vdevgone in xyz_detach to wait until all use of
  the driver's data structures have completed before xyz_detach frees
  them.

So we need to make sure spec_close waits for .d_open too.
2022-03-28 12:37:26 +00:00
riastradh a6a4e9ecd6 specfs: Wait for last close in spec_node_revoke.
Otherwise, revoke -- and vdevgone, in the detach path of removable
devices -- may complete while I/O operations are still running
concurrently.
2022-03-28 12:37:18 +00:00
riastradh daba8dd87b specfs: Prevent new opens while close is waiting to drain.
Otherwise, bdev/cdev_close could have cancelled all _existing_ opens,
and waited for them to complete (and freed resources used by them) --
but a new one could start, and hang (e.g., a tty), at the same time
spec_close tries to drain all pending I/O operations, one of which
(the new open) is now hanging indefinitely.

Preventing the new open from even starting until bdev/cdev_close is
finished and all I/O operations have drained avoids this deadlock.
2022-03-28 12:37:09 +00:00
riastradh c7aa557804 specfs: Take an I/O reference in spec_node_setmountedfs.
This is not quite correct.  We _should_ require the caller to hold a
vnode lock around spec_node_getmountedfs, and an exclusive vnode lock
around spec_node_setmountedfs, so that it is only necessary to check
whether revoke has already happened, not hold an I/O reference.

Unfortunately, various callers in various file systems don't follow
this sensible rule.  So let's at least make sure the vnode can't be
revoked in spec_node_setmountedfs, while we're in bdev_ioctl, and
leave a comment explaining what the sorry state of affairs is and how
to fix it later.
2022-03-28 12:37:01 +00:00
riastradh 66ae10f9a4 specfs: Drain all I/O operations after last .d_close call.
New kind of I/O reference on specdevs, sd_iocnt.  This could be done
with psref instead; I chose a reference count instead for now because
we already have to take a per-object lock anyway, v_interlock, for
vdead_check, so another atomic is not likely to hurt much more.  We
can always change the mechanism inside spec_io_enter/exit/drain later
on.

Make sure every access to vp->v_rdev or vp->v_specnode and every call
to a devsw operation is protected either:

- by the vnode lock (with vdead_check if we unlocked/relocked),
- by positive sd_opencnt,
- by spec_io_enter/exit, or
- by sd_opencnt management in open/close.
2022-03-28 12:36:51 +00:00
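A userspace sketch of an enter/exit/drain I/O reference count built on a mutex and condition variable. It is in the spirit of `sd_iocnt`, but it is not the kernel's `spec_io_enter/exit/drain`. In this sketch, a hypothetical `gone` flag set by drain stands in for the revoked-vnode check (`vdead_check`). Returning ENXIO from enter is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct iodev_sketch {
	pthread_mutex_t	lock;	/* stand-in for v_interlock */
	pthread_cond_t	cv;	/* signalled when iocnt drops to zero */
	unsigned	iocnt;	/* in-flight devsw operations */
	bool		gone;	/* stand-in for "vnode revoked" */
};

/* Take an I/O reference before calling into the driver. */
static int
io_ref_enter(struct iodev_sketch *d)
{
	int error = 0;

	pthread_mutex_lock(&d->lock);
	if (d->gone)
		error = ENXIO;
	else
		d->iocnt++;
	pthread_mutex_unlock(&d->lock);
	return error;
}

/* Drop it afterward; wake a drainer when the last one leaves. */
static void
io_ref_exit(struct iodev_sketch *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->iocnt > 0);
	if (--d->iocnt == 0)
		pthread_cond_broadcast(&d->cv);
	pthread_mutex_unlock(&d->lock);
}

/* Refuse new entries, then wait for in-flight operations to finish. */
static void
io_ref_drain(struct iodev_sketch *d)
{
	pthread_mutex_lock(&d->lock);
	d->gone = true;
	while (d->iocnt > 0)
		pthread_cond_wait(&d->cv, &d->lock);
	pthread_mutex_unlock(&d->lock);
}
```

Each devsw call would be bracketed by `io_ref_enter()` and `io_ref_exit()`. On older systems, link with `-pthread`.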
riastradh 71a1e06a17 specfs: Resolve a race between close and a failing reopen. 2022-03-28 12:36:42 +00:00
riastradh eb8abdf33b specfs: Document sn_opencnt, sd_opencnt, sd_refcnt. 2022-03-28 12:36:34 +00:00
riastradh 820dfcc78f specfs: Paranoia: Assert opencnt is zero on reclaim. 2022-03-28 12:36:26 +00:00
riastradh aa0e9abd05 specfs: Omit needless vdead_check in spec_fdiscard.
The vnode lock is held, so the vnode cannot be revoked without also
changing v_op so subsequent uses under the vnode lock will go to
deadfs's VOP_FDISCARD instead (which is genfs_eopnotsupp).
2022-03-28 12:36:18 +00:00
riastradh eecc7d184f specfs: Add a comment and assertion to spec_close about refcnts. 2022-03-28 12:36:09 +00:00
riastradh 8068be2d26 specfs: If sd_opencnt is zero, sn_opencnt had better be zero. 2022-03-28 12:36:00 +00:00
riastradh 442b916b2c specfs: Factor KASSERT out of switch in spec_open.
No functional change.
2022-03-28 12:35:52 +00:00
riastradh 4241b4d4c6 specfs: sn_gone cannot be set while we hold the vnode lock.
Revoke runs with the vnode lock too, which is exclusive.  Add an
assertion to this effect in spec_node_revoke to make it clear.
2022-03-28 12:35:44 +00:00
riastradh ce78318abc specfs: Reorganize D_DISK tail of spec_open and explain what's up.
No functional change intended.
2022-03-28 12:35:35 +00:00
riastradh 39225fd515 specfs: Factor VOP_UNLOCK/vn_lock out of switch for clarity.
No functional change.
2022-03-28 12:35:26 +00:00
riastradh 1ba4ba4825 specfs: Factor common device_lock out of switch for clarity.
No functional change.
2022-03-28 12:35:17 +00:00
riastradh 39e7464227 specfs: Delete bogus comment about .d_open/.d_close at same time.
Annoying as it is that .d_open and .d_close can run at the same time,
it is also necessary for tty semantics, where open can block
indefinitely, and it is the responsibility of close (called via
revoke) to interrupt it.
2022-03-28 12:35:08 +00:00
riastradh c466e4b35b specfs: Split spec_open switch into three sections.
The sections are now:

1. Acquire open reference.

1a (intermezzo). Set VV_ISTTY.

2. Drop the vnode lock to call .d_open and autoload modules if
   necessary.

3. Handle concurrent revoke if it happened, or release open reference
   if .d_open failed.

No functional change.  Sprinkle comments about problems.
2022-03-28 12:34:59 +00:00
riastradh 3213d6e5bd specfs: Factor common kauth check out of switch in spec_open.
No functional change.
2022-03-28 12:34:51 +00:00
riastradh 8f24965bf7 specfs: Assert v_type is VBLK or VCHR in spec_open.
Nothing else makes sense.  Prune dead branches (and replace default
case by panic).
2022-03-28 12:34:42 +00:00
riastradh 7005cbf9c7 specfs: Call bdev_open without the vnode lock.
There is no need for it to serialize opens, because they are already
serialized by sd_opencnt which for block devices is always either 0
or 1.

There's not obviously any other reason why the vnode lock should be
held across bdev_open, other than that it might be nice to avoid
dropping it if not necessary.  For character devices we always have
to drop the vnode lock because open might hang indefinitely when
opening a tty, and hanging is not allowed while holding the vnode lock.
2022-03-28 12:34:34 +00:00
riastradh 0760adde4b specfs: Note lock order for vnode lock, device_lock, v_interlock. 2022-03-28 12:34:25 +00:00
riastradh 04c6cbac06 driver(9): Eliminate D_MCLOSE.
D_MCLOSE was introduced a few years ago by mistake for audio(4),
which should have used -- and now does use -- fd_clone to create
per-open state.  The semantics was originally to call close once
every time the device node is closed, not only for the last close.
Nothing uses it any more, and it complicates reasoning about the
system, so let's simplify it away.
2022-03-28 12:34:17 +00:00