Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:
- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
that use struct ifreq which have not been explicitly versioned.
If someone feels like fixing it with a list aproach, I think below is
a complete list - the one used in the previous version missed a lot of them.
BIOCGETIF
BIOCSETIF
GREDSOCK
GREGADDRD
GREGADDRS
GREGPROTO
GRESADDRD
GRESADDRS
GRESPROTO
GRESSOCK
SIOCADDMULTI
SIOCDELMULTI
SIOCDIFADDR
SIOCDIFADDR_IN6
SIOCDIFPHYADDR
SIOCGDEFIFACE_IN6
SIOCGIFADDR
SIOCGIFADDR_IN6
SIOCGIFAFLAG_IN6
SIOCGIFALIFETIME_IN6
SIOCGIFBRDADDR
SIOCGIFDLT
SIOCGIFDSTADDR
SIOCGIFDSTADDR_IN6
SIOCGIFFLAGS
SIOCGIFGENERIC
SIOCGIFMETRIC
SIOCGIFMTU
SIOCGIFNETMASK
SIOCGIFNETMASK_IN6
SIOCGIFPDSTADDR
SIOCGIFPDSTADDR_IN6
SIOCGIFPSRCADDR
SIOCGIFPSRCADDR_IN6
SIOCGIFSTAT_ICMP6
SIOCGIFSTAT_IN6
SIOCGPVCSIF
SIOCGVH
SIOCIFCREATE
SIOCIFDESTROY
SIOCSDEFIFACE_IN6
SIOCSIFADDR
SIOCSIFADDR_IN6
SIOCSIFALIFETIME_IN6
SIOCSIFBRDADDR
SIOCSIFDSTADDR
SIOCSIFDSTADDR_IN6
SIOCSIFFLAGS
SIOCSIFGENERIC
SIOCSIFMEDIA
SIOCSIFMETRIC
SIOCSIFMTU
SIOCSIFNETMASK
SIOCSIFNETMASK_IN6
SIOCSNDFLUSH_IN6
SIOCSPFXFLUSH_IN6
SIOCSPVCSIF
SIOCSRTRFLUSH_IN6
SIOCSVH
TAPGIFNAME
values of the SS_ONSTACK and SS_DISABLE constants.
Use it to shorten the source files when this action is replicated.
Actually, given the monstrous complexity of sigaltstack1() there is
probably a much better way to do this...
once the 'address' has been copied into an mbuf.
Add extra flags for 'struct msghdr.msg_flags' to indicate that the address
and control are already in mbufs, and that the uio structure is in userspace
for sending data, rename sendit() to do_sys_sendmsg() to ensure no old code
passes in random flags.
Changes to compat code to use new functions - removing some stackgap use.
Fix a 'use after free' in compat_43_sys_recvmsg.
I ***THINK*** the code that converts 'cmsg' formatted data is borked!
svr4_stream.c ought to be generated from svr4_32_stream.c during the build.