Commit Graph

289 Commits

Author SHA1 Message Date
thorpej 156895706e Overhaul the way LWP IDs are allocated. Instead of each LWP having it's
own LWP ID space, LWP IDs came from the same number space as PIDs.  The
lead LWP of a process gets the PID as its LID.  If a multi-LWP process's
lead LWP exits, the PID persists for the process.

In addition to providing system-wide unique thread IDs, this also lets us
eliminate the per-process LWP radix tree, and some associated locks.

Remove the separate "global thread ID" map added previously; it is no longer
needed to provide this functionality.

Nudged in this direction by ad@ and chs@.
2020-04-24 03:22:06 +00:00
thorpej a29147fa13 - Only increment nprocs when we're creating a new process, not just
when allocating a PID.
- Per above, proc_free_pid() no longer decrements nprocs.  It's now done
  in proc_free() right after proc_free_pid().
- Ensure nprocs is accessed using atomics everywhere.
2020-04-19 20:31:59 +00:00
thorpej 98a9cebbb6 Add support for lazily generating a "global thread ID" for a LWP. This
identifier uniquely identifies an LWP across the entire system, and will
be used in future improvements in user-space synchronization primitives.

(Test disabled and libc stub not included intentionally so as to avoid
multiple libc version bumps.)
2020-04-04 20:20:12 +00:00
ad 6a317a61b7 Fix crash observed with procfs on current-users by David Hopper. LWP refcnt
and p_zomblwp both must reach the needed state, and LSZOMB be set, under a
single hold of p_lock.
2020-03-26 21:31:55 +00:00
ad d05ef83dfb Kill off kernel_lock_plug_leak(), and go back to dropping kernel_lock in
exit1(), since there seems little hope of finding the leaking code any
time soon.  Can still be caught with LOCKDEBUG.
2020-03-08 15:05:18 +00:00
ad f61617cee7 exit1(): remove from the radix tree before setting zombie status, as
radix_tree_remove_node() can block on locks when freeing.

Reported-by: syzbot+02bf066c30f812b14f25@syzkaller.appspotmail.com
2020-02-22 21:07:46 +00:00
ad 82002773ec - Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky.  May help with the "softint screwup"
  panic.

- Correct the memory barriers around zombies switching into oblivion.
2020-02-15 18:12:14 +00:00
ad d1c42b4f7b - Track LWPs in a per-process radixtree. It uses no extra memory in the
single threaded case.  Replace scans of p->p_lwps with lookups in the
  tree.  Find free LIDs for new LWPs in the tree.  Replace the hashed sleep
  queues for park/unpark with lookups in the tree under cover of a RW lock.

- lwp_wait(): if waiting on a specific LWP, find the LWP via tree lookup and
  return EINVAL if it's detached, not ESRCH.

- Group the locks in struct proc at the end of the struct in their own cache
  line.

- Add some comments.
2020-01-29 15:47:51 +00:00
ad 8959bb6327 - exit1(): for DIAGNOSTIC, call kernel_lock_plug_leak() (temporary).
- exit_lwps(): call lwp_need_userret() or LWP might never notice.
2020-01-27 21:09:33 +00:00
ad 20f33b0230 Catch a leaked hold of kernel_lock sooner with DIAGNOSTIC and make the
message a bit more informative.
2020-01-22 12:23:04 +00:00
ad 2ddceed1d9 Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
  calling cpu_switchto().  It's not safe to let other actors mess with the
  LWP (in particular l->l_cpu) while it's still context switching.  This
  removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
  it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead.  Everything
  is in cache anyway so it wasn't buying much by trying to avoid saving old
  state.  This means cpu_switchto() will never be called with prevlwp ==
  NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
2020-01-08 17:38:41 +00:00
ad 4477d28d73 Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system.  Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().
2019-12-06 21:36:10 +00:00
kamil a35a4fe3b8 Separate flag for suspended by _lwp_suspend and suspended by a debugger
Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.
2019-10-03 22:48:44 +00:00
kamil 6aa1291e37 Correct use-after-free issue in vfork(2)
In the previous behavior vforking parent was keeping pointer to a child
and checking whether it clears a PL_PPWAIT in its bitfield p_lflag. However
a child can go invalid between exec/exit event from child and waking up
vforked parent and this can cause invalid pointer read and in the worst
scenario kernel crash.

In the new behavior vforked child keeps a reference to vforked parent LWP
and sets a value l_vforkwaiting to false. This means that vforked child
can finish its work, exec/exit and be terminated and once parent will be
woken up it will read its own field whether its child is still blocking.

Add new field in struct lwp: l_vforkwaiting protected by proc_lock.
In future it should be refactored and all PL_PPWAIT users transformed to
l_vforkwaiting and next l_vforkwaiting probably transformed into a bit
field.

This is another attempt of fixing this bug after <rmind> from 2012 in
commit:

Author: rmind <rmind@NetBSD.org>
Date:   Sun Jul 22 22:40:18 2012 +0000

    fork1: fix use-after-free problems.  Addresses PR/46128 from Andrew Doran.
    Note: PL_PPWAIT should be fully replaced and modificaiton of l_pflag by
    other LWP is undesirable, but this is enough for netbsd-6.

The new version no longer performs unsafe access in l_lflag changing the
LP_VFORKWAIT bit.

Verified with ATF t_vfork and t_ptrace* tests and they are no longer
causing any issues in my local setup.

Fixes PR/46128 by Andrew Doran
2019-06-13 20:20:18 +00:00
ozaki-r 7fc219a5ee Implement an aggressive psref leak detector
It is yet another psref leak detector that enables to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.

Investigating of psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted.  A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread.  Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.

The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire).  We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.

The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.

Proposed on tech-kern
2019-05-17 03:34:26 +00:00
hannken 72421a1974 Move pointer to fstrans private data into "struct lwp".
Ride NetBSD 8.99.35
2019-03-01 09:02:03 +00:00
maxv 2ce19e3fc7 Fix info leak. There is one branch where 'status' is not initialized at
all.

	+ Possible info leak: [len=4, leaked=4]
	| #0 0xffffffff80baf397 in kleak_copyout
	| #1 0xffffffff80b56d0c in sys_wait6
	| #2 0xffffffff80259c42 in syscall
2018-11-29 12:37:22 +00:00
maxv 62c8988166 Remove the kernel PMC code. Sent yesterday on tech-kern@.
This change:

 * Removes "options PERFCTRS", the associated includes, and the associated
   ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
   good.

 * Removes the PMC code of ARM XSCALE.

 * Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

 * Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
   definitions are put in sysarch.h.

 * Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
   and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
   netbsd32 and rump.

 * Removes the pmc_evid_t and pmc_ctr_t types.

 * Removes all the associated man pages. The sets are marked as obsolete.
2018-07-12 10:46:40 +00:00
christos e0983e96df Load the struct rusage text, data, and stack fields from the vmspace struct.
Before they were all 0. We update them when we call getrusage() or on
process exit() so that the children rusage is accounted for.
2018-05-07 21:03:45 +00:00
christos 0011aa658c Store full executable path in p->p_path as discussed in tech-kern.
This means that the full executable path is always available.

- exec_elf.c: use p->path to set AT_SUN_EXECNAME, and since this is
  always set, do so unconditionally.
- kern_exec.c: simplify pathexec, use kmem_strfree where appropriate
  and set p->p_path
- kern_exit.c: free p->p_path
- kern_fork.c: set p->p_path for the child.
- kern_proc.c: use p->p_path to return the executable pathname; the
  NULL check for p->p_path, should be a KASSERT?
- exec.h: gc ep_path, it is not used anymore
- param.h: bump version, 'struct proc' size change

TODO:
1. reference count the path string, to save copy at fork and free
   just before exec?
2. canonicalize the pathname by changing namei() to LOCKPARENT
   vnode and then using getcwd() on the parent directory?
2017-11-07 19:44:04 +00:00
kamil a69b333e73 Remove the filesystem tracing feature
This is a legacy interface from 4.4BSD, and it was
introduced to overcome shortcomings of ptrace(2) at that time, which are
no longer relevant (performance). Today /proc/#/ctl offers a narrow
subset of ptrace(2) commands and is not applicable for modern
applications use beyond simplistic tracing scenarios.

This removal will simplify kernel internals. Users will still be able to
use all the other /proc files.

This change won't affect other procfs files neither Linux compat
features within mount_procfs(8). /proc/#/ctl isn't available on Linux.

Remove:
 - /proc/#/ctl from mount_procfs(8)
 - P_FSTRACE note from the documentation of ps(1)
 - /proc/#/ctl and filesystem tracing documentation from mount_procfs(8)
 - KAUTH_REQ_PROCESS_PROCFS_CTL documentation from kauth(9)
 - source code file miscfs/procfs/procfs_ctl.c
 - PFSctl and procfs_doctl() from sys/miscfs/procfs/procfs.h
 - KAUTH_REQ_PROCESS_PROCFS_CTL from sys/sys/kauth.h
 - PSL_FSTRACE (0x00010000) from sys/sys/proc.h
 - P_FSTRACE (0x00010000) from sys/sys/sysctl.h

Reduce code complexity after removal of this functionality.

Update TODO.ptrace accordingly: remove two entries about /proc tracing.

Do not keep legacy notes as comments in the headers about removed
PSL_FSTRACE / P_FSTRACE, as this interface had little number of users
(close or equal to zero).

Proposed on tech-kern@.

All filesystem tracing utility users are encouraged to switch to ptrace(2).

Sponsored by <The NetBSD Foundation>
2017-08-28 00:46:06 +00:00
kamil e6f79d077f Cleanup dead code after revert of racy vfork(2) commit
This removes dead code introduced with the following commit:

date: 2012-07-27 22:52:49 +0200;  author: christos;  state: Exp;  lines: +8 -2;
revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind
2017-01-09 00:31:30 +00:00
christos 931a19e8b1 Make p_ppid contain the original parent's pid even for traced processes.
Only change it when we are being permanently reparented to init. Since
p_ppid is only used as a cached value to retrieve the parent's process id
from userland, this change makes it correct at all times. Idea from kre@
Revert specialized logic from getpid/getppid now that it is not needed.
2016-11-13 15:25:01 +00:00
christos b2924f399d GC WOPTSCHECKED, define macros for the select opts and all the valid opts.
The linux compat flags are not part of X/Open.
2016-11-10 17:07:14 +00:00
kre b6732360dd PR kern/51600 ; PR standards/51606
Revert 1.264 - that was intended to fix 51600, but didn't, it just
hid the problem, and caused 51606.  This fixes 51606.

Handle waiting on a process that has been detatched from its parent
because of being ptrace'd by some other process.  This fixes 51600.
("handle" here means that the wait() hangs, or with WNOHANG, returns 0,
we cannot actually wait on a process that is not currently an attached
child.)

Note: the detatched process waiting is not yet perfect (it fails to
take account of options like WALLSIG and WALTSIG) - suport for those
(that is, ignoring a detatched child that one of those options will
later cause to be ignored when the process is re-attached.)

For now, for ither than when waiting for a specific process ID, when
a process does a wait() sys call (any of them), has no applicable
children attached that can be returned, and has at least one detatched
child, then we do a linear search of all processes to look for a
suitable detatched child.  This is likely to be slow - but very rare.
Eventually it might be better to keep a list of detatched children
per process.
2016-11-09 00:30:17 +00:00
christos 678541356f Return 0 if WNOHANG and no kids. 2016-11-05 02:59:22 +00:00
christos 9b5ab01589 deduplicate the complex lock reparent dance. 2016-11-04 18:14:04 +00:00
christos e8fde31e58 Cleanup old parent from zombies too. Fixes repeatable panic when we try
to signal the already freed zombie parent after the child exits.
2016-11-04 18:12:06 +00:00
christos 7bfe2974a7 Fix wrong WIFCONTINUED() status. 2016-11-03 20:58:25 +00:00
skrll cf96d30a9f Trailing whitespace 2016-09-23 14:16:32 +00:00
skrll 7b000a7783 Add netbsd32_clock_getcpuclockid2 and netbsd32_wait6 functions 2016-09-23 14:09:39 +00:00
christos cbf2c4d885 We need a flag for WCONTINUED so that we can reset it... Fixes bash issue. 2016-04-27 21:15:40 +00:00
christos d1ae6b027a set the return value to the pid if we found one (from kre@) 2016-04-25 16:35:47 +00:00
christos d583d77111 Implement WIFCONTINUED using the linux value instead of the FreeBSD one... 2016-04-06 03:51:26 +00:00
christos 0fe87e3916 Simplify even more to make it clear how the status is set. 2016-04-05 14:07:31 +00:00
christos 30e54fbe0a Set the exit status code properly. 2016-04-05 13:01:46 +00:00
christos 5c35dbcd66 no need to pass the coredump flag to exit1() since it is set and known
in one place.
2016-04-04 23:07:06 +00:00
christos 4fbdf206cb Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit:		exit code
2. p_xsig:		signal number
3. p_sflag & WCOREFLAG	bit to indicated that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
2016-04-04 20:47:57 +00:00
christos 93583346f6 restore the early breaks for the exact process match. 2016-04-03 23:50:49 +00:00
christos ae0f86396b implement WCONTINUED, untested
fill out more siginfo fields.
use geteuid instead of getuid
2016-04-03 02:28:46 +00:00
christos 15563e6d2d Add wait6() to be used to implement waitid, mostly from FreeBSD.
Create idtypes.h shared by wait.h and pset.h
2016-04-02 20:38:40 +00:00
pgoyette d3abaa577e Update value of p_stat before we release the proc_lock. Thanks to
Robert Elz.

XXX Pull-ups for -7, -6{,-0,-1} and -5{,-0,-1,-2}
2015-10-13 06:47:21 +00:00
pgoyette b2557f247b For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP.  The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.

Fixes PR kern/50308

Pullups will be requested for:

       NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
2015-10-13 00:28:22 +00:00
pgoyette ad146809be Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process wil get
reparented to init.  Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.

This change causes both old and new parents to be properly updated.

Fixes PR kern/50300

Pullups will be requested for:

       NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
2015-10-13 00:27:19 +00:00
christos db70f1818e Change SDT (Statically Defined Tracing) probes to use link sets so that it
is easier to add probes. (From FreeBSD)
2015-10-02 16:54:15 +00:00
christos c4c94e150f Free pid for linux processes. Reported by Mark Davies, fix by dsl@
XXX: pullup 6
2014-05-05 15:45:32 +00:00
riz c02fb3c915 Add another field to the SDT_PROBE_DEFINE macro, so our DTrace probes
can named the same as those on other platforms.

For example, proc:::exec-success, not proc:::exec_success.

Implementation follows the same basic principle as FreeBSD's; add
another field to the SDT_PROBE_DEFINE macro which is the name
as exposed to userland.
2013-06-09 01:13:47 +00:00
rmind ea775f7598 exit_lwps, lwp_wait: fix a race condition by re-trying if p_lock was dropped
in a case of process exit.  Necessary to re-flag all LWPs for exit, as their
state might have changed or new LWPs spawned.

Should fix PR/46168 and PR/46402.
2012-09-27 20:43:15 +00:00
riastradh bb195042a2 Use separate names for the multitudinous uses of `q' in exit1.
Now I can follow which process is which in this routine.

If I jiggle the whitespace so line numbers don't change, there is no
change in the output of `objdump -d kern_exit.o' for amd64.

ok abp
2012-08-05 14:53:25 +00:00
christos 72e4156b86 revert racy vfork() parent-blocking-before-child-execs-or-exits code.
ok rmind
2012-07-27 20:52:49 +00:00