Commit Graph

12093 Commits

Author SHA1 Message Date
christos a91d6c6d0a Unbreak sched_m2 (it died on an lwp_eproc() KASSERT in DIAGNOSTIC kernels) and
explain what is going on. This has been broken since the introduction of
l_mutex 5 months ago.
2024-01-24 16:11:48 +00:00
christos e0dbe8aaa6 add lint comments 2024-01-19 19:07:38 +00:00
hannken 29964953ba Protect kernel hooks exechook, exithook and forkhook with rwlock.
Lock as writer on establish/disestablish and as reader on list traverse.

For exechook, ride "exec_lock", as it is already taken as a reader when
traversing the list.  Add local locks for exithook and forkhook.

Move exec_init before signal_init as signal_init calls exechook_establish()
that needs "exec_lock".

PR kern/39913 "exec, fork, exit hooks need locking"
2024-01-17 10:18:41 +00:00
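
A minimal sketch of the locking pattern described in the commit above, assuming a hook list guarded by a krwlock_t; the list, entry, and function names here are hypothetical, not the actual kernel code:

    #include <sys/rwlock.h>
    #include <sys/queue.h>

    struct hook_entry {
            LIST_ENTRY(hook_entry) he_list;
            void (*he_fn)(void *);
            void *he_arg;
    };

    static krwlock_t hook_lock;     /* local lock, rw_init()ed at boot */
    static LIST_HEAD(, hook_entry) hook_list = LIST_HEAD_INITIALIZER(hook_list);

    void
    hook_establish(struct hook_entry *he)
    {
            /* Writer: the list is modified. */
            rw_enter(&hook_lock, RW_WRITER);
            LIST_INSERT_HEAD(&hook_list, he, he_list);
            rw_exit(&hook_lock);
    }

    void
    hook_run(void)
    {
            struct hook_entry *he;

            /* Reader: traversal only, so concurrent runs are fine. */
            rw_enter(&hook_lock, RW_READER);
            LIST_FOREACH(he, &hook_list, he_list)
                    (*he->he_fn)(he->he_arg);
            rw_exit(&hook_lock);
    }
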
hannken ca4932dc6d Print dangling vnode before panic() to help debug.
PR kern/57775 ""panic: unmount: dangling vnode" while umounting procfs"
2024-01-17 10:17:29 +00:00
andvar 91f7c6c821 Surround db_stacktrace() with "#ifdef DDB" check.
Fixes LOCKDEBUG enabled build without DDB option.
2024-01-14 11:46:05 +00:00
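
The guard is the usual one for optional DDB facilities; a minimal sketch (the surrounding LOCKDEBUG error path is elided):

    #ifdef DDB
    #include <ddb/ddb.h>
    #endif

            /* ... in the LOCKDEBUG abort/error path ... */
    #ifdef DDB
            db_stacktrace();        /* only available when options DDB is configured */
    #endif
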
mlelstv 5238b8a351 dump topology information with aprint_debug instead of requiring a
DEBUG kernel build.
2024-01-04 11:18:19 +00:00
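
aprint_debug(9) output is selected at boot time (AB_DEBUG in boothowto, e.g. "boot -x") rather than at compile time, which is the point of the change; the variable names below are illustrative only:

    /* Visible with "boot -x" (AB_DEBUG) on a stock kernel; previously this
     * information required a kernel built with options DEBUG. */
    aprint_debug("cpu%u: package %u, core %u, smt %u\n",
        cpu_index(ci), package_id, core_id, smt_id);
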
andvar 13c9a1af85 s/addreseses/addresses/ in comments (and one missing whitespace). 2024-01-03 18:10:42 +00:00
hannken 9db8c230ae Initialize mutex fileassoc_global.lock. 2023-12-28 12:49:06 +00:00
hannken a5288eef9a Include "veriexec.h" and <sys/verified_exec.h> to run
veriexec_unmountchk() when NVERIEXEC > 0.
2023-12-28 12:48:08 +00:00
andvar bcfabd50d9 s/deatched/detached/ in comment. While here, fix the article before "annoyance". 2023-12-20 21:03:50 +00:00
andvar 4b34a91875 fix triple-"n" typos in "running"/"domainname", plus one missing "n" in comments. 2023-12-20 20:35:37 +00:00
pgoyette 4be362dba2 Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel.  Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX  pullup-10 - hopefully before RC2
2023-12-09 15:21:01 +00:00
pgoyette e512fb100a There's no COMPAT_60 code left here, so no need for conditional
inclusion of the header file.
2023-12-07 09:00:32 +00:00
thorpej bf9518a62c Add the notion of "private boundary tags" to vmem. This allows vmem to
be used VERY early in boot; such consumers statically allocate the vmem
arena and boundary tags, and then explicitly add those static, private
boundary tags to the arena tag free list using the new function vmem_add_bts().

Vmem arenas that use private boundary tags will NOT consume the statically
allocated bootstrap tags used by the vmem system itself; the assumption is
that the consumer of such an arena knows what they're doing, and is responsible
for all necessary resource management.  A macro, VMEM_EST_BTCOUNT(), is
provided to help such consumers size the static boundary tag store based
on the expected number of spans and early allocations.  Once the private
tags are exhausted, the arena will dynamically allocate tags as usual.
2023-12-03 19:34:08 +00:00
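
A hedged sketch of how an early-boot consumer might use this, going only by the commit message; VMEM_EST_BTCOUNT() and vmem_add_bts() are named above, but their exact signatures, and the header declaring struct vmem_btag, are assumptions here:

    #include <sys/vmem.h>
    #include <sys/vmem_impl.h>      /* struct vmem, struct vmem_btag (assumed) */

    #define EARLY_SPANS     1       /* illustrative sizing */
    #define EARLY_ALLOCS    8

    static struct vmem early_arena_store;   /* statically allocated arena */
    static struct vmem_btag early_bts[VMEM_EST_BTCOUNT(EARLY_SPANS, EARLY_ALLOCS)];

    void
    early_arena_bootstrap(void)
    {
            /* ... initialize early_arena_store in place (details omitted) ... */

            /*
             * Donate the private, static boundary tags to this arena only;
             * the global vmem bootstrap tag pool is left untouched.  Once
             * these run out, the arena falls back to dynamic allocation.
             */
            vmem_add_bts(&early_arena_store, early_bts, __arraycount(early_bts));
    }
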
thorpej 06f1a2dac8 Split the boundary tag "type" field into "type" and "flags" fields.
Initialize the flags field to 0 before inserting into an arena's free
tag list.

NFC, but makes diff for a future enhancement smaller.
2023-12-03 15:06:45 +00:00
thorpej 4a50480344 bt_freetrim(): Restructure the loop as a LIST_FOREACH_SAFE() rather
than a while().  No real change in behavior now, but makes upcoming
enhancements easier.
2023-12-03 14:35:54 +00:00
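
A generic illustration of the restructuring (hypothetical names, not the actual bt_freetrim() code): LIST_FOREACH_SAFE() keeps a second cursor so the current element can be removed while iterating:

    #include <sys/queue.h>

    struct tag *t, *tnext;

    LIST_FOREACH_SAFE(t, &freelist, t_list, tnext) {
            if (nfree <= keep)
                    break;          /* trimmed enough, keep the rest */
            LIST_REMOVE(t, t_list);
            nfree--;
            tag_destroy(t);         /* hypothetical destructor */
    }
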
thorpej 6da7222657 Assert that the vmem_btag_pool has been initialized before we attempt
to allocate from it.
2023-12-03 02:50:09 +00:00
thorpej 7f2518835c Add a vmem_xalloc_addr() function, which allocates a specific address
from an arena.  This is a convenience wrapper around vmem_xalloc() that
is a bit more obvious to use and performs some additional sanity
checks.
2023-12-02 21:02:12 +00:00
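
A hedged sketch of such a wrapper, built on the documented vmem_xalloc(9) parameters (minaddr/maxaddr bound the acceptable range); the vmem_xalloc_addr() prototype shown is an assumption:

    #include <sys/systm.h>
    #include <sys/vmem.h>

    int
    vmem_xalloc_addr(vmem_t *vm, vmem_addr_t addr, vmem_size_t size,
        vm_flag_t flags)
    {
            vmem_addr_t result;
            int error;

            /* Constrain the allocation so only [addr, addr + size) can satisfy it. */
            error = vmem_xalloc(vm, size, 0, 0, 0, addr, addr + size - 1,
                flags, &result);
            KASSERT(error != 0 || result == addr);
            return error;
    }
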
thorpej 7abe897db3 Minor changes to let this build as the "subr_vmem" test program again. 2023-12-02 19:06:17 +00:00
hannken c15cfd474f Restore kpause() accidentally removed with the last commit. 2023-11-27 16:13:59 +00:00
hannken 6f60ad1b6a Implement and use an iterator over LRU lists.
Replace the vdrain kernel thread with two threadpool jobs,
one to process deferred vrele and
one to keep the number of allocated vnodes below the limit.
2023-11-27 10:03:40 +00:00
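
A hedged sketch of the threadpool(9) pattern that replaces the vdrain thread; the names below are hypothetical, and only one of the two jobs is shown:

    #include <sys/threadpool.h>
    #include <sys/mutex.h>

    static struct threadpool *vdrain_pool;
    static struct threadpool_job vrele_job;
    static kmutex_t vdrain_lock;

    static void
    vrele_task(struct threadpool_job *job)
    {
            /* ... drain the deferred-vrele list here ... */

            mutex_enter(&vdrain_lock);
            threadpool_job_done(job);       /* job may now be scheduled again */
            mutex_exit(&vdrain_lock);
    }

    void
    vdrain_setup(void)
    {
            mutex_init(&vdrain_lock, MUTEX_DEFAULT, IPL_NONE);
            threadpool_job_init(&vrele_job, vrele_task, &vdrain_lock, "vrele");
            (void)threadpool_get(&vdrain_pool, PRI_NONE);
    }

    void
    vrele_kick(void)
    {
            /* Cheap if the job is already queued or running. */
            mutex_enter(&vdrain_lock);
            threadpool_schedule_job(vdrain_pool, &vrele_job);
            mutex_exit(&vdrain_lock);
    }
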
ozaki-r e629b37024 mbuf: avoid assertion failure when splitting mbuf cluster
From OpenBSD:

	commit 7b4d35e0a60ba1dd4daf4b1c2932020a22463a89
	Author: bluhm <bluhm@openbsd.org>
	Date:   Fri Oct 20 16:25:15 2023 +0000

	    Avoid assertion failure when splitting mbuf cluster.

	    m_split() calls m_align() to initialize the data pointer of newly
	    allocated mbuf.  If the new mbuf will be converted to a cluster,
	    this is not necessary.  If additionally the new mbuf is larger than
	    MLEN, this can lead to a panic.
	    Only call m_align() when a valid m_data is needed.  This is the
	    case if we do not reference the existing cluster, but memcpy() the
	    data into the new mbuf.

	    Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com
	    OK claudio@ deraadt@

The issue is harmless if DIAGNOSTIC is not enabled.

XXX pullup-10
XXX pullup-9
2023-11-27 02:50:27 +00:00
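
An illustration of the logic of the fix, not the literal m_split() diff (m is the original mbuf, n the new one, len the split offset, remain the bytes to move; cluster bookkeeping is elided): m_align() is only needed when the data will actually be copied into the new mbuf's own storage.

    if (m->m_flags & M_EXT) {
            /*
             * The new mbuf shares the existing cluster: n->m_data will point
             * into the cluster, so calling m_align() here is unnecessary and
             * can trip a KASSERT when remain > MLEN (DIAGNOSTIC only).
             */
            /* ... add a reference to the cluster, set n->m_data ... */
    } else {
            /* Data is copied into the new mbuf itself: a valid m_data is needed. */
            m_align(n, remain);
            memcpy(mtod(n, void *), mtod(m, char *) + len, remain);
    }
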
riastradh fdf689ec4f vfs(9): Make sure to kpause at least one tick, not zero.
kpause(9) forbids zero.

Local workaround for wider problem in PR kern/57718, to address
immediate symptom of crash on any system with hz=50, e.g. alpha in
qemu:

panic: kernel diagnostic assertion "timo != 0 || intr" failed: file "/usr/src/sys/kern/kern_synch.c", line 249

XXX pullup-10
XXX pullup-9
XXX pullup-8
2023-11-22 13:19:50 +00:00
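
The failure mode and a sketch of the local workaround: mstohz(10) rounds down to 0 ticks when hz=50, and kpause(9) rejects a zero timeout for uninterruptible sleeps, so the timeout is clamped (the wmesg and the 10 ms figure are illustrative):

    #include <sys/param.h>
    #include <sys/systm.h>

    /* hz = 50 => tick = 20 ms, so mstohz(10) == 0; clamp to one tick. */
    (void)kpause("vnodes", false, MAX(1, mstohz(10)), NULL);
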
riastradh 4cb23c1777 kpause(9): KASSERT -> KASSERTMSG
PR kern/57718 (might help to diagnose manifestations of the problem)
2023-11-22 13:18:48 +00:00
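
The point is to record the offending values when the assertion fires; a sketch of the shape of the change (the exact message text is an assumption):

    /* Before: no context in the panic message. */
    KASSERT(timo != 0 || intr);

    /* After: report the arguments, to help pin down PR kern/57718. */
    KASSERTMSG(timo != 0 || intr, "kpause(%s, %d, %d, %p): zero timeout",
        wmesg, (int)intr, timo, mtx);
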
riastradh 82de273b52 pax(9): Rework header file more coherently to nix some needless #ifs.
Cleans up some of the fallout from PR kern/57711 fixes.

Could do a little more to nix PAX_SEGVGUARD conditionals but maybe
not worth it.
2023-11-21 14:35:36 +00:00
martin 0d92cf4b8d Stopgap build fix for kernels w/o PAX_MPROTECT after the fixes
for PR 57711: mark a variable as unused, as it sometimes is (e.g. in macppc kernels).
2023-11-21 12:12:26 +00:00
riastradh bf53af405a exec: Map noaccess part of stack with prot=NONE, maxprot=READ|WRITE.
This way, setrlimit(RLIMIT_STACK) can grant READ|WRITE access when
increasing the stack size.

PR kern/57711

XXX pullup-10
XXX pullup-9
XXX pullup-8
2023-11-21 00:09:18 +00:00
riastradh ad71ebb55e eventfd(2): Prune dead branch.
Fallout from PR kern/57703 fix.

XXX pullup-10
2023-11-19 17:16:00 +00:00
riastradh 36d181a381 eventfd(2): Omit needless micro-optimization causing PR kern/57703.
Unfortunately, owing to PR kern/57705 and PR misc/57706, it isn't
convenient to flip the xfail switch on a test for this bug.  So we'll
do that separately.  (But I did verify that a rumpified version of
the test posted to PR kern/57703 failed without this change, and
passed with this change.)

PR kern/57703

XXX pullup-10
2023-11-19 04:13:37 +00:00
hannken cc8bf809e7 Once the number of allocated vnodes goes beyond 106% of desiredvnodes,
start throttling threads allocating new vnodes at a rate of ~100 new
vnodes per second per thread.
2023-11-06 12:17:50 +00:00
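
A hedged sketch of the throttle described above (the threshold arithmetic and the wmesg are assumptions): beyond roughly 106% of desiredvnodes, each allocating thread pauses about 10 ms per new vnode, i.e. on the order of 100 vnodes per second per thread.

    if (numvnodes > desiredvnodes + desiredvnodes / 16) {
            /* ~10 ms per allocation => ~100 new vnodes/s for this thread. */
            kpause("vnalloc", false, MAX(1, mstohz(10)), NULL);
    }
    /* ... proceed to allocate and initialize the vnode ... */
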
martin 3007f1403a Back out the following revisions on behalf of core:
sys/sys/lwp.h: revision 1.228
	sys/sys/pipe.h: revision 1.40
	sys/kern/uipc_socket.c: revision 1.306
	sys/kern/kern_sleepq.c: revision 1.84
	sys/rump/librump/rumpkern/locks_up.c: revision 1.13
	sys/kern/sys_pipe.c: revision 1.165
	usr.bin/fstat/fstat.c: revision 1.119
	sys/rump/librump/rumpkern/locks.c: revision 1.87
	sys/ddb/db_xxx.c: revision 1.78
	sys/ddb/db_command.c: revision 1.187
	sys/sys/condvar.h: revision 1.18
	sys/ddb/db_interface.h: revision 1.42
	sys/sys/socketvar.h: revision 1.166
	sys/kern/uipc_syscalls.c: revision 1.209
	sys/kern/kern_condvar.c: revision 1.60

  Add cv_fdrestart() [...]
  Use cv_fdrestart() to implement fo_restart.
  Simplify/streamline pipes a little bit [...]

These changes have caused regressions and need to be debugged.
The cv_fdrestart() addition needs more discussion.
2023-11-02 10:31:55 +00:00
riastradh ed4437302a thmap(9): Preallocate GC list storage for thmap_del.
thmap_del can't fail, and it is used in places in npf where sleeping
is forbidden, so it can't rely on allocating memory either.

Instead of having thmap_del allocate memory on the fly for each
object to defer freeing until thmap_gc, arrange to have thmap(9)
preallocate the same storage when allocating all the objects in the
first place, with a GC header.

This is suboptimal for memory usage, especially on insertion- and
lookup-heavy but deletion-light workloads, but it's not clear rmind's
alternative (https://github.com/rmind/thmap/tree/thmap_del_mem_fail)
is ready to use yet, so we'll go with this for correctness.

PR kern/57208
https://github.com/rmind/npf/issues/129

XXX pullup-10
XXX pullup-9
2023-10-17 11:57:20 +00:00
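
A conceptual sketch of the preallocation idea (not the actual thmap(9) data layout; all names are hypothetical): every object carries its own deferred-free linkage from the moment it is allocated, so deletion never has to allocate and therefore cannot fail or sleep.

    struct gc_hdr {
            struct gc_hdr *next;    /* deferred-free list linkage */
    };

    struct node {
            struct gc_hdr gc;       /* preallocated with the node itself */
            /* ... key, value, level data ... */
    };

    struct map {
            struct gc_hdr *gc_list; /* objects awaiting the GC pass */
            /* ... */
    };

    static void
    defer_free(struct map *m, struct node *n)
    {
            /* No allocation on this path; just push onto the GC list. */
            n->gc.next = m->gc_list;
            m->gc_list = &n->gc;    /* reclaimed later by the GC pass */
    }
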
riastradh 272918d637 thmap(9): Test alloc failure, not THMAP_GETPTR failure.
THMAP_GETPTR may return nonnull even though alloc returned zero.

Note that this failure branch is not actually appropriate;
thmap_create should not fail.  We really need to pass KM_SLEEP
through in this call site even though there are other call sites for
which KM_NOSLEEP is appropriate.

Adapted from: https://github.com/rmind/thmap/pull/14

PR kern/57666
https://github.com/rmind/thmap/issues/13

XXX pullup-10
XXX pullup-9
2023-10-17 11:55:28 +00:00
riastradh e700b97107 kern_ktrace.c: Sort includes. No functional change intended. 2023-10-17 10:27:34 +00:00
riastradh cee67a0a71 kern_turnstile.c: Use <sys/lwp.h> explicitly for struct lwp members. 2023-10-15 10:30:20 +00:00
riastradh 1da60e94d5 sys_select.c: Sort includes. No functional change intended. 2023-10-15 10:29:34 +00:00
riastradh b557f9979d kern_lwp.c: Sort includes. No functional change intended. 2023-10-15 10:29:24 +00:00
riastradh 5477de2490 kern_synch.c: Sort includes. No functional change intended. 2023-10-15 10:29:10 +00:00
riastradh 6768a36b8e kern_sleepq.c: Sort includes. No functional change intended. 2023-10-15 10:29:02 +00:00
riastradh 59b15a84e1 kern_rwlock.c: Sort includes. No functional change intended. 2023-10-15 10:28:48 +00:00
riastradh 6779c0232f kern_mutex.c: Sort includes. No functional change intended. 2023-10-15 10:28:23 +00:00
riastradh d31d6338cb kern_lwp: Sort includes. No functional change intended. 2023-10-15 10:28:14 +00:00
riastradh 72db704cac kern_condvar.c: Sort includes. No functional change intended. 2023-10-15 10:28:00 +00:00
riastradh fac91bbe0f sys/lwp.h: Nix sys/syncobj.h dependency.
Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
2023-10-15 10:27:11 +00:00
ad 61757a071b Simplify/streamline pipes a little bit:
- Allocate only one struct pipe not two (no need to be bidirectional here).
- Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops.
- Never wake the other side or acquire long-term (I/O) lock unless needed.
- Whenever possible, defer wakeups until after locks have been released.
- Do some things locklessly in pipe_ioctl() and pipe_poll().

Some notable results:

- -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process.
- 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine.
- 1.5x bandwidth with 1kB messages on the same 48 CPU machine (8kB: same b/w).
2023-10-13 19:07:08 +00:00
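
A hedged sketch of the single-pipe idea (not the actual sys_pipe.c code; the f_pipe field and helper names are assumed): both descriptors reference one struct pipe, and the per-descriptor f_flag distinguishes the read end from the write end inside the fileops.

    static int
    pipe_close(file_t *fp)
    {
            struct pipe *pipe = fp->f_pipe;         /* assumed field name */

            /* One shared struct pipe; f_flag says which end is closing. */
            if (fp->f_flag & FREAD)
                    pipe_reader_gone(pipe);         /* hypothetical */
            else
                    pipe_writer_gone(pipe);         /* hypothetical */
            return 0;
    }
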
ad 3e9215f785 Use cv_fdrestart() to implement fo_restart. 2023-10-13 18:50:39 +00:00
ad 6cba187343 Add cv_fdrestart() (better name suggestions welcome):
Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming.  Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.
2023-10-13 18:48:56 +00:00
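
A hedged usage sketch; the cv_fdrestart() prototype is assumed to match cv_broadcast(), and the object and field names are hypothetical:

    void
    fobj_abort_io(struct fobj *fo)
    {
            mutex_enter(&fo->fo_lock);
            fo->fo_restarting = true;       /* new I/O attempts bail out early */
            /*
             * Wake all waiters; those sharing the caller's file descriptor
             * table come back from their sleep with ERESTART instead of 0,
             * so the syscall unwinds and the descriptor can be closed.
             */
            cv_fdrestart(&fo->fo_cv);
            mutex_exit(&fo->fo_lock);
    }
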
ad 513d6564aa Comments. 2023-10-12 23:51:05 +00:00
ad f26f10e736 Oops, fix inverted test. 2023-10-08 13:37:26 +00:00
ad 32a89764db Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block().  Then, it's possible to make cv_signal()
work as expected and only ever wake a single LWP.
2023-10-08 13:23:05 +00:00