Commit Graph

12093 Commits

Author SHA1 Message Date
ad 9f3236292d Defer some wakeups till lock release. 2023-10-08 12:38:58 +00:00
ad 05f14f5014 sleepq_block(): slightly reduce number of test+branch in the common case. 2023-10-08 11:12:47 +00:00
ad d89ef44f99 sleepq_uncatch(): fix typo that's been there since 2020, hello @thorpej lol:
-       l->l_flag = ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
+       l->l_flag &= ~(LW_SINTR | LW_CATCHINTR | LW_STIMO);
2023-10-07 20:48:50 +00:00
ad bf5d5d6058 sleepq_uncatch(): clear LW_STIMO too, so that there's no possibility that
the newly non-interruptible sleep could produce EWOULDBLOCK (paranoia).
2023-10-07 14:12:29 +00:00
ad c55cb7eb1b Update comments to match reality 2023-10-05 19:44:26 +00:00
ad 68fa584377 Arrange to update cached LWP credentials in userret() rather than during
syscall/trap entry, eliminating a test+branch on every syscall/trap.

This wasn't possible in the 3.99.x timeframe when l->l_cred came about
because there wasn't a reliable/timely way to force an ONPROC LWP running on
a remote CPU into the kernel (which is just about the only new thing in
this scheme).
2023-10-05 19:41:03 +00:00
ad 5e6f75a121 Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros. 2023-10-05 19:28:30 +00:00
ad b316ad652f The idle LWP doesn't need to care about kernel_lock. 2023-10-05 19:10:18 +00:00
ad 01e5be2450 kern_sig.c: remove problematic kernel_lock handling which is unneeded in 2023. 2023-10-05 19:06:30 +00:00
riastradh f103f77a25 lwp_pctr(9): Make this a little more robust.
No substantive change to machine code on aarch64.  (Instructions and
registers got reordered a little but not in a way that matters.)
2023-10-05 13:05:18 +00:00
riastradh c6e0728e36 kern_cctr.c: Fix broken indentation.
No functional change intended.
2023-10-05 12:05:59 +00:00
ad c43491e4bf pipe1(): call getnanotime() once not twice. 2023-10-04 22:41:56 +00:00
ad 2c21032618 pipe->pipe_waiters isn't needed on NetBSD, kernel condvars do this for free. 2023-10-04 22:19:58 +00:00
ad 0f335007fe kauth_cred_hold(): return cred verbatim so that donating a reference to
another data structure can be done more elegantly.
2023-10-04 22:17:09 +00:00
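A rough illustration of the calling pattern this enables (the file-pointer field is just an example consumer, not the code touched by the commit):

    /* Before: taking the reference and donating it were two statements. */
    kauth_cred_hold(cred);
    fp->f_cred = cred;

    /* After: kauth_cred_hold() returns its argument, so the donation
     * reads as a single expression. */
    fp->f_cred = kauth_cred_hold(cred);
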
ad d96d10fff5 pipe_read(): try to skip locking the pipe if a non-blocking fd is used, as
is very often the case with BSD make (from FreeBSD/mjg@).
2023-10-04 22:12:23 +00:00
ad 66fdfe9cee match_process(): most of the fields being inspected are covered by proc_lock
so don't grab p->p_lock so much.
2023-10-04 20:48:13 +00:00
ad 3d1cabfdfe Do cv_broadcast(&p->p_lwpcv) after dropping p->p_lock in a few places, to
reduce contention.
2023-10-04 20:46:33 +00:00
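The pattern, roughly -- cv_broadcast(9) does not require the interlock to be held, so the wakeup can safely happen after the mutex is released:

    /* Before: waiters are woken while p_lock is still held, so they may
     * wake up only to block on the mutex immediately. */
    cv_broadcast(&p->p_lwpcv);
    mutex_exit(p->p_lock);

    /* After: release the lock first, then wake the waiters. */
    mutex_exit(p->p_lock);
    cv_broadcast(&p->p_lwpcv);
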
ad ce3debcbe5 lwp_wait(): restart the loop if p->p_lock dropped to reap zombie (paranoid). 2023-10-04 20:45:13 +00:00
ad 27711c94c3 Sprinkle a bunch more calls to lwp_need_userret(). There should be no
functional change but it does get rid of a bunch of assumptions about how
mi_userret() works, making it easier to adjust it in the future, and
works as a kind of documentation too.
2023-10-04 20:44:15 +00:00
ad 725adb2afe Sprinkle a bunch more calls to lwp_need_userret(). There should be no
functional change but it does get rid of a bunch of assumptions about how
mi_userret() works, making it easier to adjust it in the future, and
works as a kind of documentation too.
2023-10-04 20:42:38 +00:00
ad 9a587cc997 Turnstiles: use the syncobj name for ps/top wmesg when sleeping since it's
more informative than "tstile".
2023-10-04 20:39:35 +00:00
ad 0a6ca13bec Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
2023-10-04 20:29:18 +00:00
ad a355028fa4 Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
2023-10-04 20:28:05 +00:00
ad 9fc55e2ba8 Tweak a couple of comments. 2023-10-02 21:50:18 +00:00
ad 2246c1eb4d Use kmem_intr_*() variants for lock objects since aiodoned was done away
with and we process these I/Os in soft interrupt context now.
2023-10-02 21:03:55 +00:00
ad 089029363e kauth_cred_groupmember(): check egid before a tedious scan of groups. 2023-10-02 20:59:12 +00:00
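A minimal sketch of the idea using the public kauth(9) accessors; the in-kernel implementation and its return conventions differ:

    static bool
    groupmember_sketch(kauth_cred_t cred, gid_t gid)
    {
        u_int i;

        /* Cheap test first: most lookups hit the effective gid. */
        if (kauth_cred_getegid(cred) == gid)
            return true;

        /* Only then scan the supplementary groups. */
        for (i = 0; i < kauth_cred_ngroups(cred); i++)
            if (kauth_cred_group(cred, i) == gid)
                return true;
        return false;
    }
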
ad cbc1d2c479 Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.
2023-09-23 20:23:07 +00:00
ad 6ed72b5fad - Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
  syncobj_t.  Then, do not hang onto the priority boost until userret(),
  drop it as soon as the LWP is out of the run queue and onto a CPU.
  Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
  simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
  like lwp_lock() which turn out not to be small after all (I don't know
  why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
  beyond what volatile does).
2023-09-23 18:48:04 +00:00
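A hypothetical sketch of what "a property of syncobj_t" could look like at a definition site; the sobj_boostpri field name is a guess for illustration, and the remaining members are elided:

    /* Hypothetical: each syncobj carries the priority boost applied to
     * LWPs that block on it, instead of every sleep site passing one in.
     * The sobj_boostpri name is illustrative only. */
    const struct syncobj sleep_syncobj_sketch = {
        .sobj_name      = "sleep",
        .sobj_flag      = SOBJ_SLEEPQ_SORTED,
        .sobj_boostpri  = PRI_KERNEL,
        /* sobj_unsleep, sobj_changepri, ... as before */
    };
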
ad 59e0001f2c Reapply this change with a couple of bugs fixed:
- Do away with separate pool_cache for some kernel objects that have no special
  requirements and use the general purpose allocator instead. On one of my
  test systems this makes for a small (~1%) but repeatable reduction in system
  time during builds presumably because it decreases the kernel's cache /
  memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
2023-09-23 18:21:11 +00:00
ad 6c27fcfbed kernel_lock isn't needed to synchronise kthread_exit() and kthread_join(). 2023-09-23 14:40:42 +00:00
msaitoh 821e159c0a s/ for for / for / in comment. 2023-09-21 09:31:49 +00:00
ad 60240e1fc2 Fix a comment. 2023-09-19 22:15:32 +00:00
ad ef0f79c8d1 Back out recent change to replace pool_cache with the general allocator.
Will return to this when I have time again.
2023-09-12 16:17:21 +00:00
martin 8b4bc2e26f Add missing <sys/intr.h> include (previously indirectly hidden via pool.h) 2023-09-11 08:55:01 +00:00
ad cbcf86cb1f - Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead.  On one of my
  test systems this makes for a small (~1%) but repeatable reduction in system
  time during builds presumably because it decreases the kernel's cache /
  memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
2023-09-10 14:45:52 +00:00
ad 37ee486b91 It's easy to exhaust the open file limit on a system with many CPUs due to
caching.  Allow a bit of leeway to reduce the element of surprise.
2023-09-10 14:44:08 +00:00
ad 4c2ca10b4a Assert that kmem_alloc() provides the expected alignment. 2023-09-10 14:29:13 +00:00
ad e5756b164f do_sys_accessat(): copy credentials only when needed. 2023-09-09 18:34:44 +00:00
ad 0860546435 Fix a ~16 year old perf regression: when accepting a connection, add a
reference to the caller's credentials rather than copying them.
2023-09-09 18:30:56 +00:00
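The shape of the fix, roughly (field and variable names approximate): credentials are copy-on-write and never modified once visible to other threads, so a reference is as good as a copy and far cheaper:

    /* Before: every accepted socket got a full copy of the creds. */
    so2->so_cred = kauth_cred_dup(curlwp->l_cred);

    /* After: just take another reference to the caller's creds. */
    kauth_cred_hold(curlwp->l_cred);
    so2->so_cred = curlwp->l_cred;
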
ad 6e2810dfd3 - Shrink namecache entries to 64 bytes on 32-bit platforms and use 32-bit
key values there for speed (remains 128 bytes & 64-bits on _LP64).
- Comments.
2023-09-09 18:27:59 +00:00
christos 2b1dcc58b0 Move the initialization of the random hash for addresses earlier so that
it does not happen under a spin lock context (when it is first used).
2023-09-09 16:01:09 +00:00
ad 70ddceb5d7 Fix a ~16 year old perf regression: when creating a socket, add a reference
to the caller's credentials rather than copying them.  On an 80486DX2/66 this
seems to ~halve the time taken to create a socket.
2023-09-07 20:12:33 +00:00
ad 8479531bde Remove dodgy and unused mutex_owner_running() & rw_owner_running(). 2023-09-07 20:05:41 +00:00
riastradh 5c3232db9d heartbeat(9): Make heartbeat_suspend/resume nestable.
And make them bind to the CPU as a side effect, instead of requiring
the caller to have already done so.

This lets us eliminate the assertions so we can use them in ddb even
when things are going haywire and we just want to get diagnostics.

XXX kernel revbump -- struct cpu_info change
2023-09-06 12:29:14 +00:00
simonb 715431a6a9 Whitespace nit. 2023-09-04 09:13:23 +00:00
riastradh 792ae95f90 heartbeat(9): Move #ifdef HEARTBEAT to sys/heartbeat.h.
Less error-prone this way, and the callers are less cluttered.
2023-09-02 17:44:59 +00:00
riastradh c0abd507e0 heartbeat(9): Move panicstr check into the IPI itself.
We can't return early from defibrillate because the IPI may have yet
to run -- we can't return until the other CPU is definitely done
using the ipi_msg_t we created on the stack.

We should avoid calling panic again on the patient CPU in case it was
already in the middle of a panic, so that we don't re-enter panic
while, e.g., trying to print a stack trace.

Sprinkle some comments.
2023-09-02 17:44:41 +00:00
riastradh 11062fec23 heartbeat(9): More detail about manual test success criteria.
Changes comments only, no functional change.
2023-09-02 17:44:32 +00:00
riastradh 95d8ae3ce4 heartbeat(9): Ignore stale tc if primary CPU heartbeat is suspended.
The timecounter ticks only on the primary CPU, so of course it will
go stale if it's suspended.

(It is, perhaps, a mistake that it only ticks on the primary CPU,
even if the primary CPU is offlined or in a polled-input console
loop, but that's a separate issue.)
2023-09-02 17:44:23 +00:00
riastradh 572220daab heartbeat(9): New flag SPCF_HEARTBEATSUSPENDED.
This way we can suspend heartbeats on a single CPU while the console
is in polling mode, not just when the CPU is offlined.  This should
be rare, so it's not _convenient_, but it should enable us to fix
polling-mode console input when the hardclock timer is still running
on other CPUs.
2023-09-02 17:43:37 +00:00
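Roughly how the per-CPU check might use the new flag (a sketch only; the real code also handles suspend nesting, see the later heartbeat_suspend/resume commit above):

    /* Skip heartbeat checks for a CPU whose heartbeat is suspended,
     * e.g. while its console is in polling mode, or which is offline. */
    if (ci->ci_schedstate.spc_flags &
        (SPCF_OFFLINE | SPCF_HEARTBEATSUSPENDED))
        return;
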
riastradh b339b8878f cpu_setstate: Fix call to heartbeat_suspend.
Do this on successful offlining, not on failed offlining.

No functional change right now because heartbeat_suspend is
implemented as a noop -- heartbeat(9) will just check the
SPCF_OFFLINE flag.  But if we change it to not be a noop, well, then
we need to call it in the right place.
2023-09-02 17:43:28 +00:00
skrll 42c5a783e2 Trailing whitespace. 2023-09-01 16:57:33 +00:00
andvar 9eee100ff9 remove broken #ifdef KADB code block in subr_prf.
kdbpanic() was seemingly MIPS only and removed back in 1997,
since mips/locore.S rev 1.31.
should fix builds with KADB option enabled (tested on arc).
2023-08-29 21:23:14 +00:00
rin 21280b65d3 exec_elf: Sort auxv entries by value of types
No significant changes intended.
Just for slightly nicer output for gdb "info auxv".
2023-08-17 06:58:26 +00:00
christos 8e699bff4f F_GETPATH guarantees that data points to a buffer of MAXPATHLEN bytes, so go back
to using MAXPATHLEN. 2023-08-12 23:22:49 +00:00
2023-08-12 23:22:49 +00:00
christos 42c84f8ba4 mfd_name should be already NUL terminated. 2023-08-12 23:09:12 +00:00
christos f495a620b7 Add missing F_GETPATH (from Theodore Preduta) 2023-08-12 23:07:46 +00:00
riastradh e5b4f1636b workqueue(9): Factor out wq->wq_flags & WQ_FPU in workqueue_worker.
No functional change intended.  Makes it clearer that s is
initialized when used.
2023-08-09 08:24:18 +00:00
riastradh 9b18147164 workqueue(9): Sort includes.
No functional change intended.
2023-08-09 08:24:08 +00:00
riastradh 0d01e7b05c workqueue(9): Avoid unnecessary mutex_exit/enter cycle each loop. 2023-08-09 08:23:45 +00:00
riastradh 6ce73099f6 workqueue(9): Stop violating queue(3) internals. 2023-08-09 08:23:35 +00:00
riastradh 23d0d09807 workqueue(9): Sprinkle dtrace probes for workqueue_wait edge cases.
Let's make it easy to find out whether these are hit.
2023-08-09 08:23:25 +00:00
riastradh 53a88a4133 workqueue(9): Avoid touching running work items in workqueue_wait.
As soon as the workqueue function has been called, it is forbidden to
touch the struct work passed to it -- the function might free or
reuse the data structure it is embedded in.

So workqueue_wait is forbidden to search the queue for the batch of
running work items.  Instead, use a generation number which is odd
while the thread is processing a batch of work and even when not.

There's still a small optimization available with the struct work
pointer to wait for: if we find the work item in one of the per-CPU
_pending_ queues, then after we wait for a batch of work to complete
on that CPU, we don't need to wait for work on any other CPUs.

PR kern/57574

XXX pullup-10
XXX pullup-9
XXX pullup-8
2023-08-09 08:23:13 +00:00
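A sketch of the generation scheme described above (struct member names are illustrative, not the actual subr_workqueue.c fields):

    /* Worker, with q_mutex held: mark a batch in progress (odd), run it
     * unlocked, then mark it complete (even) and wake any waiters. */
    q->q_gen |= 1;                      /* odd: a batch is running */
    mutex_exit(&q->q_mutex);
    /* ... execute the batch of work items here ... */
    mutex_enter(&q->q_mutex);
    q->q_gen = (q->q_gen | 1) + 1;      /* even: batch complete */
    cv_broadcast(&q->q_cv);

    /* workqueue_wait, with q_mutex held: if a batch is running, wait
     * for the generation to advance rather than searching the running
     * batch for the struct work. */
    gen = q->q_gen;
    if (gen & 1) {
        do
            cv_wait(&q->q_cv, &q->q_mutex);
        while (q->q_gen == gen);
    }
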
riastradh 4bff62f739 xcall(9): Rename condvars to be less confusing.
The `cv' suffix is not helpful and `xclocv' looks like some kind of
clock at first glance.  Just say `xclow' and `xchigh'.
2023-08-06 17:50:20 +00:00
riastradh 763d441de3 cprng(9): Drop and retake percpu reference across entropy_extract.
entropy_extract may sleep on an adaptive lock, which invalidates
percpu(9) references.

Add a note in the comment over entropy_extract about this.

Discovered by stumbling upon this panic during a test run:

[   1.0200050] panic: kernel diagnostic assertion "(cprng == percpu_getref(cprng_fast_percpu)) && (percpu_putref(cprng_fast_percpu), true)" failed: file "/home/riastradh/netbsd/current/src/sys/rump/librump/rumpkern/../../../crypto/cprng_fast/cprng_fast.c", line 117

XXX pullup-10
2023-08-05 11:21:24 +00:00
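The shape of the fix in the refill path, roughly (seed is a local buffer; error handling elided):

    /* entropy_extract() can sleep on an adaptive lock, which invalidates
     * percpu(9) references -- so drop the reference around the call and
     * re-acquire the (possibly different) per-CPU instance afterwards. */
    percpu_putref(cprng_fast_percpu);
    entropy_extract(seed, sizeof(seed), 0);
    cprng = percpu_getref(cprng_fast_percpu);
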
andvar 69dbeb3df8 s/acccept/accept/ in comment. 2023-08-05 09:25:39 +00:00
riastradh 72c927ccb4 entropy(9): Disable !cold assertion in rump for now.
Evidently rump starts threads before it sets cold = 0, and deferring
the call to rump_thread_allow(NULL) in rump.c rump_init_callback
until after setting cold = 0 doesn't work either -- rump kernels just
hang.  To be investigated -- for now, let's just stop breaking
thousands of tests.
2023-08-04 16:02:01 +00:00
riastradh f4fdfc607f Revert "softint(9): Sprinkle KASSERT(!cold)."
Temporary workaround for PR kern/57563 -- to be fixed properly after
analysis.
2023-08-04 12:24:36 +00:00
riastradh aa8e725d99 softint(9): Sprinkle KASSERT(!cold).
Softints are forbidden to run while cold.  So let's make sure nobody
even tries it -- if they do, they might be delayed indefinitely,
which is going to be much harder to diagnose.
2023-08-04 07:40:30 +00:00
riastradh 3586ae1d3b entropy(9): Simplify stages. Split interrupt vs non-interrupt paths.
- Nix the entropy stage (cold, warm, hot).  Just use the usual kernel
  `cold' (cold: single-core, single-thread; interrupts may happen),
  and don't make any three-way distinction about whether interrupts
  or threads or other CPUs can be running.

  Instead, while cold, use splhigh/splx or forbid paths to come from
  interrupt context, and while warm, use mutex or the per-CPU hard
  and soft interrupt paths for low latency.  This comes at a small
  cost to some interrupt latency, since we may stir the pool in
  interrupt context -- but only for a very short window early at boot
  between configure and configure2, so it's hard to imagine it
  matters much.

- Allow rnd_add_uint32 to run in hard interrupt context or with spin
  locks held, but defer processing to softint and drop samples on the
  floor if buffer is full.  This is mainly used for cheaply tossing
  samples from drivers for non-HWRNG devices into the entropy pool,
  so it is often used from interrupt context and/or under spin locks.

- New rnd_add_data_intr provides the interrupt-like data entry path
  for arbitrary buffers and driver-specified entropy estimates: defer
  processing to softint and drop samples on the floor if buffer is
  full.

- Document that rnd_add_data is forbidden under spin locks outside
  interrupt context (will crash in LOCKDEBUG), and inadvisable in
  interrupt context (but technically permitted just in case there are
  compatibility issues for now); later we can forbid it altogether in
  interrupt context or under spin locks.

- Audit all uses of rnd_add_data to use rnd_add_data_intr where it
  might be used in interrupt context or under a spin lock.

This fixes a regression from last year when the global entropy lock
was changed from IPL_VM (spin) to IPL_SOFTSERIAL (adaptive).  Thought
I'd caught all the problems from that, but another one bit three
different people this week, presumably because of recent changes that
led to more non-HWRNG drivers entering the entropy consolidation
path from rnd_add_uint32.

In my attempt to preserve the rnd(9) API for the (now long-since
abandoned) prospect of pullup to netbsd-9 in my rewrite of the
entropy subsystem in 2020, I didn't introduce a separate entry point
for entering entropy from interrupt context or equivalent, i.e., spin
locks held, and instead made rnd_add_data rely on cpu_intr_p() to
decide whether to process the whole sample under a lock or only take
as much as there's buffer space for before scheduling a softint.  In
retrospect, that was a mistake (though perhaps not as much of a
mistake as other entropy API decisions...), a mistake which is
finally getting rectified now by rnd_add_data_intr.

XXX pullup-10
2023-08-04 07:38:53 +00:00
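Assuming rnd_add_data_intr() takes the same arguments as rnd_add_data(9), a driver picks the entry point by context, e.g.:

    /* From a hard interrupt handler or with a spin lock held: the
     * interrupt-safe path defers processing to softint and may drop
     * samples on the floor if its buffer is full. */
    rnd_add_data_intr(&sc->sc_rndsource, buf, len, entropybits);

    /* From thread context with no spin locks held: the ordinary path
     * may process the whole sample immediately. */
    rnd_add_data(&sc->sc_rndsource, buf, len, entropybits);
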
riastradh 140f7071dd fileassoc(9): Fast paths to skip global locks when not in use.
PR kern/57552
2023-08-02 07:11:31 +00:00
christos 2c545067c7 Add EPOLL_CLOEXEC (Theodore Preduta) 2023-07-30 18:31:13 +00:00
riastradh f381a67eb4 timecounter(9): Nix trailing whitespace.
No functional change intended.
2023-07-30 12:39:18 +00:00
rin 7c274a73de sys_epoll: whitespace -> tab. no binary changes. 2023-07-30 04:39:00 +00:00
rin 9920a46aac sys_memfd: Comply with our implicit naming convention;
do_memfd_truncate() --> memfd_truncate_locked(). NFC.
2023-07-29 23:59:59 +00:00
rin a00d9217b4 sys_memfd: Fix logic errors for offset in the previous. 2023-07-29 23:51:29 +00:00
christos 4ab15e90fb Fix locking and offset issues pointed out by @riastradh (Theodore Preduta) 2023-07-29 17:54:54 +00:00
christos d3ba7ba3a2 Add tests for t_memfd_create and fix bug found by tests 2023-07-29 12:16:34 +00:00
riastradh ff3504bd37 sys: Rename sys/miscfd.h -> sys/memfd.h.
Let's not create new dumping grounds for miscellaneous stuff; one
header file for one purpose.
2023-07-29 08:46:47 +00:00
riastradh 1a841079ee memfd(2): Convert KASSERT to CTASSERT.
Move it closer to where it's relevant too.
2023-07-29 08:46:27 +00:00
pgoyette 2e3b428875 remove extra `_' to fix debug build 2023-07-29 04:06:32 +00:00
christos 63ea783feb regen 2023-07-28 18:20:28 +00:00
christos d11110f473 Add epoll(2) from Theodore Preduta as part of GSoC 2023 2023-07-28 18:18:59 +00:00
riastradh 002ee648a8 timecounter(9): Link to phk's timecounter paper for reference.
No functional change intended.
2023-07-28 10:37:28 +00:00
riastradh a263f0256c kern: Restore non-atomic time_second symbol.
This is used by savecore(8), vmstat(8), and possibly other things.

XXX Should really teach dump and savecore(8) to use an intentionally
designed header, rather than relying on kvm(3) -- and make it work
for saving cores from other kernels so you don't have to boot the
same buggy kernel to get a core dump.
2023-07-27 01:48:49 +00:00
riastradh 841a5477f2 autoconf(9): Print `waiting for devices' normally once a minute. 2023-07-18 11:57:37 +00:00
riastradh 86f2cee2fd device_printf(9): Lock to avoid interleaving output.
XXX pullup-9
XXX pullup-10
2023-07-17 22:57:35 +00:00
riastradh 113db6d7a3 timecounter(9): Sprinkle membar_consumer around th->th_generation.
This code was apparently written under the misapprehension that
membar_producer on one side is good enough, but that doesn't
accomplish anything other than making the code unnecessarily obscure.

For semantics, you always always always need memory barriers to come
in pairs, with membar_consumer on the reading side if you want the
membar_producer on the writing side to have any useful effect.

It is unfortunate that this might hurt performance, but that's an
unfortunate consequence of the design made without understanding
memory barriers, not an unfortunate consequence of the memory
barriers.

If it is really critical for the read side to avoid memory barriers,
then the write side needs to issue an IPI or xcall to effect memory
barriers -- at higher cost to the write side, of course.
2023-07-17 21:51:45 +00:00
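The paired pattern being described, sketched on the timehands generation (simplified; the real update touches more fields and handles generation wraparound):

    /* Writer (tc_windup-style update of a timehands slot): */
    th->th_generation = 0;          /* mark the slot as being updated */
    membar_producer();
    th->th_offset = new_offset;     /* ...plus the other fields... */
    membar_producer();
    th->th_generation = ogen;       /* nonzero again: update complete */

    /* Reader: each membar_consumer pairs with a membar_producer above;
     * without them the producer barriers order nothing useful. */
    do {
        gen = th->th_generation;
        membar_consumer();
        bt = th->th_offset;
        membar_consumer();
    } while (gen == 0 || gen != th->th_generation);
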
riastradh 69dc442726 timecounter(9): Use atomic_store_release/load_consume for timehands.
This probably fixes real bugs on Alpha and makes the synchronization
pattern clearer everywhere.
2023-07-17 21:51:30 +00:00
riastradh f9c3bb074c timecounter(9): Use seqlock for atomic snapshots of timebase. 2023-07-17 21:51:20 +00:00
riastradh 4dc2a56d7d Revert "timecounter(9): Use an ipi barrier on time_second/uptime rollover."
Evidently rump doesn't have ipi, so this won't work unless we have an
alternate approach for rump.
2023-07-17 15:41:05 +00:00
riastradh f748b08cbe timecounter(9): No static; committed wrong version of patch. 2023-07-17 13:48:14 +00:00
riastradh a41dd9cf49 timecounter(9): Limit scope of time__second/uptime.
Relevant only if __HAVE_ATOMIC64_LOADSTORE -- not updated otherwise.
2023-07-17 13:44:24 +00:00
riastradh acb819b891 timecounter(9): Use an ipi barrier on time_second/uptime rollover.
This way we only need __insn_barrier, not membar_consumer, on the
read side.
2023-07-17 13:42:23 +00:00
riastradh 0050e3876e timecounter(9): Revert last -- timecounter_lock is already IPL_HIGH. 2023-07-17 13:42:02 +00:00
riastradh 3d1d26e54d timecounter(9): Ward off interrupts during time_second/uptime update.
Only relevant during 32-bit wraparound, so the potential performance
impact of using splhigh here is negligible; indeed, we would have to
go out of our way to exercise this in tests before it will ever
happen in the next century.
2023-07-17 13:35:07 +00:00
riastradh 32fba50890 timecounter(9): Fix thinko in previous.
Swapped the wrong variable in this mental macro expansion!
2023-07-17 13:29:12 +00:00
riastradh 5524172f85 kern: Make time_second and time_uptime macros that work atomically.
These use atomic load on platforms with atomic 64-bit load, and
seqlocks on platforms without.

This has the unfortunate side effect of slightly reducing the real
times available on 32-bit platforms, from ending some time in the
year 584942417218 AD, available on 64-bit platforms, to ending some
time in the year 584942417355 AD.  But during that slightly shorter
time, 32-bit platforms can avoid bugs arising from non-atomic access
to time_uptime and time_second.

Note: All platforms still have non-atomic access problems for
bintime, binuptime, nanotime, nanouptime, &c.  This can be addressed
by putting a seqlock around timebasebin and possibly some other
variable -- to be done in a later change.

XXX kernel ABI change -- deleting symbols
2023-07-17 12:55:20 +00:00
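The idea, sketched (not the literal header text; time_second_seqlocked() is a hypothetical name for the seqlock fallback):

    #ifdef __HAVE_ATOMIC64_LOADSTORE
    /* 64-bit loads are atomic here: read the variable directly. */
    #define time_second     atomic_load_relaxed(&time__second)
    #else
    /* Otherwise take a seqlock-protected snapshot of the 64-bit value. */
    #define time_second     time_second_seqlocked()
    #endif
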
riastradh f485358332 kern: New struct syncobj::sobj_name member for diagnostics.
XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
2023-07-17 12:54:29 +00:00
riastradh 614a646114 kthread(9): Fix nested kthread_join.
No reason for one kthread_join to interfere with another, or to cause
non-cyclic dependencies to get stuck.

Uses struct lwp::l_private for this purpose, which for user threads
stores the tls pointer.  I don't think anything in kthread(9) uses
l_private -- generally kernel threads will use lwp specificdata.  But
maybe this should use a new member instead, or a union member with an
existing pointer for the purpose.
2023-07-17 10:55:27 +00:00