Commit Graph

271847 Commits

riastradh
53ecfc3aad Restore xcall(9) fast path using atomic_load/store_*.
While here, fix a bug that was formerly in xcall(9): the xc_wait fast
path was missing an acquire operation, which is needed so that all
memory operations in the xcall on remote CPUs happen before any memory
operations on the issuing CPU after xc_wait returns.

All stores of xc->xc_donep are done with atomic_store_release so that
we can safely use atomic_load_acquire to read it outside the lock.
However, this fast path only works on platforms with cheap 64-bit
atomic load/store, so conditionalize it on __HAVE_ATOMIC64_LOADSTORE.
(Under the lock, no need for atomic loads since nobody else will be
issuing stores.)

For review, here's the relevant diff from the old version of the fast
path, from before it was removed and some other things changed in the
file:

diff --git a/sys/kern/subr_xcall.c b/sys/kern/subr_xcall.c
index 45a877aa90e0..b6bfb6455291 100644
--- a/sys/kern/subr_xcall.c
+++ b/sys/kern/subr_xcall.c
@@ -84,6 +84,7 @@ __KERNEL_RCSID(0, "$NetBSD: subr_xcall.c,v 1.27 2019/10/06 15:11:17 uwe Exp $");
 #include <sys/evcnt.h>
 #include <sys/kthread.h>
 #include <sys/cpu.h>
+#include <sys/atomic.h>

 #ifdef _RUMPKERNEL
 #include "rump_private.h"
@@ -334,10 +353,12 @@ xc_wait(uint64_t where)
 		xc = &xc_low_pri;
 	}

+#ifdef __HAVE_ATOMIC64_LOADSTORE
 	/* Fast path, if already done. */
-	if (xc->xc_donep >= where) {
+	if (atomic_load_acquire(&xc->xc_donep) >= where) {
 		return;
 	}
+#endif

 	/* Slow path: block until awoken. */
 	mutex_enter(&xc->xc_lock);
@@ -422,7 +443,11 @@ xc_thread(void *cookie)
 		(*func)(arg1, arg2);

 		mutex_enter(&xc->xc_lock);
+#ifdef __HAVE_ATOMIC64_LOADSTORE
+		atomic_store_release(&xc->xc_donep, xc->xc_donep + 1);
+#else
 		xc->xc_donep++;
+#endif
 	}
 	/* NOTREACHED */
 }
@@ -462,7 +487,6 @@ xc__highpri_intr(void *dummy)
 	 * Lock-less fetch of function and its arguments.
 	 * Safe since it cannot change at this point.
 	 */
-	KASSERT(xc->xc_donep < xc->xc_headp);
 	func = xc->xc_func;
 	arg1 = xc->xc_arg1;
 	arg2 = xc->xc_arg2;
@@ -475,7 +499,13 @@ xc__highpri_intr(void *dummy)
 	 * cross-call has been processed - notify waiters, if any.
 	 */
 	mutex_enter(&xc->xc_lock);
-	if (++xc->xc_donep == xc->xc_headp) {
+	KASSERT(xc->xc_donep < xc->xc_headp);
+#ifdef __HAVE_ATOMIC64_LOADSTORE
+	atomic_store_release(&xc->xc_donep, xc->xc_donep + 1);
+#else
+	xc->xc_donep++;
+#endif
+	if (xc->xc_donep == xc->xc_headp) {
 		cv_broadcast(&xc->xc_busy);
 	}
 	mutex_exit(&xc->xc_lock);
2019-12-01 20:56:39 +00:00
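
For illustration, the slow-path counterpart of the parenthetical above,
as a sketch (the real xc_wait differs in detail): once xc_lock is held,
plain reads of xc_donep suffice, because all stores to it are issued
under that same lock.

	/* Slow-path sketch: under xc_lock, no atomics needed. */
	mutex_enter(&xc->xc_lock);
	while (xc->xc_donep < where)
		cv_wait(&xc->xc_busy, &xc->xc_lock);
	mutex_exit(&xc->xc_lock);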
ad
f36df6629d Avoid calling pmap_page_protect() while under uvm_pageqlock. 2019-12-01 20:31:40 +00:00
jmcneill
84d4c1fb40 Enable ZFS support on aarch64 2019-12-01 20:28:25 +00:00
jmcneill
9f2ee97f0a Flush insn / data caches after loading modules 2019-12-01 20:27:26 +00:00
jmcneill
e578db34f0 Need sys/atomic.h on NetBSD 2019-12-01 20:26:31 +00:00
jmcneill
fa74c92e0a Provide a default ptob() implementation 2019-12-01 20:26:05 +00:00
jmcneill
87afc7bc0f Initialize b_dev before passing buf to d_minphys (ldminphys needs this) 2019-12-01 20:25:31 +00:00
jmcneill
2e3c4047ee Build aarch64 modules without fp or simd instructions. 2019-12-01 20:24:47 +00:00
ad
ea045f02e7 Another instance of cpu_onproc to replace. 2019-12-01 19:21:13 +00:00
ad
bcbc56a72a Regen. 2019-12-01 18:32:07 +00:00
ad
2b25ff6d06 Back out previous temporarily - seeing unusual lookup failures. Will
come back to it.
2019-12-01 18:31:19 +00:00
ad
f278a3b979 Add ci_onproc. 2019-12-01 18:29:26 +00:00
ad
64e45337af cpu_onproc -> ci_onproc 2019-12-01 18:12:51 +00:00
kamil
82c05df197 Switch in_interrupt() in KCOV to cpu_intr_p()
This makes KCOV more MI-friendly and removes the x86-specific in_interrupt()
implementation.
2019-12-01 17:41:11 +00:00
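
For illustration, a minimal sketch of the MI check this enables in the
KCOV trace hooks (surrounding logic assumed, not taken from the commit):

	/* Skip coverage tracing from interrupt context; cpu_intr_p()
	 * is MI, unlike the old x86-only in_interrupt(). */
	if (cpu_intr_p())
		return;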
kamil
018b416e9d Disable KCOV instrumentation in x86_machdep.c
This allows cpu_intr_p() to be used directly inside KCOV.
2019-12-01 17:25:47 +00:00
ad
10fb14e25f Init kern_runq and kern_synch before booting secondary CPUs. 2019-12-01 17:08:31 +00:00
ad
ca7481a7dd Back out the fastpath change in xc_wait(). It's going to be done differently. 2019-12-01 17:06:00 +00:00
ad
b33d8c3694 Free pages in batch instead of taking uvm_pageqlock for each one. 2019-12-01 17:02:50 +00:00
ad
5ce257a95c __cacheline_aligned on a lock. 2019-12-01 16:44:11 +00:00
ad
2ed8ce1089 NetBSD 9.99.19 - many kernel data structure changes 2019-12-01 16:36:25 +00:00
tkusumi
c6a1f11f46 dm: Fix race on pdev create
List lookup and insert need to be atomic.
ac816675c8

taken-from: DragonFlyBSD
2019-12-01 16:33:33 +00:00
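
For illustration, a hedged sketch of the shape of such a fix, doing the
lookup and the insert under one lock (dm_pdev_alloc() is a hypothetical
helper; list and field names follow the dm driver's conventions but are
not taken from the commit):

	static kmutex_t dm_pdev_mutex;	/* serializes lookup + insert */

	dm_pdev_t *
	dm_pdev_insert(const char *name)
	{
		dm_pdev_t *dmp;

		mutex_enter(&dm_pdev_mutex);
		dmp = dm_pdev_lookup_name(name);	/* check under the lock */
		if (dmp != NULL) {
			dmp->ref_cnt++;		/* existing pdev: take a ref */
			mutex_exit(&dm_pdev_mutex);
			return dmp;
		}
		dmp = dm_pdev_alloc(name);	/* hypothetical helper */
		SLIST_INSERT_HEAD(&dm_pdev_list, dmp, next_pdev);
		mutex_exit(&dm_pdev_mutex);
		return dmp;
	}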
ad
e398df6f78 Make the fast path in xc_wait() depend on _LP64 for now. Needs 64-bit
load/store.  To be revisited.
2019-12-01 16:32:01 +00:00
riastradh
6dde3af7e3 Mark unreachable branch with __unreachable() to fix i386/ALL build. 2019-12-01 16:22:10 +00:00
ad
57eb66c673 Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now named ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
  the IPI bitmask and ci_want_resched.
2019-12-01 15:34:44 +00:00
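
For illustration, a sketch of the cache-line isolation idea (the actual
layout in the commit differs; COHERENCY_UNIT is the MI cache-line size):

	struct cpu_info {
		struct lwp	*ci_curlwp;	/* hot: read by remote CPUs */
		struct lwp	*ci_onproc;	/* kept alongside ci_curlwp */
		/* ... mostly read-only fields ... */

		/* Frequently written fields get their own cache lines so
		 * remote readers of the fields above don't incur false
		 * sharing. */
		volatile u_int	ci_want_resched __aligned(COHERENCY_UNIT);
	};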
riastradh
d6cbc02da6 Adapt <sys/pslist.h> to use atomic_load/store_*.
Changes:

- membar_producer();
  *p = v;

    =>

  atomic_store_release(p, v);

  (Effectively like using membar_exit instead of membar_producer,
  which is what we should have been doing all along so that stores by
  the `reader' can't affect earlier loads by the writer, such as
  KASSERT(p->refcnt == 0) in the writer and atomic_inc(&p->refcnt) in
  the reader.)

- p = *pp;
  if (p != NULL) membar_datadep_consumer();

    =>

  p = atomic_load_consume(pp);

  (Only makes a difference on DEC Alpha.  As long as lists generally
  have at least one element, this is not likely to make a big
  difference, and keeps the code simpler and clearer.)

No other functional change intended.  While here, annotate each
synchronizing load and store with its counterpart in a comment.
2019-12-01 15:28:19 +00:00
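
For illustration, a minimal publisher/consumer pairing in the style the
commit describes (names invented for the example):

	struct node {
		int		datum;
		struct node	*next;
	};

	static struct node *volatile	head;

	void
	publish(struct node *n)
	{
		n->datum = 42;			/* initialize first... */
		n->next = head;
		atomic_store_release(&head, n);	/* ...then publish */
	}

	int
	peek(void)
	{
		struct node *n;

		/* Pairs with the release store above; the dependent load
		 * of n->datum is ordered even on DEC Alpha. */
		n = atomic_load_consume(&head);
		return n == NULL ? -1 : n->datum;
	}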
riastradh
3006828963 Rework modified atomic_load/store_* to work on const pointers. 2019-12-01 15:28:02 +00:00
ad
c242783135 Fix a longstanding problem with LWP limits. When changing the user's
LWP count, we must use the process credentials because that's what
the accounting entity is tied to.

Reported-by: syzbot+d193266676f635661c62@syzkaller.appspotmail.com
2019-12-01 15:27:58 +00:00
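
For illustration, a hedged sketch of the shape of the check (chglwpcnt()
and RLIMIT_NTHR are existing NetBSD interfaces; the exact call site in
the commit may differ):

	/* Charge the new LWP to the accounting entity: the uid of the
	 * process credentials, not whatever credentials the calling
	 * LWP happens to hold. */
	uid_t uid = kauth_cred_getuid(p->p_cred);

	if (__predict_false(chglwpcnt(uid, 1) >
	    p->p_rlimit[RLIMIT_NTHR].rlim_cur)) {
		(void)chglwpcnt(uid, -1);	/* over the limit: back out */
		return EAGAIN;
	}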
jmcneill
19487d1aef Remove the pretty much useless 128MB swap partition from the arm images. 2019-12-01 15:07:04 +00:00
ad
7ce773db14 Make cpu_intr_p() safe to use anywhere, i.e. outside assertions:
Don't call kpreempt_disable() / kpreempt_enable() to make sure we're not
preempted while using the value of curcpu().  Instead, observe the value of
l_ncsw before and after the check to see if we have been preempted.  If
we have been preempted, then we need to retry the read.
2019-12-01 14:52:13 +00:00
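
For illustration, an x86-flavoured sketch of the preemption-safe read
described above (types simplified; the committed code may differ in
detail):

	bool
	cpu_intr_p(void)
	{
		long ncsw;
		int idepth;
		lwp_t *l;

		l = curlwp;
		do {
			ncsw = l->l_ncsw;	/* context-switch counter */
			__insn_barrier();
			idepth = curcpu()->ci_idepth;
			__insn_barrier();
			/* If l_ncsw changed, we were preempted and may
			 * have read another CPU's ci_idepth: retry. */
		} while (__predict_false(ncsw != l->l_ncsw));

		return idepth >= 0;	/* x86: -1 when not in interrupt */
	}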
ad
2fa8dbd876 Minor correction to previous. 2019-12-01 14:43:26 +00:00
ad
221d5f982e - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
2019-12-01 14:40:31 +00:00
ad
0aaea1d84e Deactivate pages in batch instead of acquiring uvm_pageqlock repeatedly. 2019-12-01 14:30:01 +00:00
ad
6e176d2434 Give each of the page queue locks their own cache line. 2019-12-01 14:28:01 +00:00
ad
24e75c17af Activate pages in batch instead of acquiring uvm_pageqlock a zillion times. 2019-12-01 14:24:43 +00:00
ad
4bc8197e77 If the system is not up and running yet, just run the function locally. 2019-12-01 14:20:00 +00:00
ad
2ca6e3ffb4 Map the video RAM cacheable/prefetchable; it's very slow and this helps a bit. 2019-12-01 14:18:51 +00:00
ad
80e17de9fd Update to match change in layout of vnode LRU lists. 2019-12-01 14:04:52 +00:00
ad
94bb47e411 Regen for VOP_LOCK & LK_UPGRADE/LK_DOWNGRADE. 2019-12-01 13:58:52 +00:00
ad
fb0bbaf12a Minor vnode locking changes:
- Stop using atomics to manipulate v_usecount.  It was a mistake to begin
  with.  It doesn't work as intended unless the XLOCK bit is incorporated in
  v_usecount and we don't have that any more.  When I introduced this 10+
  years ago it was to reduce pressure on v_interlock but it doesn't do that,
  it just makes stuff disappear from lockstat output and introduces problems
  elsewhere.  We could do atomic usecounts on vnodes but there has to be a
  well thought out scheme.

- Resurrect LK_UPGRADE/LK_DOWNGRADE which will be needed to work effectively
  when there is increased use of shared locks on vnodes.

- Allocate the vnode lock using rw_obj_alloc() to reduce false sharing of
  struct vnode.

- Put all of the LRU lists into a single cache line, and do not requeue a
  vnode if it's already on the correct list and was requeued recently (less
  than a second ago).

Kernel build before and after:

119.63s real  1453.16s user  2742.57s system
115.29s real  1401.52s user  2690.94s system
2019-12-01 13:56:29 +00:00
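
For illustration, a sketch of the out-of-line lock allocation mentioned
in the list above (field name and call sites assumed, not taken from the
commit):

	/* Vnode creation: the krwlock_t lives in its own allocation, so
	 * struct vnode stays compact and the lock doesn't share a cache
	 * line with unrelated hot vnode fields. */
	vp->v_lock = rw_obj_alloc();

	/* Vnode destruction. */
	rw_obj_free(vp->v_lock);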
ad
ce41050c72 Regen. 2019-12-01 13:46:34 +00:00
ad
c03599c0cd Make nc_hittime volatile to defeat compiler cleverness. 2019-12-01 13:45:42 +00:00
ad
566436656a namecache changes:
- Delete the per-entry lock, and borrow the associated vnode's v_interlock
  instead.  We need to acquire it during lookup anyway.  We can revisit this
  in the future but for now it's a stepping stone, and works within the
  quite limited context of what we have (BSD namecache/lookup design).

- Implement an idea that Mateusz Guzik (mjg@FreeBSD.org) gave me.  In
  cache_reclaim(), we don't need to lock out all of the CPUs to garbage
  collect entries.  All we need to do is observe their locks unheld at least
  once: then we know they are not in the critical section, and no longer
  have visibility of the entries about to be garbage collected.

- The above makes it safe for sysctl to take only namecache_lock to get stats,
  and we can remove all the crap dealing with per-CPU locks.

- For lockstat, make namecache_lock static now that we have __cacheline_aligned.

- Avoid false sharing - don't write back to nc_hittime unless it has changed.
  Put a comment in place explaining this.  Pretty sure this was there in
  2008/2009 but someone removed it (understandably, the code looks weird).

- Use a mutex to protect the garbage collection queue instead of atomics, and
  adjust the low water mark up so that cache_reclaim() isn't doing so much
  work at once.
2019-12-01 13:39:53 +00:00
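
For illustration, a hedged sketch of the garbage-collection handshake
described in the second item above (cpu_nc_lock() is a hypothetical
accessor for a CPU's namecache lock):

	/*
	 * Observe every CPU's lock unheld at least once.  A CPU inside
	 * its critical section holds its lock, so once we have taken and
	 * released each lock, no CPU can still see the entries we are
	 * about to garbage collect.
	 */
	static void
	cache_gc_sync(void)
	{
		CPU_INFO_ITERATOR cii;
		struct cpu_info *ci;

		for (CPU_INFO_FOREACH(cii, ci)) {
			mutex_enter(cpu_nc_lock(ci));
			mutex_exit(cpu_nc_lock(ci));
		}
	}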
ad
036b61e0aa PR port-sparc/54718 (sparc install hangs since recent scheduler changes)
- sched_tick: cpu_need_resched is no longer the correct thing to do here.
  All we need to do is OR the request into the local ci_want_resched.

- sched_resched_cpu: we need to set RESCHED_UPREEMPT even on softint LWPs,
  especially in the !__HAVE_FAST_SOFTINTS case, because the LWP with the
  LP_INTR flag could be running via softint_overlay() - i.e. it has been
  temporarily borrowed from a user process, and it needs to notice the
  resched after it has stopped running softints.
2019-12-01 13:20:42 +00:00
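
For illustration, the sched_tick() half of the fix reduces to something
like the following sketch (the flag name is taken from the
sched_resched_cpu item; the committed code may differ):

	/* Time slice expired: post the request into the local CPU's
	 * ci_want_resched and let it be noticed on the way out of the
	 * interrupt; cpu_need_resched() does more than is wanted here. */
	ci->ci_want_resched |= RESCHED_UPREEMPT;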
maxv
43e684e6ce minor adjustments, to avoid warnings on debug builds 2019-12-01 12:47:10 +00:00
ad
106905eaf9 sh3: make ASTs work as expected, and fix a few things in the TLB refill path.
With help from uwe@ and martin@.
2019-12-01 12:19:28 +00:00
martin
ef330e1237 Add missing <sys/atomic.h> include 2019-12-01 10:19:59 +00:00
maxv
97b908dcc6 localify 2019-12-01 08:23:09 +00:00
maxv
f0ea087db1 Use atomic_{load,store}_relaxed() on global counters. 2019-12-01 08:19:09 +00:00
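
For illustration (a stats counter where an occasionally lost update is
tolerable; the relaxed operations prevent load/store tearing but impose
no memory ordering):

	static volatile u_long nevents;	/* illustrative counter */

	/* Not a read-modify-write atomic: racing updates may be lost,
	 * which is acceptable for approximate statistics. */
	atomic_store_relaxed(&nevents, atomic_load_relaxed(&nevents) + 1);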
msaitoh
76fbad7623 Use unsigned to avoid undefined behavior. Found by kUBSan. 2019-12-01 08:16:49 +00:00
maxv
890f284aec Add KCSAN instrumentation for atomic_{load,store}_*. 2019-12-01 08:15:58 +00:00