While here, fix a bug that was formerly in xcall(9): a missing
acquire operation in the xc_wait fast path so that all memory
operations in the xcall on remote CPUs will happen before any memory
operations on the issuing CPU after xc_wait returns.
All stores of xc->xc_donep are done with atomic_store_release so that
we can safely use atomic_load_acquire to read it outside the lock.
However, this fast path only works on platforms with cheap 64-bit
atomic load/store, so conditionalize it on __HAVE_ATOMIC64_LOADSTORE.
(Under the lock, no need for atomic loads since nobody else will be
issuing stores.)
For review, here's the relevant diff, taken against the old version of the
fast path from before it was removed and other parts of the file were
changed:
diff --git a/sys/kern/subr_xcall.c b/sys/kern/subr_xcall.c
index 45a877aa90e0..b6bfb6455291 100644
--- a/sys/kern/subr_xcall.c
+++ b/sys/kern/subr_xcall.c
@@ -84,6 +84,7 @@ __KERNEL_RCSID(0, "$NetBSD: subr_xcall.c,v 1.27 2019/10/06 15:11:17 uwe Exp $");
#include <sys/evcnt.h>
#include <sys/kthread.h>
#include <sys/cpu.h>
+#include <sys/atomic.h>
#ifdef _RUMPKERNEL
#include "rump_private.h"
@@ -334,10 +353,12 @@ xc_wait(uint64_t where)
xc = &xc_low_pri;
}
+#ifdef __HAVE_ATOMIC64_LOADSTORE
/* Fast path, if already done. */
- if (xc->xc_donep >= where) {
+ if (atomic_load_acquire(&xc->xc_donep) >= where) {
return;
}
+#endif
/* Slow path: block until awoken. */
mutex_enter(&xc->xc_lock);
@@ -422,7 +443,11 @@ xc_thread(void *cookie)
(*func)(arg1, arg2);
mutex_enter(&xc->xc_lock);
+#ifdef __HAVE_ATOMIC64_LOADSTORE
+ atomic_store_release(&xc->xc_donep, xc->xc_donep + 1);
+#else
xc->xc_donep++;
+#endif
}
/* NOTREACHED */
}
@@ -462,7 +487,6 @@ xc__highpri_intr(void *dummy)
* Lock-less fetch of function and its arguments.
* Safe since it cannot change at this point.
*/
- KASSERT(xc->xc_donep < xc->xc_headp);
func = xc->xc_func;
arg1 = xc->xc_arg1;
arg2 = xc->xc_arg2;
@@ -475,7 +499,13 @@ xc__highpri_intr(void *dummy)
* cross-call has been processed - notify waiters, if any.
*/
mutex_enter(&xc->xc_lock);
- if (++xc->xc_donep == xc->xc_headp) {
+ KASSERT(xc->xc_donep < xc->xc_headp);
+#ifdef __HAVE_ATOMIC64_LOADSTORE
+ atomic_store_release(&xc->xc_donep, xc->xc_donep + 1);
+#else
+ xc->xc_donep++;
+#endif
+ if (xc->xc_donep == xc->xc_headp) {
cv_broadcast(&xc->xc_busy);
}
mutex_exit(&xc->xc_lock);
This was a very nice win in my tests on a 48 CPU box.
- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
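To illustrate the cache line separation, here is a minimal sketch (the
layout and field names are hypothetical, not the committed struct cpu_info;
__aligned(COHERENCY_UNIT) is one way to express the separation):

/*
 * Sketch only: keep fields written by remote CPUs (the IPI bitmask,
 * ci_want_resched) in their own cache lines, away from fields the
 * local CPU reads on every context switch (ci_curlwp, ci_onproc), so
 * that remote writes don't invalidate the hot line.
 */
struct cpu_info_sketch {
        struct lwp      *ci_curlwp;     /* hot, read constantly */
        struct lwp      *ci_onproc;     /* formerly cpu_onproc */

        /* written from other CPUs: isolate on their own lines */
        volatile uint32_t ci_ipi_pending __aligned(COHERENCY_UNIT);
        volatile int      ci_want_resched __aligned(COHERENCY_UNIT);
};
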
Changes:
- membar_producer();
*p = v;
=>
atomic_store_release(p, v);
(Effectively like using membar_exit instead of membar_producer,
which is what we should have been doing all along so that stores by
the `reader' can't affect earlier loads by the writer, such as
KASSERT(p->refcnt == 0) in the writer and atomic_inc(&p->refcnt) in
the reader.)
- p = *pp;
if (p != NULL) membar_datadep_consumer();
=>
p = atomic_load_consume(pp);
(Only makes a difference on DEC Alpha, where the data-dependent barrier
is not free.  The old code skipped the barrier for a NULL pointer; as
long as lists generally have at least one element, issuing the consume
unconditionally is not likely to make a big difference, and it keeps
the code simpler and clearer.)
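To make both transformations concrete, here is a minimal publish/consume
sketch (struct frobber, frobber_ptr and the function names are
hypothetical; the atomic_* and KASSERT interfaces are the real ones):

/*
 * The writer's release store orders all of its earlier loads and
 * stores (including the KASSERT's load of f_refcnt) before the
 * publishing store; the reader's consume load orders its dependent
 * accesses after the pointer load (a barrier only on DEC Alpha).
 */
struct frobber {
        unsigned int    f_refcnt;
        int             f_data;
};

static struct frobber *frobber_ptr;

void
frobber_publish(struct frobber *f)
{

        KASSERT(f->f_refcnt == 0);
        f->f_data = 42;
        /* matches atomic_load_consume() in frobber_use() */
        atomic_store_release(&frobber_ptr, f);
}

void
frobber_use(void)
{
        struct frobber *f;

        /* matches atomic_store_release() in frobber_publish() */
        f = atomic_load_consume(&frobber_ptr);
        if (f != NULL)
                atomic_inc_uint(&f->f_refcnt);
}
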
No other functional change intended. While here, annotate each
synchronizing load and store with its counterpart in a comment.
Don't call kpreempt_disable() / kpreempt_enable() to make sure we're not
preempted while using the value of curcpu(). Instead, observe the value of
l_ncsw before and after the read to see whether we have been preempted in
between. If we have been preempted, the value of curcpu() may have changed,
so retry the read.
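A minimal sketch of the pattern (the function and the per-CPU counter read
inside the loop are hypothetical; l_ncsw, curlwp and curcpu() are the real
interfaces):

/*
 * Read a per-CPU value without kpreempt_disable(): snapshot the LWP's
 * context switch counter, do the read, and retry if the counter
 * changed, since curcpu() may have changed under us in that case.
 */
uint64_t
percpu_stat_read(void)
{
        struct lwp *l = curlwp;
        uint64_t ncsw, val;

        do {
                ncsw = l->l_ncsw;
                val = curcpu()->ci_data.cpu_stat;       /* hypothetical field */
        } while (__predict_false(l->l_ncsw != ncsw));

        return val;
}
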
- Stop using atomics to manipulate v_usecount. It was a mistake to begin
with. It doesn't work as intended unless the XLOCK bit is incorporated in
v_usecount and we don't have that any more. When I introduced this 10+
years ago it was to reduce pressure on v_interlock but it doesn't do that,
it just makes stuff disappear from lockstat output and introduces problems
elsewhere. We could do atomic usecounts on vnodes but there has to be a
well thought out scheme.
- Resurrect LK_UPGRADE/LK_DOWNGRADE, which will be needed for vnode locking
  to work effectively as the use of shared locks on vnodes increases.
- Allocate the vnode lock using rw_obj_alloc() to reduce false sharing of
struct vnode.
- Put all of the LRU lists into a single cache line, and do not requeue a
vnode if it's already on the correct list and was requeued recently (less
than a second ago).
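For the requeue throttle, a sketch of the check (vi_lrulisttm, recording
the tick of the last requeue, and the use of hardclock_ticks/hz are
assumptions here; vi_lrulisthd and VNODE_TO_VIMPL() are real):

/*
 * Skip the requeue when the vnode is already on the desired LRU list
 * and was moved there less than a second ago, so we don't keep
 * dirtying the shared cache line holding the list heads.
 */
void
lru_requeue_sketch(vnode_t *vp, vnodelst_t *listhd)
{
        vnode_impl_t *vip = VNODE_TO_VIMPL(vp);

        if (listhd == vip->vi_lrulisthd &&
            hardclock_ticks - vip->vi_lrulisttm < hz)
                return;

        /* ... otherwise take the LRU lock and move the vnode ... */
}
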
Kernel build before and after:
before: 119.63s real 1453.16s user 2742.57s system
after:  115.29s real 1401.52s user 2690.94s system
- Delete the per-entry lock, and borrow the associated vnode's v_interlock
instead. We need to acquire it during lookup anyway. We can revisit this
in the future but for now it's a stepping stone, and works within the
quite limited context of what we have (BSD namecache/lookup design).
- Implement an idea that Mateusz Guzik (mjg@FreeBSD.org) gave me. In
cache_reclaim(), we don't need to lock out all of the CPUs to garbage
collect entries. All we need to do is observe their locks unheld at least
once: then we know they are not in the critical section, and no longer
have visibility of the entries about to be garbage collected. (See the
sketch after this list.)
- The above makes it safe for sysctl to take only namecache_lock to get stats,
and we can remove all the crap dealing with per-CPU locks.
- For lockstat, make namecache_lock static now that we have __cacheline_aligned.
- Avoid false sharing - don't write back to nc_hittime unless it has changed.
Put a comment in place explaining this. Pretty sure this was there in
2008/2009 but someone removed it (understandably, the code looks weird).
- Use a mutex to protect the garbage collection queue instead of atomics, and
adjust the low water mark up so that cache_reclaim() isn't doing so much
work at once.
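A sketch of the quiescence idea from the cache_reclaim() item above
(cache_gc_quiesce() and nc_cpu_lock() are hypothetical names; the CPU
iteration and mutex calls are the real interfaces):

/*
 * Before freeing entries that have already been unlinked, observe
 * every CPU's lookup lock unheld at least once: acquiring and
 * releasing it proves that CPU has left any critical section in
 * which it could still see the doomed entries.
 */
static void
cache_gc_quiesce(void)
{
        CPU_INFO_ITERATOR cii;
        struct cpu_info *ci;
        kmutex_t *lock;

        for (CPU_INFO_FOREACH(cii, ci)) {
                lock = nc_cpu_lock(ci);         /* hypothetical accessor */
                mutex_enter(lock);
                mutex_exit(lock);
        }
}
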
- sched_tick: cpu_need_resched is no longer the correct thing to do here.
  All we need to do is OR the request into the local ci_want_resched (see
  the sketch after this list).
- sched_resched_cpu: we need to set RESCHED_UPREEMPT even on softint LWPs,
especially in the !__HAVE_FAST_SOFTINTS case, because the LWP with the
LP_INTR flag could be running via softint_overlay() - i.e. it has been
temporarily borrowed from a user process, and it needs to notice the
resched after it has stopped running softints.
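A sketch of the sched_tick() part referenced above (the exact flag and the
use of an atomic OR are assumptions here, and ci_want_resched is assumed to
be an unsigned int):

static void
sched_tick_sketch(struct cpu_info *ci)
{

        /*
         * OR the preemption request into the local CPU's
         * ci_want_resched; an atomic OR avoids losing a bit that
         * sched_resched_cpu() on another CPU may set concurrently.
         */
        atomic_or_uint(&ci->ci_want_resched, RESCHED_UPREEMPT);
}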