NetBSD

Commit Graph

Author	SHA1	Message	Date
skrll	16e0b464c4	Fix the copy&paste botch from previous. Spotted by Tom Lane.	2022-05-16 06:07:23 +00:00
skrll	bbf56d84ab	*** empty log message ***	2022-05-14 05:35:55 +00:00
riastradh	4a6459a8d1	mips/cavium: Take advantage of Octeon's guaranteed r/rw ordering.	2022-04-21 12:06:31 +00:00
riastradh	cfa39f97b0	libc/atomic: Fix membars in __atomic_load/store_* stubs. - membar_enter/exit ordering was backwards. - membar_enter doesn't make any sense for load anyway. - Switch to membar_release for store and membar_acquire for load. The only sensible orderings for a simple load or store are acquire or release, respectively, or sequential consistency. This never provided correct sequential consistency before -- we should really make it conditional on memmodel but I don't know offhand what the values of memmodel might be and this is at least better than before.	2022-04-09 23:38:57 +00:00
riastradh	4f8ce3b31d	Introduce membar_acquire/release. Deprecate membar_enter/exit. The names membar_enter/exit were unclear, and the documentation of membar_enter has disagreed with the implementations on sparc, powerpc, and even x86(!) for the entire time it has been in NetBSD. The terms `acquire' and `release' are ubiquitous in the literature today, and have been adopted in the C and C++ standards to mean load-before-load/store and load/store-before-store, respectively, which are exactly the orderings required by acquiring and releasing a mutex, as well as other useful applications like decrementing a reference count and then freeing the underlying object if it went to zero. Originally I proposed changing one word in the documentation for membar_enter to make it load-before-load/store instead of store-before-load/store, i.e., to make it an acquire barrier. I proposed this on the grounds that (a) all implementations guarantee load-before-load/store, (b) some implementations fail to guarantee store-before-load/store, and (c) all uses in-tree assume load-before-load/store. I verified parts (a) and (b) (except, for (a), powerpc didn't even guarantee load-before-load/store -- isync isn't necessarily enough; need lwsync in general -- but it _almost_ did, and it certainly didn't guarantee store-before-load/store). Part (c) might not be correct, however: under the mistaken assumption that atomic-r/m/w then membar-w/rw is equivalent to atomic-r/m/w then membar-r/rw, I only audited the cases of membar_enter that _aren't_ immediately after an atomic-r/m/w. All of those cases assume load-before-load/store. 
But my assumption was wrong -- there are cases of atomic-r/m/w then membar-w/rw that would be broken by changing to atomic-r/m/w then membar-r/rw: https://mail-index.netbsd.org/tech-kern/2022/03/29/msg028044.html Furthermore, the name membar_enter has been adopted in other places like OpenBSD where it actually does follow the documentation and guarantee store-before-load/store, even if that order is not useful. So the name membar_enter currently lives in a bad place where it means either of two things -- r/rw or w/rw. With this change, we deprecate membar_enter/exit, introduce membar_acquire/release as better names for the useful pair (r/rw and rw/w), and make sure the implementation of membar_enter guarantees both what was documented _and_ what was implemented, making it an alias for membar_sync. While here, rework all of the membar_* definitions and aliases. The new logic follows a rule to make it easier to audit: membar_X is defined as an alias for membar_Y iff membar_X is guaranteed by membar_Y. The `no stronger than' relation is (the transitive closure of): - membar_consumer (r/r) is guaranteed by membar_acquire (r/rw) - membar_producer (w/w) is guaranteed by membar_release (rw/w) - membar_acquire (r/rw) is guaranteed by membar_sync (rw/rw) - membar_release (rw/w) is guaranteed by membar_sync (rw/rw) And, for the deprecated membars: - membar_enter (whether r/rw, w/rw, or rw/rw) is guaranteed by membar_sync (rw/rw) - membar_exit (rw/w) is guaranteed by membar_release (rw/w) (membar_exit is identical to membar_release, but the name is deprecated.) Finally, while here, annotate some of the instructions with their semantics. For powerpc, leave an essay with citations on the unfortunate but -- as far as I can tell -- necessary decision to use lwsync, not isync, for membar_acquire and membar_consumer. Also add membar(3) and atomic(3) man page links.	2022-04-09 23:32:51 +00:00
riastradh	d808f015e1	riscv/membar_ops: Upgrade membar_enter from W/RW to RW/RW. This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).	2022-04-09 22:53:53 +00:00
riastradh	75d950a155	x86_64/membar_ops: Upgrade membar_enter from R/RW to RW/RW. This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).	2022-04-09 22:53:45 +00:00
riastradh	a1f4bcbfda	i386/membar_ops: Upgrade membar_enter from R/RW to RW/RW. This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).	2022-04-09 22:53:36 +00:00
riastradh	48b2cb5aa9	sparc64/membar_ops: Upgrade membar_enter from R/RW to RW/RW. This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).	2022-04-09 22:53:25 +00:00
riastradh	ca73d72920	sparc/membar_ops: Upgrade membar_enter from R/RW to RW/RW. This will be deprecated soon but let's avoid leaving rakes to trip on with it arising from disagreement over the documentation (W/RW) and implementation and usage (R/RW).	2022-04-09 22:53:17 +00:00
riastradh	a8d0eed140	aarch64/membar_ops: Fix wrong symbol end.	2022-04-09 12:07:37 +00:00
riastradh	d767c9730a	x86: Add a note on membar_sync and mfence.	2022-04-09 12:07:29 +00:00
riastradh	3066bbbbf8	x86: Omit needless store in membar_producer/exit. On x86, every store is a store-release, so there is no need for any barrier. But this wasn't a barrier anyway; it was just a store, which was redundant with the store of the return address to the stack implied by CALL even if issuing a store made a difference.	2022-04-09 12:07:17 +00:00
riastradh	e0c914a79b	x86: Every load is a load-acquire, so membar_consumer is a noop. lfence is only needed for MD logic, such as operations on I/O memory rather than normal cacheable memory, or special instructions like RDTSC -- never for MI synchronization between threads/CPUs. No need for hot-patching to do lfence here. (The x86_lfence function might reasonably be patched on i386 to do lfence for MD logic, but it isn't now and this doesn't change that.)	2022-04-09 12:07:00 +00:00
riastradh	ffe06880f0	sparc64: Fix membar_sync by issuing membar #StoreLoad. In TSO this is the only memory barrier ever needed, and somehow we got this wrong and instead issued an unnecessary membar #LoadLoad -- not needed even in PSO let alone in TSO. XXX Apparently we may run userland programs with PSO or RMO, in which case all of these membars need fixing (each entry reads name: PSO / RMO): membar_consumer: nop / membar #LoadLoad; membar_producer: membar #StoreStore / membar #StoreStore; membar_enter: nop / membar #LoadLoad|LoadStore; membar_exit: membar #StoreStore / membar #LoadStore|StoreStore; membar_sync: membar #StoreLoad|StoreStore / membar #...everything... But at least this fixes the TSO case in which we run the kernel. Also I'm not sure there's any non-TSO hardware out there in practice.	2022-04-09 12:06:47 +00:00
riastradh	da06f841fd	sparc: Fix membar_sync with LDSTUB. membar_sync is required to be a full sequential consistency barrier, equivalent to MEMBAR #StoreStore|LoadStore|StoreLoad|LoadLoad on sparcv9. LDSTUB and SWAP are the only pre-v9 instructions that do this and SWAP doesn't exist on all v7 hardware, so use LDSTUB. Note: I'm having a hard time nailing down a reference for the ordering implied by LDSTUB and SWAP. I'm _pretty sure_ SWAP has to imply store-load ordering since the SPARCv8 manual recommends it for Dekker's algorithm (which notoriously requires store-load ordering), and the formal memory model treats LDSTUB and SWAP the same for ordering. But the v8 and v9 manuals aren't clear. GCC issues STBAR and LDSTUB, but (a) I don't see why STBAR is necessary here, (b) STBAR doesn't exist on v7 so it'd be a pain to use, and (c) from what I've heard (although again it's hard to nail down authoritative references here) all actual SPARC hardware is TSO or SC anyway so STBAR is a noop in all the silicon anyway. Either way, certainly this is better than what we had before, which was nothing implying ordering at all, just a store!	2022-04-09 12:06:39 +00:00
riastradh	09ff5f3b48	Nix trailing whitespace in files of membars, atomics, and lock stubs. Will be touching many of these files soon for functional changes. No functional change intended.	2022-04-06 22:47:55 +00:00
wiz	0362f707fc	zlib: Fix a bug that can crash deflate on some input when using Z_FIXED. `5c44459c3b` This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13 years before being found! The bug was introduced in zlib 1.2.2.2, with the addition of the Z_FIXED option. That option forces the use of fixed Huffman codes. For rare inputs with a large number of distant matches, the pending buffer into which the compressed data is written can overwrite the distance symbol table which it overlays. That results in corrupted output due to invalid distances, and can result in out-of-bound accesses, crashing the application. The fix here combines the distance buffer and literal/length buffers into a single symbol buffer. Now three bytes of pending buffer space are opened up for each literal or length/distance pair consumed, instead of the previous two bytes. This assures that the pending buffer cannot overwrite the symbol table, since the maximum fixed code compressed length/distance is 31 bits, and since there are four bytes of pending space for every three bytes of symbol space.	2022-03-24 10:13:01 +00:00
riastradh	05a5e24cff	mips: Membar audit. This change should be safe because it doesn't remove or weaken any memory barriers, but does add, clarify, or strengthen barriers. Goals: - Make sure mutex_enter/exit and mutex_spin_enter/exit have acquire/release semantics. - New macros make maintenance easier and purpose clearer: . SYNC_ACQ is for load-before-load/store barrier, and BDSYNC_ACQ for a branch delay slot -- currently defined as plain sync for MP and nothing, or nop, for UP; thus it is no weaker than SYNC and BDSYNC as currently defined, which is syncw on Octeon, plain sync on non-Octeon MP, and nothing/nop on UP. It is not clear to me whether load-then-syncw or ll/sc-then-syncw or even bare load provides load-acquire semantics on Octeon -- if no, this will fix bugs; if yes (like it is on SPARC PSO), we can relax SYNC_ACQ to be syncw or nothing later. . SYNC_REL is for load/store-before-store barrier -- currently defined as plain sync for MP and nothing for UP. It is not clear to me whether syncw-then-store is enough for store-release on Octeon -- if no, we can leave this as is; if yes, we can relax SYNC_REL to be syncw on Octeon. . SYNC_PLUNGER is there to flush clogged Cavium store buffers, and BDSYNC_PLUNGER for a branch delay slot -- syncw on Octeon, nothing or nop on non-Octeon. => This is not necessary (or, as far as I'm aware, sufficient) for acquire semantics -- it serves only to flush store buffers where stores might otherwise linger for hundreds of thousands of cycles, which would, e.g., cause spin locks to be held for unreasonably long durations. Newerish revisions of the MIPS ISA also have finer-grained sync variants that could be plopped in here. Mechanism: Insert these barriers in the right places, replacing only those where the definition is currently equivalent, so this change is safe. - Replace #ifdef _MIPS_ARCH_OCTEONP / syncw / #endif at the end of atomic_cas_* by SYNC_PLUNGER, which is `sync 4' (a.k.a. 
syncw) if __OCTEON__ and empty otherwise. => From what I can tell, __OCTEON__ is defined in at least as many contexts as _MIPS_ARCH_OCTEONP -- i.e., there are some Octeons with no _MIPS_ARCH_OCTEONP, but I don't know if any of them are relevant to us or ever saw the light of day outside Cavium; we seem to build with `-march=octeonp' so this is unlikely to make a difference. If it turns out that we do care, well, now there's a central place to make the distinction for sync instructions. - Replace post-ll/sc SYNC by SYNC_ACQ in _atomic_cas_, which are internal kernel versions used in sys/arch/mips/include/lock.h where it assumes they have load-acquire semantics. Should move this to lock.h later, since we _don't_ define __HAVE_ATOMIC_AS_MEMBAR on MIPS and so the extra barrier might be costly. - Insert SYNC_REL before ll/sc, and replace post-ll/sc SYNC by SYNC_ACQ, in _ucas_, which is used without any barriers in futex code and doesn't mention barriers in the man page so I have to assume it is required to be a release/acquire barrier. - Change BDSYNC to BDSYNC_ACQ in mutex_enter and mutex_spin_enter. This is necessary to provide load-acquire semantics -- unclear if it was provided already by syncw on Octeon, but it seems more likely that either (a) no sync or syncw is needed at all, or (b) syncw is not enough and sync is needed, since syncw is only a store-before-store ordering barrier. - Insert SYNC_REL before ll/sc in mutex_exit and mutex_spin_exit. This is currently redundant with the SYNC already there, but SYNC_REL more clearly identifies the necessary semantics in case we want to define it differently on different systems, and having a sync in the middle of an ll/sc is a bit weird and possibly not a good idea, so I intend to (carefully) remove the redundant SYNC in a later change. - Change BDSYNC to BDSYNC_PLUNGER at the end of mutex_exit. 
This has no semantic change right now -- it's syncw on Octeon, sync on non-Octeon MP, nop on UP -- but we can relax it later to nop on non-Cavium MP. - Leave LLSCSYNC in for now -- it is apparently there for a Cavium erratum, but I'm not sure what the erratum is, exactly, and I have no reference for it. I suspect these can be safely removed, but we might have to double up some other syncw instructions -- Linux uses it only in store-release sequences, not at the head of every ll/sc.	2022-02-27 19:21:53 +00:00
riastradh	e35e7b15e2	mips: Brush up __cpu_simple_lock. - Eradicate last vestiges of mb_* barriers. - In __cpu_simple_lock_init, omit needless barrier. It is the caller's responsibility to ensure __cpu_simple_lock_init happens before other operations on it anyway, so there was never any need for a barrier here. - In __cpu_simple_lock_try, leave comments about memory ordering guarantees of the kernel's _atomic_cas_uint, which are inexplicably different from the non-underscored atomic_cas_uint. - In __cpu_simple_unlock, use membar_exit instead of mb_memory, and do it unconditionally. This ensures that in __cpu_simple_lock/.../__cpu_simple_unlock, all memory operations in the ellipsis happen before the store that releases the lock. - On Octeon, the barrier was omitted altogether, which is a bug -- it needs to be there or else there is no happens-before relation and whoever takes the lock next might see stale values stored or even stomp over the unlocking CPU's delayed loads. - On non-Octeon, the mb_memory was sync. Using membar_exit preserves this. XXX On Octeon, membar_exit only issues syncw -- this seems wrong, only store-before-store and not load/store-before-store, unless the CNMIPS architecture guarantees it is sufficient here like SPARCv8/v9 PSO (`Partial Store Order'). - Leave an essay with citations about why we have an apparently pointless syncw _after_ releasing a lock, to work around a design bug^W^Wquirk in cnmips which sometimes buffers stores for hundreds of thousands of cycles for fun unless you issue syncw.	2022-02-12 17:10:02 +00:00
andvar	5ceb9d96fa	fix typos in comments.	2022-01-15 10:38:56 +00:00
andvar	1cb7819f04	fix various typos in comments.	2021-12-12 22:20:52 +00:00
andvar	42412bc75c	s/efficent/efficient/ in comments.	2021-12-08 20:11:54 +00:00
msaitoh	7a2933d5cb	s/asychronous/asynchronous/ in comment.	2021-12-05 04:24:08 +00:00
msaitoh	7c496db356	s/absense/absence/ in comment.	2021-12-05 03:24:19 +00:00
msaitoh	344f0d1e04	s/exisit/exist/ in comment.	2021-12-05 02:52:17 +00:00
andvar	a27a533e2d	fix few typos in comments and log message.	2021-11-14 20:51:57 +00:00
christos	00f17ebc18	Use defined constant instead of direct value (Etienne Brateau)	2021-10-28 15:09:08 +00:00
christos	b0d97acfad	Fix build with -Werror=array-parameter (Etienne Brateau)	2021-10-28 15:08:05 +00:00
andvar	50d9072672	remove duplicate the article in comments.	2021-10-04 21:02:39 +00:00
andvar	a136e22ab6	fix various typos in comments, messages and documentation.	2021-09-19 10:34:06 +00:00
andvar	72e44f84cb	fix typos in word "successfully", mainly s/succesfully/successfully/.	2021-09-16 21:29:41 +00:00
andvar	4ddb87935b	s/aquire/acquire/ in comments, also one typo fix acqure->acquire.	2021-09-07 13:24:45 +00:00
christos	8f97cb72d8	remove lint exclusion	2021-08-30 12:52:32 +00:00
ryo	567a3a02e7	Improved the performance of kernel profiling on MULTIPROCESSOR, and possible to get profiling data for each CPU. In the current implementation, locks are acquired at the entrance of the mcount internal function, so the higher the number of cores, the more lock conflict occurs, making profiling performance in a MULTIPROCESSOR environment unusable and slow. Profiling buffers has been changed to be reserved for each CPU, improving profiling performance in MP by several to several dozen times. - Eliminated cpu_simple_lock in mcount internal function, using per-CPU buffers. - Add ci_gmon member to struct cpu_info of each MP arch. - Add kern.profiling.percpu node in sysctl tree. - Add new -c <cpuid> option to kgmon(8) to specify the cpuid, like openbsd. For compatibility, if the -c option is not specified, the entire system can be operated as before, and the -p option will get the total profiling data for all CPUs.	2021-08-14 17:51:18 +00:00
ryo	1979ff4ae2	don't include "opt_multiprocessor.h" inside an ifdef to work "make depend" properly.	2021-08-14 17:38:44 +00:00
andvar	ebbc7028d3	fix typos in words "pointer" and s/fram /frame/	2021-08-13 20:47:54 +00:00
skrll	1306a159ff	Whitespace	2021-08-08 07:17:18 +00:00
andvar	077d1c0f36	fix various typos in comments and log messages.	2021-08-02 12:56:22 +00:00
andvar	5298fab779	s/overwriten/overwritten/ in comments.	2021-08-01 21:58:56 +00:00
andvar	31f72197e0	fix more typos in style found one in file - check/fix them all.	2021-07-31 14:36:33 +00:00
skrll	65d55bcee1	As we're providing the legacy gcc __sync built-in functions for atomic memory access we might as well get the memory barriers right... From the gcc documentation: In most cases, these built-in functions are considered a full barrier. That is, no memory operand is moved across the operation, either forward or backward. Further, instructions are issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation. type __sync_lock_test_and_set (type *ptr, type value, ...) This built-in function is not a full barrier, but rather an acquire barrier. This means that references after the operation cannot move to (or be speculated to) before the operation, but previous memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied. void __sync_lock_release (type *ptr, ...) This built-in function is not a full barrier, but rather a release barrier. This means that all previous memory stores are globally visible, and all previous memory loads have been satisfied, but following memory reads are not prevented from being speculated to before the barrier.	2021-07-29 10:29:05 +00:00
simonb	3fc2996b41	#define<tab> consistency.	2021-07-28 08:01:10 +00:00
skrll	8e8c0784cf	Remove memory barriers from the atomic_ops(3) atomic operations. They're not needed for correctness. Add the correct memory barriers to the gcc legacy __sync built-in functions for atomic memory access. From the gcc documentation: In most cases, these built-in functions are considered a full barrier. That is, no memory operand is moved across the operation, either forward or backward. Further, instructions are issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation. type __sync_lock_test_and_set (type *ptr, type value, ...) This built-in function is not a full barrier, but rather an acquire barrier. This means that references after the operation cannot move to (or be speculated to) before the operation, but previous memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied. void __sync_lock_release (type *ptr, ...) This built-in function is not a full barrier, but rather a release barrier. This means that all previous memory stores are globally visible, and all previous memory loads have been satisfied, but following memory reads are not prevented from being speculated to before the barrier.	2021-07-28 07:32:20 +00:00
andvar	7991f5a7b8	Fix all remaining typos, mainly in comments but also in few definitions and log messages, reported by me in PR kern/54889. Also fixed some additional typos in comments, found on review of same files or typos.	2021-07-24 21:31:31 +00:00
skrll	6a2d1b5533	#include <sys/param.h>	2021-07-22 13:54:38 +00:00
skrll	5e911a385d	s/ifdef _ARM_ARCH_6/if defined(_ARM_ARCH_6)/ for consistency. NFCI.	2021-07-10 06:53:40 +00:00
skrll	52728926ba	One more s/pte/ptr/	2021-07-06 08:31:41 +00:00
skrll	6788795c38	typo in comment s/pte/ptr/	2021-07-05 08:50:31 +00:00
skrll	68a49f39f0	Fix the logic operation for atomic_nand_{8,16,32,64} From the gcc docs the operations are as follows { tmp = *ptr; *ptr = ~(tmp & value); return tmp; } // nand { tmp = ~(*ptr & value); *ptr = tmp; return *ptr; } // nand yes, this is really rather strange.	2021-07-04 06:55:47 +00:00
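
The two nand forms quoted in commit 68a49f39f0 can be sketched in plain C as below. This is only a non-atomic model of the arithmetic, not the NetBSD implementation and not the GCC builtins themselves. The function names `fetch_and_nand_model`, `nand_and_fetch_model`, and `check_nand_models` are hypothetical, and the 32-bit operand width is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Non-atomic model of the first quoted form:
 *   { tmp = *ptr; *ptr = ~(tmp & value); return tmp; }
 * Returns the OLD value. Hypothetical name, for illustration only.
 */
uint32_t
fetch_and_nand_model(uint32_t *ptr, uint32_t value)
{
	uint32_t tmp = *ptr;

	*ptr = ~(tmp & value);
	return tmp;
}

/*
 * Non-atomic model of the second quoted form:
 *   { tmp = ~(*ptr & value); *ptr = tmp; return *ptr; }
 * Returns the NEW value. Hypothetical name, for illustration only.
 */
uint32_t
nand_and_fetch_model(uint32_t *ptr, uint32_t value)
{
	uint32_t tmp = ~(*ptr & value);

	*ptr = tmp;
	return *ptr;
}

/* Hypothetical self-check: both forms store ~(old & value). */
void
check_nand_models(void)
{
	uint32_t a = 0xF0F0F0F0u, b = 0xF0F0F0F0u;

	/* 0xF0F0F0F0 & 0xFF00FF00 == 0xF000F000, so the result is 0x0FFF0FFF. */
	assert(fetch_and_nand_model(&a, 0xFF00FF00u) == 0xF0F0F0F0u);
	assert(a == 0x0FFF0FFFu);
	assert(nand_and_fetch_model(&b, 0xFF00FF00u) == 0x0FFF0FFFu);
	assert(b == 0x0FFF0FFFu);
}
```

Both models store the complement of the AND, `~(old & value)`, rather than `~old & value`. They differ only in whether the caller gets the old or the new value back.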


1136 Commits