- use https where possible
- adapt to cdn/nycdn and our current auto-build conventions
- automate where possible
Many thanks to uwe for lots of *roff help.
Add support for earmv6hf binaries on COMPAT_NETBSD32 for aarch64:
- Emulate ARMv6 instructions with cache operations register (c7), that
are deprecated since ARMv7, and disabled on ARMv8 with LP64 kernel.
- ep_machine_arch (default: earmv7hf) is copied from executables, as we
do for mips64. "uname -p" reports earmv6hf if compiled for earmv6hf;
configure scripts etc can determine the appropriate architecture.
Many thanks to ryo@ for helping me to add support of Thumb-mode,
as well as providing exhaustive test cases:
https://github.com/ryo/mcr_test/
We've confirmed:
- Emulation works in Thumb-mode.
- T32 16-bit length illegal instruction results in SIGILL, even if
it is located nearby a boundary b/w mapped and unmapped pages.
- T32 32-bit instruction results in SIGSEGV if it is located across
a boundary b/w mapped and unmapped pages.
XXX
pullup to netbsd-9
- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
- Remove all code that should be MI, leaving the bare minimum under arch/.
- Make the required actions very explicit.
- Pass in LWP pointer for convenience.
- When a trap is required on another CPU, have the IPI set it locally.
- Expunge cpu_did_resched().
rw_enter(lock, RW_READ);
Having instrumented it, it turns out that >99.5% of the time the lock is
completely unknowned. Make this assumption in the assembly stub for
rw_enter(), and avoid the initial read of the lock word. Where there are
existing read holds, we'll do an additional CMPXCHG but should already have
the cache line in the EXCLUSIVE state.
- Disable zeroing in the idle loop. That needs a cache-friendly strategy.
Result: 3 to 4% reduction in kernel build time on my test system.
Inspired by a discussion with Mateusz Guzik and David Maxwell.