there are a few things here:
- there's a race between reading the limit register (which clears
the interrupt and the limit bit) and increasing the latest offset.
this can happen easily if an interrupt comes between the read and
the call to tickle_tc() that increases the offset (i obverved this
actually happening.)
- in early boot, sometimes the counter can cycle twice before the
tickle happens.
to handle these issues, add two workarounds:
- if the limit bit isn't set, but the counter value is less than
the previous value, and the offset hasn't changed, use the same
fixup as if the limit bit was set. this handles the first case
above.
- add a hard-workaround for never allowing returning a smaller
value (except during 32 bit overflow): if the result is less than
the last result, add fixups until it does (or until it would
overflow.)
the first workaround fixes general run-time issues, and the second
fixes issues only seen during boot.
also expand some comments in timer_sun4m.c and re-enable the sun4m
sub-microsecond tmr_ustolim4m() support (but it's always called with
at least 'tick' microseconds, so the end result is the same.)
space a VA from the kernel space.
Use it to replace the PDIR_SLOT_PCPU slot: at boot time each CPU creates
its own slot which maps only its own pcpu_entry plus the common area (IDT+
LDT).
This way, the pcpu areas of the remote CPUs are not mapped in userland.
SVS_ENTER_NOSTACK in the syscall entry point, and put it before the code
that touches curlwp. (curlwp is located in the direct map.)
Then, disable __HAVE_CPU_UAREA_ROUTINES (to be removed later). This moves
the kernel stack into pmap_kernel(), and not the direct map. That's a
change I've always wanted to make: because of the direct map we can't add
a redzone on the stack, and basically, a stack overflow can go very far
in memory without being detected (as far as erasing all of the system's
memory).
Finally, unmap the direct map from userland.
SDM points out, so make sure we have PGEX_P. This way NULL dereferences -
which are caused by an unmapped VA, and therefore are not protection
violations - don't take this branch, and don't display a misleading
"SMAP" in ddb.
Adding a PGEX_P check, or not, does not essentially change anything from
a security point of view, it's just a matter of what gets displayed when
a fatal fault comes in.
I didn't put PGEX_P until now, because initially when I wrote the SMAP
implementation Qemu did not always receive the fault if the PGEX_P check
was there, while a native i5 would. I'm unable to reproduce this issue
with a recent Qemu, so I assume I did something wrong when testing in the
first place.
We must not call callout_halt of nd6_dad_timer with holding nd6_dad_lock because
the lock is taken in nd6_dad_timer. Once softnet_lock goes away, we can pass the
lock to callout_halt, but for now we cannot.
On rump kernels, the callouts for domains, pffasttimo and pfslowtimo, started
before domains were attached. Normally the callouts were dispatched after
domain attaches (initializations) finished, however, under load the callouts
could be executed prior to the attaches, resulting in that the callouts accessed
unallocated or uninitialized resources.