syscall/trap entry, eliminating a test+branch on every syscall/trap.
This wasn't possible in the 3.99.x timeframe when l->l_cred came about
because there wasn't a reliable/timely way to force an ONPROC LWP running on
a remote CPU into the kernel (which is just about the only new thing in
this scheme).
the 0x80000001 CPUID result needs some parsing to match against
actual family/model/stepping values. 4-bit 'family' values of
15 or 6 change how to parse the 4-bit extended model and 8-bit
extended family value - for family 6 or 15, the extended model
bits (4) are concatenated with the base 4-bits to create an
8-bit value, and for family 15, the family value is addition
of the family value and the 8-bit extended-family value, giving
a range of 0 to 15 + 0xff aka 270.
use a CPUREV(family, model, stepping) macro that builds the
relevant bit-representation of a CPUID, making it far easier
to understand what each entry means, and to add new ones too.
i have confirmed that the emitted cpurevs[] array has the same
values before/after this change, ie, NFCI or observed.
this is based upon Taylor's original work. i just made the list
of CPUs to run on correct as i could determine. (also, add some
Zen3 and Zen4 cpuids not yet used by any errata.)
(might be nice to have a better way to expression revision ranges
rather than specific cpuid matches, eg, 0x30-0x4f models in a cpu
family, etc.)
tested on ryzen 3600, and a ported zenbleed PoC that no longer
shows any obtained text. (a similar module-version of it stopped
the PoC on a ryzen 3950x without having to reboot.)
https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.htmlhttps://lock.cmpxchg8b.com/zenbleed.html
sys/sched.h included sys/mutex.h
which includes sys/intr.h
which includes machine/intr.h
which on cats includes arm/footbridge/footbridge_intr.h
which includes arm/cpu.h
which includes sys/cpu_data.h
which includes sys/sched.h
But there was never any real need for sys/mutex.h in sys/sched.h,
because it only uses pointers to the opaque struct kmutex. Cycle
broken by using `struct kmutex *' instead of pulling in sys/mutex.h
for the definition of kmutex_t.
Side effect: This revealed that sys/cpu_data.h needed sys/intr.h
(which was pulled in accidentally by sys/mutex.h via sys/sched.h) for
SOFTINT_COUNT. Also revealed some other machine/cpu.h header files
were missing includes of sys/mutex.h for kmutex_t.
- Change the lower limit from 70 to 60. At least, some BIOSes can change
the value down to 62.
- Change the upper limit from 110 to 120. At least, some BIOSes can change
the value up to 115.
- Print error message when rdmsr(TEMPERATURE_TARGET) failed.
- When Tjmax exceeded the limit, print warning message and use the value
as it is.
Turns out machine/lock.h is not needed for __cpu_simple_lock_t, which
always comes from sys/types.h. And, really, sys/types.h (or at least
sys/stdint.h) is needed for uintN_t and uintptr_t.
Delete a lot of unnecessary code with broken error branches involving
config_detach which have probably seldom if ever been exercised.
No substantive functional change intended. Low risk because
ichlpcib(4) is not a removable device, so you have to go out of your
way to exercise detach.
TCO (`Total Cost of Ownership', Intel's bizarre name for a watchdog
timer) used to hang off the Intel I/O platform controller hub's (ICH)
low-pin-count interface bridge (LPC IB), or ichlpcib(4). On newer
devices, it hangs off the ICH SMBus instead.
Tested on INTEL 100SERIES_SMB (works) and INTEL 100SERIES_LP_SMB
(doesn't work, still not sure why).
XXX kernel revbump: This breaks the module ABI -- tco(4) modules
older than the change to make ta_has_rcba into ta_version will
incorrectly attach at buses they do not understand. (However, the
tco(4) driver is statically built into GENERIC, so maybe it's safe
for pullup since the module wouldn't have worked anyway.)
it handles the console and 0 otherwise (especially when console=tty0 or
console=pc is present on the command line).
In consinit() fallback to native console selection if xen_pvh_consinit()
returns 0.
16 bytes is not enough.
(Is this why it never worked on Xen some years back? Got lucky and
accidentally had 64-byte alignment on native x86, but not in the call
stack in Xen?)
XXX pullup-10
This time, make sure to restore the FPU state when switching to a
kthread in the middle of kthread_fpu_enter/exit.
This adds a single predicted-taken branch for the case of kthreads
that are not in kthread_fpu_enter/exit, so it incurs a penalty only
for threads that actually use it. Since it avoids FPU state
switching in kthreads that do use the FPU, namely cgd worker threads,
this should be a net performance win on systems using it and have
negligible impact otherwise.
XXX pullup-10
In fpu_kern_enter, make sure all the MXCSR exception status bits are
set when we start using the FPU, so that instructions which exhibit
MCDT are unaffected by it.
While here, zero all the other FPU registers in fpu_kern_enter.
In principle we could skip this step on future CPUs that fix the MCDT
bug, but there's probably not much benefit -- workloads that do a lot
of crypto in the kernel are probably better off using
kthread_fpu_enter or WQ_FPU to skip the fpu_kern_enter/leave cycles
in the first place.
For details, see:
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/mxcsr-configuration-dependent-timing.html