- Desc 0x55 and 0xb1 are Instruction TLB but not fixed to 4K.
- Desc 0x5a and 0xc0 are Data TLB but not fixed to 4K.
- Desc 0x57 and 0x59 are 4K fixed DTLB.
- Fix the description string of desc 0xc2; it is not fixed to 4K.
- Desc 0xca is 4K fixed L2 shared TLB.
- Add desc 0xa0.
BUG: A lot of CPUs have multiple CAI_DTLB and/or CAI_DTLB2 entries. Currently
TLB info is indexed in ci_cinfo[CAI_COUNT], so some info is overwritten.
Modern CPUs have very complex TLBs, which are hard to manage with a CAI_* index.
We should consider separating the TLB info structure from ci_cinfo[CAI_COUNT]
in struct cpu_info.
- Add 4G freelist to i386 -- there may be higher addresses if PAE.
- Add 64G and 1T freelists to amd64.
- Simplify freelist setup code and condense it into a table.
- Add x86_select_freelist to get a freelist guaranteed to yield
addresses no greater than a prescribed maximum address.
x86_select_freelist takes a uint64_t, not a paddr_t or bus_addr_t, so
that you can pass in, e.g., a 36-bit maximum address without needing
to write conditionals for i386/PAE.
No objections on port-x86:
https://mail-index.netbsd.org/port-i386/2014/05/21/msg003277.html
https://mail-index.netbsd.org/port-amd64/2014/05/21/msg002062.html
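The table-driven lookup described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: the freelist ids, the table contents (including the 16M entry), the name select_freelist and the -1 failure return are all assumptions made for the example. The argument is a uint64_t, so a 36-bit limit can be passed on any port without conditionals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical freelist ids, for illustration only. */
enum { FL_DEFAULT = 0, FL_16M = 1, FL_4G = 2, FL_64G = 3, FL_1T = 4 };

struct freelist_limit {
	int      freelist;	/* freelist id */
	uint64_t limit;		/* every address in this list is below limit */
};

/* Ordered from the highest limit to the lowest (assumed table contents). */
static const struct freelist_limit freelist_table[] = {
	{ FL_1T,  UINT64_C(1) << 40 },
	{ FL_64G, UINT64_C(1) << 36 },
	{ FL_4G,  UINT64_C(1) << 32 },
	{ FL_16M, UINT64_C(1) << 24 },
};

/*
 * Return the first (largest) freelist whose highest possible address,
 * limit - 1, does not exceed maxaddr, or -1 if no list qualifies.
 */
static int
select_freelist(uint64_t maxaddr)
{
	size_t i;
	const size_t n = sizeof(freelist_table) / sizeof(freelist_table[0]);

	for (i = 0; i < n; i++) {
		if (freelist_table[i].limit - 1 <= maxaddr)
			return freelist_table[i].freelist;
	}
	return -1;
}
```

With this table, a device limited to 36-bit addresses would call select_freelist((UINT64_C(1) << 36) - 1) and get the 64G list, while a 32-bit-only device would get the 4G list.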
Add support for the forthcoming AVX-512 registers.
Code compiled with -mavx seems to work, but I've not tested context
switches with live ymm registers.
There is a small cost on fork/exec (a larger area is copied/zeroed),
but I don't think the ymm registers are read/written unless they
have been used.
The code uses XSAVE on all cpus, I'm not brave enough to enable XSAVEOPT.
x87 control word.
This means that nothing outside fpu.c cares about the internals of the
fpu save area.
New kernel modules won't load with the old kernel - but that won't matter.
The result is only written to sysctl nodes at the moment.
I see:
machdep.fpu_save = 3 (implies xsaveopt)
machdep.xsave_size = 832
machdep.xsave_features = 7
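As a rough cross-check of the values above: in the standard (non-compacted) XSAVE layout, feature mask 7 is x87 | SSE | AVX, and 832 is the 512-byte legacy area plus the 64-byte header plus 256 bytes of AVX state. The helper name xsave_area_size below is hypothetical and the sketch only covers features up to AVX.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural XSAVE feature bits (same bit positions as XCR0). */
#define XSAVE_X87	(UINT64_C(1) << 0)
#define XSAVE_SSE	(UINT64_C(1) << 1)
#define XSAVE_AVX	(UINT64_C(1) << 2)

#define XSAVE_LEGACY_SIZE	512u	/* FXSAVE-format region: x87 + SSE */
#define XSAVE_HEADER_SIZE	64u	/* XSAVE header */
#define XSAVE_AVX_SIZE		256u	/* upper halves of ymm0-ymm15 */

/*
 * Hypothetical helper: size of a standard-format XSAVE area for a
 * feature mask that contains nothing beyond x87, SSE and AVX.
 */
static uint32_t
xsave_area_size(uint64_t features)
{
	uint32_t size = XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE;

	/* This sketch does not know the layout of later components. */
	assert((features & ~(XSAVE_X87 | XSAVE_SSE | XSAVE_AVX)) == 0);

	if (features & XSAVE_AVX)
		size += XSAVE_AVX_SIZE;
	return size;
}
```

On real hardware the size should be taken from cpuid leaf 0xd rather than computed by hand; the arithmetic here only explains where 832 comes from.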
Completely common up the i386 and amd64 machdep sysctl creation.
for the normal and extended leafs.
(The 'normal' one might be lurking in the global cpulevel.)
Read the 'extended feature' from cpuid.80000001.%ecx/edx into
ci_feat_val[3/2] just after saving cpuid.1.%ecx/edx in ci_feat_val[1/0]
instead of doing it separately for amd k678 and via c3 processors
in their probe functions and repeating it for all cpus a few instructions
later when x86_cpu_topology() is called.
x86_cpu_topology() is only called from cpu_probe() and really doesn't
deserve its own source file. Chasing the setup code is bad enough anyway.
Add the file back so that the firefox source doesn't have to depend
on the version of NetBSD it is being compiled for.
(The i386 version doesn't play the same games in its SIGFPE handler.)
helper functions in arch/x86/x86/fpu.c
They (hopefully) ensure that we write to the entire buffer and don't load
values that might cause faults in kernel.
Also zero out the 'pad' field of the i386 mcontext fp area that I think
once contained the registers of any Weitek fpu.
Dunno why it wasn't part of the union.
Some of these copies could be removed if the code directly copied the save
area to/from userspace addresses.
The cost of zeroing the save area on exec is minimal.
This stops the FP registers of a random process being used the first
time an lwp uses the fpu.
sendsig_siginfo() and get_mcontext() now unconditionally copy the FP
registers.
I'll remove the double-copy for signal handlers soon.
get_mcontext() might have been leaking kernel memory to userspace - and
may still do so if i386_use_fxsave is false (short copies).
This changes the trap10 and trap13 code to call directly into fpu.c,
removing all the code for T_ARITHTRAP, T_XMM and T_FPUNDA from i386/trap.c
Not all of the code that appeared to handle fpu traps was ever called!
Most of the changes just replace the include of machine/npx.h with x86/fpu.h
(or remove it entirely).
inside the ucontext structure passed to signal handlers to modify the
xmm registers.
This should make the code compile - I'm not at all sure it works as expected,
the interactions between FP and signal handlers aren't at all clear.
AFAICT the FP state is saved on the user stack when the handler is called,
however the FP trap code can already have done odd things to the FPU....
definitions match those of i386.
Mostly just structure and field renames, in addition:
1) process_xmm_to_s87() and process_s87_to_xmm() moved into
x86/convert_xmm_s87.c so they can be used by amd64's netbsd32 code.
2) The linux signal code simplified to use a structure copy for the fxsave
data - it matches the hardware definition and won't change.
Rename the associated ci_fpsaving field to 'unused'.
I'm not sure they could ever happen, you could get unwanted calls into
the fpu trap code while saving state when using INT13 - but these are
different.
The return value from the i386 fpudna() was always 1 - possibly a historic
relic of the kernel fp emulation. Remove and don't check in trap.S.
The amd64 and i386 fpudna() code is now almost identical.
compatible method of handling floating point exceptions.
Make kernel support for the fpu non-optional (486SX should still work).
Only 386 cpus support an external fpu, and i386 support was removed years ago.
This means that the npx code no longer uses port 0xf0 or interrupt 13.
All the "npx at isa" lines are removed from the configs; arch/i386/isa/npx.c
is now mandatory for all i386 kernels.
I've renamed npxinit() to fpuinit() and npxinit_cpu() to fpuinit_cpu()
to match the very similar amd64 functions.
The fpu of the boot cpu is now initialised by a direct call from
cpu_configure(), this enables FP emulation for a 486SX.
(for amd64 the cr0 values are set in locore.S and similar).
This fixes a long-standing bug in linux_setregs() - which did not
save the fpu registers if they were active.
I've test booted a single cpu i386 kernel (using anita).
amd64 builds - none of the changes should affect it.
The i386 XEN kernels build, but I'm not sure where they set cr0, and
it might have got lost!
the document (AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and
System Instructions. Document revision 3.20)
- "s/MXX/MMXX/" because this bit is "MMX eXtension".
to reduce code duplication and to avoid bugs.
CPUID_TO_STEPPING(cpuid) (not changed)
CPUID_TO_FAMILY(cpuid) (new)
CPUID_TO_MODEL(cpuid) (new)
Return the display family and the display model.
The macro names are the same as FreeBSD's.
CPUID_TO_BASEFAMILY(cpuid) (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid) (The old name was CPUID2MODEL)
Only for the base field.
CPUID_TO_EXTFAMILY(cpuid) (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid) (The old name was CPUID2EXTMODEL)
Only for the extended field.
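The split between base, extended and display values can be sketched as below. This is a minimal illustration of the usual decoding of the cpuid.1.%eax signature, not the committed macro definitions: the function names are hypothetical, and the rule of applying the extended model for base family 6 or 0xf follows the Intel convention (AMD documents it for base family 0xf only).

```c
#include <stdint.h>

/* Raw fields of the cpuid.1.%eax signature. */
static inline uint32_t sig_stepping(uint32_t sig)   { return sig & 0xf; }
static inline uint32_t sig_basemodel(uint32_t sig)  { return (sig >> 4) & 0xf; }
static inline uint32_t sig_basefamily(uint32_t sig) { return (sig >> 8) & 0xf; }
static inline uint32_t sig_extmodel(uint32_t sig)   { return (sig >> 16) & 0xf; }
static inline uint32_t sig_extfamily(uint32_t sig)  { return (sig >> 20) & 0xff; }

/* Display family: the extended family is added only when base family is 0xf. */
static uint32_t
sig_display_family(uint32_t sig)
{
	uint32_t family = sig_basefamily(sig);

	if (family == 0xf)
		family += sig_extfamily(sig);
	return family;
}

/* Display model: the extended model supplies the high nibble for family 6 or 0xf. */
static uint32_t
sig_display_model(uint32_t sig)
{
	uint32_t family = sig_basefamily(sig);
	uint32_t model = sig_basemodel(sig);

	if (family == 0x6 || family == 0xf)
		model |= sig_extmodel(sig) << 4;
	return model;
}
```

For example, the signature 0x000306a9 decodes to family 6, model 0x3a, stepping 9, and 0x00600f12 decodes to family 0x15, model 1, stepping 2.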
See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html
This reduces the amount of MD code enormously, and makes it easier
to implement support for newer CPU features which require more fpu
state, or for fpu usage by the kernel.
For access to FPU state across CPUs, an xcall kthread is used now
rather than a dedicated IPI.
No user visible changes intended.