we can disable the global-paging mechanism in %cr4 with CR4_PGE. Do that.
In addition, install CR4_PGE when SVS is disabled manually (via the
sysctl).
Now, doing "sysctl -w machdep.svs_enabled=0" restores the performance
completely, exactly as if SVS hadn't been enabled in the first place.
needed to disable SVS at runtime.
We set 'svs_enabled' to false, and hotpatch the kernel entry/exit points
to eliminate the context switch code.
We need to make sure there is no remote CPU that is executing the code we
are hotpatching. So we use two barriers:
* After the first one each CPU is guaranteed to be executing in
svs_disable_cpu with interrupts disabled (this way it can't leave this
place).
* After the second one it is guaranteed that SVS is disabled, so we flush
the cache, enable interrupts and continue execution normally.
Between the two barriers, cpu0 will disable SVS (svs_enabled=false and
hotpatch), and each CPU will restore the generic syscall entry point.
Three notes:
* We should call svs_pgg_update(true) afterwards, to put back PG_G on
the kernel pages (for better performance). This will be done in another
commit.
* The fact that we disable interrupts does not prevent us from receiving
an NMI, and it would be problematic. So we need to add some code to
verify that PMCs are disabled before hotpatching. This will be done
in another commit.
* In svs_disable() we expect each CPU to be online. We need to add a
check to make sure they indeed are.
The sysctl allows only a 1->0 transition. There is no point in doing 0->1
transitions anyway, and it would be complicated to implement because we
need to re-synchronize the CPU user page tables with the current ones (we
lost track of them in the last 1->0 transition).
Declare x86_patch_window_open() and x86_patch_window_close(), and globalify
x86_hotpatch().
Introduce svs_enable() in x86/svs.c, that does the SVS hotpatching.
Change svs_init() to take a bool. This function gets called twice; early
when the system just booted (and nothing is initialized), lately when at
least pmap_kernel has been initialized.
The SVS_* macros are now compiled as skip-noopt. When the system boots, if
the cpu is from Intel, they are hotpatched to their real content.
Typically:
jmp 1f
int3
int3
int3
... int3 ...
1:
gets hotpatched to:
movq SVS_UTLS+UTLS_KPDIRPA,%rax
movq %rax,%cr3
movq CPUVAR(KRSP0),%rsp
These two chunks of code being of the exact same size. We put int3 (0xCC)
to make sure we never execute there.
In the non-SVS (ie non-Intel) case, all it costs is one jump. Given that
the SVS_* macros are small, this jump will likely leave us in the same
icache line, so it's pretty fast.
The syscall entry point is special, because there we use a scratch uint64_t
not in curcpu but in the UTLS page, and it's difficult to hotpatch this
properly. So instead of hotpatching we declare the entry point as an ASM
macro, and define two functions: syscall and syscall_svs, the latter being
the one used in the SVS case.
While here 'syscall' is optimized not to contain an SVS_ENTER - this way
we don't even need to do a jump on the non-SVS case.
When adding pages in the user page tables, make sure we don't have PG_G,
now that it's dynamic.
A read-only sysctl is added, machdep.svs_enabled, that tells whether the
kernel uses SVS or not.
More changes to come, svs_init() is not very clean.
on ixv(4) derived from FreeBSD's AIM (Auto Interrupt Moderation) bug.
ITR_INTERVAL value must be larger than 4us.
- The bitfield of EITR register is different between 82598 and others.
ixv.c had a bug that it accessed 82598's way even though only 82599 and
newer support virtual function. Fix it using with new ixv_eitr_write()
function.
- local HOST_WIDE_INT_CONSTANT macro same as new HOST_WIDE_INT_C macro,
so use it instead, and remove the local macro.
- re-port the genrecog.c change which was missed in early GCC-6 port.
this makes sh3 work again.
KERNEL_PID> to avoid triggering KASSERT() checking allocated asid
is bigger than KERNEL_PID; adjust also TLBINFO_ASID_INITIAL_FREE()
accordingly
discussed with Nick
ipsec4_hdrsiz and ipsec6_hdrsiz
ipsec4_in_reject and ipsec6_in_reject
ipsec4_checkpolicy and ipsec4_checkpolicy
The members of these couples are now identical, and could be merged,
giving only three functions instead of six...
we are not guaranteed to have enough room for another struct ip, and we
may crash here. Triggerable remotely, but after authentication, by sending
an AH packet that has a one-byte-sized IPIP payload.