the page tree so that the first 2MB of virtual memory can be kentered in
L1.
Strictly speaking, the kernel should never kenter a virtual page below
VM_MIN_KERNEL_ADDRESS, because then it wouldn't be available in userland.
It used to need the first 2MB in order to map the CPU trampoline and the
initial VAs used by the bootstrap code. Now, the CPU trampoline VA is
allocated with uvm_km_alloc and the VAs used by the bootstrap code are
allocated with pmap_bootstrap_valloc, and in either case the resulting VA
is above VM_MIN_KERNEL_ADDRESS.
The low levels in the page tree are therefore unused. By removing this
function, we are making sure no one will be tempted to map an area below
VM_MIN_KERNEL_ADDRESS in kernel mode, and particularly, we are making sure
NULL cannot be kentered.
In short, there is no way to map NULL in kernel mode anymore.
virtual memory, and only the latter matters. I'm not exactly sure why, but
it appears that the kernel modules must be placed above the kernel image.
Just make this comment more ambiguous, in case the next passer-by gets
inspired.
relocating them. The text is allocated as RWX, and then mprotected to RW.
There is a bug that prevents us from doing RW->RX on amd64 and perhaps
sparc64. On x86, the pmap waits for the page to fault before granting it
the X permission. But in the trap handler, such a page is considered as
belonging to kernel_map, while it actually belongs to module_map. The
kernel then finds out the page is not present in kernel_map, and panics.
In all cases, module_map is non pageable, so even if the trap were handled
properly, it still wouldn't work.
Therefore, there is a small window in which the segment is RWX. But that's
fine enough, for now.
When mprotecting a page, the kernel updates the uvm protection associated
with the page, and then gives control to the x86 pmap which splits the
procedure in two: if we are restricting the permissions it updates the page
tree right away, and if we are increasing the permissions it just waits for
the page to fault.
In the first case, it forgets to take care of the X permission. Which means
that if we allocate an executable page, it is impossible to remove the X
permission on it, this being true regardless of whether the mprotect call
comes from the kernel or from userland. It is not possible to make sure the
page is non executable either, since the only holder of the permission
information is uvm, and no track is kept at the pmap level of the actual
permissions enforced. In short, the kernel believes the page is non
executable, while the cpu knows it is.
Fix this by properly taking care of the !VM_PROT_EXECUTE case. Since the
bit manipulation is a little tricky we use two vars: bit_rem (remove) and
bit_put.