This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additionals fixes and improvements.
PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).
In quick, PAE switches the CPU to a mode where physical addresses become
36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope
with the increased size of the physical address, they are manipulated as
64 bits variables by kernel and MMU.
When supported by the CPU, it also allows the use of the NX/XD bit that
provides no-execution right enforcement on a per physical page basis.
Notes:
- reworked locore.S
- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the
different handling of pmap mappings with PAE vs !PAE, Xen vs native, details
are hidden within this function. This helps calling it from assembly,
as some features, like BIOS calls, switch to pmap_kernel before mapping
trampoline code in low memory.
- some changes in bioscall and kvm86_call, to reflect the above.
- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and
ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).
- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).
- bootinfo_source struct declaration is modified, to cope with paddr_t size
change with PAE (it is not correct to assume that bs_addr is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in bootloader's code under i386/stand/, the bs_addrs
is a physaddr_t, which is an unsigned long).
- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I used Elf32_* types, use RELOC() where necessary, and move the
memcpy() functions out of the if/else if (I do not expect sym and str tables
to overlap with ELF).
- 64 bits atomic functions for pmap
- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).
- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.
See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html
No objection raised on port-i386@ and port-xen@R for about a week.
XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64 bits
PA support in kvm_i386).
XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
This is useful for automated environments where everything (rpcbind,
mountd, nfsd and the client) is started in parallel in a split
second and there is a small chance we will race in there before
everything has been communicated to rpcbind.
still at the original value and not the schedstate one. This makes
select not miss wakeups in cases where there was a lot of selecting
going on (which is not all that common in a rump kernel).
activity (BM_STS) by reading from the PM1 register. According to the Intel
processor specification for ACPI, the FFH GAS encoding may provide a hint
that the check is not required. This may help some systems to enter C2/C3
even when e.g. usb(4) keeps the BM_STS bit always enabled.
existing ACPICPU_FLAG_C, as was intended. Set that flag only after the
idle-loop has been installed, so that the notify handler errors out if an
interrupt is received before the idle-loop is in place.