ac670b3099
into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64}, because amd64 already has a direct map that is way faster than that. There are two major issues with the global array: maxcpus entries are allocated while it is unlikely that common i386 machines have so many cpus, and the base VA of these entries is not cache-line-aligned, which mostly guarantees cache-line-thrashing each time the VAs are entered. Now the number of tmp VAs allocated is proportionate to the number of CPUs attached (which therefore reduces memory consumption), and the base is properly aligned. On my 3-core AMD, the number of DC_refills_L2 events triggered when performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on average divided by two with this patch. Discussed on tech-kern a little. |
||
---|---|---|
.. | ||
compile | ||
conf | ||
include | ||
x86 | ||
xen | ||
xenbus | ||
Makefile |