To implement a more accurate microtime using the CP0 COUNT
register we need to divide that register by the number of
cycles per MHz. But...
DIV and DIVU are expensive on MIPS (eg 75 clocks on the
R4000). MULT and MULTU are only 12 clocks on the same CPU.
On the SB1 these appear to be 40-72 clocks for DIV/DIVU and 3
clocks for MUL/MULTU.
The strategy we use to to calculate the reciprical of cycles
per MHz, scaled by 1<<32. Then we can simply issue a MULTU
and pluck of the HI register and have the results of the
division.