This is intended for 68060:
- GCC does not emit __muldi3() for 68020-40, that have 32 * 32 --> 64 mulul
- mulsl (and moveml), used in this code, are not implemented for 68010
In comparison with that from compiler_rt, this version saves:
- 12% of processing time
- 12 bytes of stack
- 50 bytes of code size
Also, slightly faster, memory saving, and smaller than libgcc version.
By examining with evcnt(9), __muldi3() is invoked more than 1000 times per
sec by kernel, which should justify to introduce assembler version of this
function.