d81b3ce220
Optimization problem: consecutive additions/subtractions were carried out inside FPU registers at "double" precision, without rounding the intermediate results to "float" in between.
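
A minimal sketch of the effect described above, not the code touched by this commit. The helper names `chain_rounded` and `chain_excess` are hypothetical, and the excess register precision is simulated with an explicit `double` so the behaviour is the same on any target. The first helper rounds to "float" after every operation, the second keeps the intermediate in "double" and rounds only once at the end.

```c
/* Hypothetical helper: add/sub chain where every intermediate result
 * is rounded to float. The volatile store forces the value out of any
 * wider register before the next operation uses it. */
static float chain_rounded(float a, float b, float c)
{
    volatile float t = a + b;   /* rounded to float here */
    t = t - c;                  /* rounded to float again */
    return t;
}

/* Hypothetical helper: same chain, but the intermediate stays in
 * double precision, imitating values held in FPU registers. Only the
 * final result is rounded to float. */
static float chain_excess(float a, float b, float c)
{
    double t = (double)a + (double)b;   /* no rounding to float */
    t = t - (double)c;
    return (float)t;
}
```

With a = 16777216.0f (2^24), b = 1.0f and c = 16777216.0f, the rounded chain gives 0.0f, because 2^24 + 1 is not representable as a float, while the excess-precision chain gives 1.0f. A volatile store is one common way to force the rounding; GCC's `-ffloat-store` option is another.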