a bunch of cruft and avoids using a v9a instruction. In addition, eliminate 8 of the fmovda instructions, whose results we were not using anyway. The net result is that this should be faster in all cases.