but use VFP instructions to do the actual work. This should give performance close to that of hard-float code without requiring compiler changes.
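Here is a portable sketch of the idea, not the actual routine. It keeps a soft-float-style interface: operands and the result travel as raw 32-bit IEEE-754 bit patterns, the way single-precision values are passed in core (integer) registers under a soft-float calling convention. The body does the arithmetic with a native floating-point operation. The names `soft_fadd`, `float_to_bits`, and `bits_to_float` are hypothetical. The assumption is that on an ARM build with VFP enabled, the `a + b` compiles to a VFP add. On other hosts it simply uses that platform's FPU.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: soft-float-style interface (single-precision values
   passed and returned as raw 32-bit patterns, as they would be in integer
   registers), while the addition itself is done by the hardware FPU. */
uint32_t soft_fadd(uint32_t a_bits, uint32_t b_bits)
{
    float a, b, r;
    uint32_t r_bits;

    /* Reinterpret the integer bit patterns as floats; memcpy avoids
       strict-aliasing problems. */
    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);

    /* Assumption: on a VFP-enabled ARM build this becomes a hardware add. */
    r = a + b;

    memcpy(&r_bits, &r, sizeof r_bits);
    return r_bits;
}

/* Convert a float to its 32-bit IEEE-754 bit pattern. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Convert a 32-bit IEEE-754 bit pattern back to a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `bits_to_float(soft_fadd(float_to_bits(1.5f), float_to_bits(2.25f)))` yields `3.75f`. Callers only ever see integer arguments and results, so existing code that uses the soft-float convention would not need to change.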