ARMv8 requires support for converting 32 and 64bit floating point
values to signed and unsigned 16bit integers.
Signed-off-by: Will Newton <will.newton@linaro.org>
[PMM: updated not to incorrectly set Inexact for Invalid inputs]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Our float32 to float16 conversion routine was generating the correct
numerical answers, but not always setting the right set of exception
flags. Fix this, mostly by rearranging the code to more closely
resemble RoundAndPackFloat*, and in particular:
* non-IEEE halfprec always raises Invalid for input NaNs
* we need to check for the overflow case before underflow
* we weren't getting the tininess-detected-after-rounding
case correct (somewhat academic since only ARM uses halfprec
and it is always tininess-detected-before-rounding)
* non-IEEE halfprec overflow raises only Invalid, not
Invalid + Inexact
* we weren't setting Inexact when we should
Also add some clarifying comments about what the code is doing.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Add floatnn_minnum() and floatnn_maxnum() functions which are equivalent
to the minNum() and maxNum() functions from IEEE 754-2008. They are
similar to min() and max() but differ in the handling of QNaN arguments.
Signed-off-by: Will Newton <will.newton@linaro.org>
Message-id: 1386158099-9239-5-git-send-email-will.newton@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The nan_exp argument is not used, so remove it.
Signed-off-by: Will Newton <will.newton@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1386158099-9239-4-git-send-email-will.newton@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In handling float64_muladd, if we end up doing a subtraction of the
product and c, and the 128 bit result of this subtraction happens to
have its most significant bit in bit 63, we weren't handling this
correctly when attempting to normalize to put the most significant
bit into bit 126. We would end up doing a right shift by a negative
number (undefined behaviour in C) so at best we would return an
incorrect result to the guest. MSB in bit 63 has to be handled as a
special case separately from MSB in 0..62 and MSB in 63..126. (MSB
in 127 is not possible.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Honour float_muladd_negate_c in the case where the product is zero and
c is nonzero. Previously we would fail to negate c.
Seen in (and tested against) the gfortran testsuite on MIPS.
Signed-off-by: Richard Sandiford <rdsandiford@googlemail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The interface to normalizeRoundAndPackFloat64 requires that the
high bit be clear. Perform one shift-right-and-jam if needed.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The uint64_to_float32() conversion function was incorrectly always
returning numbers with the sign bit set (ie negative numbers). Correct
this so we return positive numbers instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
In float16_to_float32, when returning an infinity, just pass zero
as the mantissa argument to packFloat32(), rather than shifting
a value which we know must be zero.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Based on the following Coccinelle patch:
@@
typedef int16, int_fast16_t;
@@
-int16
+int_fast16_t
Avoids a workaround for AIX.
Add typedef for pre-10 Solaris.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: malc <av1474@comtv.ru>
Cc: Ben Taylor <bentaylor.solx86@gmail.com>
Tested-by: Bernhard Walle <bernhard@bwalle.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Based on the following Coccinelle patch:
@@
typedef uint16, uint_fast16_t;
@@
-uint16
+uint_fast16_t
Fixes the build of the Cocoa frontend on Mac OS X and avoids a
workaround for AIX.
For pre-10 Solaris include osdep.h.
Reported-by: Pavel Borzenkov <pavel.borzenkov@gmail.com>
Reported-by: Rui Carmo <rui.carmo@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Juan Pineda <juan@logician.com>
Cc: malc <av1474@comtv.ru>
Cc: Ben Taylor <bentaylor.solx86@gmail.com>
Tested-by: Bernhard Walle <bernhard@bwalle.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
normalizeFloat{32,64}Subnormal() expect the exponent as int16, not int.
This went unnoticed since int16 and uint16 were both typedef'ed to int.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Bernhard Walle <bernhard@bwalle.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
This change makes it compile and return the same value than the #undef one.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix code in roundAndPackInt32 that assumed that int32 was only
32 bits, by simply using int32_t instead. Fix the parallel bug
in roundAndPackInt64 as well, although that one is only theoretical
since it's unlikely that int64 will ever be more than 64 bits.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Code in the float64_to_int32_round_to_zero() function was assuming
that int32 would not be wider than 32 bits; this meant it might
not correctly detect the overflow case. We take the simple approach
of using int32_t. Also fix equivalent issues in the functions
for other float sizes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Implement fused multiply-add as a softfloat primitive. This implements
"a+b*c" as a single step without any intermediate rounding; it is
specified in IEEE 754-2008 and implemented in a number of CPUs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Include config.h in softfloat.c, so that the target specific ifdefs in
softfloat-specialize.h are evaluated correctly. This was accidentally
broken in commit 789ec7ce2 when config-target.h was removed from
softfloat.h, and means that most targets will have been returning the
wrong results for calculations involving NaNs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Prepares for uint32 replacement.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Prepares for uint16 replacement.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Now that softfloat-native is gone, there is no real point on not always
enabling floatx80 and float128 support.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a new float_flag_output_denormal which is set when the result
of a floating point operation would be denormal but is flushed to
zero because we are in flush_to_zero mode. This is necessary because
some architectures signal this condition as an underflow and others
signal it as an inexact result.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
float*_scalnb() were not taking into account all cases. This patch fixes
some corner cases:
- NaN values in input were not properly propagated and the invalid flag
not correctly raised. Use propagateFloat*NaN() for that.
- NaN or infinite values in input of floatx80_scalnb() were not correctly
detected due to a typo.
- The sum of exponent and n could overflow, leading to strange results.
Additionally having int16 defined to int make that happening for a very
small range of values. Fix that by saturating n to the maximum exponent
range, and using an explicit wider type if needed.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add floatx80_compare() and floatx80_compare_quiet() functions to match
the softfloat-native ones.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Make clear for all comparison functions which ones trigger an exception
for all NaNs, and which one only for sNaNs.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
I am not a big fan of code moving, but having the signaling version in
the middle of quiet versions and vice versa doesn't make the code easy
to read.
This patch is a simple code move, basically swapping locations of
float*_eq and float*_eq_quiet.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
float*_eq_signaling functions have a different semantics than other
comparison functions. Fix that by renaming float*_quiet_signaling() into
float*_eq().
Note that it is purely mechanical, and the behaviour should be unchanged.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
float*_eq functions have a different semantics than other comparison
functions. Fix that by first renaming float*_quiet() into float*_eq_quiet().
Note that it is purely mechanical, and the behaviour should be unchanged.
That said it clearly highlight problems due to this different semantics,
they are fixed later in this patch series.
Cc: Alexander Graf <agraf@suse.de>
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add float*_unordered() functions to softfloat, matching the softfloat-native
ones. Also add float*_unordered_quiet() functions to match the others
comparison functions.
This allow target-i386/ops_sse.h to be compiled with softfloat.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add min and max operations to softfloat. This allows us to implement
propagation of NaNs and handling of negative zero correctly (unlike
the approach of having target helper routines return one of the operands
based on the result of a comparison op).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
They are defined with the same semantics as the POSIX types,
so prefer those for consistency. Suggested by Peter Maydell.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The SoftFloat license requires "prominent notice that the work
is derivative". Having added features like improved 16-bit support
for arm already, add such a notice to the sources.
softfloat-native.[ch] are not under the SoftFloat license
and thus are not changed.
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Make softfloat compile with USE_SOFTFLOAT_STRUCT_TYPES defined, by
adding and using new macros const_float16(), const_float32() and
const_float64() so you can use array initializers in an array of
float16/float32/float64 whether the types are bare or wrapped in the
structs.
[aurelien@aurel32.net: do the same for float16]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Correctly handle NaNs in float16_to_float32(), by defining and
using a float16ToCommonNaN() function, as we do with the other formats.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Fix various bugs in the single-to-half-precision conversion code:
* input NaNs not correctly converted in IEEE mode
(fixed by defining and using a commonNaNToFloat16())
* wrong values returned when converting NaN/Inf into non-IEEE
half precision value
* wrong values returned for conversion of values which are
on the boundary between denormal and zero for the half
precision format
* zeroes not correctly identified
* excessively large results in non-IEEE mode should
generate InvalidOp, not Overflow
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Honour the default_nan_mode flag when doing conversions between
different floating point formats, as well as when returning a NaN from
a two-operand floating point function. This corrects the behaviour
of float<->double conversions on both ARM and SH4.
Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add a float16 type to softfloat, rather than using bits16 directly.
Also add the missing functions float16_is_quiet_nan(),
float16_is_signaling_nan() and float16_maybe_silence_nan(),
which are needed for the float16 conversion routines.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Add support to softfloat for flushing input denormal float32 and float64
to zero. softfloat's existing 'flush_to_zero' flag only flushes denormals
to zero on output. Some CPUs need input denormals to be flushed before
processing as well. Implement this, using a new status flag to enable it
and a new exception status bit to indicate when it has happened. Existing
CPUs should be unaffected as there is no behaviour change unless the
mode is enabled.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The ARM architecture needs float/double to 16 bit integer conversions.
(The 32 bit versions aren't sufficient because of the requirement
to saturate at 16 bit MAXINT/MININT and to get the exception bits right.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>