Reduce the number of ifdefs. Correct the result for OpenRISC
and TriCore (although TriCore fixed in target-specific code).
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Isolate the target-specific choice to 2 functions instead of 6.
The code in float16_default_nan was only correct for ARM, MIPS, and X86.
Though float16 support is rare among our targets.
The code in float128_default_nan was arguably wrong for Sparc. While
QEMU supports the Sparc 128-bit insns, no real cpu enables it.
The code in floatx80_default_nan tried to be over-general. There are
only two targets that support this format: x86 and m68k. Thus there
is no point in inventing a value for snan_bit_is_one.
Move routines that no longer have ifdefs out of softfloat-specialize.h.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For each operand, pass a single enumeration instead of a pair of booleans.
The commit also merges multiple different ifdef-selected implementations
of pickNaNMulAdd into a single function whose body is ifdef-selected.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For each operand, pass a single enumeration instead of a pair of booleans.
The commit also merges multiple different ifdef-selected implementations
of pickNaN into a single function whose body is ifdef-selected.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We will need these helpers within softfloat-specialize.h, so move
the definitions above the include. After specialization, they will
not always be used so mark them to avoid the Werror.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Only MIPS requires snan_bit_is_one to be variable. While we are
specializing softfloat behaviour, allow other targets to eliminate
this runtime check.
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Yongbok Kim <yongbok.kim@mips.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These functions are now unused.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have already checked the arguments for SNaN;
we don't need to do it again.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This allows us to delete a lot of additional boilerplate
code which is no longer needed.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For float16 ARM supports an alternative half-precision format which
sacrifices the ability to represent NaN/Inf in return for a higher
dynamic range. The new FloatFmt flag, arm_althp, is then used to
modify the behaviour of canonicalize and round_canonical with respect
to representation and exception raising.
Usage of this new flag waits until we re-factor float-to-float conversions.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With a canonical representation of NaNs, we can silence an SNaN
immediately rather than delay until the final format is known.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
With a canonical representation of NaNs, we can return the
default nan directly rather than delay the expansion until
the final format is known.
Note one case where we uselessly assigned to a.sign, which was
overwritten/ignored later when expanding float_class_dnan.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Shift the NaN fraction to a canonical position, much like we
do for the fraction of normal numbers. This will facilitate
manipulation of NaNs within the shared code paths.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We want to be able to specialize on the canonical representation.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The new function assumes that the input is an SNaN and
does not double-check.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move the ifdef inside the relevant functions instead of
duplicating the function declarations.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The significand is passed to normalizeRoundAndPackFloat128() as high
first, low second. The current code passes the integer first, so the
result is incorrectly shifted left by 64 bits.
This bug affects the emulation of s390x instruction CXLGBR (convert
from logical 64-bit binary-integer operand to extended BFP result).
Cc: qemu-stable@nongnu.org
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Message-Id: <20180511071052.1443-1-ptesarik@suse.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In float-to-integer conversion, if the floating point input
converts exactly to the largest or smallest integer that
fits in to the result type, this is not an overflow.
In this situation we were producing the correct result value,
but were incorrectly setting the Invalid flag.
For example for Arm A64, "FCVTAS w0, d0" on an input of
0x41dfffffffc00000 should produce 0x7fffffff and set no flags.
Fix the boundary case to take the right half of the if()
statements.
This fixes a regression from 2.11 introduced by the softfloat
refactoring.
Cc: qemu-stable@nongnu.org
Fixes: ab52f973a5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510140141.12120-1-peter.maydell@linaro.org
Reported by Coverity (CID1390635). We ensure this for uint_to_float
later on so we might as well mirror that.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
It is implementation defined whether a multiply-add of
(0,inf,qnan) or (inf,0,qnan) raises InvalidaOperation or
not, so we let the target-specific pickNaNMulAdd function
handle this. This means that we must do the "return the
default NaN in default NaN mode" check after the call,
not before. Correct the ordering, and restore the comment
from the old propagateFloat64MulAddNaN() that warned about
this corner case.
This fixes a regression from 2.11 for Arm guests where we would
incorrectly fail to set the Invalid flag for these cases.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180504100547.14621-1-peter.maydell@linaro.org
Without bounding the increment, we can overflow exp either here
in scalbn_decomposed or when adding the bias in round_canonical.
This can result in e.g. underflowing to 0 instead of overflowing
to infinity.
The old softfloat code did bound the increment.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The re-factoring of div_floats changed the order of checking meaning
an operation like -inf/0 erroneously raises the divbyzero flag.
IEEE-754 (2008) specifies this should only occur for operations on
finite operands.
We fix this by moving the check on the dividend being Inf/0 to before
the divisor is zero check.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180416135442.30606-1-alex.bennee@linaro.org
Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The re-factor broke the raising of INVALID when NaN/Inf is passed to
the float_to_int conversion functions. round_to_uint_and_pack got this
right for NaN but also missed out the Inf handling.
Fixes https://bugs.launchpad.net/qemu/+bug/1759264
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180413140334.26622-3-alex.bennee@linaro.org
Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Before 8936006 ("fpu/softfloat: re-factor minmax", 2018-02-21),
we used to return +Zero for maxnummag(-Zero,+Zero); after that
commit, we return -Zero.
Fix it by making {min,max}nummag consistent with {min,max}num,
deferring to the latter when the absolute value of the operands
is the same.
With this fix we now pass fp-test.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180413140334.26622-2-alex.bennee@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We incorrectly passed in the current rounding mode
instead of float_round_to_zero.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180410055912.934-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Helper routines for FPU instructions and NaN definitions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Sagar Karandikar <sagark@eecs.berkeley.edu>
Signed-off-by: Michael Clark <mjc@sifive.com>
Since f3218a8 ("softfloat: add floatx80 constants")
floatx80_infinity is defined but never used.
This patch updates floatx80 functions to use
this definition.
This allows to define a different default Infinity
value on m68k: the m68k FPU defines infinity with
all bits set to zero in the mantissa.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180224201802.911-4-laurent@vivier.eu>
Move fpu/softfloat-macros.h to include/fpu/
Export floatx80 functions to be used by target floatx80
specific implementations.
Exports:
propagateFloatx80NaN(), extractFloatx80Frac(),
extractFloatx80Exp(), extractFloatx80Sign(),
normalizeFloatx80Subnormal(), packFloatx80(),
roundAndPackFloatx80(), normalizeRoundAndPackFloatx80()
Also exports packFloat32() that will be used to implement
m68k fsinh, fcos, fsin, ftan operations.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180224201802.911-2-laurent@vivier.eu>
This is a little bit of a departure from softfloat's original approach
as we skip the estimate step in favour of a straight iteration. There
is a minor optimisation to avoid calculating more bits of precision
than we need however this still brings a performance drop, especially
for float64 operations.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
The compare function was already expanded from a macro. I keep the
macro expansion but move most of the logic into a compare_decomposed.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Let's do the same re-factor treatment for minmax functions. I still
use the MACRO trick to expand but now all the checking code is common.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
This is one of the simpler manipulations you could make to a floating
point number.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
These are considerably simpler as the lower order integers can just
use the higher order conversion function. As the decomposed fractional
part is a full 64 bit rounding and inexact handling comes from the
pack functions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
We share the common int64/uint64_pack_decomposed function across all
the helpers and simply limit the final result depending on the final
size.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
We can now add float16_round_to_int and use the common round_decomposed and
canonicalize functions to have a single implementation for
float16/32/64 round_to_int functions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
We can now add float16_muladd and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 muladd functions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
We can now add float16_div and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
We can now add float16_mul and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
We can now add float16_add/sub and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 add and sub functions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
These structures pave the way for generic softfloat helper routines
that will operate on fully decomposed numbers.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This is pure code-motion during re-factoring as the helpers will be
needed earlier.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Mention the pseudo-code fragment from which this is based.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
This will be required when expanding the MINMAX() macro for 16
bit/half-precision operations.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Add a function to round a floatx80 to the defined precision
(floatx80_rounding_precision)
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <20170628204241.32106-5-laurent@vivier.eu>
In float64_to_uint64_round_to_zero() a typo meant that we were
taking the uint64_t return value from float64_to_uint64() and
putting it into an int64_t variable before returning it as
uint64_t again. Use uint64_t instead of pointlessly casting it
back and forth to int64_t.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
float128_to_uint32_round_to_zero() is needed by xscvqpuwz instruction
of PowerPC ISA 3.0.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Implement float128_to_uint64() and use that to implement
float128_to_uint64_round_to_zero()
This is required by xscvqpudz instruction of PowerPC ISA 3.0.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Power ISA 3.0 introduces a few quadruple precision floating point
instructions that support round-to-odd rounding mode. The
round-to-odd mode is explained as under:
Let Z be the intermediate arithmetic result or the operand of a convert
operation. If Z can be represented exactly in the target format, the
result is Z. Otherwise the result is either Z1 or Z2 whichever is odd.
Here Z1 and Z2 are the next larger and smaller numbers representable
in the target format respectively.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently float128_default_nan() returns 0xFFFF800000000000 in the
higher double word, but it should return 0x7FFF800000000000 which
is the correct higher double word for default qNAN on PowerPC.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Like the original MIPS, HPPA has the MSB of an SNaN set.
However, it has different rules for silencing an SNaN:
(1) msb is cleared and (2) msb-1 must be set if the fraction
is now zero, and (implementation defined) may be set always.
I haven't checked real hardware but chose the set always
alternative because it's easy and within spec.
Signed-off-by: Richard Henderson <rth@twiddle.net>
All operations that take a floatx80 as an operand need to have their
inputs checked for malformed encodings. In all of these cases, use the
function floatx80_invalid_encoding to perform the check. If an invalid
operand is found, raise an invalid operation exception, and then return
either NaN (for fp-typed results) or the integer indefinite value (the
minimum representable signed integer value, for int-typed results).
For the non-quiet comparison operations, this touches adjacent code in
order to pass style checks.
Signed-off-by: Andrew Dutcher <andrew@andrewdutcher.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1471392895-17324-1-git-send-email-andrew@andrewdutcher.com
[PMM: changed "1 << 63" to "1ULL << 63" to fix compile errors]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Change the flag type to 'uint8_t' to fix the implicit conversion error.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 20160810185502.32015-1-bobby.prani@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Only for Mips platform, and only for cases when snan_bit_is_one is 0,
correct the order of argument comparisons in pickNaNMulAdd().
For more info, see [1], page 53, section "3.5.3 NaN Propagation".
[1] "MIPS Architecture for Programmers Volume IV-j:
The MIPS32 SIMD Architecture Module",
Imagination Technologies LTD, Revision 1.12, February 3, 2016
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[leon.alrae@imgtec.com:
* reworded the subject of the patch
* swapped if/else code blocks to match the commit description]
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Only for Mips platform, and only for cases when snan_bit_is_one is 0,
correct default NaN values (in their 16-, 32-, and 64-bit flavors).
For more info, see [1], page 84, Table 6.3 "Value Supplied When a New
Quiet NaN Is Created", and [2], page 52, Table 3.7 "Default NaN
Encodings".
[1] "MIPS Architecture For Programmers Volume II-A:
The MIPS64 Instruction Set Reference Manual",
Imagination Technologies LTD, Revision 6.04, November 13, 2015
[2] "MIPS Architecture for Programmers Volume IV-j:
The MIPS32 SIMD Architecture Module",
Imagination Technologies LTD, Revision 1.12, February 3, 2016
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
fpu/softfloat-specialize.h is the most critical file in SoftFloat
library, since it handles numerous differences between platforms in
relation to floating point arithmetics. This patch makes the code
in this file more consistent format-wise, and hopefully easier to
debug and maintain.
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
This patch modifies SoftFloat library so that it can be configured in
run-time in relation to the meaning of signaling NaN bit, while, at the
same time, strictly preserving its behavior on all existing platforms.
Background:
In floating-point calculations, there is a need for denoting undefined or
unrepresentable values. This is achieved by defining certain floating-point
numerical values to be NaNs (which stands for "not a number"). For additional
reasons, virtually all modern floating-point unit implementations use two
kinds of NaNs: quiet and signaling. The binary representations of these two
kinds of NaNs, as a rule, differ only in one bit (that bit is, traditionally,
the first bit of mantissa).
Up to 2008, standards for floating-point did not specify all details about
binary representation of NaNs. More specifically, the meaning of the bit
that is used for distinguishing between signaling and quiet NaNs was not
strictly prescribed. (IEEE 754-2008 was the first floating-point standard
that defined that meaning clearly, see [1], p. 35) As a result, different
platforms took different approaches, and that presented considerable
challenge for multi-platform emulators like QEMU.
Mips platform represents the most complex case among QEMU-supported
platforms regarding signaling NaN bit. Up to the Release 6 of Mips
architecture, "1" in signaling NaN bit denoted signaling NaN, which is
opposite to IEEE 754-2008 standard. From Release 6 on, Mips architecture
adopted IEEE standard prescription, and "0" denotes signaling NaN. On top of
that, Mips architecture for SIMD (also known as MSA, or vector instructions)
also specifies signaling bit in accordance to IEEE standard. MSA unit can be
implemented with both pre-Release 6 and Release 6 main processor units.
QEMU uses SoftFloat library to implement various floating-point-related
instructions on all platforms. The current QEMU implementation allows for
defining meaning of signaling NaN bit during build time, and is implemented
via preprocessor macro called SNAN_BIT_IS_ONE.
On the other hand, the change in this patch enables SoftFloat library to be
configured in run-time. This configuration is meant to occur during CPU
initialization, at the moment when it is definitely known what desired
behavior for particular CPU (or any additional FPUs) is.
The change is implemented so that it is consistent with existing
implementation of similar cases. This means that structure float_status is
used for passing the information about desired signaling NaN bit on each
invocation of SoftFloat functions. The additional field in float_status is
called snan_bit_is_one, which supersedes macro SNAN_BIT_IS_ONE.
IMPORTANT:
This change is not meant to create any change in emulator behavior or
functionality on any platform. It just provides the means for SoftFloat
library to be used in a more flexible way - in other words, it will just
prepare SoftFloat library for usage related to Mips platform and its
specifics regarding signaling bit meaning, which is done in some of
subsequent patches from this series.
Further break down of changes:
1) Added field snan_bit_is_one to the structure float_status, and
correspondent setter function set_snan_bit_is_one().
2) Constants <float16|float32|float64|floatx80|float128>_default_nan
(used both internally and externally) converted to functions
<float16|float32|float64|floatx80|float128>_default_nan(float_status*).
This is necessary since they are dependent on signaling bit meaning.
At the same time, for the sake of code cleanup and simplicity, constants
<floatx80|float128>_default_nan_<low|high> (used only internally within
SoftFloat library) are removed, as not needed.
3) Added a float_status* argument to SoftFloat library functions
XXX_is_quiet_nan(XXX a_), XXX_is_signaling_nan(XXX a_),
XXX_maybe_silence_nan(XXX a_). This argument must be present in
order to enable correct invocation of new version of functions
XXX_default_nan(). (XXX is <float16|float32|float64|floatx80|float128>
here)
4) Updated code for all platforms to reflect changes in SoftFloat library.
This change is twofolds: it includes modifications of SoftFloat library
functions invocations, and an addition of invocation of function
set_snan_bit_is_one() during CPU initialization, with arguments that
are appropriate for each particular platform. It was established that
all platforms zero their main CPU data structures, so snan_bit_is_one(0)
in appropriate places is not added, as it is not needed.
[1] "IEEE Standard for Floating-Point Arithmetic",
IEEE Computer Society, August 29, 2008.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Tested-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[leon.alrae@imgtec.com:
* cherry-picked 2 chunks from patch #2 to fix compilation warnings]
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
This patch adds a file for all the FPU related helpers with all the includes,
useful defines, and a function to update the status bits. Additionally it adds
a mask for the rounding mode bits of PSW as well as all the opcodes for the
FPU instructions.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-2-git-send-email-kbastian@mail.uni-paderborn.de>
Use the plain 'int' type rather than 'int_fast16_t' for handling
exponents. Exponents don't need to be exactly 16 bits, so using int16_t
for them would confuse more than it clarified.
This should be a safe change because int_fast16_t semantics
permit use of 'int' (and on 32-bit glibc that is what you get).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1453807806-32698-4-git-send-email-peter.maydell@linaro.org
Use the plain 'int' type rather than 'int_fast16_t' for shift counts
in the various shift related functions, since we don't actually care
about the size of the integer at all here, and using int16_t would
be confusing.
This should be a safe change because int_fast16_t semantics
permit use of 'int' (and on 32-bit glibc that is what you get).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1453807806-32698-3-git-send-email-peter.maydell@linaro.org
Make the functions which convert floating point to 16 bit integer
return int16_t rather than int_fast16_t, and correspondingly use
int_fast16_t in their internal implementations where appropriate.
(These functions are used only by the ARM target.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1453807806-32698-2-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
The roundAndPackFloat16 function should return a float16 value, not a
float32 one. Fix that.
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1452700993-6570-1-git-send-email-aurelien@aurel32.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace the int8 softfloat-specific typedef with int8_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\bint8\b/int8_t/g'
together with manual removal of the typedef definition, and
manual undoing of various mis-hits.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-6-git-send-email-peter.maydell@linaro.org
Replace the uint32 softfloat-specific typedef with uint32_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint32\b/uint32_t/g'
together with manual removal of the typedef definition,
manual undoing of various mis-hits, and another couple of
fixes found via test compilation.
All the uses in hw/ were using the wrong type by mistake.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-5-git-send-email-peter.maydell@linaro.org
Replace the int32 softfloat-specific typedef with int32_t.
This change was made with
find hw include fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\bint32\b/int32_t/g'
together with manual removal of the typedef definition, and
manual undoing of some mis-hits where macro arguments were
being used for token pasting rather than as a type.
The uses in hw/ipmi/ should not have been using this type at all.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-4-git-send-email-peter.maydell@linaro.org
Replace the uint64 softfloat-specific typedef with uint64_t.
This change was made with
find include fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint64\b/uint64_t/g'
together with manual removal of the typedef definition, and
manual undoing of some mis-hits where macro arguments were
being used for token pasting rather than as a type.
Note that the target-mips/kvm.c and target-s390x/kvm.c changes are fixing
code that should not have been using the uint64 type in the first place.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-3-git-send-email-peter.maydell@linaro.org
Replace the int64 softfloat-specific typedef with int64_t.
This change was made with
find include fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\bint64\b/int64_t/g'
together with manual removal of the typedef definition, and
manual undoing of some mis-hits where macro arguments were
being used for token pasting rather than as a type.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Message-id: 1452603315-27030-2-git-send-email-peter.maydell@linaro.org
Expand out STATUS_PARAM wherever it is used and delete the definition.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The code in the softfloat source files is under a mixture of
licenses: the original code and many changes from QEMU contributors
are under the base SoftFloat-2a license; changes from Stefan Weil
and RedHat employees are GPLv2-or-later; changes from Fabrice Bellard
are under the BSD license. Clarify this in the comments at the
top of each affected source file, including a statement about
the assumed licensing for future contributions, so we don't need
to remember to ask patch submitters explicitly to pick a license.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Andreas Färber <afaerber@suse.de>
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Avi Kivity <avi.kivity@gmail.com>
Acked-by: Ben Taylor <bentaylor.solx86@gmail.com>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Christophe Lyon <christophe.lyon@st.com>
Acked-by: Fabrice Bellard <fabrice@bellard.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Juan Quintela <quintela@redhat.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Paul Brook <paul@codesourcery.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Richard Henderson <rth@twiddle.net>
Acked-by: Richard Sandiford <rdsandiford@googlemail.com>
Acked-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1421073508-23909-5-git-send-email-peter.maydell@linaro.org
Revert the parts of commits b645bb4885 and 5a6932d51d which are still
in the codebase and under a SoftFloat-2b license.
Reimplement support for architectures where the most significant bit
in the mantissa is 1 for a signaling NaN rather than a quiet NaN,
by adding handling for SNAN_BIT_IS_ONE being set to the functions
which test values for NaN-ness.
This includes restoring the bugfixes lost in the reversion where
some of the float*_is_quiet_nan() functions were returning true
for both signaling and quiet NaNs.
[This is a mechanical squashing together of two separate "revert"
and "reimplement" patches.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1421073508-23909-4-git-send-email-peter.maydell@linaro.org
Revert the remaining portions of commits 75d62a5856 and 3430b0be36
which are under a SoftFloat-2b license, ie the functions
uint64_to_float32() and uint64_to_float64(). (The float64_to_uint64()
and float64_to_uint64_round_to_zero() functions were completely
rewritten in commits fb3ea83aa and 0a87a3107d so can stay.)
Reimplement from scratch the uint64_to_float64() and uint64_to_float32()
conversion functions.
[This is a mechanical squashing together of two separate "revert"
and "reimplement" patches.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1421073508-23909-3-git-send-email-peter.maydell@linaro.org
This commit applies the changes to master which correspond to
replacing commit 158142c2c2 with a set of changes made by:
* taking the SoftFloat-2a release
* mechanically transforming the block comment style
* reapplying Fabrice's original changes from 158142c2c2
This commit was created by:
diff -u 158142c2c2 import-sf-2a
patch -p1 --fuzz 10 <../relicense-patch.txt
(where import-sf-2a is the branch resulting from the changes above).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1421073508-23909-2-git-send-email-peter.maydell@linaro.org
Add abs argument to the existing softfloat minmax() function and define
new float{32,64}_{min,max}nummag functions.
minnummag(x,y) returns x if |x| < |y|,
returns y if |y| < |x|,
otherwise minnum(x,y)
maxnummag(x,y) returns x if |x| > |y|,
returns y if |y| > |x|,
otherwise maxnum(x,y)
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
This commit expands all uses of the INLINE macro and drop it.
The reason for this is to avoid clashes with external libraries with
bad name conventions and also because renaming keywords is not a good
practice.
PS: I'm fine with this change to be licensed under softfloat-2a or
softfloat-2b.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
This change adds the float32_to_uint64_round_to_zero function to the softfloat
library. This function fills out the complement of float32 to INT round-to-zero
conversion rountines, where INT is {int32_t, uint32_t, int64_t, uint64_t}.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
I need these available outside of softfloat for some of the reciprocal
processing in aarch64 helper functions.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1394822294-14837-20-git-send-email-peter.maydell@linaro.org
The ARMv8 instruction set includes a fused floating point
reciprocal square root step instruction which demands an
"(x * y + z) / 2" fused operation. Support this by adding
a flag to the softfloat muladd operations which requests
that the result is halved before rounding.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
IEEE754-2008 specifies a new rounding mode:
"roundTiesToAway: the floating-point number nearest to the infinitely
precise result shall be delivered; if the two nearest floating-point
numbers bracketing an unrepresentable infinitely precise result are
equally near, the one with larger magnitude shall be delivered."
Implement this new mode (it is needed for ARM). The general principle
is that the required code is exactly like the ties-to-even code,
except that we do not need to do the "in case of exact tie clear LSB
to round-to-even", because the rounding operation naturally causes
the exact tie to round up in magnitude.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Refactor the code in various functions which calculates rounding
increments given the current rounding mode, so that instead of a
set of nested if statements we have a simple switch statement.
This will give us a clean place to add the case for the new
tiesAway rounding mode.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Add the conversion functions float16_to_float64() and
float64_to_float16(), which will be needed for the ARM
A64 instruction set.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
In preparation for adding conversions between float16 and float64,
factor out code currently done inline in the float16<=>float32
conversion functions into functions RoundAndPackFloat16 and
NormalizeFloat16Subnormal along the lines of the existing versions
for the other float types.
Note that we change the handling of zExp from the inline code
to match the API of the other RoundAndPackFloat functions; however
we leave the positioning of the binary point between bits 22 and 23
rather than shifting it up to the high end of the word.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tidy up the get/set accessors for the fp state to add missing ones
and make them all inline in softfloat.h rather than some inline and
some not.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The float64_to_uint32_round_to_zero routine is incorrect.
For example, the following test pattern:
425F81378DC0CD1F / 0x1.f81378dc0cd1fp+38
will erroneously set the inexact flag.
This patch re-implements the routine to use the float64_to_uint64_round_to_zero
routine. If saturation occurs we ignore any flags set by the
conversion function and raise only Invalid.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Message-id: 1387397961-4894-6-git-send-email-tommusta@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The float64_to_uint32 has several flaws:
- for numbers between 2**32 and 2**64, the inexact exception flag
may get incorrectly set. In this case, only the invalid flag
should be set.
test pattern: 425F81378DC0CD1F / 0x1.f81378dc0cd1fp+38
- for numbers between 2**63 and 2**64, incorrect results may
be produced:
test pattern: 43EAAF73F1F0B8BD / 0x1.aaf73f1f0b8bdp+63
This patch re-implements float64_to_uint32 to re-use the
float64_to_uint64 routine (instead of float64_to_int64). For the
saturation case, we ignore any flags which the conversion routine
has set and raise only the invalid flag.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Message-id: 1387397961-4894-5-git-send-email-tommusta@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The float64_to_uint64_round_to_zero routine is incorrect.
For example, the following test pattern:
46697351FF4AEC29 / 0x1.97351ff4aec29p+103
currently produces 8000000000000000 instead of FFFFFFFFFFFFFFFF.
This patch re-implements the routine to temporarily force the
rounding mode and use the float64_to_uint64 routine.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Message-id: 1387397961-4894-4-git-send-email-tommusta@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
This patch adds the float32_to_uint64() routine, which converts a
32-bit floating point number to an unsigned 64 bit number.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: removed harmless but silly int64_t casts]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
If the input to float*_scalbn() is denormal then it represents
a number 0.[mantissabits] * 2^(1-exponentbias) (and the actual
exponent field is all zeroes). This means that when we convert
it to our unpacked encoding the unpacked exponent must be one
greater than for a normal number, which represents
1.[mantissabits] * 2^(e-exponentbias) for an exponent field e.
This meant we were giving answers too small by a factor of 2 for
all denormal inputs.
Note that the float-to-int routines also have this behaviour
of not adjusting the exponent for denormals; however there it is
harmless because denormals will all convert to integer zero anyway.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
We implement a number of float-to-integer conversions using conversion
to an integer type with a wider range and then a check against the
narrower range we are actually converting to. If we find the result to
be out of range we correctly raise the Invalid exception, but we must
also suppress other exceptions which might have been raised by the
conversion function we called.
This won't throw away exceptions we should have preserved, because for
the 'core' exception flags the IEEE spec mandates that the only valid
combinations of exception that can be raised by a single operation are
Inexact + Overflow and Inexact + Underflow. For the non-IEEE softfloat
flag for input denormals, we can guarantee that that flag won't have
been set for out of range float-to-int conversions because a squashed
denormal by definition goes to plus or minus zero, which is always in
range after conversion to integer zero.
This bug has been fixed for some of the float-to-int conversion routines
by previous patches; fix it for the remaining functions as well, so
that they all restore the pre-conversion status flags prior to raising
Invalid.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The comment preceding the float64_to_uint64 routine suggests that
the implementation is broken. And this is, indeed, the case.
This patch properly implements the conversion of a 64-bit floating
point number to an unsigned, 64 bit integer.
This contribution can be licensed under either the softfloat-2a or -2b
license.
Signed-off-by: Tom Musta <tommusta@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Currently the int-to-float functions take types which are specified
as "at least X bits wide", rather than "exactly X bits wide". This is
confusing and unhelpful since it means that the callers have to include
an explicit cast to [u]intXX_t to ensure the correct behaviour. Fix
them all to take the exactly-X-bits-wide types instead.
Note that this doesn't change behaviour at all since at the moment
we happen to define the 'int32' and 'uint32' types as exactly 32 bits
wide, and the 'int64' and 'uint64' types as exactly 64 bits wide.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
ARMv8 requires support for converting 32 and 64bit floating point
values to signed and unsigned 16bit integers.
Signed-off-by: Will Newton <will.newton@linaro.org>
[PMM: updated not to incorrectly set Inexact for Invalid inputs]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Our float32 to float16 conversion routine was generating the correct
numerical answers, but not always setting the right set of exception
flags. Fix this, mostly by rearranging the code to more closely
resemble RoundAndPackFloat*, and in particular:
* non-IEEE halfprec always raises Invalid for input NaNs
* we need to check for the overflow case before underflow
* we weren't getting the tininess-detected-after-rounding
case correct (somewhat academic since only ARM uses halfprec
and it is always tininess-detected-before-rounding)
* non-IEEE halfprec overflow raises only Invalid, not
Invalid + Inexact
* we weren't setting Inexact when we should
Also add some clarifying comments about what the code is doing.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Add floatnn_minnum() and floatnn_maxnum() functions which are equivalent
to the minNum() and maxNum() functions from IEEE 754-2008. They are
similar to min() and max() but differ in the handling of QNaN arguments.
Signed-off-by: Will Newton <will.newton@linaro.org>
Message-id: 1386158099-9239-5-git-send-email-will.newton@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The nan_exp argument is not used, so remove it.
Signed-off-by: Will Newton <will.newton@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1386158099-9239-4-git-send-email-will.newton@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>