Patchwork [V4,02/22] softfloat: Add float32_to_uint64()

Submitter Tom Musta
Date Dec. 18, 2013, 8:19 p.m.
Message ID <1387397961-4894-3-git-send-email-tommusta@gmail.com>
Permalink /patch/303093/
State New

Comments

Tom Musta - Dec. 18, 2013, 8:19 p.m.
This patch adds the float32_to_uint64() routine, which converts a
32-bit floating point number to an unsigned 64 bit number.

This contribution can be licensed under either the softfloat-2a or -2b
license.

V2: Reduced patch to just this single routine per feedback from Peter
Maydell.

V4: Now passing sign to roundAndPackUint64()

Signed-off-by: Tom Musta <tommusta@gmail.com>
---
 fpu/softfloat.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/fpu/softfloat.h |    1 +
 2 files changed, 46 insertions(+), 0 deletions(-)

Peter Maydell - Dec. 19, 2013, 9:31 p.m.
On 18 December 2013 20:19, Tom Musta <tommusta@gmail.com> wrote:
> This patch adds the float32_to_uint64() routine, which converts a
> 32-bit floating point number to an unsigned 64 bit number.
>
> This contribution can be licensed under either the softfloat-2a or -2b
> license.
>
> V2: Reduced patch to just this single routine per feedback from Peter
> Maydell.
>
> V4: Now passing sign to roundAndPackUint64()
>
> Signed-off-by: Tom Musta <tommusta@gmail.com>
> ---
>  fpu/softfloat.c         |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  include/fpu/softfloat.h |    1 +
>  2 files changed, 46 insertions(+), 0 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index ec23908..1ff59d0 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1558,6 +1558,51 @@ int64 float32_to_int64( float32 a STATUS_PARAM )
>
>  /*----------------------------------------------------------------------------
>  | Returns the result of converting the single-precision floating-point value
> +| `a' to the 64-bit unsigned integer format.  The conversion is
> +| performed according to the IEC/IEEE Standard for Binary Floating-Point
> +| Arithmetic---which means in particular that the conversion is rounded
> +| according to the current rounding mode.  If `a' is a NaN, the largest
> +| unsigned integer is returned.  Otherwise, if the conversion overflows, the
> +| largest unsigned integer is returned.  If the 'a' is negative, zero is
> +| returned.
> +*----------------------------------------------------------------------------*/
> +
> +uint64 float32_to_uint64(float32 a STATUS_PARAM)
> +{
> +    flag aSign;
> +    int_fast16_t aExp, shiftCount;
> +    uint32_t aSig;
> +    uint64_t aSig64, aSigExtra;
> +    a = float32_squash_input_denormal(a STATUS_VAR);
> +
> +    aSig = extractFloat32Frac(a);
> +    aExp = extractFloat32Exp(a);
> +    aSign = extractFloat32Sign(a);
> +    if (aSign) {
> +        if (aExp) {
> +            float_raise(float_flag_invalid STATUS_VAR);

NaNs with the sign bit set will wind up in this case and return 0
rather than largest-unsigned-integer.

Also it seems like this code says "negative inputs return
zero if they're denormal or signal Invalid and return 0
if they're not". Are you sure this does the right thing for
(a) values which are not denormal but are close enough
to zero to round to it and (b) different rounding modes?

> +        } else if (aSig) { /* negative denormalized */
> +            float_raise(float_flag_inexact STATUS_VAR);
> +        }
> +        return 0;
> +    }
> +    shiftCount = 0xBE - aExp;
> +    if (aExp) {
> +        aSig |= 0x00800000;
> +    }
> +    if (shiftCount < 0) {
> +        float_raise(float_flag_invalid STATUS_VAR);
> +        return (int64_t)LIT64(0xFFFFFFFFFFFFFFFF);
> +    }
> +
> +    aSig64 = aSig;
> +    aSig64 <<= 40;
> +    shift64ExtraRightJamming(aSig64, 0, shiftCount, &aSig64, &aSigExtra);
> +    return roundAndPackUint64(aSign, aSig64, aSigExtra STATUS_VAR);
> +}

thanks
-- PMM

Tom Musta - Dec. 20, 2013, 8:07 p.m.
On 12/19/2013 3:31 PM, Peter Maydell wrote:
> 
> NaNs with the sign bit set will wind up in this case and return 0
> rather than largest-unsigned-integer.
> 
> Also it seems like this code says "negative inputs return
> zero if they're denormal or signal Invalid and return 0
> if they're not". Are you sure this does the right thing for
> (a) values which are not denormal but are close enough
> to zero to round to it and (b) different rounding modes?
> 

Peter:  I agree ... this still isn't quite right.
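
The two objections raised in the review can be reproduced outside QEMU. What follows is a minimal standalone sketch, not QEMU code: it models only the `if (aSign)` branch of the V4 patch on raw IEEE 754 binary32 bit patterns. The function name `v4_negative_branch` and the `FLAG_*` constants are hypothetical stand-ins for the softfloat internals, and the caller is assumed to pass a pattern with the sign bit set.

```c
#include <stdint.h>

/* Hypothetical stand-ins for float_flag_invalid / float_flag_inexact. */
enum { FLAG_INVALID = 1, FLAG_INEXACT = 2 };

/*
 * Models the negative-input branch of the V4 patch. The caller is
 * assumed to pass a binary32 bit pattern whose sign bit is set.
 * Returns the value the branch would return and stores the flags
 * it would raise in *flags.
 */
static uint64_t v4_negative_branch(uint32_t bits, int *flags)
{
    uint32_t frac = bits & 0x007FFFFFu;      /* 23-bit fraction field */
    int exp = (int)((bits >> 23) & 0xFFu);   /* 8-bit biased exponent */

    *flags = 0;
    if (exp) {
        /* Every normal number, infinity and NaN lands here. */
        *flags |= FLAG_INVALID;
    } else if (frac) {
        /* Negative denormal. */
        *flags |= FLAG_INEXACT;
    }
    return 0;
}
```

With this model, `v4_negative_branch(0xFFC00000u, &flags)` (a NaN with the sign bit set) returns 0, whereas the patch's own header comment promises the largest unsigned integer for NaN inputs. `v4_negative_branch(0xBE800000u, &flags)` (the value -0.25) sets only `FLAG_INVALID`, although under round-to-nearest-even -0.25 rounds to zero, which is representable and would normally be reported as inexact rather than invalid. Under round-down the same input rounds to -1, so whether Invalid is correct depends on the rounding mode, which this branch never consults.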

Patch

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ec23908..1ff59d0 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1558,6 +1558,51 @@  int64 float32_to_int64( float32 a STATUS_PARAM )
 
 /*----------------------------------------------------------------------------
 | Returns the result of converting the single-precision floating-point value
+| `a' to the 64-bit unsigned integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| unsigned integer is returned.  Otherwise, if the conversion overflows, the
+| largest unsigned integer is returned.  If the 'a' is negative, zero is
+| returned.
+*----------------------------------------------------------------------------*/
+
+uint64 float32_to_uint64(float32 a STATUS_PARAM)
+{
+    flag aSign;
+    int_fast16_t aExp, shiftCount;
+    uint32_t aSig;
+    uint64_t aSig64, aSigExtra;
+    a = float32_squash_input_denormal(a STATUS_VAR);
+
+    aSig = extractFloat32Frac(a);
+    aExp = extractFloat32Exp(a);
+    aSign = extractFloat32Sign(a);
+    if (aSign) {
+        if (aExp) {
+            float_raise(float_flag_invalid STATUS_VAR);
+        } else if (aSig) { /* negative denormalized */
+            float_raise(float_flag_inexact STATUS_VAR);
+        }
+        return 0;
+    }
+    shiftCount = 0xBE - aExp;
+    if (aExp) {
+        aSig |= 0x00800000;
+    }
+    if (shiftCount < 0) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return (int64_t)LIT64(0xFFFFFFFFFFFFFFFF);
+    }
+
+    aSig64 = aSig;
+    aSig64 <<= 40;
+    shift64ExtraRightJamming(aSig64, 0, shiftCount, &aSig64, &aSigExtra);
+    return roundAndPackUint64(aSign, aSig64, aSigExtra STATUS_VAR);
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of converting the single-precision floating-point value
 | `a' to the 64-bit two's complement integer format.  The conversion is
 | performed according to the IEC/IEEE Standard for Binary Floating-Point
 | Arithmetic, except that the conversion is always rounded toward zero.  If
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 2365274..080b36d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -272,6 +272,7 @@  int32 float32_to_int32_round_to_zero( float32 STATUS_PARAM );
 uint32 float32_to_uint32( float32 STATUS_PARAM );
 uint32 float32_to_uint32_round_to_zero( float32 STATUS_PARAM );
 int64 float32_to_int64( float32 STATUS_PARAM );
+uint64 float32_to_uint64(float32 STATUS_PARAM);
 int64 float32_to_int64_round_to_zero( float32 STATUS_PARAM );
 float64 float32_to_float64( float32 STATUS_PARAM );
 floatx80 float32_to_floatx80( float32 STATUS_PARAM );
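
The shift arithmetic on the non-negative path of float32_to_uint64() may be easier to follow with a worked example. The constant 0xBE is 127 + 63, the biased exponent of 2^63, and shifting the 24-bit significand left by 40 puts its leading 1 at bit 63. The sketch below is a simplified standalone model, not the patch itself: the helper name `pos_f32_bits_to_u64_trunc` is hypothetical, it takes a raw binary32 bit pattern with the sign bit clear, and it truncates the fraction instead of applying the current rounding mode as roundAndPackUint64() does.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of softfloat. Converts a non-negative
 * binary32 bit pattern to uint64_t, discarding any fractional part.
 * Sets *overflow and returns UINT64_MAX when the biased exponent
 * exceeds 0xBE, i.e. for values >= 2^64, infinity and NaN.
 */
static uint64_t pos_f32_bits_to_u64_trunc(uint32_t bits, int *overflow)
{
    uint32_t sig = bits & 0x007FFFFFu;       /* 23-bit fraction field */
    int exp = (int)((bits >> 23) & 0xFFu);   /* 8-bit biased exponent */
    int shift = 0xBE - exp;                  /* 0xBE = 127 + 63 */
    uint64_t sig64;

    *overflow = 0;
    if (exp) {
        sig |= 0x00800000u;                  /* implicit leading 1 */
    }
    if (shift < 0) {
        *overflow = 1;
        return UINT64_MAX;
    }
    sig64 = (uint64_t)sig << 40;             /* leading 1 now at bit 63 */
    if (shift >= 64) {
        return 0;                            /* value is below 1.0 */
    }
    return sig64 >> shift;
}
```

For 1.0 (0x3F800000) the biased exponent is 0x7F, so the shift is 63 and the result is 1. For 2^63 (0x5F000000) the shift is 0 and the significand is returned as is. For 2^64 (0x5F800000) the shift is negative and the overflow path is taken. The explicit `shift >= 64` check is needed in plain C because shifting a 64-bit value by 64 or more is undefined; the patch leaves that case to shift64ExtraRightJamming().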