diff mbox

softfloat: roundAndPackFloat16 should return FPInfinity and FPMaxNormal based on overflow_to_inf.

Message ID 1423493767-3955-1-git-send-email-libhu.so@gmail.com
State New
Headers show

Commit Message

Xiangyu Hu Feb. 9, 2015, 2:56 p.m. UTC
Flag overflow_to_inf is specified by ARM manual to decide if
FPInfinity(sign) or FPMaxNormal(sign) is returned. Current
codepath fails to check it.
Fixed by adding overflow_to_inf flag to return infinity when it's
true and maxnormal otherwise.

Signed-off-by: Xiangyu Hu <libhu.so@gmail.com>
---
 fpu/softfloat.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Peter Maydell Feb. 11, 2015, 1:23 p.m. UTC | #1
On 9 February 2015 at 14:56, Xiangyu Hu <libhu.so@gmail.com> wrote:
> Flag overflow_to_inf is specified by ARM manual to decide if
> FPInfinity(sign) or FPMaxNormal(sign) is returned. Current
> codepath fails to check it.
> Fixed by adding overflow_to_inf flag to return infinity when it's
> true and maxnormal otherwise.

I agree that this is wrong, but I think we want to fix it by
making roundAndPackFloat16 work more similarly to the
roundAndPackFloat32 and 64 functions...

> Signed-off-by: Xiangyu Hu <libhu.so@gmail.com>
> ---
>  fpu/softfloat.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index f1170fe..409a574 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -3377,6 +3377,7 @@ static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
>      uint32_t increment;
>      bool rounding_bumps_exp;
>      bool is_tiny = false;
> +    bool overflow_to_inf = false;
>
>      /* Calculate the mask of bits of the mantissa which are not
>       * representable in half-precision and will be lost.
> @@ -3398,18 +3399,23 @@ static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
>          if ((zSig & mask) == increment) {
>              increment = zSig & (increment << 1);
>          }

1) we move this fiddling with increment down in the function
to the point just before where we add increment to zSig

> +        overflow_to_inf = true;
>          break;
>      case float_round_ties_away:
>          increment = (mask + 1) >> 1;
> +        overflow_to_inf = true;
>          break;
>      case float_round_up:
>          increment = zSign ? 0 : mask;
> +        overflow_to_inf = (zSign == 0);
>          break;
>      case float_round_down:
>          increment = zSign ? mask : 0;
> +        overflow_to_inf = (zSign == 1);
>          break;
>      default: /* round_to_zero */
>          increment = 0;
> +        overflow_to_inf = false;
>          break;
>      }

2) which means we don't need an extra flag, because now
"do we overflow to inf?" is always "is increment non-zero?"

> @@ -3418,7 +3424,11 @@ static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
>      if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) {
>          if (ieee) {
>              float_raise(float_flag_overflow | float_flag_inexact, status);
> -            return packFloat16(zSign, 0x1f, 0);
> +            if (overflow_to_inf) {
> +                return packFloat16(zSign, 0x1f, 0);
> +            } else {
> +                return packFloat16(zSign, 0x1e, 0x3ff);
> +            }

3) and so you can just say

   return packFloat16(zSign, 0x1f, -(increment == 0));

This is slightly tricky (we rely on packFloat16() adding
the significand argument into the value it builds up,
such that passing exponent X and significand -1 results
in a packed value with exponent X-1 and significand all-ones.)
But it's consistent with the way we handle this case in the
other roundAndPackFloat* functions, so I prefer it.

Deferring the adjustment of increment in the round_nearest_even
case looks at first glance like it might affect other cases,
because we calculate rounding_bumps_exp from increment. But
in fact rounding_bumps_exp will always have the same value
either way, if you work through the different possible
zSig values.

thanks
-- PMM
Xiangyu Hu Feb. 12, 2015, 6:25 a.m. UTC | #2
On Wed, Feb 11, 2015 at 5:23 AM, Peter Maydell <peter.maydell@linaro.org>
wrote:

> On 9 February 2015 at 14:56, Xiangyu Hu <libhu.so@gmail.com> wrote:
> > Flag overflow_to_inf is specified by ARM manual to decide if
> > FPInfinity(sign) or FPMaxNormal(sign) is returned. Current
> > codepath fails to check it.
> > Fixed by adding overflow_to_inf flag to return infinity when it's
> > true and maxnormal otherwise.
>
> I agree that this is wrong, but I think we want to fix it by
> making roundAndPackFloat16 work more similarly to the
> roundAndPackFloat32 and 64 functions...
>

Yes. Actually, a lot of single/double_to_half instructions would lead to
inconsistent errors
when calling roundAndPackFloat16. This fix on FPMaxNormal/FPInfinity
handling
only recover some of them. Failure to justify "round_up" flag described in
the manual
exposed other bugs.

I had another implementation that completely followed manual's FPRound(),
which
brought no failed cases at all.

I wonder where current algrithm of roundAndPackFloat16 comes from?


>
> > Signed-off-by: Xiangyu Hu <libhu.so@gmail.com>
> > ---
> >  fpu/softfloat.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> > index f1170fe..409a574 100644
> > --- a/fpu/softfloat.c
> > +++ b/fpu/softfloat.c
> > @@ -3377,6 +3377,7 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >      uint32_t increment;
> >      bool rounding_bumps_exp;
> >      bool is_tiny = false;
> > +    bool overflow_to_inf = false;
> >
> >      /* Calculate the mask of bits of the mantissa which are not
> >       * representable in half-precision and will be lost.
> > @@ -3398,18 +3399,23 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >          if ((zSig & mask) == increment) {
> >              increment = zSig & (increment << 1);
> >          }
>
> 1) we move this fiddling with increment down in the function
> to the point just before where we add increment to zSig
>
> > +        overflow_to_inf = true;
> >          break;
> >      case float_round_ties_away:
> >          increment = (mask + 1) >> 1;
> > +        overflow_to_inf = true;
> >          break;
> >      case float_round_up:
> >          increment = zSign ? 0 : mask;
> > +        overflow_to_inf = (zSign == 0);
> >          break;
> >      case float_round_down:
> >          increment = zSign ? mask : 0;
> > +        overflow_to_inf = (zSign == 1);
> >          break;
> >      default: /* round_to_zero */
> >          increment = 0;
> > +        overflow_to_inf = false;
> >          break;
> >      }
>
> 2) which means we don't need an extra flag, because now
> "do we overflow to inf?" is always "is increment non-zero?"
>

Yes... my original thought it was not meaningful as the manual and
added the overflow_to_inf flag.


> > @@ -3418,7 +3424,11 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >      if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) {
> >          if (ieee) {
> >              float_raise(float_flag_overflow | float_flag_inexact,
> status);
> > -            return packFloat16(zSign, 0x1f, 0);
> > +            if (overflow_to_inf) {
> > +                return packFloat16(zSign, 0x1f, 0);
> > +            } else {
> > +                return packFloat16(zSign, 0x1e, 0x3ff);
> > +            }
>
> 3) and so you can just say
>
>    return packFloat16(zSign, 0x1f, -(increment == 0));


> This is slightly tricky (we rely on packFloat16() adding
> the significand argument into the value it builds up,
> such that passing exponent X and significand -1 results
> in a packed value with exponent X-1 and significand all-ones.)
> But it's consistent with the way we handle this case in the
> other roundAndPackFloat* functions, so I prefer it.
>

That's a nice trick!


>
> Deferring the adjustment of increment in the round_nearest_even
> case looks at first glance like it might affect other cases,
> because we calculate rounding_bumps_exp from increment. But
> in fact rounding_bumps_exp will always have the same value
> either way, if you work through the different possible
> zSig values.
>

Again, I am not quite following the whole implemented algorithm of
this roundAndPackFloat16; it cannot align with roundAndPackFloat[32|64]
which originates softfloat(John Hauser).


>
> thanks
> -- PMM
>

Thanks!

- xiangyu
Xiangyu Hu Feb. 12, 2015, 6:29 a.m. UTC | #3
On Wed, Feb 11, 2015 at 9:23 PM, Peter Maydell <peter.maydell@linaro.org>
wrote:

> On 9 February 2015 at 14:56, Xiangyu Hu <libhu.so@gmail.com> wrote:
> > Flag overflow_to_inf is specified by ARM manual to decide if
> > FPInfinity(sign) or FPMaxNormal(sign) is returned. Current
> > codepath fails to check it.
> > Fixed by adding overflow_to_inf flag to return infinity when it's
> > true and maxnormal otherwise.
>
> I agree that this is wrong, but I think we want to fix it by
> making roundAndPackFloat16 work more similarly to the
> roundAndPackFloat32 and 64 functions...
>

​
Yes. Actually, a lot of single/double_to_half instructions would lead to
inconsistent errors
when calling roundAndPackFloat16. This fix on FPMaxNormal/FPInfinity
handling
only recover some of them. Failure to justify "round_up" flag described in
the manual
exposed other bugs.

I had another implementation that completely followed manual's FPRound(),
which
brought no failed cases at all.

I wonder where current algrithm of roundAndPackFloat16 comes from?​



> > Signed-off-by: Xiangyu Hu <libhu.so@gmail.com>
> > ---
> >  fpu/softfloat.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> > index f1170fe..409a574 100644
> > --- a/fpu/softfloat.c
> > +++ b/fpu/softfloat.c
> > @@ -3377,6 +3377,7 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >      uint32_t increment;
> >      bool rounding_bumps_exp;
> >      bool is_tiny = false;
> > +    bool overflow_to_inf = false;
> >
> >      /* Calculate the mask of bits of the mantissa which are not
> >       * representable in half-precision and will be lost.
> > @@ -3398,18 +3399,23 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >          if ((zSig & mask) == increment) {
> >              increment = zSig & (increment << 1);
> >          }
>
> 1) we move this fiddling with increment down in the function
> to the point just before where we add increment to zSig
>
> > +        overflow_to_inf = true;
> >          break;
> >      case float_round_ties_away:
> >          increment = (mask + 1) >> 1;
> > +        overflow_to_inf = true;
> >          break;
> >      case float_round_up:
> >          increment = zSign ? 0 : mask;
> > +        overflow_to_inf = (zSign == 0);
> >          break;
> >      case float_round_down:
> >          increment = zSign ? mask : 0;
> > +        overflow_to_inf = (zSign == 1);
> >          break;
> >      default: /* round_to_zero */
> >          increment = 0;
> > +        overflow_to_inf = false;
> >          break;
> >      }
>
> 2) which means we don't need an extra flag, because now
> "do we overflow to inf?" is always "is increment non-zero?"
>

​
Yes... my original thought it was not meaningful as the manual and
added the overflow_to_inf flag.
​


>
> > @@ -3418,7 +3424,11 @@ static float32 roundAndPackFloat16(flag zSign,
> int_fast16_t zExp,
> >      if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) {
> >          if (ieee) {
> >              float_raise(float_flag_overflow | float_flag_inexact,
> status);
> > -            return packFloat16(zSign, 0x1f, 0);
> > +            if (overflow_to_inf) {
> > +                return packFloat16(zSign, 0x1f, 0);
> > +            } else {
> > +                return packFloat16(zSign, 0x1e, 0x3ff);
> > +            }
>
> 3) and so you can just say
>
>    return packFloat16(zSign, 0x1f, -(increment == 0));
>

​That's a nice trick!
​


>
> This is slightly tricky (we rely on packFloat16() adding
> the significand argument into the value it builds up,
> such that passing exponent X and significand -1 results
> in a packed value with exponent X-1 and significand all-ones.)
> But it's consistent with the way we handle this case in the
> other roundAndPackFloat* functions, so I prefer it.
>
> Deferring the adjustment of increment in the round_nearest_even
> case looks at first glance like it might affect other cases,
> because we calculate rounding_bumps_exp from increment. But
> in fact rounding_bumps_exp will always have the same value
> either way, if you work through the different possible
> zSig values.
>

​
Again, I am not quite following the whole implemented algorithm of
this roundAndPackFloat16; it cannot align with roundAndPackFloat[32|64]
which originates softfloat(John Hauser).
​


>
> thanks
> -- PMM
>

​Thanks!
- xiangyu​
Peter Maydell Feb. 12, 2015, 6:32 a.m. UTC | #4
On 12 February 2015 at 06:25, Xiangyu Hu <xiangyu.a.hu@gmail.com> wrote:
> On Wed, Feb 11, 2015 at 5:23 AM, Peter Maydell <peter.maydell@linaro.org>
> wrote:
>>
>> On 9 February 2015 at 14:56, Xiangyu Hu <libhu.so@gmail.com> wrote:
>> > Flag overflow_to_inf is specified by ARM manual to decide if
>> > FPInfinity(sign) or FPMaxNormal(sign) is returned. Current
>> > codepath fails to check it.
>> > Fixed by adding overflow_to_inf flag to return infinity when it's
>> > true and maxnormal otherwise.
>>
>> I agree that this is wrong, but I think we want to fix it by
>> making roundAndPackFloat16 work more similarly to the
>> roundAndPackFloat32 and 64 functions...
>
>
> Yes. Actually, a lot of single/double_to_half instructions would lead to
> inconsistent errors
> when calling roundAndPackFloat16. This fix on FPMaxNormal/FPInfinity
> handling
> only recover some of them. Failure to justify "round_up" flag described in
> the manual
> exposed other bugs.

I'm not surprised there are other bugs lurking here.

> I had another implementation that completely followed manual's FPRound(),
> which
> brought no failed cases at all.

I'd prefer to fix the bugs we have rather than totally
reimplement, though (especially since the ARM ARM pseudocode
tends to assume things like infinite-precision floating
point types; it's not always a great model to copy).

> I wonder where current algrithm of roundAndPackFloat16 comes from?

It was added ages ago as part of supporting the ARM 16-bit
fp types; upstream softfloat doesn't have any 16-bit fp
support.

-- PMM
diff mbox

Patch

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f1170fe..409a574 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3377,6 +3377,7 @@  static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
     uint32_t increment;
     bool rounding_bumps_exp;
     bool is_tiny = false;
+    bool overflow_to_inf = false;
 
     /* Calculate the mask of bits of the mantissa which are not
      * representable in half-precision and will be lost.
@@ -3398,18 +3399,23 @@  static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
         if ((zSig & mask) == increment) {
             increment = zSig & (increment << 1);
         }
+        overflow_to_inf = true;
         break;
     case float_round_ties_away:
         increment = (mask + 1) >> 1;
+        overflow_to_inf = true;
         break;
     case float_round_up:
         increment = zSign ? 0 : mask;
+        overflow_to_inf = (zSign == 0);
         break;
     case float_round_down:
         increment = zSign ? mask : 0;
+        overflow_to_inf = (zSign == 1);
         break;
     default: /* round_to_zero */
         increment = 0;
+        overflow_to_inf = false;
         break;
     }
 
@@ -3418,7 +3424,11 @@  static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
     if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) {
         if (ieee) {
             float_raise(float_flag_overflow | float_flag_inexact, status);
-            return packFloat16(zSign, 0x1f, 0);
+            if (overflow_to_inf) {
+                return packFloat16(zSign, 0x1f, 0);
+            } else {
+                return packFloat16(zSign, 0x1e, 0x3ff);
+            }
         } else {
             float_raise(float_flag_invalid, status);
             return packFloat16(zSign, 0x1f, 0x3ff);