Patchwork [08/19] target-alpha: use new float64_unordered() function

login
register
mail settings
Submitter Aurelien Jarno
Date April 12, 2011, 9:59 p.m.
Message ID <1302645571-20500-9-git-send-email-aurelien@aurel32.net>
Download mbox | patch
Permalink /patch/90924/
State New
Headers show

Comments

Aurelien Jarno - April 12, 2011, 9:59 p.m.
Use float64_unordered() in helper_cmptun() instead of doing the
the comparison manually. This also fixes the wrong behaviours with
sNaNs.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-alpha/op_helper.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
Peter Maydell - April 13, 2011, 2:52 p.m.
On 12 April 2011 22:59, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Use float64_unordered() in helper_cmptun() instead of doing the
> the comparison manually. This also fixes the wrong behaviours with
> sNaNs.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target-alpha/op_helper.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/target-alpha/op_helper.c b/target-alpha/op_helper.c
> index 6c2ae20..e07ae69 100644
> --- a/target-alpha/op_helper.c
> +++ b/target-alpha/op_helper.c
> @@ -904,10 +904,11 @@ uint64_t helper_cmptun (uint64_t a, uint64_t b)
>     fa = t_to_float64(a);
>     fb = t_to_float64(b);
>
> -    if (float64_is_quiet_nan(fa) || float64_is_quiet_nan(fb))
> +    if (float64_unordered(fa, fb, &FP_STATUS)) {
>         return 0x4000000000000000ULL;
> -    else
> +    } else {
>         return 0;
> +    }
>  }

I'm not sure this is right. The Alpha Architecture Handbook is a
bit opaque, but the Compiler Writer's Guide is clearer:
www.compaq.com/cpq-alphaserver/technology/literature/cmpwrgd.pdf
page D-4 says that CMPTUN (like CMPTEQ) should generate InvalidOp
for SNaNs but not for QNaNs. (Contrast CMPTLT, CMPTLE, which set
InvalidOp for both SNaNs and QNaNs.)

So I think you want the _quiet version here. (And helper_cmpteq
needs to use float64_eq_quiet rather than float64_eq.)

-- PMM
Richard Henderson - April 13, 2011, 3:38 p.m.
[ Odd, the original thread doesn't seem to have arrived here. ]

On 04/13/2011 07:52 AM, Peter Maydell wrote:
> So I think you want the _quiet version here. (And helper_cmpteq
> needs to use float64_eq_quiet rather than float64_eq.)

Yes, the _quiet version is what's needed for all comparisons.

For the record, the goal for QEMU should not be to emulate the
hardware exactly, but to emulate the entire HW+OS system.  Thus
when looking at Table B-2 one should look at the OS Completion
Handler and User Signal Handler columns.

That's for /s qualified instructions anyway.  For non-ieee code
we take care to inject the extra traps via helper_ieee_input*.

(It looks like some of the Alpha code can be cleaned up a bit.
I don't recall flush_inputs_to_zero option being there before,
and we do that by hand in helper_ieee_input*.)


r~
Peter Maydell - April 13, 2011, 3:42 p.m.
On 13 April 2011 16:38, Richard Henderson <rth@twiddle.net> wrote:
> [ Odd, the original thread doesn't seem to have arrived here. ]
>
> On 04/13/2011 07:52 AM, Peter Maydell wrote:
>> So I think you want the _quiet version here. (And helper_cmpteq
>> needs to use float64_eq_quiet rather than float64_eq.)
>
> Yes, the _quiet version is what's needed for all comparisons.

Really all comparisons, including CMPTLT, CMPTLE?

> (It looks like some of the Alpha code can be cleaned up a bit.
> I don't recall flush_inputs_to_zero option being there before,
> and we do that by hand in helper_ieee_input*.)

Yes, I added that softfloat feature for the benefit of ARM's
flush-to-zero mode.

-- PMM
Richard Henderson - April 13, 2011, 3:53 p.m.
On 04/13/2011 08:42 AM, Peter Maydell wrote:
> On 13 April 2011 16:38, Richard Henderson <rth@twiddle.net> wrote:
>> [ Odd, the original thread doesn't seem to have arrived here. ]
>>
>> On 04/13/2011 07:52 AM, Peter Maydell wrote:
>>> So I think you want the _quiet version here. (And helper_cmpteq
>>> needs to use float64_eq_quiet rather than float64_eq.)
>>
>> Yes, the _quiet version is what's needed for all comparisons.
> 
> Really all comparisons, including CMPTLT, CMPTLE?

Oops, no.  CMPTLE and CMPTLT in Table B-2 are on the next page,
and clearly indicate that they signal InvalidOP for QNaN.

So it's just CMPTUN and CMPTEQ that should not signal on QNaN.


r~
Aurelien Jarno - April 13, 2011, 6:18 p.m.
On Wed, Apr 13, 2011 at 08:53:27AM -0700, Richard Henderson wrote:
> On 04/13/2011 08:42 AM, Peter Maydell wrote:
> > On 13 April 2011 16:38, Richard Henderson <rth@twiddle.net> wrote:
> >> [ Odd, the original thread doesn't seem to have arrived here. ]
> >>
> >> On 04/13/2011 07:52 AM, Peter Maydell wrote:
> >>> So I think you want the _quiet version here. (And helper_cmpteq
> >>> needs to use float64_eq_quiet rather than float64_eq.)
> >>
> >> Yes, the _quiet version is what's needed for all comparisons.
> > 
> > Really all comparisons, including CMPTLT, CMPTLE?
> 
> Oops, no.  CMPTLE and CMPTLT in Table B-2 are on the next page,
> and clearly indicate that they signal InvalidOP for QNaN.
> 
> So it's just CMPTUN and CMPTEQ that should not signal on QNaN.
> 

Thanks to both of you for digging into the manuals, the information I
have found in the manual I have were not so clear. I'll fix that in the
next version.
Peter Maydell - April 14, 2011, 9:14 a.m.
On 13 April 2011 16:38, Richard Henderson <rth@twiddle.net> wrote:
> (It looks like some of the Alpha code can be cleaned up a bit.
> I don't recall flush_inputs_to_zero option being there before,
> and we do that by hand in helper_ieee_input*.)

While we're on the subject of Alpha and flush-to-zero modes,
do you know what exception bits should get set when Alpha
flushes a denormal output (not input) to zero?

At the moment softfloat doesn't set any flags when it does
this, which is definitely wrong for ARM. I had a look at
the manuals for the other archs which might set flush_to_zero:

 SH: should set underflow flag
 ARM: should set underflow flag
 PPC: should set underflow (I think)
 MIPS: docs didn't say

I think Alpha should set Inexact and not Underflow, but
I'm not sure -- can you confirm/deny?

(Obviously I would prefer it if every architecture just
needed to set underflow, but somehow I don't think it
will work out that cleanly :-))

thanks
-- PMM
Richard Henderson - April 14, 2011, 3:14 p.m.
On 04/14/2011 02:14 AM, Peter Maydell wrote:
> While we're on the subject of Alpha and flush-to-zero modes,
> do you know what exception bits should get set when Alpha
> flushes a denormal output (not input) to zero?
...
> I think Alpha should set Inexact and not Underflow, but
> I'm not sure -- can you confirm/deny?

Deny.

Page B-8, Add Sub Output Exceptions,

  Exponent underflow and disabled:
    Supply +0, no exception delivered to user.

  Exponent underflow and enabled:
    Supply +-MIN denorm, Underflow delivered to user.

    Footnote 3, Overflow and Underflow have priority over Inexact.


r~
Peter Maydell - April 14, 2011, 3:39 p.m.
On 14 April 2011 16:14, Richard Henderson <rth@twiddle.net> wrote:
> On 04/14/2011 02:14 AM, Peter Maydell wrote:
>> While we're on the subject of Alpha and flush-to-zero modes,
>> do you know what exception bits should get set when Alpha
>> flushes a denormal output (not input) to zero?
> ...
>> I think Alpha should set Inexact and not Underflow, but
>> I'm not sure -- can you confirm/deny?
>
> Deny.
>
> Page B-8, Add Sub Output Exceptions,
>
>  Exponent underflow and disabled:
>    Supply +0, no exception delivered to user.
>
>  Exponent underflow and enabled:
>    Supply +-MIN denorm, Underflow delivered to user.
>
>    Footnote 3, Overflow and Underflow have priority over Inexact.

Thanks. Does "no exception delivered to user" mean also
"and do not set FPCR bit UNF" ?

The reason I thought it might set Inexact is that I was looking
at page 4-79, which says:

"If both the UNFD (underflow disable) bit and the UNDZ (underflow
 to zero) bit are set in the FPCR, the implementation sets the
 result of an underflow operation to a true zero result. The
 zeroing of a denormal result by UNDZ must also be treated as an
 inexact result."

-- PMM
Richard Henderson - April 14, 2011, 4:27 p.m.
On 04/14/2011 08:39 AM, Peter Maydell wrote:
>>  Exponent underflow and disabled:
>>    Supply +0, no exception delivered to user.
>>
>>  Exponent underflow and enabled:
>>    Supply +-MIN denorm, Underflow delivered to user.
>>
>>    Footnote 3, Overflow and Underflow have priority over Inexact.
> 
> Thanks. Does "no exception delivered to user" mean also
> "and do not set FPCR bit UNF" ?

Yes.

> The reason I thought it might set Inexact is that I was looking
> at page 4-79, which says:
> 
> "If both the UNFD (underflow disable) bit and the UNDZ (underflow
>  to zero) bit are set in the FPCR, the implementation sets the
>  result of an underflow operation to a true zero result. The
>  zeroing of a denormal result by UNDZ must also be treated as an
>  inexact result."

Hum.  It looks like we can choose between these results then,
depending on the intersection of the FPCR disable bits, and
the per-instruction trapping mode bits (see section 4.7.7.2).

I *think* what would be best for Alpha is if, within softfloat,
both conditions are signaled, and then we can filter the result
that is actually needed via helper_fp_exc_raise?  It's hard to
say without actually doing the work...

Unfortunately, I suspect that the Correct result on real HW
also depends on the OS completion handler, and I know that at
least for Linux that code was written before UNDZ was added.
So I don't know if even real HW produces the correct result
when considering Underflow priority over Inexact.


r~
Peter Maydell - April 14, 2011, 4:48 p.m.
On 14 April 2011 17:27, Richard Henderson <rth@twiddle.net> wrote:
> On 04/14/2011 08:39 AM, Peter Maydell wrote:
>>>  Exponent underflow and disabled:
>>>    Supply +0, no exception delivered to user.
>>>
>>>  Exponent underflow and enabled:
>>>    Supply +-MIN denorm, Underflow delivered to user.
>>>
>>>    Footnote 3, Overflow and Underflow have priority over Inexact.
>>
>> Thanks. Does "no exception delivered to user" mean also
>> "and do not set FPCR bit UNF" ?
>
> Yes.
>
>> The reason I thought it might set Inexact is that I was looking
>> at page 4-79, which says:
>>
>> "If both the UNFD (underflow disable) bit and the UNDZ (underflow
>>  to zero) bit are set in the FPCR, the implementation sets the
>>  result of an underflow operation to a true zero result. The
>>  zeroing of a denormal result by UNDZ must also be treated as an
>>  inexact result."
>
> Hum.  It looks like we can choose between these results then,
> depending on the intersection of the FPCR disable bits, and
> the per-instruction trapping mode bits (see section 4.7.7.2).
>
> I *think* what would be best for Alpha is if, within softfloat,
> both conditions are signaled, and then we can filter the result
> that is actually needed via helper_fp_exc_raise?  It's hard to
> say without actually doing the work...

Unfortunately doing that is bad for other architectures because
they tend to let the softfloat status flags act directly as
the cumulative exception flags, so if softfloat sets flags they
don't want you've suddenly imposed a burden of saving and
restoring flags or something similarly ugly.

It sounds like maybe what we need is to have a
 signed char float_ftz_flags
or similar in float_status which should be set to the
float_flags which should be passed to float_raise() when
we flush an output denormal to zero. Then we can set that to
float_flag_underflow for ARM, leave it at 0 for other
architectures [preserve existing semantics] and set it to
float_flag_underflow | float_flag_inexact for Alpha.

Or the other approach would be to do something like
how the 'denormal-input-went-to-zero' case is handled: have
an extra float_flag for float_flag_output_denormal, and let
targets either (a) special-case it or (b) merge it with
whichever of underflow or inexact they like when constructing
the guest-visible floating point status flags. I kind of like
that better, actually.

-- PMM
Richard Henderson - April 14, 2011, 5:01 p.m.
On 04/14/2011 09:48 AM, Peter Maydell wrote:
> Or the other approach would be to do something like
> how the 'denormal-input-went-to-zero' case is handled: have
> an extra float_flag for float_flag_output_denormal, and let
> targets either (a) special-case it or (b) merge it with
> whichever of underflow or inexact they like when constructing
> the guest-visible floating point status flags. I kind of like
> that better, actually.

Either of your proposed solutions works for me.

Either allows significant freedom in the backend to decide on
how to implement the per-instruction selection.


r~

Patch

diff --git a/target-alpha/op_helper.c b/target-alpha/op_helper.c
index 6c2ae20..e07ae69 100644
--- a/target-alpha/op_helper.c
+++ b/target-alpha/op_helper.c
@@ -904,10 +904,11 @@  uint64_t helper_cmptun (uint64_t a, uint64_t b)
     fa = t_to_float64(a);
     fb = t_to_float64(b);
 
-    if (float64_is_quiet_nan(fa) || float64_is_quiet_nan(fb))
+    if (float64_unordered(fa, fb, &FP_STATUS)) {
         return 0x4000000000000000ULL;
-    else
+    } else {
         return 0;
+    }
 }
 
 uint64_t helper_cmpteq(uint64_t a, uint64_t b)