target-alpha: An approach to fp insn qualifiers

Message ID	4B27076C.6020806@twiddle.net
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> Message-ID: <4B27076C.6020806@twiddle.net> Date: Mon, 14 Dec 2009 19:50:04 -0800 From: Richard Henderson <rth@twiddle.net> User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4pre) Gecko/20090922 Fedora/3.0-3.9.b4.fc12 Thunderbird/3.0b4 MIME-Version: 1.0 To: Laurent Desnogues <laurent.desnogues@gmail.com> Subject: Re: [Qemu-devel] target-alpha: An approach to fp insn qualifiers References: <4B267DBC.8050909@twiddle.net> <761ea48b0912141211keb5bbben584d7fe76f44d78c@mail.gmail.com> <4B26D8EF.10801@twiddle.net> In-Reply-To: <4B26D8EF.10801@twiddle.net> Content-Type: multipart/mixed; boundary="------------030802040407050404070200" Cc: qemu-devel@nongnu.org Precedence: list Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Message ID

4B27076C.6020806@twiddle.net

State

New

Headers

Message-ID: <4B27076C.6020806@twiddle.net>
Date: Mon, 14 Dec 2009 19:50:04 -0800
From: Richard Henderson <rth@twiddle.net>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.1.4pre) Gecko/20090922 Fedora/3.0-3.9.b4.fc12
	Thunderbird/3.0b4
MIME-Version: 1.0
To: Laurent Desnogues <laurent.desnogues@gmail.com>
Subject: Re: [Qemu-devel] target-alpha: An approach to fp insn qualifiers
References: <4B267DBC.8050909@twiddle.net>	<761ea48b0912141211keb5bbben584d7fe76f44d78c@mail.gmail.com>
	<4B26D8EF.10801@twiddle.net>
In-Reply-To: <4B26D8EF.10801@twiddle.net>
Content-Type: multipart/mixed;
	boundary="------------030802040407050404070200"
Cc: qemu-devel@nongnu.org
Precedence: list
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Commit Message

Richard Henderson Dec. 15, 2009, 3:50 a.m. UTC

On 12/14/2009 04:31 PM, Richard Henderson wrote:
> On 12/14/2009 12:11 PM, Laurent Desnogues wrote:
>> I'll take a closer look at your patch tomorrow.
>
> For the record, I believe this finishes what I had in mind for the
> exception handling there in op_handler.c.

Hmph.  One more patch for correctness.  With this 183.equake runs 
correctly.  I couldn't remember all the hoops to get runspec.pl to work, 
to do the whole testsuite, but I did run this one by hand.

./quake-amd64: Done. Terminating the simulation.

real	0m34.943s
user	0m34.913s
sys	0m0.024s

./quake-axp: Done. Terminating the simulation.

real	33m24.105s
user	33m23.674s
sys	0m0.116s

with identical output.

r~
commit daf11ad5cd50c56d44e36e4ea334c660f8fe4c16
Author: Richard Henderson <rth@twiddle.net>
Date:   Mon Dec 14 19:46:57 2009 -0800

    target-alpha: Don't ever saturate cvttq.

    The previous patch tried allowing saturation if /S;
    that doesn't match what the kernels generate.

Comments

Laurent Desnogues Dec. 15, 2009, 11:31 a.m. UTC | #1

On Tue, Dec 15, 2009 at 4:50 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 12/14/2009 04:31 PM, Richard Henderson wrote:
>>
>> On 12/14/2009 12:11 PM, Laurent Desnogues wrote:
>>>
>>> I'll take a closer look at your patch tomorrow.
>>
>> For the record, I believe this finishes what I had in mind for the
>> exception handling there in op_handler.c.
>
> Hmph.  One more patch for correctness.  With this 183.equake runs correctly.
>  I couldn't remember all the hoops to get runspec.pl to work, to do the
> whole testsuite, but I did run this one by hand.
>
> ./quake-amd64: Done. Terminating the simulation.
>
> real    0m34.943s
> user    0m34.913s
> sys     0m0.024s
>
> ./quake-axp: Done. Terminating the simulation.
>
> real    33m24.105s
> user    33m23.674s
> sys     0m0.116s
>
> with identical output.

Well equake works me with the last set of patches you sent (that is
without using this FPU patch you just sent).  Can you confirm?


Laurent

Richard Henderson Dec. 15, 2009, 4:17 p.m. UTC | #2

On 12/15/2009 03:31 AM, Laurent Desnogues wrote:
> Well equake works me with the last set of patches you sent (that is
> without using this FPU patch you just sent).  Can you confirm?

GCC testsuite breaks with the saturating cvttq so I didn't even try.
If equake works, it just means that it doesn't do any conversions to 
unsigned long.

r~

Vince Weaver Dec. 15, 2009, 5:32 p.m. UTC | #3

On Mon, 14 Dec 2009, Richard Henderson wrote:

> On 12/14/2009 04:31 PM, Richard Henderson wrote:
> 
> Hmph.  One more patch for correctness.  With this 183.equake runs correctly.
> I couldn't remember all the hoops to get runspec.pl to work, to do the whole
> testsuite, but I did run this one by hand.

This is great!  About a year ago I had been steadily working on getting 
all of spec2k to run under alpha qemu but then other projects came up and 
I didn't have time.

I'll re-run once the patches get applied to see if any more of the 
benchmarks have issues.  My notes from before your patches showed the 
following benchmarks had issues:

perlbmk.perfect   WRONG_OUTPUT
eon.rushmeier     wrong_results
eon.kajiya        wrong_results
eon.cook          wrong_results
art.110           never finished
art.470           never finished
vpr.place         Arithmetic trap.
vpr.route         Arithmetic trap.
mesa              Arithmetic trap.
wupwise           wrong results (NaN)
lucas             Arithmetic trap.
equake            wrong results (NaN)
sixtrack          Fortran runtime error: Invalid string input in item 1
facerec           Arithmetic trap.
fma3d             MMU data fault
ammp              Arithmetic trap.
galgel            broken?
twolf             Arithmetic trap.
apsi              MMU data fault

Vince

Vince Weaver Dec. 17, 2009, 5:18 p.m. UTC | #4

On Mon, 14 Dec 2009, Richard Henderson wrote:

> On 12/14/2009 04:31 PM, Richard Henderson wrote:
> 
> Hmph.  One more patch for correctness.  With this 183.equake runs correctly.

I just finished running all of spec2k Alpha through Qemu.

With these patches installed (the 4 fpu ones, and the 5 from the earlier 
thread) on top of current Qemu, pretty much all of the floating point 
issues have been fixed!

Before, at least 19 of the spec2k benchmarks didn't run for various 
reasons.  Now only two fail, sixtrack and fma3d.  And I think those two 
are probably syscall related, though I haven't had chance to look in 
detail.

#sixtrack     	 At line 343 of file daten.f Fortran runtime error: End of file
#fma3d	     	 FMA-3D> Fatal Error Reported. Job Terminated.

So it would be great if the alpha patches could be merged.

Vince

diff --git a/target-alpha/op_helper.c b/target-alpha/op_helper.c
index d031f56..2d1c3d5 100644
--- a/target-alpha/op_helper.c
+++ b/target-alpha/op_helper.c
@@ -1220,120 +1220,106 @@  uint64_t helper_cvtqs (uint64_t a, uint32_t quals)
    is used by the compiler to get unsigned conversion for free with
    the same instruction.  */
 
-static uint64_t cvttq_noqual_internal(uint64_t a, uint32_t rounding_mode)
+static uint64_t cvttq_internal(uint64_t a)
 {
     uint64_t frac, ret = 0;
-    uint32_t exp, sign;
+    uint32_t exp, sign, exc = 0;
     int shift;
 
     sign = (a >> 63);
     exp = (uint32_t)(a >> 52) & 0x7ff;
     frac = a & 0xfffffffffffffull;
 
-    /* We already handled denormals in remap_ieee_input; infinities and
-       nans are defined to return zero as per truncation.  */
-    if (exp == 0 || exp == 0x7ff)
-        return 0;
-
-    /* Restore implicit bit.  */
-    frac |= 0x10000000000000ull;
-
-    /* Note that neither overflow exceptions nor inexact exceptions
-       are desired.  This lets us streamline the checks quite a bit.  */
-    shift = exp - 1023 - 52;
-    if (shift >= 0) {
-        /* In this case the number is so large that we must shift
-           the fraction left.  There is no rounding to do.  */
-        if (shift < 63) {
-            ret = frac << shift;
-        }
+    if (exp == 0) {
+        if (unlikely(frac != 0))
+            goto do_underflow;
+    } else if (exp == 0x7ff) {
+        if (frac == 0)
+            exc = float_flag_overflow;
+        else
+            exc = float_flag_invalid;
     } else {
-        uint64_t round;
-
-        /* In this case the number is smaller than the fraction as
-           represented by the 52 bit number.  Here we must think 
-           about rounding the result.  Handle this by shifting the
-           fractional part of the number into the high bits of ROUND.
-           This will let us efficiently handle round-to-nearest.  */
-        shift = -shift;
-        if (shift < 63) {
-            ret = frac >> shift;
-            round = frac << (64 - shift);
+        /* Restore implicit bit.  */
+        frac |= 0x10000000000000ull;
+
+        /* Note that neither overflow exceptions nor inexact exceptions
+           are desired.  This lets us streamline the checks quite a bit.  */
+        shift = exp - 1023 - 52;
+        if (shift >= 0) {
+            /* In this case the number is so large that we must shift
+               the fraction left.  There is no rounding to do.  */
+            if (shift < 63) {
+                ret = frac << shift;
+                if ((ret >> shift) != frac)
+                    exc = float_flag_overflow;
+            }
         } else {
-            /* The exponent is so small we shift out everything.
-               Leave a sticky bit for proper rounding below.  */
-            round = 1;
-        }
+            uint64_t round;
+
+            /* In this case the number is smaller than the fraction as
+               represented by the 52 bit number.  Here we must think 
+               about rounding the result.  Handle this by shifting the
+               fractional part of the number into the high bits of ROUND.
+               This will let us efficiently handle round-to-nearest.  */
+            shift = -shift;
+            if (shift < 63) {
+                ret = frac >> shift;
+                round = frac << (64 - shift);
+            } else {
+                /* The exponent is so small we shift out everything.
+                   Leave a sticky bit for proper rounding below.  */
+            do_underflow:
+                round = 1;
+            }
 
-        if (round) {
-            switch (rounding_mode) {
-            case float_round_nearest_even:
-                if (round == (1ull << 63)) {
-                    /* Remaining fraction is exactly 0.5; round to even.  */
-                    ret += (ret & 1);
-                } else if (round > (1ull << 63)) {
-                    ret += 1;
+            if (round) {
+                exc = float_flag_inexact;
+                switch (FP_STATUS.float_rounding_mode) {
+                case float_round_nearest_even:
+                    if (round == (1ull << 63)) {
+                        /* Fraction is exactly 0.5; round to even.  */
+                        ret += (ret & 1);
+                    } else if (round > (1ull << 63)) {
+                        ret += 1;
+                    }
+                    break;
+                case float_round_to_zero:
+                    break;
+                case float_round_up:
+                    if (!sign)
+                        ret += 1;
+                    break;
+                case float_round_down:
+                    if (sign)
+                        ret += 1;
+                    break;
                 }
-                break;
-            case float_round_to_zero:
-                break;
-            case float_round_up:
-                if (!sign)
-                    ret += 1;
-                break;
-            case float_round_down:
-                if (sign)
-                    ret += 1;
-                break;
             }
         }
+        if (sign)
+            ret = -ret;
     }
+    if (unlikely(exc))
+        float_raise(exc, &FP_STATUS);
 
-    if (sign)
-        ret = -ret;
     return ret;
 }
 
 uint64_t helper_cvttq (uint64_t a, uint32_t quals)
 {
     uint64_t ret;
+    uint32_t token;
 
-    a = remap_ieee_input(quals, a);
-
-    if (quals & QUAL_V) {
-        float64 fa = t_to_float64(a);
-        uint32_t token;
-
-        token = begin_fp_exception();
-        if ((quals & QUAL_RM_MASK) == QUAL_RM_C) {
-            ret = float64_to_int64_round_to_zero(fa, &FP_STATUS);
-        } else {
-            token |= begin_fp_roundmode(quals);
-            ret = float64_to_int64(fa, &FP_STATUS);
-            end_fp_roundmode(token);
-        }
-        end_fp_exception(quals, token);
-    } else {
-        uint32_t round_mode;
-
-        switch (quals & QUAL_RM_MASK) {
-        case QUAL_RM_N:
-            round_mode = float_round_nearest_even;
-            break;
-        case QUAL_RM_C:
-        default:
-            round_mode = float_round_to_zero;
-            break;
-        case QUAL_RM_M:
-            round_mode = float_round_down;
-            break;
-        case QUAL_RM_D:
-            round_mode = FP_STATUS.float_rounding_mode;
-            break;
-        }
+    /* ??? There's an arugument to be made that when /S is enabled, we
+       should provide the standard IEEE saturated result, instead of
+       the truncated result that we *must* provide when /V is disabled.
+       However, that's not how either the Tru64 or Linux completion
+       handlers actually work, and GCC knows it.  */
 
-        ret = cvttq_noqual_internal(a, round_mode);
-    }
+    token = begin_fp(quals);
+    a = remap_ieee_input(quals, a);
+    ret = cvttq_internal(a);
+    end_fp(quals, token);
 
     return ret;
 }

target-alpha: An approach to fp insn qualifiers

Commit Message

Comments

Patch