Submitted by Richard Henderson on Dec. 15, 2009, 3:50 a.m.

Message ID | 4B27076C.6020806@twiddle.net |
---|---|

State | New |

Headers | show |

On Tue, Dec 15, 2009 at 4:50 AM, Richard Henderson <rth@twiddle.net> wrote: > On 12/14/2009 04:31 PM, Richard Henderson wrote: >> >> On 12/14/2009 12:11 PM, Laurent Desnogues wrote: >>> >>> I'll take a closer look at your patch tomorrow. >> >> For the record, I believe this finishes what I had in mind for the >> exception handling there in op_handler.c. > > Hmph. One more patch for correctness. With this 183.equake runs correctly. > I couldn't remember all the hoops to get runspec.pl to work, to do the > whole testsuite, but I did run this one by hand. > > ./quake-amd64: Done. Terminating the simulation. > > real 0m34.943s > user 0m34.913s > sys 0m0.024s > > ./quake-axp: Done. Terminating the simulation. > > real 33m24.105s > user 33m23.674s > sys 0m0.116s > > with identical output. Well equake works me with the last set of patches you sent (that is without using this FPU patch you just sent). Can you confirm? Laurent

On 12/15/2009 03:31 AM, Laurent Desnogues wrote: > Well equake works me with the last set of patches you sent (that is > without using this FPU patch you just sent). Can you confirm? GCC testsuite breaks with the saturating cvttq so I didn't even try. If equake works, it just means that it doesn't do any conversions to unsigned long. r~

On Mon, 14 Dec 2009, Richard Henderson wrote: > On 12/14/2009 04:31 PM, Richard Henderson wrote: > > Hmph. One more patch for correctness. With this 183.equake runs correctly. > I couldn't remember all the hoops to get runspec.pl to work, to do the whole > testsuite, but I did run this one by hand. This is great! About a year ago I had been steadily working on getting all of spec2k to run under alpha qemu but then other projects came up and I didn't have time. I'll re-run once the patches get applied to see if any more of the benchmarks have issues. My notes from before your patches showed the following benchmarks had issues: perlbmk.perfect WRONG_OUTPUT eon.rushmeier wrong_results eon.kajiya wrong_results eon.cook wrong_results art.110 never finished art.470 never finished vpr.place Arithmetic trap. vpr.route Arithmetic trap. mesa Arithmetic trap. wupwise wrong results (NaN) lucas Arithmetic trap. equake wrong results (NaN) sixtrack Fortran runtime error: Invalid string input in item 1 facerec Arithmetic trap. fma3d MMU data fault ammp Arithmetic trap. galgel broken? twolf Arithmetic trap. apsi MMU data fault Vince

On Mon, 14 Dec 2009, Richard Henderson wrote: > On 12/14/2009 04:31 PM, Richard Henderson wrote: > > Hmph. One more patch for correctness. With this 183.equake runs correctly. I just finished running all of spec2k Alpha through Qemu. With these patches installed (the 4 fpu ones, and the 5 from the earlier thread) on top of current Qemu, pretty much all of the floating point issues have been fixed! Before, at least 19 of the spec2k benchmarks didn't run for various reasons. Now only two fail, sixtrack and fma3d. And I think those two are probably syscall related, though I haven't had chance to look in detail. #sixtrack At line 343 of file daten.f Fortran runtime error: End of file #fma3d FMA-3D> Fatal Error Reported. Job Terminated. So it would be great if the alpha patches could be merged. Vince

diff --git a/target-alpha/op_helper.c b/target-alpha/op_helper.c index d031f56..2d1c3d5 100644 --- a/target-alpha/op_helper.c +++ b/target-alpha/op_helper.c @@ -1220,120 +1220,106 @@ uint64_t helper_cvtqs (uint64_t a, uint32_t quals) is used by the compiler to get unsigned conversion for free with the same instruction. */ -static uint64_t cvttq_noqual_internal(uint64_t a, uint32_t rounding_mode) +static uint64_t cvttq_internal(uint64_t a) { uint64_t frac, ret = 0; - uint32_t exp, sign; + uint32_t exp, sign, exc = 0; int shift; sign = (a >> 63); exp = (uint32_t)(a >> 52) & 0x7ff; frac = a & 0xfffffffffffffull; - /* We already handled denormals in remap_ieee_input; infinities and - nans are defined to return zero as per truncation. */ - if (exp == 0 || exp == 0x7ff) - return 0; - - /* Restore implicit bit. */ - frac |= 0x10000000000000ull; - - /* Note that neither overflow exceptions nor inexact exceptions - are desired. This lets us streamline the checks quite a bit. */ - shift = exp - 1023 - 52; - if (shift >= 0) { - /* In this case the number is so large that we must shift - the fraction left. There is no rounding to do. */ - if (shift < 63) { - ret = frac << shift; - } + if (exp == 0) { + if (unlikely(frac != 0)) + goto do_underflow; + } else if (exp == 0x7ff) { + if (frac == 0) + exc = float_flag_overflow; + else + exc = float_flag_invalid; } else { - uint64_t round; - - /* In this case the number is smaller than the fraction as - represented by the 52 bit number. Here we must think - about rounding the result. Handle this by shifting the - fractional part of the number into the high bits of ROUND. - This will let us efficiently handle round-to-nearest. */ - shift = -shift; - if (shift < 63) { - ret = frac >> shift; - round = frac << (64 - shift); + /* Restore implicit bit. */ + frac |= 0x10000000000000ull; + + /* Note that neither overflow exceptions nor inexact exceptions + are desired. This lets us streamline the checks quite a bit. */ + shift = exp - 1023 - 52; + if (shift >= 0) { + /* In this case the number is so large that we must shift + the fraction left. There is no rounding to do. */ + if (shift < 63) { + ret = frac << shift; + if ((ret >> shift) != frac) + exc = float_flag_overflow; + } } else { - /* The exponent is so small we shift out everything. - Leave a sticky bit for proper rounding below. */ - round = 1; - } + uint64_t round; + + /* In this case the number is smaller than the fraction as + represented by the 52 bit number. Here we must think + about rounding the result. Handle this by shifting the + fractional part of the number into the high bits of ROUND. + This will let us efficiently handle round-to-nearest. */ + shift = -shift; + if (shift < 63) { + ret = frac >> shift; + round = frac << (64 - shift); + } else { + /* The exponent is so small we shift out everything. + Leave a sticky bit for proper rounding below. */ + do_underflow: + round = 1; + } - if (round) { - switch (rounding_mode) { - case float_round_nearest_even: - if (round == (1ull << 63)) { - /* Remaining fraction is exactly 0.5; round to even. */ - ret += (ret & 1); - } else if (round > (1ull << 63)) { - ret += 1; + if (round) { + exc = float_flag_inexact; + switch (FP_STATUS.float_rounding_mode) { + case float_round_nearest_even: + if (round == (1ull << 63)) { + /* Fraction is exactly 0.5; round to even. */ + ret += (ret & 1); + } else if (round > (1ull << 63)) { + ret += 1; + } + break; + case float_round_to_zero: + break; + case float_round_up: + if (!sign) + ret += 1; + break; + case float_round_down: + if (sign) + ret += 1; + break; } - break; - case float_round_to_zero: - break; - case float_round_up: - if (!sign) - ret += 1; - break; - case float_round_down: - if (sign) - ret += 1; - break; } } + if (sign) + ret = -ret; } + if (unlikely(exc)) + float_raise(exc, &FP_STATUS); - if (sign) - ret = -ret; return ret; } uint64_t helper_cvttq (uint64_t a, uint32_t quals) { uint64_t ret; + uint32_t token; - a = remap_ieee_input(quals, a); - - if (quals & QUAL_V) { - float64 fa = t_to_float64(a); - uint32_t token; - - token = begin_fp_exception(); - if ((quals & QUAL_RM_MASK) == QUAL_RM_C) { - ret = float64_to_int64_round_to_zero(fa, &FP_STATUS); - } else { - token |= begin_fp_roundmode(quals); - ret = float64_to_int64(fa, &FP_STATUS); - end_fp_roundmode(token); - } - end_fp_exception(quals, token); - } else { - uint32_t round_mode; - - switch (quals & QUAL_RM_MASK) { - case QUAL_RM_N: - round_mode = float_round_nearest_even; - break; - case QUAL_RM_C: - default: - round_mode = float_round_to_zero; - break; - case QUAL_RM_M: - round_mode = float_round_down; - break; - case QUAL_RM_D: - round_mode = FP_STATUS.float_rounding_mode; - break; - } + /* ??? There's an arugument to be made that when /S is enabled, we + should provide the standard IEEE saturated result, instead of + the truncated result that we *must* provide when /V is disabled. + However, that's not how either the Tru64 or Linux completion + handlers actually work, and GCC knows it. */ - ret = cvttq_noqual_internal(a, round_mode); - } + token = begin_fp(quals); + a = remap_ieee_input(quals, a); + ret = cvttq_internal(a); + end_fp(quals, token); return ret; }