Message ID | 20140624043423.GX18016@ZenIV.linux.org.uk |
---|---|
State | New |
Headers | show |
On Tue, Jun 24, 2014 at 05:34:23AM +0100, Al Viro wrote: > First of all, kudos - with current qemu tree qemu-alpha-system is > working pretty well - debian install and a *lot* of builds work just fine. > As in, getting from lenny to pretty complete squeeze toolchain, including gcj, > openjdk6 and a lot of crap needed to satisfy build-deps of those, plus all > priority:required and most of priority:important ones. It's a _lot_ of > beating and the damn thing survives - the problems had been with debian > packages themselves (fstatat() bug in lenny libc, epically buggered build-deps > in gcc-defaults, etc.). As it is, one core of 6-way 3.3GHz phenom II is quite > capable of running a home-grown autobuilder. Feels like ~250-300MHz alpha > with a very fast disk... > > Remaining problems, AFAICS, are around floating point traps. > I've found one in glibc testsuite (math/tests-misc.c; overflow in > ADDS/SU ends up with wrong results from fetestexcept() - only FE_OVERFLOW is > set, while the sucker expects FE_INEXACT as well and actual hardware sets both) > and another in gcc one (with -funsafe-math-optimizations CVTST/S on denorms > triggers SIGFPE/FPE_FLTINV). > > The libc one is a bug in gen_fp_exc_raise_ignore() - the difference > between ADDS/SU and ADDS/SUI is only in trapping, not storing results in > FPCR.INE and friends. Both will have the same effect on those and > if (ignore) { > tcg_gen_andi_i32(exc, exc, ~ignore); > } > in gen_fp_exc_raise_ignore() leads to exc & ignore not reaching the > update of env->fpcr_exc_status in helper_fp_exc_raise_s(). See 4.7.8: > [quote] > In addition, the FPCR gives a summary of each exception type for the > exception conditions detected by all IEEE floating-point operates thus > far, as well as an overall summary bit that indicates whether any of > these exception conditions has been detected. The indiividual exception > bits match exactly in purpose and order the exception bits found in the > exception summary quadword that is pushed for arithmetic traps. However, > for each instruction, these exception bits arse set independent of the > trapping mode specified for the instruction. Therefore, even though > trapping may be disabled for a certain exceptional condition, the fact > that the exceptional condition was encountered by an instruction is > still recorded in the FPCR. > [end quote] > And yes, on actual hardware both ADDS/SU and ADDS/SUI set FPCR.INE the same > way - verified by direct experiment. BTW, here's another testcase: nclude <stdio.h> unsigned long __attribute__((noinline)) f(double x) { return (unsigned long)x; // SVCTQ/SVC } main() { unsigned long x; extern unsigned long __ieee_get_fp_control(void); printf("before:%lx\n", __ieee_get_fp_control()); x = f(1ULL<<63); printf("after:%lx\n", __ieee_get_fp_control()); printf("result:%lx\n", x); } On actual hardware: before:0 after:20000 result:8000000000000000 On qemu: before:0 after:0 result:8000000000000000 IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise(). Can't we simply have separate helpers for various trap suffices, with all this work on getting exc, etc. taken inside them? It's not as if we distinguished many variants, after all... Right now we have: plain, /U, /V /S, /SU /SUI /SV /SVI and /SU should probably be separated from /S - we do want to suppress underflow traps on those (again, FPCR.UND should be set regardless). That's what, 5 or 6 helpers? Might want to separate /V and /U from plain - AFAICS, we get it wrong with things like ADDS/U vs. ADDS (it's just that normally underflow traps are disabled by FPCR.DUND). I hadn't experimented with those yet, but even if it turns out that they *are* different - 8 helpers instead of the 2 we currently have, sharing most of the actual source... Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, rather than EXC_M_UNF?
On 06/23/2014 09:34 PM, Al Viro wrote: > Anyway, delta that seems to fix the gcc one (gcc.dg/pr28796-2.c from > gcc-4.3 and later) follows. Again, I'm not at all sure if handling of > env->pc in there is safe from qemu POV and I'd like like to get comments on > that from somebody more familiar with qemu guts. Thanks for the diagnosis on the gcc test case. I've been meaning to investigate some of these edge cases for quite a while and never quite got there. > static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, > - uint32_t exc, uint32_t regno) > + uint32_t exc, uint32_t regno, uint32_t sw) > { > if (exc) { > - uint32_t hw_exc = 0; > + uint32_t hw_exc = sw; > > if (exc & float_flag_invalid) { > hw_exc |= EXC_M_INV; > @@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, > doesn't apply. */ > void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno) > { > - inline_fp_exc_raise(env, GETPC(), exc, regno); > + inline_fp_exc_raise(env, GETPC(), exc, regno, 0); > } > > /* Raise exceptions for ieee fp insns with software completion. */ > @@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno) > if (exc) { > env->fpcr_exc_status |= exc; > exc &= ~env->fpcr_exc_mask; > - inline_fp_exc_raise(env, GETPC(), exc, regno); > + inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC); > } > } This part looks good. > diff --git a/target-alpha/helper.c b/target-alpha/helper.c > index 7c053a3..538c6b2 100644 > --- a/target-alpha/helper.c > +++ b/target-alpha/helper.c > @@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr, > env->error_code = error; > if (retaddr) { > cpu_restore_state(cs, retaddr); > + env->pc += 4; This one needs a different fix, since dynamic_excp is also used from alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the address of the memory insn. But that should be easy to fix up. r~
On 06/24/2014 09:52 AM, Al Viro wrote: > unsigned long __attribute__((noinline)) f(double x) > { > return (unsigned long)x; // SVCTQ/SVC > } > > main() > { > unsigned long x; > extern unsigned long __ieee_get_fp_control(void); > printf("before:%lx\n", __ieee_get_fp_control()); > x = f(1ULL<<63); > printf("after:%lx\n", __ieee_get_fp_control()); > printf("result:%lx\n", x); > } > > On actual hardware: > before:0 > after:20000 > result:8000000000000000 > > On qemu: > before:0 > after:0 > result:8000000000000000 > > > IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise(). Clearly a gross misunderstanding of what bits are actually computed, never mind what gets signaled. Thanks for the test. I've not had working hardware for a couple of years to validate what's supposed to get set and what isn't. > Can't we simply have separate helpers for various trap suffices, with > all this work on getting exc, etc. taken inside them? It's not as if > we distinguished many variants, after all... Right now we have: > plain, /U, /V > /S, /SU > /SUI > /SV > /SVI We used to have separate helpers... at least for the modes that had been implemented at the time. The combinatorial explosion ugly though -- 4 different versions of add, sub, etc. I thought the partial inlining was a decent solution, as far as maintainability, but it's not unreasonable to disagree. > Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, > rather than EXC_M_UNF? No idea. Should they? r~
On Tue, Jun 24, 2014 at 11:33:32AM -0700, Richard Henderson wrote: > > return (unsigned long)x; // SVCTQ/SVC CVTTQ/SVC, of course... > Clearly a gross misunderstanding of what bits are actually computed, never mind > what gets signaled. > > Thanks for the test. I've not had working hardware for a couple of years to > validate what's supposed to get set and what isn't. If you have any ideas for testing, I do have working hw (the box that is currently alive is ev45, though; I _can_ try to boot a UP1000 one, but I make no promises regarding its fans, both in PSU and in CPU module ;-/) > > Can't we simply have separate helpers for various trap suffices, with > > all this work on getting exc, etc. taken inside them? It's not as if > > we distinguished many variants, after all... Right now we have: > > plain, /U, /V > > /S, /SU > > /SUI > > /SV > > /SVI > > We used to have separate helpers... at least for the modes that had been > implemented at the time. The combinatorial explosion ugly though -- 4 > different versions of add, sub, etc. I thought the partial inlining was a > decent solution, as far as maintainability, but it's not unreasonable to disagree. Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 helpers; gen_ieee_arith3() and friends would remain as-is. It's just that instead of generating load to exc, andi, call of helper_fp_exc_raise_s or call of helper_fp_exc_raise we would generate a call of one of the helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal with loading exc, updating ->fpcr_exc_status and generating traps. > > Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, > > rather than EXC_M_UNF? > > No idea. Should they? They seem to - both from the arch.manual and from direct experiment... And they do set FPCR.INV at the same time, not just trigger the trap.
On Tue, Jun 24, 2014 at 11:23:01AM -0700, Richard Henderson wrote: > > env->error_code = error; > > if (retaddr) { > > cpu_restore_state(cs, retaddr); > > + env->pc += 4; > > This one needs a different fix, since dynamic_excp is also used from > alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the > address of the memory insn. But that should be easy to fix up. That's not a problem, actually - there we have dynamic_excp(env, 0, EXCP_MCHK, 0); so retaddr is going to be 0 and that env->pc += 4 won't be reached at all...
On 06/24/2014 01:32 PM, Al Viro wrote: > If you have any ideas for testing, I do have working hw (the box that is > currently alive is ev45, though; I _can_ try to boot a UP1000 one, but > I make no promises regarding its fans, both in PSU and in CPU module ;-/) Ah. Gotta be careful with ev4/45... half of the fpu is unimplemented, and so if you're not careful all you're testing is the kernel emulation behaviour. > Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 > helpers; gen_ieee_arith3() and friends would remain as-is. It's just that > instead of generating load to exc, andi, call of helper_fp_exc_raise_s or > call of helper_fp_exc_raise we would generate a call of one of the > helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal > with loading exc, updating ->fpcr_exc_status and generating traps. Ah, I getcha. Yes, that makes sense. >>> Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV, >>> rather than EXC_M_UNF? >> >> No idea. Should they? > > They seem to - both from the arch.manual and from direct experiment... And > they do set FPCR.INV at the same time, not just trigger the trap. Ok. I'll try to make time to fix up some of this stuff this weekend. r~
On Tue, Jun 24, 2014 at 01:57:52PM -0700, Richard Henderson wrote: > On 06/24/2014 01:32 PM, Al Viro wrote: > > If you have any ideas for testing, I do have working hw (the box that is > > currently alive is ev45, though; I _can_ try to boot a UP1000 one, but > > I make no promises regarding its fans, both in PSU and in CPU module ;-/) > > Ah. Gotta be careful with ev4/45... half of the fpu is unimplemented, and so > if you're not careful all you're testing is the kernel emulation behaviour. *nod* > > Um? No, I mean having gen_fp_exc_raise() generate a call of one of the 8 > > helpers; gen_ieee_arith3() and friends would remain as-is. It's just that > > instead of generating load to exc, andi, call of helper_fp_exc_raise_s or > > call of helper_fp_exc_raise we would generate a call of one of the > > helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal > > with loading exc, updating ->fpcr_exc_status and generating traps. > > Ah, I getcha. Yes, that makes sense. FWIW, the crudest version of that is simply + env->fpcr_exc_status |= (uint8_t)env->fp_status.float_exception_flags; in the very beginning of helper_fp_exc_raise_s(). And yes, it recovers math/tests-misc.c, even though it's obviously not a good final fix. Al, off to figure out the black magic TCG is using to generate calls...
On 06/24/2014 02:24 PM, Al Viro wrote:
> Al, off to figure out the black magic TCG is using to generate calls...
If you've a helper
DEF_HELPER_1(halt, void, i64)
then
gen_helper_halt(...)
will generate the tcg ops that result in the call.
r~
On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote: > On 06/24/2014 02:24 PM, Al Viro wrote: > > Al, off to figure out the black magic TCG is using to generate calls... > > If you've a helper > > DEF_HELPER_1(halt, void, i64) > > then > > gen_helper_halt(...) > > will generate the tcg ops that result in the call. Another fun issue: CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV instead of IOV. * it misses reporting overflows for case when it *knows* that overflow will happen - the need to shift up by more than 63 bits. Trivially fixed, of course. There overflow cases leave wrong result as well - should be 0. * it also misses reporting overflows for case when value is in ranges 2^63..2^64-1 and -2^64+1..-2^63-1. And yes, it's asymmetric - 2^63 is an overflow, -2^63 isn't. * overflow is reported by float_raise(float_flag_overflow, &FP_STATUS). Wrong flag - it should be IOV, not FOV. And it should be set in FPCR regardless of the trap modifier (IOV, this VI thing is wrong - we should deal with that only when we generate a trap). * All overflow cases should raise INE as well. Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) unused - enum { float_flag_invalid = 1, float_flag_divbyzero = 4, float_flag_overflow = 8, float_flag_underflow = 16, float_flag_inexact = 32, float_flag_input_denormal = 64, float_flag_output_denormal = 128 }; That would allow to deal with that crap nicely - we could have it raise the new flag, then have helper_fp_exc_raise_... for default trap mode mask it out (and yes, we need to set FPCR flags in default mode, as well as /U and /V - confirmed by direct experiment *and* by TFM). If we can't... well, we could put that flag separately, but it would be more unpleasant. Folks?
On 25 June 2014 08:01, Al Viro <viro@zeniv.linux.org.uk> wrote: > Could we steal bit 1 in float_exception_flags for IOV? It is (currently?) > unused - > enum { > float_flag_invalid = 1, > float_flag_divbyzero = 4, > float_flag_overflow = 8, > float_flag_underflow = 16, > float_flag_inexact = 32, > float_flag_input_denormal = 64, > float_flag_output_denormal = 128 > }; > > That would allow to deal with that crap nicely - we could have it raise > the new flag, then have helper_fp_exc_raise_... for default trap mode > mask it out (and yes, we need to set FPCR flags in default mode, as well > as /U and /V - confirmed by direct experiment *and* by TFM). > > If we can't... well, we could put that flag separately, but it would be > more unpleasant. Folks? I think it's OK to put extra float_flags in, provided you can define their semantics in terms that make sense for more than one architecture (even if only one arch actually happens to need them). The input_denormal/output_denormal flags only get used for ARM, for instance. However if you wanted to split overflow from integer overflow you'd need to fix up all the other targets which expect them to generate just one exception flag... (Note that any patch touching softfloat files needs to come with a statement that you're happy to license it under either the softfloat-2a or softfloat-2b licenses, because we're currently midway through the tedious process of trying to relicense it.) thanks -- PMM
On Wed, Jun 25, 2014 at 10:27:11AM +0100, Peter Maydell wrote: > I think it's OK to put extra float_flags in, provided you can define > their semantics in terms that make sense for more than one > architecture (even if only one arch actually happens to need them). > The input_denormal/output_denormal flags only get used for ARM, > for instance. However if you wanted to split overflow from integer > overflow you'd need to fix up all the other targets which expect > them to generate just one exception flag... Hmm... On alpha it's generated only by the following: CVTTQ, CVTGQ, CVTQL. I.e. conversions to integer formats that can be held in FPU registers (double -> s64, VAX double -> s64 and s64 -> s32). Does softfloat even have anything similar? As it is, it's all in alpha-specific code; double -> s64 might have a chance to be generic (semantics: * denorms -> 0, raise "inexact", provided that they survived to that point and hadn't buggered off with "invalid" * exact integers in range -2^63 .. 2^63-1 -> equivalent 64bit integer * values outside of that range (all with zero fractional part, since the weight of LSB of significand will be considerably greater than 1 by that point) -> lower 64 bits of value, raise "integer overflow" and "inexact". * values with non-zero fractional part -> rounded according to rounding mode, raise "inexact". ), but existing float64_to_int64() isn't it - very different behaviour on overflows. Incidentally, VAX double to s64 is buggered in that area - it *does* try to use float64_to_int64() and, on top of getting INV instead of IOV, gets the wrong result in case of overflow (MAX_LONG/MIN_LONG instead of value in -2^63..2^63-1 comparable modulo 2^64 with exact value taken as element of $\Bbb Z$). And s64->s32 is just plain weird - not in the part that has IOV raised on values outside of -2^31..2^31-1, but in the bit shuffling it's doing if the test passes; alpha FPU stores s32 value in bits 63-62/58-29, with the rest filled with zeroes. In any case, it's not splitting float_overflow_flag; similar cases in softfloat.c raise float_invalid_flag. I don't know if it would make sense to try and teach float64_to_int64() about this kind of return value on overflow... > (Note that any patch touching softfloat files needs to come with > a statement that you're happy to license it under either the > softfloat-2a or softfloat-2b licenses, because we're currently > midway through the tedious process of trying to relicense it.) Wouldn't be a problem, but I doubt that it would be particulary useful to touch softfloat.c due to the reasons above, anyway.
On 25 June 2014 15:26, Al Viro <viro@zeniv.linux.org.uk> wrote: > Hmm... On alpha it's generated only by the following: CVTTQ, CVTGQ, > CVTQL. I.e. conversions to integer formats that can be held in FPU > registers (double -> s64, VAX double -> s64 and s64 -> s32). Does > softfloat even have anything similar? Well, VAX doubles are a bit out of scope for an IEEE emulation library :-) > As it is, it's all in alpha-specific code; It does sound like that's the best place for it. In that case, you don't want to add a flag to the softfloat float_flags -- they are specifically for indicating softfloat's status/exceptions. Flags handled purely in CPU-specific code should be stored in the CPU specific state struct somewhere. thanks -- PMM
diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c index d2d776c..a24535b 100644 --- a/target-alpha/fpu_helper.c +++ b/target-alpha/fpu_helper.c @@ -45,10 +45,10 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env) } static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, - uint32_t exc, uint32_t regno) + uint32_t exc, uint32_t regno, uint32_t sw) { if (exc) { - uint32_t hw_exc = 0; + uint32_t hw_exc = sw; if (exc & float_flag_invalid) { hw_exc |= EXC_M_INV; @@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr, doesn't apply. */ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno) { - inline_fp_exc_raise(env, GETPC(), exc, regno); + inline_fp_exc_raise(env, GETPC(), exc, regno, 0); } /* Raise exceptions for ieee fp insns with software completion. */ @@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno) if (exc) { env->fpcr_exc_status |= exc; exc &= ~env->fpcr_exc_mask; - inline_fp_exc_raise(env, GETPC(), exc, regno); + inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC); } } diff --git a/target-alpha/helper.c b/target-alpha/helper.c index 7c053a3..538c6b2 100644 --- a/target-alpha/helper.c +++ b/target-alpha/helper.c @@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr, env->error_code = error; if (retaddr) { cpu_restore_state(cs, retaddr); + env->pc += 4; } cpu_loop_exit(cs); }