Message ID | CAFULd4bRa44uFd4uvFa3AzfJnX-6sJpB4ZtWPjXYxf3P2i-SeA@mail.gmail.com |
---|---|
State | New |
Headers | show |
* Uros Bizjak: > Add earlyclobber to sqrtt/sqrtf insns. > > When using software completions, we have to prevent assembler to match > input and output operands of sqrtt/sqrtf insn. Add earlyclobber to > output operand to avoid unwanted operand matching. Is this so that the trap handler can recover the original input? In this case, please add this to the commit message for those of us who are not familiar with the Alpha architecture.
On Fri, Apr 14, 2017 at 2:38 PM, Florian Weimer <fw@deneb.enyo.de> wrote: > * Uros Bizjak: > >> Add earlyclobber to sqrtt/sqrtf insns. >> >> When using software completions, we have to prevent assembler to match >> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to >> output operand to avoid unwanted operand matching. > > Is this so that the trap handler can recover the original input? > > In this case, please add this to the commit message for those of us > who are not familiar with the Alpha architecture. I don't know the low-level details, perhaps Richard can explain these requirements. The sqrtt simply returns zero for subnormal operand with software completions when input and output reg are matched. Uros.
* Uros Bizjak: > On Fri, Apr 14, 2017 at 2:38 PM, Florian Weimer <fw@deneb.enyo.de> wrote: >> * Uros Bizjak: >> >>> Add earlyclobber to sqrtt/sqrtf insns. >>> >>> When using software completions, we have to prevent assembler to match >>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to >>> output operand to avoid unwanted operand matching. >> >> Is this so that the trap handler can recover the original input? >> >> In this case, please add this to the commit message for those of us >> who are not familiar with the Alpha architecture. > > I don't know the low-level details, perhaps Richard can explain these > requirements. > > The sqrtt simply returns zero for subnormal operand with software > completions when input and output reg are matched. Did you see that in an emulator, or on actual hardware? The architecture manual suggests that we also need to take measures to limit the trap shadow.
On Fri, Apr 14, 2017 at 2:54 PM, Florian Weimer <fw@deneb.enyo.de> wrote: >>>> Add earlyclobber to sqrtt/sqrtf insns. >>>> >>>> When using software completions, we have to prevent assembler to match >>>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to >>>> output operand to avoid unwanted operand matching. >>> >>> Is this so that the trap handler can recover the original input? >>> >>> In this case, please add this to the commit message for those of us >>> who are not familiar with the Alpha architecture. >> >> I don't know the low-level details, perhaps Richard can explain these >> requirements. >> >> The sqrtt simply returns zero for subnormal operand with software >> completions when input and output reg are matched. > > Did you see that in an emulator, or on actual hardware? I have seen this on actual hardware, AlphaServer ES40, EV68AL Tsunami. Without the patch, there are many instances of problematic insns, simple grep sqrt[ts] of libm dump shows ~ $ grep sqrt[ts] aaa 6e94: 6b bd eb 53 sqrtt/sud $f11,$f11 6ef0: 70 bd f0 53 sqrtt/sud $f16,$f16 cb8c: 60 bd e0 53 sqrtt/sud $f0,$f0 cc14: 60 bd e0 53 sqrtt/sud $f0,$f0 d28c: 70 bd e2 53 sqrtt/sud $f2,$f16 d350: 70 bd e2 53 sqrtt/sud $f2,$f16 d588: 62 bd e2 53 sqrtt/sud $f2,$f2 d60c: 62 bd e2 53 sqrtt/sud $f2,$f2 dad0: 70 bd e2 53 sqrtt/sud $f2,$f16 db64: 70 bd e2 53 sqrtt/sud $f2,$f16 ddbc: 62 bd e2 53 sqrtt/sud $f2,$f2 de24: 62 bd e2 53 sqrtt/sud $f2,$f2 e028: 62 bd e2 53 sqrtt/sud $f2,$f2 e628: 71 bd f1 53 sqrtt/sud $f17,$f17 12290: 6a bd ea 53 sqrtt/sud $f10,$f10 133ec: 6b bd eb 53 sqrtt/sud $f11,$f11 13460: 6b bd eb 53 sqrtt/sud $f11,$f11 1dff4: 60 bd e0 53 sqrtt/sud $f0,$f0 23de4: 62 bd e2 53 sqrtt/sud $f2,$f2 23f00: 62 bd e2 53 sqrtt/sud $f2,$f2 24000: 70 bd f0 53 sqrtt/sud $f16,$f16 24040: 64 bd ea 53 sqrtt/sud $f10,$f4 24084: 64 bd ea 53 sqrtt/sud $f10,$f4 24124: 6a bd ea 53 sqrtt/sud $f10,$f10 2414c: 70 bd e0 53 sqrtt/sud $f0,$f16 24168: 61 bd e1 53 sqrtt/sud $f1,$f1 25cac: 63 bd ea 53 sqrtt/sud $f10,$f3 25d94: 71 bd e5 53 sqrtt/sud $f5,$f17 25eb4: 65 bd e5 53 sqrtt/sud $f5,$f5 25f24: 63 bd e3 53 sqrtt/sud $f3,$f3 25f84: 6b bd eb 53 sqrtt/sud $f11,$f11 25f98: 6a bd ea 53 sqrtt/sud $f10,$f10 25fa8: 6b bd eb 53 sqrtt/sud $f11,$f11 2607c: 6c bd ec 53 sqrtt/sud $f12,$f12 26094: 6a bd ea 53 sqrtt/sud $f10,$f10 2613c: 6b bd eb 53 sqrtt/sud $f11,$f11 26150: 63 bd eb 53 sqrtt/sud $f11,$f3 2995c: 6b b9 ea 53 sqrts/sud $f10,$f11 29ad0: 6b b9 ea 53 sqrts/sud $f10,$f11 29c00: 70 b9 f0 53 sqrts/sud $f16,$f16 29c60: 6b b9 eb 53 sqrts/sud $f11,$f11 29d50: 6d b9 ea 53 sqrts/sud $f10,$f13 29e18: 6b b9 ea 53 sqrts/sud $f10,$f11 2aa00: 6a bd ea 53 sqrtt/sud $f10,$f10 2aebc: 62 b9 e2 53 sqrts/sud $f2,$f2 2af6c: 62 b9 e2 53 sqrts/sud $f2,$f2 2b1a8: 62 b9 e2 53 sqrts/sud $f2,$f2 2b230: 62 b9 e2 53 sqrts/sud $f2,$f2 2b630: 70 b9 e2 53 sqrts/sud $f2,$f16 2b6c8: 70 b9 e2 53 sqrts/sud $f2,$f16 2b940: 62 b9 e2 53 sqrts/sud $f2,$f2 2b9b8: 62 b9 e2 53 sqrts/sud $f2,$f2 2d29c: 60 b9 f0 53 sqrts/sud $f16,$f0 2e528: 6a b9 ea 53 sqrts/sud $f10,$f10 2fb68: 6b b9 eb 53 sqrts/sud $f11,$f11 2fbfc: 6b b9 eb 53 sqrts/sud $f11,$f11 33948: 60 b9 e0 53 sqrts/sud $f0,$f0 37c30: 63 b9 e3 53 sqrts/sud $f3,$f3 37d64: 63 b9 e3 53 sqrts/sud $f3,$f3 37e64: 70 b9 f0 53 sqrts/sud $f16,$f16 37eac: 62 b9 ea 53 sqrts/sud $f10,$f2 37ed8: 62 b9 ea 53 sqrts/sud $f10,$f2 37f88: 6a b9 ea 53 sqrts/sud $f10,$f10 37fa8: 70 b9 e0 53 sqrts/sud $f0,$f16 37fb8: 61 b9 e1 53 sqrts/sud $f1,$f1 3909c: 65 b9 ea 53 sqrts/sud $f10,$f5 39184: 65 b9 e7 53 sqrts/sud $f7,$f5 39294: 67 b9 e7 53 sqrts/sud $f7,$f7 39310: 65 b9 e5 53 sqrts/sud $f5,$f5 39370: 6b b9 eb 53 sqrts/sud $f11,$f11 39384: 65 b9 e5 53 sqrts/sud $f5,$f5 39390: 6a b9 ea 53 sqrts/sud $f10,$f10 39458: 6c b9 ec 53 sqrts/sud $f12,$f12 39470: 6a b9 ea 53 sqrts/sud $f10,$f10 39514: 6a b9 ea 53 sqrts/sud $f10,$f10 39528: 65 b9 ea 53 sqrts/sud $f10,$f5 7429c: 6a bd f0 53 sqrtt/sud $f16,$f10 742f8: 6a bd f0 53 sqrtt/sud $f16,$f10 Uros.
On 04/14/2017 05:30 AM, Uros Bizjak wrote: > Add earlyclobber to sqrtt/sqrtf insns. > > When using software completions, we have to prevent assembler to match > input and output operands of sqrtt/sqrtf insn. Add earlyclobber to > output operand to avoid unwanted operand matching. > > 2017-04-14 Uros Bizjak <ubizjak@gmail.com> > > * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add > earlyclobber to output operand of sqrt insn. > (__ieee754_sqrtf): Ditto. > > diff --git a/sysdeps/alpha/fpu/math_private.h b/sysdeps/alpha/fpu/math_private.h > index 9e06e25..1e97c86 100644 > --- a/sysdeps/alpha/fpu/math_private.h > +++ b/sysdeps/alpha/fpu/math_private.h > @@ -27,9 +27,9 @@ __ieee754_sqrt (double d) > { > double ret; > # ifdef _IEEE_FP_INEXACT > - asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d)); > + asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d)); > # else > - asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d)); > + asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d)); > # endif Hmm. This is surprising because any host that has sqrtt also has exact traps, and so trap shadows and recovery of the input shouldn't be an issue. That said, do we actually still support a gcc that doesn't have these as builtins? That would be more ideal than inline asm. r~
On Fri, Apr 14, 2017 at 3:55 PM, Richard Henderson <rth@twiddle.net> wrote: > On 04/14/2017 05:30 AM, Uros Bizjak wrote: >> >> Add earlyclobber to sqrtt/sqrtf insns. >> >> When using software completions, we have to prevent assembler to match >> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to >> output operand to avoid unwanted operand matching. >> >> 2017-04-14 Uros Bizjak <ubizjak@gmail.com> >> >> * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add >> earlyclobber to output operand of sqrt insn. >> (__ieee754_sqrtf): Ditto. >> >> diff --git a/sysdeps/alpha/fpu/math_private.h >> b/sysdeps/alpha/fpu/math_private.h >> index 9e06e25..1e97c86 100644 >> --- a/sysdeps/alpha/fpu/math_private.h >> +++ b/sysdeps/alpha/fpu/math_private.h >> @@ -27,9 +27,9 @@ __ieee754_sqrt (double d) >> { >> double ret; >> # ifdef _IEEE_FP_INEXACT >> - asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d)); >> + asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d)); >> # else >> - asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d)); >> + asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d)); >> # endif > > > Hmm. This is surprising because any host that has sqrtt also has exact > traps, and so trap shadows and recovery of the input shouldn't be an issue. I have tried with a sqrtt/sud $f0, $f0, where the value in $f0 was 0x0....10. I can confirm that the returned result was 0. > That said, do we actually still support a gcc that doesn't have these as > builtins? That would be more ideal than inline asm. Compiling double test_sqrt (double d) { return __builtin_sqrt (d); } with -O2 -mcpu=ev6 resulted in: test_sqrt: .frame $30,16,$26,0 .mask 0x4000000,-16 $LFB0: .cfi_startproc ldah $29,0($27) !gpdisp!1 lda $29,0($29) !gpdisp!1 $test_sqrt..ng: sqrtt $f16,$f0 lda $30,-16($30) .cfi_def_cfa_offset 16 stq $26,0($30) .cfi_offset 26, -16 .prologue 1 cmpteq $f0,$f0,$f10 fbeq $f10,$L4 $L2: ldq $26,0($30) lda $30,16($30) .cfi_remember_state .cfi_restore 26 .cfi_def_cfa_offset 0 ret $31,($26),1 $L4: .cfi_restore_state ldq $27,sqrt($29) !literal!2 jsr $26,($27),sqrt !lituse_jsr!2 ldah $29,0($26) !gpdisp!3 lda $29,0($29) !gpdisp!3 br $31,$L2 __builtin_sqrt expands with a call to sqrt(), this could possibly create a recursive call loop if the builtin is used in libc to implement sqrt. Uros.
On 04/14/2017 02:30 PM, Uros Bizjak wrote: > Add earlyclobber to sqrtt/sqrtf insns. > > When using software completions, we have to prevent assembler to match > input and output operands of sqrtt/sqrtf insn. Add earlyclobber to > output operand to avoid unwanted operand matching. > > 2017-04-14 Uros Bizjak<ubizjak@gmail.com> > > * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add > earlyclobber to output operand of sqrt insn. > (__ieee754_sqrtf): Ditto. Pushed to master. r~
diff --git a/sysdeps/alpha/fpu/math_private.h b/sysdeps/alpha/fpu/math_private.h index 9e06e25..1e97c86 100644 --- a/sysdeps/alpha/fpu/math_private.h +++ b/sysdeps/alpha/fpu/math_private.h @@ -27,9 +27,9 @@ __ieee754_sqrt (double d) { double ret; # ifdef _IEEE_FP_INEXACT - asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d)); + asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d)); # else - asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d)); + asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d)); # endif return ret; } @@ -39,9 +39,9 @@ __ieee754_sqrtf (float d) { float ret; # ifdef _IEEE_FP_INEXACT - asm ("sqrts/suid %1,%0" : "=f"(ret) : "f"(d)); + asm ("sqrts/suid %1,%0" : "=&f"(ret) : "f"(d)); # else - asm ("sqrts/sud %1,%0" : "=f"(ret) : "f"(d)); + asm ("sqrts/sud %1,%0" : "=&f"(ret) : "f"(d)); # endif return ret; }