[alpha] : Add earlyclobber to sqrtt/sqrtf insns.

Message ID CAFULd4bRa44uFd4uvFa3AzfJnX-6sJpB4ZtWPjXYxf3P2i-SeA@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak April 14, 2017, 12:30 p.m.
Add earlyclobber to sqrtt/sqrtf insns.

When using software completions, we have to prevent assembler to match
input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
output operand to avoid unwanted operand matching.

2017-04-14  Uros Bizjak  <ubizjak@gmail.com>

    * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add
    earlyclobber to output operand of sqrt insn.
    (__ieee754_sqrtf): Ditto.

Comments

Florian Weimer April 14, 2017, 12:38 p.m. | #1
* Uros Bizjak:

> Add earlyclobber to sqrtt/sqrtf insns.
>
> When using software completions, we have to prevent assembler to match
> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
> output operand to avoid unwanted operand matching.

Is this so that the trap handler can recover the original input?

In this case, please add this to the commit message for those of us
who are not familiar with the Alpha architecture.
Uros Bizjak April 14, 2017, 12:47 p.m. | #2
On Fri, Apr 14, 2017 at 2:38 PM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * Uros Bizjak:
>
>> Add earlyclobber to sqrtt/sqrtf insns.
>>
>> When using software completions, we have to prevent assembler to match
>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
>> output operand to avoid unwanted operand matching.
>
> Is this so that the trap handler can recover the original input?
>
> In this case, please add this to the commit message for those of us
> who are not familiar with the Alpha architecture.

I don't know the low-level details, perhaps Richard can explain these
requirements.

The sqrtt simply returns zero for subnormal operand with software
completions when input and output reg are matched.

Uros.
Florian Weimer April 14, 2017, 12:54 p.m. | #3
* Uros Bizjak:

> On Fri, Apr 14, 2017 at 2:38 PM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * Uros Bizjak:
>>
>>> Add earlyclobber to sqrtt/sqrtf insns.
>>>
>>> When using software completions, we have to prevent assembler to match
>>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
>>> output operand to avoid unwanted operand matching.
>>
>> Is this so that the trap handler can recover the original input?
>>
>> In this case, please add this to the commit message for those of us
>> who are not familiar with the Alpha architecture.
>
> I don't know the low-level details, perhaps Richard can explain these
> requirements.
>
> The sqrtt simply returns zero for subnormal operand with software
> completions when input and output reg are matched.

Did you see that in an emulator, or on actual hardware?

The architecture manual suggests that we also need to take measures to
limit the trap shadow.
Uros Bizjak April 14, 2017, 1:02 p.m. | #4
On Fri, Apr 14, 2017 at 2:54 PM, Florian Weimer <fw@deneb.enyo.de> wrote:

>>>> Add earlyclobber to sqrtt/sqrtf insns.
>>>>
>>>> When using software completions, we have to prevent assembler to match
>>>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
>>>> output operand to avoid unwanted operand matching.
>>>
>>> Is this so that the trap handler can recover the original input?
>>>
>>> In this case, please add this to the commit message for those of us
>>> who are not familiar with the Alpha architecture.
>>
>> I don't know the low-level details, perhaps Richard can explain these
>> requirements.
>>
>> The sqrtt simply returns zero for subnormal operand with software
>> completions when input and output reg are matched.
>
> Did you see that in an emulator, or on actual hardware?

I have seen this on actual hardware, AlphaServer ES40, EV68AL Tsunami.

Without the patch, there are many instances of problematic insns,
simple grep sqrt[ts] of libm dump shows

~ $ grep sqrt[ts] aaa

    6e94:       6b bd eb 53     sqrtt/sud       $f11,$f11
    6ef0:       70 bd f0 53     sqrtt/sud       $f16,$f16
    cb8c:       60 bd e0 53     sqrtt/sud       $f0,$f0
    cc14:       60 bd e0 53     sqrtt/sud       $f0,$f0
    d28c:       70 bd e2 53     sqrtt/sud       $f2,$f16
    d350:       70 bd e2 53     sqrtt/sud       $f2,$f16
    d588:       62 bd e2 53     sqrtt/sud       $f2,$f2
    d60c:       62 bd e2 53     sqrtt/sud       $f2,$f2
    dad0:       70 bd e2 53     sqrtt/sud       $f2,$f16
    db64:       70 bd e2 53     sqrtt/sud       $f2,$f16
    ddbc:       62 bd e2 53     sqrtt/sud       $f2,$f2
    de24:       62 bd e2 53     sqrtt/sud       $f2,$f2
    e028:       62 bd e2 53     sqrtt/sud       $f2,$f2
    e628:       71 bd f1 53     sqrtt/sud       $f17,$f17
   12290:       6a bd ea 53     sqrtt/sud       $f10,$f10
   133ec:       6b bd eb 53     sqrtt/sud       $f11,$f11
   13460:       6b bd eb 53     sqrtt/sud       $f11,$f11
   1dff4:       60 bd e0 53     sqrtt/sud       $f0,$f0
   23de4:       62 bd e2 53     sqrtt/sud       $f2,$f2
   23f00:       62 bd e2 53     sqrtt/sud       $f2,$f2
   24000:       70 bd f0 53     sqrtt/sud       $f16,$f16
   24040:       64 bd ea 53     sqrtt/sud       $f10,$f4
   24084:       64 bd ea 53     sqrtt/sud       $f10,$f4
   24124:       6a bd ea 53     sqrtt/sud       $f10,$f10
   2414c:       70 bd e0 53     sqrtt/sud       $f0,$f16
   24168:       61 bd e1 53     sqrtt/sud       $f1,$f1
   25cac:       63 bd ea 53     sqrtt/sud       $f10,$f3
   25d94:       71 bd e5 53     sqrtt/sud       $f5,$f17
   25eb4:       65 bd e5 53     sqrtt/sud       $f5,$f5
   25f24:       63 bd e3 53     sqrtt/sud       $f3,$f3
   25f84:       6b bd eb 53     sqrtt/sud       $f11,$f11
   25f98:       6a bd ea 53     sqrtt/sud       $f10,$f10
   25fa8:       6b bd eb 53     sqrtt/sud       $f11,$f11
   2607c:       6c bd ec 53     sqrtt/sud       $f12,$f12
   26094:       6a bd ea 53     sqrtt/sud       $f10,$f10
   2613c:       6b bd eb 53     sqrtt/sud       $f11,$f11
   26150:       63 bd eb 53     sqrtt/sud       $f11,$f3
   2995c:       6b b9 ea 53     sqrts/sud       $f10,$f11
   29ad0:       6b b9 ea 53     sqrts/sud       $f10,$f11
   29c00:       70 b9 f0 53     sqrts/sud       $f16,$f16
   29c60:       6b b9 eb 53     sqrts/sud       $f11,$f11
   29d50:       6d b9 ea 53     sqrts/sud       $f10,$f13
   29e18:       6b b9 ea 53     sqrts/sud       $f10,$f11
   2aa00:       6a bd ea 53     sqrtt/sud       $f10,$f10
   2aebc:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2af6c:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2b1a8:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2b230:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2b630:       70 b9 e2 53     sqrts/sud       $f2,$f16
   2b6c8:       70 b9 e2 53     sqrts/sud       $f2,$f16
   2b940:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2b9b8:       62 b9 e2 53     sqrts/sud       $f2,$f2
   2d29c:       60 b9 f0 53     sqrts/sud       $f16,$f0
   2e528:       6a b9 ea 53     sqrts/sud       $f10,$f10
   2fb68:       6b b9 eb 53     sqrts/sud       $f11,$f11
   2fbfc:       6b b9 eb 53     sqrts/sud       $f11,$f11
   33948:       60 b9 e0 53     sqrts/sud       $f0,$f0
   37c30:       63 b9 e3 53     sqrts/sud       $f3,$f3
   37d64:       63 b9 e3 53     sqrts/sud       $f3,$f3
   37e64:       70 b9 f0 53     sqrts/sud       $f16,$f16
   37eac:       62 b9 ea 53     sqrts/sud       $f10,$f2
   37ed8:       62 b9 ea 53     sqrts/sud       $f10,$f2
   37f88:       6a b9 ea 53     sqrts/sud       $f10,$f10
   37fa8:       70 b9 e0 53     sqrts/sud       $f0,$f16
   37fb8:       61 b9 e1 53     sqrts/sud       $f1,$f1
   3909c:       65 b9 ea 53     sqrts/sud       $f10,$f5
   39184:       65 b9 e7 53     sqrts/sud       $f7,$f5
   39294:       67 b9 e7 53     sqrts/sud       $f7,$f7
   39310:       65 b9 e5 53     sqrts/sud       $f5,$f5
   39370:       6b b9 eb 53     sqrts/sud       $f11,$f11
   39384:       65 b9 e5 53     sqrts/sud       $f5,$f5
   39390:       6a b9 ea 53     sqrts/sud       $f10,$f10
   39458:       6c b9 ec 53     sqrts/sud       $f12,$f12
   39470:       6a b9 ea 53     sqrts/sud       $f10,$f10
   39514:       6a b9 ea 53     sqrts/sud       $f10,$f10
   39528:       65 b9 ea 53     sqrts/sud       $f10,$f5
   7429c:       6a bd f0 53     sqrtt/sud       $f16,$f10
   742f8:       6a bd f0 53     sqrtt/sud       $f16,$f10

Uros.
Richard Henderson April 14, 2017, 1:55 p.m. | #5
On 04/14/2017 05:30 AM, Uros Bizjak wrote:
> Add earlyclobber to sqrtt/sqrtf insns.
>
> When using software completions, we have to prevent assembler to match
> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
> output operand to avoid unwanted operand matching.
>
> 2017-04-14  Uros Bizjak  <ubizjak@gmail.com>
>
>     * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add
>     earlyclobber to output operand of sqrt insn.
>     (__ieee754_sqrtf): Ditto.
>
> diff --git a/sysdeps/alpha/fpu/math_private.h b/sysdeps/alpha/fpu/math_private.h
> index 9e06e25..1e97c86 100644
> --- a/sysdeps/alpha/fpu/math_private.h
> +++ b/sysdeps/alpha/fpu/math_private.h
> @@ -27,9 +27,9 @@ __ieee754_sqrt (double d)
>  {
>    double ret;
>  # ifdef _IEEE_FP_INEXACT
> -  asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d));
> +  asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d));
>  # else
> -  asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d));
> +  asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d));
>  # endif

Hmm.  This is surprising because any host that has sqrtt also has exact traps, 
and so trap shadows and recovery of the input shouldn't be an issue.

That said, do we actually still support a gcc that doesn't have these as 
builtins?  That would be more ideal than inline asm.


r~
Uros Bizjak April 14, 2017, 6:01 p.m. | #6
On Fri, Apr 14, 2017 at 3:55 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 04/14/2017 05:30 AM, Uros Bizjak wrote:
>>
>> Add earlyclobber to sqrtt/sqrtf insns.
>>
>> When using software completions, we have to prevent assembler to match
>> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
>> output operand to avoid unwanted operand matching.
>>
>> 2017-04-14  Uros Bizjak  <ubizjak@gmail.com>
>>
>>     * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add
>>     earlyclobber to output operand of sqrt insn.
>>     (__ieee754_sqrtf): Ditto.
>>
>> diff --git a/sysdeps/alpha/fpu/math_private.h
>> b/sysdeps/alpha/fpu/math_private.h
>> index 9e06e25..1e97c86 100644
>> --- a/sysdeps/alpha/fpu/math_private.h
>> +++ b/sysdeps/alpha/fpu/math_private.h
>> @@ -27,9 +27,9 @@ __ieee754_sqrt (double d)
>>  {
>>    double ret;
>>  # ifdef _IEEE_FP_INEXACT
>> -  asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d));
>> +  asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d));
>>  # else
>> -  asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d));
>> +  asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d));
>>  # endif
>
>
> Hmm.  This is surprising because any host that has sqrtt also has exact
> traps, and so trap shadows and recovery of the input shouldn't be an issue.

I have tried with a sqrtt/sud $f0, $f0, where the value in $f0 was
0x0....10. I can confirm that the returned result was 0.

> That said, do we actually still support a gcc that doesn't have these as
> builtins?  That would be more ideal than inline asm.

Compiling

double test_sqrt (double d)
{
        return __builtin_sqrt (d);
}

with -O2 -mcpu=ev6 resulted in:

test_sqrt:
        .frame $30,16,$26,0
        .mask 0x4000000,-16
$LFB0:
        .cfi_startproc
        ldah $29,0($27)         !gpdisp!1
        lda $29,0($29)          !gpdisp!1
$test_sqrt..ng:
        sqrtt $f16,$f0
        lda $30,-16($30)
        .cfi_def_cfa_offset 16
        stq $26,0($30)
        .cfi_offset 26, -16
        .prologue 1
        cmpteq $f0,$f0,$f10
        fbeq $f10,$L4
$L2:
        ldq $26,0($30)
        lda $30,16($30)
        .cfi_remember_state
        .cfi_restore 26
        .cfi_def_cfa_offset 0
        ret $31,($26),1
$L4:
        .cfi_restore_state
        ldq $27,sqrt($29)               !literal!2
        jsr $26,($27),sqrt              !lituse_jsr!2
        ldah $29,0($26)         !gpdisp!3
        lda $29,0($29)          !gpdisp!3
        br $31,$L2

__builtin_sqrt expands with a call to sqrt(), this could possibly
create a recursive call loop if the builtin is used in libc to
implement sqrt.

Uros.
Richard Henderson April 26, 2017, 1:43 p.m. | #7
On 04/14/2017 02:30 PM, Uros Bizjak wrote:
> Add earlyclobber to sqrtt/sqrtf insns.
> 
> When using software completions, we have to prevent assembler to match
> input and output operands of sqrtt/sqrtf insn. Add earlyclobber to
> output operand to avoid unwanted operand matching.
> 
> 2017-04-14  Uros Bizjak<ubizjak@gmail.com>
> 
>      * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Add
>      earlyclobber to output operand of sqrt insn.
>      (__ieee754_sqrtf): Ditto.

Pushed to master.


r~

Patch

diff --git a/sysdeps/alpha/fpu/math_private.h b/sysdeps/alpha/fpu/math_private.h
index 9e06e25..1e97c86 100644
--- a/sysdeps/alpha/fpu/math_private.h
+++ b/sysdeps/alpha/fpu/math_private.h
@@ -27,9 +27,9 @@  __ieee754_sqrt (double d)
 {
   double ret;
 # ifdef _IEEE_FP_INEXACT
-  asm ("sqrtt/suid %1,%0" : "=f"(ret) : "f"(d));
+  asm ("sqrtt/suid %1,%0" : "=&f"(ret) : "f"(d));
 # else
-  asm ("sqrtt/sud %1,%0" : "=f"(ret) : "f"(d));
+  asm ("sqrtt/sud %1,%0" : "=&f"(ret) : "f"(d));
 # endif
   return ret;
 }
@@ -39,9 +39,9 @@  __ieee754_sqrtf (float d)
 {
   float ret;
 # ifdef _IEEE_FP_INEXACT
-  asm ("sqrts/suid %1,%0" : "=f"(ret) : "f"(d));
+  asm ("sqrts/suid %1,%0" : "=&f"(ret) : "f"(d));
 # else
-  asm ("sqrts/sud %1,%0" : "=f"(ret) : "f"(d));
+  asm ("sqrts/sud %1,%0" : "=&f"(ret) : "f"(d));
 # endif
   return ret;
 }