diff mbox

[testsuite] Fix sse4_1-round* inline asm statements

Message ID CAFULd4bR6fUFrmarPSqseOepn1Z60hn-H_pH8vuWgHjWBt-Usg@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak Dec. 8, 2015, 6:23 p.m. UTC
On Tue, Dec 8, 2015 at 7:10 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
>> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* test cases.
>>
>> They do lots of things that are just absolutely forbidden, like clobber registers
>> that are not mentioned in the clobber list, and create a hidden data flow.
>>
>> The test cases work just by chance, and You can see the asm statements
>> ripped completely apart by the loop optimizer if you try to do the assembler
>> part in a loop:
>>
>>   for (i = 0; i < 10; i++) {
>>   __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*&f));
>>
>>   __asm__ ("fstcw %0" : "=m" (*&saved_cw));
>>   new_cw = saved_cw & clr_mask;
>>   new_cw |= type;
>>   __asm__ ("fldcw %0" : : "m" (*&new_cw));
>>
>>   __asm__ ("frndint\n"
>>            "fstp" ASM_SUFFIX " %0\n" : "=m" (*&ret));
>>   __asm__ ("fldcw %0" : : "m" (*&saved_cw));
>>   }
>>   return ret;
>>
>> So this patch avoids the hidden data flow, and
>> adds "st" to the clobber list.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>> OK for trunk?
>
> Uh, no.
>
> x87 is a strange beast, and even your patch is wrong. The assembly
> should be written in this way:
>
>   __asm__ ("fnstcw %0" : "=m" (saved_cw));
>
>   new_cw = saved_cw & clr_mask;
>   new_cw |= type;
>
>   __asm__ ("fldcw %2\n\t"
>        "frndint\n\t"
>        "fldcw %3" : "=t" (ret)
>               : "0" (f), "m" (new_cw), "m" (saved_cw));
>
> I'm testing the attached patch.

Committed slightly updated patch to mainline SVN with the following ChangeLog:

2015-12-08  Uros Bizjak  <ubizjak@gmail.com>

    * gcc.target/i386/sse4_1-round.h (do_round): Fix inline asm statements.
    * gcc.target/i386/sse4_1-roundsd-4.c (do_round): Ditto.
    * gcc.target/i386/sse4_1-roundss-4.c (do_round): Ditto.

Tested on x86_64-linux-gnu {,-m32}, committed to mainline SVN, will be
backported to all active branches.

Uros.

Comments

Bernd Edlinger Dec. 8, 2015, 8:13 p.m. UTC | #1
Hi,

Am 08.12.2015 um 19:23 schrieb Uros Bizjak:
> On Tue, Dec 8, 2015 at 7:10 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> Hello!
>>
>>> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* test cases.
>>>
>>> They do lots of things that are just absolutely forbidden, like clobber registers
>>> that are not mentioned in the clobber list, and create a hidden data flow.
>>>
>>> The test cases work just by chance, and You can see the asm statements
>>> ripped completely apart by the loop optimizer if you try to do the assembler
>>> part in a loop:
>>>
>>>    for (i = 0; i < 10; i++) {
>>>    __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*&f));
>>>
>>>    __asm__ ("fstcw %0" : "=m" (*&saved_cw));
>>>    new_cw = saved_cw & clr_mask;
>>>    new_cw |= type;
>>>    __asm__ ("fldcw %0" : : "m" (*&new_cw));
>>>
>>>    __asm__ ("frndint\n"
>>>             "fstp" ASM_SUFFIX " %0\n" : "=m" (*&ret));
>>>    __asm__ ("fldcw %0" : : "m" (*&saved_cw));
>>>    }
>>>    return ret;
>>>
>>> So this patch avoids the hidden data flow, and
>>> adds "st" to the clobber list.
>>>
>>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>>> OK for trunk?
>> Uh, no.
>>
>> x87 is a strange beast, and even your patch is wrong. The assembly
>> should be written in this way:
>>
>>    __asm__ ("fnstcw %0" : "=m" (saved_cw));
>>
>>    new_cw = saved_cw & clr_mask;
>>    new_cw |= type;
>>
>>    __asm__ ("fldcw %2\n\t"
>>         "frndint\n\t"
>>         "fldcw %3" : "=t" (ret)
>>                : "0" (f), "m" (new_cw), "m" (saved_cw));
>>
>> I'm testing the attached patch.

Thanks.

That is certainly a good idea to use the "=t"(ret) : "0" (f) !

But could you explain, just for my curiosity, why my approach
was wrong?  It may have to do with the semantic of "st" clobber, right?
I thought that allows the inline-assembler to push one value on the 
fpu-stack,
and it is responsible for removing that value again.


Bernd.

> Committed slightly updated patch to mainline SVN with the following ChangeLog:
>
> 2015-12-08  Uros Bizjak  <ubizjak@gmail.com>
>
>      * gcc.target/i386/sse4_1-round.h (do_round): Fix inline asm statements.
>      * gcc.target/i386/sse4_1-roundsd-4.c (do_round): Ditto.
>      * gcc.target/i386/sse4_1-roundss-4.c (do_round): Ditto.
>
> Tested on x86_64-linux-gnu {,-m32}, committed to mainline SVN, will be
> backported to all active branches.
>
> Uros.
Uros Bizjak Dec. 9, 2015, 7:18 a.m. UTC | #2
On Tue, Dec 8, 2015 at 9:13 PM, Bernd Edlinger
<bernd.edlinger@hotmail.de> wrote:
> Hi,
>
> Am 08.12.2015 um 19:23 schrieb Uros Bizjak:
>> On Tue, Dec 8, 2015 at 7:10 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> Hello!
>>>
>>>> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* test cases.
>>>>
>>>> They do lots of things that are just absolutely forbidden, like clobber registers
>>>> that are not mentioned in the clobber list, and create a hidden data flow.
>>>>
>>>> The test cases work just by chance, and You can see the asm statements
>>>> ripped completely apart by the loop optimizer if you try to do the assembler
>>>> part in a loop:
>>>>
>>>>    for (i = 0; i < 10; i++) {
>>>>    __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*&f));
>>>>
>>>>    __asm__ ("fstcw %0" : "=m" (*&saved_cw));
>>>>    new_cw = saved_cw & clr_mask;
>>>>    new_cw |= type;
>>>>    __asm__ ("fldcw %0" : : "m" (*&new_cw));
>>>>
>>>>    __asm__ ("frndint\n"
>>>>             "fstp" ASM_SUFFIX " %0\n" : "=m" (*&ret));
>>>>    __asm__ ("fldcw %0" : : "m" (*&saved_cw));
>>>>    }
>>>>    return ret;
>>>>
>>>> So this patch avoids the hidden data flow, and
>>>> adds "st" to the clobber list.
>>>>
>>>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>>>> OK for trunk?
>>> Uh, no.
>>>
>>> x87 is a strange beast, and even your patch is wrong. The assembly
>>> should be written in this way:
>>>
>>>    __asm__ ("fnstcw %0" : "=m" (saved_cw));
>>>
>>>    new_cw = saved_cw & clr_mask;
>>>    new_cw |= type;
>>>
>>>    __asm__ ("fldcw %2\n\t"
>>>         "frndint\n\t"
>>>         "fldcw %3" : "=t" (ret)
>>>                : "0" (f), "m" (new_cw), "m" (saved_cw));
>>>
>>> I'm testing the attached patch.
>
> Thanks.
>
> That is certainly a good idea to use the "=t"(ret) : "0" (f) !
>
> But could you explain, just for my curiosity, why my approach
> was wrong?  It may have to do with the semantic of "st" clobber, right?
> I thought that allows the inline-assembler to push one value on the
> fpu-stack,
> and it is responsible for removing that value again.

Not "wrong" in the sense that it won't work, but it is not written in
the optimal way. The st(0) register is not clobbered, it is loaded
with the input argument, and the value is returned there. So, when asm
input arguments are described in the right way, there is no need to
explicitly move values to the stack, the compiler will do it for you.

Saying that, I see we don't need to define ASM_SUFFIX anymore. I'll
prepare the patch that removes these #defines.

Uros.
diff mbox

Patch

Index: gcc.target/i386/sse4_1-round.h
===================================================================
--- gcc.target/i386/sse4_1-round.h	(revision 231413)
+++ gcc.target/i386/sse4_1-round.h	(working copy)
@@ -28,7 +28,7 @@  init_round (FP_T *src)
 static FP_T
 do_round (FP_T f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   FP_T ret;
 
   if ((type & 4))
@@ -42,16 +42,15 @@  do_round (FP_T f, int type)
       clr_mask = ~0x0C3F;
     }
 
-  __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*&f));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*&saved_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*&new_cw));
 
-  __asm__ ("frndint\n"
-	   "fstp" ASM_SUFFIX " %0\n" : "=m" (*&ret));
-  __asm__ ("fldcw %0" : : "m" (*&saved_cw));
+  __asm__ ("fldcw %2\n\t"
+	   "frndint\n\t"
+	   "fldcw %3" : "=t" (ret)
+		      : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }
 
Index: gcc.target/i386/sse4_1-roundsd-4.c
===================================================================
--- gcc.target/i386/sse4_1-roundsd-4.c	(revision 231413)
+++ gcc.target/i386/sse4_1-roundsd-4.c	(working copy)
@@ -36,7 +36,7 @@  init_round (double *src)
 static double
 do_round (double f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   double ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@  do_round (double f, int type)
       clr_mask = ~0x0C3F;
     }
 
-  __asm__ ("fldl %0" : : "m" (*&f));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*&saved_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*&new_cw));
 
-  __asm__ ("frndint\n"
-	   "fstpl %0\n" : "=m" (*&ret));
-  __asm__ ("fldcw %0" : : "m" (*&saved_cw));
+  __asm__ ("fldcw %2\n\t"
+	   "frndint\n\t"
+	   "fldcw %3" : "=t" (ret)
+		      : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }
 
Index: gcc.target/i386/sse4_1-roundss-4.c
===================================================================
--- gcc.target/i386/sse4_1-roundss-4.c	(revision 231413)
+++ gcc.target/i386/sse4_1-roundss-4.c	(working copy)
@@ -36,7 +36,7 @@  init_round (float *src)
 static float
 do_round (float f, int type)
 {
-  short saved_cw, new_cw, clr_mask;
+  unsigned short saved_cw, new_cw, clr_mask;
   float ret;
 
   if ((type & 4))
@@ -50,16 +50,15 @@  do_round (float f, int type)
       clr_mask = ~0x0C3F;
     }
 
-  __asm__ ("flds %0" : : "m" (*&f));
+  __asm__ ("fnstcw %0" : "=m" (saved_cw));
 
-  __asm__ ("fstcw %0" : "=m" (*&saved_cw));
   new_cw = saved_cw & clr_mask;
   new_cw |= type;
-  __asm__ ("fldcw %0" : : "m" (*&new_cw));
 
-  __asm__ ("frndint\n"
-	   "fstps %0\n" : "=m" (*&ret));
-  __asm__ ("fldcw %0" : : "m" (*&saved_cw));
+  __asm__ ("fldcw %2\n\t"
+	   "frndint\n\t"
+	   "fldcw %3" : "=t" (ret)
+		      : "0" (f), "m" (new_cw), "m" (saved_cw));
   return ret;
 }