[rs6000] Fix constraints issue in _mm_cvtss_si{32,64}

Message ID 06add7ad-55e6-40a3-ac1b-13aaf8dccd2e@linux.ibm.com
State New
Headers show
Series
  • [rs6000] Fix constraints issue in _mm_cvtss_si{32,64}
Related show

Commit Message

Bill Schmidt Nov. 8, 2018, 8:18 p.m.
Hi,

We recently discovered that GCC is getting lucky with register allocation of
some inline assembly code, despite invalid register constraints.  In these
two functions, a "wi" constraint (VSX valid for direct moves) was used for a 
temporary that, as written, is further constrained to be an FPR.  This patch
fixes the problem by introducing a separate temporary with an "f" constraint
and breaking the lifetime of the existing temporary.  The existing temporary
can now have a less onerous "wa" constraint as it is no longer used within a
direct move instruction.

Installed and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks,
Bill


2018-11-08  Bill Schmidt  <wschmidt@linux.ibm.com>
	    Jinsong Ji  <jji@us.ibm.com>

	* config/rs6000/xmmintrin.h (_mm_cvtss_si32): Fix incorrect
	constraints by introducing a new temporary.
	(_mm_cvtss_si64): Likewise.

Comments

Segher Boessenkool Nov. 8, 2018, 10:51 p.m. | #1
Hi!

On Thu, Nov 08, 2018 at 02:18:51PM -0600, Bill Schmidt wrote:
> We recently discovered that GCC is getting lucky with register allocation of
> some inline assembly code, despite invalid register constraints.  In these
> two functions, a "wi" constraint (VSX valid for direct moves) was used for a 

"wi" is not direct move only, that is "wj".  "wi" is for DImode in VSX.
It allows all VSX registers (or nothing at all).

> temporary that, as written, is further constrained to be an FPR.  This patch
> fixes the problem by introducing a separate temporary with an "f" constraint
> and breaking the lifetime of the existing temporary.  The existing temporary
> can now have a less onerous "wa" constraint as it is no longer used within a
> direct move instruction.

It could already use a "wa", and the "f" isn't needed at all?  But it
should use xsrdpic then, i.e. the VSX instruction instead of the FP insn.
Maybe that only works for 64 bit though (for the overflow behaviour).  Hrm.
Although it doesn't currently implement overflow behaviour correctly either!

Your code is not wrong though, so okay for trunk.  Thanks!


Segher

Patch

Index: gcc/config/rs6000/xmmintrin.h
===================================================================
--- gcc/config/rs6000/xmmintrin.h	(revision 265895)
+++ gcc/config/rs6000/xmmintrin.h	(working copy)
@@ -908,13 +908,15 @@  _mm_cvtss_si32 (__m128 __A)
   __m64 res = 0;
 #ifdef _ARCH_PWR8
   __m128 vtmp;
+  double dtmp;
   __asm__(
-      "xxsldwi %x1,%x2,%x2,3;\n"
-      "xscvspdp %x1,%x1;\n"
-      "fctiw  %1,%1;\n"
-      "mfvsrd  %0,%x1;\n"
+      "xxsldwi %x1,%x3,%x3,3;\n"
+      "xscvspdp %x2,%x1;\n"
+      "fctiw  %2,%2;\n"
+      "mfvsrd  %0,%x2;\n"
       : "=r" (res),
-	"=&wi" (vtmp)
+      	"=&wa" (vtmp),
+        "=f" (dtmp)
       : "wa" (__A)
       : );
 #else
@@ -939,13 +941,15 @@  _mm_cvtss_si64 (__m128 __A)
   __m64 res = 0;
 #ifdef _ARCH_PWR8
   __m128 vtmp;
+  double dtmp;
   __asm__(
-      "xxsldwi %x1,%x2,%x2,3;\n"
-      "xscvspdp %x1,%x1;\n"
-      "fctid  %1,%1;\n"
-      "mfvsrd  %0,%x1;\n"
+      "xxsldwi %x1,%x3,%x3,3;\n"
+      "xscvspdp %x2,%x1;\n"
+      "fctid  %2,%2;\n"
+      "mfvsrd  %0,%x2;\n"
       : "=r" (res),
-	"=&wi" (vtmp)
+        "=&wa" (vtmp),
+        "=f" (dtmp)
       : "wa" (__A)
       : );
 #else