diff mbox

[i386] : Fix PR78794, ~9% regression in 32-bit mode for 462.libquntum on Avoton after r243202

Message ID CAFULd4ZPsZ0MuAXH2pD4b8GuqnUAUQdLwHYOU-EnP-ttfsKAJw@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak Dec. 13, 2016, 5:23 p.m. UTC
Hello!

Attached patch fixes STV cost function to better model gains of pandn
insn on non-BMI targets. As explained in the PR, STV converts four
scalar arithmetic insns (2 * not and 2 * and) to one (pandn). The
patch increases gain for non-BMI targets for 2 * ix86_cost->add to a
total of 3 * ix86_cost->add.

2016-12-13  Uros Bizjak  <ubizjak@gmail.com>

    PR target/78794
    * config/i386/i386.c (dimode_scalar_chain::compute_convert_gain):
    Calculate additional gain for andnot for targets without BMI.

testsuite/ChangeLog:

2016-12-13  Uros Bizjak  <ubizjak@gmail.com>

    PR target/78794
    * gcc.target/i386/pr78794.c: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff mbox

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 243611)
+++ config/i386/i386.c	(working copy)
@@ -3419,6 +3419,10 @@  dimode_scalar_chain::compute_convert_gain ()
 	       || GET_CODE (src) == AND)
 	{
 	  gain += ix86_cost->add;
+	  /* Additional gain for andnot for targets without BMI.  */
+	  if (GET_CODE (XEXP (src, 0)) == NOT
+	      && !TARGET_BMI)
+	    gain += 2 * ix86_cost->add;
 	  if (CONST_INT_P (XEXP (src, 0)))
 	    gain -= vector_const_cost (XEXP (src, 0));
 	  if (CONST_INT_P (XEXP (src, 1)))
Index: testsuite/gcc.target/i386/pr78794.c
===================================================================
--- testsuite/gcc.target/i386/pr78794.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr78794.c	(working copy)
@@ -0,0 +1,32 @@ 
+/* PR target/pr78794 */
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2 -march=slm -mno-bmi -mno-stackrealign" } */
+/* { dg-final { scan-assembler "pandn" } } */
+
+typedef unsigned long long ull;
+
+struct S1
+{
+  float x;
+  ull y;
+};
+
+
+struct S2
+{
+  int a1;
+  struct S1 *node;
+  int *a2;
+};
+
+void
+foo(int c1, int c2, int c3, struct S2 *reg)
+{
+  int i;
+  for(i=0; i<reg->a1; i++)
+    if(reg->node[i].y & ((ull) 1 << c1))
+      {
+	if(reg->node[i].y & ((ull) 1 << c2))
+	  reg->node[i].y ^= ((ull) 1 << c3);
+      }
+}