[i386] Delay DI mode xor split when expanding comparison

Message ID 20160115092211.GA13418@msticlxl57.ims.intel.com
State New

Commit Message

Ilya Enkovich Jan. 15, 2016, 9:22 a.m. UTC
Hi,

When the scalar-to-vector (STV) pass was introduced for the i386
target, the split of DI mode instructions was delayed until the
split2 pass (previously it was performed at expand).  This appears
to cause a ~5% degradation on the libquantum benchmark.  In the
original code, AND and XOR were combined into ANDN.  That no longer
happens because AND is not split at expand but XOR still is.  This
patch delays the split of the XOR generated for DI mode comparisons,
which resolves the regression on libquantum.  Unfortunately we still
don't generate the SSE version of ANDN; I'll look into this later.
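
As a rough illustration of the AND/XOR combination mentioned above
(not part of the patch; the helper names cmp_and, cmp_xor, cmp_andn
and self_check are made up for this sketch), a comparison of the
shape (a & mask) != mask can be written as an XOR followed by a test
against zero, and that in turn is an and-not of a with mask:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Comparison as written in the source: AND, then compare with the mask.  */
int cmp_and (uint64_t a, uint64_t mask) { return (a & mask) != mask; }

/* Same comparison as XOR plus a test against zero.  */
int cmp_xor (uint64_t a, uint64_t mask) { return ((a & mask) ^ mask) != 0; }

/* (a & mask) ^ mask equals ~a & mask, i.e. the and-not form.  */
int cmp_andn (uint64_t a, uint64_t mask) { return (~a & mask) != 0; }

/* Check that the three forms agree on a few sample 64-bit values.  */
void self_check (void)
{
  static const uint64_t v[] = { 0, 1, 0xffffffffULL, 0x100000000ULL,
				0xdeadbeefcafef00dULL, ~0ULL };
  const size_t n = sizeof v / sizeof v[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
	assert (cmp_and (v[i], v[j]) == cmp_xor (v[i], v[j]));
	assert (cmp_and (v[i], v[j]) == cmp_andn (v[i], v[j]));
      }
}
```

This only shows the arithmetic identity in plain C; which RTL
patterns actually get combined depends on the passes involved.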

Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2016-01-14  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/i386.c (ix86_expand_branch): Don't split
	DI mode xor instruction to SI mode.

gcc/testsuite/

2016-01-14  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* gcc.target/i386/pr65105-5.c: New test.
Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ed91e5d..504ac55 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21699,6 +21699,19 @@  ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
     case DImode:
       if (TARGET_64BIT)
 	goto simple;
+      /* For 32-bit targets a DI mode comparison may be performed
+	 in SSE registers.  To allow this we avoid the split to SI
+	 mode by doing the xor in DI mode and then comparing the
+	 result with zero (which is recognized by the STV pass).
+	 We don't compare using xor when optimizing
+	 for size.  */
+      if (!optimize_insn_for_size_p ()
+	  && TARGET_STV
+	  && (code == EQ || code == NE))
+	{
+	  op0 = force_reg (mode, gen_rtx_XOR (mode, op0, op1));
+	  op1 = const0_rtx;
+	}
     case TImode:
       /* Expand DImode branch into multiple compare+branch.  */
       {
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-5.c b/gcc/testsuite/gcc.target/i386/pr65105-5.c
new file mode 100644
index 0000000..5818c1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65105-5.c
@@ -0,0 +1,22 @@ 
+/* PR target/pr65105 */
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2 -march=core-avx2" } */
+/* { dg-final { scan-assembler "pand" } } */
+/* { dg-final { scan-assembler "pxor" } } */
+/* { dg-final { scan-assembler "ptest" } } */
+
+struct S1
+{
+  unsigned long long a;
+  unsigned long long b;
+};
+
+void
+test (int p1, unsigned long long p2, int p3, struct S1 *p4)
+{
+  int i;
+
+  for (i = 0; i < p1; i++)
+    if ((p4[i].a & p2) != p2)
+      p4[i].a ^= (1ULL << p3);
+}
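
For reference, the equality test that the new code in
ix86_expand_branch builds can be sketched in plain C.  This is only
an illustration under stated assumptions, not code from the patch:
the function names ne_di, ne_si_split and split_check are
hypothetical, and the SI mode variant is an assumed shape of what a
split into 32-bit halves computes, not a copy of what GCC emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DI mode form: a single 64-bit xor, then a test against zero.  */
int ne_di (uint64_t a, uint64_t b) { return (a ^ b) != 0; }

/* Assumed SI mode form: xor the low and high halves separately,
   merge the two results and test the merged value against zero.  */
int ne_si_split (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a ^ (uint32_t) b;
  uint32_t hi = (uint32_t) (a >> 32) ^ (uint32_t) (b >> 32);
  return (lo | hi) != 0;
}

/* Check that both forms give the same answer on sample values.  */
void split_check (void)
{
  static const uint64_t v[] = { 0, 1, 0xffffffffULL, 0x100000000ULL,
				0xdeadbeefcafef00dULL, ~0ULL };
  const size_t n = sizeof v / sizeof v[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      assert (ne_di (v[i], v[j]) == ne_si_split (v[i], v[j]));
}
```

Both forms compute the same predicate; the DI mode form keeps the
operation as one 64-bit xor, which is the shape the comment in the
patch says the STV pass recognizes.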