Patchwork: Fix AVX2 mulv32qi expander (PR target/51387)

Submitter Jakub Jelinek
Date Dec. 2, 2011, 7:18 p.m.
Message ID <20111202191825.GF27242@tyan-ft48-01.lab.bos.redhat.com>
Permalink /patch/128939/
State New

Comments

Jakub Jelinek - Dec. 2, 2011, 7:18 p.m.
Hi!

As reported by Michael, the vect-116.c testcase fails with -mavx2: the
mulv32qi pattern computes a wrong result, with the second and third
quarters of the vector swapped relative to what it should contain.
This is because we can't use vec_extract_even_odd for V32QI when the
vpmullw arguments were prepared using vpunpck[hl]bw: those insns
interleave only within 128-bit lanes.  We therefore want to finalize
the result using the
{ 0,2,..,14,32,34,..,46,16,18,..,30,48,50,..,62 }
permutation instead of the current one
{ 0,2,..,14,16,18,..,30,32,34,..,46,48,50,..,62 }
The new permutation is even one insn shorter (2 vpshufb + vpor) than
the even extraction (2 vpshufb + vpor + vpermq).
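The intra-lane behaviour described above can be illustrated with a short
standalone sketch (Python, not part of the patch; the byte numbering is
purely illustrative):

```python
# Sketch (not GCC code): which byte products end up in the two vpmullw
# results when the word operands are built with intra-lane vpunpck[hl]bw.
# Each 128-bit lane of a V32QI vector holds 16 bytes; within each lane the
# "low" unpack widens bytes 0-7, the "high" unpack bytes 8-15.
def interleave_order():
    lo = [i for lane in (0, 16) for i in range(lane, lane + 8)]
    hi = [i for lane in (0, 16) for i in range(lane + 8, lane + 16)]
    # A plain even-byte extraction reads the lo result, then the hi
    # result, in order -- so the final element order comes out as:
    return lo + hi

print(interleave_order())
# -> bytes 0-7, then 16-23, then 8-15, then 24-31: the second and third
#    quarters of the vector are swapped, exactly the reported failure.
```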

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-02  Jakub Jelinek  <jakub@redhat.com>

	PR target/51387
	* config/i386/sse.md (mul<mode>3 with VI1_AVX2 iterator): For
	V32QImode use { 0,2,..,14,32,34,..,46,16,18,..,30,48,50,..,62 }
	permutation instead of extract even permutation.


	Jakub
Richard Henderson - Dec. 2, 2011, 9:52 p.m.
On 12/02/2011 11:18 AM, Jakub Jelinek wrote:
> 	PR target/51387
> 	* config/i386/sse.md (mul<mode>3 with VI1_AVX2 iterator): For
> 	V32QImode use { 0,2,..,14,32,34,..,46,16,18,..,30,48,50,..,62 }
> 	permutation instead of extract even permutation.

Ok.


r~

Patch

--- gcc/config/i386/sse.md.jj	2011-12-01 11:44:58.000000000 +0100
+++ gcc/config/i386/sse.md	2011-12-02 12:18:42.657795749 +0100
@@ -5066,7 +5066,24 @@  (define_insn_and_split "mul<mode>3"
 					gen_lowpart (mulmode, t[3]))));
 
   /* Extract the even bytes and merge them back together.  */
-  ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
+  if (<MODE>mode == V16QImode)
+    ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
+  else
+    {
+      /* Since avx2_interleave_{low,high}v32qi used above aren't cross-lane,
+	 this can't be normal even extraction, but one where additionally
+	 the second and third quarter are swapped.  That is even one insn
+	 shorter than even extraction.  */
+      rtvec v = rtvec_alloc (32);
+      for (i = 0; i < 32; ++i)
+	RTVEC_ELT (v, i)
+	  = GEN_INT (i * 2 + ((i & 24) == 8 ? 16 : (i & 24) == 16 ? -16 : 0));
+      t[0] = operands[0];
+      t[1] = t[5];
+      t[2] = t[4];
+      t[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+      ix86_expand_vec_perm_const (t);
+    }
 
   set_unique_reg_note (get_last_insn (), REG_EQUAL,
 		       gen_rtx_MULT (<MODE>mode, operands[1], operands[2]));
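For reference, the 32-element selector the new code builds can be checked
with a quick standalone sketch (Python, mirroring the GEN_INT expression
in the hunk above):

```python
# Sketch: reproduce the permutation selector the patch constructs with
# GEN_INT (i * 2 + ((i & 24) == 8 ? 16 : (i & 24) == 16 ? -16 : 0)).
def selector():
    v = []
    for i in range(32):
        if (i & 24) == 8:        # second quarter: pull from high operand
            adj = 16
        elif (i & 24) == 16:     # third quarter: pull from low operand
            adj = -16
        else:                    # first and fourth quarters: unchanged
            adj = 0
        v.append(i * 2 + adj)
    return v

print(selector())
# -> 0,2,..,14, 32,34,..,46, 16,18,..,30, 48,50,..,62: even elements
#    with the second and third quarters swapped, as in the ChangeLog.
```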