diff mbox

Fix ix86_expand_vector_set (PR target/70421)

Message ID 20160329170521.GM3017@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek March 29, 2016, 5:05 p.m. UTC
Hi!

The various blendm expanders look like:
(define_insn "<avx512>_blendm<mode>"
  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
        (vec_merge:V48_AVX512VL
          (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm")
          (match_operand:V48_AVX512VL 1 "register_operand" "v")
          (match_operand:<avx512fmaskmode> 3 "register_operand" "Yk")))]
  "TARGET_AVX512F"
  "vblendm<ssemodesuffix>\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix" "evex")
   (set_attr "mode" "<sseinsnmode>")])
(i.e. their operands[1] is the second argument of VEC_MERGE (aka the value
to take elements from for bits cleared in the mask), while operands[2]
is the first argument of VEC_MERGE (aka the value to take elements from for
bits set in the mask)), so the call to gen_blendm which want to insert a
single element (with index elt) into target by broadcasting val into a
temporary and using mask of 1 << elt uses wrong order of arguments.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-03-27  Jakub Jelinek  <jakub@redhat.com>

	PR target/70421
	* config/i386/i386.c (ix86_expand_vector_set): Fix up argument order
	in gen_blendm expander.

	* gcc.dg/torture/pr70421.c: New test.
	* gcc.target/i386/avx512f-pr70421.c: New test.


	Jakub

Comments

Jeff Law March 29, 2016, 5:44 p.m. UTC | #1
On 03/29/2016 11:05 AM, Jakub Jelinek wrote:
> Hi!
>
> The various blendm expanders look like:
> (define_insn "<avx512>_blendm<mode>"
>    [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
>          (vec_merge:V48_AVX512VL
>            (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm")
>            (match_operand:V48_AVX512VL 1 "register_operand" "v")
One could argue this ordering is just asking for trouble.

Ultimately, I'll defer to Uros and Kirill.

Jeff
Jakub Jelinek March 29, 2016, 5:49 p.m. UTC | #2
On Tue, Mar 29, 2016 at 11:44:15AM -0600, Jeff Law wrote:
> On 03/29/2016 11:05 AM, Jakub Jelinek wrote:
> >Hi!
> >
> >The various blendm expanders look like:
> >(define_insn "<avx512>_blendm<mode>"
> >   [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
> >         (vec_merge:V48_AVX512VL
> >           (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm")
> >           (match_operand:V48_AVX512VL 1 "register_operand" "v")
> One could argue this ordering is just asking for trouble.

I bet the reason for this ordering are both the x86 intrinsics and
the HW behavior (see e.g. the order of arguments in the insn template
of the define_insn, etc.).
I think VEC_MERGE's definition on which argument you pick the elements from
for 0 bits in the mask vs. 1 bits in the mask is the exact opposite of what
the x86 HW wants and the intrinsics expect.

	Jakub
Kirill Yukhin March 30, 2016, 1:53 p.m. UTC | #3
On 29 Mar 19:49, Jakub Jelinek wrote:
> On Tue, Mar 29, 2016 at 11:44:15AM -0600, Jeff Law wrote:
> > On 03/29/2016 11:05 AM, Jakub Jelinek wrote:
> > >Hi!
> > >
> > >The various blendm expanders look like:
> > >(define_insn "<avx512>_blendm<mode>"
> > >   [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
> > >         (vec_merge:V48_AVX512VL
> > >           (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm")
> > >           (match_operand:V48_AVX512VL 1 "register_operand" "v")
> > One could argue this ordering is just asking for trouble.
> 
> I bet the reason for this ordering are both the x86 intrinsics and
> the HW behavior (see e.g. the order of arguments in the insn template
> of the define_insn, etc.).
> I think VEC_MERGE's definition on which argument you pick the elements from
> for 0 bits in the mask vs. 1 bits in the mask is the exact opposite of what
> the x86 HW wants and the intrinsics expect.
I think that order of arguments in built-in might be changed easily.
This doesn't affect intrinsics at all, because of that.

I can hardly recall, but my bet is that this order was dictated by:
ix86_expand_sse_movcc where order of blends args should corresond
on AVX*. 
If we want to fix the order in the pattern we should swap op_[true|false]
here:
        case V64QImode:
          gen = gen_avx512bw_blendmv64qi;
          break;
        case V32HImode:
          gen = gen_avx512bw_blendmv32hi;
          break;
        case V16SImode:
          gen = gen_avx512f_blendmv16si;
          break;
        case V8DImode:
          gen = gen_avx512f_blendmv8di;
          break;
        case V8DFmode:
          gen = gen_avx512f_blendmv8df;
          break;
        case V16SFmode:
          gen = gen_avx512f_blendmv16sf;
          break;

Jakub, nay be add comment in the patch on blendm patterns emphasizing
this non-regular order?

Otherwise, patch is OK to me.

--
Thanks, K

> 
> 	Jakub
Jakub Jelinek March 30, 2016, 1:59 p.m. UTC | #4
On Wed, Mar 30, 2016 at 04:53:48PM +0300, Kirill Yukhin wrote:
> I think that order of arguments in built-in might be changed easily.
> This doesn't affect intrinsics at all, because of that.
> 
> I can hardly recall, but my bet is that this order was dictated by:
> ix86_expand_sse_movcc where order of blends args should corresond
> on AVX*. 

Having the AVX512* blends have different order from AVX{,2} blends would be
bad though, so if we want to change the order, we'd have to change it
everywhere.

> Jakub, nay be add comment in the patch on blendm patterns emphasizing
> this non-regular order?

Ok, will do.

	Jakub
diff mbox

Patch

--- gcc/config/i386/i386.c.jj	2016-03-23 10:41:12.000000000 +0100
+++ gcc/config/i386/i386.c	2016-03-27 22:32:51.748280358 +0200
@@ -46930,7 +46930,7 @@  half:
     {
       tmp = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (tmp, gen_rtx_VEC_DUPLICATE (mode, val)));
-      emit_insn (gen_blendm (target, tmp, target,
+      emit_insn (gen_blendm (target, target, tmp,
 			     force_reg (mmode,
 					gen_int_mode (1 << elt, mmode))));
     }
--- gcc/testsuite/gcc.dg/torture/pr70421.c.jj	2016-03-29 09:25:37.015469084 +0200
+++ gcc/testsuite/gcc.dg/torture/pr70421.c	2016-03-29 09:25:13.000000000 +0200
@@ -0,0 +1,22 @@ 
+/* PR target/70421 */
+/* { dg-do run } */
+/* { dg-additional-options "-Wno-psabi -w" } */
+
+typedef unsigned V __attribute__ ((vector_size (64)));
+
+unsigned __attribute__ ((noinline, noclone))
+foo (unsigned x, V u, V v)
+{
+  v[1] ^= v[2];
+  x ^= ((V) v)[u[0]];
+  return x;
+}
+
+int
+main ()
+{
+  unsigned x = foo (0x10, (V) { 1 }, (V) { 0x100, 0x1000, 0x10000 });
+  if (x != 0x11010)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/avx512f-pr70421.c.jj	2016-03-29 09:26:23.380837148 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-pr70421.c	2016-03-29 09:27:37.066832846 +0200
@@ -0,0 +1,15 @@ 
+/* PR target/70421 */
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include "avx512f-check.h"
+
+#define main() do_main()
+#include "../../gcc.dg/torture/pr70421.c"
+
+static void
+avx512f_test (void)
+{
+  do_main ();
+}