[AArch64] Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.

Message ID DB5PR08MB11440EF163AB4AECD8266A8EE75E0@DB5PR08MB1144.eurprd08.prod.outlook.com

Commit Message

Bin Cheng June 8, 2016, 2:57 p.m. UTC
> From: James Greenhalgh <james.greenhalgh@arm.com>
> Sent: 31 May 2016 16:24
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; nd
> Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
>     
> On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> > Hi,
> > Alan and Renlin noticed that some vcond patterns are not supported in
> > the AArch64 (or AArch32?) backend, and they both had patches fixing this.
> > After investigation, I agree with them that vcond/vcondu in AArch64's backend
> > should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> > this patch which is based on Alan's.  This patch supports all vcond/vcondu
> > patterns by implementing/using vec_cmp and vcond_mask patterns.  Unlike
> > the original patch, it doesn't change GCC's expanding process, and it keeps
> > vcond patterns.  The patch also introduces vec_cmp*_internal to support
> > special case optimization for vcond/vcondu which the current implementation does.
> > Apart from Alan's patch, I also borrowed ideas from Renlin's; it is my
> > changes that should be blamed if any bug is introduced.
> > 
> > With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> > AArch64 (in a following patch).  Bootstrapped and tested on AArch64.  Is it OK?
> > BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64), which was
> > added earlier by the tree if-conversion patch.
> 
> Splitting this patch would have been very helpful. One patch each for the
> new standard pattern names, and one patch for the refactor of vcond. As
> it is, this patch is rather difficult to read.
Done; the patch is split into two, one implementing the new vcond_mask and vec_cmp patterns and the other rewriting the vcond patterns.

> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index bd73bce..f51473a 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -1053,7 +1053,7 @@
> >      }
> >  
> >    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
> > -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> > +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
> >                operands[2], cmp_fmt, operands[1], operands[2]));
> >    DONE;
> >  })
> > @@ -2225,204 +2225,215 @@
> >    DONE;
> >  })
> >  
> > -(define_expand "aarch64_vcond_internal<mode><mode>"
> > +(define_expand "vcond_mask_<mode><v_cmp_result>"
> > +  [(match_operand:VALLDI 0 "register_operand")
> > +   (match_operand:VALLDI 1 "nonmemory_operand")
> > +   (match_operand:VALLDI 2 "nonmemory_operand")
> > +   (match_operand:<V_cmp_result> 3 "register_operand")]
> > +  "TARGET_SIMD"
> > +{
> > +  /* If we have (a = (P) ? -1 : 0);
> > +     Then we can simply move the generated mask (result must be int).  */
> > +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> > +      && operands[2] == CONST0_RTX (<MODE>mode))
> > +    emit_move_insn (operands[0], operands[3]);
> > +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> > +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> > +        && operands[2] == CONSTM1_RTX (<MODE>mode))
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> > +  else
> > +    {
> > +      if (!REG_P (operands[1]))
> > +     operands[1] = force_reg (<MODE>mode, operands[1]);
> > +      if (!REG_P (operands[2]))
> > +     operands[2] = force_reg (<MODE>mode, operands[2]);
> > +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> > +                                          operands[1], operands[2]));
> > +    }
> > +
> > +  DONE;
> > +})
> > +
> 
> This pattern is fine.
> 
> > +;; Patterns comparing two vectors to produce a mask.
> 
> This comment is insufficient. The logic in vec_cmp<mode><mode>_internal
> does not always return the expected mask (in particular for NE), but this
> is not made clear in the comment.
Comments added to various places to make the code easier to understand and maintain.
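
To make the intended semantics concrete, here is a rough sketch in plain C
using GCC vector extensions (illustration only, not code from the patch) of
how the opposite mask returned by vec_cmp<mode><mode>_internal for NE is
consumed: vec_cmp inverts it afterwards, while vcond instead swaps the two
data operands of the bsl and so needs no extra instruction.

typedef int v4si __attribute__ ((vector_size (16)));

/* NE case: the _internal pattern only emits the EQ comparison (CMEQ).  */
v4si vec_cmp_ne (v4si a, v4si b)
{
  v4si eq_mask = a == b;   /* lanes are -1 where a == b, 0 otherwise */
  return ~eq_mask;         /* vec_cmp inverts to get the real NE mask */
}

/* vcond saves the inversion by swapping the selected values instead.  */
v4si vcond_ne (v4si a, v4si b, v4si x, v4si y)
{
  v4si eq_mask = a == b;                  /* opposite mask from _internal */
  return (eq_mask & y) | (~eq_mask & x);  /* bsl with x and y swapped */
}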

> 
> > +
> > +(define_expand "vec_cmp<mode><mode>_internal"
> >    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > -     (if_then_else:VSDQ_I_DI
> > -       (match_operator 3 "comparison_operator"
> > -         [(match_operand:VSDQ_I_DI 4 "register_operand")
> > -          (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> > -       (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> > -       (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> > +       (match_operator 1 "comparison_operator"
> > +         [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +          (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> >    "TARGET_SIMD"
> 
> <snip>
> 
> > +(define_expand "vec_cmp<mode><mode>"
> > +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> > +       (match_operator 1 "comparison_operator"
> > +         [(match_operand:VSDQ_I_DI 2 "register_operand")
> > +          (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> > +  "TARGET_SIMD"
> > +{
> > +  enum rtx_code code = GET_CODE (operands[1]);
> > +
> > +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> > +                                      operands[1], operands[2],
> > +                                      operands[3]));
> > +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> > +     operators are computed using other operators, and we need to
> > +     revert the result.  */
> 
> s/revert/invert
> 
> There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
> to refer to. But, there should be - more comments would definitely help
> this patch!
> 
> These comments apply equally to the other copies of this comment in the
> patch.
Done.

> 
> > +  if (code == NE)
> > +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
> >    DONE;
> >  })
> >  
> > -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> > -  [(set (match_operand:VDQF_COND 0 "register_operand")
> > -     (if_then_else:VDQF
> > -       (match_operator 3 "comparison_operator"
> > -         [(match_operand:VDQF 4 "register_operand")
> > -          (match_operand:VDQF 5 "nonmemory_operand")])
> > -       (match_operand:VDQF_COND 1 "nonmemory_operand")
> > -       (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> > +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> > +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> > +     (match_operator 1 "comparison_operator"
> > +         [(match_operand:VDQF 2 "register_operand")
> > +          (match_operand:VDQF 3 "nonmemory_operand")]))]
> 
> This needs much more thorough commenting. The logic is now extremely
> difficult to verify as to which pattern is responsible for inverting the
> mask. This makes it hard to spot whether the inversions applied to the
> unordered comparisons are correct. I can't really review the patch with
> the logic split across uncommented patterns in this way - is there
> anything you can do to clean it up?
Done.
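
To illustrate the unordered handling that the new comments describe, here is
a small scalar model in C (illustration only, not part of the patch).  FCM
comparisons return false for unordered lanes, so emitting the swapped
comparison and letting the caller invert the mask gives the IEEE semantics,
e.g. for UNGE:

#include <math.h>
#include <assert.h>

/* a UNGE b is computed as !(b > a): the patch emits the swapped GT
   comparison and the caller inverts the resulting mask.  */
static int unge (double a, double b)
{
  return !(b > a);
}

int main (void)
{
  assert (unge (2.0, 1.0) == 1);   /* ordered, a >= b */
  assert (unge (1.0, 2.0) == 0);   /* ordered, a < b */
  assert (unge (NAN, 1.0) == 1);   /* unordered compares true for UNGE */
  return 0;
}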

> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> > new file mode 100644
> > index 0000000..b0ca6d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> 
> None of this is AArch64 specific?
Test case removed, since I will enable vect_cond_mixed on AArch64; that enables the various generic tests covering mixed vcond vectorization and makes an AArch64-specific test unnecessary here.

> 
> > @@ -0,0 +1,59 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> > +
> > +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> > +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> > +
> > +int vcond_f_i (int *x)
> > +{
> > +  int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_f_u (unsigned int *x)
> > +{
> > +  unsigned int i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      fa[i] -= 1.0;
> > +      fb[i] += 1.0;
> > +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_i (long long *x)
> > +{
> > +  long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int vcond_d_u (unsigned long long *x)
> > +{
> > +  unsigned long long i = 0;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      da[i] -= 1.0;
> > +      db[i] += 1.0;
> > +      dc[i] = (x[i] < i) ? da[i] : db[i];
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> 

Patches attached.  Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin

2016-06-07  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/aarch64-simd.md (vec_cmp<mode><mode>): New pattern.
	(vec_cmp<mode><mode>_internal): New pattern.
	(vec_cmp<mode><v_cmp_result>): New pattern.
	(vec_cmp<mode><v_cmp_result>_internal): New pattern.
	(vec_cmpu<mode><mode>): New pattern.
	(vcond_mask_<mode><v_cmp_result>): New pattern.


2016-06-07  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
	gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
	(aarch64_vcond_internal<mode><mode>): Delete pattern.
	(aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
	(vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
	(vcondu<mode><mode>): Ditto.
	(vcond<v_cmp_result><mode>): Delete.
	(vcond<v_cmp_mixed><mode>): New pattern.
	(vcondu<mode><v_cmp_mixed>): New pattern.
	(aarch64_cmtst<mode>): Revise comment to use aarch64_vcond instead
	of aarch64_vcond_internal.
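
To show at the source level what the new mixed-mode expanders cover, here is
a plain C sketch (illustration only, not part of the patches): the first loop
needs vcond<v_cmp_mixed><mode> (float comparison selecting integer data), the
second needs vcondu<mode><v_cmp_mixed> (unsigned integer comparison selecting
float data).

void sel_int_by_float (int *r, float *a, float *b, int *x, int *y, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] < b[i] ? x[i] : y[i];   /* e.g. V4SF compare, V4SI result */
}

void sel_float_by_uint (float *r, unsigned *a, unsigned *b,
			float *x, float *y, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] < b[i] ? x[i] : y[i];   /* e.g. V4SI compare, V4SF result */
}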

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..e080b71 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1053,7 +1053,7 @@
     }
 
   cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
-  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
+  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
               operands[2], cmp_fmt, operands[1], operands[2]));
   DONE;
 })
@@ -2202,314 +2202,6 @@
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<mode><mode>"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
-	(if_then_else:VSDQ_I_DI
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VSDQ_I_DI 4 "register_operand")
-	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
-	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
-	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
-  "TARGET_SIMD"
-{
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<MODE>mode);
-  enum rtx_code code = GET_CODE (operands[3]);
-
-  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
-     and desirable for other comparisons if it results in FOO ? -1 : 0
-     (this allows direct use of the comparison result without a bsl).  */
-  if (code == NE
-      || (code != EQ
-	  && op1 == CONST0_RTX (<V_cmp_result>mode)
-	  && op2 == CONSTM1_RTX (<V_cmp_result>mode)))
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-      switch (code)
-        {
-        case LE: code = GT; break;
-        case LT: code = GE; break;
-        case GE: code = LT; break;
-        case GT: code = LE; break;
-        /* No case EQ.  */
-        case NE: code = EQ; break;
-        case LTU: code = GEU; break;
-        case LEU: code = GTU; break;
-        case GTU: code = LEU; break;
-        case GEU: code = LTU; break;
-        default: gcc_unreachable ();
-        }
-    }
-
-  /* Make sure we can handle the last operand.  */
-  switch (code)
-    {
-    case NE:
-      /* Normalized to EQ above.  */
-      gcc_unreachable ();
-
-    case LE:
-    case LT:
-    case GE:
-    case GT:
-    case EQ:
-      /* These instructions have a form taking an immediate zero.  */
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-        break;
-      /* Fall through, as may need to load into register.  */
-    default:
-      if (!REG_P (operands[5]))
-        operands[5] = force_reg (<MODE>mode, operands[5]);
-      break;
-    }
-
-  switch (code)
-    {
-    case LT:
-      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case GE:
-      emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LE:
-      emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case GT:
-      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4]));
-      break;
-
-    case GEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5]));
-      break;
-
-    case LEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4]));
-      break;
-
-    case GTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5]));
-      break;
-
-    /* NE has been normalized to EQ above.  */
-    case EQ:
-      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5]));
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
-					       op1, op2));
-      }
-
-  DONE;
-})
-
-(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
-  [(set (match_operand:VDQF_COND 0 "register_operand")
-	(if_then_else:VDQF
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:VDQF_COND 1 "nonmemory_operand")
-	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
-  "TARGET_SIMD"
-{
-  int inverse = 0;
-  int use_zero_form = 0;
-  int swap_bsl_operands = 0;
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-  rtx tmp = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-
-  rtx (*base_comparison) (rtx, rtx, rtx);
-  rtx (*complimentary_comparison) (rtx, rtx, rtx);
-
-  switch (GET_CODE (operands[3]))
-    {
-    case GE:
-    case GT:
-    case LE:
-    case LT:
-    case EQ:
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-	{
-	  use_zero_form = 1;
-	  break;
-	}
-      /* Fall through.  */
-    default:
-      if (!REG_P (operands[5]))
-	operands[5] = force_reg (<VDQF:MODE>mode, operands[5]);
-    }
-
-  switch (GET_CODE (operands[3]))
-    {
-    case LT:
-    case UNLT:
-      inverse = 1;
-      /* Fall through.  */
-    case GE:
-    case UNGE:
-    case ORDERED:
-    case UNORDERED:
-      base_comparison = gen_aarch64_cmge<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      break;
-    case LE:
-    case UNLE:
-      inverse = 1;
-      /* Fall through.  */
-    case GT:
-    case UNGT:
-      base_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmge<VDQF:mode>;
-      break;
-    case EQ:
-    case NE:
-    case UNEQ:
-      base_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      break;
-    default:
-      gcc_unreachable ();
-    }
-
-  switch (GET_CODE (operands[3]))
-    {
-    case LT:
-    case LE:
-    case GT:
-    case GE:
-    case EQ:
-      /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
-	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
-	 a GE b -> a GE b
-	 a GT b -> a GT b
-	 a LE b -> b GE a
-	 a LT b -> b GT a
-	 a EQ b -> a EQ b
-	 Note that there also exist direct comparison against 0 forms,
-	 so catch those as a special case.  */
-      if (use_zero_form)
-	{
-	  inverse = 0;
-	  switch (GET_CODE (operands[3]))
-	    {
-	    case LT:
-	      base_comparison = gen_aarch64_cmlt<VDQF:mode>;
-	      break;
-	    case LE:
-	      base_comparison = gen_aarch64_cmle<VDQF:mode>;
-	      break;
-	    default:
-	      /* Do nothing, other zero form cases already have the correct
-		 base_comparison.  */
-	      break;
-	    }
-	}
-
-      if (!inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-      break;
-    case UNLT:
-    case UNLE:
-    case UNGT:
-    case UNGE:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 swap the operands to BSL, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a GE b -> !(b GT a)
-	 a GT b -> !(b GE a)
-	 a LE b -> !(a GT b)
-	 a LT b -> !(a GE b)
-	 a NE b -> !(a EQ b)  */
-
-      if (inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-
-      swap_bsl_operands = 1;
-      break;
-    case UNEQ:
-      /* We check (a > b ||  b > a).  combining these comparisons give us
-	 true iff !(a != b && a ORDERED b), swapping the operands to BSL
-	 will then give us (a == b ||  a UNORDERED b) as intended.  */
-
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (mask, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      swap_bsl_operands = 1;
-      break;
-    case UNORDERED:
-       /* Operands are ORDERED iff (a > b || b >= a).
-	 Swapping the operands to BSL will give the UNORDERED case.  */
-     swap_bsl_operands = 1;
-     /* Fall through.  */
-    case ORDERED:
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmge<VDQF:mode> (mask, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      break;
-    default:
-      gcc_unreachable ();
-    }
-
-  if (swap_bsl_operands)
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-    }
-
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<VDQF_COND:V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<VDQF_COND:V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<VDQF_COND:MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<VDQF_COND:MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<VDQF_COND:mode> (operands[0], mask,
-					       op1, op2));
-      }
-
-  DONE;
-})
-
 (define_expand "vcond<mode><mode>"
   [(set (match_operand:VALLDI 0 "register_operand")
 	(if_then_else:VALLDI
@@ -2520,26 +2212,50 @@
 	  (match_operand:VALLDI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal; the opposite
+     result mask is computed for the operators below, so we need to invert
+     the mask here.  In this case we can save an inverting instruction
+     by simply swapping the two operands to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
-(define_expand "vcond<v_cmp_result><mode>"
-  [(set (match_operand:<V_cmp_result> 0 "register_operand")
-	(if_then_else:<V_cmp_result>
+(define_expand "vcond<v_cmp_mixed><mode>"
+  [(set (match_operand:<V_cmp_mixed> 0 "register_operand")
+	(if_then_else:<V_cmp_mixed>
 	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:<V_cmp_result> 1 "nonmemory_operand")
-	  (match_operand:<V_cmp_result> 2 "nonmemory_operand")))]
+	    [(match_operand:VDQF_COND 4 "register_operand")
+	     (match_operand:VDQF_COND 5 "nonmemory_operand")])
+	  (match_operand:<V_cmp_mixed> 1 "nonmemory_operand")
+	  (match_operand:<V_cmp_mixed> 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<v_cmp_result><mode> (
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
+					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal; the opposite
+     result mask is computed for the operators below, so we need to invert
+     the mask here.  In this case we can save an inverting instruction
+     by simply swapping the two operands to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<v_cmp_mixed><v_cmp_result> (
 						operands[0], operands[1],
-						operands[2], operands[3],
-						operands[4], operands[5]));
+						operands[2], mask));
   DONE;
 })
 
@@ -2553,9 +2269,48 @@
 	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<MODE>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><mode>_internal; the opposite result
+     mask is computed for the NE operator, so we need to invert the mask
+     here.  In this case we can save an inverting instruction by simply
+     swapping the two operands to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
+  DONE;
+})
+
+(define_expand "vcondu<mode><v_cmp_mixed>"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(if_then_else:VDQF
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:<V_cmp_mixed> 4 "register_operand")
+	     (match_operand:<V_cmp_mixed> 5 "nonmemory_operand")])
+	  (match_operand:VDQF 1 "nonmemory_operand")
+	  (match_operand:VDQF 2 "nonmemory_operand")))]
+  "TARGET_SIMD"
+{
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<v_cmp_mixed><v_cmp_mixed>_internal (
+						  mask, operands[3],
+						  operands[4], operands[5]));
+  /* See the comments in vec_cmp<mode><mode>_internal; the opposite result
+     mask is computed for the NE operator, so we need to invert the mask
+     here.  In this case we can save an inverting instruction by simply
+     swapping the two operands to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
@@ -4156,7 +3911,7 @@
 ;; cmtst
 
 ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst,
-;; we don't have any insns using ne, and aarch64_vcond_internal outputs
+;; we don't have any insns using ne, and aarch64_vcond outputs
 ;; not (neg (eq (and x y) 0))
 ;; which is rewritten by simplify_rtx as
 ;; plus (eq (and x y) 0) -1.
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 43b22d8..f53fe9d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -610,6 +610,16 @@
 				(V2DF "v2di") (DF    "di")
 				(SF   "si")])
 
+;; Mode for vector conditional operations where the comparison has
+;; different type from the lhs.
+(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF")
+			       (V2DI "V2DF") (V2SF "V2SI")
+			       (V4SF "V4SI") (V2DF "V2DI")])
+
+(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf")
+			       (V2DI "v2df") (V2SF "v2si")
+			       (V4SF "v4si") (V2DF "v2di")])
+
 ;; Lower case element modes (as used in shift immediate patterns).
 (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi")
 			   (V4HI "hi") (V8HI  "hi")

Comments

James Greenhalgh June 14, 2016, 2:51 p.m. UTC | #1
On Wed, Jun 08, 2016 at 03:57:42PM +0100, Bin Cheng wrote:
> > From: James Greenhalgh <james.greenhalgh@arm.com>
> > Sent: 31 May 2016 16:24
> > To: Bin Cheng
> > Cc: gcc-patches@gcc.gnu.org; nd
> > Subject: Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.
> >     
> > On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> > > <snip>
> > 
> > Splitting this patch would have been very helpful. One patch each for the
> > new standard pattern names, and one patch for the refactor of vcond. As
> > it is, this patch is rather difficult to read.
> Done; the patch is split into two, one implementing the new vcond_mask and
> vec_cmp patterns and the other rewriting the vcond patterns.
> 

Hi Bin,

Thanks for the changes.  Just to make keeping track of the review easier for
me, would you mind resending these as a series of two separate emails?

i.e.

[Patch AArch64 1/2]  Implement vcond_mask and vec_cmp patterns
[Patch AArch64 2/2]  Rewrite vcond patterns

Thanks,
James

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf..9437d02 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2202,6 +2202,325 @@ 
   DONE;
 })
 
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(match_operand:VALLDI 0 "register_operand")
+   (match_operand:VALLDI 1 "nonmemory_operand")
+   (match_operand:VALLDI 2 "nonmemory_operand")
+   (match_operand:<V_cmp_result> 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  /* If we have (a = (P) ? -1 : 0);
+     Then we can simply move the generated mask (result must be int).  */
+  if (operands[1] == CONSTM1_RTX (<MODE>mode)
+      && operands[2] == CONST0_RTX (<MODE>mode))
+    emit_move_insn (operands[0], operands[3]);
+  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
+  else if (operands[1] == CONST0_RTX (<MODE>mode)
+	   && operands[2] == CONSTM1_RTX (<MODE>mode))
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
+  else
+    {
+      if (!REG_P (operands[1]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      if (!REG_P (operands[2]))
+	operands[2] = force_reg (<MODE>mode, operands[2]);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
+					     operands[1], operands[2]));
+    }
+
+  DONE;
+})
+
+;; Patterns comparing two vectors to produce a mask.
+
+;; Internal pattern for vec_cmp.  It returns the expected result mask
+;; for comparison operators other than NE.  For the NE operator, it
+;; returns the opposite result mask.  This is intended behavior so that we
+;; can save one mask inverting instruction when using this pattern in
+;; vcond patterns.  In this case, it is the caller's responsibility
+;; to interpret and use the result mask correctly.
+(define_expand "vec_cmp<mode><mode>_internal"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  rtx mask = operands[0];
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == NE || code == LE || code == LT
+	  || code == GE || code == GT || code == EQ))
+    {
+      /* Some instructions have a form taking an immediate zero.  */
+      ;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
+    }
+
+  switch (code)
+    {
+    case LT:
+      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case GE:
+      emit_insn (gen_aarch64_cmge<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LE:
+      emit_insn (gen_aarch64_cmle<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case GT:
+      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LTU:
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[3], operands[2]));
+      break;
+
+    case GEU:
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case LEU:
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[3], operands[2]));
+      break;
+
+    case GTU:
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[2], operands[3]));
+      break;
+
+    case NE:
+      /* See the comment at the beginning of this pattern; we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+
+    case EQ:
+      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[2], operands[3]));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* Don't invert result mask for NE, which should be done in caller.  */
+
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See the comments in vec_cmp<mode><mode>_internal; the opposite result
+     mask is computed for the NE operator, so we need to invert it here.  */
+  if (code == NE)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
+
+  DONE;
+})
+
+
+;; Internal pattern for vec_cmp.  Similar to vec_cmp<mode><mode>_internal,
+;; it returns the opposite result mask for operators NE, UNEQ, UNLT,
+;; UNLE, UNGT, UNGE and UNORDERED.  This is intended behavior so that
+;; we can save one mask inverting instruction when using this pattern
+;; in vcond patterns.  In these cases, it is the caller's responsibility
+;; to interpret and use the result mask correctly.
+(define_expand "vec_cmp<mode><v_cmp_result>_internal"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  int use_zero_form = 0;
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
+
+  rtx (*comparison) (rtx, rtx, rtx);
+
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == LE || code == LT || code == GE || code == GT || code == EQ))
+    {
+      /* Some instructions have a form taking an immediate zero.  */
+      use_zero_form = 1;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
+    }
+
+  switch (code)
+    {
+    case LT:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmlt<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
+    case UNGE:
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLE:
+    case GT:
+      comparison = gen_aarch64_cmgt<mode>;
+      break;
+    case LE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmle<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
+    case UNGT:
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLT:
+    case GE:
+      comparison = gen_aarch64_cmge<mode>;
+      break;
+    case NE:
+    case EQ:
+      comparison = gen_aarch64_cmeq<mode>;
+      break;
+    case UNEQ:
+    case ORDERED:
+    case UNORDERED:
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  switch (code)
+    {
+    case UNGT:
+    case UNGE:
+    case NE:
+    case UNLT:
+    case UNLE:
+      /* FCM returns false for lanes which are unordered, so if we use
+	 the inverse of the comparison we actually want to emit, then
+	 invert the result, we will end up with the correct result.
+	 Note that a NE NaN and NaN NE b are true for all a, b.
+
+	 Our transformations are:
+	 a GE b -> !(b GT a)
+	 a GT b -> !(b GE a)
+	 a LE b -> !(a GT b)
+	 a LT b -> !(a GE b)
+	 a NE b -> !(a EQ b)
+
+	 See the comment at the beginning of this pattern; we return the
+	 opposite of the result mask for these operators, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+    case LT:
+    case LE:
+    case GT:
+    case GE:
+    case EQ:
+      /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
+	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
+	 a GE b -> a GE b
+	 a GT b -> a GT b
+	 a LE b -> b GE a
+	 a LT b -> b GT a
+	 a EQ b -> a EQ b
+	 Note that there also exist direct comparison against 0 forms,
+	 so catch those as a special case.  */
+
+      emit_insn (comparison (operands[0], operands[2], operands[3]));
+      break;
+
+    case UNEQ:
+      /* We first check (a > b || b > a), which is !UNEQ; inverting
+	 this result will then give us (a == b || a UNORDERED b).
+
+	 See the comment at the beginning of this pattern; we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.  */
+
+      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
+					 operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
+      break;
+    case UNORDERED:
+      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
+	 UNORDERED as !ORDERED.
+
+	 See the comment at the beginning of this pattern; we return the
+	 opposite of the result mask for this operator, and it is the
+	 caller's responsibility to invert the mask.
+
+	 Fall through.  */
+    case ORDERED:
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmge<mode> (operands[0],
+					 operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See the comments in vec_cmp<mode><v_cmp_result>_internal; the opposite
+     result mask is computed for the operators below, so we need to invert
+     the mask here.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
+
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_vec_cmp<mode><mode> (operands[0], operands[1],
+				      operands[2], operands[3]));
+  DONE;
+})
+
 (define_expand "aarch64_vcond_internal<mode><mode>"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand")
 	(if_then_else:VSDQ_I_DI