
[AArch64] Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.

Message ID DB5PR08MB1144FD75076B139152FF6394E7480@DB5PR08MB1144.eurprd08.prod.outlook.com
State New

Commit Message

Bin Cheng May 17, 2016, 9:02 a.m. UTC
Hi,
Alan and Renlin noticed that some vcond patterns are not supported in the AArch64 (or AArch32?) backend, and they both had patches fixing this.  After investigation, I agree with them that vcond/vcondu in the AArch64 backend should be re-implemented using vec_cmp/vcond_mask patterns, so here is a patch based on Alan's.  This patch supports all vcond/vcondu patterns by implementing and using vec_cmp and vcond_mask patterns.  Unlike the original patch, it doesn't change GCC's expansion process, and it keeps the vcond patterns.  The patch also introduces vec_cmp*_internal patterns to support the special-case optimizations for vcond/vcondu that the current implementation performs.
Apart from Alan's patch, I also borrowed ideas from Renlin's; any bug introduced is down to my changes.

With this patch, GCC's test condition "vect_cond_mixed" can be enabled on AArch64 (in a following patch).
Bootstrapped and tested on AArch64.  Is it OK?  BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64), which was added earlier by a tree if-conversion patch.
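
To illustrate the kind of loop this enables (a made-up example in the spirit
of the new test, not taken from the patch; all names are mine): integer
results selected by a floating-point comparison, which needs the new
mixed-mode vcond pattern.

  /* Illustrative only: int data selected by a float comparison,
     i.e. the vcond<v_cmp_mixed><mode> case.  */
  int ia[1024], ib[1024], ic[1024];
  float x[1024], y[1024];

  void
  select_int_by_float_cmp (void)
  {
    int i;
    for (i = 0; i < 1024; i++)
      ic[i] = (x[i] < y[i]) ? ia[i] : ib[i];
  }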

Thanks,
bin

2016-05-11  Alan Lawrence  <alan.lawrence@arm.com>
	    Renlin Li  <renlin.li@arm.com>
	    Bin Cheng  <bin.cheng@arm.com>

	* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
	gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
	(aarch64_vcond_internal<mode><mode>): Delete pattern.
	(aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
	(vcond_mask_<mode><v_cmp_result>): New pattern.
	(vec_cmp<mode><mode>_internal, vec_cmp<mode><mode>): New pattern.
	(vec_cmp<mode><v_cmp_result>_internal): New pattern.
	(vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode>): New pattern.
	(vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
	(vcondu<mode><mode>): Ditto.
	(vcond<v_cmp_result><mode>): Delete.
	(vcond<v_cmp_mixed><mode>): New pattern.
	(vcondu<mode><v_cmp_mixed>): New pattern.
	(aarch64_cmtst<mode>): Revise comment using aarch64_vcond instead
	of aarch64_vcond_internal.

gcc/testsuite/ChangeLog
2016-05-11  Bin Cheng  <bin.cheng@arm.com>

	* gcc.target/aarch64/vect-vcond.c: New test.

Comments

Bin.Cheng May 24, 2016, 10:13 a.m. UTC | #1
Ping.

Thanks,
bin

On Tue, May 17, 2016 at 10:02 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in AArch64(or AArch32?) backend, and they both had some patches fixing this.  After investigation, I agree with them that vcond/vcondu in AArch64's backend should be re-implemented using vec_cmp/vcond_mask patterns, so here comes this patch which is based on Alan's.  This patch supports all vcond/vcondu patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to the original patch, it doesn't change GCC's expanding process, and it keeps vcond patterns.  The patch also introduces vec_cmp*_internal to support special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my change that shall be blamed if any potential bug is introduced.
>
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on AArch64 (in a following patch).
> Bootstrap and test on AArch64.  Is it OK?  BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was added before in tree if-conversion patch.
>
> Thanks,
> bin
>
> 2016-05-11  Alan Lawrence  <alan.lawrence@arm.com>
>             Renlin Li  <renlin.li@arm.com>
>             Bin Cheng  <bin.cheng@arm.com>
>
>         * config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
>         * config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Call
>         gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
>         (aarch64_vcond_internal<mode><mode>): Delete pattern.
>         (aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>): Ditto.
>         (vcond_mask_<mode><v_cmp_result>): New pattern.
>         (vec_cmp<mode><mode>_internal, vec_cmp<mode><mode>): New pattern.
>         (vec_cmp<mode><v_cmp_result>_internal): New pattern.
>         (vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode>): New pattern.
>         (vcond<mode><mode>): Re-implement using vec_cmp and vcond_mask.
>         (vcondu<mode><mode>): Ditto.
>         (vcond<v_cmp_result><mode>): Delete.
>         (vcond<v_cmp_mixed><mode>): New pattern.
>         (vcondu<mode><v_cmp_mixed>): New pattern.
>         (aarch64_cmtst<mode>): Revise comment using aarch64_vcond instead
>         of aarch64_vcond_internal.
>
> gcc/testsuite/ChangeLog
> 2016-05-11  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.target/aarch64/vect-vcond.c: New test.
James Greenhalgh May 31, 2016, 3:24 p.m. UTC | #2
On Tue, May 17, 2016 at 09:02:22AM +0000, Bin Cheng wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in
> AArch64(or AArch32?) backend, and they both had some patches fixing this.
> After investigation, I agree with them that vcond/vcondu in AArch64's backend
> should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> this patch which is based on Alan's.  This patch supports all vcond/vcondu
> patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to
> the original patch, it doesn't change GCC's expanding process, and it keeps
> vcond patterns.  The patch also introduces vec_cmp*_internal to support
> special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> change that shall be blamed if any potential bug is introduced.
> 
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was
> added before in tree if-conversion patch.

Splitting this patch would have been very helpful. One patch each for the
new standard pattern names, and one patch for the refactor of vcond. As
it is, this patch is rather difficult to read.

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index bd73bce..f51473a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1053,7 +1053,7 @@
>      }
>  
>    cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
> -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
>                operands[2], cmp_fmt, operands[1], operands[2]));
>    DONE;
>  })
> @@ -2225,204 +2225,215 @@
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<mode><mode>"
> +(define_expand "vcond_mask_<mode><v_cmp_result>"
> +  [(match_operand:VALLDI 0 "register_operand")
> +   (match_operand:VALLDI 1 "nonmemory_operand")
> +   (match_operand:VALLDI 2 "nonmemory_operand")
> +   (match_operand:<V_cmp_result> 3 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  /* If we have (a = (P) ? -1 : 0);
> +     Then we can simply move the generated mask (result must be int).  */
> +  if (operands[1] == CONSTM1_RTX (<MODE>mode)
> +      && operands[2] == CONST0_RTX (<MODE>mode))
> +    emit_move_insn (operands[0], operands[3]);
> +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> +  else if (operands[1] == CONST0_RTX (<MODE>mode)
> +	   && operands[2] == CONSTM1_RTX (<MODE>mode))
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
> +  else
> +    {
> +      if (!REG_P (operands[1]))
> +	operands[1] = force_reg (<MODE>mode, operands[1]);
> +      if (!REG_P (operands[2]))
> +	operands[2] = force_reg (<MODE>mode, operands[2]);
> +      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
> +					     operands[1], operands[2]));
> +    }
> +
> +  DONE;
> +})
> +

This pattern is fine.

> +;; Patterns comparing two vectors to produce a mask.

This comment is insufficient. The logic in vec_cmp<mode><mode>_internal
does not always return the expected mask (in particular for NE), but this
is not made clear in the comment.
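
For what it's worth, here is a one-lane sketch of how I read the integer NE
path (my reading, not the patch's own wording; the helper name is mine): the
internal pattern treats NE like EQ and emits a CMEQ, and the outer
vec_cmp<mode><mode> then applies one_cmpl to the result.

  /* One-lane model, assuming 32-bit integer lanes (illustrative only).  */
  int ne_lane (int a, int b)
  {
    int eq_mask = (a == b) ? -1 : 0;  /* what _internal emits (CMEQ)  */
    return ~eq_mask;                  /* inversion done in vec_cmp    */
  }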

> +
> +(define_expand "vec_cmp<mode><mode>_internal"
>    [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> -	(if_then_else:VSDQ_I_DI
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VSDQ_I_DI 4 "register_operand")
> -	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> -	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> -	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
>    "TARGET_SIMD"

<snip>

> +(define_expand "vec_cmp<mode><mode>"
> +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> +	  (match_operator 1 "comparison_operator"
> +	    [(match_operand:VSDQ_I_DI 2 "register_operand")
> +	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> +  "TARGET_SIMD"
> +{
> +  enum rtx_code code = GET_CODE (operands[1]);
> +
> +  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
> +					 operands[1], operands[2],
> +					 operands[3]));
> +  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
> +     operators are computed using other operators, and we need to
> +     revert the result.  */

s/revert/invert

There are no comments in vec_cmp<mode><v_cmp_result>_internal for this
to refer to. But, there should be - more comments would definitely help
this patch!

These comments apply equally to the other copies of this comment in the
patch.

> +  if (code == NE)
> +    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
>    DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
> -  [(set (match_operand:VDQF_COND 0 "register_operand")
> -	(if_then_else:VDQF
> -	  (match_operator 3 "comparison_operator"
> -	    [(match_operand:VDQF 4 "register_operand")
> -	     (match_operand:VDQF 5 "nonmemory_operand")])
> -	  (match_operand:VDQF_COND 1 "nonmemory_operand")
> -	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
> +(define_expand "vec_cmp<mode><v_cmp_result>_internal"
> +  [(set (match_operand:<V_cmp_result> 0 "register_operand")
> +	(match_operator 1 "comparison_operator"
> +	    [(match_operand:VDQF 2 "register_operand")
> +	     (match_operand:VDQF 3 "nonmemory_operand")]))]

This needs much more thorough commenting. The logic is now extremely
difficult to verify as to which pattern is responsible for inverting the
mask. This makes it hard to spot whether the inversions applied to the
unordered comparisons are correct. I can't really review the patch with
the logic split across uncommented patterns in this way - is there
anything you can do to clean it up?
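
To make this concrete, here is a one-lane sketch of how I read the UNGE path
(my reading, not something the patch documents; the helper name is mine): the
float _internal pattern swaps the operands and emits the ordered FCMGT, and
the outer vec_cmp<mode><v_cmp_result> inverts the result, while vcond instead
swaps the data operands it passes to bsl.

  /* One-lane model of a UNGE b (illustrative only).  FCMGT is false
     for unordered lanes, so inverting (b > a) yields "a >= b, or
     a and b unordered", which is UNGE.  */
  int unge_lane (float a, float b)
  {
    int ordered_gt = (b > a) ? -1 : 0;  /* _internal: FCMGT b, a      */
    return ~ordered_gt;                 /* inverted later in vec_cmp  */
  }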

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
> new file mode 100644
> index 0000000..b0ca6d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c

None of this is AArch64 specific?

> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
> +double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
> +
> +int vcond_f_i (int *x)
> +{
> +  int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_f_u (unsigned int *x)
> +{
> +  unsigned int i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      fa[i] -= 1.0;
> +      fb[i] += 1.0;
> +      fc[i] = (x[i] < i) ? fa[i] : fb[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_i (long long *x)
> +{
> +  long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +int vcond_d_u (unsigned long long *x)
> +{
> +  unsigned long long i = 0;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      da[i] -= 1.0;
> +      db[i] += 1.0;
> +      dc[i] = (x[i] < i) ? da[i] : db[i];
> +    }
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

Thanks,
James

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..f51473a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1053,7 +1053,7 @@ 
     }
 
   cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]);
-  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
+  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
               operands[2], cmp_fmt, operands[1], operands[2]));
   DONE;
 })
@@ -2225,204 +2225,215 @@ 
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<mode><mode>"
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(match_operand:VALLDI 0 "register_operand")
+   (match_operand:VALLDI 1 "nonmemory_operand")
+   (match_operand:VALLDI 2 "nonmemory_operand")
+   (match_operand:<V_cmp_result> 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  /* If we have (a = (P) ? -1 : 0);
+     Then we can simply move the generated mask (result must be int).  */
+  if (operands[1] == CONSTM1_RTX (<MODE>mode)
+      && operands[2] == CONST0_RTX (<MODE>mode))
+    emit_move_insn (operands[0], operands[3]);
+  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
+  else if (operands[1] == CONST0_RTX (<MODE>mode)
+	   && operands[2] == CONSTM1_RTX (<MODE>mode))
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[3]));
+  else
+    {
+      if (!REG_P (operands[1]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      if (!REG_P (operands[2]))
+	operands[2] = force_reg (<MODE>mode, operands[2]);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], operands[3],
+					     operands[1], operands[2]));
+    }
+
+  DONE;
+})
+
+;; Patterns comparing two vectors to produce a mask.
+
+(define_expand "vec_cmp<mode><mode>_internal"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand")
-	(if_then_else:VSDQ_I_DI
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VSDQ_I_DI 4 "register_operand")
-	     (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
-	  (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
-	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
   "TARGET_SIMD"
 {
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<MODE>mode);
-  enum rtx_code code = GET_CODE (operands[3]);
+  rtx mask = operands[0];
+  enum rtx_code code = GET_CODE (operands[1]);
 
-  /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn),
-     and desirable for other comparisons if it results in FOO ? -1 : 0
-     (this allows direct use of the comparison result without a bsl).  */
-  if (code == NE
-      || (code != EQ
-	  && op1 == CONST0_RTX (<V_cmp_result>mode)
-	  && op2 == CONSTM1_RTX (<V_cmp_result>mode)))
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == NE || code == LE || code == LT
+	  || code == GE || code == GT || code == EQ))
     {
-      op1 = operands[2];
-      op2 = operands[1];
-      switch (code)
-        {
-        case LE: code = GT; break;
-        case LT: code = GE; break;
-        case GE: code = LT; break;
-        case GT: code = LE; break;
-        /* No case EQ.  */
-        case NE: code = EQ; break;
-        case LTU: code = GEU; break;
-        case LEU: code = GTU; break;
-        case GTU: code = LEU; break;
-        case GEU: code = LTU; break;
-        default: gcc_unreachable ();
-        }
+      /* Some instructions have a form taking an immediate zero.  */
+      ;
     }
-
-  /* Make sure we can handle the last operand.  */
-  switch (code)
+  else if (!REG_P (operands[3]))
     {
-    case NE:
-      /* Normalized to EQ above.  */
-      gcc_unreachable ();
-
-    case LE:
-    case LT:
-    case GE:
-    case GT:
-    case EQ:
-      /* These instructions have a form taking an immediate zero.  */
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-        break;
-      /* Fall through, as may need to load into register.  */
-    default:
-      if (!REG_P (operands[5]))
-        operands[5] = force_reg (<MODE>mode, operands[5]);
-      break;
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
     }
 
   switch (code)
     {
     case LT:
-      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmlt<mode> (mask, operands[2], operands[3]));
       break;
 
     case GE:
-      emit_insn (gen_aarch64_cmge<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmge<mode> (mask, operands[2], operands[3]));
       break;
 
     case LE:
-      emit_insn (gen_aarch64_cmle<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmle<mode> (mask, operands[2], operands[3]));
       break;
 
     case GT:
-      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgt<mode> (mask, operands[2], operands[3]));
       break;
 
     case LTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[5], operands[4]));
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[3], operands[2]));
       break;
 
     case GEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[2], operands[3]));
       break;
 
     case LEU:
-      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[5], operands[4]));
+      emit_insn (gen_aarch64_cmgeu<mode> (mask, operands[3], operands[2]));
       break;
 
     case GTU:
-      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmgtu<mode> (mask, operands[2], operands[3]));
       break;
 
-    /* NE has been normalized to EQ above.  */
+    case NE:
     case EQ:
-      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[4], operands[5]));
+      emit_insn (gen_aarch64_cmeq<mode> (mask, operands[2], operands[3]));
       break;
 
     default:
       gcc_unreachable ();
     }
 
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
-
-    if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
-					       op1, op2));
-      }
+  DONE;
+})
 
+(define_expand "vec_cmp<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  */
+  if (code == NE)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
   DONE;
 })
 
-(define_expand "aarch64_vcond_internal<VDQF_COND:mode><VDQF:mode>"
-  [(set (match_operand:VDQF_COND 0 "register_operand")
-	(if_then_else:VDQF
-	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:VDQF_COND 1 "nonmemory_operand")
-	  (match_operand:VDQF_COND 2 "nonmemory_operand")))]
+(define_expand "vec_cmp<mode><v_cmp_result>_internal"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
   "TARGET_SIMD"
 {
-  int inverse = 0;
   int use_zero_form = 0;
-  int swap_bsl_operands = 0;
-  rtx op1 = operands[1];
-  rtx op2 = operands[2];
-  rtx mask = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
-  rtx tmp = gen_reg_rtx (<VDQF_COND:V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
 
-  rtx (*base_comparison) (rtx, rtx, rtx);
-  rtx (*complimentary_comparison) (rtx, rtx, rtx);
+  rtx (*comparison) (rtx, rtx, rtx);
 
-  switch (GET_CODE (operands[3]))
+  if (operands[3] == CONST0_RTX (<MODE>mode)
+      && (code == LE || code == LT || code == GE || code == GT || code == EQ))
     {
-    case GE:
-    case GT:
-    case LE:
-    case LT:
-    case EQ:
-      if (operands[5] == CONST0_RTX (<MODE>mode))
-	{
-	  use_zero_form = 1;
-	  break;
-	}
-      /* Fall through.  */
-    default:
-      if (!REG_P (operands[5]))
-	operands[5] = force_reg (<VDQF:MODE>mode, operands[5]);
+      /* Some instructions have a form taking an immediate zero.  */
+      use_zero_form = 1;
+    }
+  else if (!REG_P (operands[3]))
+    {
+      /* Make sure we can handle the last operand.  */
+      operands[3] = force_reg (<MODE>mode, operands[3]);
     }
 
-  switch (GET_CODE (operands[3]))
+  switch (code)
     {
     case LT:
-    case UNLT:
-      inverse = 1;
-      /* Fall through.  */
-    case GE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmlt<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
     case UNGE:
-    case ORDERED:
-    case UNORDERED:
-      base_comparison = gen_aarch64_cmge<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      break;
-    case LE:
-    case UNLE:
-      inverse = 1;
+      std::swap (operands[2], operands[3]);
       /* Fall through.  */
+    case UNLE:
     case GT:
+      comparison = gen_aarch64_cmgt<mode>;
+      break;
+    case LE:
+      if (use_zero_form)
+	{
+	  comparison = gen_aarch64_cmle<mode>;
+	  break;
+	}
+      /* Else, fall through.  */
     case UNGT:
-      base_comparison = gen_aarch64_cmgt<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmge<VDQF:mode>;
+      std::swap (operands[2], operands[3]);
+      /* Fall through.  */
+    case UNLT:
+    case GE:
+      comparison = gen_aarch64_cmge<mode>;
       break;
-    case EQ:
     case NE:
+    case EQ:
+      comparison = gen_aarch64_cmeq<mode>;
+      break;
     case UNEQ:
-      base_comparison = gen_aarch64_cmeq<VDQF:mode>;
-      complimentary_comparison = gen_aarch64_cmeq<VDQF:mode>;
+    case ORDERED:
+    case UNORDERED:
       break;
     default:
       gcc_unreachable ();
     }
 
-  switch (GET_CODE (operands[3]))
+  switch (code)
     {
+    case UNGT:
+    case UNGE:
+    case NE:
+    case UNLT:
+    case UNLE:
+      /* FCM returns false for lanes which are unordered, so if we use
+	 the inverse of the comparison we actually want to emit, then
+	 revert the result, we will end up with the correct result.
+	 Note that a NE NaN and NaN NE b are true for all a, b.
+
+	 Our transformations are:
+	 a GE b -> !(b GT a)
+	 a GT b -> !(b GE a)
+	 a LE b -> !(a GT b)
+	 a LT b -> !(a GE b)
+	 a NE b -> !(a EQ b)
+
+	 Note we revert the result in vec_cmp.
+
+	 Fall through.  */
     case LT:
     case LE:
     case GT:
@@ -2437,99 +2448,69 @@ 
 	 a EQ b -> a EQ b
 	 Note that there also exist direct comparison against 0 forms,
 	 so catch those as a special case.  */
-      if (use_zero_form)
-	{
-	  inverse = 0;
-	  switch (GET_CODE (operands[3]))
-	    {
-	    case LT:
-	      base_comparison = gen_aarch64_cmlt<VDQF:mode>;
-	      break;
-	    case LE:
-	      base_comparison = gen_aarch64_cmle<VDQF:mode>;
-	      break;
-	    default:
-	      /* Do nothing, other zero form cases already have the correct
-		 base_comparison.  */
-	      break;
-	    }
-	}
 
-      if (!inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
+      emit_insn (comparison (operands[0], operands[2], operands[3]));
       break;
-    case UNLT:
-    case UNLE:
-    case UNGT:
-    case UNGE:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 swap the operands to BSL, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a GE b -> !(b GT a)
-	 a GT b -> !(b GE a)
-	 a LE b -> !(a GT b)
-	 a LT b -> !(a GE b)
-	 a NE b -> !(a EQ b)  */
 
-      if (inverse)
-	emit_insn (base_comparison (mask, operands[4], operands[5]));
-      else
-	emit_insn (complimentary_comparison (mask, operands[5], operands[4]));
-
-      swap_bsl_operands = 1;
-      break;
     case UNEQ:
-      /* We check (a > b ||  b > a).  combining these comparisons give us
-	 true iff !(a != b && a ORDERED b), swapping the operands to BSL
-	 will then give us (a == b ||  a UNORDERED b) as intended.  */
-
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (mask, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
-      swap_bsl_operands = 1;
+      /* We first check (a > b ||  b > a) which is !UNEQ, reverting
+	 this result will then give us (a == b || a UNORDERED b).
+	 Note we revert the result in vec_cmp.  */
+
+      emit_insn (gen_aarch64_cmgt<mode> (operands[0],
+					 operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
       break;
     case UNORDERED:
-       /* Operands are ORDERED iff (a > b || b >= a).
-	 Swapping the operands to BSL will give the UNORDERED case.  */
-     swap_bsl_operands = 1;
-     /* Fall through.  */
+      /* Operands are ORDERED iff (a > b || b >= a), so we can compute
+	 UNORDERED as !ORDERED.  Note we revert the result in vec_cmp.
+
+	 Fall through.  */
     case ORDERED:
-      emit_insn (gen_aarch64_cmgt<VDQF:mode> (tmp, operands[4], operands[5]));
-      emit_insn (gen_aarch64_cmge<VDQF:mode> (mask, operands[5], operands[4]));
-      emit_insn (gen_ior<VDQF_COND:v_cmp_result>3 (mask, mask, tmp));
+      emit_insn (gen_aarch64_cmgt<mode> (tmp, operands[2], operands[3]));
+      emit_insn (gen_aarch64_cmge<mode> (operands[0],
+					 operands[3], operands[2]));
+      emit_insn (gen_ior<v_cmp_result>3 (operands[0], operands[0], tmp));
       break;
     default:
       gcc_unreachable ();
     }
 
-  if (swap_bsl_operands)
-    {
-      op1 = operands[2];
-      op2 = operands[1];
-    }
+  DONE;
+})
 
-    /* If we have (a = (b CMP c) ? -1 : 0);
-       Then we can simply move the generated mask.  */
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	    [(match_operand:VDQF 2 "register_operand")
+	     (match_operand:VDQF 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (operands[0],
+					 operands[1], operands[2],
+					 operands[3]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    emit_insn (gen_one_cmpl<v_cmp_result>2 (operands[0], operands[0]));
 
-    if (op1 == CONSTM1_RTX (<VDQF_COND:V_cmp_result>mode)
-	&& op2 == CONST0_RTX (<VDQF_COND:V_cmp_result>mode))
-      emit_move_insn (operands[0], mask);
-    else
-      {
-	if (!REG_P (op1))
-	  op1 = force_reg (<VDQF_COND:MODE>mode, op1);
-	if (!REG_P (op2))
-	  op2 = force_reg (<VDQF_COND:MODE>mode, op2);
-	emit_insn (gen_aarch64_simd_bsl<VDQF_COND:mode> (operands[0], mask,
-					       op1, op2));
-      }
+  DONE;
+})
 
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
+	  (match_operator 1 "comparison_operator"
+	    [(match_operand:VSDQ_I_DI 2 "register_operand")
+	     (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_vec_cmp<mode><mode> (operands[0], operands[1],
+				      operands[2], operands[3]));
   DONE;
 })
 
@@ -2543,26 +2524,50 @@ 
 	  (match_operand:VALLDI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
-(define_expand "vcond<v_cmp_result><mode>"
-  [(set (match_operand:<V_cmp_result> 0 "register_operand")
-	(if_then_else:<V_cmp_result>
+(define_expand "vcond<v_cmp_mixed><mode>"
+  [(set (match_operand:<V_cmp_mixed> 0 "register_operand")
+	(if_then_else:<V_cmp_mixed>
 	  (match_operator 3 "comparison_operator"
-	    [(match_operand:VDQF 4 "register_operand")
-	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:<V_cmp_result> 1 "nonmemory_operand")
-	  (match_operand:<V_cmp_result> 2 "nonmemory_operand")))]
+	    [(match_operand:VDQF_COND 4 "register_operand")
+	     (match_operand:VDQF_COND 5 "nonmemory_operand")])
+	  (match_operand:<V_cmp_mixed> 1 "nonmemory_operand")
+	  (match_operand:<V_cmp_mixed> 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<v_cmp_result><mode> (
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><v_cmp_result>_internal (mask, operands[3],
+					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
+      || code == UNGT || code == UNGE || code == UNORDERED)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<v_cmp_mixed><v_cmp_result> (
 						operands[0], operands[1],
-						operands[2], operands[3],
-						operands[4], operands[5]));
+						operands[2], mask));
   DONE;
 })
 
@@ -2576,9 +2581,48 @@ 
 	  (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
-  emit_insn (gen_aarch64_vcond_internal<mode><mode> (operands[0], operands[1],
-					       operands[2], operands[3],
+  rtx mask = gen_reg_rtx (<MODE>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<mode><mode>_internal (mask, operands[3],
 					       operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
+  DONE;
+})
+
+(define_expand "vcondu<mode><v_cmp_mixed>"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(if_then_else:VDQF
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:<V_cmp_mixed> 4 "register_operand")
+	     (match_operand:<V_cmp_mixed> 5 "nonmemory_operand")])
+	  (match_operand:VDQF 1 "nonmemory_operand")
+	  (match_operand:VDQF 2 "nonmemory_operand")))]
+  "TARGET_SIMD"
+{
+  rtx mask = gen_reg_rtx (<V_cmp_result>mode);
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  emit_insn (gen_vec_cmp<v_cmp_mixed><v_cmp_mixed>_internal (
+						  mask, operands[3],
+						  operands[4], operands[5]));
+  /* See comments in vec_cmp<mode><v_cmp_result>_internal, below
+     operators are computed using other operators, and we need to
+     revert the result.  In this case we can simply swap the operands
+     to bsl.  */
+  if (code == NE)
+    std::swap (operands[1], operands[2]);
+
+  emit_insn (gen_vcond_mask_<mode><v_cmp_result> (operands[0], operands[1],
+						 operands[2], mask));
   DONE;
 })
 
@@ -4179,7 +4223,7 @@ 
 ;; cmtst
 
 ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst,
-;; we don't have any insns using ne, and aarch64_vcond_internal outputs
+;; we don't have any insns using ne, and aarch64_vcond outputs
 ;; not (neg (eq (and x y) 0))
 ;; which is rewritten by simplify_rtx as
 ;; plus (eq (and x y) 0) -1.
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d9bd391..2f15678 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -610,6 +610,16 @@ 
 				(V2DF "v2di") (DF    "di")
 				(SF   "si")])
 
+;; Mode for vector conditional operations where the comparison has
+;; different type from the lhs.
+(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF")
+			       (V2DI "V2DF") (V2SF "V2SI")
+			       (V4SF "V4SI") (V2DF "V2DI")])
+
+(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf")
+			       (V2DI "v2df") (V2SF "v2si")
+			       (V4SF "v4si") (V2DF "v2di")])
+
 ;; Lower case element modes (as used in shift immediate patterns).
 (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi")
 			   (V4HI "hi") (V8HI  "hi")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vcond.c b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
new file mode 100644
index 0000000..b0ca6d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vcond.c
@@ -0,0 +1,59 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+
+float fa[1024] = {0.0}, fb[1024] = {0.0}, fc[1024];
+double da[1024] = {0.0}, db[1024] = {0.0}, dc[1024];
+
+int vcond_f_i (int *x)
+{
+  int i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      fa[i] -= 1.0;
+      fb[i] += 1.0;
+      fc[i] = (x[i] < i) ? fa[i] : fb[i];
+    }
+
+  return 0;
+}
+
+int vcond_f_u (unsigned int *x)
+{
+  unsigned int i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      fa[i] -= 1.0;
+      fb[i] += 1.0;
+      fc[i] = (x[i] < i) ? fa[i] : fb[i];
+    }
+
+  return 0;
+}
+
+int vcond_d_i (long long *x)
+{
+  long long i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      da[i] -= 1.0;
+      db[i] += 1.0;
+      dc[i] = (x[i] < i) ? da[i] : db[i];
+    }
+
+  return 0;
+}
+
+int vcond_d_u (unsigned long long *x)
+{
+  unsigned long long i = 0;
+  for (i = 0; i < 1024; i++)
+    {
+      da[i] -= 1.0;
+      db[i] += 1.0;
+      dc[i] = (x[i] < i) ? da[i] : db[i];
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */