internal-fn: Add VCOND_MASK_LEN.

Message ID e9b33876-cf75-417e-85b3-89e00e17435f@gmail.com
State New

Commit Message

Robin Dapp Oct. 23, 2023, 4:09 p.m. UTC
The attached patch introduces VCOND_MASK_LEN.  It fixes the riscv cases
that were broken before, and bootstrap and testsuite results look
unchanged on x86, aarch64 and power.

I only went with the minimal number of new match.pd patterns and did not
try stripping the length of a COND_LEN_OP in order to simplify the
associated COND_OP.

One important part that I'm not sure how to handle properly:
when we have a constant immediate length of e.g. 16 and the hardware
also operates on 16 units, vector length masking is actually
redundant and the vcond_mask_len can be reduced to a vec_cond.
For those (if_then_else unsplit) we have a large number of combine
patterns that fuse instructions which do not correspond to ifns
(like widening operations but also more complex ones).

Currently I achieve this in a most likely wrong way:

      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
      bool full_len = len && known_eq (sz.coeffs[0], ilen);
      if (!len || full_len)
         "vec_cond"
      else
         "vcond_mask_len"

Another thing not done in this patch:  For vcond_mask we only expect
register operands as the mask and force it into a register.  For a
vcond_mask_len that results from a simplification with an all-one or
all-zero mask we could allow constant immediate vectors and expand them
to simple len moves in the backend.
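
To make the semantics discussed above concrete, here is a small scalar
model.  It is only an illustrative sketch and not part of the patch: the
function names and the use of std::vector are assumptions, and the model
simply follows the md.texi wording (elements with index < len + bias
select by mask, all others behave as if the mask element were zero).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a plain vec_cond: select per element.
std::vector<int> vec_cond_model (const std::vector<bool> &mask,
                                 const std::vector<int> &then_v,
                                 const std::vector<int> &else_v)
{
  std::vector<int> res (else_v);
  for (std::size_t i = 0; i < res.size (); ++i)
    if (mask[i])
      res[i] = then_v[i];
  return res;
}

// Hypothetical scalar model of VCOND_MASK_LEN: only elements with
// index < len + bias look at the mask; the tail takes the else value.
std::vector<int> vcond_mask_len_model (const std::vector<bool> &mask,
                                       const std::vector<int> &then_v,
                                       const std::vector<int> &else_v,
                                       std::size_t len, long bias)
{
  std::vector<int> res (else_v);
  long active = static_cast<long> (len) + bias;
  for (std::size_t i = 0; i < res.size (); ++i)
    if (static_cast<long> (i) < active && mask[i])
      res[i] = then_v[i];
  return res;
}
```

In this model a length equal to the number of units gives the same result
as vec_cond_model, which is the redundancy mentioned above, and an all-true
mask degenerates into copying the first len + bias elements of the "then"
vector, i.e. a simple len move.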

Regards
 Robin

From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result in a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the mask is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternary operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type): Add
	MERGE_OP_REAL_ELSE.
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
	matching of 6 and 7 parameters.
	(gimple_match_op::set_op): Ditto.
	(gimple_match_op::gimple_match_op): Always initialize len and
	bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(expand_vec_cond_mask_len_optab_fn): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md     | 20 +++++++++
 gcc/config/riscv/riscv-protos.h |  4 ++
 gcc/doc/md.texi                 |  9 ++++
 gcc/gimple-match-exports.cc     | 20 +++++++--
 gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
 gcc/internal-fn.cc              | 41 +++++++++++++++++
 gcc/internal-fn.def             |  2 +
 gcc/match.pd                    | 74 +++++++++++++++++++++++++++++++
 gcc/optabs.def                  |  1 +
 9 files changed, 244 insertions(+), 5 deletions(-)

Comments

Richard Sandiford Oct. 24, 2023, 9:50 p.m. UTC | #1
Robin Dapp <rdapp.gcc@gmail.com> writes:
> The attached patch introduces a VCOND_MASK_LEN, helps for the riscv cases
> that were broken before and looks unchanged on x86, aarch64 and power
> bootstrap and testsuites.
>
> I only went with the minimal number of new match.pd patterns and did not
> try stripping the length of a COND_LEN_OP in order to simplify the
> associated COND_OP.
>
> An important part that I'm not sure how to handle properly is -
> when we have a constant immediate length of e.g. 16 and the hardware
> also operates on 16 units, vector length masking is actually
> redundant and the vcond_mask_len can be reduced to a vec_cond.
> For those (if_then_else unsplit) we have a large number of combine
> patterns that fuse instruction which do not correspond to ifns
> (like widening operations but also more complex ones).
>
> Currently I achieve this in a most likely wrong way:
>
>       auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
>       bool full_len = len && known_eq (sz.coeffs[0], ilen);
>       if (!len || full_len)
>          "vec_cond"
>       else
>          "vcond_mask_len"

At first, this seemed like an odd place to fold away the length.
AFAIK the length in res_op is inherited directly from the original
operation, and so it isn't any more redundant after the fold than
it was before.  But I suppose the reason for doing it here is that
we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
lengths.  Doing that avoids the need to define an IFN_COND_FOO
equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
I think it deserves a comment.

But yeah, known_eq (sz.coeffs[0], ilen) doesn't look right.
If the target knows that the length is exactly 16 at runtime,
then it should set GET_MODE_NUNITS to 16.  So I think the length
is only redundant if known_eq (sz, ilen).

The calculation should take the bias into account as well.
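
As a rough illustration of the difference (a toy stand-in, not GCC's
real poly_int; all names below are hypothetical), consider a value of
the form c0 + c1 * x with x only known at runtime:

```cpp
#include <cstdint>

// Toy stand-in for a two-coefficient poly_int: the value is
// c0 + c1 * x for a runtime-unknown x >= 0.  Hypothetical type.
struct toy_poly
{
  int64_t c0;
  int64_t c1;
};

// True only if the value equals N for every possible x.
bool toy_known_eq (toy_poly p, int64_t n)
{
  return p.c1 == 0 && p.c0 == n;
}

// The style of check used in the patch: it only looks at the constant
// coefficient and so also accepts variable-length modes.
bool coeff0_eq (toy_poly p, int64_t n)
{
  return p.c0 == n;
}

// The length is only redundant if the biased length is known to
// cover all units.
bool length_is_redundant (toy_poly nunits, int64_t len, int64_t bias)
{
  return toy_known_eq (nunits, len + bias);
}
```

For a mode with 16 + 16x units, coeff0_eq (nunits, 16) holds while
toy_known_eq (nunits, 16) does not, which is the case where the
coeffs[0] comparison would fold the length away incorrectly.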

Any reason not to make IFN_VCOND_MASK_LEN a directly-mapped optab?
(I realise IFN_VCOND_MASK isn't, but that's used differently.)

Failing that, could the expansion use expand_fn_using_insn?

It generally looks OK to me otherwise FWIW, but it would be nice
to handle the fold programmatically in gimple-match*.cc rather
than having the explicit match.pd patterns.  Richi should review
the match.pd stuff though. ;)  (I didn't really look at it.)

Thanks,
Richard

> Another thing not done in this patch:  For vcond_mask we only expect
> register operands as mask and force to a register.  For a vcond_mask_len
> that results from a simplification with all-one or all-zero mask we
> could allow constant immediate vectors and expand them to simple
> len moves in the backend.
>
> Regards
>  Robin
>
> From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@ventanamicro.com>
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length a simplification now
> does not result in a VEC_COND but rather a VCOND_MASK_LEN.
>
> For cases where the mask is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternary operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
> 	* config/riscv/autovec.md (vcond_mask_len_<mode><vm>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type):
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when
> 	length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
> 	matching of 6 and 7 parameters.
> 	(gimple_match_op::set_op): Ditto.
> 	(gimple_match_op::gimple_match_op): Always initialize len and
> 	bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(expand_vec_cond_mask_len_optab_fn): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md     | 20 +++++++++
>  gcc/config/riscv/riscv-protos.h |  4 ++
>  gcc/doc/md.texi                 |  9 ++++
>  gcc/gimple-match-exports.cc     | 20 +++++++--
>  gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
>  gcc/internal-fn.cc              | 41 +++++++++++++++++
>  gcc/internal-fn.def             |  2 +
>  gcc/match.pd                    | 74 +++++++++++++++++++++++++++++++
>  gcc/optabs.def                  |  1 +
>  9 files changed, 244 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 80910ba3cc2..27a71bc1ef9 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,26 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode><vm>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "register_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "register_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    /* The order of vcond_mask is opposite to pred_merge.  */
> +    rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
> +		 operands[3]};
> +    riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +				      riscv_vector::MERGE_OP_REAL_ELSE, ops,
> +				      operands[4]);
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6cb9d459ee9..025a3568566 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -337,6 +337,10 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with no vundef operand.  */
> +  MERGE_OP_REAL_ELSE = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P
> +		       | TERNARY_OP_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index daa318ee3da..de0757f1903 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_len_@var{m}@var{n}}
> +Similar to @code{vcond_mask_@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +
>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..32134dbf711 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,23 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      HOST_WIDE_INT ilen = -1;
> +      if (len && TREE_CODE (len) == INTEGER_CST && tree_fits_uhwi_p (len))
> +	ilen = tree_to_uhwi (len);
> +
> +      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
> +      bool full_len = len && known_eq (sz.coeffs[0], ilen);
> +
> +      if (!len || full_len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value, res_op->cond.len,
> +		       res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> @@ -92,6 +94,10 @@ public:
>  		   code_helper, tree, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>  		   code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>  
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
> @@ -100,6 +106,8 @@ public:
>    void set_op (code_helper, tree, tree, tree, tree, bool);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>  
>    tree op_or_null (unsigned int) const;
> @@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>    ops[4] = op4;
>  }
>  
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
>    ops[4] = op4;
>  }
>  
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..b47c33faf85 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 2, 0, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -3129,6 +3130,41 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>      emit_move_insn (target, ops[0].value);
>  }
>  
> +static void
> +expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[6];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  tree vec_cond_type = TREE_TYPE (lhs);
> +
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
> +  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
> +  rtx rtx_op1, rtx_op2;
> +
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  rtx_op1 = force_reg (mode, rtx_op1);
> +
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +
> +  int opno = add_mask_and_len_args (ops, 3, stmt);
> +  expand_insn (icode, opno, ops);
> +
> +  if (!rtx_equal_p (ops[0].value, target))
> +    emit_move_insn (target, ops[0].value);
> +}
> +
>  /* Expand VEC_SET internal functions.  */
>  
>  static void
> @@ -4018,6 +4054,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> +#define direct_vec_cond_mask_len_optab_supported_p convert_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> @@ -4690,6 +4727,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4721,6 +4759,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..581cc3b5140 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, vec_cond_mask_len)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ce8d159d260..f187d560fbf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_FMIN IFN_COND_FMAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>    IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
> +  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
>  
>  /* Same for ternary operations.  */
>  (define_operator_list UNCOND_TERNARY
>    IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> +(define_operator_list COND_LEN_TERNARY
> +  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
>  
>  /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
>  (define_operator_list ATOMIC_FETCH_OR_XOR_N
> @@ -8949,6 +8960,69 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +/* Similar for all cond_len operations.  */
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)) @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> +
> +/* A VCOND_MASK_LEN with a size that equals the full hardware vector size
> +   is just a vec_cond.  */
> +(simplify
> + (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
> + (with {
> +      HOST_WIDE_INT len = -1;
> +      if (tree_fits_uhwi_p (@3))
> +	len = tree_to_uhwi (@3);
> +      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
> +      bool full_len = (sz.coeffs[0] == len); }
> +   (if (full_len)
> +    (vec_cond @0 @1 @2))))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..3cb16bd3002 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -88,6 +88,7 @@ OPTAB_CD(vcond_optab, "vcond$a$b")
>  OPTAB_CD(vcondu_optab, "vcondu$a$b")
>  OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
>  OPTAB_CD(vcond_mask_optab, "vcond_mask_$a$b")
> +OPTAB_CD(vcond_mask_len_optab, "vcond_mask_len_$a$b")
>  OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
>  OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
>  OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
Robin Dapp Oct. 25, 2023, 7:59 p.m. UTC | #2
> At first, this seemed like an odd place to fold away the length.
> AFAIK the length in res_op is inherited directly from the original
> operation, and so it isn't any more redundant after the fold than
> it was before.  But I suppose the reason for doing it here is that
> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
> lengths.  Doing that avoids the need to define an IFN_COND_FOO
> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
> I think it deserves a comment.

I think, generally, what I want to cover is a more fundamental thing
- in length-controlled targets the loop length doesn't change
throughout a loop and what we normally do is load the right length,
operate on the maximum length (ignoring tail elements) and store
the right length.

So, whenever the length is constant it was already determined that
we operate on exactly this length and length masking is not needed.
Only when the length is variable and not compile-time constant we need
to use length masking (and therefore the vec_cond simplification becomes
invalid).  I think we never e.g. operate on the first "half" of a
vector, leaving the second half unchanged.  As far as I know such access
patterns are always done with non-length, "conditional" masking.

Actually the only problematic cases I found were reduction-like loops
where the reduction operated on full length rather than the "right" one.
If a tail element is wrong then, obviously the reduction result is also
wrong.  From a "loop len" point of view a reduction could have a length
like len_store.  Then the simplification problem would go away.
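
A scalar sketch of that reduction problem (purely illustrative; the
helper names are made up and the tail values stand in for whatever
happens to be in the inactive lanes):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Model of a length-masked add with an all-true mask: only the first
// LEN lanes are updated, the remaining lanes keep the accumulator.
std::vector<int> cond_len_add (const std::vector<int> &acc,
                               const std::vector<int> &x,
                               std::size_t len)
{
  std::vector<int> res (acc);
  for (std::size_t i = 0; i < res.size () && i < len; ++i)
    res[i] = acc[i] + x[i];
  return res;
}

// Model of the add after the length has been dropped: every lane is
// updated, including tail lanes that hold unrelated data.
std::vector<int> plain_add (const std::vector<int> &acc,
                            const std::vector<int> &x)
{
  std::vector<int> res (acc);
  for (std::size_t i = 0; i < res.size (); ++i)
    res[i] = acc[i] + x[i];
  return res;
}

// Final reduction over all lanes of the accumulator.
int reduce_sum (const std::vector<int> &v)
{
  return std::accumulate (v.begin (), v.end (), 0);
}
```

With acc = {0, 0, 0, 0}, x = {1, 2, 99, 99} and len = 2, the
length-masked version reduces to 3 while the unmasked one reduces to
201, because the two tail lanes leak into the final sum.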

In the attached version I removed the hunk you mentioned but added a
match.pd pattern where all constant-length vcond_mask_len are simplified
to vec_cond.

/* A VCOND_MASK_LEN with a constant length is just a vec_cond for
   our purposes.  */
(simplify
 (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
  (vec_cond @0 @1 @2))

This works for all of the testsuite (and is basically the same
thing we have been testing all along with the bogus simplification
still in place).  Is there a way to formalize this
requirement?  Or am I totally wrong and this must never be done?

> Any reason not to make IFN_VCOND_MASK_LEN a directly-mapped optab?
> (I realise IFN_VCOND_MASK isn't, but that's used differently.)

Right, a conversion optab is not necessary - in the expander function
all we really do is move the condition from position 1 to 3.  Changing
the order would mean inconsistency with vec_cond.  If that's acceptable
I can change it and we can use expand_direct_optab_fn.  For now I kept
the expander function but used a direct optab.

Regards
 Robin

From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
From: Robin Dapp <rdapp@ventanamicro.com>
Date: Fri, 13 Oct 2023 10:20:35 +0200
Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(all true or all zero) into just an OP in the presence of length
masking this patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
length that is not the full hardware vector length a simplification now
does not result in a VEC_COND but rather a VCOND_MASK_LEN.

For cases where the mask is known to be all true or all zero the patch
introduces new match patterns that allow combination of unconditional
unary, binary and ternary operations with the respective conditional
operations if the target supports it.

Similarly, if the length is known to be equal to the target hardware
length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.

gcc/ChangeLog:

	* config/riscv/autovec.md (vcond_mask_len_<mode>): Add
	expander.
	* config/riscv/riscv-protos.h (enum insn_type): Add
	UNARY_OP_TUMA and MERGE_OP_TUMA.
	* doc/md.texi: Add vcond_mask_len.
	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
	Create VCOND_MASK_LEN when length masking.
	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
	matching of 6 and 7 parameters.
	(gimple_match_op::set_op): Ditto.
	(gimple_match_op::gimple_match_op): Always initialize len and
	bias.
	* internal-fn.cc (vec_cond_mask_len_direct): Add.
	(expand_vec_cond_mask_len_optab_fn): Add.
	(direct_vec_cond_mask_len_optab_supported_p): Add.
	(internal_fn_len_index): Add VCOND_MASK_LEN.
	(internal_fn_mask_index): Ditto.
	* internal-fn.def (VCOND_MASK_LEN): New internal function.
	* match.pd: Combine unconditional unary, binary and ternary
	operations into the respective COND_LEN operations.
	* optabs.def (OPTAB_CD): Add vcond_mask_len optab.
---
 gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
 gcc/config/riscv/riscv-protos.h |  5 +++
 gcc/doc/md.texi                 |  9 ++++
 gcc/gimple-match-exports.cc     | 13 ++++--
 gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
 gcc/internal-fn.cc              | 42 ++++++++++++++++++
 gcc/internal-fn.def             |  2 +
 gcc/match.pd                    | 67 ++++++++++++++++++++++++++++
 gcc/optabs.def                  |  1 +
 9 files changed, 249 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c9a2cf44816..096012af401 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode>"
+  [(match_operand:V_VLS 0 "register_operand")
+    (match_operand:<VM> 3 "nonmemory_operand")
+    (match_operand:V_VLS 1 "nonmemory_operand")
+    (match_operand:V_VLS 2 "autovec_else_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    if (satisfies_constraint_Wc1 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[1]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else if (satisfies_constraint_Wc0 (operands[3]))
+      {
+	rtx ops[] = {operands[0], operands[2], operands[2]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
+					  riscv_vector::UNARY_OP_TUMA,
+					  ops, operands[4]);
+      }
+    else
+      {
+	/* The order of vcond_mask is opposite to pred_merge.  */
+	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
+		     operands[3]};
+	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+					  riscv_vector::MERGE_OP_TUMA,
+					  ops, operands[4]);
+      }
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index aa9ce4b70e4..3938c500839 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,6 +302,7 @@ enum insn_type : unsigned int
   UNARY_OP = __NORMAL_OP | UNARY_OP_P,
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
+  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
   UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
   UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
@@ -337,6 +338,10 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with no vundef operand.  */
+  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
+		  | TU_POLICY_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index daa318ee3da..de0757f1903 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_len_@var{m}@var{n}}
+Similar to @code{vcond_mask_@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..d6dac08cc2b 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      if (!len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value,
+		       res_op->cond.len, res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@ public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@ public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
@@ -92,6 +94,10 @@ public:
 		   code_helper, tree, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
@@ -100,6 +106,8 @@ public:
   void set_op (code_helper, tree, tree, tree, tree, bool);
   void set_op (code_helper, tree, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
@@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
   ops[4] = op4;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5, tree op6)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (7)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
   ops[4] = op4;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 6;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5, tree op6)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 7;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f196064c195..318756b6992 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@ init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
     emit_move_insn (target, ops[0].value);
 }
 
+static void
+expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  enum insn_code icode = direct_optab_handler (optab, mode);
+  rtx rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  rtx_op1 = force_reg (mode, rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+
+  int opno = add_mask_and_len_args (ops, 3, stmt);
+  expand_insn (icode, opno, ops);
+
+  if (!rtx_equal_p (ops[0].value, target))
+    emit_move_insn (target, ops[0].value);
+}
+
 /* Expand VEC_SET internal functions.  */
 
 static void
@@ -3931,6 +3965,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
   expand_convert_optab_fn (FN, STMT, OPTAB, 2)
 
+#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
+  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
+
 /* RETURN_TYPE and ARGS are a return type and argument list that are
    in principle compatible with FN (which satisfies direct_internal_fn_p).
    Return the types that should be used to determine whether the
@@ -4022,6 +4059,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
+#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
@@ -4694,6 +4732,7 @@ internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4783,6 +4822,9 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..581cc3b5140 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, vec_cond_mask_len)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index f725a685863..33532776288 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_FMIN IFN_COND_FMAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
+(define_operator_list COND_LEN_BINARY
+  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
+  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
+  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
+  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
+  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
+  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
 
 /* Same for ternary operations.  */
 (define_operator_list UNCOND_TERNARY
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
+(define_operator_list COND_LEN_TERNARY
+  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
 
 /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
 (define_operator_list ATOMIC_FETCH_OR_XOR_N
@@ -8949,6 +8960,62 @@ and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+/* Similar for all cond_len operations.  */
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)) @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
+
+/* A VCOND_MASK_LEN with a constant length is just a vec_cond for our
+   purposes.  */
+(simplify
+ (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
+  (vec_cond @0 @1 @2))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8d5ceeb8710 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
 OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
 OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
 OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
+OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
Richard Sandiford Oct. 25, 2023, 9:58 p.m. UTC | #3
Robin Dapp <rdapp.gcc@gmail.com> writes:
>> At first, this seemed like an odd place to fold away the length.
>> AFAIK the length in res_op is inherited directly from the original
>> operation, and so it isn't any more redundant after the fold than
>> it was before.  But I suppose the reason for doing it here is that
>> we deliberately create IFN_COND_LEN_FOO calls that have "redundant"
>> lengths.  Doing that avoids the need to define an IFN_COND_FOO
>> equivalent of every IFN_COND_LEN_FOO optab.  Is that right?  If so,
>> I think it deserves a comment.
>
> I think, generally, what I want to cover is a more fundamental thing
> - in length-controlled targets the loop length doesn't change
> throughout a loop and what we normally do is load the right length,
> operate on the maximum length (ignoring tail elements) and store
> the right length.
>
> So, whenever the length is constant it was already determined that
> we operate on exactly this length and length masking is not needed.
> Only when the length is variable and not a compile-time constant do we need
> to use length masking (and therefore the vec_cond simplification becomes
> invalid).  I think we never e.g. operate on the first "half" of a
> vector, leaving the second half unchanged.  As far as I know such access
> patterns are always done with non-length, "conditional" masking.

In that case, I think we need to nail down what the semantics of
these LEN functions actually are.  There seems to be a discrepancy
between the optab documentation and the internal-fn.cc documentation.

The optab documentation says:

for (i = 0; i < ops[5] + ops[6]; i++)
  op0[i] = op1[i] ? op2[i] @var{op} op3[i] : op4[i];

which leaves trailing elements of op0 in an undefined state.
But internal-fn.cc says:

     for (int i = 0; i < NUNITS; i++)
      {
	if (i < LEN + BIAS && COND[i])
	  LHS[i] = A[i] CODE B[i];
	else
	  LHS[i] = ELSE[i];
      }

which leaves all lanes in a well-defined state.  Which one is right?
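
To make the discrepancy concrete, the two descriptions can be modelled on
plain arrays.  The sketch below is only an illustration under stated
assumptions: NUNITS, the array types and all function names are made up for
this example and are not GCC code, the operation is fixed to an addition, and
the "undefined" tail of the first reading is modelled with std::optional.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical scalar model; none of these names exist in GCC.
constexpr std::size_t NUNITS = 8;
using Vec = std::array<int, NUNITS>;
using Mask = std::array<bool, NUNITS>;
// A lane that holds no value stands for "undefined".
using MaybeVec = std::array<std::optional<int>, NUNITS>;

// Reading 1 (optab documentation): only lanes below LEN + BIAS are written.
MaybeVec
cond_len_add_optab_reading (const Mask &cond, const Vec &a, const Vec &b,
                            const Vec &els, int len, int bias)
{
  MaybeVec lhs{};  // every lane starts out undefined
  for (std::size_t i = 0; i < NUNITS && static_cast<int> (i) < len + bias; ++i)
    lhs[i] = cond[i] ? a[i] + b[i] : els[i];
  return lhs;
}

// Reading 2 (internal-fn.cc): every lane is written, tail lanes get ELSE.
Vec
cond_len_add_ifn_reading (const Mask &cond, const Vec &a, const Vec &b,
                          const Vec &els, int len, int bias)
{
  Vec lhs{};
  for (std::size_t i = 0; i < NUNITS; ++i)
    lhs[i] = (static_cast<int> (i) < len + bias && cond[i]) ? a[i] + b[i]
                                                            : els[i];
  return lhs;
}

// Compare both readings for an all-true mask with LEN = 4 and BIAS = 0.
void
compare_readings ()
{
  Mask cond;
  cond.fill (true);
  Vec a, b, els;
  a.fill (1);
  b.fill (2);
  els.fill (-1);

  MaybeVec r1 = cond_len_add_optab_reading (cond, a, b, els, 4, 0);
  Vec r2 = cond_len_add_ifn_reading (cond, a, b, els, 4, 0);

  // Active lanes agree under both readings.
  for (std::size_t i = 0; i < 4; ++i)
    assert (r1[i].has_value () && *r1[i] == r2[i]);
  // Tail lanes: undefined under reading 1, ELSE under reading 2.
  for (std::size_t i = 4; i < NUNITS; ++i)
    {
      assert (!r1[i].has_value ());
      assert (r2[i] == -1);
    }
}
```

In this model the two readings agree on every lane below LEN + BIAS and
differ only in what they promise about the tail, which is exactly the part
that decides whether a fold to VEC_COND_EXPR is allowed.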

If the first one is right, then it doesn't seem to matter whether
the length is constant or variable.  We can simplify:

  IFN_COND_LEN_IOR (mask, a, 0, b, len, bias)

to:

  VEC_COND_EXPR <mask, a, b>

regardless of the values of len and bias.  We wouldn't then need a
VCOND_MASK_LEN after all.

If the second one is right, then we cannot get rid of the length
unless it is known to be equal to the number of lanes, at least
according to gimple semantics.  Any knowledge about which lanes
"exist" is only exposed in target-dependent code (presumably by
the VSETVL pass).
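
Under the second reading, whether the length can be dropped depends only on
whether LEN + BIAS covers every lane, not on whether LEN happens to be a
constant.  A minimal sketch of that condition, on plain arrays with made-up
names (LANES, vec_cond_model, vcond_mask_len_model and length_is_redundant are
illustrative and not GCC functions), following the md.texi wording from the
patch where lanes at or beyond operand 4 + operand 5 behave as if the mask
element were zero:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model; none of these names exist in GCC.
constexpr std::size_t LANES = 8;
using IntVec = std::array<int, LANES>;
using BoolVec = std::array<bool, LANES>;

// Plain VEC_COND_EXPR <mask, a, b>: select per lane, no length involved.
IntVec
vec_cond_model (const BoolVec &mask, const IntVec &a, const IntVec &b)
{
  IntVec res{};
  for (std::size_t i = 0; i < LANES; ++i)
    res[i] = mask[i] ? a[i] : b[i];
  return res;
}

// VCOND_MASK_LEN as described in the patch's md.texi text: lanes at or
// beyond LEN + BIAS are treated as if their mask element were zero.
IntVec
vcond_mask_len_model (const BoolVec &mask, const IntVec &a, const IntVec &b,
                      int len, int bias)
{
  IntVec res{};
  for (std::size_t i = 0; i < LANES; ++i)
    res[i] = (static_cast<int> (i) < len + bias && mask[i]) ? a[i] : b[i];
  return res;
}

// The fold to a plain select is safe in this model only if no lane is cut off.
bool
length_is_redundant (int len, int bias)
{
  return len + bias >= static_cast<int> (LANES);
}
```

With an all-true mask, a constant length of 4 out of 8 lanes already makes
vcond_mask_len_model differ from vec_cond_model in the upper four lanes, while
a length of 8 gives identical results.  In other words, a constant length on
its own is not enough to justify the fold in this model.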

Thanks,
Richard

> Actually the only problematic cases I found were reduction-like loops
> where the reduction operated on full length rather than the "right" one.
> If a tail element is wrong then, obviously the reduction result is also
> wrong.  From a "loop len" point of view a reduction could have a length
> like len_store.  Then the simplification problem would go away.
>
> In the attached version I removed the hunk you mentioned but added a
> match.pd pattern where all constant-length vcond_mask_len are simplified
> to vec_cond.
>
> /* A VCOND_MASK_LEN with a constant length is just a vec_cond for
>    our purposes.  */
> (simplify
>  (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
>   (vec_cond @0 @1 @2))
>
> This works for all of the testsuite (and is basically the same
> thing we have been testing all along with the bogus simplification
> still in place).  Is there any way to formalize the
> requirement?  Or am I totally wrong and this must never be done?
>
>> Any reason not to make IFN_VCOND_MASK_LEN a directly-mapped optab?
>> (I realise IFN_VCOND_MASK isn't, but that's used differently.)
>
> Right, a conversion optab is not necessary - in the expander function
> all we really do is move the condition from position 1 to 3.  Changing
> the order would mean inconsistency with vec_cond.  If that's acceptable
> I can change it and we can use expand_direct_optab_fn.  For now I kept
> the expander function but used a direct optab.
>
> Regards
>  Robin
>
> From 4f793b71184b3301087780ed500f798d69328fc9 Mon Sep 17 00:00:00 2001
> From: Robin Dapp <rdapp@ventanamicro.com>
> Date: Fri, 13 Oct 2023 10:20:35 +0200
> Subject: [PATCH v2] internal-fn: Add VCOND_MASK_LEN.
>
> In order to prevent simplification of a COND_OP with degenerate mask
> (all true or all zero) into just an OP in the presence of length
> masking this patch introduces a length-masked analog to VEC_COND_EXPR:
> IFN_VCOND_MASK_LEN.  If the to-be-simplified conditional operation has a
> length that is not the full hardware vector length, a simplification now
> results in a VCOND_MASK_LEN rather than a VEC_COND.
>
> For cases where the mask is known to be all true or all zero the patch
> introduces new match patterns that allow combination of unconditional
> unary, binary and ternary operations with the respective conditional
> operations if the target supports it.
>
> Similarly, if the length is known to be equal to the target hardware
> length VCOND_MASK_LEN will be simplified to VEC_COND_EXPR.
>
> gcc/ChangeLog:
>
> 	* config/riscv/autovec.md (vcond_mask_len_<mode>): Add
> 	expander.
> 	* config/riscv/riscv-protos.h (enum insn_type): Add
> 	UNARY_OP_TUMA and MERGE_OP_TUMA.
> 	* doc/md.texi: Add vcond_mask_len.
> 	* gimple-match-exports.cc (maybe_resimplify_conditional_op):
> 	Create VCOND_MASK_LEN when length masking.
> 	* gimple-match.h (gimple_match_op::gimple_match_op): Allow
> 	matching of 6 and 7 parameters.
> 	(gimple_match_op::set_op): Ditto.
> 	(gimple_match_cond::gimple_match_cond): Always initialize len
> 	and bias.
> 	* internal-fn.cc (vec_cond_mask_len_direct): Add.
> 	(expand_vec_cond_mask_len_optab_fn): Add.
> 	(direct_vec_cond_mask_len_optab_supported_p): Add.
> 	(internal_fn_len_index): Add VCOND_MASK_LEN.
> 	(internal_fn_mask_index): Ditto.
> 	* internal-fn.def (VCOND_MASK_LEN): New internal function.
> 	* match.pd: Combine unconditional unary, binary and ternary
> 	operations into the respective COND_LEN operations.
> 	* optabs.def (OPTAB_D): Add vcond_mask_len optab.
> ---
>  gcc/config/riscv/autovec.md     | 37 ++++++++++++++++
>  gcc/config/riscv/riscv-protos.h |  5 +++
>  gcc/doc/md.texi                 |  9 ++++
>  gcc/gimple-match-exports.cc     | 13 ++++--
>  gcc/gimple-match.h              | 78 ++++++++++++++++++++++++++++++++-
>  gcc/internal-fn.cc              | 42 ++++++++++++++++++
>  gcc/internal-fn.def             |  2 +
>  gcc/match.pd                    | 67 ++++++++++++++++++++++++++++
>  gcc/optabs.def                  |  1 +
>  9 files changed, 249 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index c9a2cf44816..096012af401 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -565,6 +565,43 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set_attr "type" "vector")]
>  )
>  
> +(define_expand "vcond_mask_len_<mode>"
> +  [(match_operand:V_VLS 0 "register_operand")
> +    (match_operand:<VM> 3 "nonmemory_operand")
> +    (match_operand:V_VLS 1 "nonmemory_operand")
> +    (match_operand:V_VLS 2 "autovec_else_operand")
> +    (match_operand 4 "autovec_length_operand")
> +    (match_operand 5 "const_0_operand")]
> +  "TARGET_VECTOR"
> +  {
> +    if (satisfies_constraint_Wc1 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[1]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else if (satisfies_constraint_Wc0 (operands[3]))
> +      {
> +	rtx ops[] = {operands[0], operands[2], operands[2]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (<MODE>mode),
> +					  riscv_vector::UNARY_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    else
> +      {
> +	/* The order of vcond_mask is opposite to pred_merge.  */
> +	rtx ops[] = {operands[0], operands[2], operands[2], operands[1],
> +		     operands[3]};
> +	riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
> +					  riscv_vector::MERGE_OP_TUMA,
> +					  ops, operands[4]);
> +      }
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")]
> +)
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [BOOL] Select based on masks
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index aa9ce4b70e4..3938c500839 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -302,6 +302,7 @@ enum insn_type : unsigned int
>    UNARY_OP = __NORMAL_OP | UNARY_OP_P,
>    UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
>    UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
> +  UNARY_OP_TUMA = __MASK_OP_TUMA | UNARY_OP_P,
>    UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
>    UNARY_OP_FRM_RMM = UNARY_OP | FRM_RMM_P,
>    UNARY_OP_FRM_RUP = UNARY_OP | FRM_RUP_P,
> @@ -337,6 +338,10 @@ enum insn_type : unsigned int
>    /* For vmerge, no mask operand, no mask policy operand.  */
>    MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
>  
> +  /* For vmerge with no vundef operand.  */
> +  MERGE_OP_TUMA = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P
> +		  | TU_POLICY_P,
> +
>    /* For vm<compare>, no tail policy operand.  */
>    COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
>    COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index daa318ee3da..de0757f1903 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5306,6 +5306,15 @@ no need to define this instruction pattern if the others are supported.
>  Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
>  result of vector comparison.
>  
> +@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
> +@item @samp{vcond_mask_len_@var{m}@var{n}}
> +Similar to @code{vcond_mask_@var{m}@var{n}} but operand 4 holds a variable
> +or constant length and operand 5 holds a bias.  If the
> +element index < operand 4 + operand 5 the respective element of the result is
> +computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
> +operand 4 + operand 5 the computation is performed as if the respective mask
> +element were zero.
> +
>  @cindex @code{maskload@var{m}@var{n}} instruction pattern
>  @item @samp{maskload@var{m}@var{n}}
>  Perform a masked load of vector from memory operand 1 of mode @var{m}
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..d6dac08cc2b 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -307,9 +307,16 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>        && VECTOR_TYPE_P (res_op->type)
>        && gimple_simplified_result_is_gimple_val (res_op))
>      {
> -      new_op.set_op (VEC_COND_EXPR, res_op->type,
> -		     res_op->cond.cond, res_op->ops[0],
> -		     res_op->cond.else_value);
> +      tree len = res_op->cond.len;
> +      if (!len)
> +	new_op.set_op (VEC_COND_EXPR, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value);
> +      else
> +	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
> +		       res_op->cond.cond, res_op->ops[0],
> +		       res_op->cond.else_value,
> +		       res_op->cond.len, res_op->cond.bias);
>        *res_op = new_op;
>        return gimple_resimplify3 (seq, res_op, valueize);
>      }
> diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
> index bec3ff42e3e..63a9f029589 100644
> --- a/gcc/gimple-match.h
> +++ b/gcc/gimple-match.h
> @@ -32,7 +32,8 @@ public:
>    enum uncond { UNCOND };
>  
>    /* Build an unconditional op.  */
> -  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
> +  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
> +			       (NULL_TREE), bias (NULL_TREE) {}
>    gimple_match_cond (tree, tree);
>    gimple_match_cond (tree, tree, tree, tree);
>  
> @@ -56,7 +57,8 @@ public:
>  
>  inline
>  gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
> -  : cond (cond_in), else_value (else_value_in)
> +  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
> +    bias (NULL_TREE)
>  {
>  }
>  
> @@ -92,6 +94,10 @@ public:
>  		   code_helper, tree, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>  		   code_helper, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>  
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
> @@ -100,6 +106,8 @@ public:
>    void set_op (code_helper, tree, tree, tree, tree, bool);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>  
>    tree op_or_null (unsigned int) const;
> @@ -212,6 +220,39 @@ gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
>    ops[4] = op4;
>  }
>  
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (6)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +				  code_helper code_in, tree type_in,
> +				  tree op0, tree op1, tree op2, tree op3,
> +				  tree op4, tree op5, tree op6)
> +  : cond (cond_in), code (code_in), type (type_in), reverse (false),
> +    num_ops (7)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -299,6 +340,39 @@ gimple_match_op::set_op (code_helper code_in, tree type_in,
>    ops[4] = op4;
>  }
>  
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 6;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +}
> +
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +			 tree op0, tree op1, tree op2, tree op3, tree op4,
> +			 tree op5, tree op6)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 7;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +  ops[5] = op5;
> +  ops[6] = op6;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>  
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f196064c195..318756b6992 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -170,6 +170,7 @@ init_internal_fns ()
>  #define store_lanes_direct { 0, 0, false }
>  #define mask_store_lanes_direct { 0, 0, false }
>  #define vec_cond_mask_direct { 1, 0, false }
> +#define vec_cond_mask_len_direct { 1, 1, false }
>  #define vec_cond_direct { 2, 0, false }
>  #define scatter_store_direct { 3, 1, false }
>  #define len_store_direct { 3, 3, false }
> @@ -3129,6 +3130,39 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>      emit_move_insn (target, ops[0].value);
>  }
>  
> +static void
> +expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[6];
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  tree vec_cond_type = TREE_TYPE (lhs);
> +
> +  machine_mode mode = TYPE_MODE (vec_cond_type);
> +  enum insn_code icode = direct_optab_handler (optab, mode);
> +  rtx rtx_op1, rtx_op2;
> +
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  rtx_op1 = expand_normal (op1);
> +  rtx_op2 = expand_normal (op2);
> +
> +  rtx_op1 = force_reg (mode, rtx_op1);
> +
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  create_output_operand (&ops[0], target, mode);
> +  create_input_operand (&ops[1], rtx_op1, mode);
> +  create_input_operand (&ops[2], rtx_op2, mode);
> +
> +  int opno = add_mask_and_len_args (ops, 3, stmt);
> +  expand_insn (icode, opno, ops);
> +
> +  if (!rtx_equal_p (ops[0].value, target))
> +    emit_move_insn (target, ops[0].value);
> +}
> +
>  /* Expand VEC_SET internal functions.  */
>  
>  static void
> @@ -3931,6 +3965,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
>  #define expand_vec_extract_optab_fn(FN, STMT, OPTAB) \
>    expand_convert_optab_fn (FN, STMT, OPTAB, 2)
>  
> +#define expand_vec_cond_mask_len_optab_fn(FN, STMT, OPTAB) \
> +  expand_vec_cond_mask_len_optab_fn (FN, STMT, OPTAB)
> +
>  /* RETURN_TYPE and ARGS are a return type and argument list that are
>     in principle compatible with FN (which satisfies direct_internal_fn_p).
>     Return the types that should be used to determine whether the
> @@ -4022,6 +4059,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>  #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
> +#define direct_vec_cond_mask_len_optab_supported_p direct_optab_supported_p
>  #define direct_vec_cond_optab_supported_p convert_optab_supported_p
>  #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>  #define direct_len_store_optab_supported_p direct_optab_supported_p
> @@ -4694,6 +4732,7 @@ internal_fn_len_index (internal_fn fn)
>      case IFN_MASK_LEN_STORE:
>      case IFN_MASK_LEN_LOAD_LANES:
>      case IFN_MASK_LEN_STORE_LANES:
> +    case IFN_VCOND_MASK_LEN:
>        return 3;
>  
>      default:
> @@ -4783,6 +4822,9 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_LEN_SCATTER_STORE:
>        return 4;
>  
> +    case IFN_VCOND_MASK_LEN:
> +      return 0;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..581cc3b5140 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -221,6 +221,8 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
>  DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
>  		       vcond_mask, vec_cond_mask)
> +DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
> +		       vcond_mask_len, vec_cond_mask_len)
>  
>  DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
>  DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f725a685863..33532776288 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -87,6 +87,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    negate bit_not)
>  (define_operator_list COND_UNARY
>    IFN_COND_NEG IFN_COND_NOT)
> +(define_operator_list COND_LEN_UNARY
> +  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
>  
>  /* Binary operations and their associated IFN_COND_* function.  */
>  (define_operator_list UNCOND_BINARY
> @@ -103,12 +105,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_FMIN IFN_COND_FMAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>    IFN_COND_SHL IFN_COND_SHR)
> +(define_operator_list COND_LEN_BINARY
> +  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
> +  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
> +  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
> +  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
> +  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
> +  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
>  
>  /* Same for ternary operations.  */
>  (define_operator_list UNCOND_TERNARY
>    IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>    IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> +(define_operator_list COND_LEN_TERNARY
> +  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
>  
>  /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
>  (define_operator_list ATOMIC_FETCH_OR_XOR_N
> @@ -8949,6 +8960,62 @@ and,
>  	&& single_use (@5))
>      (view_convert (cond_op (bit_not @0) @2 @3 @4
>  		  (view_convert:op_type @1)))))))
> +
> +/* Similar for all cond_len operations.  */
> +(for uncond_op (UNCOND_UNARY)
> +     cond_op (COND_LEN_UNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op @0 @1 @2 @4 @5))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
> +   (with { tree op_type = TREE_TYPE (@3); }
> +    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +        && is_truth_type_for (op_type, TREE_TYPE (@0)))
> +     (cond_op (bit_not @0) @2 @1 @4 @5)))))
> +
> +(for uncond_op (UNCOND_BINARY)
> +     cond_op (COND_LEN_BINARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@4))
> +    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
> +
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_LEN_TERNARY)
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
> + (simplify
> +  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4 @6 @7)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
> +	&& is_truth_type_for (op_type, TREE_TYPE (@0))
> +	&& single_use (@5))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> +
> +/* A VCOND_MASK_LEN with a constant length is just a vec_cond for our
> +   purposes.  */
> +(simplify
> + (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
> +  (vec_cond @0 @1 @2))
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..8d5ceeb8710 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -282,6 +282,7 @@ OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
> +OPTAB_D (vcond_mask_len_optab, "vcond_mask_len_$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
>  OPTAB_D (ctrap_optab, "ctrap$a4")
Patch

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..27a71bc1ef9 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,26 @@  (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode><vm>"
+  [(match_operand:V_VLS 0 "register_operand")
+    (match_operand:<VM> 3 "register_operand")
+    (match_operand:V_VLS 1 "nonmemory_operand")
+    (match_operand:V_VLS 2 "register_operand")
+    (match_operand 4 "autovec_length_operand")
+    (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+    /* The order of vcond_mask is opposite to pred_merge.  */
+    rtx ops[] = {operands[0], operands[0], operands[2], operands[1],
+		 operands[3]};
+    riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+				      riscv_vector::MERGE_OP_REAL_ELSE, ops,
+				      operands[4]);
+    DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [BOOL] Select based on masks
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6cb9d459ee9..025a3568566 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -337,6 +337,10 @@  enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with no vundef operand.  */
+  MERGE_OP_REAL_ELSE = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P
+		       | TERNARY_OP_P,
+
   /* For vm<compare>, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_OP_MU | TERNARY_OP_P,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index daa318ee3da..de0757f1903 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5306,6 +5306,15 @@  no need to define this instruction pattern if the others are supported.
 Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
 result of vector comparison.
 
+@cindex @code{vcond_mask_len_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_len_@var{m}@var{n}}
+Similar to @code{vcond_mask_@var{m}@var{n}} but operand 4 holds a variable
+or constant length and operand 5 holds a bias.  If the
+element index < operand 4 + operand 5 the respective element of the result is
+computed as in @code{vcond_mask_@var{m}@var{n}}.  For element indices >=
+operand 4 + operand 5 the computation is performed as if the respective mask
+element were zero.
+
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..32134dbf711 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -307,9 +307,23 @@  maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
       && VECTOR_TYPE_P (res_op->type)
       && gimple_simplified_result_is_gimple_val (res_op))
     {
-      new_op.set_op (VEC_COND_EXPR, res_op->type,
-		     res_op->cond.cond, res_op->ops[0],
-		     res_op->cond.else_value);
+      tree len = res_op->cond.len;
+      HOST_WIDE_INT ilen = -1;
+      if (len && TREE_CODE (len) == INTEGER_CST && tree_fits_uhwi_p (len))
+	ilen = tree_to_uhwi (len);
+
+      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
+      bool full_len = len && known_eq (sz.coeffs[0], ilen);
+
+      if (!len || full_len)
+	new_op.set_op (VEC_COND_EXPR, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value);
+      else
+	new_op.set_op (IFN_VCOND_MASK_LEN, res_op->type,
+		       res_op->cond.cond, res_op->ops[0],
+		       res_op->cond.else_value, res_op->cond.len,
+		       res_op->cond.bias);
       *res_op = new_op;
       return gimple_resimplify3 (seq, res_op, valueize);
     }
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index bec3ff42e3e..63a9f029589 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -32,7 +32,8 @@  public:
   enum uncond { UNCOND };
 
   /* Build an unconditional op.  */
-  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE) {}
+  gimple_match_cond (uncond) : cond (NULL_TREE), else_value (NULL_TREE), len
+			       (NULL_TREE), bias (NULL_TREE) {}
   gimple_match_cond (tree, tree);
   gimple_match_cond (tree, tree, tree, tree);
 
@@ -56,7 +57,8 @@  public:
 
 inline
 gimple_match_cond::gimple_match_cond (tree cond_in, tree else_value_in)
-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)
 {
 }
 
@@ -92,6 +94,10 @@  public:
 		   code_helper, tree, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
@@ -100,6 +106,8 @@  public:
   void set_op (code_helper, tree, tree, tree, tree, bool);
   void set_op (code_helper, tree, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
@@ -212,6 +220,39 @@  gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
   ops[4] = op4;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (6)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4, tree op5, tree op6)
+  : cond (cond_in), code (code_in), type (type_in), reverse (false),
+    num_ops (7)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -299,6 +340,39 @@  gimple_match_op::set_op (code_helper code_in, tree type_in,
   ops[4] = op4;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 6;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+}
+
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4,
+			 tree op5, tree op6)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 7;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+  ops[5] = op5;
+  ops[6] = op6;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..b47c33faf85 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -170,6 +170,7 @@  init_internal_fns ()
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define vec_cond_mask_direct { 1, 0, false }
+#define vec_cond_mask_len_direct { 2, 0, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define len_store_direct { 3, 3, false }
@@ -3129,6 +3130,41 @@  expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
     emit_move_insn (target, ops[0].value);
 }
 
+static void
+expand_vec_cond_mask_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
+  rtx rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  rtx_op1 = force_reg (mode, rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+
+  int opno = add_mask_and_len_args (ops, 3, stmt);
+  expand_insn (icode, opno, ops);
+
+  if (!rtx_equal_p (ops[0].value, target))
+    emit_move_insn (target, ops[0].value);
+}
+
 /* Expand VEC_SET internal functions.  */
 
 static void
@@ -4018,6 +4054,7 @@  multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_vec_cond_mask_optab_supported_p convert_optab_supported_p
+#define direct_vec_cond_mask_len_optab_supported_p convert_optab_supported_p
 #define direct_vec_cond_optab_supported_p convert_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_len_store_optab_supported_p direct_optab_supported_p
@@ -4690,6 +4727,7 @@  internal_fn_len_index (internal_fn fn)
     case IFN_MASK_LEN_STORE:
     case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
+    case IFN_VCOND_MASK_LEN:
       return 3;
 
     default:
@@ -4721,6 +4759,9 @@  internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK_LEN:
+      return 0;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..581cc3b5140 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -221,6 +221,8 @@  DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, ECF_CONST | ECF_NOTHROW, vcondeq, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, ECF_CONST | ECF_NOTHROW,
 		       vcond_mask, vec_cond_mask)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK_LEN, ECF_CONST | ECF_NOTHROW,
+		       vcond_mask_len, vec_cond_mask_len)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SET, ECF_CONST | ECF_NOTHROW, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/match.pd b/gcc/match.pd
index ce8d159d260..f187d560fbf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -87,6 +87,8 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   negate bit_not)
 (define_operator_list COND_UNARY
   IFN_COND_NEG IFN_COND_NOT)
+(define_operator_list COND_LEN_UNARY
+  IFN_COND_LEN_NEG IFN_COND_LEN_NOT)
 
 /* Binary operations and their associated IFN_COND_* function.  */
 (define_operator_list UNCOND_BINARY
@@ -103,12 +105,21 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_FMIN IFN_COND_FMAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
   IFN_COND_SHL IFN_COND_SHR)
+(define_operator_list COND_LEN_BINARY
+  IFN_COND_LEN_ADD IFN_COND_LEN_SUB
+  IFN_COND_LEN_MUL IFN_COND_LEN_DIV IFN_COND_LEN_MOD IFN_COND_LEN_RDIV
+  IFN_COND_LEN_MIN IFN_COND_LEN_MAX
+  IFN_COND_LEN_FMIN IFN_COND_LEN_FMAX
+  IFN_COND_LEN_AND IFN_COND_LEN_IOR IFN_COND_LEN_XOR
+  IFN_COND_LEN_SHL IFN_COND_LEN_SHR)
 
 /* Same for ternary operations.  */
 (define_operator_list UNCOND_TERNARY
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
+(define_operator_list COND_LEN_TERNARY
+  IFN_COND_LEN_FMA IFN_COND_LEN_FMS IFN_COND_LEN_FNMA IFN_COND_LEN_FNMS)
 
 /* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_*  */
 (define_operator_list ATOMIC_FETCH_OR_XOR_N
@@ -8949,6 +8960,69 @@  and,
 	&& single_use (@5))
     (view_convert (cond_op (bit_not @0) @2 @3 @4
 		  (view_convert:op_type @1)))))))
+
+/* Similar for all cond_len operations.  */
+(for uncond_op (UNCOND_UNARY)
+     cond_op (COND_LEN_UNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@3 @1)) @2 @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op @0 @1 @2 @4 @5))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@3 @2)) @4 @5)
+   (with { tree op_type = TREE_TYPE (@3); }
+    (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+        && is_truth_type_for (op_type, TREE_TYPE (@0)))
+     (cond_op (bit_not @0) @2 @1 @4 @5)))))
+
+(for uncond_op (UNCOND_BINARY)
+     cond_op (COND_LEN_BINARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@4 @1 @2)) @3 @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3) @5 @6)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@4 @2 @3)) @5 @6)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@4))
+    (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1) @5 @6))))))
+
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_LEN_TERNARY)
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4 @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4) @6 @7)))))
+ (simplify
+  (IFN_VCOND_MASK_LEN @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)) @6 @7)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
+	&& is_truth_type_for (op_type, TREE_TYPE (@0))
+	&& single_use (@5))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
+
+/* A VCOND_MASK_LEN with a size that equals the full hardware vector size
+   is just a vec_cond.  */
+(simplify
+ (IFN_VCOND_MASK_LEN @0 @1 @2 INTEGER_CST@3 INTEGER_CST@4)
+ (with {
+      HOST_WIDE_INT len = -1;
+      if (tree_fits_uhwi_p (@3))
+	len = tree_to_uhwi (@3);
+      auto sz = GET_MODE_NUNITS (TYPE_MODE (res_op->type));
+      bool full_len = (sz.coeffs[0] == len); }
+   (if (full_len)
+    (vec_cond @0 @1 @2))))
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..3cb16bd3002 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -88,6 +88,7 @@  OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
 OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
 OPTAB_CD(vcond_mask_optab, "vcond_mask_$a$b")
+OPTAB_CD(vcond_mask_len_optab, "vcond_mask_len_$a$b")
 OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
 OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
 OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b")
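
For reference, the element-wise semantics that the md.texi hunk describes for vcond_mask_len can be sketched as a scalar model. This is only an illustration and not GCC code: the function names vcond_mask_len_model and vec_cond_model, and the use of std::vector in place of vector registers, are assumptions made for the example. It also shows the degenerate case discussed in the patch, where a constant length that covers every element makes the length masking redundant, so the result matches a plain vec_cond.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference model (not part of GCC).
// Element i is selected from THEN_V only if i < LEN + BIAS and MASK[i]
// is set; otherwise it comes from ELSE_V, as if MASK[i] were zero.
std::vector<int>
vcond_mask_len_model (const std::vector<bool> &mask,
                      const std::vector<int> &then_v,
                      const std::vector<int> &else_v,
                      long len, long bias)
{
  std::vector<int> res (else_v.size ());
  for (std::size_t i = 0; i < res.size (); ++i)
    {
      bool active = static_cast<long> (i) < len + bias && mask[i];
      res[i] = active ? then_v[i] : else_v[i];
    }
  return res;
}

// Plain mask-only select, standing in for VEC_COND_EXPR.
std::vector<int>
vec_cond_model (const std::vector<bool> &mask,
                const std::vector<int> &then_v,
                const std::vector<int> &else_v)
{
  std::vector<int> res (else_v.size ());
  for (std::size_t i = 0; i < res.size (); ++i)
    res[i] = mask[i] ? then_v[i] : else_v[i];
  return res;
}
```

With an all-true mask over four elements, a length of 2 and a bias of 0 take the first two elements from the "then" vector and the rest from the "else" vector, which is why an all-true mask alone must not allow the conditional operation to be simplified away. With a length of 4 the model agrees with vec_cond_model for every mask.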