RISC-V: Refactor and clean expand_cond_len_{unop, binop, ternop}

Message ID 20230825080204.3531681-1-lehua.ding@rivai.ai

Commit Message

Lehua Ding Aug. 25, 2023, 8:02 a.m. UTC
Hi,

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, which does the
main work.  The expand_cond_len_{unop,binop,ternop} functions now only
care about how to pass the operands to the intrinsic patterns.
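
In outline (a simplified sketch; the FP-rounding variants are elided
here, see the diff below for the full version), expand_cond_len_op
selects the tail/mask policy in one place:

  static void
  expand_cond_len_op (rtx_code code, unsigned icode, int op_num,
		      rtx *cond_ops, rtx len)
  {
    if (is_dummy_mask)
      /* All-ones mask: use TU, MASK ANY policy.  */
      emit_nonvlmax_tu_insn (icode, op_num, cond_ops, len);
    else if (is_vlmax_len)
      /* Masked, VLMAX length: use TAIL ANY, MU policy.  */
      emit_vlmax_masked_mu_insn (icode, op_num, cond_ops);
    else
      /* Masked, partial length: use TU, MU policy.  */
      emit_nonvlmax_tumu_insn (icode, op_num, cond_ops, len);
  }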

Best,
Lehua

gcc/ChangeLog:

	* config/riscv/autovec.md: Adjust.
	* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
	(get_vlmax_rtx): Exported.
	* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
	(emit_vlmax_masked_gather_mu_insn): Adjust.
	(get_vlmax_rtx): New func.
	(expand_load_store): Adjust.
	(expand_cond_len_unop): Call expand_cond_len_op.
	(expand_cond_len_op): New subroutine.
	(expand_cond_len_binop): Call expand_cond_len_op.
	(expand_cond_len_ternop): Call expand_cond_len_op.
	(expand_lanes_load_store): Adjust.

---
 gcc/config/riscv/autovec.md     |   6 +-
 gcc/config/riscv/riscv-protos.h |  16 ++-
 gcc/config/riscv/riscv-v.cc     | 166 ++++++++++----------------------
 3 files changed, 60 insertions(+), 128 deletions(-)

Comments

Robin Dapp Aug. 28, 2023, 10:35 a.m. UTC | #1
Hi Lehua,

thanks for starting with the refactoring.  I have some minor comments.

> +/* The value means the number of operands for insn_expander.  */
>  enum insn_type
>  {
>    RVV_MISC_OP = 1,
>    RVV_UNOP = 2,
> -  RVV_UNOP_M = RVV_UNOP + 2,
> -  RVV_UNOP_MU = RVV_UNOP + 2,
> -  RVV_UNOP_TU = RVV_UNOP + 2,
> -  RVV_UNOP_TUMU = RVV_UNOP + 2,
> +  RVV_UNOP_MASK = RVV_UNOP + 2,

Cleaning up here is good; right now it's not really an insn_type but
indeed just the number of operands.  My original idea was to have an
insn type and a mostly unified expander that performs all necessary
operations depending on the insn_type.  Just to give an idea of why it's
called that way.

> +  rtx ops[RVV_BINOP_MASK] = {target, mask, target, op, sel};
> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MASK, ops);

One of the ideas was that a function emit_vlmax_masked_mu_insn would already
know that it's dealing with a mask and we would just pass something like
RVV_BINOP.  The other way would be to just have emit_vlmax_mu_insn or
something and let the rest be deduced from the insn_type.  Even the vlmax
I intended to have mostly implicit but that somehow got lost during
refactorings :)  No need to change anything for now, just for perspective
again. 
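
For illustration, a hypothetical shape of that idea (signature and usage
invented here, not part of this patch): the emitter would deduce the
extra mask operands from the insn_type itself, so callers only pass the
plain operand kind:

  /* Hypothetical: the +2 mask operands deduced from TYPE internally.  */
  void emit_vlmax_mu_insn (insn_code icode, insn_type type, rtx *ops);

  rtx ops[] = {target, mask, target, op, sel};
  emit_vlmax_mu_insn (icode, RVV_BINOP, ops);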

> -/* Expand unary ops COND_LEN_*.  */
> -void
> -expand_cond_len_unop (rtx_code code, rtx *ops)
> +/* Subroutine to expand COND_LEN_* patterns.  */
> +static void
> +expand_cond_len_op (rtx_code code, unsigned icode, int op_num, rtx *cond_ops,
> +		    rtx len)
>  {

Would you mind renaming op_num (i.e. usually understood as operand_number) to
num_ops or nops (i.e. number of operands)?  That way we would be more in line with
what the later expander functions do.

> -  rtx dest = ops[0];
> -  rtx mask = ops[1];
> -  rtx src = ops[2];
> -  rtx merge = ops[3];
> -  rtx len = ops[4];
> +  rtx dest = cond_ops[0];
> +  rtx mask = cond_ops[1];

I would actually prefer to keep "ops" because it's already clear from the
function name that we work with a conditional function (and we don't have
any other ops).

>  
> +/* Expand unary ops COND_LEN_*.  */
> +void
> +expand_cond_len_unop (rtx_code code, rtx *ops)
> +{
> +  rtx dest = ops[0];
> +  rtx mask = ops[1];
> +  rtx src = ops[2];
> +  rtx merge = ops[3];
> +  rtx len = ops[4];
> +
> +  machine_mode mode = GET_MODE (dest);
> +  insn_code icode = code_for_pred (code, mode);
> +  rtx cond_ops[RVV_UNOP_MASK] = {dest, mask, merge, src};
> +  expand_cond_len_op (code, icode, RVV_UNOP_MASK, cond_ops, len);
> +}

We're already a bit inconsistent with how we pass mask, merge and the source
operands.  Maybe we could also unify this a bit?  I don't have a clear
preference for either, though.

> +  rtx cond_ops[RVV_BINOP_MASK] = {dest, mask, merge, src1, src2};

Here, the merge comes before the sources as well.

> +  rtx cond_ops[RVV_TERNOP_MASK] = {dest, mask, src1, src2, src3, merge};
And here, the merge comes last.  I realize this makes sense in the context
of a ternary operation because the merge is always "real".  As our vector
patterns are similar, maybe we should use this ordering all the time?

Regards
 Robin
Lehua Ding Aug. 28, 2023, 11:32 a.m. UTC | #2
Hi Robin,

Thanks for reviewing.

> Cleaning up here is good; right now it's not really an insn_type but
> indeed just the number of operands.  My original idea was to have an
> insn type and a mostly unified expander that performs all necessary
> operations depending on the insn_type.  Just to give an idea of why it's
> called that way.
> 
>> +  rtx ops[RVV_BINOP_MASK] = {target, mask, target, op, sel};
>> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MASK, ops);
> 
> One of the ideas was that a function emit_vlmax_masked_mu_insn would already
> know that it's dealing with a mask and we would just pass something like
> RVV_BINOP.  The other way would be to just have emit_vlmax_mu_insn or
> something and let the rest be deduced from the insn_type.  Even the vlmax
> I intended to have mostly implicit but that somehow got lost during
> refactorings :)  No need to change anything for now, just for perspective
> again.

I think the ideas in these two comments will be reflected in the next
patch, which refactors the emit_vlmax/nonvlmax_xxx functions (abstracting
several uniform types of RVV instructions).  Thanks a lot!


> Would you mind renaming op_num (i.e. usually understood as operand_number) to
> num_ops or nops (i.e. number of operands)?  That way we would be more in line with
> what the later expander functions do.

OK, I'll rename op_num to num_ops.


> I would actually prefer to keep "ops" because it's already clear from the
> function name that we work with a conditional function (and we don't have
> any other ops).

No problem.


> We're already a bit inconsistent with how we pass mask, merge and the source
> operands.  Maybe we could also unify this a bit?  I don't have a clear
> preference for either, though.
> 
>> +  rtx cond_ops[RVV_BINOP_MASK] = {dest, mask, merge, src1, src2};
> 
> Here, the merge comes before the sources as well.
> 
>> +  rtx cond_ops[RVV_TERNOP_MASK] = {dest, mask, src1, src2, src3, merge};
> And here, the merge comes last.  I realize this makes sense in the context
> of a ternary operation because the merge is always "real".  As our vector
> patterns are similar, maybe we should use this ordering all the time?

Yes, the ternary merge is not placed in operand 2 like binop or unop.  I
discussed it with Juzhe and it could be unified, but it would touch the
intrinsic part as well, so I would suggest bringing up a separate patch
to unify the operand order.  The unified order would look like this:

   DEST, MASK, MERGE (these three operands are fixed for most insns)
   OPS (the number can be 1, 2, 3)
   VL, TAIL_POLICY, MASK_POLICY, AVL_TYPE, ROUNDING_MODE
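
As a sketch of that proposal (assuming the order above; the current
patch still uses the existing ternop order), all three expanders would
then build their operand arrays the same way:

   rtx unop_ops[]   = {dest, mask, merge, src};
   rtx binop_ops[]  = {dest, mask, merge, src1, src2};
   rtx ternop_ops[] = {dest, mask, merge, src1, src2, src3};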
Lehua Ding Aug. 29, 2023, 4:30 a.m. UTC | #3
Here is the V3 patch that addresses the comments, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628650.html

Patch

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index e9659b2b157..ad072ff439a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -971,9 +971,9 @@ 
   rtx mask = gen_reg_rtx (mask_mode);
   riscv_vector::expand_vec_cmp (mask, LT, operands[1], zero);
 
-  rtx ops[] = {operands[0], mask, operands[1], operands[1]};
-  riscv_vector::emit_vlmax_masked_mu_insn (code_for_pred (NEG, <MODE>mode),
-					   riscv_vector::RVV_UNOP_MU, ops);
+  rtx ops[] = {operands[0], mask, operands[1], operands[1],
+               riscv_vector::get_vlmax_rtx (<MODE>mode)};
+  riscv_vector::expand_cond_len_unop (NEG, ops);
   DONE;
 })
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2c4405c9860..6a49718b34d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -180,25 +180,20 @@  namespace riscv_vector {
 #define RVV_VUNDEF(MODE)                                                       \
   gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)),        \
 		  UNSPEC_VUNDEF)
+
+/* The value means the number of operands for insn_expander.  */
 enum insn_type
 {
   RVV_MISC_OP = 1,
   RVV_UNOP = 2,
-  RVV_UNOP_M = RVV_UNOP + 2,
-  RVV_UNOP_MU = RVV_UNOP + 2,
-  RVV_UNOP_TU = RVV_UNOP + 2,
-  RVV_UNOP_TUMU = RVV_UNOP + 2,
+  RVV_UNOP_MASK = RVV_UNOP + 2,
   RVV_BINOP = 3,
-  RVV_BINOP_MU = RVV_BINOP + 2,
-  RVV_BINOP_TU = RVV_BINOP + 2,
-  RVV_BINOP_TUMU = RVV_BINOP + 2,
+  RVV_BINOP_MASK = RVV_BINOP + 2,
   RVV_MERGE_OP = 4,
   RVV_CMP_OP = 4,
   RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
   RVV_TERNOP = 5,
-  RVV_TERNOP_MU = RVV_TERNOP + 1,
-  RVV_TERNOP_TU = RVV_TERNOP + 1,
-  RVV_TERNOP_TUMU = RVV_TERNOP + 1,
+  RVV_TERNOP_MASK = RVV_TERNOP + 1,
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,      /* Dest, VUNDEF, source and offset.  */
@@ -258,6 +253,7 @@  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
 enum vlmul_type get_vlmul (machine_mode);
+rtx get_vlmax_rtx (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
 machine_mode get_subpart_mode (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 14eda581d00..8b5bb0097bc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -759,28 +759,6 @@  emit_vlmax_fp_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
-/* This function emits a {NONVLMAX, TAIL_UNDISTURBED, MASK_ANY} vsetvli followed
- * by the ternary operation which always has a real merge operand.  */
-static void
-emit_nonvlmax_fp_ternary_tu_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
-{
-  machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode);
-  insn_expander<RVV_INSN_OPERANDS_MAX> e (/*OP_NUM*/ op_num,
-					  /*HAS_DEST_P*/ true,
-					  /*FULLY_UNMASKED_P*/ false,
-					  /*USE_REAL_MERGE_P*/ true,
-					  /*HAS_AVL_P*/ true,
-					  /*VLMAX_P*/ false,
-					  /*DEST_MODE*/ dest_mode,
-					  /*MASK_MODE*/ mask_mode);
-  e.set_policy (TAIL_UNDISTURBED);
-  e.set_policy (MASK_ANY);
-  e.set_rounding_mode (FRM_DYN);
-  e.set_vl (vl);
-  e.emit_insn ((enum insn_code) icode, ops);
-}
-
 /* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
  * actual operation.  */
 void
@@ -1165,8 +1143,8 @@  emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
     }
   else
     icode = code_for_pred_gather (data_mode);
-  rtx ops[] = {target, mask, target, op, sel};
-  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
+  rtx ops[RVV_BINOP_MASK] = {target, mask, target, op, sel};
+  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MASK, ops);
 }
 
 /* According to RVV ISA spec (16.5.1. Synthesizing vdecompress):
@@ -1796,6 +1774,14 @@  get_vlmul (machine_mode mode)
   return mode_vtype_infos.vlmul[mode];
 }
 
+/* Return the VLMAX rtx of vector mode MODE.  */
+rtx
+get_vlmax_rtx (machine_mode mode)
+{
+  gcc_assert (riscv_v_ext_vector_mode_p (mode));
+  return gen_int_mode (GET_MODE_NUNITS (mode), Pmode);
+}
+
 /* Return the NF value of the corresponding mode.  */
 unsigned int
 get_nf (machine_mode mode)
@@ -3353,7 +3339,7 @@  expand_load_store (rtx *ops, bool is_load)
       if (is_load)
 	{
 	  rtx m_ops[] = {ops[0], mask, RVV_VUNDEF (mode), ops[1]};
-	  emit_vlmax_masked_insn (code_for_pred_mov (mode), RVV_UNOP_M, m_ops);
+	  emit_vlmax_masked_insn (code_for_pred_mov (mode), RVV_UNOP_MASK, m_ops);
 	}
       else
 	{
@@ -3370,7 +3356,7 @@  expand_load_store (rtx *ops, bool is_load)
       if (is_load)
 	{
 	  rtx m_ops[] = {ops[0], mask, RVV_VUNDEF (mode), ops[1]};
-	  emit_nonvlmax_masked_insn (code_for_pred_mov (mode), RVV_UNOP_M,
+	  emit_nonvlmax_masked_insn (code_for_pred_mov (mode), RVV_UNOP_MASK,
 				     m_ops, len);
 	}
       else
@@ -3389,32 +3375,26 @@  needs_fp_rounding (rtx_code code, machine_mode mode)
   return code != SMIN && code != SMAX && code != NEG && code != ABS;
 }
 
-/* Expand unary ops COND_LEN_*.  */
-void
-expand_cond_len_unop (rtx_code code, rtx *ops)
+/* Subroutine to expand COND_LEN_* patterns.  */
+static void
+expand_cond_len_op (rtx_code code, unsigned icode, int op_num, rtx *cond_ops,
+		    rtx len)
 {
-  rtx dest = ops[0];
-  rtx mask = ops[1];
-  rtx src = ops[2];
-  rtx merge = ops[3];
-  rtx len = ops[4];
+  rtx dest = cond_ops[0];
+  rtx mask = cond_ops[1];
   machine_mode mode = GET_MODE (dest);
   machine_mode mask_mode = GET_MODE (mask);
-
   poly_int64 value;
   bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
   bool is_vlmax_len
     = poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
-  rtx cond_ops[] = {dest, mask, merge, src};
-  insn_code icode = code_for_pred (code, mode);
-
   if (is_dummy_mask)
     {
       /* Use TU, MASK ANY policy.  */
       if (needs_fp_rounding (code, mode))
-	emit_nonvlmax_fp_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
+	emit_nonvlmax_fp_tu_insn (icode, op_num, cond_ops, len);
       else
-	emit_nonvlmax_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
+	emit_nonvlmax_tu_insn (icode, op_num, cond_ops, len);
     }
   else
     {
@@ -3422,21 +3402,37 @@  expand_cond_len_unop (rtx_code code, rtx *ops)
 	{
 	  /* Use TAIL ANY, MU policy.  */
 	  if (needs_fp_rounding (code, mode))
-	    emit_vlmax_masked_fp_mu_insn (icode, RVV_UNOP_MU, cond_ops);
+	    emit_vlmax_masked_fp_mu_insn (icode, op_num, cond_ops);
 	  else
-	    emit_vlmax_masked_mu_insn (icode, RVV_UNOP_MU, cond_ops);
+	    emit_vlmax_masked_mu_insn (icode, op_num, cond_ops);
 	}
       else
 	{
 	  /* Use TU, MU policy.  */
 	  if (needs_fp_rounding (code, mode))
-	    emit_nonvlmax_fp_tumu_insn (icode, RVV_UNOP_TUMU, cond_ops, len);
+	    emit_nonvlmax_fp_tumu_insn (icode, op_num, cond_ops, len);
 	  else
-	    emit_nonvlmax_tumu_insn (icode, RVV_UNOP_TUMU, cond_ops, len);
+	    emit_nonvlmax_tumu_insn (icode, op_num, cond_ops, len);
 	}
     }
 }
 
+/* Expand unary ops COND_LEN_*.  */
+void
+expand_cond_len_unop (rtx_code code, rtx *ops)
+{
+  rtx dest = ops[0];
+  rtx mask = ops[1];
+  rtx src = ops[2];
+  rtx merge = ops[3];
+  rtx len = ops[4];
+
+  machine_mode mode = GET_MODE (dest);
+  insn_code icode = code_for_pred (code, mode);
+  rtx cond_ops[RVV_UNOP_MASK] = {dest, mask, merge, src};
+  expand_cond_len_op (code, icode, RVV_UNOP_MASK, cond_ops, len);
+}
+
 /* Expand binary ops COND_LEN_*.  */
 void
 expand_cond_len_binop (rtx_code code, rtx *ops)
@@ -3447,43 +3443,11 @@  expand_cond_len_binop (rtx_code code, rtx *ops)
   rtx src2 = ops[3];
   rtx merge = ops[4];
   rtx len = ops[5];
-  machine_mode mode = GET_MODE (dest);
-  machine_mode mask_mode = GET_MODE (mask);
 
-  poly_int64 value;
-  bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-    = poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
-  rtx cond_ops[] = {dest, mask, merge, src1, src2};
+  machine_mode mode = GET_MODE (dest);
   insn_code icode = code_for_pred (code, mode);
-
-  if (is_dummy_mask)
-    {
-      /* Use TU, MASK ANY policy.  */
-      if (needs_fp_rounding (code, mode))
-	emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, cond_ops, len);
-      else
-	emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, cond_ops, len);
-    }
-  else
-    {
-      if (is_vlmax_len)
-	{
-	  /* Use TAIL ANY, MU policy.  */
-	  if (needs_fp_rounding (code, mode))
-	    emit_vlmax_masked_fp_mu_insn (icode, RVV_BINOP_MU, cond_ops);
-	  else
-	    emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, cond_ops);
-	}
-      else
-	{
-	  /* Use TU, MU policy.  */
-	  if (needs_fp_rounding (code, mode))
-	    emit_nonvlmax_fp_tumu_insn (icode, RVV_BINOP_TUMU, cond_ops, len);
-	  else
-	    emit_nonvlmax_tumu_insn (icode, RVV_BINOP_TUMU, cond_ops, len);
-	}
-    }
+  rtx cond_ops[RVV_BINOP_MASK] = {dest, mask, merge, src1, src2};
+  expand_cond_len_op (code, icode, RVV_BINOP_MASK, cond_ops, len);
 }
 
 /* Prepare insn_code for gather_load/scatter_store according to
@@ -3649,42 +3613,14 @@  expand_cond_len_ternop (unsigned icode, rtx *ops)
 {
   rtx dest = ops[0];
   rtx mask = ops[1];
+  rtx src1 = ops[2];
+  rtx src2 = ops[3];
+  rtx src3 = ops[4];
+  rtx merge = ops[5];
   rtx len = ops[6];
-  machine_mode mode = GET_MODE (dest);
-  machine_mode mask_mode = GET_MODE (mask);
-
-  poly_int64 value;
-  bool is_dummy_mask = rtx_equal_p (mask, CONSTM1_RTX (mask_mode));
-  bool is_vlmax_len
-    = poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode));
 
-  if (is_dummy_mask)
-    {
-      /* Use TU, MASK ANY policy.  */
-      if (FLOAT_MODE_P (mode))
-	emit_nonvlmax_fp_ternary_tu_insn (icode, RVV_TERNOP_TU, ops, len);
-      else
-	emit_nonvlmax_tu_insn (icode, RVV_TERNOP_TU, ops, len);
-    }
-  else
-    {
-      if (is_vlmax_len)
-	{
-	  /* Use TAIL ANY, MU policy.  */
-	  if (FLOAT_MODE_P (mode))
-	    emit_vlmax_masked_fp_mu_insn (icode, RVV_TERNOP_MU, ops);
-	  else
-	    emit_vlmax_masked_mu_insn (icode, RVV_TERNOP_MU, ops);
-	}
-      else
-	{
-	  /* Use TU, MU policy.  */
-	  if (FLOAT_MODE_P (mode))
-	    emit_nonvlmax_fp_tumu_insn (icode, RVV_TERNOP_TUMU, ops, len);
-	  else
-	    emit_nonvlmax_tumu_insn (icode, RVV_TERNOP_TUMU, ops, len);
-	}
-    }
+  rtx cond_ops[RVV_TERNOP_MASK] = {dest, mask, src1, src2, src3, merge};
+  expand_cond_len_op (UNSPEC, icode, RVV_TERNOP_MASK, cond_ops, len);
 }
 
 /* Expand reduction operations.  */
@@ -3790,7 +3726,7 @@  expand_lanes_load_store (rtx *ops, bool is_load)
 	{
 	  rtx m_ops[] = {reg, mask, RVV_VUNDEF (mode), addr};
 	  emit_vlmax_masked_insn (code_for_pred_unit_strided_load (mode),
-				  RVV_UNOP_M, m_ops);
+				  RVV_UNOP_MASK, m_ops);
 	}
       else
 	{
@@ -3808,7 +3744,7 @@  expand_lanes_load_store (rtx *ops, bool is_load)
 	{
 	  rtx m_ops[] = {reg, mask, RVV_VUNDEF (mode), addr};
 	  emit_nonvlmax_masked_insn (code_for_pred_unit_strided_load (mode),
-				     RVV_UNOP_M, m_ops, len);
+				     RVV_UNOP_MASK, m_ops, len);
 	}
       else
 	emit_insn (gen_pred_unit_strided_store (mode, mask, addr, reg, len,