diff mbox series

[26/44] RISC-V: Add `movMODEcc' implementation for generic targets

Message ID alpine.DEB.2.20.2311181833450.5892@tpp.orcam.me.uk
State New
Headers show
Series RISC-V: Various if-conversion fixes and improvements | expand

Commit Message

Maciej W. Rozycki Nov. 19, 2023, 5:40 a.m. UTC
Provide RTL expansion of conditional-move operations for generic targets 
using a suitable sequence of base integer machine instructions according 
to cost evaluation by if-conversion.  Add `-mmovcc' command line option 
to enable this transformation, off by default.

For the generic sequences small immediates as per the `arith_operand' 
predicate are cost-equivalent to registers as we can use them as input, 
alternative to a register, to the respective AND[I] machine operations, 
however we need to reject immediates fulfilling `lui_operand', because 
they would require reloading into a register, making the operation more 
costly.  Therefore add `movcc_operand' predicate and use it accordingly.

There is a need to adjust zbs-bext-02.c, which can also serve as emitted 
code example, because with certain compilation options an AND operation 
can now legitimately appear in output despite BEXT having been produced 
as expected, such as with `-march=rv64gc -O2':

foo:
	mv	a3,a0
	li	a5,0
	mv	a0,a1
	li	a2,64
	li	a1,1
.L3:
	sll	a4,a1,a5
	and	a4,a4,a3
	addiw	a5,a5,1
	beq	a4,zero,.L2
	addiw	a0,a0,1
.L2:
	bne	a5,a2,.L3
	ret

vs `-march=rv64gc_zbs -O2':

foo:
	mv	a4,a0
	li	a5,0
	mv	a0,a1
	li	a3,64
.L3:
	bext	a2,a4,a5
	beq	a2,zero,.L2
	addiw	a0,a0,1
.L2:
	addiw	a5,a5,1
	bne	a5,a3,.L3
	ret

and then with `-march=rv64gc -mmovcc -mbranch-cost=7':

foo:
	mv	a6,a0
	li	a4,0
	mv	a0,a1
	li	a7,1
	li	a1,64
.L3:
	sll	a5,a7,a4
	and	a5,a5,a6
	snez	a5,a5
	neg	a5,a5
	not	a2,a5
	addiw	a3,a0,1
	and	a5,a5,a3
	and	a0,a2,a0
	addiw	a4,a4,1
	or	a0,a5,a0
	bne	a4,a1,.L3
	ret

vs `-march=rv64gc_zbs -mmovcc -mbranch-cost=7':

foo:
	mv	a6,a0
	li	a4,0
	mv	a0,a1
	li	a1,64
.L3:
	bext	a5,a6,a4
	neg	a5,a5
	not	a2,a5
	addiw	a3,a0,1
	and	a5,a5,a3
	and	a0,a2,a0
	addiw	a4,a4,1
	or	a0,a5,a0
	bne	a4,a1,.L3
	ret

However BEXT is supposed to replace an SLL operation so adjust the test 
case to reject SLL rather than AND, letting the test case pass even with 
`/-mmovcc/-mbranch-cost=7' specified as DejaGNU test flags (and in the 
absence of target-specific conditional-move operations enabled either by 
default or with other test flags).

	gcc/
	* config/riscv/predicates.md (movcc_operand): New predicate.
	* config/riscv/riscv.cc (riscv_expand_conditional_move): Handle
	generic targets.
	* config/riscv/riscv.md (mov<mode>cc): Likewise.
	* config/riscv/riscv.opt (mmovcc): New option.
	* doc/invoke.texi (Option Summary): Document it.

	gcc/testsuite/
	* gcc.target/riscv/zbs-bext-02.c: Adjust to reject SLL rather 
	than AND.
---
 gcc/config/riscv/predicates.md               |    6 +++
 gcc/config/riscv/riscv.cc                    |   41 ++++++++++++++++++++++-----
 gcc/config/riscv/riscv.md                    |    7 ++--
 gcc/config/riscv/riscv.opt                   |    4 ++
 gcc/doc/invoke.texi                          |    9 +++++
 gcc/testsuite/gcc.target/riscv/zbs-bext-02.c |    2 -
 6 files changed, 58 insertions(+), 11 deletions(-)

gcc-riscv-movcc.diff

Comments

Jeff Law Nov. 19, 2023, 6:18 p.m. UTC | #1
On 11/18/23 22:40, Maciej W. Rozycki wrote:
> Provide RTL expansion of conditional-move operations for generic targets
> using a suitable sequence of base integer machine instructions according
> to cost evaluation by if-conversion.  Add `-mmovcc' command line option
> to enable this transformation, off by default.
> 
> For the generic sequences small immediates as per the `arith_operand'
> predicate are cost-equivalent to registers as we can use them as input,
> alternative to a register, to the respective AND[I] machine operations,
> however we need to reject immediates fulfilling `lui_operand', because
> they would require reloading into a register, making the operation more
> costly.  Therefore add `movcc_operand' predicate and use it accordingly.
> 
> There is a need to adjust zbs-bext-02.c, which can also serve as emitted
> code example, because with certain compilation options an AND operation
> can now legitimately appear in output despite BEXT having been produced
> as expected, such as with `-march=rv64gc -O2':
> 
> foo:
> 	mv	a3,a0
> 	li	a5,0
> 	mv	a0,a1
> 	li	a2,64
> 	li	a1,1
> .L3:
> 	sll	a4,a1,a5
> 	and	a4,a4,a3
> 	addiw	a5,a5,1
> 	beq	a4,zero,.L2
> 	addiw	a0,a0,1
> .L2:
> 	bne	a5,a2,.L3
> 	ret
> 
> vs `-march=rv64gc_zbs -O2':
> 
> foo:
> 	mv	a4,a0
> 	li	a5,0
> 	mv	a0,a1
> 	li	a3,64
> .L3:
> 	bext	a2,a4,a5
> 	beq	a2,zero,.L2
> 	addiw	a0,a0,1
> .L2:
> 	addiw	a5,a5,1
> 	bne	a5,a3,.L3
> 	ret
> 
> and then with `-march=rv64gc -mmovcc -mbranch-cost=7':
> 
> foo:
> 	mv	a6,a0
> 	li	a4,0
> 	mv	a0,a1
> 	li	a7,1
> 	li	a1,64
> .L3:
> 	sll	a5,a7,a4
> 	and	a5,a5,a6
> 	snez	a5,a5
> 	neg	a5,a5
> 	not	a2,a5
> 	addiw	a3,a0,1
> 	and	a5,a5,a3
> 	and	a0,a2,a0
> 	addiw	a4,a4,1
> 	or	a0,a5,a0
> 	bne	a4,a1,.L3
> 	ret
> 
> vs `-march=rv64gc_zbs -mmovcc -mbranch-cost=7':
> 
> foo:
> 	mv	a6,a0
> 	li	a4,0
> 	mv	a0,a1
> 	li	a1,64
> .L3:
> 	bext	a5,a6,a4
> 	neg	a5,a5
> 	not	a2,a5
> 	addiw	a3,a0,1
> 	and	a5,a5,a3
> 	and	a0,a2,a0
> 	addiw	a4,a4,1
> 	or	a0,a5,a0
> 	bne	a4,a1,.L3
> 	ret
> 
> However BEXT is supposed to replace an SLL operation so adjust the test
> case to reject SLL rather than AND, letting the test case pass even with
> `/-mmovcc/-mbranch-cost=7' specified as DejaGNU test flags (and in the
> absence of target-specific conditional-move operations enabled either by
> default or with other test flags).
> 
> 	gcc/
> 	* config/riscv/predicates.md (movcc_operand): New predicate.
> 	* config/riscv/riscv.cc (riscv_expand_conditional_move): Handle
> 	generic targets.
> 	* config/riscv/riscv.md (mov<mode>cc): Likewise.
> 	* config/riscv/riscv.opt (mmovcc): New option.
> 	* doc/invoke.texi (Option Summary): Document it.
> 
> 	gcc/testsuite/
> 	* gcc.target/riscv/zbs-bext-02.c: Adjust to reject SLL rather
> 	than AND.
OK.  Just curious are y'all seeing significant interest in this case 
from customers or is this more a case of rounding out the implementation 
to cover all potential possibilities?



Jeff
Maciej W. Rozycki Nov. 23, 2023, 10:16 p.m. UTC | #2
On Sun, 19 Nov 2023, Jeff Law wrote:

> OK.  Just curious are y'all seeing significant interest in this case from
> customers or is this more a case of rounding out the implementation to cover
> all potential possibilities?

 As in the cover letter: we have a case where the pipeline seems to imply 
high enough a branch penalty for branchless alternatives to be worth 
considering even where built from more than just a couple instructions.  
We yet have to fully evaluate it for our case and it may be that these 
sequences are costlier than branches after all or that the boundary lies 
somewhere in between (or that we end up with Zicond also available in the 
CPU RTL).  However we realised that combined with `-mbranch-cost=' it 
might be a useful tool for performance evaluation for other people as 
well.

 This, combined with the branch costing issues observed earlier and 
attempted to address with 09/44, is also why I chose, conservatively, to 
add the `-mmovcc' option to control this feature: we may choose to flip it 
on by default for some tuning architectures or even globally at one point.  
Especially as it seems like the `addMODEcc' part could be profitable for 
some architectures already.

  Maciej
diff mbox series

Patch

Index: gcc/gcc/config/riscv/predicates.md
===================================================================
--- gcc.orig/gcc/config/riscv/predicates.md
+++ gcc/gcc/config/riscv/predicates.md
@@ -41,6 +41,12 @@ 
   (ior (match_operand 0 "arith_operand")
        (match_operand 0 "lui_operand")))
 
+(define_predicate "movcc_operand"
+  (if_then_else (match_test "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV
+			     || TARGET_ZICOND_LIKE")
+		(match_operand 0 "sfb_alu_operand")
+		(match_operand 0 "arith_operand")))
+
 (define_predicate "const_csr_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4099,7 +4099,9 @@  riscv_expand_conditional_move (rtx dest,
   rtx op0 = XEXP (op, 0);
   rtx op1 = XEXP (op, 1);
 
-  if ((TARGET_ZICOND_LIKE && GET_MODE_CLASS (mode) == MODE_INT)
+  if (((TARGET_ZICOND_LIKE
+	|| (arith_operand (cons, mode) && arith_operand (alt, mode)))
+       && (GET_MODE_CLASS (mode) == MODE_INT))
       || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
     {
       machine_mode mode0 = GET_MODE (op0);
@@ -4113,6 +4115,15 @@  riscv_expand_conditional_move (rtx dest,
 	  || (mode1 != word_mode && mode1 != VOIDmode))
 	return false;
 
+      /* In the fallback generic case use MODE rather than WORD_MODE for
+	 the output of the SCC instruction, to match the mode of the NEG
+	 operation below.  The output of SCC is 0 or 1 boolean, so it is
+	 valid for input in any scalar integer mode.  */
+      rtx tmp = gen_reg_rtx ((TARGET_ZICOND_LIKE
+			      || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
+			     ? word_mode : mode);
+      bool invert = false;
+
       /* Canonicalize the comparison.  It must be an equality comparison
 	 of integer operands, or with SFB it can be any comparison of
 	 integer operands.  If it isn't, then emit an SCC instruction
@@ -4121,7 +4132,6 @@  riscv_expand_conditional_move (rtx dest,
 	  || !INTEGRAL_MODE_P (mode0))
 	{
 	  bool *invert_ptr = nullptr;
-	  bool invert = false;
 
 	  /* If riscv_expand_int_scc inverts the condition, then it will
 	     flip the value of INVERT.  We need to know where so that
@@ -4129,11 +4139,9 @@  riscv_expand_conditional_move (rtx dest,
 	  if (code == LE || code == LEU || code == GE || code == GEU)
 	    invert_ptr = &invert;
 
-	  /* Emit an scc like instruction into a temporary
-	     so that we can use an EQ/NE comparison.  */
-	  rtx tmp = gen_reg_rtx (word_mode);
-
-	  /* We can support both FP and integer conditional moves.  */
+	  /* Emit an SCC-like instruction into a temporary so that we can
+	     use an EQ/NE comparison.  We can support both FP and integer
+	     conditional moves.  */
 	  if (INTEGRAL_MODE_P (mode0))
 	    riscv_expand_int_scc (tmp, code, op0, op1, invert_ptr);
 	  else if (FLOAT_MODE_P (mode0)
@@ -4149,6 +4157,8 @@  riscv_expand_conditional_move (rtx dest,
 	  op0 = XEXP (op, 0);
 	  op1 = XEXP (op, 1);
 	}
+      else if (!TARGET_ZICOND_LIKE && !TARGET_SFB_ALU && !TARGET_XTHEADCONDMOV)
+	riscv_expand_int_scc (tmp, code, op0, op1, &invert);
 
       if (TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
 	{
@@ -4166,6 +4176,23 @@  riscv_expand_conditional_move (rtx dest,
 							      cons, alt)));
 	  return true;
 	}
+      else if (!TARGET_ZICOND_LIKE)
+	{
+	  if (invert)
+	    std::swap (cons, alt);
+
+	  rtx reg1 = gen_reg_rtx (mode);
+	  rtx reg2 = gen_reg_rtx (mode);
+	  rtx reg3 = gen_reg_rtx (mode);
+	  rtx reg4 = gen_reg_rtx (mode);
+
+	  riscv_emit_unary (NEG, reg1, tmp);
+	  riscv_emit_binary (AND, reg2, reg1, cons);
+	  riscv_emit_unary (NOT, reg3, reg1);
+	  riscv_emit_binary (AND, reg4, reg3, alt);
+	  riscv_emit_binary (IOR, dest, reg2, reg4);
+	  return true;
+	}
       /* 0, reg or 0, imm */
       else if (cons == CONST0_RTX (mode)
 	       && (REG_P (alt)
Index: gcc/gcc/config/riscv/riscv.md
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -2658,9 +2658,10 @@ 
 (define_expand "mov<mode>cc"
   [(set (match_operand:GPR 0 "register_operand")
 	(if_then_else:GPR (match_operand 1 "comparison_operator")
-			  (match_operand:GPR 2 "sfb_alu_operand")
-			  (match_operand:GPR 3 "sfb_alu_operand")))]
-  "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV || TARGET_ZICOND_LIKE"
+			  (match_operand:GPR 2 "movcc_operand")
+			  (match_operand:GPR 3 "movcc_operand")))]
+  "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV || TARGET_ZICOND_LIKE
+   || TARGET_MOVCC"
 {
   if (riscv_expand_conditional_move (operands[0], operands[1],
 				     operands[2], operands[3]))
Index: gcc/gcc/config/riscv/riscv.opt
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.opt
+++ gcc/gcc/config/riscv/riscv.opt
@@ -460,6 +460,10 @@  misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
 
+mmovcc
+Target Var(TARGET_MOVCC)
+Enable conditional moves unconditionally.
+
 minline-atomics
 Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
 Always inline subword atomic operations.
Index: gcc/gcc/doc/invoke.texi
===================================================================
--- gcc.orig/gcc/doc/invoke.texi
+++ gcc/gcc/doc/invoke.texi
@@ -1242,6 +1242,7 @@  See RS/6000 and PowerPC Options.
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
 -mcsr-check -mno-csr-check
+-mmovcc  -mno-movcc
 -minline-atomics  -mno-inline-atomics
 -minline-strlen  -mno-inline-strlen
 -minline-strcmp  -mno-inline-strcmp
@@ -29586,6 +29587,14 @@  Do or don't use smaller but slower prolo
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@opindex mmovcc
+@item -mmovcc
+@itemx -mno-movcc
+Do or don't produce branchless conditional-move code sequences even with
+targets that do not have specific instructions for conditional operations.
+If enabled, sequences of ALU operations are produced using base integer
+ISA instructions where profitable.
+
 @opindex minline-atomics
 @item -minline-atomics
 @itemx -mno-inline-atomics
Index: gcc/gcc/testsuite/gcc.target/riscv/zbs-bext-02.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zbs-bext-02.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zbs-bext-02.c
@@ -15,4 +15,4 @@  foo(const long long B, int a)
 
 /* { dg-final { scan-assembler-times "bext\t" 1 } } */
 /* { dg-final { scan-assembler-not {\mbset} } } */
-/* { dg-final { scan-assembler-not {\mand} } } */
+/* { dg-final { scan-assembler-not {\msll} } } */