diff mbox series

[committed] Reasonably handle SUBREGs in risc-v cost modeling

Message ID 7b53cdea-e75e-4df5-a0ae-91f7c15fb0fa@ventanamicro.com
State New
Headers show
Series [committed] Reasonably handle SUBREGs in risc-v cost modeling | expand

Commit Message

Jeff Law Feb. 4, 2024, 8:10 p.m. UTC
This patch adjusts the costs so that we treat REG and SUBREG expressions 
the same for costing.

This was motivated by bt_skip_func and bt_find_func in xz and results in 
nearly a 5% improvement in the dynamic instruction count for input #2 
and smaller, but definitely visible improvements pretty much across the 
board.  Exceptions would be perlbench input #1 and exchange2 which 
showed small regressions.



In the bt_find_func and bt_skip_func cases we have  something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]
> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))


Note the two uses of (reg 136). The best way to handle that in combine 
might be a 3->2 split.  But there's a much better approach if we look at 
fwprop...



(set (reg:DI 142 [ _1 ])
     (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
         (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)


So that should be the same cost as a regular DImode addition when the 
ZBA extension is enabled.  But it ends up costing more because the 
clause to cost this variant isn't prepared to handle a SUBREG.  That 
results in the RTL above having too high a cost and fwprop gives up.

One approach would be to replace the REG_P  with REG_P || SUBREG_P in 
the costing code.  I ultimately decided against that and instead check 
if the operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS.  But for 
the sake of consistency, I changed the other instances in 
riscv_rtx_costs as well.  For those other cases we're talking about 
improvements in the .000001% range.

While we are into stage4, this just hits cost modeling which we've 
generally agreed is still appropriate for the RISC-V backend (though we 
were mostly talking about vector).  So I'm going to extend that general 
agreement ever so slightly and include scalar cost modeling :-)

Built and regression tested on rv64gc.  Pushing to the trunk.

Shout out to Jivan who took the original somewhat vague report about 
bt_skip_func and boiled it down to a very simple testcase along with 
info on a couple attempted fixes that didn't work out.


Jeff
commit 777df37a12e55ecbc135efbed2749a8a8a756d4d
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Sun Feb 4 13:01:50 2024 -0700

    [committed] Reasonably handle SUBREGs in risc-v cost modeling
    
    This patch adjusts the costs so that we treat REG and SUBREG expressions the
    same for costing.
    
    This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
    a 5% improvement in the dynamic instruction count for input #2 and smaller, but
    definitely visible improvements pretty much across the board.  Exceptions would
    be perlbench input #1 and exchange2 which showed very small regressions.
    
    In the bt_find_func and bt_skip_func cases we have  something like this:
    
    > (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
    >         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
    >      (nil))
    > (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
    >         (plus:DI (reg/v:DI 136 [ x ])
    >             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
    >      (nil))
    
    [ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
    >         (plus:DI (reg/v:DI 136 [ x ])
    >             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
    >      (nil))
    
    Note the two uses of (reg 136). The best way to handle that in combine might be
    a 3->2 split.  But there's a much better approach if we look at fwprop...
    
    (set (reg:DI 142 [ _1 ])
        (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
            (reg/v:DI 139 [ b ])))
    change not profitable (cost 4 -> cost 8)
    
    So that should be the same cost as a regular DImode addition when the ZBA
    extension is enabled.  But it ends up costing more because the clause to cost
    this variant isn't prepared to handle a SUBREG.  That results in the RTL above
    having too high a cost and fwprop gives up.
    
    One approach would be to replace the REG_P  with REG_P || SUBREG_P in the
    costing code.  I ultimately decided against that and instead check if the
    operand in question passes register_operand.
    
    By far the most important case to handle is the DImode PLUS.  But for the sake
    of consistency, I changed the other instances in riscv_rtx_costs as well.  For
    those other cases we're talking about improvements in the .000001% range.
    
    While we are into stage4, this just hits cost modeling which we've generally
    agreed is still appropriate (though we were mostly talking about vector).  So
    I'm going to extend that general agreement ever so slightly and include scalar
    cost modeling :-)
    
    gcc/
            * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
            similarly.
    
    gcc/testsuite/
    
            * gcc.target/riscv/reg_subreg_costs.c: New test.
    
            Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d7cdd7183c2..d6868a65b31 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3055,7 +3055,8 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case SET:
       /* If we are called for an INSN that's a simple set of a register,
 	 then cost based on the SET_SRC alone.  */
-      if (outer_code == INSN && REG_P (SET_DEST (x)))
+      if (outer_code == INSN
+	  && register_operand (SET_DEST (x), GET_MODE (SET_DEST (x))))
 	{
 	  riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
 	  return true;
@@ -3172,7 +3173,7 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  rtx and_rhs = XEXP (x, 1);
 	  rtx ashift_lhs = XEXP (XEXP (x, 0), 0);
 	  rtx ashift_rhs = XEXP (XEXP (x, 0), 1);
-	  if (REG_P (ashift_lhs)
+	  if (register_operand (ashift_lhs, GET_MODE (ashift_lhs))
 	      && CONST_INT_P (ashift_rhs)
 	      && CONST_INT_P (and_rhs)
 	      && ((INTVAL (and_rhs) >> INTVAL (ashift_rhs)) == 0xffffffff))
@@ -3188,7 +3189,7 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	}
       /* bclr pattern for zbs.  */
       if (TARGET_ZBS
-	  && REG_P (XEXP (x, 1))
+	  && register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
 	  && GET_CODE (XEXP (x, 0)) == ROTATE
 	  && CONST_INT_P (XEXP ((XEXP (x, 0)), 0))
 	  && INTVAL (XEXP ((XEXP (x, 0)), 0)) == -2)
@@ -3344,7 +3345,8 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
 	  && (TARGET_64BIT && (mode == DImode))
 	  && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
-	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && register_operand (XEXP (XEXP (x, 0), 0),
+			       GET_MODE (XEXP (XEXP (x, 0), 0)))
 	  && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
 	{
 	  *total = COSTS_N_INSNS (1);
@@ -3355,7 +3357,8 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  && ((!TARGET_64BIT && (mode == SImode)) ||
 	      (TARGET_64BIT && (mode == DImode)))
 	  && (GET_CODE (XEXP (x, 0)) == ASHIFT)
-	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && register_operand (XEXP (XEXP (x, 0), 0),
+			       GET_MODE (XEXP (XEXP (x, 0), 0)))
 	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
 	  && IN_RANGE (INTVAL (XEXP (XEXP (x, 0), 1)), 1, 3))
 	{
@@ -3368,7 +3371,8 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
 	  && mode == word_mode
 	  && GET_CODE (XEXP (x, 0)) == MULT
-	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && register_operand (XEXP (XEXP (x, 0), 0),
+			       GET_MODE (XEXP (XEXP (x, 0), 0)))
 	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
 	  && pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1)))
 	  && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
@@ -3390,7 +3394,7 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
 	  && (TARGET_64BIT && (mode == DImode))
 	  && (GET_CODE (XEXP (x, 0)) == AND)
-	  && (REG_P (XEXP (x, 1))))
+	  && register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1))))
 	{
 	  do {
 	    rtx and_lhs = XEXP (XEXP (x, 0), 0);
diff --git a/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c b/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c
new file mode 100644
index 00000000000..874dff3a688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zba" } */
+
+#include <stdint-gcc.h>
+void foo(uint32_t a, uint64_t *b_ptr, uint64_t b, uint64_t *c_ptr, uint64_t c)
+{
+  uint64_t x = a;
+  *b_ptr = b + x;
+  *c_ptr = c + x;
+}
+
+/* { dg-final { scan-assembler-not "\\szext.w\\s" } } */
+