Message ID | 20221110213403.3592364-1-philipp.tomsich@vrull.eu |
---|---|
State | New |
Series | [v2] RISC-V: costs: support shift-and-add in strength-reduction |
On 11/10/22 14:34, Philipp Tomsich wrote:
> The strength-reduction implementation in expmed.cc will assess the
> profitability of using shift-and-add using a RTL expression that wraps
> a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> function recognizes this as expressing a sh[123]add instruction, we
> will return an inflated cost, thus defeating the optimization.
>
> This change adds the necessary idiom recognition to provide an
> accurate cost for this form of expressing sh[123]add.
>
> Instead of expanding to
>     li      a5,200
>     mulw    a0,a5,a0
> with this change, the expression 'a * 200' is synthesized as:
>     sh2add  a0,a0,a0   // *5 = a + 4 * a
>     sh2add  a0,a0,a0   // *5 = a + 4 * a
>     slli    a0,a0,3    // *8
>
> gcc/ChangeLog:
>
>     * config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
>     if expressed as a plus and multiplication with a power-of-2.
>     Split costing for MINUS from PLUS.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/riscv/zba-shNadd-07.c: New test.

OK.  Note that getting this right can impact one of the spec2017 integer
benchmarks notably.  I don't recall which one, but it has a div and a
mod by the same constant, which can be implemented fairly reasonably with
shifts and adds.  You won't see it in instruction count data, but would
see it if you had cycle count data or instrumented for div/mod instructions.

Jeff
Applied to master. Thanks!
Note that the multiply-by-200 (in the testcase) originates from Dhrystone.

Philipp.

On Sun, 13 Nov 2022 at 02:23, Jeff Law <jeffreyalaw@gmail.com> wrote:
> OK.  Note that getting this right can impact one of the spec2017 integer
> benchmarks notably.  I don't recall which one, but it has a div and a
> mod by the same constant, which can be implemented fairly reasonably with
> shifts and adds.  You won't see it in instruction count data, but would
> see it if you had cycle count data or instrumented for div/mod instructions.
>
> Jeff
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3e2dc8192e4..2a94482b8ed 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2428,6 +2428,12 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       return false;
 
     case MINUS:
+      if (float_mode_p)
+        *total = tune_param->fp_add[mode == DFmode];
+      else
+        *total = riscv_binary_cost (x, 1, 4);
+      return false;
+
     case PLUS:
       /* add.uw pattern for zba.  */
       if (TARGET_ZBA
@@ -2451,6 +2457,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
           *total = COSTS_N_INSNS (1);
           return true;
         }
+      /* Before strength-reduction, the shNadd can be expressed as the addition
+         of a multiplication with a power-of-two.  If this case is not handled,
+         the strength-reduction in expmed.c will calculate an inflated cost.  */
+      if (TARGET_ZBA
+          && mode == word_mode
+          && GET_CODE (XEXP (x, 0)) == MULT
+          && REG_P (XEXP (XEXP (x, 0), 0))
+          && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+          && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+        {
+          *total = COSTS_N_INSNS (1);
+          return true;
+        }
       /* shNadd.uw pattern for zba.
          [(set (match_operand:DI 0 "register_operand" "=r")
                (plus:DI
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
new file mode 100644
index 00000000000..98d35e1da9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64 -O2" } */
+
+unsigned long
+f1 (unsigned long i)
+{
+  return i * 200;
+}
+
+unsigned long
+f2 (unsigned long i)
+{
+  return i * 783;
+}
+
+unsigned long
+f3 (unsigned long i)
+{
+  return i * 784;
+}
+
+unsigned long
+f4 (unsigned long i)
+{
+  return i * 1574;
+}
+
+/* { dg-final { scan-assembler-times "sh2add" 2 } } */
+/* { dg-final { scan-assembler-times "sh1add" 2 } } */
+/* { dg-final { scan-assembler-times "slli" 5 } } */
+/* { dg-final { scan-assembler-times "mul" 1 } } */
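The idiom targeted by the new PLUS case is a register multiplied by a power of two and then added to another operand, where the multiplier corresponds to sh1add, sh2add, or sh3add (factors 2, 4, and 8). The following is a minimal standalone sketch of that classification, not GCC code: the helper name shnadd_shift_for_factor is hypothetical, and it works on a plain integer rather than on an RTL operand.

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC function): return the shift amount N
   (1..3) if a multiplication by `factor` can be folded into a shNadd
   instruction, or 0 if it cannot.  */
static int
shnadd_shift_for_factor (int64_t factor)
{
  /* Reject non-positive values and anything that is not a power of two.  */
  if (factor <= 0 || (factor & (factor - 1)) != 0)
    return 0;

  /* Compute log2 of the power of two.  */
  int shift = 0;
  while ((factor >> shift) != 1)
    shift++;

  /* Only sh1add, sh2add and sh3add exist in Zba.  */
  return (shift >= 1 && shift <= 3) ? shift : 0;
}
```

Under this model, factors 2, 4, and 8 map to shift amounts 1, 2, and 3, while 1, 16, and any non-power-of-two are rejected.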
The strength-reduction implementation in expmed.cc will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost, thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
    li      a5,200
    mulw    a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    sh2add  a0,a0,a0   // *5 = a + 4 * a
    slli    a0,a0,3    // *8

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
    if expressed as a plus and multiplication with a power-of-2.
    Split costing for MINUS from PLUS.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/zba-shNadd-07.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

Changes in v2:
- Split rtx_costs calculation for MINUS from PLUS to ensure that
  (minus reg (ashift reg SHAMT)) is not mistaken for a shNadd
- Add testcase

 gcc/config/riscv/riscv.cc                     | 19 ++++++++++++
 .../gcc.target/riscv/zba-shNadd-07.c          | 31 +++++++++++++++++++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
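The arithmetic behind the 'a * 200' sequence in the commit message (200 = 5 * 5 * 8) can be checked with a small C model. This is only an illustrative sketch: shnadd and mul200 are hypothetical names, and shnadd assumes the Zba semantics rd = rs2 + (rs1 << N) on 64-bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zba instruction "shNadd rd, rs1, rs2":
   rd = rs2 + (rs1 << N), for N in 1..3.  */
static uint64_t
shnadd (unsigned n, uint64_t rs1, uint64_t rs2)
{
  assert (n >= 1 && n <= 3);  /* only sh1add, sh2add, sh3add exist */
  return rs2 + (rs1 << n);
}

/* a * 200, following the three-instruction sequence from the
   commit message.  */
static uint64_t
mul200 (uint64_t a)
{
  a = shnadd (2, a, a);  /* sh2add a0,a0,a0: a + 4*a = 5*a           */
  a = shnadd (2, a, a);  /* sh2add a0,a0,a0: 5*a + 4*(5*a) = 25*a    */
  return a << 3;         /* slli   a0,a0,3:  25*a * 8 = 200*a        */
}
```

Because all operations are on unsigned 64-bit values, the result agrees with a * 200 modulo 2^64, including when the product wraps.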