Message ID: 20240117095055.119853-1-monk.chiang@sifive.com
State: New
Series: match: Do not select to branchless expression when target has movcc pattern [PR113095]
On Wed, 17 Jan 2024, Monk Chiang wrote:

> This allows the backend to generate movcc instructions, if the target
> machine has a movcc pattern.
>
> branchless-cond.c needs to be updated since some target machines have
> conditional move instructions, and the expression will not change to a
> branchless expression.

While I agree this pattern should possibly be applied during RTL
expansion or instruction selection, on x86, which also has movcc, the
multiplication is cheaper.  So I don't think this is the way to go.

I'd rather revert the change than trying to "fix" it this way?

Thanks,
Richard.

> gcc/ChangeLog:
> 	PR target/113095
> 	* match.pd (`(zero_one == 0) ? y : z <op> y`,
> 	`(zero_one != 0) ? z <op> y : y`): Do not match to branchless
> 	expression, if target machine has conditional move pattern.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
> ---
>  gcc/match.pd                                  | 30 +++++++++++++++++--
>  .../gcc.dg/tree-ssa/branchless-cond.c         |  6 ++--
>  2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e42ecaf9ec7..a1f90b1cd41 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -   (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +     bool can_movecc_p = false;
> +     if (can_conditionally_move_p (TYPE_MODE (type)))
> +       can_movecc_p = true;
> +
> +     /* Some targets only support word_mode for the movcc pattern; if
> +	the type can be extended to word_mode then use a conditional
> +	move.  Even if an extend instruction is needed, its cost is
> +	lower than the branchless sequence.  */
> +     if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	 && can_conditionally_move_p (word_mode))
> +       can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>
>  /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
>  (for op (bit_xor bit_ior plus)
> @@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (INTEGRAL_TYPE_P (type)
>         && TYPE_PRECISION (type) > 1
>         && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> -   (op (mult (convert:type @0) @2) @1))))
> +   (with {
> +     bool can_movecc_p = false;
> +     if (can_conditionally_move_p (TYPE_MODE (type)))
> +       can_movecc_p = true;
> +
> +     /* Some targets only support word_mode for the movcc pattern; if
> +	the type can be extended to word_mode then use a conditional
> +	move.  Even if an extend instruction is needed, its cost is
> +	lower than the branchless sequence.  */
> +     if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
> +	 && can_conditionally_move_p (word_mode))
> +       can_movecc_p = true;
> +    }
> +    (if (!can_movecc_p)
> +     (op (mult (convert:type @0) @2) @1))))))
>
>  /* ?: Value replacement. */
>  /* a == 0 ? b : b + a -> b + a */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> index e063dc4bb5f..c002ed97364 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
>    return ((x & 1) != 0) ? z | y : y;
>  }
>
> -/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
> -/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
> +/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
On 1/17/24 05:14, Richard Biener wrote:
> On Wed, 17 Jan 2024, Monk Chiang wrote:
>
>> This allows the backend to generate movcc instructions, if the target
>> machine has a movcc pattern.
>>
>> branchless-cond.c needs to be updated since some target machines have
>> conditional move instructions, and the expression will not change to a
>> branchless expression.
>
> While I agree this pattern should possibly be applied during RTL
> expansion or instruction selection, on x86, which also has movcc, the
> multiplication is cheaper.  So I don't think this is the way to go.
>
> I'd rather revert the change than trying to "fix" it this way?

WRT reverting -- the patch in question's sole purpose was to enable
branchless sequences for that very same code.  Reverting would regress
performance on a variety of micro-architectures.  IIUC, the issue is
that the SiFive part in question has a fusion which allows it to do the
branchy sequence cheaply.

ISTM this really needs to be addressed during expansion, and most likely
with a RISC-V target twiddle for the micro-archs which have
short-forward-branch optimizations.

jeff
Thanks for your advice!! I agree it should be fixed in the RISC-V
backend during expansion.

On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
> On 1/17/24 05:14, Richard Biener wrote:
> > On Wed, 17 Jan 2024, Monk Chiang wrote:
> >
> >> This allows the backend to generate movcc instructions, if the target
> >> machine has a movcc pattern.
> >>
> >> branchless-cond.c needs to be updated since some target machines have
> >> conditional move instructions, and the expression will not change to a
> >> branchless expression.
> >
> > While I agree this pattern should possibly be applied during RTL
> > expansion or instruction selection, on x86, which also has movcc, the
> > multiplication is cheaper.  So I don't think this is the way to go.
> >
> > I'd rather revert the change than trying to "fix" it this way?
> WRT reverting -- the patch in question's sole purpose was to enable
> branchless sequences for that very same code.  Reverting would regress
> performance on a variety of micro-architectures.  IIUC, the issue is
> that the SiFive part in question has a fusion which allows it to do the
> branchy sequence cheaply.
>
> ISTM this really needs to be addressed during expansion, and most likely
> with a RISC-V target twiddle for the micro-archs which have
> short-forward-branch optimizations.
>
> jeff
On Wed, 17 Jan 2024 19:19:58 PST (-0800), monk.chiang@sifive.com wrote:
> Thanks for your advice!! I agree it should be fixed in the RISC-V
> backend during expansion.
>
> On Wed, Jan 17, 2024 at 10:37 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>> On 1/17/24 05:14, Richard Biener wrote:
>> > On Wed, 17 Jan 2024, Monk Chiang wrote:
>> >
>> >> This allows the backend to generate movcc instructions, if the target
>> >> machine has a movcc pattern.
>> >>
>> >> branchless-cond.c needs to be updated since some target machines have
>> >> conditional move instructions, and the expression will not change to a
>> >> branchless expression.
>> >
>> > While I agree this pattern should possibly be applied during RTL
>> > expansion or instruction selection, on x86, which also has movcc, the
>> > multiplication is cheaper.  So I don't think this is the way to go.
>> >
>> > I'd rather revert the change than trying to "fix" it this way?
>> WRT reverting -- the patch in question's sole purpose was to enable
>> branchless sequences for that very same code.  Reverting would regress
>> performance on a variety of micro-architectures.  IIUC, the issue is
>> that the SiFive part in question has a fusion which allows it to do the
>> branchy sequence cheaply.
>>
>> ISTM this really needs to be addressed during expansion, and most likely
>> with a RISC-V target twiddle for the micro-archs which have
>> short-forward-branch optimizations.

IIRC I ran into some of these middle-end interactions a year or two ago
and determined that we'd need middle-end changes to get this working
smoothly -- essentially replacing the expander checks for a MOVCC insn
with some sort of costing.  Without that, we're just going to end up
with some missed optimizations that favor one way or the other.

>>
>> jeff
>>
diff --git a/gcc/match.pd b/gcc/match.pd
index e42ecaf9ec7..a1f90b1cd41 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4231,7 +4231,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-   (op (mult (convert:type @0) @2) @1))))
+   (with {
+     bool can_movecc_p = false;
+     if (can_conditionally_move_p (TYPE_MODE (type)))
+       can_movecc_p = true;
+
+     /* Some targets only support word_mode for the movcc pattern; if
+	the type can be extended to word_mode then use a conditional
+	move.  Even if an extend instruction is needed, its cost is
+	lower than the branchless sequence.  */
+     if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	 && can_conditionally_move_p (word_mode))
+       can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior plus)
@@ -4243,7 +4256,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type)
        && TYPE_PRECISION (type) > 1
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
-   (op (mult (convert:type @0) @2) @1))))
+   (with {
+     bool can_movecc_p = false;
+     if (can_conditionally_move_p (TYPE_MODE (type)))
+       can_movecc_p = true;
+
+     /* Some targets only support word_mode for the movcc pattern; if
+	the type can be extended to word_mode then use a conditional
+	move.  Even if an extend instruction is needed, its cost is
+	lower than the branchless sequence.  */
+     if (can_extend_p (word_mode, TYPE_MODE (type), TYPE_UNSIGNED (type))
+	 && can_conditionally_move_p (word_mode))
+       can_movecc_p = true;
+    }
+    (if (!can_movecc_p)
+     (op (mult (convert:type @0) @2) @1))))))
 
 /* ?: Value replacement. */
 /* a == 0 ? b : b + a -> b + a */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
index e063dc4bb5f..c002ed97364 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -21,6 +21,6 @@ int f4(unsigned int x, unsigned int y, unsigned int z)
   return ((x & 1) != 0) ? z | y : y;
 }
 
-/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-times " & " 4 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\*" 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times " & " 4 "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" { xfail { "aarch64*-*-* alpha*-*-* bfin*-*-* epiphany-*-* i?86-*-* x86_64-*-* nds32*-*-*" } } } } */