Message ID | alpine.LSU.2.11.1511231217040.4884@t29.fhfr.qr |
---|---|
State | New |
Headers | show |
On 23/11/15 12:31, Richard Biener wrote: >>> From the dump below I understand you want no memory references in >>> > >the outer loop? >>> > >So the issue seems to be that store motion fails >>> > >to insert the preheader load / exit store to the outermost loop >>> > >possible and thus another LIM pass is needed to "store motion" those >>> > >again? >> > >> >Yep. >> > >>> > > But a simple testcase >>> > > >>> > >int a; >>> > >int *p = &a; >>> > >int foo (int n) >>> > >{ >>> > > for (int i = 0; i < n; ++i) >>> > > for (int j = 0; j < 100; ++j) >>> > > *p += j + i; >>> > > return a; >>> > >} >>> > > >>> > >shows that LIM can do this in one step. >> > >> >I've filed a FTR PR68465 - "pass_lim doesn't detect identical loop entry >> >conditions" for a test-case where that doesn't happen (when using >> >-fno-tree-dominator-opts). >> > >>> > >Which means it should >>> > >be investigated why it doesn't do this properly for your testcase >>> > >(store motion of *_25). >> > >> >There seems to be two related problems: >> >1. the store has tree_could_trap_p (ref->mem.ref) true, which should be >> > false. I'll work on a fix for this. >> >2. Give that the store can trap, I was running into PR68465. I managed >> > to eliminate the 2nd pass_lim by moving the pass_dominator instance >> > before the pass_lim instance. >> > >> >Attached patch shows the pass group with only one pass_lim. I hope to be able >> >to eliminate the first pass_dominator instance before pass_lim once I fix 1. >> > >>> > >Simply adding two LIM passes either papers over a wrong-code >>> > >bug (in LIM or in DOM) or over a missed-optimization in LIM. >> > >> >AFAIU now, it's PR68465, a missed optimization in LIM. > Ok, it's not really LIMs job to cleanup loop header copying that way. > > DOM performs jump-threading for this but FRE should also be able > to handle this just fine. Ah, it doesn't because the outer loop > header directly contains the condition > > Index: gcc/tree-ssa-sccvn.c > =================================================================== > --- gcc/tree-ssa-sccvn.c (revision 230737) > +++ gcc/tree-ssa-sccvn.c (working copy) > @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b > > /* If we have a single predecessor record the equivalence from a > possible condition on the predecessor edge. */ > - if (single_pred_p (bb)) > + edge pred_e = NULL; > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + if (e->flags & EDGE_DFS_BACK) > + continue; > + if (! pred_e) > + pred_e = e; > + else > + { > + pred_e = NULL; > + break; > + } > + } > + if (pred_e) > { > - edge e = single_pred_edge (bb); > /* Check if there are multiple executable successor edges in > the source block. Otherwise there is no additional info > to be recorded. */ > edge e2; > - FOR_EACH_EDGE (e2, ei, e->src->succs) > - if (e2 != e > + FOR_EACH_EDGE (e2, ei, pred_e->src->succs) > + if (e2 != pred_e > && e2->flags & EDGE_EXECUTABLE) > break; > if (e2 && (e2->flags & EDGE_EXECUTABLE)) > { > - gimple *stmt = last_stmt (e->src); > + gimple *stmt = last_stmt (pred_e->src); > if (stmt > && gimple_code (stmt) == GIMPLE_COND) > { > @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b > tree lhs = gimple_cond_lhs (stmt); > tree rhs = gimple_cond_rhs (stmt); > record_conds (bb, code, lhs, rhs, > - (e->flags & EDGE_TRUE_VALUE) != 0); > + (pred_e->flags & EDGE_TRUE_VALUE) != 0); > code = invert_tree_comparison (code, HONOR_NANS (lhs)); > if (code != ERROR_MARK) > record_conds (bb, code, lhs, rhs, > - (e->flags & EDGE_TRUE_VALUE) == 0); > + (pred_e->flags & EDGE_TRUE_VALUE) == 0); > } > } > } > > fixes this for me (for a small testcase). Does it help yours? > Yes, it has the desired effect (of not needing pass_dominator before pass_lim) . But, patch "Mark by_ref mem_ref in build_receiver_ref as non-trapping" committed as r230738, also has that effect, so AFAIU I don't require this tree-ssa-sccvn.c fix. Thanks, - Tom > Otherwise untested of course (I hope EDGE_DFS_BACK is good enough, > it's supposed to match edges that have the src dominated by the dest). > Testing the above now.
On November 23, 2015 4:37:18 PM GMT+01:00, Tom de Vries <Tom_deVries@mentor.com> wrote: >On 23/11/15 12:31, Richard Biener wrote: >>>> From the dump below I understand you want no memory references in >>>> > >the outer loop? >>>> > >So the issue seems to be that store motion fails >>>> > >to insert the preheader load / exit store to the outermost loop >>>> > >possible and thus another LIM pass is needed to "store motion" >those >>>> > >again? >>> > >>> >Yep. >>> > >>>> > > But a simple testcase >>>> > > >>>> > >int a; >>>> > >int *p = &a; >>>> > >int foo (int n) >>>> > >{ >>>> > > for (int i = 0; i < n; ++i) >>>> > > for (int j = 0; j < 100; ++j) >>>> > > *p += j + i; >>>> > > return a; >>>> > >} >>>> > > >>>> > >shows that LIM can do this in one step. >>> > >>> >I've filed a FTR PR68465 - "pass_lim doesn't detect identical loop >entry >>> >conditions" for a test-case where that doesn't happen (when using >>> >-fno-tree-dominator-opts). >>> > >>>> > >Which means it should >>>> > >be investigated why it doesn't do this properly for your >testcase >>>> > >(store motion of *_25). >>> > >>> >There seems to be two related problems: >>> >1. the store has tree_could_trap_p (ref->mem.ref) true, which >should be >>> > false. I'll work on a fix for this. >>> >2. Give that the store can trap, I was running into PR68465. I >managed >>> > to eliminate the 2nd pass_lim by moving the pass_dominator >instance >>> > before the pass_lim instance. >>> > >>> >Attached patch shows the pass group with only one pass_lim. I hope >to be able >>> >to eliminate the first pass_dominator instance before pass_lim once >I fix 1. >>> > >>>> > >Simply adding two LIM passes either papers over a wrong-code >>>> > >bug (in LIM or in DOM) or over a missed-optimization in LIM. >>> > >>> >AFAIU now, it's PR68465, a missed optimization in LIM. >> Ok, it's not really LIMs job to cleanup loop header copying that way. >> >> DOM performs jump-threading for this but FRE should also be able >> to handle this just fine. Ah, it doesn't because the outer loop >> header directly contains the condition >> >> Index: gcc/tree-ssa-sccvn.c >> =================================================================== >> --- gcc/tree-ssa-sccvn.c (revision 230737) >> +++ gcc/tree-ssa-sccvn.c (working copy) >> @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b >> >> /* If we have a single predecessor record the equivalence from a >> possible condition on the predecessor edge. */ >> - if (single_pred_p (bb)) >> + edge pred_e = NULL; >> + FOR_EACH_EDGE (e, ei, bb->preds) >> + { >> + if (e->flags & EDGE_DFS_BACK) >> + continue; >> + if (! pred_e) >> + pred_e = e; >> + else >> + { >> + pred_e = NULL; >> + break; >> + } >> + } >> + if (pred_e) >> { >> - edge e = single_pred_edge (bb); >> /* Check if there are multiple executable successor edges in >> the source block. Otherwise there is no additional info >> to be recorded. */ >> edge e2; >> - FOR_EACH_EDGE (e2, ei, e->src->succs) >> - if (e2 != e >> + FOR_EACH_EDGE (e2, ei, pred_e->src->succs) >> + if (e2 != pred_e >> && e2->flags & EDGE_EXECUTABLE) >> break; >> if (e2 && (e2->flags & EDGE_EXECUTABLE)) >> { >> - gimple *stmt = last_stmt (e->src); >> + gimple *stmt = last_stmt (pred_e->src); >> if (stmt >> && gimple_code (stmt) == GIMPLE_COND) >> { >> @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b >> tree lhs = gimple_cond_lhs (stmt); >> tree rhs = gimple_cond_rhs (stmt); >> record_conds (bb, code, lhs, rhs, >> - (e->flags & EDGE_TRUE_VALUE) != 0); >> + (pred_e->flags & EDGE_TRUE_VALUE) != 0); >> code = invert_tree_comparison (code, HONOR_NANS >(lhs)); >> if (code != ERROR_MARK) >> record_conds (bb, code, lhs, rhs, >> - (e->flags & EDGE_TRUE_VALUE) == 0); >> + (pred_e->flags & EDGE_TRUE_VALUE) == >0); >> } >> } >> } >> >> fixes this for me (for a small testcase). Does it help yours? >> > >Yes, it has the desired effect (of not needing pass_dominator before >pass_lim) . But, patch "Mark by_ref mem_ref in build_receiver_ref as >non-trapping" committed as r230738, also has that effect, so AFAIU I >don't require this tree-ssa-sccvn.c fix. OK, I committed it anyway already. Richard. >Thanks, >- Tom > >> Otherwise untested of course (I hope EDGE_DFS_BACK is good enough, >> it's supposed to match edges that have the src dominated by the >dest). >> Testing the above now.
Index: gcc/tree-ssa-sccvn.c =================================================================== --- gcc/tree-ssa-sccvn.c (revision 230737) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -4357,20 +4402,32 @@ sccvn_dom_walker::before_dom_children (b /* If we have a single predecessor record the equivalence from a possible condition on the predecessor edge. */ - if (single_pred_p (bb)) + edge pred_e = NULL; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (e->flags & EDGE_DFS_BACK) + continue; + if (! pred_e) + pred_e = e; + else + { + pred_e = NULL; + break; + } + } + if (pred_e) { - edge e = single_pred_edge (bb); /* Check if there are multiple executable successor edges in the source block. Otherwise there is no additional info to be recorded. */ edge e2; - FOR_EACH_EDGE (e2, ei, e->src->succs) - if (e2 != e + FOR_EACH_EDGE (e2, ei, pred_e->src->succs) + if (e2 != pred_e && e2->flags & EDGE_EXECUTABLE) break; if (e2 && (e2->flags & EDGE_EXECUTABLE)) { - gimple *stmt = last_stmt (e->src); + gimple *stmt = last_stmt (pred_e->src); if (stmt && gimple_code (stmt) == GIMPLE_COND) { @@ -4378,11 +4435,11 @@ sccvn_dom_walker::before_dom_children (b tree lhs = gimple_cond_lhs (stmt); tree rhs = gimple_cond_rhs (stmt); record_conds (bb, code, lhs, rhs, - (e->flags & EDGE_TRUE_VALUE) != 0); + (pred_e->flags & EDGE_TRUE_VALUE) != 0); code = invert_tree_comparison (code, HONOR_NANS (lhs)); if (code != ERROR_MARK) record_conds (bb, code, lhs, rhs, - (e->flags & EDGE_TRUE_VALUE) == 0); + (pred_e->flags & EDGE_TRUE_VALUE) == 0); } } }