Message ID | 20210816084612.7999-1-luoxhu@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | Fix incorrect computation in fill_always_executed_in_1 | expand |
On Mon, 16 Aug 2021, Xiong Hu Luo wrote: > It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for > nested loops. inn_loop is updated to inner loop, so it need be restored > when exiting from innermost loop. With this patch, the store instruction > in outer loop could also be moved out of outer loop by store motion. > Any comments? Thanks. > gcc/ChangeLog: > > * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore > inn_loop when exiting from innermost loop. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/ssa-lim-19.c: New test. > --- > gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++ > gcc/tree-ssa-loop-im.c | 6 +++++- > 2 files changed, 29 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > new file mode 100644 > index 00000000000..097a5ee4a4b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > @@ -0,0 +1,24 @@ > +/* PR/101293 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ > + > +struct X { int i; int j; int k;}; > + > +void foo(struct X *x, int n, int l) > +{ > + for (int j = 0; j < l; j++) > + { > + for (int i = 0; i < n; ++i) > + { > + int *p = &x->j; > + int tem = *p; > + x->j += tem * i; > + } > + int *r = &x->k; > + int tem2 = *r; > + x->k += tem2 * j; > + } > +} > + > +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */ > + > diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c > index b24bc64f2a7..5ca4738b20e 100644 > --- a/gcc/tree-ssa-loop-im.c > +++ b/gcc/tree-ssa-loop-im.c > @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) > if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > last = bb; > > + if (inn_loop != loop > + && flow_loop_nested_p (bb->loop_father, inn_loop)) > + inn_loop = bb->loop_father; > + The comment says /* In a loop that is always entered we may proceed anyway. But record that we entered it and stop once we leave it. */ inn_loop = bb->loop_father; and your change would defeat that early return, no? > if (bitmap_bit_p (contains_call, bb->index)) > break; > > @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) > > if (bb->loop_father->header == bb) > { > - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb)) > break; That's now a always false condition - a loops latch is always dominated by its header. The condition as written tries to verify whether the loop is always entered - mind we visit all blocks, not only those always executed. In fact for your testcase the x->j ref is _not_ always executed since the inner loop is conditional on n > 0. Richard.
Hi, On 2021/8/16 19:46, Richard Biener wrote: > On Mon, 16 Aug 2021, Xiong Hu Luo wrote: > >> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for >> nested loops. inn_loop is updated to inner loop, so it need be restored >> when exiting from innermost loop. With this patch, the store instruction >> in outer loop could also be moved out of outer loop by store motion. >> Any comments? Thanks. > >> gcc/ChangeLog: >> >> * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore >> inn_loop when exiting from innermost loop. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test. >> --- >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++ >> gcc/tree-ssa-loop-im.c | 6 +++++- >> 2 files changed, 29 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >> new file mode 100644 >> index 00000000000..097a5ee4a4b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >> @@ -0,0 +1,24 @@ >> +/* PR/101293 */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ >> + >> +struct X { int i; int j; int k;}; >> + >> +void foo(struct X *x, int n, int l) >> +{ >> + for (int j = 0; j < l; j++) >> + { >> + for (int i = 0; i < n; ++i) >> + { >> + int *p = &x->j; >> + int tem = *p; >> + x->j += tem * i; >> + } >> + int *r = &x->k; >> + int tem2 = *r; >> + x->k += tem2 * j; >> + } >> +} >> + >> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */ >> + >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c >> index b24bc64f2a7..5ca4738b20e 100644 >> --- a/gcc/tree-ssa-loop-im.c >> +++ b/gcc/tree-ssa-loop-im.c >> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) >> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >> last = bb; >> >> + if (inn_loop != loop >> + && flow_loop_nested_p (bb->loop_father, inn_loop)) >> + inn_loop = bb->loop_father; >> + > > The comment says > > /* In a loop that is always entered we may proceed anyway. > But record that we entered it and stop once we leave it. > */ > inn_loop = bb->loop_father; > > and your change would defeat that early return, no? The issue is the search method exits too early when iterating the outer loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9 and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3 doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4 they won't be processed by store motion again. 5<---- |\ | 8 \ 9 | \ | --->3--->4 | | 10---| SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this patch, it will continue search when meet bb 3 until bb 4, then last is updated to bb 4, it will break until exit edge is found at bb 4 by "if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop code will set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5. while (1) { SET_ALWAYS_EXECUTED_IN (last, loop); if (last == loop->header) break; last = get_immediate_dominator (CDI_DOMINATORS, last); } After further discussion with Kewen, we found that the inn_loop variable is totally useless and could be removed. > >> if (bitmap_bit_p (contains_call, bb->index)) >> break; >> >> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) >> >> if (bb->loop_father->header == bb) >> { >> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >> + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb)) >> break; > > That's now a always false condition - a loops latch is always dominated > by its header. The condition as written tries to verify whether the > loop is always entered - mind we visit all blocks, not only those > always executed. Thanks for the catch! I am afraid the piece of code should be removed since it stops search of potential ALWAYS EXECUTED bb after inner loop... > > In fact for your testcase the x->j ref is _not_ always executed > since the inner loop is conditional on n > 0. Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in store-motion. Attached the diff file without and with my patch to show the extra optimization. x->j is already moved out of loop 2 on master code. If change n and l to constant numbers like 100, master code could also do 2 store motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb 3 and bb 5 are ALWAYS EXECUTED for loop 1. struct X { int i; int j; int k;}; void foo(struct X *x, int n, int l) { for (int j = 0; j < l; j++) // loop 1 { for (int i = 0; i < n; ++i) // loop 2 { int *p = &x->j; int tem = *p; x->j += tem * i; } int *r = &x->k; int tem2 = *r; x->k += tem2 * j; } } > > Richard. >
On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote: > Hi, > > On 2021/8/16 19:46, Richard Biener wrote: >> On Mon, 16 Aug 2021, Xiong Hu Luo wrote: >> >>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for >>> nested loops. inn_loop is updated to inner loop, so it need be restored >>> when exiting from innermost loop. With this patch, the store instruction >>> in outer loop could also be moved out of outer loop by store motion. >>> Any comments? Thanks. >> >>> gcc/ChangeLog: >>> >>> * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore >>> inn_loop when exiting from innermost loop. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.dg/tree-ssa/ssa-lim-19.c: New test. >>> --- >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++ >>> gcc/tree-ssa-loop-im.c | 6 +++++- >>> 2 files changed, 29 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> new file mode 100644 >>> index 00000000000..097a5ee4a4b >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >>> @@ -0,0 +1,24 @@ >>> +/* PR/101293 */ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ >>> + >>> +struct X { int i; int j; int k;}; >>> + >>> +void foo(struct X *x, int n, int l) >>> +{ >>> + for (int j = 0; j < l; j++) >>> + { >>> + for (int i = 0; i < n; ++i) >>> + { >>> + int *p = &x->j; >>> + int tem = *p; >>> + x->j += tem * i; >>> + } >>> + int *r = &x->k; >>> + int tem2 = *r; >>> + x->k += tem2 * j; >>> + } >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 >>> "lim2" } } */ >>> + >>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c >>> index b24bc64f2a7..5ca4738b20e 100644 >>> --- a/gcc/tree-ssa-loop-im.c >>> +++ b/gcc/tree-ssa-loop-im.c >>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, >>> sbitmap contains_call) >>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>> last = bb; >>> + if (inn_loop != loop >>> + && flow_loop_nested_p (bb->loop_father, inn_loop)) >>> + inn_loop = bb->loop_father; >>> + >> >> The comment says >> >> /* In a loop that is always entered we may proceed anyway. >> But record that we entered it and stop once we leave >> it. >> */ >> inn_loop = bb->loop_father; >> >> and your change would defeat that early return, no? > > The issue is the search method exits too early when iterating the outer > loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9 > and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3 > doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are > ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4 > they won't be processed by store motion again. > > > 5<---- > |\ | > 8 \ 9 > | \ | > --->3--->4 > | | 10---| Correct the graph display: 5<---- |\ | 8 \ 9 | \ | --->3--->4 | | ---10 > > > SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this > patch, it will continue search when meet bb 3 until bb 4, then last is > updated > to bb 4, it will break until exit edge is found at bb 4 by > "if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop > code will > set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5. > > > while (1) > { > SET_ALWAYS_EXECUTED_IN (last, loop); > if (last == loop->header) > break; > last = get_immediate_dominator (CDI_DOMINATORS, last); > } > > After further discussion with Kewen, we found that the inn_loop variable is > totally useless and could be removed. > > >> >>> if (bitmap_bit_p (contains_call, bb->index)) >>> break; >>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, >>> sbitmap contains_call) >>> if (bb->loop_father->header == bb) >>> { >>> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) >>> + if (!dominated_by_p (CDI_DOMINATORS, >>> bb->loop_father->latch, bb)) >>> break; >> >> That's now a always false condition - a loops latch is always dominated >> by its header. The condition as written tries to verify whether the >> loop is always entered - mind we visit all blocks, not only those >> always executed. > > Thanks for the catch! I am afraid the piece of code should be removed > since it stops > search of potential ALWAYS EXECUTED bb after inner loop... > >> >> In fact for your testcase the x->j ref is _not_ always executed >> since the inner loop is conditional on n > 0. > > Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in > store-motion. > Attached the diff file without and with my patch to show the extra > optimization. > > x->j is already moved out of loop 2 on master code. > If change n and l to constant numbers like 100, master code could also > do 2 store > motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb > 4, bb 3 > and bb 5 are ALWAYS EXECUTED for loop 1. > > > struct X { int i; int j; int k;}; > > void foo(struct X *x, int n, int l) > { > for (int j = 0; j < l; j++) // loop 1 > { > for (int i = 0; i < n; ++i) // loop 2 > { > int *p = &x->j; > int tem = *p; > x->j += tem * i; > } > int *r = &x->k; > int tem2 = *r; > x->k += tem2 * j; > } > } > >> >> Richard. >> >
On Tue, 17 Aug 2021, Xionghu Luo wrote: > Hi, > > On 2021/8/16 19:46, Richard Biener wrote: > > On Mon, 16 Aug 2021, Xiong Hu Luo wrote: > > > >> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for > >> nested loops. inn_loop is updated to inner loop, so it need be restored > >> when exiting from innermost loop. With this patch, the store instruction > >> in outer loop could also be moved out of outer loop by store motion. > >> Any comments? Thanks. > > > >> gcc/ChangeLog: > >> > >> * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore > >> inn_loop when exiting from innermost loop. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test. > >> --- > >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++ > >> gcc/tree-ssa-loop-im.c | 6 +++++- > >> 2 files changed, 29 insertions(+), 1 deletion(-) > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > >> > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > >> new file mode 100644 > >> index 00000000000..097a5ee4a4b > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > >> @@ -0,0 +1,24 @@ > >> +/* PR/101293 */ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ > >> + > >> +struct X { int i; int j; int k;}; > >> + > >> +void foo(struct X *x, int n, int l) > >> +{ > >> + for (int j = 0; j < l; j++) > >> + { > >> + for (int i = 0; i < n; ++i) > >> + { > >> + int *p = &x->j; > >> + int tem = *p; > >> + x->j += tem * i; > >> + } > >> + int *r = &x->k; > >> + int tem2 = *r; > >> + x->k += tem2 * j; > >> + } > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } > >> */ > >> + > >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c > >> index b24bc64f2a7..5ca4738b20e 100644 > >> --- a/gcc/tree-ssa-loop-im.c > >> +++ b/gcc/tree-ssa-loop-im.c > >> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap > >> @@ contains_call) > >> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > >> last = bb; > >> + if (inn_loop != loop > >> + && flow_loop_nested_p (bb->loop_father, inn_loop)) > >> + inn_loop = bb->loop_father; > >> + > > > > The comment says > > > > /* In a loop that is always entered we may proceed anyway. > > But record that we entered it and stop once we leave it. > > */ > > inn_loop = bb->loop_father; > > > > and your change would defeat that early return, no? > > The issue is the search method exits too early when iterating the outer > loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9 > and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3 > doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are > ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4 > they won't be processed by store motion again. > > > 5<---- > |\ | > 8 \ 9 > | \ | > --->3--->4 > | | > 10---| > > > SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this > patch, it will continue search when meet bb 3 until bb 4, then last is updated > to bb 4, it will break until exit edge is found at bb 4 by > "if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop code > will > set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5. > > > while (1) > { > SET_ALWAYS_EXECUTED_IN (last, loop); > if (last == loop->header) > break; > last = get_immediate_dominator (CDI_DOMINATORS, last); > } > > After further discussion with Kewen, we found that the inn_loop variable is > totally useless and could be removed. > > > > > >> if (bitmap_bit_p (contains_call, bb->index)) > >> break; > >> > >> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap > >> @@ contains_call) > >> > >> if (bb->loop_father->header == bb) > >> { > >> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > >> + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, > >> bb)) > >> break; > > > > That's now a always false condition - a loops latch is always dominated > > by its header. The condition as written tries to verify whether the > > loop is always entered - mind we visit all blocks, not only those > > always executed. > > Thanks for the catch! I am afraid the piece of code should be removed since > it stops > search of potential ALWAYS EXECUTED bb after inner loop... But the code says: /* In a loop that is always entered we may proceed anyway. But record that we entered it and stop once we leave it. */ and you do not remove this comment still it doesn't hold anymore after your patch. I don't say the current code is optimal - I just say it does exactly what is documented. > > > > In fact for your testcase the x->j ref is _not_ always executed > > since the inner loop is conditional on n > 0. > > Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in > store-motion. Sorry, that wasn't clear. Yes, I agree the code fails to see always executed blocks after inner loops. But since the code simply walks all blocks in a loop instead of greedily following edges it has to do that since the inner loop might exit the outer loop as well, sth your change wouldn't honor. Consider while (--n) { do { if (do_exit) goto out; } while (1); p->x += 1; } out:; you'll see a CFG where the inner loop exits the outer loop as well. So I'm saying to improve the code you'll likely have to do more. And add a testcase like the following void __attribute__((noipa)) foo (int n, int m, int f, int *p, int *q) { while (--n) { do { *q += 1; if (f) goto out; } while (--m); *p += 1; } out:; } int main() { int i = 0; foo (10, 10, 1, (void *)0, &i); if (i != 1) __builtin_abort (); return 0; } Richard. > Attached the diff file without and with my patch to show the extra > optimization. > > x->j is already moved out of loop 2 on master code. > If change n and l to constant numbers like 100, master code could also do 2 > store > motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb 4, bb > 3 > and bb 5 are ALWAYS EXECUTED for loop 1. > > > struct X { int i; int j; int k;}; > > void foo(struct X *x, int n, int l) > { > for (int j = 0; j < l; j++) // loop 1 > { > for (int i = 0; i < n; ++i) // loop 2 > { > int *p = &x->j; > int tem = *p; > x->j += tem * i; > } > int *r = &x->k; > int tem2 = *r; > x->k += tem2 * j; > } > } > > > > > Richard. > > > >
Hi! As an aside... On Mon, Aug 16, 2021 at 03:46:12AM -0500, Xiong Hu Luo wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > --- a/gcc/tree-ssa-loop-im.c > +++ b/gcc/tree-ssa-loop-im.c You can make a saner order for your diffs by putting the testsuite changes after the rest. A very mimimal example would use [diff] orderfile = .gitorder with that file containing something like gcc/*.* and nothing else even. Yeah this is minimal ;-) Segher
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c new file mode 100644 index 00000000000..097a5ee4a4b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c @@ -0,0 +1,24 @@ +/* PR/101293 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-lim2-details" } */ + +struct X { int i; int j; int k;}; + +void foo(struct X *x, int n, int l) +{ + for (int j = 0; j < l; j++) + { + for (int i = 0; i < n; ++i) + { + int *p = &x->j; + int tem = *p; + x->j += tem * i; + } + int *r = &x->k; + int tem2 = *r; + x->k += tem2 * j; + } +} + +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */ + diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index b24bc64f2a7..5ca4738b20e 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) last = bb; + if (inn_loop != loop + && flow_loop_nested_p (bb->loop_father, inn_loop)) + inn_loop = bb->loop_father; + if (bitmap_bit_p (contains_call, bb->index)) break; @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call) if (bb->loop_father->header == bb) { - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch, bb)) break; /* In a loop that is always entered we may proceed anyway.