Message ID | 2b986549-702b-2a09-84d5-e3f93518e4c3@suse.cz |
---|---|
State | New |
Headers | show |
On Wed, Jan 18, 2017 at 10:10 AM, Martin Liška <mliska@suse.cz> wrote: > Hello. > > After basic understanding of loop predictive commoning, the problematic combined chain is: > > Loads-only chain 0x38b6730 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.29_81] (id 1) > offset 20 > distance 0 > MEM[(real(kind=8) *)vectp_a.38_141] (id 3) > offset 20 > distance 0 > > Loads-only chain 0x38b68b0 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.23_102] (id 0) > offset 0 > distance 0 > MEM[(real(kind=8) *)vectp_a.33_33] (id 2) > offset 0 > distance 0 > > Combination chain 0x38b65b0 > max distance 0, may reuse first > equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8) > references: > combination ref > in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > > distance 0 > combination ref > in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > > distance 0 > > It's important to note that distance is equal to zero (happening within a same loop iteration). > Aforementioned chains correspond to: > > ... > r2: vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102]; > vectp_a.23_99 = vectp_a.23_102 + 16; > vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99]; > vect__82.27_97 = vect__22.22_108; > vect__82.27_96 = vect__22.22_107; > vect__79.28_95 = vect__82.27_97 + vect__84.17_120; > vect__79.28_94 = vect__82.27_96 + vect__84.17_119; > r1: vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81]; > vectp_a.29_77 = vectp_a.29_81 + 16; > vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77]; > vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57]; > r2': vectp_a.33_33 = vectp_a.33_57 + 16; > vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33]; > vect__56.37_23 = vect__38.35_39; > vect__56.37_15 = vect__32.32_76; > vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163]; > vectp_a.38_141 = vectp_a.38_163 + 16; > r1': vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141]; > vect__54.42_135 = vect__42.40_161 + vect__56.37_23; > r1'+r2': predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > predreastmp.51_18 = vect__56.37_15; > vect__54.42_134 = predreastmp.50_17; > r1+r2: predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > ... > > Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination > is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such > combinations where combined root stmt does not dominate usages. It's probably corner case as it does > not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE > if I'm right. I think this is OK, though I can't approve it. Later DOM pass should be able to pick up the redundant combination operation between predreastmp.50_17 and predreastmp.48_10, could you double check this? Thanks, bin > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? > Martin >
On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote: > Hello. > > After basic understanding of loop predictive commoning, the problematic combined chain is: > > Loads-only chain 0x38b6730 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.29_81] (id 1) > offset 20 > distance 0 > MEM[(real(kind=8) *)vectp_a.38_141] (id 3) > offset 20 > distance 0 > > Loads-only chain 0x38b68b0 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.23_102] (id 0) > offset 0 > distance 0 > MEM[(real(kind=8) *)vectp_a.33_33] (id 2) > offset 0 > distance 0 > > Combination chain 0x38b65b0 > max distance 0, may reuse first > equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8) > references: > combination ref > in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > > distance 0 > combination ref > in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > > distance 0 > > It's important to note that distance is equal to zero (happening within a same loop iteration). > Aforementioned chains correspond to: > > ... > r2: vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102]; > vectp_a.23_99 = vectp_a.23_102 + 16; > vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99]; > vect__82.27_97 = vect__22.22_108; > vect__82.27_96 = vect__22.22_107; > vect__79.28_95 = vect__82.27_97 + vect__84.17_120; > vect__79.28_94 = vect__82.27_96 + vect__84.17_119; > r1: vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81]; > vectp_a.29_77 = vectp_a.29_81 + 16; > vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77]; > vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57]; > r2': vectp_a.33_33 = vectp_a.33_57 + 16; > vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33]; > vect__56.37_23 = vect__38.35_39; > vect__56.37_15 = vect__32.32_76; > vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163]; > vectp_a.38_141 = vectp_a.38_163 + 16; > r1': vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141]; > vect__54.42_135 = vect__42.40_161 + vect__56.37_23; > r1'+r2': predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > predreastmp.51_18 = vect__56.37_15; > vect__54.42_134 = predreastmp.50_17; > r1+r2: predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > ... > > Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination > is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such > combinations where combined root stmt does not dominate usages. It's probably corner case as it does > not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE > if I'm right. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? I'm not sure. If we have such zero distance refs in the IL at the time pcom runs then not handling them will pessimize code-gen for cases where they are part of a larger chain. Esp. I don't like another stmt_dominates_stmt_p call and thus rather not handle length == 0 at all... We already seem to go great length in associating stuff when combining stuff thus isn't this maybe an artifact of this association? Maybe we simply need to sort the new chain after combining it so the root stmt comes last? Note that there seems to be only a single length per chain but not all refs in a chain need to have the same distance. This means your fix is likely incomplete? What prevents the situation to arise for distance != 0? Richard. > Martin >
On Wed, Jan 18, 2017 at 11:10:32AM +0100, Martin Liška wrote: > Hello. > > After basic understanding of loop predictive commoning, the problematic combined chain is: > > Loads-only chain 0x38b6730 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.29_81] (id 1) > offset 20 > distance 0 > MEM[(real(kind=8) *)vectp_a.38_141] (id 3) > offset 20 > distance 0 > > Loads-only chain 0x38b68b0 (combined) > max distance 0 > references: > MEM[(real(kind=8) *)vectp_a.23_102] (id 0) > offset 0 > distance 0 > MEM[(real(kind=8) *)vectp_a.33_33] (id 2) > offset 0 > distance 0 > > Combination chain 0x38b65b0 > max distance 0, may reuse first > equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8) > references: > combination ref > in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > > distance 0 > combination ref > in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > > distance 0 > > It's important to note that distance is equal to zero (happening within a same loop iteration). > Aforementioned chains correspond to: > > ... > r2: vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102]; > vectp_a.23_99 = vectp_a.23_102 + 16; > vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99]; > vect__82.27_97 = vect__22.22_108; > vect__82.27_96 = vect__22.22_107; > vect__79.28_95 = vect__82.27_97 + vect__84.17_120; > vect__79.28_94 = vect__82.27_96 + vect__84.17_119; > r1: vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81]; > vectp_a.29_77 = vectp_a.29_81 + 16; > vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77]; > vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57]; > r2': vectp_a.33_33 = vectp_a.33_57 + 16; > vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33]; > vect__56.37_23 = vect__38.35_39; > vect__56.37_15 = vect__32.32_76; > vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163]; > vectp_a.38_141 = vectp_a.38_163 + 16; > r1': vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141]; > vect__54.42_135 = vect__42.40_161 + vect__56.37_23; > r1'+r2': predreastmp.50_17 = vect__42.41_138 + vect__38.36_29; > predreastmp.51_18 = vect__56.37_15; > vect__54.42_134 = predreastmp.50_17; > r1+r2: predreastmp.48_10 = vect__32.31_78 + vect__28.25_100; > ... > > Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination > is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such > combinations where combined root stmt does not dominate usages. It's probably corner case as it does > not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE > if I'm right. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. I could bootstrap on aarch64-none-linux-gnu without any issues, regression tests are fine and the testcase compiles without ICE. Thanks for fixing this. VP. > > Ready to be installed? > Martin > > From 41b153cf975374fff48419ec8ac5991ac134735f Mon Sep 17 00:00:00 2001 > From: marxin <mliska@suse.cz> > Date: Tue, 17 Jan 2017 14:22:40 +0100 > Subject: [PATCH] Be careful about combined chain with length == 0 (PR > tree-optimization/70754). > > gcc/testsuite/ChangeLog: > > 2017-01-17 Martin Liska <mliska@suse.cz> > > PR tree-optimization/70754 > * gfortran.dg/pr70754.f90: New test. > > gcc/ChangeLog: > > 2017-01-17 Martin Liska <mliska@suse.cz> > > PR tree-optimization/70754 > * tree-predcom.c (combine_chains): Do not create a combined chain > with length equal to zero when root_stmt does not dominate > stmts of references. > --- > gcc/testsuite/gfortran.dg/pr70754.f90 | 35 +++++++++++++++++++++++++++++++++++ > gcc/tree-predcom.c | 10 ++++++++++ > 2 files changed, 45 insertions(+) > create mode 100644 gcc/testsuite/gfortran.dg/pr70754.f90 > > diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90 > new file mode 100644 > index 00000000000..758901ce2b2 > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/pr70754.f90 > @@ -0,0 +1,35 @@ > +! { dg-options "-Ofast" } > + > +module m > + implicit none > + private > + save > + > + integer, parameter, public :: & > + ii4 = selected_int_kind(6), & > + rr8 = selected_real_kind(13) > + > + integer (ii4), dimension(40,40,199), public :: xyz > + public :: foo > +contains > + subroutine foo(a) > + real (rr8), dimension(40,40), intent(out) :: a > + real (rr8), dimension(40,40) :: b > + integer (ii4), dimension(40,40) :: c > + integer i, j > + > + do i=1,8 > + b(i,j) = 123 * a(i,j) + a(i,j+1) & > + + a(i,j) + a(i+1,j+1) & > + + a(i+1,j) + a(i-1,j+1) & > + + a(i-1,j) > + c(i,j) = 123 > + end do > + > + where ((xyz(:,:,2) /= 0) .and. (c /= 0)) > + a = b/real(c) > + elsewhere > + a = 456 > + endwhere > + end subroutine foo > +end module m > diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c > index f0b70a61fb8..768f976cb87 100644 > --- a/gcc/tree-predcom.c > +++ b/gcc/tree-predcom.c > @@ -2323,6 +2323,16 @@ combine_chains (chain_p ch1, chain_p ch2) > root_stmt = get_chain_root (new_chain)->stmt; > for (i = 1; new_chain->refs.iterate (i, &nw); i++) > { > + /* PR tree-optimization/70754 > + For a combined chain with length equal to zero, we have to guarantee > + that ROOT_STMT dominates all references. */ > + if (new_chain->length == 0 > + && !stmt_dominates_stmt_p (root_stmt, nw->stmt)) > + { > + release_chain (new_chain); > + return NULL; > + } > + > if (nw->distance == new_chain->length > && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) > { > -- > 2.11.0 >
On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote: >> Hello. >> >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Ready to be installed? > > I'm not sure. If we have such zero distance refs in the IL at the > time pcom runs then not handling > them will pessimize code-gen for cases where they are part of a larger > chain. Esp. I don't like Do you mean different chains distributed because of MAX_DISTANCE by "larger chain"? With the patch, such chain of refs would still be pred-commoned, just the arithmetic operation not combined, which could be handled by later DOM? > another stmt_dominates_stmt_p call and thus rather not handle length > == 0 at all... Not handle length == 0 chains at all may be sub-optimal. As you said, such chain of refs at the point may simply because previous dom/cse fail to analyze the references. > > We already seem to go great length in associating stuff when combining > stuff thus isn't this > maybe an artifact of this association? Maybe we simply need to sort > the new chain after > combining it so the root stmt comes last? > > Note that there seems to be only a single length per chain but not all > refs in a chain need to > have the same distance. This means your fix is likely incomplete? > What prevents the situation > to arise for distance != 0? Yes, it's possible for two refs have the same distance in a chain with length > 0. But that should not be a problem, because existing uses are replaced by newly generated PHI variables which always dominate the uses, right? Thanks, bin > > Richard. > >> Martin >>
On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote: >>> Hello. >>> >>> >>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >>> >>> Ready to be installed? >> >> I'm not sure. If we have such zero distance refs in the IL at the >> time pcom runs then not handling >> them will pessimize code-gen for cases where they are part of a larger >> chain. Esp. I don't like > Do you mean different chains distributed because of MAX_DISTANCE by > "larger chain"? With the patch, such chain of refs would still be > pred-commoned, just the arithmetic operation not combined, which could > be handled by later DOM? >> another stmt_dominates_stmt_p call and thus rather not handle length >> == 0 at all... > Not handle length == 0 chains at all may be sub-optimal. As you said, > such chain of refs at the point may simply because previous dom/cse > fail to analyze the references. >> >> We already seem to go great length in associating stuff when combining >> stuff thus isn't this >> maybe an artifact of this association? Maybe we simply need to sort >> the new chain after >> combining it so the root stmt comes last? >> >> Note that there seems to be only a single length per chain but not all >> refs in a chain need to >> have the same distance. This means your fix is likely incomplete? >> What prevents the situation >> to arise for distance != 0? > Yes, it's possible for two refs have the same distance in a chain with > length > 0. But that should not be a problem, because existing uses > are replaced by newly generated PHI variables which always dominate > the uses, right? I must admit I don't know predcom in such detail but then can we handle distance == 0 by simply inserting a PHI for those as well (a degenerate one of course)? Or can for distance == 0 the ref be not loop invariant? Note that for length == 0 all refs in the chain will have a dependence distance of zero. So my first argument probably doesn't hold and we could simply remove handling of length == 0 chains and rely on CSE? Richard. > Thanks, > bin >> >> Richard. >> >>> Martin >>>
On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote: >>>> Hello. >>>> >>>> >>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >>>> >>>> Ready to be installed? >>> >>> I'm not sure. If we have such zero distance refs in the IL at the >>> time pcom runs then not handling >>> them will pessimize code-gen for cases where they are part of a larger >>> chain. Esp. I don't like >> Do you mean different chains distributed because of MAX_DISTANCE by >> "larger chain"? With the patch, such chain of refs would still be >> pred-commoned, just the arithmetic operation not combined, which could >> be handled by later DOM? >>> another stmt_dominates_stmt_p call and thus rather not handle length >>> == 0 at all... >> Not handle length == 0 chains at all may be sub-optimal. As you said, >> such chain of refs at the point may simply because previous dom/cse >> fail to analyze the references. >>> >>> We already seem to go great length in associating stuff when combining >>> stuff thus isn't this >>> maybe an artifact of this association? Maybe we simply need to sort >>> the new chain after >>> combining it so the root stmt comes last? >>> >>> Note that there seems to be only a single length per chain but not all >>> refs in a chain need to >>> have the same distance. This means your fix is likely incomplete? >>> What prevents the situation >>> to arise for distance != 0? >> Yes, it's possible for two refs have the same distance in a chain with >> length > 0. But that should not be a problem, because existing uses >> are replaced by newly generated PHI variables which always dominate >> the uses, right? > > I must admit I don't know predcom in such detail but then can we handle > distance == 0 by simply inserting a PHI for those as well (a degenerate > one of course)? Or can for distance == 0 the ref be not loop invariant? Not sure if I understand the question correctly. Distance is difference of niter between one ref and the root ref of the chain, so 0 distance/length doesn't mean a loop invariant, it's very likely two (exactly the same) references in each loop iteration, the address of reference is still a SCEV. OTOH, invariant chain has invariant address, and is handled separately. For the first question, it's length, rather than distance that decides how the chain is handled. For length > 0 chain, we have to insert PHIs to pass carried result of memory reference, even some refs may have 0 distance to the root ref. > > Note that for length == 0 all refs in the chain will have a dependence distance > of zero. So my first argument probably doesn't hold and we could simply > remove handling of length == 0 chains and rely on CSE? I am not sure, that CSE opportunity of references exists at this point means previous cse pass failed for some reason. Predcom could be the only pass that can handle such case as it understands data reference better. Note Martin's patch is not to skip handling of length == 0 chain, later ref will still be CSEed with result of root ref, only the combination operation like chain1 + chain2 is skipped. In this case, following dom should be able to handle such (loop independent) cse opportunities. Thanks, bin > > Richard. > >> Thanks, >> bin >>> >>> Richard. >>> >>>> Martin >>>>
From 41b153cf975374fff48419ec8ac5991ac134735f Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Tue, 17 Jan 2017 14:22:40 +0100 Subject: [PATCH] Be careful about combined chain with length == 0 (PR tree-optimization/70754). gcc/testsuite/ChangeLog: 2017-01-17 Martin Liska <mliska@suse.cz> PR tree-optimization/70754 * gfortran.dg/pr70754.f90: New test. gcc/ChangeLog: 2017-01-17 Martin Liska <mliska@suse.cz> PR tree-optimization/70754 * tree-predcom.c (combine_chains): Do not create a combined chain with length equal to zero when root_stmt does not dominate stmts of references. --- gcc/testsuite/gfortran.dg/pr70754.f90 | 35 +++++++++++++++++++++++++++++++++++ gcc/tree-predcom.c | 10 ++++++++++ 2 files changed, 45 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/pr70754.f90 diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90 new file mode 100644 index 00000000000..758901ce2b2 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr70754.f90 @@ -0,0 +1,35 @@ +! { dg-options "-Ofast" } + +module m + implicit none + private + save + + integer, parameter, public :: & + ii4 = selected_int_kind(6), & + rr8 = selected_real_kind(13) + + integer (ii4), dimension(40,40,199), public :: xyz + public :: foo +contains + subroutine foo(a) + real (rr8), dimension(40,40), intent(out) :: a + real (rr8), dimension(40,40) :: b + integer (ii4), dimension(40,40) :: c + integer i, j + + do i=1,8 + b(i,j) = 123 * a(i,j) + a(i,j+1) & + + a(i,j) + a(i+1,j+1) & + + a(i+1,j) + a(i-1,j+1) & + + a(i-1,j) + c(i,j) = 123 + end do + + where ((xyz(:,:,2) /= 0) .and. (c /= 0)) + a = b/real(c) + elsewhere + a = 456 + endwhere + end subroutine foo +end module m diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c index f0b70a61fb8..768f976cb87 100644 --- a/gcc/tree-predcom.c +++ b/gcc/tree-predcom.c @@ -2323,6 +2323,16 @@ combine_chains (chain_p ch1, chain_p ch2) root_stmt = get_chain_root (new_chain)->stmt; for (i = 1; new_chain->refs.iterate (i, &nw); i++) { + /* PR tree-optimization/70754 + For a combined chain with length equal to zero, we have to guarantee + that ROOT_STMT dominates all references. */ + if (new_chain->length == 0 + && !stmt_dominates_stmt_p (root_stmt, nw->stmt)) + { + release_chain (new_chain); + return NULL; + } + if (nw->distance == new_chain->length && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) { -- 2.11.0