Message ID | CAHFci2_oYx4Bev7ee2mG2cBpLig7KQftdAEd6Td5CO8e_N+AHQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>>>> Hi, >>>>> Current primitive cost model merges partitions with data references sharing the same >>>>> base address. I believe it's designed to maximize data reuse in distribution, but >>>>> that should be done by dedicated data reusing algorithm. At this stage of merging, >>>>> we should be conservative and only merge partitions with the same references. >>>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>>> >>>> Well, I'd say "conservative" is merging more, not less. For example >>>> splitting a[i+1] from a[i] >>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for >>>> merging. Maybe >>>> DR_INIT within a cacheline or so. >>>> >>>> How many extra distributions in say SPEC do you get from this change alone? >>> Hi, >>> I collected data for spec2006 only with/without this patch. I am a >>> bit surprised that it doesn't change the number of distributed loops. >>>> >>>> It shows also that having partition->reads_and_writes would be nice >>>> ... the code duplication >>> Yeah, I merged read/write data references in previous patch, now this >>> duplication is gone. Update patch attached. Is it OK? >> >> + gcc_assert (i < datarefs_vec.length ()); >> + dr1 = datarefs_vec[i]; >> >> these asserts are superfluous -- vec::operator[] does them as well. >> >> Ok if you remove them. > Done. > I realized I made mistakes when measuring the impact of this patch. > This patch only apparently causes failure of > gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch. I also > collected the number of distributed loops in spec2k6 as below: > trunk: 5882 > only this patch: 7130 > whole patch series: 5237 > So the conclusion is, this patch does aggressive distribution like > ldist-6.c, which means worse data-locality. The following patch does > more fusion which mitigates impact of this patch and results in > conservative distribution overall. What changed in the patch? Did you attach the correct one? I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see where it was reduced from. > But as we lack of data locality > cost model, ldist-6.c remains failed even after applying whole patch > series. Hmm, a cache-sensitive cost model is need for several passes > now, distribution, prefetch and (possible) interchange. > Richard, do you have second comment based on the new data? I expected the "only this patch" result somewhat, as said, I'd have allowed "related" references to fuse by not requiring equal DR_INIT for example. I suggest to go forward with it in its current form. We can tweak the cost model later. Thanks, Richard. > Thanks, > bin > 2017-06-20 Bin Cheng <bin.cheng@arm.com> > > * tree-loop-distribution.c (ref_base_address): Delete. > (similar_memory_accesses): Rename ... > (share_memory_accesses): ... to this. Check if partitions access > the same memory reference. > (distribute_loop): Call share_memory_accesses. > > gcc/testsuite/ChangeLog > 2017-06-20 Bin Cheng <bin.cheng@arm.com> > > * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
On Fri, Jun 23, 2017 at 11:48 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener >>>> <richard.guenther@gmail.com> wrote: >>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>>>>> Hi, >>>>>> Current primitive cost model merges partitions with data references sharing the same >>>>>> base address. I believe it's designed to maximize data reuse in distribution, but >>>>>> that should be done by dedicated data reusing algorithm. At this stage of merging, >>>>>> we should be conservative and only merge partitions with the same references. >>>>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>>>> >>>>> Well, I'd say "conservative" is merging more, not less. For example >>>>> splitting a[i+1] from a[i] >>>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for >>>>> merging. Maybe >>>>> DR_INIT within a cacheline or so. >>>>> >>>>> How many extra distributions in say SPEC do you get from this change alone? >>>> Hi, >>>> I collected data for spec2006 only with/without this patch. I am a >>>> bit surprised that it doesn't change the number of distributed loops. >>>>> >>>>> It shows also that having partition->reads_and_writes would be nice >>>>> ... the code duplication >>>> Yeah, I merged read/write data references in previous patch, now this >>>> duplication is gone. Update patch attached. Is it OK? >>> >>> + gcc_assert (i < datarefs_vec.length ()); >>> + dr1 = datarefs_vec[i]; >>> >>> these asserts are superfluous -- vec::operator[] does them as well. >>> >>> Ok if you remove them. >> Done. >> I realized I made mistakes when measuring the impact of this patch. >> This patch only apparently causes failure of >> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch. I also >> collected the number of distributed loops in spec2k6 as below: >> trunk: 5882 >> only this patch: 7130 >> whole patch series: 5237 >> So the conclusion is, this patch does aggressive distribution like >> ldist-6.c, which means worse data-locality. The following patch does >> more fusion which mitigates impact of this patch and results in >> conservative distribution overall. > > What changed in the patch? Did you attach the correct one? No code changed in this one. I just added test case change which can't be resolved by following patches. ldist-6.c slipped away because of a bug in patch: [11/13]Annotate partition by its parallelism execution type > > I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see > where it was reduced from. > >> But as we lack of data locality >> cost model, ldist-6.c remains failed even after applying whole patch >> series. Hmm, a cache-sensitive cost model is need for several passes >> now, distribution, prefetch and (possible) interchange. >> Richard, do you have second comment based on the new data? > > I expected the "only this patch" result somewhat, as said, I'd have > allowed "related" references to fuse by not requiring equal > DR_INIT for example. > > I suggest to go forward with it in its current form. We can tweak the > cost model later. Yeah. > > Thanks, > Richard. > >> Thanks, >> bin >> 2017-06-20 Bin Cheng <bin.cheng@arm.com> >> >> * tree-loop-distribution.c (ref_base_address): Delete. >> (similar_memory_accesses): Rename ... >> (share_memory_accesses): ... to this. Check if partitions access >> the same memory reference. >> (distribute_loop): Call share_memory_accesses. >> >> gcc/testsuite/ChangeLog >> 2017-06-20 Bin Cheng <bin.cheng@arm.com> >> >> * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
From a002d0a88ab9e981d9c57bd8f1203072290623ad Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Fri, 9 Jun 2017 12:41:36 +0100 Subject: [PATCH 08/13] share-memory-access-20170608.txt --- gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c | 2 +- gcc/tree-loop-distribution.c | 69 +++++++++++++++------------------ 2 files changed, 32 insertions(+), 39 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c index 8eb1c62..e0a68d8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c @@ -34,4 +34,4 @@ int loop1 (int k) return a[1000-2] + b[1000-1] + c[1000-2] + d[1000-2]; } -/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" } } */ +/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" { xfail *-*-* } } } */ diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index eafd119..119863f 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1268,30 +1268,16 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition) } } -/* For a data reference REF, return the declaration of its base - address or NULL_TREE if the base is not determined. */ - -static tree -ref_base_address (data_reference_p dr) -{ - tree base_address = DR_BASE_ADDRESS (dr); - if (base_address - && TREE_CODE (base_address) == ADDR_EXPR) - return TREE_OPERAND (base_address, 0); - - return base_address; -} - -/* Returns true when PARTITION1 and PARTITION2 have similar memory - accesses in RDG. */ +/* Returns true when PARTITION1 and PARTITION2 access the same memory + object in RDG. */ static bool -similar_memory_accesses (struct graph *rdg, partition *partition1, - partition *partition2) +share_memory_accesses (struct graph *rdg, + partition *partition1, partition *partition2) { - unsigned i, j, k, l; + unsigned i, j; bitmap_iterator bi, bj; - data_reference_p ref1, ref2; + data_reference_p dr1, dr2; /* First check whether in the intersection of the two partitions are any loads or stores. Common loads are the situation that happens @@ -1301,23 +1287,30 @@ similar_memory_accesses (struct graph *rdg, partition *partition1, || RDG_MEM_READS_STMT (rdg, i)) return true; - /* Then check all data-references against each other. */ - EXECUTE_IF_SET_IN_BITMAP (partition1->stmts, 0, i, bi) - if (RDG_MEM_WRITE_STMT (rdg, i) - || RDG_MEM_READS_STMT (rdg, i)) - EXECUTE_IF_SET_IN_BITMAP (partition2->stmts, 0, j, bj) - if (RDG_MEM_WRITE_STMT (rdg, j) - || RDG_MEM_READS_STMT (rdg, j)) - { - FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, i), k, ref1) - { - tree base1 = ref_base_address (ref1); - if (base1) - FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, j), l, ref2) - if (base1 == ref_base_address (ref2)) - return true; - } - } + /* Then check whether the two partitions access the same memory object. */ + EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi) + { + dr1 = datarefs_vec[i]; + + if (!DR_BASE_ADDRESS (dr1) + || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) + continue; + + EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) + { + dr2 = datarefs_vec[j]; + + if (!DR_BASE_ADDRESS (dr2) + || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) + continue; + + if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) + && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) + && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) + && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) + return true; + } + } return false; } @@ -1654,7 +1647,7 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts, for (int j = i + 1; partitions.iterate (j, &partition); ++j) { - if (similar_memory_accesses (rdg, into, partition)) + if (share_memory_accesses (rdg, into, partition)) { partition_merge_into (into, partition, FUSE_SHARE_REF); partitions.unordered_remove (j); -- 1.9.1