diff mbox

[09/13] Simply cost model merges partitions with the same references

Message ID CAHFci2_oYx4Bev7ee2mG2cBpLig7KQftdAEd6Td5CO8e_N+AHQ@mail.gmail.com
State New
Headers show

Commit Message

Bin.Cheng June 23, 2017, 10:19 a.m. UTC
On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> Current primitive cost model merges partitions with data references sharing the same
>>>> base address.  I believe it's designed to maximize data reuse in distribution, but
>>>> that should be done by dedicated data reusing algorithm.  At this stage of merging,
>>>> we should be conservative and only merge partitions with the same references.
>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Well, I'd say "conservative" is merging more, not less.  For example
>>> splitting a[i+1] from a[i]
>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>>> merging.  Maybe
>>> DR_INIT within a cacheline or so.
>>>
>>> How many extra distributions in say SPEC do you get from this change alone?
>> Hi,
>> I collected data for spec2006 only with/without this patch.  I am a
>> bit surprised that it doesn't change the number of distributed loops.
>>>
>>> It shows also that having partition->reads_and_writes would be nice
>>> ...  the code duplication
>> Yeah, I merged read/write data references in previous patch, now this
>> duplication is gone.  Update patch attached.  Is it OK?
>
> +      gcc_assert (i < datarefs_vec.length ());
> +      dr1 = datarefs_vec[i];
>
> these asserts are superfluous -- vec::operator[] does them as well.
>
> Ok if you remove them.
Done.
I realized I made mistakes when measuring the impact of this patch.
This patch only apparently causes failure of
gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
collected the number of distributed loops in spec2k6 as below:
     trunk:  5882
     only this patch: 7130
     whole patch series: 5237
So the conclusion is, this patch does aggressive distribution like
ldist-6.c, which means worse data-locality.  The following patch does
more fusion which mitigates impact of this patch and results in
conservative distribution overall.   But as we lack of data locality
cost model, ldist-6.c remains failed even after applying whole patch
series.  Hmm, a cache-sensitive cost model is need for several passes
now, distribution, prefetch and (possible) interchange.
Richard, do you have second comment based on the new data?

Thanks,
bin
2017-06-20  Bin Cheng  <bin.cheng@arm.com>

    * tree-loop-distribution.c (ref_base_address): Delete.
    (similar_memory_accesses): Rename ...
    (share_memory_accesses): ... to this.  Check if partitions access
    the same memory reference.
    (distribute_loop): Call share_memory_accesses.

gcc/testsuite/ChangeLog
2017-06-20  Bin Cheng  <bin.cheng@arm.com>

    * gcc.dg/tree-ssa/ldist-6.c: XFAIL.

Comments

Richard Biener June 23, 2017, 10:48 a.m. UTC | #1
On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> Current primitive cost model merges partitions with data references sharing the same
>>>>> base address.  I believe it's designed to maximize data reuse in distribution, but
>>>>> that should be done by dedicated data reusing algorithm.  At this stage of merging,
>>>>> we should be conservative and only merge partitions with the same references.
>>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>>
>>>> Well, I'd say "conservative" is merging more, not less.  For example
>>>> splitting a[i+1] from a[i]
>>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>>>> merging.  Maybe
>>>> DR_INIT within a cacheline or so.
>>>>
>>>> How many extra distributions in say SPEC do you get from this change alone?
>>> Hi,
>>> I collected data for spec2006 only with/without this patch.  I am a
>>> bit surprised that it doesn't change the number of distributed loops.
>>>>
>>>> It shows also that having partition->reads_and_writes would be nice
>>>> ...  the code duplication
>>> Yeah, I merged read/write data references in previous patch, now this
>>> duplication is gone.  Update patch attached.  Is it OK?
>>
>> +      gcc_assert (i < datarefs_vec.length ());
>> +      dr1 = datarefs_vec[i];
>>
>> these asserts are superfluous -- vec::operator[] does them as well.
>>
>> Ok if you remove them.
> Done.
> I realized I made mistakes when measuring the impact of this patch.
> This patch only apparently causes failure of
> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
> collected the number of distributed loops in spec2k6 as below:
>      trunk:  5882
>      only this patch: 7130
>      whole patch series: 5237
> So the conclusion is, this patch does aggressive distribution like
> ldist-6.c, which means worse data-locality.  The following patch does
> more fusion which mitigates impact of this patch and results in
> conservative distribution overall.

What changed in the patch?  Did you attach the correct one?

I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
where it was reduced from.

>   But as we lack of data locality
> cost model, ldist-6.c remains failed even after applying whole patch
> series.  Hmm, a cache-sensitive cost model is need for several passes
> now, distribution, prefetch and (possible) interchange.
> Richard, do you have second comment based on the new data?

I expected the "only this patch" result somewhat, as said, I'd have
allowed "related" references to fuse by not requiring equal
DR_INIT for example.

I suggest to go forward with it in its current form.  We can tweak the
cost model later.

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-20  Bin Cheng  <bin.cheng@arm.com>
>
>     * tree-loop-distribution.c (ref_base_address): Delete.
>     (similar_memory_accesses): Rename ...
>     (share_memory_accesses): ... to this.  Check if partitions access
>     the same memory reference.
>     (distribute_loop): Call share_memory_accesses.
>
> gcc/testsuite/ChangeLog
> 2017-06-20  Bin Cheng  <bin.cheng@arm.com>
>
>     * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
Bin.Cheng June 23, 2017, 10:51 a.m. UTC | #2
On Fri, Jun 23, 2017 at 11:48 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>>> Hi,
>>>>>> Current primitive cost model merges partitions with data references sharing the same
>>>>>> base address.  I believe it's designed to maximize data reuse in distribution, but
>>>>>> that should be done by dedicated data reusing algorithm.  At this stage of merging,
>>>>>> we should be conservative and only merge partitions with the same references.
>>>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>>>
>>>>> Well, I'd say "conservative" is merging more, not less.  For example
>>>>> splitting a[i+1] from a[i]
>>>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>>>>> merging.  Maybe
>>>>> DR_INIT within a cacheline or so.
>>>>>
>>>>> How many extra distributions in say SPEC do you get from this change alone?
>>>> Hi,
>>>> I collected data for spec2006 only with/without this patch.  I am a
>>>> bit surprised that it doesn't change the number of distributed loops.
>>>>>
>>>>> It shows also that having partition->reads_and_writes would be nice
>>>>> ...  the code duplication
>>>> Yeah, I merged read/write data references in previous patch, now this
>>>> duplication is gone.  Update patch attached.  Is it OK?
>>>
>>> +      gcc_assert (i < datarefs_vec.length ());
>>> +      dr1 = datarefs_vec[i];
>>>
>>> these asserts are superfluous -- vec::operator[] does them as well.
>>>
>>> Ok if you remove them.
>> Done.
>> I realized I made mistakes when measuring the impact of this patch.
>> This patch only apparently causes failure of
>> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
>> collected the number of distributed loops in spec2k6 as below:
>>      trunk:  5882
>>      only this patch: 7130
>>      whole patch series: 5237
>> So the conclusion is, this patch does aggressive distribution like
>> ldist-6.c, which means worse data-locality.  The following patch does
>> more fusion which mitigates impact of this patch and results in
>> conservative distribution overall.
>
> What changed in the patch?  Did you attach the correct one?
No code changed in this one.  I just added test case change which
can't be resolved by following patches.  ldist-6.c slipped away
because of a bug in patch:

[11/13]Annotate partition by its parallelism execution type

>
> I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
> where it was reduced from.
>
>>   But as we lack of data locality
>> cost model, ldist-6.c remains failed even after applying whole patch
>> series.  Hmm, a cache-sensitive cost model is need for several passes
>> now, distribution, prefetch and (possible) interchange.
>> Richard, do you have second comment based on the new data?
>
> I expected the "only this patch" result somewhat, as said, I'd have
> allowed "related" references to fuse by not requiring equal
> DR_INIT for example.
>
> I suggest to go forward with it in its current form.  We can tweak the
> cost model later.
Yeah.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-20  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * tree-loop-distribution.c (ref_base_address): Delete.
>>     (similar_memory_accesses): Rename ...
>>     (share_memory_accesses): ... to this.  Check if partitions access
>>     the same memory reference.
>>     (distribute_loop): Call share_memory_accesses.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-20  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
diff mbox

Patch

From a002d0a88ab9e981d9c57bd8f1203072290623ad Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 9 Jun 2017 12:41:36 +0100
Subject: [PATCH 08/13] share-memory-access-20170608.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c |  2 +-
 gcc/tree-loop-distribution.c            | 69 +++++++++++++++------------------
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
index 8eb1c62..e0a68d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
@@ -34,4 +34,4 @@  int loop1 (int k)
   return a[1000-2] + b[1000-1] + c[1000-2] + d[1000-2];
 }
 
-/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" { xfail *-*-* } } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index eafd119..119863f 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1268,30 +1268,16 @@  classify_partition (loop_p loop, struct graph *rdg, partition *partition)
     }
 }
 
-/* For a data reference REF, return the declaration of its base
-   address or NULL_TREE if the base is not determined.  */
-
-static tree
-ref_base_address (data_reference_p dr)
-{
-  tree base_address = DR_BASE_ADDRESS (dr);
-  if (base_address
-      && TREE_CODE (base_address) == ADDR_EXPR)
-    return TREE_OPERAND (base_address, 0);
-
-  return base_address;
-}
-
-/* Returns true when PARTITION1 and PARTITION2 have similar memory
-   accesses in RDG.  */
+/* Returns true when PARTITION1 and PARTITION2 access the same memory
+   object in RDG.  */
 
 static bool
-similar_memory_accesses (struct graph *rdg, partition *partition1,
-			 partition *partition2)
+share_memory_accesses (struct graph *rdg,
+		       partition *partition1, partition *partition2)
 {
-  unsigned i, j, k, l;
+  unsigned i, j;
   bitmap_iterator bi, bj;
-  data_reference_p ref1, ref2;
+  data_reference_p dr1, dr2;
 
   /* First check whether in the intersection of the two partitions are
      any loads or stores.  Common loads are the situation that happens
@@ -1301,23 +1287,30 @@  similar_memory_accesses (struct graph *rdg, partition *partition1,
 	|| RDG_MEM_READS_STMT (rdg, i))
       return true;
 
-  /* Then check all data-references against each other.  */
-  EXECUTE_IF_SET_IN_BITMAP (partition1->stmts, 0, i, bi)
-    if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-      EXECUTE_IF_SET_IN_BITMAP (partition2->stmts, 0, j, bj)
-	if (RDG_MEM_WRITE_STMT (rdg, j)
-	    || RDG_MEM_READS_STMT (rdg, j))
-	  {
-	    FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, i), k, ref1)
-	      {
-		tree base1 = ref_base_address (ref1);
-		if (base1)
-		  FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, j), l, ref2)
-		    if (base1 == ref_base_address (ref2))
-		      return true;
-	      }
-	  }
+  /* Then check whether the two partitions access the same memory object.  */
+  EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
+    {
+      dr1 = datarefs_vec[i];
+
+      if (!DR_BASE_ADDRESS (dr1)
+	  || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
+	continue;
+
+      EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
+	{
+	  dr2 = datarefs_vec[j];
+
+	  if (!DR_BASE_ADDRESS (dr2)
+	      || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
+	    continue;
+
+	  if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
+	      && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
+	      && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
+	      && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
+	    return true;
+	}
+    }
 
   return false;
 }
@@ -1654,7 +1647,7 @@  distribute_loop (struct loop *loop, vec<gimple *> stmts,
       for (int j = i + 1;
 	   partitions.iterate (j, &partition); ++j)
 	{
-	  if (similar_memory_accesses (rdg, into, partition))
+	  if (share_memory_accesses (rdg, into, partition))
 	    {
 	      partition_merge_into (into, partition, FUSE_SHARE_REF);
 	      partitions.unordered_remove (j);
-- 
1.9.1