diff mbox

enable loop fusion on isl-15

Message ID 1446759793-2875-1-git-send-email-s.pop@samsung.com
State New
Headers show

Commit Message

Sebastian Pop Nov. 5, 2015, 9:43 p.m. UTC
* graphite-optimize-isl.c (optimize_isl): Call
       isl_options_set_schedule_maximize_band_depth.

       * gcc.dg/graphite/fuse-1.c: New.
       * gcc.dg/graphite/fuse-2.c: New.
       * gcc.dg/graphite/interchange-13.c: Remove bogus check.
---
 gcc/graphite-optimize-isl.c                    |  2 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c         | 43 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/fuse-2.c         | 43 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |  1 -
 4 files changed, 87 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/fuse-1.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/fuse-2.c

Comments

Alan Lawrence Dec. 4, 2015, 1:13 p.m. UTC | #1
On 05/11/15 21:43, Sebastian Pop wrote:
>         * graphite-optimize-isl.c (optimize_isl): Call
>         isl_options_set_schedule_maximize_band_depth.
>
>         * gcc.dg/graphite/fuse-1.c: New.
>         * gcc.dg/graphite/fuse-2.c: New.
>         * gcc.dg/graphite/interchange-13.c: Remove bogus check.

I note that the test

scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1

FAILs under isl-0.14, with which GCC can still be built and generally claims to 
work.

Is it worth trying to detect this in the testsuite, so we can XFAIL it? By which 
I mean, is there a reasonable testsuite mechanism by which we could do that?

Cheers, Alan
Mike Stump Dec. 4, 2015, 6:02 p.m. UTC | #2
On Dec 4, 2015, at 5:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 05/11/15 21:43, Sebastian Pop wrote:
>>        * graphite-optimize-isl.c (optimize_isl): Call
>>        isl_options_set_schedule_maximize_band_depth.
>> 
>>        * gcc.dg/graphite/fuse-1.c: New.
>>        * gcc.dg/graphite/fuse-2.c: New.
>>        * gcc.dg/graphite/interchange-13.c: Remove bogus check.
> 
> I note that the test
> 
> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
> 
> FAILs under isl-0.14, with which GCC can still be built and generally claims to work.
> 
> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By which I mean, is there a reasonable testsuite mechanism by which we could do that?

You can permanently ignore it by updating to 0.15?  I don’t see the advantage of bothering to finesse this too much.  I don’t know of a way to detect 14 v 15 other than this test case, but, if you do that, you can’t use that result to gate this test case.  If one wanted to engineer in a way, one would expose the isl version via a preprocessor symbol (built in), and then the test case would use that to gate it.  If we had to fix it, I think I’d prefer we just raise the isl version to 15 or later and be done with it.
Sebastian Pop Dec. 4, 2015, 7:59 p.m. UTC | #3
I would highly recommend updating the required version of ISL to isl-0.15:
that would simplify the existing code, removing a lot of code under "#ifdef
old ISL",
and allow us to fully transition to schedule_trees instead of dealing with
the
old antiquated union_maps in the scheudler.  The result is faster
compilation time.

Thanks,
Sebastian

-----Original Message-----
From: Mike Stump [mailto:mikestump@comcast.net] 
Sent: Friday, December 04, 2015 12:03 PM
To: Alan Lawrence
Cc: Sebastian Pop; sebpop@gmail.com; gcc-patches@gcc.gnu.org;
hiraditya@msn.com
Subject: Re: [PATCH] enable loop fusion on isl-15

On Dec 4, 2015, at 5:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 05/11/15 21:43, Sebastian Pop wrote:
>>        * graphite-optimize-isl.c (optimize_isl): Call
>>        isl_options_set_schedule_maximize_band_depth.
>> 
>>        * gcc.dg/graphite/fuse-1.c: New.
>>        * gcc.dg/graphite/fuse-2.c: New.
>>        * gcc.dg/graphite/interchange-13.c: Remove bogus check.
> 
> I note that the test
> 
> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
> 
> FAILs under isl-0.14, with which GCC can still be built and generally
claims to work.
> 
> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
which I mean, is there a reasonable testsuite mechanism by which we could do
that?

You can permanently ignore it by updating to 0.15?  I don't see the
advantage of bothering to finesse this too much.  I don't know of a way to
detect 14 v 15 other than this test case, but, if you do that, you can't use
that result to gate this test case.  If one wanted to engineer in a way, one
would expose the isl version via a preprocessor symbol (built in), and then
the test case would use that to gate it.  If we had to fix it, I think I'd
prefer we just raise the isl version to 15 or later and be done with it.
Richard Biener Dec. 7, 2015, 10:53 a.m. UTC | #4
On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop <s.pop@samsung.com> wrote:
> I would highly recommend updating the required version of ISL to isl-0.15:
> that would simplify the existing code, removing a lot of code under "#ifdef
> old ISL",
> and allow us to fully transition to schedule_trees instead of dealing with
> the
> old antiquated union_maps in the scheudler.  The result is faster
> compilation time.

Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH the plan
was to make graphite enabled by -O3 [-fprofile-use] by default which would
mean making ISL a hard host requirement.  That raises the barrier on making
the version requirement stricter ...

Sebastian, were quite into stage3 already - what's your plans / progress with
the defaulting of GRPAHITE?  (compile-time / performance numbers though
I see ICEs still popping up - a good thing in some sense as it looks like
GRAPHITE gets testing).

Thanks,
Richard.

> Thanks,
> Sebastian
>
> -----Original Message-----
> From: Mike Stump [mailto:mikestump@comcast.net]
> Sent: Friday, December 04, 2015 12:03 PM
> To: Alan Lawrence
> Cc: Sebastian Pop; sebpop@gmail.com; gcc-patches@gcc.gnu.org;
> hiraditya@msn.com
> Subject: Re: [PATCH] enable loop fusion on isl-15
>
> On Dec 4, 2015, at 5:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> On 05/11/15 21:43, Sebastian Pop wrote:
>>>        * graphite-optimize-isl.c (optimize_isl): Call
>>>        isl_options_set_schedule_maximize_band_depth.
>>>
>>>        * gcc.dg/graphite/fuse-1.c: New.
>>>        * gcc.dg/graphite/fuse-2.c: New.
>>>        * gcc.dg/graphite/interchange-13.c: Remove bogus check.
>>
>> I note that the test
>>
>> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
>>
>> FAILs under isl-0.14, with which GCC can still be built and generally
> claims to work.
>>
>> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
> which I mean, is there a reasonable testsuite mechanism by which we could do
> that?
>
> You can permanently ignore it by updating to 0.15?  I don't see the
> advantage of bothering to finesse this too much.  I don't know of a way to
> detect 14 v 15 other than this test case, but, if you do that, you can't use
> that result to gate this test case.  If one wanted to engineer in a way, one
> would expose the isl version via a preprocessor symbol (built in), and then
> the test case would use that to gate it.  If we had to fix it, I think I'd
> prefer we just raise the isl version to 15 or later and be done with it.
>
Sebastian Pop Dec. 9, 2015, 5:45 p.m. UTC | #5
Richard Biener wrote:
> On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop <s.pop@samsung.com> wrote:
> > I would highly recommend updating the required version of ISL to isl-0.15:
> > that would simplify the existing code, removing a lot of code under "#ifdef
> > old ISL",
> > and allow us to fully transition to schedule_trees instead of dealing with
> > the
> > old antiquated union_maps in the scheudler.  The result is faster
> > compilation time.
> 
> Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH the plan
> was to make graphite enabled by -O3 [-fprofile-use] by default which would
> mean making ISL a hard host requirement.  That raises the barrier on making
> the version requirement stricter ...
> 
> Sebastian, were quite into stage3 already - what's your plans / progress with
> the defaulting of GRPAHITE?  (compile-time / performance numbers though
> I see ICEs still popping up - a good thing in some sense as it looks like
> GRAPHITE gets testing).

Richard,

to enable Graphite by default in -O3 we had two requirements:
- reduce compilation time when compiling with Graphite,
- show performance improvements when using Graphite.

We did a good work in reducing the overall compilation time used by Graphite
with the rewrite of the SCoP detection algorithm.  Overall compilation time
spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%.  During a
bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall percentage
overhead in terms of number of instructions executed by the SCoP detection was
reduced from 0.3% to 0.01% (measured with counting the number of instructions
with valgrind.)  Details of algorithmic changes and experiments are in a paper
that we submitted (and just got accepted) in the polyhedral workshop IMPACT'16:
https://gcc.gnu.org/wiki/Graphite?action=AttachFile&do=view&target=scop-detection-submitted-for-review.pdf

On the code generation side, we made sure to keep the representation in SSA all
the time, such that when Graphite does not transform the code, the original code
is left in place with no change.  When Graphite does a transform, the generated
code is also in SSA form, and we fixed several scalar performance issues linked
to the loop level where scalar definitions were inserted: now we compute the
insertion place such that there is no need for other code motion passes like
LICM to optimize the code generated by Graphite.

We are still working on tuning Graphite's fusion, tiling, and interchange and we
will report improvements on the Polybench loop kernels soon.

Thanks,
Sebastian

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Sebastian
> >
> > -----Original Message-----
> > From: Mike Stump [mailto:mikestump@comcast.net]
> > Sent: Friday, December 04, 2015 12:03 PM
> > To: Alan Lawrence
> > Cc: Sebastian Pop; sebpop@gmail.com; gcc-patches@gcc.gnu.org;
> > hiraditya@msn.com
> > Subject: Re: [PATCH] enable loop fusion on isl-15
> >
> > On Dec 4, 2015, at 5:13 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >> On 05/11/15 21:43, Sebastian Pop wrote:
> >>>        * graphite-optimize-isl.c (optimize_isl): Call
> >>>        isl_options_set_schedule_maximize_band_depth.
> >>>
> >>>        * gcc.dg/graphite/fuse-1.c: New.
> >>>        * gcc.dg/graphite/fuse-2.c: New.
> >>>        * gcc.dg/graphite/interchange-13.c: Remove bogus check.
> >>
> >> I note that the test
> >>
> >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
> >>
> >> FAILs under isl-0.14, with which GCC can still be built and generally
> > claims to work.
> >>
> >> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
> > which I mean, is there a reasonable testsuite mechanism by which we could do
> > that?
> >
> > You can permanently ignore it by updating to 0.15?  I don't see the
> > advantage of bothering to finesse this too much.  I don't know of a way to
> > detect 14 v 15 other than this test case, but, if you do that, you can't use
> > that result to gate this test case.  If one wanted to engineer in a way, one
> > would expose the isl version via a preprocessor symbol (built in), and then
> > the test case would use that to gate it.  If we had to fix it, I think I'd
> > prefer we just raise the isl version to 15 or later and be done with it.
> >
Richard Biener Dec. 9, 2015, 6:18 p.m. UTC | #6
On December 9, 2015 6:45:42 PM GMT+01:00, Sebastian Pop <sebpop@gmail.com> wrote:
>Richard Biener wrote:
>> On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop
><s.pop@samsung.com> wrote:
>> > I would highly recommend updating the required version of ISL to
>isl-0.15:
>> > that would simplify the existing code, removing a lot of code under
>"#ifdef
>> > old ISL",
>> > and allow us to fully transition to schedule_trees instead of
>dealing with
>> > the
>> > old antiquated union_maps in the scheudler.  The result is faster
>> > compilation time.
>> 
>> Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH
>the plan
>> was to make graphite enabled by -O3 [-fprofile-use] by default which
>would
>> mean making ISL a hard host requirement.  That raises the barrier on
>making
>> the version requirement stricter ...
>> 
>> Sebastian, were quite into stage3 already - what's your plans /
>progress with
>> the defaulting of GRPAHITE?  (compile-time / performance numbers
>though
>> I see ICEs still popping up - a good thing in some sense as it looks
>like
>> GRAPHITE gets testing).
>
>Richard,
>
>to enable Graphite by default in -O3 we had two requirements:
>- reduce compilation time when compiling with Graphite,
>- show performance improvements when using Graphite.
>
>We did a good work in reducing the overall compilation time used by
>Graphite
>with the rewrite of the SCoP detection algorithm.  Overall compilation
>time
>spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%. 
>During a
>bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall
>percentage
>overhead in terms of number of instructions executed by the SCoP
>detection was
>reduced from 0.3% to 0.01% (measured with counting the number of
>instructions
>with valgrind.)  Details of algorithmic changes and experiments are in
>a paper
>that we submitted (and just got accepted) in the polyhedral workshop
>IMPACT'16:
>https://gcc.gnu.org/wiki/Graphite?action=AttachFile&do=view&target=scop-detection-submitted-for-review.pdf
>
>On the code generation side, we made sure to keep the representation in
>SSA all
>the time, such that when Graphite does not transform the code, the
>original code
>is left in place with no change.  When Graphite does a transform, the
>generated
>code is also in SSA form, and we fixed several scalar performance
>issues linked
>to the loop level where scalar definitions were inserted: now we
>compute the
>insertion place such that there is no need for other code motion passes
>like
>LICM to optimize the code generated by Graphite.
>
>We are still working on tuning Graphite's fusion, tiling, and
>interchange and we
>will report improvements on the Polybench loop kernels soon.

Thanks for the progress report!

Let's raise the ISL requirement to 0.14 now to see whether it imposes problems on some hosts.

Richard.

>Thanks,
>Sebastian
>
>> 
>> Thanks,
>> Richard.
>> 
>> > Thanks,
>> > Sebastian
>> >
>> > -----Original Message-----
>> > From: Mike Stump [mailto:mikestump@comcast.net]
>> > Sent: Friday, December 04, 2015 12:03 PM
>> > To: Alan Lawrence
>> > Cc: Sebastian Pop; sebpop@gmail.com; gcc-patches@gcc.gnu.org;
>> > hiraditya@msn.com
>> > Subject: Re: [PATCH] enable loop fusion on isl-15
>> >
>> > On Dec 4, 2015, at 5:13 AM, Alan Lawrence <alan.lawrence@arm.com>
>wrote:
>> >> On 05/11/15 21:43, Sebastian Pop wrote:
>> >>>        * graphite-optimize-isl.c (optimize_isl): Call
>> >>>        isl_options_set_schedule_maximize_band_depth.
>> >>>
>> >>>        * gcc.dg/graphite/fuse-1.c: New.
>> >>>        * gcc.dg/graphite/fuse-2.c: New.
>> >>>        * gcc.dg/graphite/interchange-13.c: Remove bogus check.
>> >>
>> >> I note that the test
>> >>
>> >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12"
>1
>> >>
>> >> FAILs under isl-0.14, with which GCC can still be built and
>generally
>> > claims to work.
>> >>
>> >> Is it worth trying to detect this in the testsuite, so we can
>XFAIL it? By
>> > which I mean, is there a reasonable testsuite mechanism by which we
>could do
>> > that?
>> >
>> > You can permanently ignore it by updating to 0.15?  I don't see the
>> > advantage of bothering to finesse this too much.  I don't know of a
>way to
>> > detect 14 v 15 other than this test case, but, if you do that, you
>can't use
>> > that result to gate this test case.  If one wanted to engineer in a
>way, one
>> > would expose the isl version via a preprocessor symbol (built in),
>and then
>> > the test case would use that to gate it.  If we had to fix it, I
>think I'd
>> > prefer we just raise the isl version to 15 or later and be done
>with it.
>> >
diff mbox

Patch

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 53355bb..0d85975 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -404,7 +404,7 @@  optimize_isl (scop_p scop)
   isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
   /* ISL-0.15 or later.  */
-  isl_options_set_schedule_serialize_sccs (scop->isl_context, 1);
+  isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
 #else
   isl_options_set_schedule_fuse (scop->isl_context, ISL_SCHEDULE_FUSE_MIN);
 #endif
diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-1.c b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
new file mode 100644
index 0000000..c9bb67d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
@@ -0,0 +1,43 @@ 
+/* Check that the two loops are fused and that we manage to fold the two xor
+   operations.  */
+/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop-all" } */
+/* { dg-do run } */
+
+/* Make sure we fuse the loops like this:
+ISL AST generated by ISL:
+for (int c0 = 0; c0 <= 99; c0 += 1) {
+  S_3(c0);
+  S_6(c0);
+  S_9(c0);
+}
+*/
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for (int c0 = 0; c0 <= 99; c0 += 1) \{.*S_.*(c0);.*S_.*(c0);.*S_.*(c0);.*\}" 1 "graphite" } } */
+
+/* Check that after fusing the loops, the scalar computation is also fused.  */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 "forwprop4" } } */
+
+
+
+#define MAX 100
+int A[MAX];
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < MAX; i++)
+    A[i] = i;
+  for(int i=0; i<MAX; i++)
+    A[i] ^= 4;
+  for(int i=0; i<MAX; i++)
+    A[i] ^= 8;
+
+  for (i = 0; i < MAX; i++)
+    if (A[i] != (i ^ 12))
+      abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-2.c b/gcc/testsuite/gcc.dg/graphite/fuse-2.c
new file mode 100644
index 0000000..aaa5e2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-2.c
@@ -0,0 +1,43 @@ 
+/* Check that the three loops are fused.  */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+/* { dg-do run } */
+
+/* Make sure we fuse the loops like this:
+ISL AST generated by ISL:
+for (int c0 = 0; c0 <= 99; c0 += 1) {
+  S_3(c0);
+  S_6(c0);
+  S_9(c0);
+}
+*/
+
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for (int c0 = 0; c0 <= 99; c0 += 1) \{.*S_.*(c0);.*S_.*(c0);.*S_.*(c0);.*\}" 1 "graphite" } } */
+
+#define MAX 100
+int A[MAX], B[MAX], C[MAX];
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i;
+
+  /* The next three loops should be fused.  */
+  for (i = 0; i < MAX; i++)
+    {
+      A[i] = i;
+      B[i] = i + 2;
+      C[i] = i + 1;
+    }
+  for(int i=0; i<MAX; i++)
+    A[i] += B[i];
+  for(int i=0; i<MAX; i++)
+    A[i] += C[i];
+
+  for (i = 0; i < MAX; i++)
+    if (A[i] != 3*i+3)
+      abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-13.c b/gcc/testsuite/gcc.dg/graphite/interchange-13.c
index 3398de2..4e4a83e 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-13.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-13.c
@@ -49,4 +49,3 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "tiled" "graphite" } } */