diff mbox

Reduce complette unrolling & peeling limits

Message ID 20121114233407.GC12910@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Nov. 14, 2012, 11:34 p.m. UTC
Hi,
this patch reduces max-peeled-insns and max-completely-peeled-insns from 400 to
100.  The reason why I am doing this is that I want to reduce code bloat caused
by my cunroll work that enabled a lot more unrolling then previously causing
considerable code size regression at -O3.

I do not think those params was ever serviously tunned, or re-tunned after
introduction of tree-ssa peeling.  I bootstrapped/regtested x86_64 with few
values - 4000, 200, 100, 50 on spec2000,spec2k6,C++ benchmarks and polyhedron.

I also did partial tests on ia-64 (that is broken quite a lot now, but I wanted
to have some sanity check that these values are not too x86 specific).

With 4000 (and also bumped up max-peel-times/max-completely-peel-times) there
are improvements on
  ammp 1360->1460
  equake 1800->1840
  applu 1450->1500
but i guess those needs to be handled by better heuristic.

Otherwise there are no perfromance regression with going 400->100. With 50
there are tiny performance drops on swim and applu.

I plan to follow by testing the max-peel times parameters and then doing inliner
tests.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* params.def (max-peeled-insns, max-completely-peeled-insns): Reduce to 100.

Comments

Jakub Jelinek Nov. 15, 2012, 1:19 p.m. UTC | #1
On Thu, Nov 15, 2012 at 12:34:07AM +0100, Jan Hubicka wrote:
> 	* params.def (max-peeled-insns, max-completely-peeled-insns): Reduce to 100.

Ok, thanks.

> --- params.def	(revision 193505)
> +++ params.def	(working copy)
> @@ -290,7 +290,7 @@ DEFPARAM(PARAM_MAX_UNROLL_TIMES,
>  DEFPARAM(PARAM_MAX_PEELED_INSNS,
>  	"max-peeled-insns",
>  	"The maximum number of insns of a peeled loop",
> -	400, 0, 0)
> +	100, 0, 0)
>  /* The maximum number of peelings of a single loop.  */
>  DEFPARAM(PARAM_MAX_PEEL_TIMES,
>  	"max-peel-times",
> @@ -305,7 +305,7 @@ DEFPARAM(PARAM_MAX_PEEL_BRANCHES,
>  DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
>  	"max-completely-peeled-insns",
>  	"The maximum number of insns of a completely peeled loop",
> -	400, 0, 0)
> +	100, 0, 0)
>  /* The maximum number of peelings of a single loop that is peeled completely.  */
>  DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
>  	"max-completely-peel-times",

	Jakub
Eric Botcazou Nov. 18, 2012, 8:07 a.m. UTC | #2
> this patch reduces max-peeled-insns and max-completely-peeled-insns from 400
> to 100.  The reason why I am doing this is that I want to reduce code bloat
> caused by my cunroll work that enabled a lot more unrolling then previously
> causing considerable code size regression at -O3.

Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again 
takes a while to compile, so times out on slow machines:

FAIL: gcc.c-torture/compile/pr43186.c  -O3 -fomit-frame-pointer  (test for 
excess errors)
FAIL: gcc.c-torture/compile/pr43186.c  -O3 -fomit-frame-pointer -funroll-loops  
(test for excess errors)
FAIL: gcc.c-torture/compile/pr43186.c  -O3 -fomit-frame-pointer -funroll-all-
loops -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/compile/pr43186.c  -O3 -g  (test for excess errors)
WARNING: program timed out.
WARNING: program timed out.
WARNING: program timed out.
WARNING: program timed out.
Jan Hubicka Nov. 18, 2012, 4:52 p.m. UTC | #3
> > this patch reduces max-peeled-insns and max-completely-peeled-insns from 400
> > to 100.  The reason why I am doing this is that I want to reduce code bloat
> > caused by my cunroll work that enabled a lot more unrolling then previously
> > causing considerable code size regression at -O3.
> 
> Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again 
> takes a while to compile, so times out on slow machines:

I did not :(.  I am currently on a trip, but will take a look on tuesday.
If it seems to disturb testing, please just revert the patch for time being.

Honza
Jan Hubicka Nov. 18, 2012, 5:08 p.m. UTC | #4
> > > this patch reduces max-peeled-insns and max-completely-peeled-insns from 400
> > > to 100.  The reason why I am doing this is that I want to reduce code bloat
> > > caused by my cunroll work that enabled a lot more unrolling then previously
> > > causing considerable code size regression at -O3.
> > 
> > Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again 
> > takes a while to compile, so times out on slow machines:
> 
> I did not :(.  I am currently on a trip, but will take a look on tuesday.
> If it seems to disturb testing, please just revert the patch for time being.

OK, here are multiple issues.
1) recursive inlining makes huge loop nest (of 18 loops)
2) SCEV is very slow on answering simple_iv tests in this case becuase it walks
   the nest
3) unroller is computing loop body size even when it is clear the body is much larger
   than the limit (the outer loop has 78000 instructions)

I will prepare patches to fix those issues. 

Honza
> 
> Honza
Eric Botcazou Nov. 18, 2012, 6:45 p.m. UTC | #5
> OK, here are multiple issues.
> 1) recursive inlining makes huge loop nest (of 18 loops)
> 2) SCEV is very slow on answering simple_iv tests in this case becuase it
> walks the nest
> 3) unroller is computing loop body size even when it is clear the body is
> much larger than the limit (the outer loop has 78000 instructions)
> 
> I will prepare patches to fix those issues.

Thanks for the analysis (and don't worry, I won't revert anything :-).
Hans-Peter Nilsson Nov. 23, 2012, 6:45 p.m. UTC | #6
On Sun, 18 Nov 2012, Jan Hubicka wrote:
> > > > this patch reduces max-peeled-insns and max-completely-peeled-insns from 400
> > > > to 100.  The reason why I am doing this is that I want to reduce code bloat
> > > > caused by my cunroll work that enabled a lot more unrolling then previously
> > > > causing considerable code size regression at -O3.
> > >
> > > Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again
> > > takes a while to compile, so times out on slow machines:
> >
> > I did not :(.  I am currently on a trip, but will take a look on tuesday.
> > If it seems to disturb testing, please just revert the patch for time being.
>
> OK, here are multiple issues.
> 1) recursive inlining makes huge loop nest (of 18 loops)
> 2) SCEV is very slow on answering simple_iv tests in this case becuase it walks
>    the nest
> 3) unroller is computing loop body size even when it is clear the body is much larger
>    than the limit (the outer loop has 78000 instructions)
>
> I will prepare patches to fix those issues.

The recent (well, a week ago) params.def change also regressed
gfortran.dg/reassoc_4.f almost everywhere; see PR55452.

I guess a fix is fairly trivial, I just don't know to what.

brgds, H-P
Jan Hubicka Nov. 24, 2012, 7:47 a.m. UTC | #7
> On Sun, 18 Nov 2012, Jan Hubicka wrote:
> > > > > this patch reduces max-peeled-insns and max-completely-peeled-insns from 400
> > > > > to 100.  The reason why I am doing this is that I want to reduce code bloat
> > > > > caused by my cunroll work that enabled a lot more unrolling then previously
> > > > > causing considerable code size regression at -O3.
> > > >
> > > > Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again
> > > > takes a while to compile, so times out on slow machines:
> > >
> > > I did not :(.  I am currently on a trip, but will take a look on tuesday.
> > > If it seems to disturb testing, please just revert the patch for time being.
> >
> > OK, here are multiple issues.
> > 1) recursive inlining makes huge loop nest (of 18 loops)
> > 2) SCEV is very slow on answering simple_iv tests in this case becuase it walks
> >    the nest
> > 3) unroller is computing loop body size even when it is clear the body is much larger
> >    than the limit (the outer loop has 78000 instructions)
> >
> > I will prepare patches to fix those issues.
> 
> The recent (well, a week ago) params.def change also regressed
> gfortran.dg/reassoc_4.f almost everywhere; see PR55452.
> 
> I guess a fix is fairly trivial, I just don't know to what.

Yes, we siply should add explicit unrolling limits there, I believe I posted a patch?
I am currently on a way, I will look up the message and/or post it.

Honza
> 
> brgds, H-P
diff mbox

Patch

Index: params.def
===================================================================
--- params.def	(revision 193505)
+++ params.def	(working copy)
@@ -290,7 +290,7 @@  DEFPARAM(PARAM_MAX_UNROLL_TIMES,
 DEFPARAM(PARAM_MAX_PEELED_INSNS,
 	"max-peeled-insns",
 	"The maximum number of insns of a peeled loop",
-	400, 0, 0)
+	100, 0, 0)
 /* The maximum number of peelings of a single loop.  */
 DEFPARAM(PARAM_MAX_PEEL_TIMES,
 	"max-peel-times",
@@ -305,7 +305,7 @@  DEFPARAM(PARAM_MAX_PEEL_BRANCHES,
 DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
 	"max-completely-peeled-insns",
 	"The maximum number of insns of a completely peeled loop",
-	400, 0, 0)
+	100, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely.  */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
 	"max-completely-peel-times",