Increase min-lto-partition.

Message ID cc527602-2ed8-79ad-f082-e0e0279aa310@suse.cz
State New

Commit Message

Martin Liška March 13, 2020, 2:25 p.m. UTC
Hi.

I experimented with -flinker-output=nolto-rel on gimple-match.ii and found
that the current default of lto-min-partition leads to too many LTRANS units. We pay
for that with LTO overhead, so the user time is high.

Base is:
$ time g++ -O2 /tmp/gimple-match.ii -c -fno-checking -o x.o
real	0m40.130s
user	0m39.911s

LGEN:

$ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
real	0m8.709s
user	0m8.543s

WPA+LTRANS:

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=4  -fno-checking
real	0m11.220s
user	0m33.067s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=6  -fno-checking
real	0m9.880s
user	0m35.599s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=8  -fno-checking
real	0m6.681s
user	0m39.746s

default:
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o -fno-checking
real	0m6.065s
user	1m22.698s

So I would recommend setting the param value to 75000, which leads to 6 partitions. That would be:

9+10s = 19s vs. 40s of real time (total user time 44s). That seems reasonable to me.
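
The arithmetic behind that comparison can be checked with a small sketch. It assumes that the 9s and 10s are the rounded real times of the LGEN run and of the WPA+LTRANS run with lto-partitions=6, and that the 40s is the non-LTO baseline; the variable names are only illustrative.

```python
# Timings copied from the measurements above, in seconds.
baseline_real = 40.130                     # non-LTO compile of gimple-match.ii
lgen_real, lgen_user = 8.709, 8.543        # LGEN (-flto -c)
ltrans_real, ltrans_user = 9.880, 35.599   # WPA+LTRANS with lto-partitions=6

# Assumption: the two LTO stages run one after the other, so times add up.
lto_real = lgen_real + ltrans_real
lto_user = lgen_user + ltrans_user

print(f"LTO real: {lto_real:.1f}s vs. baseline {baseline_real:.1f}s")
print(f"LTO user: {lto_user:.1f}s")
print(f"real-time speedup: {baseline_real / lto_real:.2f}x")
```

Under these assumptions the summed user time of the two LTO stages comes out close to 44s, while the summed real time is close to 19s.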

Thoughts?
Thanks,
Martin

gcc/ChangeLog:

2020-03-13  Martin Liska  <mliska@suse.cz>

	* params.opt: Bump min-lto-partition in order to not create
	too many LTRANS.
---
  gcc/params.opt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Hubicka March 13, 2020, 2:50 p.m. UTC | #1
> Hi.
> 
> I played a bit with -flinker-output=nolto-rel of gimple-match.ii and I identified
> that current default of min-lto-partition leads to too many LTRANS. We pay with
> LTO overhead and so that user time is high.
> 
> Base is:
> $ g++ -O2 /tmp/gimple-match.ii -c -fno-checking -o x.o
> real	0m40.130s
> user	0m39.911s

Did you configure the compiler with checking? If so, I think the benchmarks
are not that good, because -fchecking does not control everything.
It would be relevant for the gcc bootstrap but not much else. In that case I
would go with an explicit --param in our Makefile.

I tried your experiment by linking tramp3d (with -O2 -flto on EPYC):
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=10000

real    0m12.574s
user    1m5.010s
sys     0m0.970s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=20000

real    0m17.926s
user    1m1.259s
sys     0m1.153s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=30000

real    0m22.115s
user    0m56.964s
sys     0m0.892s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=40000

real    0m23.510s
user    0m50.783s
sys     0m0.983s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=50000

real    0m28.410s
user    0m46.146s
sys     0m0.680s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=60000

real    0m32.304s
user    0m46.114s
sys     0m0.720s
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=70000

real    0m42.332s
user    0m50.521s
sys     0m0.749s

So going from 10000 to 70000 seems to decrease user time from 65s to 50s
(about a 22% reduction); however, the overall link time goes up from 12s to 42s
(about 3.4 times).

That does not seem like a great tradeoff. Moreover, I seem to get the best
results with:
hubicka@lomikamen-jh:~$ time /aux/hubicka/trunk-install/bin/g++
-flto=auto tramp3d-v44.o --param lto-min-partition=1 --param
lto-partitions=200

real    0m5.752s
user    1m8.949s
sys     0m3.826s
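
The tradeoff in the lto-min-partition sweep above can be tabulated relative to the 10000 run. This is only a sketch over the timings quoted in this message; the `runs` list and the variable names are illustrative.

```python
# (lto-min-partition, real seconds, user seconds) from the tramp3d runs above.
runs = [
    (10000, 12.574, 65.010),
    (20000, 17.926, 61.259),
    (30000, 22.115, 56.964),
    (40000, 23.510, 50.783),
    (50000, 28.410, 46.146),
    (60000, 32.304, 46.114),
    (70000, 42.332, 50.521),
]

# Use the first run (lto-min-partition=10000) as the reference point.
_, base_real, base_user = runs[0]
for size, real, user in runs:
    user_saved = 100.0 * (base_user - user) / base_user  # percent of user time saved
    real_factor = real / base_real                       # slowdown in wall-clock time
    print(f"{size:>6}: user -{user_saved:4.1f}%  real x{real_factor:.2f}")
```

Each row shows how much CPU time is saved and how much longer the link takes in wall-clock terms, which is the quantity a developer waiting for the link actually sees.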

Both genmatch and tramp3d seem to be somewhat extreme sources, but perhaps we want
to explore this a bit further.
I will try to re-measure your results on my setup so we get an idea of how
sensitive it is :)

Honza

> 
> LGEN:
> 
> $ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
> real	0m8.709s
> user	0m8.543s
> 
> WPA+LTRANS:
> 
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=4  -fno-checking
> real	0m11.220s
> user	0m33.067s
> 
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=6  -fno-checking
> real	0m9.880s
> user	0m35.599s
> 
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=8  -fno-checking
> real	0m6.681s
> user	0m39.746s
> 
> default:
> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o -fno-checking
> real	0m6.065s
> user	1m22.698s
> 
> So I would recommend to set the param value to 75000, which leads to 6 partitions. That would be:
> 
> 9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
> 
> Thoughts?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2020-03-13  Martin Liska  <mliska@suse.cz>
> 
> 	* params.opt: Bump min-lto-partition in order to not create
> 	too many LTRANS.
> ---
>  gcc/params.opt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 

> diff --git a/gcc/params.opt b/gcc/params.opt
> index e39216aa7d0..49fafac20af 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
>  maximal number of LTO partitions streamed in parallel.
>  
>  -param=lto-min-partition=
> -Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
> +Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
>  Minimal size of a partition for LTO (in estimated instructions).
>  
>  -param=lto-partitions=
>
Jan Hubicka March 13, 2020, 3:11 p.m. UTC | #2
> > $ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
> > real	0m8.709s
> > user	0m8.543s
> > 
> > WPA+LTRANS:
> > 
> > $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=4  -fno-checking
> > real	0m11.220s
> > user	0m33.067s
> > 
> > $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=6  -fno-checking
> > real	0m9.880s
> > user	0m35.599s
> > 
> > $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=8  -fno-checking
> > real	0m6.681s
> > user	0m39.746s
> > 
> > default:
> > $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o -fno-checking
> > real	0m6.065s
> > user	1m22.698s

I did
/aux/hubicka/trunk-git/build2/./prev-gcc/xg++ -B/aux/hubicka/trunk-git/build2/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/aux/hubicka/trunk-git/libstdc++-v3/libsupc++ -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c   -g -O2 -fchecking=0  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/.  -I../../gcc/../include -I../../gcc/../libcpp/include -I/aux/hubicka/trunk-git/build2/./gmp -I/aux/hubicka/trunk-git/gmp -I/aux/hubicka/trunk-git/build2/./mpfr/src -I/aux/hubicka/trunk-git/mpfr/src -I/aux/hubicka/trunk-git/mpc/src -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -I/aux/hubicka/trunk-git/build2/./isl/include -I/aux/hubicka/trunk-git/isl/include  -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c -flto

(copied from the build, with checking disabled and -flto added) and I get:
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r

real    0m10.394s
user    2m13.809s
sys     0m3.896s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r

real    0m21.033s
user    2m3.063s
sys     0m2.539s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r

real    0m23.975s
user    1m56.139s
sys     0m2.595s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r

real    0m32.383s
user    1m39.411s
sys     0m2.213s

With debug info disabled (as in your runs, though I guess that is a less
realistic setting) I get:

hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=128 -r

real    0m10.905s
user    1m55.065s
sys     0m2.956s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=8 -r

real    0m17.297s
user    1m26.513s
sys     0m1.626s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=6 -r

real    0m22.365s
user    1m30.969s
sys     0m1.386s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=4 -r

real    0m26.534s
user    1m21.593s
sys     0m0.902s

So I do not see such a notable difference in user times (but they are
consistently worse than yours). Could you perhaps try to perf it,
including the system profile? It may give us some idea why things behave
differently.
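
To put a number on that difference, the following sketch compares how much the user time grows between few and many partitions in each setup. The timings are copied from this thread. Note that the third entry compares 4 partitions against the default rather than against lto-partitions=128, because that is what the original message measured. The `seconds` helper and the labels are illustrative.

```python
def seconds(stamp: str) -> float:
    """Convert a `time` stamp such as '1m39.411s' to seconds."""
    minutes, rest = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)

# User times as (few partitions, many partitions), copied from the runs above.
setups = {
    "Honza, -g, 4 vs. 128": ("1m39.411s", "2m13.809s"),
    "Honza, no debug info, 4 vs. 128": ("1m21.593s", "1m55.065s"),
    "Martin, 4 vs. default": ("0m33.067s", "1m22.698s"),
}

growth = {}
for name, (few, many) in setups.items():
    # Ratio > 1 means more partitions cost more total CPU time.
    growth[name] = seconds(many) / seconds(few)
    print(f"{name}: user time x{growth[name]:.2f}")
```

A ratio close to 1 means that splitting into more partitions costs little extra CPU time, while a large ratio points at per-partition overhead.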

The compiler binary I use is built with profiledbootstrap and LTO.

Honza
> > 
> > So I would recommend to set the param value to 75000, which leads to 6 partitions. That would be:
> > 
> > 9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
> > 
> > Thoughts?
> > Thanks,
> > Martin
> > 
> > gcc/ChangeLog:
> > 
> > 2020-03-13  Martin Liska  <mliska@suse.cz>
> > 
> > 	* params.opt: Bump min-lto-partition in order to not create
> > 	too many LTRANS.
> > ---
> >  gcc/params.opt | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > 
> 
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index e39216aa7d0..49fafac20af 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
> >  maximal number of LTO partitions streamed in parallel.
> >  
> >  -param=lto-min-partition=
> > -Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
> > +Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
> >  Minimal size of a partition for LTO (in estimated instructions).
> >  
> >  -param=lto-partitions=
> > 
>
Martin Liška March 13, 2020, 3:25 p.m. UTC | #3
On 3/13/20 4:11 PM, Jan Hubicka wrote:
>>> $ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
>>> real	0m8.709s
>>> user	0m8.543s
>>>
>>> WPA+LTRANS:
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=4  -fno-checking
>>> real	0m11.220s
>>> user	0m33.067s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=6  -fno-checking
>>> real	0m9.880s
>>> user	0m35.599s
>>>
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o --param lto-partitions=8  -fno-checking
>>> real	0m6.681s
>>> user	0m39.746s
>>>
>>> default:
>>> $ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o  -r -o gimple-match2.o -fno-checking
>>> real	0m6.065s
>>> user	1m22.698s
> 
> I did
> /aux/hubicka/trunk-git/build2/./prev-gcc/xg++ -B/aux/hubicka/trunk-git/build2/./prev-gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include -I/aux/hubicka/trunk-git/libstdc++-v3/libsupc++ -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs -L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -fno-PIE -c   -g -O2 -fchecking=0  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. -I../../gcc -I../../gcc/.  -I../../gcc/../include -I../../gcc/../libcpp/include -I/aux/hubicka/trunk-git/build2/./gmp -I/aux/hubicka/trunk-git/gmp -I/aux/hubicka/trunk-git/build2/./mpfr/src -I/aux/hubicka/trunk-git/mpfr/src -I/aux/hubicka/trunk-git/mpc/src -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace -I/aux/hubicka/trunk-git/build2/./isl/include -I/aux/hubicka/trunk-git/isl/include  -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c -flto
> 
> (copying from build disabling checking and adding -flto) and I get:
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r
> 
> real    0m10.394s
> user    2m13.809s
> sys     0m3.896s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r
> 
> real    0m21.033s
> user    2m3.063s
> sys     0m2.539s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=6 -r
> 
> real    0m23.975s
> user    1m56.139s
> sys     0m2.595s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r
> 
> real    0m32.383s
> user    1m39.411s
> sys     0m2.213s
> 
> With debug info disabled (like you do, but I guess in less realistic
> setting) I get:
> 
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
> /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
> gimple-match.o -fno-checking --param lto-partitions=128 -r
> 
> real    0m10.905s
> user    1m55.065s
> sys     0m2.956s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
> /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
> gimple-match.o -fno-checking --param lto-partitions=8 -r
> 
> real    0m17.297s
> user    1m26.513s
> sys     0m1.626s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
> /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
> gimple-match.o -fno-checking --param lto-partitions=6 -r
> 
> real    0m22.365s
> user    1m30.969s
> sys     0m1.386s
> hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
> /aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
> gimple-match.o -fno-checking --param lto-partitions=4 -r
> 
> real    0m26.534s
> user    1m21.593s
> sys     0m0.902s
> 
> So I do not see such notable difference in user times (but they are
> consistently worse than yours). Perhaps, can you try to perf it
> including the system profile? It may give us some idea why things behave
> differently.

That's strange. So let's take my gimple-match.ii:
https://drive.google.com/file/d/1B8d3bIvz1KA_ksIo8h-JgkaJTCRiSPR4/view?usp=sharing

For gcc9 package (LTO+PGO) I get:

$ time g++ -O2 gimple-match.ii -c -flto
real	0m8.180s
user	0m7.992s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r

real	0m9.041s
user	0m28.157s
sys	0m0.493s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r

real	0m6.011s
user	1m20.326s
sys	0m2.147s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking -r

real	0m6.303s
user	1m18.789s
sys	0m2.244s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r

real	0m5.875s
user	0m38.938s
sys	0m0.784s

For the default I get the following profile:

$ perf report --stdio | head -n30
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 351K of event 'cycles:u'
# Event count (approx.): 341558047686
#
# Overhead  Command          Shared Object                Symbol
# ........  ...............  ...........................  ............................................................................
#
      3.61%  lto1-ltrans      lto1                         [.] df_worklist_dataflow
      1.93%  lto1-ltrans      lto1                         [.] cleanup_cfg
      1.15%  lto1-ltrans      lto1                         [.] init_alias_analysis
      1.02%  lto1-ltrans      lto1                         [.] pre_and_rev_post_order_compute_fn
      0.93%  lto1-ltrans      lto1                         [.] calculate_dominance_info
      0.84%  lto1-ltrans      lto1                         [.] inverted_post_order_compute
      0.75%  lto1-ltrans      lto1                         [.] post_order_compute
      0.71%  lto1-ltrans      libc-2.31.so                 [.] _int_malloc
      0.69%  lto1-ltrans      lto1                         [.] constrain_operands
      0.68%  lto1-ltrans      lto1                         [.] df_bb_refs_record
      0.59%  lto1-ltrans      lto1                         [.] side_effects_p
      0.53%  lto1-ltrans      lto1                         [.] delete_unreachable_blocks
      0.53%  lto1-ltrans      lto1                         [.] rewrite_update_dom_walker::before_dom_children
      0.49%  lto1-ltrans      lto1                         [.] bitmap_set_bit
      0.47%  lto1-ltrans      lto1                         [.] record_temporary_equivalences
      0.46%  lto1-ltrans      lto1                         [.] single_def_use_dom_walker::before_dom_children
      0.46%  lto1-ltrans      lto1                         [.] df_compact_blocks
      0.45%  lto1-ltrans      lto1                         [.] substitute_and_fold_engine::substitute_and_fold
      0.45%  lto1-ltrans      libc-2.31.so                 [.] _int_free


Martin

> 
> Compiler binary I use is profiledbootstrapped with LTO.
> 
> Honza
>>>
>>> So I would recommend to set the param value to 75000, which leads to 6 partitions. That would be:
>>>
>>> 9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
>>>
>>> Thoughts?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-03-13  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* params.opt: Bump min-lto-partition in order to not create
>>> 	too many LTRANS.
>>> ---
>>>   gcc/params.opt | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>>
>>
>>> diff --git a/gcc/params.opt b/gcc/params.opt
>>> index e39216aa7d0..49fafac20af 100644
>>> --- a/gcc/params.opt
>>> +++ b/gcc/params.opt
>>> @@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
>>>   maximal number of LTO partitions streamed in parallel.
>>>   
>>>   -param=lto-min-partition=
>>> -Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
>>> +Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
>>>   Minimal size of a partition for LTO (in estimated instructions).
>>>   
>>>   -param=lto-partitions=
>>>
>>
Martin Liška March 13, 2020, 3:32 p.m. UTC | #4
And on an EPYC2 machine with 64 cores I get:

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r

real	0m11.040s
user	0m33.479s
sys	0m0.718s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r

real	0m6.542s
user	0m39.334s
sys	0m0.945s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r

real	0m4.945s
user	0m59.344s
sys	0m2.475s

So here the user time grows by only about 77% (33.5s with 4 partitions vs. 59.3s with 128).
And the baseline is:

$ time g++ -O2 /tmp/gimple-match.ii -c

real	0m39.783s
user	0m39.385s
sys	0m0.372s


Martin

Patch

diff --git a/gcc/params.opt b/gcc/params.opt
index e39216aa7d0..49fafac20af 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -363,7 +363,7 @@  Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
 maximal number of LTO partitions streamed in parallel.
 
 -param=lto-min-partition=
-Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
+Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
 Minimal size of a partition for LTO (in estimated instructions).
 
 -param=lto-partitions=