Message ID | 588B2700.9070305@foss.arm.com |
---|---|
State | New |
Headers | show |
> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: > > > On 26/01/17 20:56, Andrew Pinski wrote: >> Hi, >> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and >> -mcpu=thunderxt88p1. I filled out the tuning structures for both >> thunderx and thunderx2t99. No other core current enables software >> prefetching so I set them to 0 which does not change the default >> parameters. >> >> OK? Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX >> CN88xx with no regressions. I got a 2x improvement for 462.libquantum >> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast. >> CN99xx's SPEC did not change. >> >> Thanks, >> Andrew Pinski >> >> ChangeLog: >> * config/aarch64/aarch64-protos.h (struct tune_params): Add >> prefetch_latency, simultaneous_prefetches, l1_cache_size, and >> l2_cache_size fields. >> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW. >> * config/aarch64/aarch64.c (generic_tunings): Update to include >> prefetch_latency, simultaneous_prefetches, l1_cache_size, and >> l2_cache_size fields to 0. >> (cortexa35_tunings): Likewise. >> (cortexa53_tunings): Likewise. >> (cortexa57_tunings): Likewise. >> (cortexa72_tunings): Likewise. >> (cortexa73_tunings): Likewise. >> (exynosm1_tunings): Likewise. >> (thunderx_tunings): Fill out some of the new fields. >> (thunderxt88_tunings): New variable. >> (xgene1_tunings): Update to include prefetch_latency, >> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0. >> (qdf24xx_tunings): Likewise. >> (thunderx2t99_tunings): Fill out some of the new fields. >> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like >> AUTOPREFETCHER_OFF. >> Set param values if the fields are non-zero. Turn on >> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at >> least 3 or profile feed usage is enabled. >> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning. >> (thunderxt88): Likewise. > > --- config/aarch64/aarch64-protos.h (revision 244917) > +++ config/aarch64/aarch64-protos.h (working copy) > @@ -220,10 +220,19 @@ struct tune_params > unsigned int max_case_values; > /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default. */ > unsigned int cache_line_size; > + /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default. */ > + unsigned int prefetch_latency; > + /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default. */ > + unsigned int simultaneous_prefetches; > + /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default. */ > + unsigned int l1_cache_size; > + /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default. */ > + unsigned int l2_cache_size; > > Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters > (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent > number of them and they're all related. Feel free to copy-paste from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html , which is a copy-paste from aarch32 backend anyway ;-). -- Maxim Kuvyrkov www.linaro.org
On Mon, Jan 30, 2017 at 4:14 AM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote: >> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: >> >> >> On 26/01/17 20:56, Andrew Pinski wrote: >>> Hi, >>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and >>> -mcpu=thunderxt88p1. I filled out the tuning structures for both >>> thunderx and thunderx2t99. No other core current enables software >>> prefetching so I set them to 0 which does not change the default >>> parameters. >>> >>> OK? Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX >>> CN88xx with no regressions. I got a 2x improvement for 462.libquantum >>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast. >>> CN99xx's SPEC did not change. >>> >>> Thanks, >>> Andrew Pinski >>> >>> ChangeLog: >>> * config/aarch64/aarch64-protos.h (struct tune_params): Add >>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and >>> l2_cache_size fields. >>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW. >>> * config/aarch64/aarch64.c (generic_tunings): Update to include >>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and >>> l2_cache_size fields to 0. >>> (cortexa35_tunings): Likewise. >>> (cortexa53_tunings): Likewise. >>> (cortexa57_tunings): Likewise. >>> (cortexa72_tunings): Likewise. >>> (cortexa73_tunings): Likewise. >>> (exynosm1_tunings): Likewise. >>> (thunderx_tunings): Fill out some of the new fields. >>> (thunderxt88_tunings): New variable. >>> (xgene1_tunings): Update to include prefetch_latency, >>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0. >>> (qdf24xx_tunings): Likewise. >>> (thunderx2t99_tunings): Fill out some of the new fields. >>> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like >>> AUTOPREFETCHER_OFF. >>> Set param values if the fields are non-zero. Turn on >>> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at >>> least 3 or profile feed usage is enabled. >>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning. >>> (thunderxt88): Likewise. >> >> --- config/aarch64/aarch64-protos.h (revision 244917) >> +++ config/aarch64/aarch64-protos.h (working copy) >> @@ -220,10 +220,19 @@ struct tune_params >> unsigned int max_case_values; >> /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default. */ >> unsigned int cache_line_size; >> + /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default. */ >> + unsigned int prefetch_latency; >> + /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default. */ >> + unsigned int simultaneous_prefetches; >> + /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default. */ >> + unsigned int l1_cache_size; >> + /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default. */ >> + unsigned int l2_cache_size; >> >> Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters >> (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent >> number of them and they're all related. > > Feel free to copy-paste from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html , which is a copy-paste from aarch32 backend anyway ;-). I am not a fan of this macro ... > > -- > Maxim Kuvyrkov > www.linaro.org > >
--- config/aarch64/aarch64-protos.h (revision 244917) +++ config/aarch64/aarch64-protos.h (working copy) @@ -220,10 +220,19 @@ struct tune_params unsigned int max_case_values; /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default. */ unsigned int cache_line_size; + /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default. */ + unsigned int prefetch_latency; + /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default. */ + unsigned int simultaneous_prefetches; + /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default. */ + unsigned int l1_cache_size; + /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default. */ + unsigned int l2_cache_size; Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent number of them and they're all related. Thanks, Kyrill