diff mbox

[PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

Message ID 588B2700.9070305@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Jan. 27, 2017, 10:54 a.m. UTC
On 26/01/17 20:56, Andrew Pinski wrote:
> Hi,
>    This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
> thunderx and thunderx2t99.  No other core current enables software
> prefetching so I set them to 0 which does not change the default
> parameters.
>
> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
> CN99xx's SPEC did not change.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
> l2_cache_size fields.
> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
> * config/aarch64/aarch64.c (generic_tunings): Update to include
> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
> l2_cache_size fields to 0.
> (cortexa35_tunings): Likewise.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (cortexa72_tunings): Likewise.
> (cortexa73_tunings): Likewise.
> (exynosm1_tunings): Likewise.
> (thunderx_tunings): Fill out some of the new fields.
> (thunderxt88_tunings): New variable.
> (xgene1_tunings): Update to include prefetch_latency,
> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
> (qdf24xx_tunings): Likewise.
> (thunderx2t99_tunings): Fill out some of the new fields.
> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like
> AUTOPREFETCHER_OFF.
> Set param values if the fields are non-zero.  Turn on
> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at
> least 3 or profile feed usage is enabled.
> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
> (thunderxt88): Likewise.

Comments

Maxim Kuvyrkov Jan. 30, 2017, 12:14 p.m. UTC | #1
> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> 
> 
> On 26/01/17 20:56, Andrew Pinski wrote:
>> Hi,
>>   This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>> thunderx and thunderx2t99.  No other core current enables software
>> prefetching so I set them to 0 which does not change the default
>> parameters.
>> 
>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>> CN99xx's SPEC did not change.
>> 
>> Thanks,
>> Andrew Pinski
>> 
>> ChangeLog:
>> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields.
>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
>> * config/aarch64/aarch64.c (generic_tunings): Update to include
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields to 0.
>> (cortexa35_tunings): Likewise.
>> (cortexa53_tunings): Likewise.
>> (cortexa57_tunings): Likewise.
>> (cortexa72_tunings): Likewise.
>> (cortexa73_tunings): Likewise.
>> (exynosm1_tunings): Likewise.
>> (thunderx_tunings): Fill out some of the new fields.
>> (thunderxt88_tunings): New variable.
>> (xgene1_tunings): Update to include prefetch_latency,
>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
>> (qdf24xx_tunings): Likewise.
>> (thunderx2t99_tunings): Fill out some of the new fields.
>> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like
>> AUTOPREFETCHER_OFF.
>> Set param values if the fields are non-zero.  Turn on
>> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at
>> least 3 or profile feed usage is enabled.
>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
>> (thunderxt88): Likewise.
> 
> --- config/aarch64/aarch64-protos.h	(revision 244917)
> +++ config/aarch64/aarch64-protos.h	(working copy)
> @@ -220,10 +220,19 @@ struct tune_params
>   unsigned int max_case_values;
>   /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
>   unsigned int cache_line_size;
> +  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
> +  unsigned int prefetch_latency;
> +  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
> +  unsigned int simultaneous_prefetches;
> +  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l1_cache_size;
> +  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l2_cache_size;
> 
> Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters
> (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent
> number of them and they're all related.

Feel free to copy-paste from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html  , which is a copy-paste from aarch32 backend anyway ;-).

--
Maxim Kuvyrkov
www.linaro.org
Andrew Pinski Jan. 30, 2017, 4:06 p.m. UTC | #2
On Mon, Jan 30, 2017 at 4:14 AM, Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>>
>>
>> On 26/01/17 20:56, Andrew Pinski wrote:
>>> Hi,
>>>   This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>>> thunderx and thunderx2t99.  No other core current enables software
>>> prefetching so I set them to 0 which does not change the default
>>> parameters.
>>>
>>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>> CN99xx's SPEC did not change.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>>> l2_cache_size fields.
>>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
>>> * config/aarch64/aarch64.c (generic_tunings): Update to include
>>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>>> l2_cache_size fields to 0.
>>> (cortexa35_tunings): Likewise.
>>> (cortexa53_tunings): Likewise.
>>> (cortexa57_tunings): Likewise.
>>> (cortexa72_tunings): Likewise.
>>> (cortexa73_tunings): Likewise.
>>> (exynosm1_tunings): Likewise.
>>> (thunderx_tunings): Fill out some of the new fields.
>>> (thunderxt88_tunings): New variable.
>>> (xgene1_tunings): Update to include prefetch_latency,
>>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
>>> (qdf24xx_tunings): Likewise.
>>> (thunderx2t99_tunings): Fill out some of the new fields.
>>> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like
>>> AUTOPREFETCHER_OFF.
>>> Set param values if the fields are non-zero.  Turn on
>>> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at
>>> least 3 or profile feed usage is enabled.
>>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
>>> (thunderxt88): Likewise.
>>
>> --- config/aarch64/aarch64-protos.h   (revision 244917)
>> +++ config/aarch64/aarch64-protos.h   (working copy)
>> @@ -220,10 +220,19 @@ struct tune_params
>>   unsigned int max_case_values;
>>   /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
>>   unsigned int cache_line_size;
>> +  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
>> +  unsigned int prefetch_latency;
>> +  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
>> +  unsigned int simultaneous_prefetches;
>> +  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
>> +  unsigned int l1_cache_size;
>> +  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
>> +  unsigned int l2_cache_size;
>>
>> Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters
>> (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent
>> number of them and they're all related.
>
> Feel free to copy-paste from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html  , which is a copy-paste from aarch32 backend anyway ;-).

I am not a fan of this macro ...

>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
diff mbox

Patch

--- config/aarch64/aarch64-protos.h	(revision 244917)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -220,10 +220,19 @@  struct tune_params
    unsigned int max_case_values;
    /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
    unsigned int cache_line_size;
+  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
+  unsigned int prefetch_latency;
+  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
+  unsigned int simultaneous_prefetches;
+  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
+  unsigned int l1_cache_size;
+  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
+  unsigned int l2_cache_size;

Not a blocker to the patch but I wonder whether it would be a good idea to group these prefetch-related parameters
(plus autoprefetcher_model) into a new nested struct here (prefetch_tunings or something) since there's a decent
number of them and they're all related.

Thanks,
Kyrill