PR target/84743 adjust reassociation widths for power8/power9

Message ID 1520902959.6039.23.camel@linux.vnet.ibm.com
State New
Series PR target/84743 adjust reassociation widths for power8/power9

Commit Message

Aaron Sawdey March 13, 2018, 1:02 a.m. UTC
Looking at CPU2017 results for different reassociation widths, things
have shifted since I last looked at this with CPU2006 in the early GCC 7
timeframe. The best option seems to be to set the reassociation width to 1
for all integer modes, which is what the attached patch does.
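
For readers less familiar with the term, here is a minimal C sketch of what
the reassociation width controls. The function names are hypothetical, and
the exact expression shapes GCC produces are not claimed here; the sketch
only shows the idea of one serial dependency chain (width 1) versus two
independent chains (width 2) for the same sum.

```c
/* Width 1: a single serial chain. Every add depends on the previous one. */
unsigned long sum8_width1(const unsigned long v[8])
{
    return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Width 2: two independent chains that are combined by one final add.
   Unsigned arithmetic wraps, so reordering the adds cannot change the
   result. */
unsigned long sum8_width2(const unsigned long v[8])
{
    unsigned long even = ((v[0] + v[2]) + v[4]) + v[6];
    unsigned long odd  = ((v[1] + v[3]) + v[5]) + v[7];
    return even + odd;
}
```

Both functions return the same value. The wider form shortens the critical
path but keeps more values live at once, which is one plausible reason a
large width does not always pay off.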

I also tried setting the width to 1 for PLUS_EXPR in float modes, as this
patch did for aarch64, but that does not seem to help on power8:
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01271.html


Results below are the % performance improvement on power8 of trunk with
the attached patch, relative to trunk with --param tree-reassoc-width=1
(parallel reassociation disabled for everything; first column of results)
and relative to unmodified trunk (second column of results).

CPU2017 component    vs width=1   vs trunk
500.perlbench_r        -0.36%     -0.15%
502.gcc_r               0.06%      0.04%
505.mcf_r               0.32%      0.24%
520.omnetpp_r           0.57%     -0.95%
523.xalancbmk_r         1.45%      1.04%
525.x264_r             -0.05%      0.09%
531.deepsjeng_r         0.04%      0.09%
541.leela_r             0.10%      0.72%
548.exchange2_r         0.08%      0.73%
557.xz_r                0.09%      2.12%
CPU2017 int geo mean    0.23%      0.40%
503.bwaves_r            0.00%      0.01%
507.cactuBSSN_r         0.05%     -0.02%
508.namd_r              0.00%      0.00%
510.parest_r           -0.01%      0.20%
511.povray_r            0.03%     -0.24%
519.lbm_r              -0.04%     -0.16%
521.wrf_r              -0.01%     -0.56%
526.blender_r          -0.82%     -0.47%
527.cam4_r             -0.18%      0.06%
538.imagick_r          -0.02%      0.01%
544.nab_r               0.00%      0.23%
549.fotonik3d_r         0.24%      0.54%
554.roms_r             -0.05%      0.03%
CPU2017 fp geo mean    -0.06%     -0.03%

Bottom line: a net improvement for CPU2017 int compared with either
current trunk or disabling parallel reassociation entirely, and a very
small overall degradation for CPU2017 fp.
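
The geometric-mean rows can be roughly cross-checked from the rounded
per-benchmark figures. The sketch below assumes the mean is taken over
speed ratios (1 + improvement/100); that is an assumption about how the
table was produced, and the reported means were presumably computed from
unrounded data. The names geomean_percent, nth_root and int_vs_trunk are
hypothetical.

```c
#include <stddef.h>

/* n-th root of x (x > 0, n > 0) by bisection, so the sketch links
   without libm. */
static double nth_root(double x, size_t n)
{
    double lo = 0.0;
    double hi = x > 1.0 ? x : 1.0;   /* the root lies in [0, max(x, 1)] */
    for (int iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (size_t i = 0; i < n; i++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Geometric mean, in percent, of per-benchmark improvements in percent. */
double geomean_percent(const double *pct, size_t n)
{
    if (n == 0)
        return 0.0;
    double product = 1.0;
    for (size_t i = 0; i < n; i++)
        product *= 1.0 + pct[i] / 100.0;
    return (nth_root(product, n) - 1.0) * 100.0;
}

/* "vs trunk" column for the ten CPU2017 int benchmarks, from the table. */
const double int_vs_trunk[10] = {
    -0.15, 0.04, 0.24, -0.95, 1.04, 0.09, 0.09, 0.72, 0.73, 2.12
};
```

Calling geomean_percent(int_vs_trunk, 10) should land close to the 0.40%
reported for CPU2017 int against trunk.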

Currently doing regstrap on ppc64le, ok for trunk if results look good?

Thanks!
   Aaron

2018-03-12  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>

	PR target/84743
	* config/rs6000/rs6000.c (rs6000_reassociation_width): Disable parallel
	reassociation for int modes.

Comments

Richard Biener March 13, 2018, 8:14 a.m. UTC | #1
On Mon, 12 Mar 2018, Aaron Sawdey wrote:

> Looking at CPU2017 results for different reassociation widths, things
> have shifted since I last looked at this with CPU2006 in the early GCC 7
> timeframe. The best option seems to be to set the reassociation width to 1
> for all integer modes, which is what the attached patch does.
> 
> I also tried setting the width to 1 for PLUS_EXPR in float modes, as this
> patch did for aarch64, but that does not seem to help on power8:
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01271.html
> 
> 
> Results below are the % performance improvement on power8 of trunk with
> the attached patch, relative to trunk with --param tree-reassoc-width=1
> (parallel reassociation disabled for everything; first column of results)
> and relative to unmodified trunk (second column of results).
> 
> CPU2017 component    vs width=1   vs trunk
> 500.perlbench_r        -0.36%     -0.15%
> 502.gcc_r               0.06%      0.04%
> 505.mcf_r               0.32%      0.24%
> 520.omnetpp_r           0.57%     -0.95%
> 523.xalancbmk_r         1.45%      1.04%
> 525.x264_r             -0.05%      0.09%
> 531.deepsjeng_r         0.04%      0.09%
> 541.leela_r             0.10%      0.72%
> 548.exchange2_r         0.08%      0.73%
> 557.xz_r                0.09%      2.12%
> CPU2017 int geo mean    0.23%      0.40%
> 503.bwaves_r            0.00%      0.01%
> 507.cactuBSSN_r         0.05%     -0.02%
> 508.namd_r              0.00%      0.00%
> 510.parest_r           -0.01%      0.20%
> 511.povray_r            0.03%     -0.24%
> 519.lbm_r              -0.04%     -0.16%
> 521.wrf_r              -0.01%     -0.56%
> 526.blender_r          -0.82%     -0.47%
> 527.cam4_r             -0.18%      0.06%
> 538.imagick_r          -0.02%      0.01%
> 544.nab_r               0.00%      0.23%
> 549.fotonik3d_r         0.24%      0.54%
> 554.roms_r             -0.05%      0.03%
> CPU2017 fp geo mean    -0.06%     -0.03%
> 
> Bottom line: a net improvement for CPU2017 int compared with either
> current trunk or disabling parallel reassociation entirely, and a very
> small overall degradation for CPU2017 fp.
> 
> Currently doing regstrap on ppc64le, ok for trunk if results look good?

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 258101)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -10006,7 +10006,7 @@
       if (VECTOR_MODE_P (mode))
        return 4;
       if (INTEGRAL_MODE_P (mode)) 
-       return opc == MULT_EXPR ? 4 : 6;
+       return 1;
       if (FLOAT_MODE_P (mode))
        return 4;
       break;

so the original widths were very large (IMHO), did you try reducing
width to, say, 2?  In your numbers I see mostly noise but
2% regression for 557.xz_r and 1% for 523.xalancbmk_r.  Maybe
POWER machines give very stable performance measurement results
but from my experience on x86_64 anything < 1% is just noise...

Richard.

> Thanks!
>    Aaron
> 
> 2018-03-12  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
> 
> 	PR target/84743
> 	* config/rs6000/rs6000.c (rs6000_reassociation_width): Disable parallel
> 	reassociation for int modes.
> 
> 
>
Segher Boessenkool March 13, 2018, 4:02 p.m. UTC | #2
On Mon, Mar 12, 2018 at 08:02:39PM -0500, Aaron Sawdey wrote:
> Looking at CPU2017 results for different reassociation widths, things
> have shifted since I last looked at this with CPU2006 in the early GCC 7
> timeframe. The best option seems to be to set the reassociation width to 1
> for all integer modes, which is what the attached patch does.
> 
> I also tried setting the width to 1 for PLUS_EXPR in float modes, as this
> patch did for aarch64, but that does not seem to help on power8:
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01271.html
> 
> 
> Results below are the % performance improvement on power8 of trunk with
> the attached patch, relative to trunk with --param tree-reassoc-width=1
> (parallel reassociation disabled for everything; first column of results)
> and relative to unmodified trunk (second column of results).
> 
> CPU2017 component    vs width=1   vs trunk
> 500.perlbench_r        -0.36%     -0.15%
> 502.gcc_r               0.06%      0.04%
> 505.mcf_r               0.32%      0.24%
> 520.omnetpp_r           0.57%     -0.95%
> 523.xalancbmk_r         1.45%      1.04%
> 525.x264_r             -0.05%      0.09%
> 531.deepsjeng_r         0.04%      0.09%
> 541.leela_r             0.10%      0.72%
> 548.exchange2_r         0.08%      0.73%
> 557.xz_r                0.09%      2.12%
> CPU2017 int geo mean    0.23%      0.40%
> 503.bwaves_r            0.00%      0.01%
> 507.cactuBSSN_r         0.05%     -0.02%
> 508.namd_r              0.00%      0.00%
> 510.parest_r           -0.01%      0.20%
> 511.povray_r            0.03%     -0.24%
> 519.lbm_r              -0.04%     -0.16%
> 521.wrf_r              -0.01%     -0.56%
> 526.blender_r          -0.82%     -0.47%
> 527.cam4_r             -0.18%      0.06%
> 538.imagick_r          -0.02%      0.01%
> 544.nab_r               0.00%      0.23%
> 549.fotonik3d_r         0.24%      0.54%
> 554.roms_r             -0.05%      0.03%
> CPU2017 fp geo mean    -0.06%     -0.03%
> 
> Bottom line: a net improvement for CPU2017 int compared with either
> current trunk or disabling parallel reassociation entirely, and a very
> small overall degradation for CPU2017 fp.
> 
> Currently doing regstrap on ppc64le, ok for trunk if results look good?

Can't argue with a 0.4% win :-)  Okay for trunk.  Thanks!


Segher


> 2018-03-12  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>
> 
> 	PR target/84743
> 	* config/rs6000/rs6000.c (rs6000_reassociation_width): Disable parallel
> 	reassociation for int modes.

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 258101)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -10006,7 +10006,7 @@ 
       if (VECTOR_MODE_P (mode))
 	return 4;
       if (INTEGRAL_MODE_P (mode)) 
-	return opc == MULT_EXPR ? 4 : 6;
+	return 1;
       if (FLOAT_MODE_P (mode))
 	return 4;
       break;
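
As a standalone illustration of what the hunk changes, the sketch below
models the width selection before and after the patch. It is not the GCC
source: the enums mode_kind and op_code are hypothetical stand-ins for the
mode predicates and tree codes, and the fallback width of 1 for other modes
is an assumption, since the hunk does not show what follows the break.

```c
/* Hypothetical stand-ins for GCC's machine-mode classes and tree codes. */
enum mode_kind { KIND_VECTOR, KIND_INTEGRAL, KIND_FLOAT, KIND_OTHER };
enum op_code { OP_PLUS, OP_MULT };

/* Widths before the patch (the '-' line of the hunk). */
int width_before(enum op_code opc, enum mode_kind mode)
{
    if (mode == KIND_VECTOR)
        return 4;
    if (mode == KIND_INTEGRAL)
        return opc == OP_MULT ? 4 : 6;
    if (mode == KIND_FLOAT)
        return 4;
    return 1;   /* assumption: no parallel reassociation otherwise */
}

/* Widths after the patch (the '+' line): integer modes always get 1. */
int width_after(enum op_code opc, enum mode_kind mode)
{
    (void) opc;   /* the operation no longer affects the result */
    if (mode == KIND_VECTOR)
        return 4;
    if (mode == KIND_INTEGRAL)
        return 1;
    if (mode == KIND_FLOAT)
        return 4;
    return 1;   /* assumption, as above */
}
```

Only the integer case differs between the two functions; vector and float
modes keep a width of 4.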