Message ID | VI1PR0802MB262138504A5473B8286763C783210@VI1PR0802MB2621.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > Hi, > > Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64. > In https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined > the best value for AArch64 code generation. Although this setting is used when > explicitly targeting Cortex cores, it is not otherwise used. This means by > default GCC will not use (F)CSEL in many common cases. Most code is built > without -mcpu= and thus doesn't use CSEL like this example from GLIBC: > > strtok: > stp x29, x30, [sp, -48]! > add x29, sp, 0 > stp x21, x22, [sp, 32] > mov x21, x1 > stp x19, x20, [sp, 16] > adrp x22, .LANCHOR0 > mov x19, x0 > cbz x0, .L12 > .L2: ldrb w0, [x19] > > .L12: > ldr x19, [x22, #:lo12:.LANCHOR0] > b .L2 > > With -mcpu=cortex-a57 GCC generates: > > stp x29, x30, [sp, -48]! > cmp x0, 0 > add x29, sp, 0 > stp x21, x22, [sp, 32] > adrp x21, .LANCHOR0 > stp x19, x20, [sp, 16] > mov x19, x0 > ldr x0, [x21, #:lo12:.LANCHOR0] > csel x19, x0, x19, eq > ldrb w0, [x19] > > This is generally faster and smaller. On one benchmark the new setting fixes a > regression since GCC6 and improves performance by 49%. So I propose to change > generic_branch_cost to be the same as cortexa57_branch_cost so that all supported > cores benefit equally from CSEL. Are there any objections to this? I have no objections. In fact thunderx2t99's branch_cost is 1,3. I had not looked into improving thunderx branch cost yet but that might be because I have local patches that improve phiopt for doing ifcvt earlier. Also my phiopt change does not have a cost model either so using csel more is good for thunderx 1 and ThunderX 2. Thanks, Andrew > > Wilco > > > ChangeLog: > 2017-03-09 Wilco Dijkstra <wdijkstr@arm.com> > > * config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost. > -- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost = > /* Generic costs for branch instructions. */ > static const struct cpu_branch_cost generic_branch_cost = > { > - 2, /* Predictable. */ > - 2 /* Unpredictable. */ > + 1, /* Predictable. */ > + 3 /* Unpredictable. */ > }; > > /* Branch costs for Cortex-A57. */
On Thu, Mar 09, 2017 at 02:06:16PM -0800, Andrew Pinski wrote: > On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > > Hi, > > > > Recently we've put a lot of effort into improving ifcvt to use CSEL on > > AArch64. In https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html > > James determined the best value for AArch64 code generation. This was before the rewrite to the ifcvt costs which I made earlier in the GCC 7 release cycle. But I think 1,3 is about right, and I'd be happy to see us take that direction for "generic". I'd like to hear comments from the Exynos-M1, Falkor and xgene-1 subtarget contributors, particularly as these targets use generic_branch_costs for their subtarget-sepcific tuning. It may be that your patch needs to preserve the 2,2 setting for such cores even if the generic target does move to 1,3. At this stage in the release, this patch will have to wait for GCC 8 regardless of any comments received. I'd suggest that when we do think about this for GCC 8, we might want to take a wider look at the "generic" tunings, any opinions from other subtarget contributors, or the other AArch64 maintainers as to further changes they would advocate for would be welcome. > I have no objections. In fact thunderx2t99's branch_cost is 1,3. I > had not looked into improving thunderx branch cost yet but that might > be because I have local patches that improve phiopt for doing ifcvt > earlier. Also my phiopt change does not have a cost model either so > using csel more is good for thunderx 1 and ThunderX 2. Thanks for the comments, James
On Tue, Mar 14, 2017 at 2:37 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote: > I'd like to hear comments from the Exynos-M1, Falkor and > xgene-1 subtarget contributors, particularly as these targets use > generic_branch_costs for their subtarget-sepcific tuning. It may be that > your patch needs to preserve the 2,2 setting for such cores even if the > generic target does move to 1,3. I was at Linaro Connect last week of course. I took a look at this issue this week. I don't see any measurable performance change on SPEC CPU2006 for falkor, so the change looks OK to me. In general, I'm not too concerned about changes like this, as I'm watching the FSF GCC tree, and will make appropriate changes to the falkor tuning structure as necessary to maintain good performance. Jim
On 09/03/17 14:42, Wilco Dijkstra wrote: > Hi, > > Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64. > In https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined > the best value for AArch64 code generation. Although this setting is used when > explicitly targeting Cortex cores, it is not otherwise used. This means by > default GCC will not use (F)CSEL in many common cases. Most code is built > without -mcpu= and thus doesn't use CSEL like this example from GLIBC: > > strtok: > stp x29, x30, [sp, -48]! > add x29, sp, 0 > stp x21, x22, [sp, 32] > mov x21, x1 > stp x19, x20, [sp, 16] > adrp x22, .LANCHOR0 > mov x19, x0 > cbz x0, .L12 > .L2: ldrb w0, [x19] > > .L12: > ldr x19, [x22, #:lo12:.LANCHOR0] > b .L2 > > With -mcpu=cortex-a57 GCC generates: > > stp x29, x30, [sp, -48]! > cmp x0, 0 > add x29, sp, 0 > stp x21, x22, [sp, 32] > adrp x21, .LANCHOR0 > stp x19, x20, [sp, 16] > mov x19, x0 > ldr x0, [x21, #:lo12:.LANCHOR0] > csel x19, x0, x19, eq > ldrb w0, [x19] > > This is generally faster and smaller. On one benchmark the new setting fixes a > regression since GCC6 and improves performance by 49%. So I propose to change > generic_branch_cost to be the same as cortexa57_branch_cost so that all supported > cores benefit equally from CSEL. Are there any objections to this? > > Wilco > > > ChangeLog: > 2017-03-09 Wilco Dijkstra <wdijkstr@arm.com> > > * config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost. This is OK. We already have a number of cores using these values so I don't think this is likely to be a risky change even in stage 4. R. > -- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost = > /* Generic costs for branch instructions. */ > static const struct cpu_branch_cost generic_branch_cost = > { > - 2, /* Predictable. */ > - 2 /* Unpredictable. */ > + 1, /* Predictable. */ > + 3 /* Unpredictable. */ > }; > > /* Branch costs for Cortex-A57. */ >
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost = /* Generic costs for branch instructions. */ static const struct cpu_branch_cost generic_branch_cost = { - 2, /* Predictable. */ - 2 /* Unpredictable. */ + 1, /* Predictable. */ + 3 /* Unpredictable. */ }; /* Branch costs for Cortex-A57. */