[AArch64] PR79262: Adjust vector cost

Message ID	DB6PR0801MB2053221E018DE366B60EAC1E83EC0@DB6PR0801MB2053.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-471799-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=JuN ezfGpY9Fq2w4ZR7NdG4EshNPaRuBkvtPDwHzZPLGwjdvsL0ZNWovKtIlYhpBt5To P1Tf9caAZ7BwC9GFlJ3EHNQ5ZX6Z+oP9TerEduI87VN9U/0w/vpGjAPQ1cxQtQcm JgMI7H20TIpvApIfaSWG0x0DJIw7B6QM7pqRG10U= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org> CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com> Subject: [PATCH][AArch64] PR79262: Adjust vector cost Date: Mon, 22 Jan 2018 15:01:05 +0000 Message-ID: <DB6PR0801MB2053221E018DE366B60EAC1E83EC0@DB6PR0801MB2053.eurprd08.prod.outlook.com> nodisclaimer: True received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: b42oJ82Gj9SqVQt8szajow8V7uBJ8Y6HK2+soBoYVAmeRAQ3r9fElMf/AU6mZcys+SAPWlqEH6DgEy4NRWq2Pw== spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0
Series	[AArch64] PR79262: Adjust vector cost \| expand [AArch64] PR79262: Adjust vector cost

Message ID

DB6PR0801MB2053221E018DE366B60EAC1E83EC0@DB6PR0801MB2053.eurprd08.prod.outlook.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=JuN
	ezfGpY9Fq2w4ZR7NdG4EshNPaRuBkvtPDwHzZPLGwjdvsL0ZNWovKtIlYhpBt5To
	P1Tf9caAZ7BwC9GFlJ3EHNQ5ZX6Z+oP9TerEduI87VN9U/0w/vpGjAPQ1cxQtQcm
	JgMI7H20TIpvApIfaSWG0x0DJIw7B6QM7pqRG10U=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>
Subject: [PATCH][AArch64] PR79262: Adjust vector cost
Date: Mon, 22 Jan 2018 15:01:05 +0000
Message-ID: <DB6PR0801MB2053221E018DE366B60EAC1E83EC0@DB6PR0801MB2053.eurprd08.prod.outlook.com>
nodisclaimer: True
received-spf: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-Network-Message-Id: 58d7c9c5-2cf3-44ab-2693-08d561a8f638
X-MS-Exchange-CrossTenant-originalarrivaltime: 22 Jan 2018 15:01:05.5599
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR08MB2668

Series

[AArch64] PR79262: Adjust vector cost | expand

Commit Message

Wilco Dijkstra Jan. 22, 2018, 3:01 p.m. UTC

PR79262 has been fixed for almost all AArch64 cpus, however the example is still
vectorized in a few cases, resulting in lower performance.  Increase the cost of
vector-to-scalar moves so it is more similar to the other vector costs. As a result
-mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
performance improves.

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  <wdijkstr@arm.com>

	PR target/79262
	* config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost.
--

Comments

Richard Biener Jan. 22, 2018, 3:22 p.m. UTC | #1

On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> PR79262 has been fixed for almost all AArch64 cpus, however the example is still
> vectorized in a few cases, resulting in lower performance.  Increase the cost of
> vector-to-scalar moves so it is more similar to the other vector costs. As a result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
>
> OK for commit?

It would be better to dissect this cost into vec_to_scalar and vec_extract where
vec_to_scalar really means getting at the scalar value of a vector of
uniform values
which most targets can do without any instruction (just use a subreg).

I suppose we could also make vec_to_scalar equal to vector extraction and remove
the uses for the other case (reduction vector result to scalar reg).

Richard.

> ChangeLog:
> 2018-01-22  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         PR target/79262
>         * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>    1, /* vec_int_stmt_cost  */
>    1, /* vec_fp_stmt_cost  */
>    2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>    1, /* scalar_to_vec_cost  */
>    1, /* vec_align_load_cost  */
>    1, /* vec_unalign_load_cost  */

James Greenhalgh Nov. 9, 2018, 2:57 p.m. UTC | #2

On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
> On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > PR79262 has been fixed for almost all AArch64 cpus, however the example is still
> > vectorized in a few cases, resulting in lower performance.  Increase the cost of
> > vector-to-scalar moves so it is more similar to the other vector costs. As a result
> > -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> > performance improves.
> >
> > OK for commit?
> 
> It would be better to dissect this cost into vec_to_scalar and vec_extract where
> vec_to_scalar really means getting at the scalar value of a vector of
> uniform values
> which most targets can do without any instruction (just use a subreg).
> 
> I suppose we could also make vec_to_scalar equal to vector extraction and remove
> the uses for the other case (reduction vector result to scalar reg).

I have dug up Richard's comments from last year, which you appear to have
ignored and made no reference to when resubmitting the patch.

Please don't do that. Carefully consider Richard's review feedback before
resubmitting this patch.

To reiterate, it is not OK for trunk.

Thanks,
James

> 
> Richard.
> 
> > ChangeLog:
> > 2018-01-22  Wilco Dijkstra  <wdijkstr@arm.com>
> >
> >         PR target/79262
> >         * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost.

Wilco Dijkstra Nov. 9, 2018, 5:43 p.m. UTC | #3

Hi James,

>On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
>> It would be better to dissect this cost into vec_to_scalar and vec_extract where
>> vec_to_scalar really means getting at the scalar value of a vector of
>> uniform values
>> which most targets can do without any instruction (just use a subreg).
>> 
>> I suppose we could also make vec_to_scalar equal to vector extraction and remove
>> the uses for the other case (reduction vector result to scalar reg).
>
> I have dug up Richard's comments from last year, which you appear to have
> ignored and made no reference to when resubmitting the patch.
>
> Please don't do that. Carefully consider Richard's review feedback before
> resubmitting this patch.

I seem to have missed this comment... However I can't see how it relates to my
patch, particularly since Richard claimed in PR79262 that this PR is a backend
issue. Sure it *would* be great to fix the vector cost infrastructure, but that's a lot
of work, and wasn't my goal here, nor is it required to fix PR79262. The existing
costs allow the issue to be solved, just like the other targets/tunings have done
already, so that's clearly the best approach for this PR.

Wilco

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@  static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */