[AArch64] Improve dup pattern

Message ID	AM5PR0802MB2610D1E1D4DDB32F990C2F0583C50@AM5PR0802MB2610.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-456329-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=qUi DK8AOozdqYs87cwo4VUNLvn8D8whwx2SeLW/TosGiwE2fcmJ41JIVhftsLsHahQp BcpW0zQMGLS2T41EzNOVZOni8r0dx9W+bOJHrTTZvL0bT7MncRfU9N/OCdQ3f1ck BSSfvwucL4yrpA9MPq2k9vhHrWQ+n3cSvCNua8I4= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org> CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com> Subject: [PATCH][AArch64] Improve dup pattern Date: Tue, 20 Jun 2017 10:57:59 +0000 Message-ID: <AM5PR0802MB2610D1E1D4DDB32F990C2F0583C50@AM5PR0802MB2610.eurprd08.prod.outlook.com> nodisclaimer: True spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0

Message ID

AM5PR0802MB2610D1E1D4DDB32F990C2F0583C50@AM5PR0802MB2610.eurprd08.prod.outlook.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=qUi
	DK8AOozdqYs87cwo4VUNLvn8D8whwx2SeLW/TosGiwE2fcmJ41JIVhftsLsHahQp
	BcpW0zQMGLS2T41EzNOVZOni8r0dx9W+bOJHrTTZvL0bT7MncRfU9N/OCdQ3f1ck
	BSSfvwucL4yrpA9MPq2k9vhHrWQ+n3cSvCNua8I4=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>
Subject: [PATCH][AArch64] Improve dup pattern
Date: Tue, 20 Jun 2017 10:57:59 +0000
Message-ID: <AM5PR0802MB2610D1E1D4DDB32F990C2F0583C50@AM5PR0802MB2610.eurprd08.prod.outlook.com>
nodisclaimer: True
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 20 Jun 2017 10:57:59.2820
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR08MB2660

Commit Message

Wilco Dijkstra June 20, 2017, 10:57 a.m. UTC

Improve the dup pattern to prefer vector registers.  When doing a dup
after a load, the register allocator thinks the costs are identical
and chooses an integer load.  However a dup from an integer register
includes an int->fp transfer which is not modelled.  Adding a '?' to
the integer variant means the cost is increased slightly so we prefer
using a vector register.  This improves the following example:

#include <arm_neon.h>
void f(unsigned *a, uint32x4_t *b)
{
  b[0] = vdupq_n_u32(a[1]);
  b[1] = vdupq_n_u32(a[2]);
}

Before:
	ldr	w2, [x0, 4]
	dup	v0.4s, w2
	str	q0, [x1]
	ldr	w0, [x0, 8]
	dup	v0.4s, w0
	str	q0, [x1, 16]
	ret

After:
	ldr	s0, [x0, 4]
	dup	v0.4s, v0.s[0]
	str	q0, [x1]
	ldr	s0, [x0, 8]
	dup	v0.4s, v0.s[0]
	str	q0, [x1, 16]
	ret

Passes regress & bootstrap, OK for commit?

ChangeLog:
2017-06-20  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64-simd.md (aarch64_simd_dup):
	Swap alternatives, make integer dup more expensive.
--

Comments

James Greenhalgh June 20, 2017, 11:14 a.m. UTC | #1

On Tue, Jun 20, 2017 at 11:57:59AM +0100, Wilco Dijkstra wrote:
> Improve the dup pattern to prefer vector registers.  When doing a dup
> after a load, the register allocator thinks the costs are identical
> and chooses an integer load.  However a dup from an integer register
> includes an int->fp transfer which is not modelled.  Adding a '?' to
> the integer variant means the cost is increased slightly so we prefer
> using a vector register.  This improves the following example:
> 
> #include <arm_neon.h>
> void f(unsigned *a, uint32x4_t *b)
> {
>   b[0] = vdupq_n_u32(a[1]);
>   b[1] = vdupq_n_u32(a[2]);
> }
> 
> Before:
> 	ldr	w2, [x0, 4]
> 	dup	v0.4s, w2
> 	str	q0, [x1]
> 	ldr	w0, [x0, 8]
> 	dup	v0.4s, w0
> 	str	q0, [x1, 16]
> 	ret
> 
> After:
> 	ldr	s0, [x0, 4]
> 	dup	v0.4s, v0.s[0]
> 	str	q0, [x1]
> 	ldr	s0, [x0, 8]
> 	dup	v0.4s, v0.s[0]
> 	str	q0, [x1, 16]
> 	ret
> 
> Passes regress & bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md (aarch64_simd_dup):
> 	Swap alternatives, make integer dup more expensive.

Have you tested this in cases where an integer dup is definitely the right
thing to do?

e.g. in

  #include <arm_neon.h>
  void f(unsigned a, unsigned b, uint32x4_t *c)
  {
    c[0] = vdupq_n_u32(a);
    c[1] = vdupq_n_u32(b);
  }

And similar cases? If these still look good, then the patch is OK - though
I'm still very nervous about the register allocator cost model!

Thanks,
James

Wilco Dijkstra June 20, 2017, 3:19 p.m. UTC | #2

James Greenhalgh wrote:
>
> Have you tested this in cases where an integer dup is definitely the right
> thing to do?

Yes, this still generates:

  #include <arm_neon.h>
  void f(unsigned a, unsigned b, uint32x4_t *c)
  {
    c[0] = vdupq_n_u32(a);
    c[1] = vdupq_n_u32(b);
  }

	dup	v1.4s, w0
	dup	v0.4s, w1
	str	q1, [x2]
	str	q0, [x2, 16]
	ret

The reason is that the GP to FP register move cost is typically >= 5, while
the additional cost of '?' is just 1.

> And similar cases? If these still look good, then the patch is OK - though
> I'm still very nervous about the register allocator cost model!

Well it's complex and hard to get working well... However slightly preferring one
variant works alright (unlike using '*' which results in incorrect costs).

Wilco

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 24ef178b0de253aa2d49aef022d866266216a0d6..695011eae464d806a5cfeeb7253542c27c211c50 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -44,12 +44,12 @@  (define_expand "movmisalign<mode>"
 (define_insn "aarch64_simd_dup<mode>"
   [(set (match_operand:VDQ_I 0 "register_operand" "=w, w")
 	(vec_duplicate:VDQ_I
-	  (match_operand:<VEL> 1 "register_operand" "r, w")))]
+	  (match_operand:<VEL> 1 "register_operand" "w,?r")))]
   "TARGET_SIMD"
   "@
-   dup\\t%0.<Vtype>, %<vw>1
-   dup\\t%0.<Vtype>, %1.<Vetype>[0]"
-  [(set_attr "type" "neon_from_gp<q>, neon_dup<q>")]
+   dup\\t%0.<Vtype>, %1.<Vetype>[0]
+   dup\\t%0.<Vtype>, %<vw>1"
+  [(set_attr "type" "neon_dup<q>, neon_from_gp<q>")]
 )
 
 (define_insn "aarch64_simd_dup<mode>"