[AArch64] Improve float to int moves

Message ID	AM5PR0802MB2610DDF3A7F7AA8DE2E5446E83110@AM5PR0802MB2610.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-452186-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=dzj D8t/rmpiSG37lMaxQ0dVg6dRAFZT9bUsiic1JGq4A4l5k87tbncFm9nTwpI/EuIv ycZfG7O4sRi2+G6XbUysSg1q8BKIExOITIGq73h6qOaL14iBss0PpdF23yTcxyM4 c5xkXnoi8OquTYXBZc+6+9Bm6Cy1nfanXI9aRj8s= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org> CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>, "Richard Earnshaw" <Richard.Earnshaw@arm.com> Subject: [PATCH][AArch64] Improve float to int moves Date: Wed, 26 Apr 2017 12:39:51 +0000 Message-ID: <AM5PR0802MB2610DDF3A7F7AA8DE2E5446E83110@AM5PR0802MB2610.eurprd08.prod.outlook.com> nodisclaimer: True spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0

Message ID

AM5PR0802MB2610DDF3A7F7AA8DE2E5446E83110@AM5PR0802MB2610.eurprd08.prod.outlook.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=dzj
	D8t/rmpiSG37lMaxQ0dVg6dRAFZT9bUsiic1JGq4A4l5k87tbncFm9nTwpI/EuIv
	ycZfG7O4sRi2+G6XbUysSg1q8BKIExOITIGq73h6qOaL14iBss0PpdF23yTcxyM4
	c5xkXnoi8OquTYXBZc+6+9Bm6Cy1nfanXI9aRj8s=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>,
	"Richard Earnshaw" <Richard.Earnshaw@arm.com>
Subject: [PATCH][AArch64] Improve float to int moves
Date: Wed, 26 Apr 2017 12:39:51 +0000
Message-ID: <AM5PR0802MB2610DDF3A7F7AA8DE2E5446E83110@AM5PR0802MB2610.eurprd08.prod.outlook.com>
nodisclaimer: True
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Apr 2017 12:39:51.9807
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM5PR0802MB2385

Commit Message

Wilco Dijkstra April 26, 2017, 12:39 p.m. UTC

Float to int moves currently generate inefficient code due to
hacks used in the movsi and movdi patterns.  The 'r = w' variant
uses '*' which explicitly tells the register allocator to ignore it.
As a result float to int moves typically spill to the stack, which is
extremely inefficient.  For example:

static inline unsigned asuint (float f)
{
  union { float f; unsigned i; } u = {f};
  return u.i;
}

float foo (float x)
{
  unsigned i = asuint (x);
  if (__builtin_expect (i > 42, 0))
    return x*x;
  return i;
}

generates:

	sub	sp, sp, #16
	str	s0, [sp, 12]
	ldr	w0, [sp, 12]
	cmp	w0, 42
	bhi	.L7
	scvtf	s0, w0
	add	sp, sp, 16
	ret
.L7:
	fmul	s0, s0, s0
	add	sp, sp, 16
	ret

Removing '*' from the variant generates:

	fmov	w0, s0
	cmp	w0, 42
	bhi	.L6
	scvtf	s0, w0
	ret
.L6:
	fmul	s0, s0, s0
	ret

Passes regress & bootstrap, OK for commit?

ChangeLog:
2017-04-26  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
	(movdi_aarch64): Likewise.

--

Comments

Richard Earnshaw (lists) May 5, 2017, 3:45 p.m. UTC | #1

On 26/04/17 13:39, Wilco Dijkstra wrote:
> Float to int moves currently generate inefficient code due to
> hacks used in the movsi and movdi patterns.  The 'r = w' variant
> uses '*' which explicitly tells the register allocator to ignore it.
> As a result float to int moves typically spill to the stack, which is
> extremely inefficient.  For example:
> 
> static inline unsigned asuint (float f)
> {
>   union { float f; unsigned i; } u = {f};
>   return u.i;
> }
> 
> float foo (float x)
> {
>   unsigned i = asuint (x);
>   if (__builtin_expect (i > 42, 0))
>     return x*x;
>   return i;
> }
> 
> generates:
> 
> 	sub	sp, sp, #16
> 	str	s0, [sp, 12]
> 	ldr	w0, [sp, 12]
> 	cmp	w0, 42
> 	bhi	.L7
> 	scvtf	s0, w0
> 	add	sp, sp, 16
> 	ret
> .L7:
> 	fmul	s0, s0, s0
> 	add	sp, sp, 16
> 	ret
> 
> Removing '*' from the variant generates:
> 
> 	fmov	w0, s0
> 	cmp	w0, 42
> 	bhi	.L6
> 	scvtf	s0, w0
> 	ret
> .L6:
> 	fmul	s0, s0, s0
> 	ret
> 
> Passes regress & bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-04-26  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
> 	(movdi_aarch64): Likewise.
> 

OK.

While on the subject, why is the w->w operation also hidden?

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 51368e29f2d1fd12f48a972bd81a08589a720e07..d656e92e1ff02bdc90c824227ec3b2e1ccfe665a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1026,8 +1026,8 @@ (define_expand "mov<mode>"
>  )
>  
>  (define_insn_and_split "*movsi_aarch64"
> -  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
> -	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w,r,*w")
> +	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,w,*w"))]
>    "(register_operand (operands[0], SImode)
>      || aarch64_reg_or_zero (operands[1], SImode))"
>    "@
> @@ -1058,8 +1058,8 @@ (define_insn_and_split "*movsi_aarch64"
>  )
>  
>  (define_insn_and_split "*movdi_aarch64"
> -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
> -	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w,r,*w,w")
> +	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,w,*w,Dd"))]
>    "(register_operand (operands[0], DImode)
>      || aarch64_reg_or_zero (operands[1], DImode))"
>    "@
>

Wilco Dijkstra May 5, 2017, 4:10 p.m. UTC | #2

Richard Earnshaw (lists) wrote:

> While on the subject, why is the w->w operation also hidden?

No idea, this just fixes one case where it is obvious the use of '*' is incorrect.

However I think all uses of '*' in md files are incorrect and the feature should
be removed. '?' already exists for cases where the alternative may be expensive.

Wilco

Richard Earnshaw (lists) May 5, 2017, 4:20 p.m. UTC | #3

On 05/05/17 17:10, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
> 
>> While on the subject, why is the w->w operation also hidden?
> 
> No idea, this just fixes one case where it is obvious the use of '*' is
> incorrect.
> 
> However I think all uses of '*' in md files are incorrect and the
> feature should
> be removed. '?' already exists for cases where the alternative may be
> expensive.
> 

It's not quite as simple as that.  It may be, however, that we should
only use it for restricting subclasses (eg generally avoiding high
registers on Thumb1).

However, things have changed somewhat since the move to LRA and what was
once true might be quite different now.

R.

> Wilco

Wilco Dijkstra May 5, 2017, 8:35 p.m. UTC | #4

Richard Earnshaw (lists) wrote:
> On 05/05/17 17:10, Wilco Dijkstra wrote:
> > However I think all uses of '*' in md files are incorrect and the
> > feature should
> > be removed. '?' already exists for cases where the alternative may be
> > expensive.
> > 
> 
> It's not quite as simple as that.  It may be, however, that we should
> only use it for restricting subclasses (eg generally avoiding high
> registers on Thumb1).

We needed to add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS to ensure
register preferencing works correctly, so this can be used to choose low/high
registers on Thumb-1.

> However, things have changed somewhat since the move to LRA and what was
> once true might be quite different now.

It's possible that existing uses of '*' were trying to hack around the preferencing
issues, but with this callback I don't think they are needed. We will still require '?'
and '!' to set the relative cost of alternatives - if you have a negate and one of the
operands is a SIMD register (so you need one int<->FP move either way), and SIMD
negate has a higher latency, you need '?' to say that it is more expensive than the
integer negate.

Wilco

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 51368e29f2d1fd12f48a972bd81a08589a720e07..d656e92e1ff02bdc90c824227ec3b2e1ccfe665a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1026,8 +1026,8 @@  (define_expand "mov<mode>"
 )
 
 (define_insn_and_split "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w,r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -1058,8 +1058,8 @@  (define_insn_and_split "*movsi_aarch64"
 )
 
 (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w,r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@