diff mbox

[AArch64] Improve float to int moves

Message ID AM5PR0802MB2610DDF3A7F7AA8DE2E5446E83110@AM5PR0802MB2610.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Wilco Dijkstra April 26, 2017, 12:39 p.m. UTC
Float to int moves currently generate inefficient code due to
hacks used in the movsi and movdi patterns.  The 'r = w' variant
uses '*' which explicitly tells the register allocator to ignore it.
As a result float to int moves typically spill to the stack, which is
extremely inefficient.  For example:

static inline unsigned asuint (float f)
{
  union { float f; unsigned i; } u = {f};
  return u.i;
}

float foo (float x)
{
  unsigned i = asuint (x);
  if (__builtin_expect (i > 42, 0))
    return x*x;
  return i;
}

generates:

	sub	sp, sp, #16
	str	s0, [sp, 12]
	ldr	w0, [sp, 12]
	cmp	w0, 42
	bhi	.L7
	scvtf	s0, w0
	add	sp, sp, 16
	ret
.L7:
	fmul	s0, s0, s0
	add	sp, sp, 16
	ret

Removing '*' from the variant generates:

	fmov	w0, s0
	cmp	w0, 42
	bhi	.L6
	scvtf	s0, w0
	ret
.L6:
	fmul	s0, s0, s0
	ret

Passes regress & bootstrap, OK for commit?

ChangeLog:
2017-04-26  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
	(movdi_aarch64): Likewise.

--

Comments

Richard Earnshaw (lists) May 5, 2017, 3:45 p.m. UTC | #1
On 26/04/17 13:39, Wilco Dijkstra wrote:
> Float to int moves currently generate inefficient code due to
> hacks used in the movsi and movdi patterns.  The 'r = w' variant
> uses '*' which explicitly tells the register allocator to ignore it.
> As a result float to int moves typically spill to the stack, which is
> extremely inefficient.  For example:
> 
> static inline unsigned asuint (float f)
> {
>   union { float f; unsigned i; } u = {f};
>   return u.i;
> }
> 
> float foo (float x)
> {
>   unsigned i = asuint (x);
>   if (__builtin_expect (i > 42, 0))
>     return x*x;
>   return i;
> }
> 
> generates:
> 
> 	sub	sp, sp, #16
> 	str	s0, [sp, 12]
> 	ldr	w0, [sp, 12]
> 	cmp	w0, 42
> 	bhi	.L7
> 	scvtf	s0, w0
> 	add	sp, sp, 16
> 	ret
> .L7:
> 	fmul	s0, s0, s0
> 	add	sp, sp, 16
> 	ret
> 
> Removing '*' from the variant generates:
> 
> 	fmov	w0, s0
> 	cmp	w0, 42
> 	bhi	.L6
> 	scvtf	s0, w0
> 	ret
> .L6:
> 	fmul	s0, s0, s0
> 	ret
> 
> Passes regress & bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-04-26  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
> 	(movdi_aarch64): Likewise.
> 

OK.

While on the subject, why is the w->w operation also hidden?

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 51368e29f2d1fd12f48a972bd81a08589a720e07..d656e92e1ff02bdc90c824227ec3b2e1ccfe665a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1026,8 +1026,8 @@ (define_expand "mov<mode>"
>  )
>  
>  (define_insn_and_split "*movsi_aarch64"
> -  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
> -	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w,r,*w")
> +	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,w,*w"))]
>    "(register_operand (operands[0], SImode)
>      || aarch64_reg_or_zero (operands[1], SImode))"
>    "@
> @@ -1058,8 +1058,8 @@ (define_insn_and_split "*movsi_aarch64"
>  )
>  
>  (define_insn_and_split "*movdi_aarch64"
> -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
> -	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
> +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w,r,*w,w")
> +	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,w,*w,Dd"))]
>    "(register_operand (operands[0], DImode)
>      || aarch64_reg_or_zero (operands[1], DImode))"
>    "@
>
Wilco Dijkstra May 5, 2017, 4:10 p.m. UTC | #2
Richard Earnshaw (lists) wrote:

> While on the subject, why is the w->w operation also hidden?

No idea, this just fixes one case where it is obvious the use of '*' is incorrect.

However I think all uses of '*' in md files are incorrect and the feature should
be removed. '?' already exists for cases where the alternative may be expensive.

Wilco
Richard Earnshaw (lists) May 5, 2017, 4:20 p.m. UTC | #3
On 05/05/17 17:10, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
> 
>> While on the subject, why is the w->w operation also hidden?
> 
> No idea, this just fixes one case where it is obvious the use of '*' is
> incorrect.
> 
> However I think all uses of '*' in md files are incorrect and the
> feature should
> be removed. '?' already exists for cases where the alternative may be
> expensive.
> 

It's not quite as simple as that.  It may be, however, that we should
only use it for restricting subclasses (eg generally avoiding high
registers on Thumb1).

However, things have changed somewhat since the move to LRA and what was
once true might be quite different now.

R.

> Wilco
Wilco Dijkstra May 5, 2017, 8:35 p.m. UTC | #4
Richard Earnshaw (lists) wrote:
> On 05/05/17 17:10, Wilco Dijkstra wrote:
> > However I think all uses of '*' in md files are incorrect and the
> > feature should
> > be removed. '?' already exists for cases where the alternative may be
> > expensive.
> > 
> 
> It's not quite as simple as that.  It may be, however, that we should
> only use it for restricting subclasses (eg generally avoiding high
> registers on Thumb1).

We needed to add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS to ensure
register preferencing works correctly, so this can be used to choose low/high
registers on Thumb-1.

> However, things have changed somewhat since the move to LRA and what was
> once true might be quite different now.

It's possible that existing uses of '*' were trying to hack around the preferencing
issues, but with this callback I don't think they are needed. We will still require '?'
and '!' to set the relative cost of alternatives - if you have a negate and one of the
operands is a SIMD register (so you need one int<->FP move either way), and SIMD
negate has a higher latency, you need '?' to say that it is more expensive than the
integer negate.

Wilco
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 51368e29f2d1fd12f48a972bd81a08589a720e07..d656e92e1ff02bdc90c824227ec3b2e1ccfe665a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1026,8 +1026,8 @@  (define_expand "mov<mode>"
 )
 
 (define_insn_and_split "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w,r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -1058,8 +1058,8 @@  (define_insn_and_split "*movsi_aarch64"
 )
 
 (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w,r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@