diff mbox series

[aarch64] PR 89628 - fix register allocation in SIMD functions

Message ID 0966cb22817a24bf206cbeb9272bb4d441cf9528.camel@marvell.com
State New
Headers show
Series [aarch64] PR 89628 - fix register allocation in SIMD functions | expand

Commit Message

Steve Ellcey March 8, 2019, 9:36 p.m. UTC
This is a patch to fix the register allocation in SIMD functions.  In
normal functions registers V16 to V31 are all caller saved.  In SIMD
functions V16 to V23 are callee saved and V24 to V31 are caller saved.
This means that SIMD functions should use V24 to V31 before V16 to V23
in order to avoid some saves and restores.

My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
defined on Aarch64, so I just defined it as all the registers in order
except for putting V24 to V31 before V16 to V23.  This fixes the
register allocation in SIMD functions.  It also changes the register
allocation order in regular functions but since all the registers (V16
to V31) are caller saved in that case, it doesn't matter.  We could use
ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
any reason to do that.

Tested on ThunderX2 Aarch64 with no regressions.

OK to checkin?


2018-03-08  Steve Ellcey  <sellcey@marvell.com>

	PR target/89628
	* config/aarch64/aarch64.h (V16-V31): Update comment.
	(REG_ALLOC_ORDER): New macro.

Comments

Richard Sandiford March 9, 2019, 8:03 a.m. UTC | #1
Steve Ellcey <sellcey@marvell.com> writes:
> This is a patch to fix the register allocation in SIMD functions.  In
> normal functions registers V16 to V31 are all caller saved.  In SIMD
> functions V16 to V23 are callee saved and V24 to V31 are caller saved.
> This means that SIMD functions should use V24 to V31 before V16 to V23
> in order to avoid some saves and restores.
>
> My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
> defined on Aarch64, so I just defined it as all the registers in order
> except for putting V24 to V31 before V16 to V23.  This fixes the
> register allocation in SIMD functions.  It also changes the register
> allocation order in regular functions but since all the registers (V16
> to V31) are caller saved in that case, it doesn't matter.  We could use
> ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
> any reason to do that.

REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones
in the PR.  Like you say, we don't currently need it to handle the
equivalent situation for the standard ABI.

I think the problem is that the RA is still using the register sets
for the default ABI when evaluating the prologue/epilogue cost of using
a hard register.  E.g. see calculate_saved_nregs.

Maybe one fix would be to add an equivalent of call_used_reg_set to
rtl_data.  By default it would be the same as call_used_reg_set,
but the target could have an opportunity to change it.  Then code like
calculate_saved_nregs should use the new set to find out what registers
the current function can use without spilling.

This would also be useful for targets that implement interrupt handler
attributes.

It would be good to add the testcase in the PR to the testsuite,
with a scan-assembler to check for spills.

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 7bd3bf5..d3723ff 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
>     ZR		zero register, encoded as X/R31 elsewhere
>  
>     32 x 128-bit floating-point/vector registers
> -   V16-V31	Caller-saved (temporary) registers
> +   V24-V31	Caller-saved (temporary) registers
> +   V16-V23	Caller-saved (temporary) registers in most functions;
> +		Callee-saved in SIMD functions.
>     V8-V15	Callee-saved registers
>     V0-V7	Parameter/result registers

Probably better as s/SIMD/vector PCS/.  The hunk above is OK with that
change, independently of the rest.

Thanks,
Richard
Steve Ellcey March 11, 2019, 4:10 p.m. UTC | #2
Richard,

I don't necessarily disagree with anything in your comments and long
term I think that is the right direction, but I wonder if that level of
change is appropriate for GCC Stage 4 which is where we are now.  Your
changes would require fixes in shared code, whereas setting
REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
I am not sure how long it would take me to implement something along
the lines of what you are suggesting.

Steve Ellcey
sellcey@marvell.com


On Sat, 2019-03-09 at 08:03 +0000, Richard Sandiford wrote:

> Steve Ellcey <sellcey@marvell.com> writes:
> > This is a patch to fix the register allocation in SIMD functions.  In
> > normal functions registers V16 to V31 are all caller saved.  In SIMD
> > functions V16 to V23 are callee saved and V24 to V31 are caller saved.
> > This means that SIMD functions should use V24 to V31 before V16 to V23
> > in order to avoid some saves and restores.
> > 
> > My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
> > defined on Aarch64, so I just defined it as all the registers in order
> > except for putting V24 to V31 before V16 to V23.  This fixes the
> > register allocation in SIMD functions.  It also changes the register
> > allocation order in regular functions but since all the registers (V16
> > to V31) are caller saved in that case, it doesn't matter.  We could use
> > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
> > any reason to do that.
> 
> REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones
> in the PR.  Like you say, we don't currently need it to handle the
> equivalent situation for the standard ABI.
> 
> I think the problem is that the RA is still using the register sets
> for the default ABI when evaluating the prologue/epilogue cost of using
> a hard register.  E.g. see calculate_saved_nregs.
> 
> Maybe one fix would be to add an equivalent of call_used_reg_set to
> rtl_data.  By default it would be the same as call_used_reg_set,
> but the target could have an opportunity to change it.  Then code like
> calculate_saved_nregs should use the new set to find out what registers
> the current function can use without spilling.
> 
> This would also be useful for targets that implement interrupt handler
> attributes.
> 
> It would be good to add the testcase in the PR to the testsuite,
> with a scan-assembler to check for spills.
> 
> > diff --git a/gcc/config/aarch64/aarch64.h
> > b/gcc/config/aarch64/aarch64.h
> > index 7bd3bf5..d3723ff 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
> >     ZR		zero register, encoded as X/R31 elsewhere
> >  
> >     32 x 128-bit floating-point/vector registers
> > -   V16-V31	Caller-saved (temporary) registers
> > +   V24-V31	Caller-saved (temporary) registers
> > +   V16-V23	Caller-saved (temporary) registers in most functions;
> > +		Callee-saved in SIMD functions.
> >     V8-V15	Callee-saved registers
> >     V0-V7	Parameter/result registers
> 
> Probably better as s/SIMD/vector PCS/.  The hunk above is OK with
> that
> change, independently of the rest.
> 
> Thanks,
> Richard
James Greenhalgh March 22, 2019, 5:35 p.m. UTC | #3
On Mon, Mar 11, 2019 at 04:10:15PM +0000, Steve Ellcey wrote:
> Richard,
> 
> I don't necessarily disagree with anything in your comments and long
> term I think that is the right direction, but I wonder if that level of
> change is appropriate for GCC Stage 4 which is where we are now.  Your
> changes would require fixes in shared code, whereas setting
> REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
> I am not sure how long it would take me to implement something along
> the lines of what you are suggesting.

I'll leave it to Richard to decide, but your workaround seems like the
right level of risk for this time of the release. I'd be happy taking it.

Thanks,
James

> 
> On Sat, 2019-03-09 at 08:03 +0000, Richard Sandiford wrote:
> 
> > Steve Ellcey <sellcey@marvell.com> writes:
> > > This is a patch to fix the register allocation in SIMD functions.  In
> > > normal functions registers V16 to V31 are all caller saved.  In SIMD
> > > functions V16 to V23 are callee saved and V24 to V31 are caller saved.
> > > This means that SIMD functions should use V24 to V31 before V16 to V23
> > > in order to avoid some saves and restores.
> > > 
> > > My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
> > > defined on Aarch64, so I just defined it as all the registers in order
> > > except for putting V24 to V31 before V16 to V23.  This fixes the
> > > register allocation in SIMD functions.  It also changes the register
> > > allocation order in regular functions but since all the registers (V16
> > > to V31) are caller saved in that case, it doesn't matter.  We could use
> > > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
> > > any reason to do that.
> > 
> > REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones
> > in the PR.  Like you say, we don't currently need it to handle the
> > equivalent situation for the standard ABI.
> > 
> > I think the problem is that the RA is still using the register sets
> > for the default ABI when evaluating the prologue/epilogue cost of using
> > a hard register.  E.g. see calculate_saved_nregs.
> > 
> > Maybe one fix would be to add an equivalent of call_used_reg_set to
> > rtl_data.  By default it would be the same as call_used_reg_set,
> > but the target could have an opportunity to change it.  Then code like
> > calculate_saved_nregs should use the new set to find out what registers
> > the current function can use without spilling.
> > 
> > This would also be useful for targets that implement interrupt handler
> > attributes.
> > 
> > It would be good to add the testcase in the PR to the testsuite,
> > with a scan-assembler to check for spills.
> > 
> > > diff --git a/gcc/config/aarch64/aarch64.h
> > > b/gcc/config/aarch64/aarch64.h
> > > index 7bd3bf5..d3723ff 100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
> > >     ZR		zero register, encoded as X/R31 elsewhere
> > >  
> > >     32 x 128-bit floating-point/vector registers
> > > -   V16-V31	Caller-saved (temporary) registers
> > > +   V24-V31	Caller-saved (temporary) registers
> > > +   V16-V23	Caller-saved (temporary) registers in most functions;
> > > +		Callee-saved in SIMD functions.
> > >     V8-V15	Callee-saved registers
> > >     V0-V7	Parameter/result registers
> > 
> > Probably better as s/SIMD/vector PCS/.  The hunk above is OK with
> > that
> > change, independently of the rest.
> > 
> > Thanks,
> > Richard
James Greenhalgh March 22, 2019, 5:47 p.m. UTC | #4
On Fri, Mar 22, 2019 at 05:35:02PM +0000, James Greenhalgh wrote:
> On Mon, Mar 11, 2019 at 04:10:15PM +0000, Steve Ellcey wrote:
> > Richard,
> > 
> > I don't necessarily disagree with anything in your comments and long
> > term I think that is the right direction, but I wonder if that level of
> > change is appropriate for GCC Stage 4 which is where we are now.  Your
> > changes would require fixes in shared code, whereas setting
> > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
> > I am not sure how long it would take me to implement something along
> > the lines of what you are suggesting.
> 
> I'll leave it to Richard to decide, but your workaround seems like the
> right level of risk for this time of the release. I'd be happy taking it.

Excuse me, I missed the fork in this thread (didn't hit my "AArch64"
filter). I'll try to take a look at Richard's approach early next week.

Thanks,
James

> 
> Thanks,
> James
> 
> > 
> > On Sat, 2019-03-09 at 08:03 +0000, Richard Sandiford wrote:
> > 
> > > Steve Ellcey <sellcey@marvell.com> writes:
> > > > This is a patch to fix the register allocation in SIMD functions.  In
> > > > normal functions registers V16 to V31 are all caller saved.  In SIMD
> > > > functions V16 to V23 are callee saved and V24 to V31 are caller saved.
> > > > This means that SIMD functions should use V24 to V31 before V16 to V23
> > > > in order to avoid some saves and restores.
> > > > 
> > > > My fix for this is to define REG_ALLOC_ORDER.  Right now it is not
> > > > defined on Aarch64, so I just defined it as all the registers in order
> > > > except for putting V24 to V31 before V16 to V23.  This fixes the
> > > > register allocation in SIMD functions.  It also changes the register
> > > > allocation order in regular functions but since all the registers (V16
> > > > to V31) are caller saved in that case, it doesn't matter.  We could use
> > > > ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see
> > > > any reason to do that.
> > > 
> > > REG_ALLOC_ORDER shouldn't really be needed for testcases like the ones
> > > in the PR.  Like you say, we don't currently need it to handle the
> > > equivalent situation for the standard ABI.
> > > 
> > > I think the problem is that the RA is still using the register sets
> > > for the default ABI when evaluating the prologue/epilogue cost of using
> > > a hard register.  E.g. see calculate_saved_nregs.
> > > 
> > > Maybe one fix would be to add an equivalent of call_used_reg_set to
> > > rtl_data.  By default it would be the same as call_used_reg_set,
> > > but the target could have an opportunity to change it.  Then code like
> > > calculate_saved_nregs should use the new set to find out what registers
> > > the current function can use without spilling.
> > > 
> > > This would also be useful for targets that implement interrupt handler
> > > attributes.
> > > 
> > > It would be good to add the testcase in the PR to the testsuite,
> > > with a scan-assembler to check for spills.
> > > 
> > > > diff --git a/gcc/config/aarch64/aarch64.h
> > > > b/gcc/config/aarch64/aarch64.h
> > > > index 7bd3bf5..d3723ff 100644
> > > > --- a/gcc/config/aarch64/aarch64.h
> > > > +++ b/gcc/config/aarch64/aarch64.h
> > > > @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version;
> > > >     ZR		zero register, encoded as X/R31 elsewhere
> > > >  
> > > >     32 x 128-bit floating-point/vector registers
> > > > -   V16-V31	Caller-saved (temporary) registers
> > > > +   V24-V31	Caller-saved (temporary) registers
> > > > +   V16-V23	Caller-saved (temporary) registers in most functions;
> > > > +		Callee-saved in SIMD functions.
> > > >     V8-V15	Callee-saved registers
> > > >     V0-V7	Parameter/result registers
> > > 
> > > Probably better as s/SIMD/vector PCS/.  The hunk above is OK with
> > > that
> > > change, independently of the rest.
> > > 
> > > Thanks,
> > > Richard
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7bd3bf5..d3723ff 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -328,7 +328,9 @@  extern unsigned aarch64_architecture_version;
    ZR		zero register, encoded as X/R31 elsewhere
 
    32 x 128-bit floating-point/vector registers
-   V16-V31	Caller-saved (temporary) registers
+   V24-V31	Caller-saved (temporary) registers
+   V16-V23	Caller-saved (temporary) registers in most functions;
+		Callee-saved in SIMD functions.
    V8-V15	Callee-saved registers
    V0-V7	Parameter/result registers
 
@@ -431,6 +433,26 @@  extern unsigned aarch64_architecture_version;
     V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
   }
 
+/* This is the default order for everything except V16-V23.  These
+   are caller saved in most functions but callee saved in SIMD
+   functions so we want to use V24-V31 which are always
+   caller saved before using these registers. */
+
+#define REG_ALLOC_ORDER						\
+  {								\
+     0,  1,  2,  3,    4,  5,  6,  7,	/* R0 - R7 */		\
+     8,  9, 10, 11,   12, 13, 14, 15,	/* R8 - R15 */		\
+    16, 17, 18, 19,   20, 21, 22, 23,	/* R16 - R23 */		\
+    24, 25, 26, 27,   28, 29, 30, 31,	/* R24 - R30, SP */	\
+    32, 33, 34, 35,   36, 37, 38, 39,	/* V0 - V7 */		\
+    40, 41, 42, 43,   44, 45, 46, 47,	/* V8 - V15 */		\
+    56, 57, 58, 59,   60, 61, 62, 63,	/* V24 - V31 */		\
+    48, 49, 50, 51,   52, 53, 54, 55,	/* V16 - V23 */		\
+    64, 65, 66, 57,			/* SFP, AP, CC, VG */	\
+    68, 69, 70, 71,   72, 73, 74, 75,	/* P0 - P7 */		\
+    76, 77, 78, 79,   80, 81, 82, 83,	/* P8 - P15 */		\
+  }
+
 #define EPILOGUE_USES(REGNO) (aarch64_epilogue_uses (REGNO))
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,