diff mbox series

New target hook for function scratch registers (PR 89628)

Message ID mpt7ed42zhv.fsf_-_@arm.com
State New
Headers show
Series New target hook for function scratch registers (PR 89628) | expand

Commit Message

Richard Sandiford March 12, 2019, 9:50 a.m. UTC
Steve Ellcey <sellcey@marvell.com> writes:
> Richard,
>
> I don't necessarily disagree with anything in your comments and long
> term I think that is the right direction, but I wonder if that level of
> change is appropriate for GCC Stage 4 which is where we are now.  Your
> changes would require fixes in shared code, whereas setting
> REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.

I guess it's weighing invasiveness in terms of lines of code/location of
code vs. invasiveness in terms of the effect the code has.  Changing
REG_ALLOC_ORDER affects all functinos that use significant numbers of
SIMD registers, which could have unexpected knock-on effects like
performance regressions or exposing latent bugs.  It would also make
the RA try V24 before V16 for callers of vector PCS functions,
even though they should prefer V16.

Keeping the change specific to callees that use the new vector PCS
feature seems more conservative from that point of view.

> I am not sure how long it would take me to implement something along
> the lines of what you are suggesting.

OK, in that case, I thought I'd give a go.

This patch adds a new target hook that says which registers a function
can use without saving them in the prologue and restoring them in the
epilogue.  By default this is call_used_reg_set.

The patch only makes IRA use the hook, and in that context it's just
an (important) optimisation.  But I think we need the same thing for
passes like regrename, where it'll be a correctness issue.  I'll follow
on with patches there if this one is OK.

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
OK to install?

Richard


2019-03-12  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR target/89628
	* target.def (function_scratch_regs): New hook.
	* doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_function_scratch_regs): Declare.
	* targhooks.c (default_function_scratch_regs): New function.
	* emit-rtl.h (rtl_data::scratch_regs): New member variable.
	* function.c (init_dummy_function_start): Initialize it.
	(init_function_start): Likewise.
	* ira-color.c (calculate_saved_nregs): Use it instead of
	call_used_reg_set.
	* config/aarch64/aarch64.c (aarch64_function_scratch_regs): New
	function.
	(TARGET_FUNCTION_SCRATCH_REGS): Redefine.

gcc/testsuite/
	PR target/89628
	* gcc.target/aarch64/pr89628.c: New test.

Comments

Richard Sandiford March 22, 2019, 1 p.m. UTC | #1
Ping

Richard Sandiford <richard.sandiford@arm.com> writes:
> Steve Ellcey <sellcey@marvell.com> writes:
>> Richard,
>>
>> I don't necessarily disagree with anything in your comments and long
>> term I think that is the right direction, but I wonder if that level of
>> change is appropriate for GCC Stage 4 which is where we are now.  Your
>> changes would require fixes in shared code, whereas setting
>> REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
>
> I guess it's weighing invasiveness in terms of lines of code/location of
> code vs. invasiveness in terms of the effect the code has.  Changing
> REG_ALLOC_ORDER affects all functinos that use significant numbers of
> SIMD registers, which could have unexpected knock-on effects like
> performance regressions or exposing latent bugs.  It would also make
> the RA try V24 before V16 for callers of vector PCS functions,
> even though they should prefer V16.
>
> Keeping the change specific to callees that use the new vector PCS
> feature seems more conservative from that point of view.
>
>> I am not sure how long it would take me to implement something along
>> the lines of what you are suggesting.
>
> OK, in that case, I thought I'd give a go.
>
> This patch adds a new target hook that says which registers a function
> can use without saving them in the prologue and restoring them in the
> epilogue.  By default this is call_used_reg_set.
>
> The patch only makes IRA use the hook, and in that context it's just
> an (important) optimisation.  But I think we need the same thing for
> passes like regrename, where it'll be a correctness issue.  I'll follow
> on with patches there if this one is OK.
>
> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> OK to install?
>
> Richard
>
>
> 2019-03-12  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
> 	PR target/89628
> 	* target.def (function_scratch_regs): New hook.
> 	* doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder.
> 	* doc/tm.texi: Regenerate.
> 	* targhooks.h (default_function_scratch_regs): Declare.
> 	* targhooks.c (default_function_scratch_regs): New function.
> 	* emit-rtl.h (rtl_data::scratch_regs): New member variable.
> 	* function.c (init_dummy_function_start): Initialize it.
> 	(init_function_start): Likewise.
> 	* ira-color.c (calculate_saved_nregs): Use it instead of
> 	call_used_reg_set.
> 	* config/aarch64/aarch64.c (aarch64_function_scratch_regs): New
> 	function.
> 	(TARGET_FUNCTION_SCRATCH_REGS): Redefine.
>
> gcc/testsuite/
> 	PR target/89628
> 	* gcc.target/aarch64/pr89628.c: New test.
>
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def	2019-03-08 18:14:26.113008680 +0000
> +++ gcc/target.def	2019-03-12 09:34:25.447913041 +0000
> @@ -5763,6 +5763,18 @@ The default version of this hook always
>   default_hard_regno_scratch_ok)
>  
>  DEFHOOK
> +(function_scratch_regs,
> + "This hook sets @var{regs} to the set of registers that function @var{fn}\n\
> +can use without having to save them in the prologue and restore them in the\n\
> +epilogue.\n\
> +\n\
> +By default this set is the same as @code{call_used_reg_set}.  Targets only\n\
> +need to override the hook if some functions implement alternative ABIs that\n\
> +save more registers or fewer registers than normal.",
> + void, (struct function *fn, HARD_REG_SET *regs),
> + default_function_scratch_regs)
> +
> +DEFHOOK
>  (hard_regno_call_part_clobbered,
>   "This hook should return true if @var{regno} is partly call-saved and\n\
>  partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in	2019-03-08 18:14:25.577010718 +0000
> +++ gcc/doc/tm.texi.in	2019-03-12 09:34:25.443913054 +0000
> @@ -1705,6 +1705,11 @@ of @code{CALL_USED_REGISTERS}.
>  @cindex call-used register
>  @cindex call-clobbered register
>  @cindex call-saved register
> +@hook TARGET_FUNCTION_SCRATCH_REGS
> +
> +@cindex call-used register
> +@cindex call-clobbered register
> +@cindex call-saved register
>  @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
>  
>  @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi	2019-03-08 18:14:25.573010734 +0000
> +++ gcc/doc/tm.texi	2019-03-12 09:34:25.443913054 +0000
> @@ -1894,6 +1894,19 @@ of @code{CALL_USED_REGISTERS}.
>  @cindex call-used register
>  @cindex call-clobbered register
>  @cindex call-saved register
> +@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function *@var{fn}, HARD_REG_SET *@var{regs})
> +This hook sets @var{regs} to the set of registers that function @var{fn}
> +can use without having to save them in the prologue and restore them in the
> +epilogue.
> +
> +By default this set is the same as @code{call_used_reg_set}.  Targets only
> +need to override the hook if some functions implement alternative ABIs that
> +save more registers or fewer registers than normal.
> +@end deftypefn
> +
> +@cindex call-used register
> +@cindex call-clobbered register
> +@cindex call-saved register
>  @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn *@var{insn}, unsigned int @var{regno}, machine_mode @var{mode})
>  This hook should return true if @var{regno} is partly call-saved and
>  partly call-clobbered, and if a value of mode @var{mode} would be partly
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h	2019-03-08 18:14:25.849009683 +0000
> +++ gcc/targhooks.h	2019-03-12 09:34:25.447913041 +0000
> @@ -286,5 +286,6 @@ extern bool speculation_safe_value_not_n
>  extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
>  extern void default_remove_extra_call_preserved_regs (rtx_insn *,
>  						      HARD_REG_SET *);
> +extern void default_function_scratch_regs (struct function *, HARD_REG_SET *);
>  
>  #endif /* GCC_TARGHOOKS_H */
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c	2019-03-08 18:14:25.845009699 +0000
> +++ gcc/targhooks.c	2019-03-12 09:34:25.447913041 +0000
> @@ -2379,4 +2379,10 @@ default_remove_extra_call_preserved_regs
>  {
>  }
>  
> +void
> +default_function_scratch_regs (struct function *, HARD_REG_SET *regs)
> +{
> +  COPY_HARD_REG_SET (*regs, call_used_reg_set);
> +}
> +
>  #include "gt-targhooks.h"
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h	2019-03-08 18:15:33.700751752 +0000
> +++ gcc/emit-rtl.h	2019-03-12 09:34:25.443913054 +0000
> @@ -292,6 +292,9 @@ struct GTY(()) rtl_data {
>       sets them.  */
>    HARD_REG_SET asm_clobbers;
>  
> +  /* Registers that the function can use without having to save them first.  */
> +  HARD_REG_SET scratch_regs;
> +
>    /* The highest address seen during shorten_branches.  */
>    int max_insn_address;
>  };
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c	2019-03-08 18:15:33.660751905 +0000
> +++ gcc/function.c	2019-03-12 09:34:25.447913041 +0000
> @@ -4875,6 +4875,7 @@ init_dummy_function_start (void)
>  {
>    push_dummy_function (false);
>    prepare_function_start ();
> +  COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set);
>  }
>  
>  /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree node)
> @@ -4888,6 +4889,8 @@ init_function_start (tree subr)
>    initialize_rtl ();
>  
>    prepare_function_start ();
> +  targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr),
> +				 &crtl->scratch_regs);
>    decide_function_section (subr);
>  
>    /* Warn if this value is an aggregate type,
> Index: gcc/ira-color.c
> ===================================================================
> --- gcc/ira-color.c	2019-03-08 18:15:26.824777889 +0000
> +++ gcc/ira-color.c	2019-03-12 09:34:25.447913041 +0000
> @@ -1660,7 +1660,7 @@ calculate_saved_nregs (int hard_regno, m
>    ira_assert (hard_regno >= 0);
>    for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--)
>      if (!allocated_hardreg_p[hard_regno + i]
> -	&& !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i)
> +	&& !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i)
>  	&& !LOCAL_REGNO (hard_regno + i))
>        nregs++;
>    return nregs;
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c	2019-03-08 18:15:38.228734542 +0000
> +++ gcc/config/aarch64/aarch64.c	2019-03-12 09:34:25.427913117 +0000
> @@ -1706,6 +1706,19 @@ aarch64_simd_call_p (rtx_insn *insn)
>    return aarch64_simd_decl_p (fndecl);
>  }
>  
> +/* Implement TARGET_FUNCTION_SCRATCH_REGS.  Vector PCS functions have
> +   fewer available scratch registers than normal functions do.  */
> +
> +static void
> +aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs)
> +{
> +  default_function_scratch_regs (fn, scratch_regs);
> +  if (aarch64_simd_decl_p (fn->decl))
> +    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +      if (FP_SIMD_SAVED_REGNUM_P (regno))
> +	CLEAR_HARD_REG_BIT (*scratch_regs, regno);
> +}
> +
>  /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS.  If INSN calls
>     a function that uses the SIMD ABI, take advantage of the extra
>     call-preserved registers that the ABI provides.  */
> @@ -19208,6 +19221,9 @@ #define TARGET_MODES_TIEABLE_P aarch64_m
>  #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
>    aarch64_hard_regno_call_part_clobbered
>  
> +#undef TARGET_FUNCTION_SCRATCH_REGS
> +#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs
> +
>  #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
>  #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
>    aarch64_remove_extra_call_preserved_regs
> Index: gcc/testsuite/gcc.target/aarch64/pr89628.c
> ===================================================================
> --- /dev/null	2019-03-08 11:40:14.606883727 +0000
> +++ gcc/testsuite/gcc.target/aarch64/pr89628.c	2019-03-12 09:34:25.447913041 +0000
> @@ -0,0 +1,25 @@
> +/* { dg-options "-O2" } */
> +
> +typedef __Float32x4_t vec;
> +
> +__attribute__((aarch64_vector_pcs))
> +vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
> +{
> +  vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
> +  t0 = a0 - a7;
> +  t1 = a1 - a6;
> +  t2 = a2 - a5;
> +  t3 = a3 - a4;
> +  t4 = a4 - a3;
> +  t5 = a5 - a2;
> +  t6 = a6 - a1;
> +  t7 = a7 - a0;
> +  s0 = t0 * t1;
> +  s1 = t2 * t3;
> +  s2 = t4 * t5;
> +  s3 = t6 * t7;
> +  return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
> +}
> +
> +/* { dg-final { scan-assembler-not '\tldr\t' } } */
> +/* { dg-final { scan-assembler-not '\tstr\t' } } */
Richard Biener March 25, 2019, 12:34 p.m. UTC | #2
On Tue, Mar 12, 2019 at 10:50 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Steve Ellcey <sellcey@marvell.com> writes:
> > Richard,
> >
> > I don't necessarily disagree with anything in your comments and long
> > term I think that is the right direction, but I wonder if that level of
> > change is appropriate for GCC Stage 4 which is where we are now.  Your
> > changes would require fixes in shared code, whereas setting
> > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
>
> I guess it's weighing invasiveness in terms of lines of code/location of
> code vs. invasiveness in terms of the effect the code has.  Changing
> REG_ALLOC_ORDER affects all functinos that use significant numbers of
> SIMD registers, which could have unexpected knock-on effects like
> performance regressions or exposing latent bugs.  It would also make
> the RA try V24 before V16 for callers of vector PCS functions,
> even though they should prefer V16.
>
> Keeping the change specific to callees that use the new vector PCS
> feature seems more conservative from that point of view.
>
> > I am not sure how long it would take me to implement something along
> > the lines of what you are suggesting.
>
> OK, in that case, I thought I'd give a go.
>
> This patch adds a new target hook that says which registers a function
> can use without saving them in the prologue and restoring them in the
> epilogue.  By default this is call_used_reg_set.
>
> The patch only makes IRA use the hook, and in that context it's just
> an (important) optimisation.  But I think we need the same thing for
> passes like regrename, where it'll be a correctness issue.  I'll follow
> on with patches there if this one is OK.
>
> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> OK to install?

So why is call_used_reg_set not updated appropriately?  As you say
if you introduce sth else it becomes a correctness issue to use that "else"
rather than call_used_reg_set which was OK up until now.  I wonder how
other targets with a per-function different ABI handle this situation.  Or are
you the first and others at most affect calling conventions here?  That is,
it looks like making (parts of?) this_taret_hard_regs per function or
re-computed at set_cfun time.  There's alrady a hook for that btw,
targetm.set_current_function ().  Why not change call_used_reg_set there?

Richard.

> Richard
>
>
> 2019-03-12  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         PR target/89628
>         * target.def (function_scratch_regs): New hook.
>         * doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_function_scratch_regs): Declare.
>         * targhooks.c (default_function_scratch_regs): New function.
>         * emit-rtl.h (rtl_data::scratch_regs): New member variable.
>         * function.c (init_dummy_function_start): Initialize it.
>         (init_function_start): Likewise.
>         * ira-color.c (calculate_saved_nregs): Use it instead of
>         call_used_reg_set.
>         * config/aarch64/aarch64.c (aarch64_function_scratch_regs): New
>         function.
>         (TARGET_FUNCTION_SCRATCH_REGS): Redefine.
>
> gcc/testsuite/
>         PR target/89628
>         * gcc.target/aarch64/pr89628.c: New test.
>
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2019-03-08 18:14:26.113008680 +0000
> +++ gcc/target.def      2019-03-12 09:34:25.447913041 +0000
> @@ -5763,6 +5763,18 @@ The default version of this hook always
>   default_hard_regno_scratch_ok)
>
>  DEFHOOK
> +(function_scratch_regs,
> + "This hook sets @var{regs} to the set of registers that function @var{fn}\n\
> +can use without having to save them in the prologue and restore them in the\n\
> +epilogue.\n\
> +\n\
> +By default this set is the same as @code{call_used_reg_set}.  Targets only\n\
> +need to override the hook if some functions implement alternative ABIs that\n\
> +save more registers or fewer registers than normal.",
> + void, (struct function *fn, HARD_REG_SET *regs),
> + default_function_scratch_regs)
> +
> +DEFHOOK
>  (hard_regno_call_part_clobbered,
>   "This hook should return true if @var{regno} is partly call-saved and\n\
>  partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
> Index: gcc/doc/tm.texi.in
> ===================================================================
> --- gcc/doc/tm.texi.in  2019-03-08 18:14:25.577010718 +0000
> +++ gcc/doc/tm.texi.in  2019-03-12 09:34:25.443913054 +0000
> @@ -1705,6 +1705,11 @@ of @code{CALL_USED_REGISTERS}.
>  @cindex call-used register
>  @cindex call-clobbered register
>  @cindex call-saved register
> +@hook TARGET_FUNCTION_SCRATCH_REGS
> +
> +@cindex call-used register
> +@cindex call-clobbered register
> +@cindex call-saved register
>  @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
>
>  @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2019-03-08 18:14:25.573010734 +0000
> +++ gcc/doc/tm.texi     2019-03-12 09:34:25.443913054 +0000
> @@ -1894,6 +1894,19 @@ of @code{CALL_USED_REGISTERS}.
>  @cindex call-used register
>  @cindex call-clobbered register
>  @cindex call-saved register
> +@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function *@var{fn}, HARD_REG_SET *@var{regs})
> +This hook sets @var{regs} to the set of registers that function @var{fn}
> +can use without having to save them in the prologue and restore them in the
> +epilogue.
> +
> +By default this set is the same as @code{call_used_reg_set}.  Targets only
> +need to override the hook if some functions implement alternative ABIs that
> +save more registers or fewer registers than normal.
> +@end deftypefn
> +
> +@cindex call-used register
> +@cindex call-clobbered register
> +@cindex call-saved register
>  @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn *@var{insn}, unsigned int @var{regno}, machine_mode @var{mode})
>  This hook should return true if @var{regno} is partly call-saved and
>  partly call-clobbered, and if a value of mode @var{mode} would be partly
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h     2019-03-08 18:14:25.849009683 +0000
> +++ gcc/targhooks.h     2019-03-12 09:34:25.447913041 +0000
> @@ -286,5 +286,6 @@ extern bool speculation_safe_value_not_n
>  extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
>  extern void default_remove_extra_call_preserved_regs (rtx_insn *,
>                                                       HARD_REG_SET *);
> +extern void default_function_scratch_regs (struct function *, HARD_REG_SET *);
>
>  #endif /* GCC_TARGHOOKS_H */
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c     2019-03-08 18:14:25.845009699 +0000
> +++ gcc/targhooks.c     2019-03-12 09:34:25.447913041 +0000
> @@ -2379,4 +2379,10 @@ default_remove_extra_call_preserved_regs
>  {
>  }
>
> +void
> +default_function_scratch_regs (struct function *, HARD_REG_SET *regs)
> +{
> +  COPY_HARD_REG_SET (*regs, call_used_reg_set);
> +}
> +
>  #include "gt-targhooks.h"
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h      2019-03-08 18:15:33.700751752 +0000
> +++ gcc/emit-rtl.h      2019-03-12 09:34:25.443913054 +0000
> @@ -292,6 +292,9 @@ struct GTY(()) rtl_data {
>       sets them.  */
>    HARD_REG_SET asm_clobbers;
>
> +  /* Registers that the function can use without having to save them first.  */
> +  HARD_REG_SET scratch_regs;
> +
>    /* The highest address seen during shorten_branches.  */
>    int max_insn_address;
>  };
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c      2019-03-08 18:15:33.660751905 +0000
> +++ gcc/function.c      2019-03-12 09:34:25.447913041 +0000
> @@ -4875,6 +4875,7 @@ init_dummy_function_start (void)
>  {
>    push_dummy_function (false);
>    prepare_function_start ();
> +  COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set);
>  }
>
>  /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree node)
> @@ -4888,6 +4889,8 @@ init_function_start (tree subr)
>    initialize_rtl ();
>
>    prepare_function_start ();
> +  targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr),
> +                                &crtl->scratch_regs);
>    decide_function_section (subr);
>
>    /* Warn if this value is an aggregate type,
> Index: gcc/ira-color.c
> ===================================================================
> --- gcc/ira-color.c     2019-03-08 18:15:26.824777889 +0000
> +++ gcc/ira-color.c     2019-03-12 09:34:25.447913041 +0000
> @@ -1660,7 +1660,7 @@ calculate_saved_nregs (int hard_regno, m
>    ira_assert (hard_regno >= 0);
>    for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--)
>      if (!allocated_hardreg_p[hard_regno + i]
> -       && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i)
> +       && !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i)
>         && !LOCAL_REGNO (hard_regno + i))
>        nregs++;
>    return nregs;
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c        2019-03-08 18:15:38.228734542 +0000
> +++ gcc/config/aarch64/aarch64.c        2019-03-12 09:34:25.427913117 +0000
> @@ -1706,6 +1706,19 @@ aarch64_simd_call_p (rtx_insn *insn)
>    return aarch64_simd_decl_p (fndecl);
>  }
>
> +/* Implement TARGET_FUNCTION_SCRATCH_REGS.  Vector PCS functions have
> +   fewer available scratch registers than normal functions do.  */
> +
> +static void
> +aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs)
> +{
> +  default_function_scratch_regs (fn, scratch_regs);
> +  if (aarch64_simd_decl_p (fn->decl))
> +    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +      if (FP_SIMD_SAVED_REGNUM_P (regno))
> +       CLEAR_HARD_REG_BIT (*scratch_regs, regno);
> +}
> +
>  /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS.  If INSN calls
>     a function that uses the SIMD ABI, take advantage of the extra
>     call-preserved registers that the ABI provides.  */
> @@ -19208,6 +19221,9 @@ #define TARGET_MODES_TIEABLE_P aarch64_m
>  #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
>    aarch64_hard_regno_call_part_clobbered
>
> +#undef TARGET_FUNCTION_SCRATCH_REGS
> +#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs
> +
>  #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
>  #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
>    aarch64_remove_extra_call_preserved_regs
> Index: gcc/testsuite/gcc.target/aarch64/pr89628.c
> ===================================================================
> --- /dev/null   2019-03-08 11:40:14.606883727 +0000
> +++ gcc/testsuite/gcc.target/aarch64/pr89628.c  2019-03-12 09:34:25.447913041 +0000
> @@ -0,0 +1,25 @@
> +/* { dg-options "-O2" } */
> +
> +typedef __Float32x4_t vec;
> +
> +__attribute__((aarch64_vector_pcs))
> +vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
> +{
> +  vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
> +  t0 = a0 - a7;
> +  t1 = a1 - a6;
> +  t2 = a2 - a5;
> +  t3 = a3 - a4;
> +  t4 = a4 - a3;
> +  t5 = a5 - a2;
> +  t6 = a6 - a1;
> +  t7 = a7 - a0;
> +  s0 = t0 * t1;
> +  s1 = t2 * t3;
> +  s2 = t4 * t5;
> +  s3 = t6 * t7;
> +  return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
> +}
> +
> +/* { dg-final { scan-assembler-not '\tldr\t' } } */
> +/* { dg-final { scan-assembler-not '\tstr\t' } } */
Richard Sandiford March 25, 2019, 5:48 p.m. UTC | #3
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Mar 12, 2019 at 10:50 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Steve Ellcey <sellcey@marvell.com> writes:
>> > Richard,
>> >
>> > I don't necessarily disagree with anything in your comments and long
>> > term I think that is the right direction, but I wonder if that level of
>> > change is appropriate for GCC Stage 4 which is where we are now.  Your
>> > changes would require fixes in shared code, whereas setting
>> > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
>>
>> I guess it's weighing invasiveness in terms of lines of code/location of
>> code vs. invasiveness in terms of the effect the code has.  Changing
>> REG_ALLOC_ORDER affects all functinos that use significant numbers of
>> SIMD registers, which could have unexpected knock-on effects like
>> performance regressions or exposing latent bugs.  It would also make
>> the RA try V24 before V16 for callers of vector PCS functions,
>> even though they should prefer V16.
>>
>> Keeping the change specific to callees that use the new vector PCS
>> feature seems more conservative from that point of view.
>>
>> > I am not sure how long it would take me to implement something along
>> > the lines of what you are suggesting.
>>
>> OK, in that case, I thought I'd give a go.
>>
>> This patch adds a new target hook that says which registers a function
>> can use without saving them in the prologue and restoring them in the
>> epilogue.  By default this is call_used_reg_set.
>>
>> The patch only makes IRA use the hook, and in that context it's just
>> an (important) optimisation.  But I think we need the same thing for
>> passes like regrename, where it'll be a correctness issue.  I'll follow
>> on with patches there if this one is OK.
>>
>> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
>> OK to install?
>
> So why is call_used_reg_set not updated appropriately?  As you say
> if you introduce sth else it becomes a correctness issue to use that "else"
> rather than call_used_reg_set which was OK up until now.  I wonder how
> other targets with a per-function different ABI handle this situation.  Or are
> you the first and others at most affect calling conventions here?  That is,
> it looks like making (parts of?) this_taret_hard_regs per function or
> re-computed at set_cfun time.  There's alrady a hook for that btw,
> targetm.set_current_function ().  Why not change call_used_reg_set there?

The problem is that call_used_reg_set is a global that describes the
base ABI.  It's used both for what the current function can use without
saving and restoring, and for what the target of a call insn might clobber.
Most uses are for the latter.

The global set still needs to describe the base ABI, since the default
assumption is that a call insn uses the base ABI unless something says
otherwise.  TARGET_HARD_REGNO_CALL_PART_CLOBBERED can be used for call
insns that use other ABIs, so it handles the case in which the current
function is the caller.  But there's no hook for handling other ABIs
when the current function is the callee.

So the purpose of the new hook is to split out what the current function
needs to save and restore as a separate bit of data that doesn't affect
call insns.  Since it's an rtl-level thing, I thought it'd be better
to put it in crtl (i.e. the rtl info about the current function).
That way we only need to set it once at the start of the rtl passes,
rather than every time we change cfun via targetm.set_current_function.
(Well, longer-term, we might want to set it after RA too, so that the
scratch registers only include what the prologue and epilogue actually
save & restore.)

Targets can already use targetm.hard_regno_scratch_ok and HARD_REGNO_RENAME_OK
to stop the post-RA passes using a particular register.  I guess that's
been enough for what targets have needed till now, and we could use them
for the correctness thing I mentioned above.  But that wouldn't solve the
RA problem.  Making the information directly available seems better than
adding more hacks around it.

Thanks,
Richard
Richard Biener March 26, 2019, 8:13 a.m. UTC | #4
On Mon, Mar 25, 2019 at 6:48 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Mar 12, 2019 at 10:50 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Steve Ellcey <sellcey@marvell.com> writes:
> >> > Richard,
> >> >
> >> > I don't necessarily disagree with anything in your comments and long
> >> > term I think that is the right direction, but I wonder if that level of
> >> > change is appropriate for GCC Stage 4 which is where we are now.  Your
> >> > changes would require fixes in shared code, whereas setting
> >> > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change.
> >>
> >> I guess it's weighing invasiveness in terms of lines of code/location of
> >> code vs. invasiveness in terms of the effect the code has.  Changing
> >> REG_ALLOC_ORDER affects all functinos that use significant numbers of
> >> SIMD registers, which could have unexpected knock-on effects like
> >> performance regressions or exposing latent bugs.  It would also make
> >> the RA try V24 before V16 for callers of vector PCS functions,
> >> even though they should prefer V16.
> >>
> >> Keeping the change specific to callees that use the new vector PCS
> >> feature seems more conservative from that point of view.
> >>
> >> > I am not sure how long it would take me to implement something along
> >> > the lines of what you are suggesting.
> >>
> >> OK, in that case, I thought I'd give a go.
> >>
> >> This patch adds a new target hook that says which registers a function
> >> can use without saving them in the prologue and restoring them in the
> >> epilogue.  By default this is call_used_reg_set.
> >>
> >> The patch only makes IRA use the hook, and in that context it's just
> >> an (important) optimisation.  But I think we need the same thing for
> >> passes like regrename, where it'll be a correctness issue.  I'll follow
> >> on with patches there if this one is OK.
> >>
> >> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> >> OK to install?
> >
> > So why is call_used_reg_set not updated appropriately?  As you say
> > if you introduce sth else it becomes a correctness issue to use that "else"
> > rather than call_used_reg_set which was OK up until now.  I wonder how
> > other targets with a per-function different ABI handle this situation.  Or are
> > you the first and others at most affect calling conventions here?  That is,
> > it looks like making (parts of?) this_taret_hard_regs per function or
> > re-computed at set_cfun time.  There's alrady a hook for that btw,
> > targetm.set_current_function ().  Why not change call_used_reg_set there?
>
> The problem is that call_used_reg_set is a global that describes the
> base ABI.  It's used both for what the current function can use without
> saving and restoring, and for what the target of a call insn might clobber.
> Most uses are for the latter.
>
> The global set still needs to describe the base ABI, since the default
> assumption is that a call insn uses the base ABI unless something says
> otherwise.  TARGET_HARD_REGNO_CALL_PART_CLOBBERED can be used for call
> insns that use other ABIs, so it handles the case in which the current
> function is the caller.  But there's no hook for handling other ABIs
> when the current function is the callee.
>
> So the purpose of the new hook is to split out what the current function
> needs to save and restore as a separate bit of data that doesn't affect
> call insns.  Since it's an rtl-level thing, I thought it'd be better
> to put it in crtl (i.e. the rtl info about the current function).
> That way we only need to set it once at the start of the rtl passes,
> rather than every time we change cfun via targetm.set_current_function.
> (Well, longer-term, we might want to set it after RA too, so that the
> scratch registers only include what the prologue and epilogue actually
> save & restore.)
>
> Targets can already use targetm.hard_regno_scratch_ok and HARD_REGNO_RENAME_OK
> to stop the post-RA passes using a particular register.  I guess that's
> been enough for what targets have needed till now, and we could use them
> for the correctness thing I mentioned above.  But that wouldn't solve the
> RA problem.  Making the information directly available seems better than
> adding more hacks around it.

Fair enough.  I'm not too deep into this but it seems to me this is an
area that needs proper documentation and a well-designed extension - both
don't seem like a good fit for stage 4.

Richard.

> Thanks,
> Richard
diff mbox series

Patch

Index: gcc/target.def
===================================================================
--- gcc/target.def	2019-03-08 18:14:26.113008680 +0000
+++ gcc/target.def	2019-03-12 09:34:25.447913041 +0000
@@ -5763,6 +5763,18 @@  The default version of this hook always
  default_hard_regno_scratch_ok)
 
 DEFHOOK
+(function_scratch_regs,
+ "This hook sets @var{regs} to the set of registers that function @var{fn}\n\
+can use without having to save them in the prologue and restore them in the\n\
+epilogue.\n\
+\n\
+By default this set is the same as @code{call_used_reg_set}.  Targets only\n\
+need to override the hook if some functions implement alternative ABIs that\n\
+save more registers or fewer registers than normal.",
+ void, (struct function *fn, HARD_REG_SET *regs),
+ default_function_scratch_regs)
+
+DEFHOOK
 (hard_regno_call_part_clobbered,
  "This hook should return true if @var{regno} is partly call-saved and\n\
 partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2019-03-08 18:14:25.577010718 +0000
+++ gcc/doc/tm.texi.in	2019-03-12 09:34:25.443913054 +0000
@@ -1705,6 +1705,11 @@  of @code{CALL_USED_REGISTERS}.
 @cindex call-used register
 @cindex call-clobbered register
 @cindex call-saved register
+@hook TARGET_FUNCTION_SCRATCH_REGS
+
+@cindex call-used register
+@cindex call-clobbered register
+@cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
 @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2019-03-08 18:14:25.573010734 +0000
+++ gcc/doc/tm.texi	2019-03-12 09:34:25.443913054 +0000
@@ -1894,6 +1894,19 @@  of @code{CALL_USED_REGISTERS}.
 @cindex call-used register
 @cindex call-clobbered register
 @cindex call-saved register
+@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function *@var{fn}, HARD_REG_SET *@var{regs})
+This hook sets @var{regs} to the set of registers that function @var{fn}
+can use without having to save them in the prologue and restore them in the
+epilogue.
+
+By default this set is the same as @code{call_used_reg_set}.  Targets only
+need to override the hook if some functions implement alternative ABIs that
+save more registers or fewer registers than normal.
+@end deftypefn
+
+@cindex call-used register
+@cindex call-clobbered register
+@cindex call-saved register
 @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn *@var{insn}, unsigned int @var{regno}, machine_mode @var{mode})
 This hook should return true if @var{regno} is partly call-saved and
 partly call-clobbered, and if a value of mode @var{mode} would be partly
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2019-03-08 18:14:25.849009683 +0000
+++ gcc/targhooks.h	2019-03-12 09:34:25.447913041 +0000
@@ -286,5 +286,6 @@  extern bool speculation_safe_value_not_n
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 extern void default_remove_extra_call_preserved_regs (rtx_insn *,
 						      HARD_REG_SET *);
+extern void default_function_scratch_regs (struct function *, HARD_REG_SET *);
 
 #endif /* GCC_TARGHOOKS_H */
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2019-03-08 18:14:25.845009699 +0000
+++ gcc/targhooks.c	2019-03-12 09:34:25.447913041 +0000
@@ -2379,4 +2379,10 @@  default_remove_extra_call_preserved_regs
 {
 }
 
+void
+default_function_scratch_regs (struct function *, HARD_REG_SET *regs)
+{
+  COPY_HARD_REG_SET (*regs, call_used_reg_set);
+}
+
 #include "gt-targhooks.h"
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2019-03-08 18:15:33.700751752 +0000
+++ gcc/emit-rtl.h	2019-03-12 09:34:25.443913054 +0000
@@ -292,6 +292,9 @@  struct GTY(()) rtl_data {
      sets them.  */
   HARD_REG_SET asm_clobbers;
 
+  /* Registers that the function can use without having to save them first.  */
+  HARD_REG_SET scratch_regs;
+
   /* The highest address seen during shorten_branches.  */
   int max_insn_address;
 };
Index: gcc/function.c
===================================================================
--- gcc/function.c	2019-03-08 18:15:33.660751905 +0000
+++ gcc/function.c	2019-03-12 09:34:25.447913041 +0000
@@ -4875,6 +4875,7 @@  init_dummy_function_start (void)
 {
   push_dummy_function (false);
   prepare_function_start ();
+  COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set);
 }
 
 /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree node)
@@ -4888,6 +4889,8 @@  init_function_start (tree subr)
   initialize_rtl ();
 
   prepare_function_start ();
+  targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr),
+				 &crtl->scratch_regs);
   decide_function_section (subr);
 
   /* Warn if this value is an aggregate type,
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c	2019-03-08 18:15:26.824777889 +0000
+++ gcc/ira-color.c	2019-03-12 09:34:25.447913041 +0000
@@ -1660,7 +1660,7 @@  calculate_saved_nregs (int hard_regno, m
   ira_assert (hard_regno >= 0);
   for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--)
     if (!allocated_hardreg_p[hard_regno + i]
-	&& !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i)
+	&& !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i)
 	&& !LOCAL_REGNO (hard_regno + i))
       nregs++;
   return nregs;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2019-03-08 18:15:38.228734542 +0000
+++ gcc/config/aarch64/aarch64.c	2019-03-12 09:34:25.427913117 +0000
@@ -1706,6 +1706,19 @@  aarch64_simd_call_p (rtx_insn *insn)
   return aarch64_simd_decl_p (fndecl);
 }
 
+/* Implement TARGET_FUNCTION_SCRATCH_REGS.  Vector PCS functions have
+   fewer available scratch registers than normal functions do.  */
+
+static void
+aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs)
+{
+  default_function_scratch_regs (fn, scratch_regs);
+  if (aarch64_simd_decl_p (fn->decl))
+    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+      if (FP_SIMD_SAVED_REGNUM_P (regno))
+	CLEAR_HARD_REG_BIT (*scratch_regs, regno);
+}
+
 /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS.  If INSN calls
    a function that uses the SIMD ABI, take advantage of the extra
    call-preserved registers that the ABI provides.  */
@@ -19208,6 +19221,9 @@  #define TARGET_MODES_TIEABLE_P aarch64_m
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_FUNCTION_SCRATCH_REGS
+#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs
+
 #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS
 #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \
   aarch64_remove_extra_call_preserved_regs
Index: gcc/testsuite/gcc.target/aarch64/pr89628.c
===================================================================
--- /dev/null	2019-03-08 11:40:14.606883727 +0000
+++ gcc/testsuite/gcc.target/aarch64/pr89628.c	2019-03-12 09:34:25.447913041 +0000
@@ -0,0 +1,25 @@ 
+/* { dg-options "-O2" } */
+
+typedef __Float32x4_t vec;
+
+__attribute__((aarch64_vector_pcs))
+vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
+{
+  vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
+  t0 = a0 - a7;
+  t1 = a1 - a6;
+  t2 = a2 - a5;
+  t3 = a3 - a4;
+  t4 = a4 - a3;
+  t5 = a5 - a2;
+  t6 = a6 - a1;
+  t7 = a7 - a0;
+  s0 = t0 * t1;
+  s1 = t2 * t3;
+  s2 = t4 * t5;
+  s3 = t6 * t7;
+  return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
+}
+
+/* { dg-final { scan-assembler-not '\tldr\t' } } */
+/* { dg-final { scan-assembler-not '\tstr\t' } } */